Cybersecurity benchmarks

  1. ExCyTIn-Bench: Open Source Benchmark for Agentic AI in Cybersecurity Investigations

    Microsoft has open-sourced ExCyTIn-Bench, a new benchmarking framework that evaluates how well large language models (LLMs) and agentic AI systems perform real-world, multistage cybersecurity investigations inside a simulated Security Operations Center (SOC), and its design reshapes how...