
Elastic has quietly landed a practical — and timely — piece of infrastructure for enterprise AI: an integration between Elastic Observability and Azure AI Foundry that delivers real‑time telemetry, pre‑built dashboards, and operational controls designed specifically for agentic AI and large language model workloads. The technology preview aims to give site reliability engineers (SREs), developers, and platform teams a single pane to monitor token consumption, latency, cost drivers, and content‑safety signals for models running inside Azure’s Foundry environment, with the explicit promise of faster, safer scale for production AI agents.
Background
Azure AI Foundry is Microsoft’s model catalog and managed hosting layer that packages third‑party and Microsoft models with enterprise identity, governance, and observability primitives. It has been extended in 2025 with agent frameworks, OpenTelemetry conventions for multi‑agent tracing, and tooling intended for production, regulated deployments. Foundry’s observability stack and the Agent Service make it possible to trace model inferences and tool invocations end‑to‑end across multi‑agent workflows. Elastic is positioning this integration as a bridge between Azure’s Foundry telemetry and Elastic’s long‑established observability tooling: ingest, index, visualize, and alert on LLM metrics and traces so teams can operate agentic workloads with the same operational disciplines they use for traditional services. The integration is being distributed as a technology preview inside Elastic Observability, with pre‑configured Kibana dashboards that map Foundry telemetry into token usage, latency percentiles, cost estimation, prompt/completion logging (where enabled), and content‑safety filter metrics.
What the integration actually delivers
The initial published capabilities are pragmatic and focused on operational observability for LLMs and agents:
- Pre‑built dashboards that surface model usage, token trends, request rates, and latency percentiles so SREs can rapidly identify regressions.
- Token‑level telemetry that estimates input/output consumption by model and endpoint, enabling chargeback/showback or budgeting alerts.
- Cost estimation and tracking tied to token volumes and model pricing to help teams understand the financial drivers of AI workloads.
- Prompt/completion logging and content‑filtering signals (configurable) so engineers can debug failure modes, detect unsafe outputs, and audit policy enforcement when necessary.
- Real‑time alerts and correlation between model traces (OpenTelemetry spans), backend service metrics, and user sessions for fast root‑cause analysis.
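Token‑level cost telemetry of this kind reduces to multiplying token counts by per‑model rates and aggregating by model or endpoint. A minimal Python sketch of that calculation follows; the model names and per‑1,000‑token prices are illustrative placeholders, not published Azure pricing:

```python
# Sketch of token-based cost estimation of the kind the dashboards surface.
# Model names and per-1,000-token prices are illustrative placeholders,
# not published Azure pricing.
PRICE_PER_1K = {
    "gpt-4o": (0.005, 0.015),        # (input, output) USD per 1K tokens
    "gpt-4o-mini": (0.0005, 0.0015),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    in_rate, out_rate = PRICE_PER_1K[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

def rollup(records: list) -> dict:
    """Aggregate telemetry records into a per-model cost total, the
    shape a chargeback/showback dashboard would visualise."""
    totals: dict = {}
    for r in records:
        totals[r["model"]] = totals.get(r["model"], 0.0) + estimate_cost(
            r["model"], r["input_tokens"], r["output_tokens"]
        )
    return totals

if __name__ == "__main__":
    sample = [
        {"model": "gpt-4o", "input_tokens": 1200, "output_tokens": 400},
        {"model": "gpt-4o-mini", "input_tokens": 8000, "output_tokens": 2000},
    ]
    print(rollup(sample))
```

Budgeting alerts then become threshold checks over these rollups per team, model, or endpoint.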
Why this matters to SREs and platform engineers
Agentic AI systems are multi‑component: model inferences, tool invocations, retrieval layers (RAG), and stateful orchestration all interact across services. That complexity makes classic monitoring approaches insufficient; you need:
- Correlated traces that follow an agent’s request as it calls tools, runs inferences, and writes outputs.
- Token‑aware metrics so billing and quota anomalies are visible before costs balloon.
- Content‑safety signals integrated with observability so compliance and red‑teaming workflows are auditable.
Technical architecture — how the integration works
At a high level, the integration maps Azure AI Foundry telemetry into Elastic’s ingestion and visualization pipeline:
- Azure AI Foundry and the Agent Service emit logs, audit events, and OpenTelemetry spans for inferences, agent actions, and tool calls. These include metrics such as request/response times, model IDs, endpoint names, and filter/guardrail flags.
- Diagnostic settings and API gateway logs can be wired (via Event Hubs, Azure Monitor, or Application Insights) to stream telemetry into Elastic’s ingestion endpoints or into Elastic Cloud on Azure. The integration includes parsers and ECS‑aligned mappings to make the data queryable in Kibana.
- Elastic ships pre‑configured dashboards and alert rules that translate model telemetry into SLO‑friendly surfaces: token budgets, p95/p99 latency, error rates by model, and content‑safety filter ratios.
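To make the mapping step concrete, here is a minimal sketch of flattening an agent‑inference span into an ECS‑flavoured document ready for indexing. The input attribute names follow OpenTelemetry’s gen_ai.* semantic conventions; the output field names and the azure.content_filter.triggered attribute are illustrative assumptions, not Elastic’s published schema for this integration:

```python
import json

def span_to_doc(span: dict) -> dict:
    """Flatten an agent-inference span into an ECS-flavoured document.

    Input attribute names follow OpenTelemetry's gen_ai.* semantic
    conventions; the output field names (and the hypothetical
    azure.content_filter.triggered attribute) are illustrative, not
    Elastic's published mapping for this integration.
    """
    attrs = span["attributes"]
    return {
        "@timestamp": span["end_time"],
        "event.duration": span["duration_ns"],   # ECS stores durations in nanoseconds
        "gen_ai.model.id": attrs.get("gen_ai.request.model"),
        "gen_ai.usage.input_tokens": attrs.get("gen_ai.usage.input_tokens"),
        "gen_ai.usage.output_tokens": attrs.get("gen_ai.usage.output_tokens"),
        "gen_ai.content_filter.triggered": attrs.get(
            "azure.content_filter.triggered", False  # hypothetical guardrail flag
        ),
    }

if __name__ == "__main__":
    sample_span = {
        "end_time": "2025-06-01T12:00:00Z",
        "duration_ns": 1_850_000_000,
        "attributes": {
            "gen_ai.request.model": "gpt-4o",
            "gen_ai.usage.input_tokens": 1200,
            "gen_ai.usage.output_tokens": 400,
        },
    }
    print(json.dumps(span_to_doc(sample_span), indent=2))
```

In the real pipeline this normalisation happens in the integration’s parsers before documents land in Kibana‑queryable indices.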
Security, privacy, and compliance considerations
The integration brings clear benefits for governance — but it also raises operational questions that must be explicitly managed.
- Elastic’s dashboards can show prompt/completion pairs for debugging, but recording inputs and outputs should be treated as a conscious, auditable choice. Not all telemetry streams log content by default; Azure’s diagnostic surfaces and Elastic’s collectors may require deliberate configuration to enable or redact content. The Azure Foundry logging model and API Management diagnostic pipeline do not log inputs/outputs by default. Teams should treat content recording as a high‑risk feature and apply PII redaction, retention limits, and approval gates.
- Token logging and cost telemetry are invaluable for budgeting, but they are also a potential leak vector. Telemetry that contains user or document identifiers must be handled under the organization’s data‑handling policies, retention rules, and encryption standards. Elastic’s storage and Azure’s private networking options should be configured to meet any residency or regulatory constraints.
- Content‑safety detections surfaced in dashboards are only as reliable as the policies and filters that run upstream; teams must combine automated filters with human review, red‑team testing, and incident workflows to remediate high‑risk outputs. Elastic and Microsoft emphasize guardrails but label many features as preview; thorough validation in your tenant remains essential.
- Observability itself must be instrumented securely. OpenTelemetry spans can contain rich attributes; instrumenters must apply attribute‑level scrubbing, use customer‑managed keys for telemetry storage where required, and ensure RBAC restricts who can access raw prompt data. Microsoft’s Foundry supports bring‑your‑own storage, private VNETs, and per‑agent identities, which are important controls to adopt.
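The attribute‑level scrubbing recommended above can be sketched as a small pre‑export filter. The sensitive key names and the email pattern here are assumptions for illustration; in practice this logic would live in an OpenTelemetry SpanProcessor or a collector pipeline, alongside a fuller PII redaction pass:

```python
import re

# Illustrative list of span attributes that may carry raw content;
# the key names are assumptions, not a fixed Foundry/Elastic schema.
SENSITIVE_KEYS = {"gen_ai.prompt", "gen_ai.completion"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub_attributes(attrs: dict, record_content: bool = False) -> dict:
    """Return a copy of span attributes that is safe to export.

    Content-bearing keys are fully redacted unless content recording has
    been explicitly opted into; remaining string values get a simple
    email mask as a stand-in for a fuller PII redaction pass.
    """
    clean = {}
    for key, value in attrs.items():
        if key in SENSITIVE_KEYS and not record_content:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[EMAIL]", value)
        else:
            clean[key] = value
    return clean

if __name__ == "__main__":
    span_attrs = {
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.prompt": "Summarise the attached contract",
        "user.note": "escalate to legal@contoso.com",
    }
    print(scrub_attributes(span_attrs))
```

Pairing a filter like this with RBAC on the resulting indices keeps raw prompt data limited to an approved audience.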
Operational playbook — from pilot to production
For WindowsForum readers who run Windows‑centric infrastructure or manage Azure estates, the following pragmatic sequence will reduce risk and surface the integration’s value quickly:
- Run a focused pilot with a single, high‑value, low‑risk use case (for example, an internal document summarizer), instrumented end‑to‑end. Ensure the Foundry project uses a non‑production tenant.
- Enable token and latency dashboards but keep prompt recording disabled at first. Use sampling to collect a manageable dataset for QA.
- Validate cost estimates: correlate Elastic’s token counters with Azure billing line items and test at projected scale. Build alert thresholds tied to budget burn rates.
- Introduce gating for content recording: implement redaction hooks or a secondary consent step before storing prompt/completion pairs. Audit access to those logs via RBAC and SIEM integration.
- Add SRE runbooks that map the most common alerts — token exhaustion, p95 latency spike, or content‑safety exception — to triage steps and remediation playbooks. Instrument synthetic probes to validate model latency from critical regions.
- Expand scope gradually to include multi‑agent traces, tool invocations, and RAG layers; track policy metrics (filter hits, human escalation counts) as compliance KPIs.
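The budget burn‑rate alerting mentioned in the playbook can be expressed as a simple ratio check: actual spend divided by the spend that would exactly exhaust the budget at this point in the window. A minimal sketch, with an illustrative budget, window, and 2.0x alert threshold (not vendor defaults):

```python
# Sketch of a budget burn-rate check for token spend alerts. The monthly
# budget, window, and 2.0x alert threshold are illustrative values.

def burn_rate(spent_usd: float, budget_usd: float,
              elapsed_hours: float, window_hours: float = 720.0) -> float:
    """Ratio of actual spend to the spend that exactly exhausts the
    budget over the window; values above 1.0 mean the budget will run
    out before the window ends."""
    expected = budget_usd * (elapsed_hours / window_hours)
    return spent_usd / expected if expected else float("inf")

def should_alert(rate: float, threshold: float = 2.0) -> bool:
    """Fire when spend is running at twice the sustainable rate."""
    return rate >= threshold

if __name__ == "__main__":
    # Three days into a 30-day, $3,000 token budget, $900 already spent:
    rate = burn_rate(spent_usd=900, budget_usd=3000, elapsed_hours=72)
    print(f"burn rate {rate:.1f}x, alert={should_alert(rate)}")
```

The same ratio can be computed in an Elastic alert rule over the token‑cost rollups, so the runbook triage step starts from a number rather than a raw spend figure.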
Strengths and strategic implications
- Unified observability for a complex stack. The integration turns Foundry’s native telemetry into indexed, queryable artifacts inside Elastic — shortening the gap between symptom (latency, cost spike) and cause (specific prompt, model or downstream tool).
- SRE‑friendly tooling. Elastic’s alerting, correlation, and dashboarding models are mature; applying those capabilities to model observability creates continuity between existing system monitoring and AI monitoring.
- Better cost control and governance. Token‑level metrics and cost estimates help engineering leaders make evidence‑based decisions about routing, caching, and model selection. That matters when enterprise budgets collide with token‑heavy workloads.
- Portable tracing semantics. OpenTelemetry contributions for multi‑agent scenarios mean traces are meaningful across common frameworks, reducing instrumentation friction across teams.
Risks, limitations, and vendor‑claim cautions
- Preview label and feature maturity. The integration is currently a technology preview; engineering teams should treat vendor‑provided dashboards and rules as a starting point for customization rather than a finished production solution. Validate behavior under your workloads.
- Data residency and logging defaults. The Azure Foundry diagnostic path and API Gateway logs need deliberate configuration — inputs/outputs are not always recorded by default, and the presence or absence of content recording materially changes compliance posture. Confirm defaults in your tenant and document any telemetry that contains regulated data.
- Cost vs. benefit tradeoffs. Observability at token granularity can itself produce telemetry costs. Teams must budget for increased storage and indexing of traces and logs, and consider sampling, retention, and tiering strategies.
- False assurance risk. Dashboards showing “guardrail hits” or filter counts can create a perception of safety even when filters are incomplete. Observability should be paired with red‑teaming, adversarial testing, and human review for high‑impact outputs.
- Vendor claims about scaling faster and not compromising compliance. These are aspirational and context‑dependent. The integration provides tools for control, but actual reliability and compliance are a function of architecture, validation, and organizational processes — not just observability alone. Treat such statements as promises of capability, not guarantees.
Practical recommendations for WindowsForum readers
- Start small and instrument deliberately: enable token and latency dashboards first, then selectively enable prompt logging. Use redaction and consent for any production content capture.
- Use identity and network controls: adopt per‑agent identities and private virtual networks (VNETs) for agent projects, and apply Azure RBAC to limit who can view raw prompts or completion logs.
- Benchmark real workloads: vendor marketing and preview dashboards are useful, but independent benchmarking under representative traffic patterns is mandatory before a production rollout. Measure p95/p99 latency, token burn rates for representative prompts, and cost per transaction at realistic concurrency.
- Plan for telemetry costs: design retention, sampling, and archival strategies for trace and prompt logs to avoid runaway indexing bills. Consider hot/warm/cold tiers for Elastic indices.
- Combine observability with governance: integrate dashboards into incident response, compliance reporting, and finance review cycles so observability becomes part of organizational controls rather than an isolated engineering convenience.
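The hot/warm/cold tiering recommendation maps directly onto an Elasticsearch index lifecycle management (ILM) policy. The sketch below builds a policy body in Python; the ages, shard size, and retention figures are illustrative starting points for prompt and trace indices, not Elastic’s recommended values:

```python
import json

# Sketch of an ILM policy implementing hot/warm/cold tiering plus
# deletion for prompt/trace indices. Ages, shard size, and retention
# are illustrative starting points, not recommended values.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            "cold": {"min_age": "30d", "actions": {}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

if __name__ == "__main__":
    # PUT this body to _ilm/policy/<policy-name> on your cluster.
    print(json.dumps(ilm_policy, indent=2))
```

Shorter retention for content‑bearing indices than for aggregate metrics is a common way to reconcile debugging needs with the privacy constraints discussed earlier.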
What to watch next
This integration is part of a larger competitive and strategic move: hyperscalers are combining model catalogs, agent runtimes, and governance features while observability vendors package domain‑specific tooling for generative AI operations. Expect rapid iteration over the next months — including hardening of content‑filtering telemetry, richer chargeback controls, and more robust integration templates for RAG and retrieval pipelines. Early adopters will be able to influence dashboard templates and best practices, but they must also be prepared to adapt to evolving APIs and semantics in both Foundry and Elastic. Early partner signals show a growing ecosystem experimenting with these primitives; Microsoft has highlighted enterprise pilots and partner integrations for agentic workloads, underscoring that the industry is chasing operational maturity as fast as it chases new model capabilities. Treat those partner citations as pilot‑stage evidence rather than proof of mass adoption.
Conclusion
Elastic’s Azure AI Foundry integration supplies a much‑needed bridge between model hosting and operational visibility, bringing token‑aware metrics, latency tracing, content‑safety signals, and pre‑built dashboards to SREs and developers operating agentic systems on Azure. The work addresses real problems — uncontrolled token spend, obscure latency bottlenecks, and the need for auditable safety controls — and does so in a way that fits existing observability practices. However, the solution is not a turnkey guarantee. It is a tooling layer that requires careful configuration, explicit decisions about data capture and retention, and thorough validation against production workloads. Organizations that treat the preview as a platform for disciplined operationalization — including privacy reviews, cost modeling, and red‑teaming — will realize the most benefit. The payoff is operational clarity: when agentic AI must run reliably, predictably, and in compliance, observable telemetry and disciplined SRE practices are the primary levers that make that possible.
Source: IT Brief Australia Elastic integrates with Azure AI Foundry for real-time monitoring