New Relic MCP Server Brings AI Observability to Azure SRE and Foundry

ChatGPT · Nov 18, 2025

New Relic’s latest move to embed its AI-strengthened observability into Microsoft Azure surfaces marks a clear escalation in the race to make monitoring both ambient and actionable inside cloud control planes. It promises faster incident resolution, measured as mean time to resolution (MTTR), but it also raises new questions about governance, cost, and the boundaries of automated remediation.

Background and overview

New Relic announced the public preview of its AI Model Context Protocol (MCP) Server in early November, positioning the MCP Server as a standardized, bi-directional bridge that lets agentic AI tools query and act on New Relic telemetry without brittle, one-off integrations. The MCP Server is explicitly designed to translate natural-language or tool-driven requests into structured observability queries (NRQL) and to return time‑bound diagnostic payloads that can be surfaced inside developer and SRE workflows.

At the same time, press coverage and vendor materials describe New Relic working to feed that MCP-powered context into Microsoft’s new operational surfaces, notably the Azure SRE Agent and Azure AI Foundry (Foundry), so telemetry, traces, logs, and prioritized remediation hints appear where engineers already work: the Azure portal, IDEs (Visual Studio, VS Code), Copilot Studio, and multi‑agent orchestration surfaces. This package is presented as a single‑pane, agentic observability workflow aimed at reducing context switching and accelerating incident diagnosis and remediation.

Microsoft’s Foundry (often referenced in coverage as Azure AI Foundry) and the emerging Azure SRE Agent are part of a broader Microsoft initiative to productize agentic cloud operations: that is, to provide agent frameworks, model catalogs, and execution surfaces where AI agents can coordinate telemetry, policy, identity, and runbooks. Microsoft documentation and Build/Ignite coverage show Foundry and related agent protocols (notably MCP) becoming central plumbing for those flows.

What New Relic actually shipped (technical summary)

MCP Server: a protocol bridge, now in public preview

The MCP Server exposes New Relic observability as an MCP-compatible toolset so LLM-based agents and agent frameworks can:
Query metrics, traces, and logs with NRQL generated from natural language;
Retrieve causal chains and dependency maps for fast RCA;
Invoke or propose runbook steps and remediation hints (recommended actions) packaged as auditable outputs.
New Relic published setup and usage documentation and flagged preview constraints (special compliance restrictions for FedRAMP/HIPAA accounts).
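To make the query path above concrete, here is a minimal sketch of what an agent-side request could look like. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method; everything else here is an assumption for illustration. The tool name `run_nrql_query`, the `query` argument key, and the `checkout-service` app name are hypothetical, so check the server's own tool listing for the real names.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids should be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name, argument key, and app name (not taken from New Relic docs).
NRQL = (
    "SELECT percentile(duration, 95) FROM Transaction "
    "WHERE appName = 'checkout-service' SINCE 30 minutes ago TIMESERIES"
)
request = build_tool_call("run_nrql_query", {"query": NRQL})
print(json.dumps(request, indent=2))
```

In practice the agent's model would generate the NRQL string from a natural-language question; the sketch only shows the envelope that carries it.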

Embedding telemetry into Azure workflows

The public narrative indicates MCP payloads are intended to be surfaced directly in:
Azure SRE Agent — the portal-native “reliability assistant” where recommendations, evidence, and gated actions can be presented and executed under tenant governance;
Microsoft Foundry and Copilot surfaces — IDE and multi-agent orchestration points, where telemetry-informed suggestions can reach developers during coding and deployment.

SAP and agentless coverage

New Relic’s Monitoring for SAP Solutions (agentless SAP observability) has been part of its product portfolio since 2022 and provides telemetry for SAP and non‑SAP systems without intrusive agents, a capability New Relic highlights as useful for cloud migrations and enterprise business-process monitoring. Vendor materials indicate this product is being positioned for availability through Microsoft channels; however, the specific marketplace listing noted in some coverage should be validated in individual tenant marketplaces.

Why this matters: the operational logic

From alerting to agentic remediation

Observability has historically had two roles: explain what happened and provide the data humans need to fix it. The new pattern stitches that telemetry to an execution fabric:

Agents (or portal surfaces) consume high‑fidelity telemetry and causal signals;
Actions are staged: recommend → gate/approve → act;
Audit trails, identity (Microsoft Entra ID), and RBAC are intended to be first‑class controls.

This reduces the classic console‑hop problem — SREs no longer need to copy evidence from an external APM into the cloud portal or ticketing system to enact a fix. Vendors pitch this as a route to materially shorter MTTR and lower toil for SRE teams.
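The recommend, gate/approve, act staging described above can be sketched as a small state machine. This is an illustrative model only, not how the Azure SRE Agent or New Relic implement it; the class names, the `high_risk` flag, and the actor strings are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Stage(Enum):
    RECOMMENDED = "recommended"
    APPROVED = "approved"
    EXECUTED = "executed"


@dataclass
class StagedAction:
    description: str
    high_risk: bool
    stage: Stage = Stage.RECOMMENDED
    audit_trail: list = field(default_factory=list)

    def approve(self, approver: str) -> None:
        if self.stage is not Stage.RECOMMENDED:
            raise ValueError("only a pending recommendation can be approved")
        self.stage = Stage.APPROVED
        self.audit_trail.append((approver, "approved"))

    def execute(self, actor: str, run: Callable[[], str]) -> str:
        if self.stage is Stage.EXECUTED:
            raise ValueError("action already executed")
        # Gate: high-risk actions need an explicit approval first.
        if self.high_risk and self.stage is not Stage.APPROVED:
            raise PermissionError("approval required before execution")
        result = run()
        self.stage = Stage.EXECUTED
        self.audit_trail.append((actor, "executed"))
        return result


action = StagedAction("restart checkout pods", high_risk=True)
action.approve("oncall-sre")
print(action.execute("sre-agent", lambda: "restarted"))
print(action.audit_trail)
```

The design choice worth noting is that the gate lives in `execute`, so an agent cannot skip approval by calling the action directly, and every transition leaves an audit record naming who triggered it.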

Productivity benefits are concrete when scoped well

Faster RCA: causal chains and dependency overlays can surface implicated services within minutes rather than hours.
Fewer manual handoffs: one interface carries diagnostics, runbooks, and approval flows.
Developer context: surfacing telemetry in IDEs and developer tools (e.g., VS Code, Copilot Studio) can let devs see production impact of recent commits in near‑real time, closing the feedback loop between code and operations.

Verification of key claims (what’s verified, what’s not)

Verified technical facts

New Relic launched its MCP Server in public preview in November 2025 and documents how to enable and use it.
Microsoft has introduced Azure AI Foundry and is standardizing agent protocols such as MCP across developer surfaces (Copilot Studio, Azure products). Microsoft Learn and Build/Ignite coverage document Foundry capabilities and MCP adoption.
Azure SRE Agent is a documented Microsoft preview product that provides portal-native, chat-style diagnostics and supports gating automated actions; vendor materials and Azure Friday demos confirm the product and its intended workflow patterns.

Claims requiring cautious reading or vendor validation

The specific line that “the Azure SRE Agent now links to New Relic’s MCP Server” appears in some trade coverage; New Relic’s MCP public preview explicitly lists supported MCP clients and tool integrations, but a precise technical integration (API contract, deployment pattern, or marketplace hookup) between New Relic and the Azure SRE Agent is not yet documented in a joint New Relic–Microsoft technical whitepaper that is publicly available. Practically, MCP makes the integration feasible, but procurement and engineering teams should validate the exact integration method and data flow during pilots.
Availability through the Microsoft Marketplace (for New Relic Monitoring for SAP Solutions) is plausible — New Relic has marketed agentless SAP monitoring since 2022 — but marketplace listings and terms can vary by region and tenant. Confirm the presence and offer terms in your Azure Commercial Marketplace or Azure Portal before assuming a single-click procurement path.
Macroeconomic claims like “global AI spending will rise to approximately USD $2 trillion in 2026” are estimates that differ widely across analyst houses. Some analysts project multi‑trillion cumulative AI-related spend over several years, while others produce lower short‑term forecasts; treat any single‑number headline as directional rather than exact.

Strengths and immediate technical benefits

1) Reduced context switching and faster MTTR

Embedding telemetry and causality into portal-native or IDE surfaces reduces the time spent hunting evidence across consoles. The gains are likeliest for frequently occurring, well‑understood failure modes, where a pilot with a measured baseline can show whether mean time to detect and MTTR actually drop.

2) Standardization around MCP reduces integration sprawl

MCP (Model Context Protocol) is designed to be a common adapter for many agent frameworks. Adopting a standard reduces the need to maintain bespoke connectors for each agent or model, lowering engineering overhead and maintenance risk.

3) Agent‑aware observability enables higher‑confidence automation

When telemetry is enriched with causal analysis and overlaid with configuration changes, automated runbooks can operate with higher precision. The recommend → gate → act model preserves human oversight for non‑idempotent actions while enabling low‑risk automation to run autonomously.

4) Enterprise SAP customers get non‑intrusive visibility

New Relic’s agentless SAP monitoring addresses a long-standing pain point for SAP-heavy enterprises: visibility without intrusive agents in core SAP production systems, which simplifies compliance and reduces deployment friction.

Material risks and operational tradeoffs

A. Governance, compliance and auditability

Agentic remediation changes the attack surface for operations. Important safeguards include:

Strict RBAC and identity controls (Microsoft Entra ID, formerly Azure AD) for agent identities;
Immutable audit trails for every agent invocation and automated action;
Versioned, tested runbooks stored in CI/CD pipelines;
Clear separation of recommendation vs action for high‑risk steps.

Vendor previews explicitly highlight the need for gating and audit trails; teams must model these operational controls as part of any pilot.
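One common way to approximate an immutable audit trail is a hash chain, where each entry commits to the one before it. The sketch below is a generic illustration, not a description of how New Relic or Azure store audit data; the field names are hypothetical. Note that a hash chain is tamper-evident rather than truly immutable: it shows that history was altered, but it does not prevent the alteration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in ("actor", "action", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "sre-agent", "recommended: restart checkout pods")
append_entry(log, "oncall-sre", "approved")
print(verify(log))
```

Editing any earlier entry changes its hash and breaks every later link, which is what lets a reviewer or SIEM job detect after-the-fact edits.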

B. Cost modeling and resource usage

Azure’s SRE Agent preview uses a consumption model (Azure Agent Units or AAUs) for baseline and active tasks. Pervasive automation can run up costs quickly if tasks fire frequently or telemetry ingestion scales unexpectedly. Teams should model:

AAU usage for baseline agents and active tasks;
New Relic telemetry ingestion and retention costs;
Downstream costs for automation actions (e.g., scaling clusters that increase cloud spend).
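The three cost lines above can be combined in a simple monthly model. This is a rough sketch under stated assumptions: the billing shape (a per-hour baseline plus a per-task active charge) is simplified, and every rate and volume in the example is a placeholder rather than a real Azure or New Relic price. Substitute figures from your own contract and the current pricing pages.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for monthly billing estimates


@dataclass
class CostInputs:
    baseline_aau_per_hour: float   # always-on agent consumption (assumed shape)
    active_aau_per_task: float     # consumption per remediation or diagnostic task
    tasks_per_month: int
    price_per_aau: float           # placeholder rate, not a published price
    ingest_gb_per_month: float
    price_per_ingested_gb: float   # placeholder rate, not a published price
    downstream_per_month: float    # e.g. extra compute from automated scale-outs


def monthly_cost(c: CostInputs) -> dict:
    """Break the monthly estimate into the three areas teams should model."""
    baseline = c.baseline_aau_per_hour * HOURS_PER_MONTH * c.price_per_aau
    active = c.active_aau_per_task * c.tasks_per_month * c.price_per_aau
    ingest = c.ingest_gb_per_month * c.price_per_ingested_gb
    total = baseline + active + ingest + c.downstream_per_month
    return {
        "agent_baseline": baseline,
        "agent_active": active,
        "telemetry_ingest": ingest,
        "downstream": c.downstream_per_month,
        "total": total,
    }


# All numbers below are illustrative placeholders.
estimate = monthly_cost(CostInputs(1.0, 2.0, 100, 0.10, 500.0, 0.40, 50.0))
for line_item, amount in estimate.items():
    print(f"{line_item}: {amount:.2f}")
```

Running the model at two or three task frequencies is a quick way to see which line item dominates as automation becomes more pervasive.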

C. Observability pipeline scalability and signal integrity

Agentic systems rely on timely, high‑cardinality telemetry. If the observability ingestion pipeline is overwhelmed or sampling is too aggressive, agent analysis quality drops and false positives/negatives increase. Validate pipeline throughput, cardinality, and SLA under realistic load.
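A short calculation shows why aggressive sampling hurts agent analysis. Assuming uniform, independent sampling of traces (a simplification; real samplers are often tail-based or adaptive), the chance that none of the failing traces survive sampling is (1 - rate) raised to the number of failing traces. The function name is hypothetical.

```python
def prob_no_failing_trace_sampled(sample_rate: float, failing_traces: int) -> float:
    """Probability that every failing trace is dropped by uniform sampling."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if failing_traces < 0:
        raise ValueError("failing_traces must be non-negative")
    return (1.0 - sample_rate) ** failing_traces


# Compare a few sampling rates for an incident that produced 20 failing traces.
for rate in (0.01, 0.10, 0.50):
    miss = prob_no_failing_trace_sampled(rate, 20)
    print(f"sample rate {rate:.0%}: chance of seeing no failing trace = {miss:.3f}")
```

Under these assumptions a 1% sample rate is more likely than not to miss a 20-trace failure entirely, which is the kind of false negative the paragraph above warns about.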

D. Over‑automation and fragile runbooks

Automation is only as reliable as the runbook logic and test coverage. Poorly validated automated runbooks can cascade failures faster than humans can intervene. Start with idempotent, low‑risk actions (cache clears, targeted restarts) and expand only after repeated, audited success.
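The difference between an idempotent and a non-idempotent runbook step is easy to show in code. In this sketch a plain dictionary stands in for a real orchestrator API, and both function names are hypothetical. The idempotent version declares a desired floor, so a retry or a duplicate agent invocation cannot compound the change.

```python
def add_replicas(state: dict, service: str, extra: int) -> None:
    """Non-idempotent: every call adds more replicas, so retries compound."""
    state[service] = state.get(service, 0) + extra


def ensure_min_replicas(state: dict, service: str, minimum: int) -> bool:
    """Idempotent: sets a floor. Returns True only when a change was made."""
    if state.get(service, 0) >= minimum:
        return False
    state[service] = minimum
    return True


# The dict below is a stand-in for cluster state, not a real API.
cluster = {"checkout": 2}
print(ensure_min_replicas(cluster, "checkout", 4), cluster)
print(ensure_min_replicas(cluster, "checkout", 4), cluster)  # second run changes nothing
```

A step written this way is a reasonable first candidate for autonomous execution because running it twice is indistinguishable from running it once.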

Practical recommendations for platform and SRE teams

Pilot approach (three phases)

Phase 1: Discovery & baseline (0–30 days)
Inventory critical services and SLOs.
Identify one high‑value, low‑risk remediation (e.g., auto‑scale in response to CPU spike).
Model AAU and telemetry ingestion costs for that scope.

Phase 2: Controlled automation pilot (30–90 days)
Enable MCP integration in a staging environment.
Wire New Relic MCP outputs into the Azure SRE Agent in recommendation mode (no write actions).
Run chaos/rollback tests against the runbook and measure MTTR and false positive rate.

Phase 3: Expand with governance (90–180+ days)
Add identity controls, approval policies, and immutable logging.
Author runbooks as code and include them in CI/CD with unit tests and integration tests.
Promote additional idempotent automations as confidence rises.
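The pilot phase asks teams to measure MTTR and false positive rate, and a small script is enough to do that from incident records. The sketch below makes two assumptions: incidents are available as (detected, resolved) timestamp pairs, and "false positive rate" is taken loosely as the share of agent recommendations that reviewers marked as wrong. The sample data is invented for illustration.

```python
from datetime import datetime
from statistics import mean


def mttr_minutes(incidents) -> float:
    """Mean minutes from detection to resolution over (detected, resolved) pairs."""
    return mean(
        (resolved - detected).total_seconds() / 60 for detected, resolved in incidents
    )


def rejected_share(review_outcomes) -> float:
    """Share of agent recommendations that reviewers judged to be wrong."""
    return sum(1 for correct in review_outcomes if not correct) / len(review_outcomes)


# Invented sample data: two incidents before the pilot, two during it.
baseline = [
    (datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 6, 11, 0)),
    (datetime(2025, 1, 9, 14, 0), datetime(2025, 1, 9, 14, 30)),
]
pilot = [
    (datetime(2025, 3, 3, 10, 0), datetime(2025, 3, 3, 10, 20)),
    (datetime(2025, 3, 7, 14, 0), datetime(2025, 3, 7, 14, 10)),
]
reviews = [True, False, True, True]  # True means the recommendation was judged correct

mttr_delta = mttr_minutes(baseline) - mttr_minutes(pilot)
print(f"MTTR improvement: {mttr_delta:.1f} min; rejected share: {rejected_share(reviews):.0%}")
```

With so few incidents the numbers are only directional; a real pilot needs enough incidents of the same failure mode to make the comparison meaningful.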

Security and compliance checklist

Enforce least‑privilege on agent identities and short‑lived credentials.
Centralize audit logs and integrate with SIEM for abnormal invocation detection.
Require automated rollback paths and human‑in‑the‑loop approvals for high‑risk actions.
Evaluate data residency and compliance constraints for the MCP Server preview (FedRAMP/HIPAA restrictions noted).
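For the abnormal invocation detection item, a sliding-window rate check is one simple starting point before richer SIEM rules are in place. This is a generic sketch, not a feature of any named product; the class name and thresholds are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class InvocationRateMonitor:
    """Flag an agent identity that invokes tools more often than expected."""

    def __init__(self, window_seconds: float, max_invocations: int):
        self.window_seconds = window_seconds
        self.max_invocations = max_invocations
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one invocation; return True if the window now exceeds the limit."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop invocations that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_invocations


monitor = InvocationRateMonitor(window_seconds=60, max_invocations=3)
for second in (0, 10, 20, 30, 100):
    print(second, monitor.record(second))
```

In a real deployment the timestamps would come from the centralized audit log, with one monitor per agent identity so that a single noisy agent does not mask another.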

Market context and strategic implications

Cloud providers and observability vendors are converging around a shared assumption: as enterprises embed models and agents into production, those agents must have high‑fidelity, auditable telemetry and a controlled execution surface. This is why:

Microsoft is building Foundry and agentic surfaces that standardize identity, model choice, and governance; and
Observability vendors (New Relic, Dynatrace and others) are extending their platforms to feed causal context and remediation hints into those surfaces.

Those parallel moves signal a long-term platform shift: observability will increasingly be measured not only by the breadth of telemetry it collects but by how safely and efficiently it helps systems recover and self‑manage. For platform teams, the strategic play is to treat agentic observability as a cross‑functional program — combining SRE, security, finance (FinOps), and developer practices — rather than as a point product.

How WindowsForum readers (platform engineers and SREs) should think about adoption

Prioritize high‑ROI, low‑risk automation first: reduce toil before expanding to higher‑risk actions.
Require vendor proof‑of‑value trials that measure MTTR deltas and automation safety in your environment.
Budget for monitoring costs and AAU consumption; run a FinOps scenario to avoid surprise bills.
Treat runbooks as code, with the same staging, testing and review cycles used for application software.

Final assessment — practical verdict

New Relic’s MCP Server and associated Azure tie‑ins materially advance a practical and emerging pattern: making observability agent‑aware and portal‑native. The benefits — faster RCA, fewer console hops, and safer automation — are real for teams that execute disciplined pilots and invest in governance. The real work, though, comes after the vendor demo: modeling costs, enforcing identity and approval policies, validating telemetry quality at scale, and verifying audit trails.
Where organizations risk misstep is in moving too quickly from recommendation to autonomous action without rigorous testing, clear owner accountability, and cost guardrails. For platform teams, the right path is incremental and measurable: pilot with low‑risk automations, measure MTTR and false positives, harden runbooks as code, and then scale the automation footprint.
New Relic’s announcement is an important signal — it shows the industry moving from “observability as data” to “observability as a governed, agentic control plane input.” For Azure‑centric operations teams, the combination of Azure SRE Agent, Foundry, and MCP‑enabled telemetry creates a compelling channel for operational leverage — assuming the normal disciplines of SRE, security, and FinOps are not treated as optional.

Conclusion

The New Relic–Azure story is not a single product win or a turnkey answer; it is a new operational pattern that blends standardized agent protocols (MCP), enriched telemetry, and portal-native governance to accelerate incident resolution and reduce toil. When executed with conservative pilots, strict governance, and clear FinOps modeling, agentic observability can move SRE teams from firefighting to engineering. When executed without those guardrails, it turns automation into risk. The immediate priority for platform engineering teams is pragmatic: validate the integration in a controlled environment, quantify the operational benefits, and harden the governance model before widening the automation scope.

Source: IT Brief Asia, “New Relic boosts AI observability with Microsoft Azure tie-in”
