You’ve heard about GPUs and data centers for AI, but the infrastructure story that matters next is the network—because agentic AI and the coming surge of machine-to-machine chatter will push networks into territory most were never built to handle. The latest episode of The Five Nine warns that agentic AI—autonomous AI agents working 24/7 and communicating with each other—will create new east‑west and border‑crossing traffic patterns that threaten capacity, latency, cost controls, and governance unless operators redesign how they think about connectivity, telemetry, and policy.
Background / Overview
The Five Nine episode, hosted by Diana Goovaerts, foregrounds a simple but urgent premise: AI is not only compute‑heavy; it is a networking problem. The program’s guests—Dave Ward (CTO at Lumen Technologies and soon to be President and Chief Architect at Salesforce) and Paul Savill (Global Practice Leader for Networking and Edge Compute at Kyndryl)—frame agentic AI as a usage model that transforms occasional human‑driven API calls into persistent, concurrent dialog among software agents that require continuous data streams and frequent coordination. That shift multiplies traffic volume, changes traffic patterns from north‑south to intense east‑west flows, and raises new requirements for latency, reliability, and observability. This is not theoretical: analysts and practitioners are already measuring early spikes in agentic traffic and forecasting operational failures for many agentic projects. Gartner, as reported in the press, estimates that over 40% of agentic AI projects may be scrapped by 2027, mainly because of high costs and muddy business cases—an implicit warning that operator readiness (including networks) will be a gating factor for success. Two immediate corollaries follow:
- Networks built around human‑centric behaviors and occasional API calls will struggle with machine‑speed micro‑transactions.
- Effective agentic deployments require rethinking where models run, how data moves, and how networks make fast, centralized or distributed decisions.
Why agentic AI reshapes network requirements
Agentic traffic is different—both in scale and shape
Traditional enterprise and public‑cloud traffic is dominated by predictable north‑south flows: users request, servers reply. Agentic AI inverts that dynamic. Agents frequently:
- Interrogate services and data stores,
- Interact with other agents to coordinate tasks,
- Spawn short‑lived tool calls that look like hundreds or thousands of micro‑transactions per use case.
East‑west dominates: data center fabrics feel the pain
AI workloads already stress intra‑rack and inter‑rack fabrics because distributed model training and large inference clusters require synchronous communication, high bandwidth, and low jitter. Agentic AI adds a different axis: large numbers of semi‑independent agents spinning up inference requests, policy lookups, or small dataset fetches across the cluster continuously. That means:
- Increased east‑west load inside a site,
- Greater demand on site‑to‑site backbones for cross‑site coordination,
- A higher baseline of small‑packet traffic, which is inefficient for networks tuned around large bulk transfers.
Security, governance, and cost become network problems
Agents often call tools and external APIs, changing internal state and reaching into clouds and third‑party services. That raises three overlapping risks:
- Security: machine identities proliferate, increasing the attack surface; a compromised tool or agent can propagate damage quickly.
- Governance: policy, data lineage, and audit trails must be preserved when agents act on behalf of users.
- Cost: micro‑transactions and cross‑region data flows can dramatically increase cloud egress and API costs if not controlled.
What industry experts and providers are saying
From the podcast: Ward and Savill’s diagnosis
Dave Ward and Paul Savill emphasize that the network’s role is transitioning from connectivity to policy enforcement, telemetry, and real‑time flow optimization. Networks must surface the right telemetry to orchestration systems, allow rapid reconfiguration for latency‑sensitive flows, and enable local inference and caching to reduce unnecessary WAN trips. These are practical directions for operators who want agentic systems to be reliable and cost‑effective.
Vendor perspectives: network vendors are updating roadmaps
Large networking players are explicit: agentic AI breaks the assumptions modern networks were built on and requires hardware and software changes. Cisco’s engineering blogs argue that microsecond‑level fabrics, embedded security, and dynamic segmentation will be essential in the agentic era. Cisco recommends baked‑in zero‑trust, telemetry tied to intent‑driven controllers, and hardware that can prioritize machine‑to‑machine flows. At the hyperscaler level, operators are moving to purpose‑built AI WANs and optical backbones to create lower‑jitter, higher‑bandwidth links between sites where large GPU racks act like a single distributed supercomputer. Those efforts include private fiber, rack‑scale NVLink domains, and custom congestion protocols tuned for GPU‑to‑GPU synchronization. These architectural shifts show the network is being co‑designed with compute.
The five technical realities network teams must confront
- Traffic volume and baseline change
Agentic AI raises the persistent traffic baseline through continuous, small exchanges and API calls. Expect increased north‑south API hits and a multiplication of east‑west chatter inside data centers and at the edge. Measured agent traffic has grown rapidly following major agent releases, and enterprises already report agent‑caused congestion events.
- Latency and jitter become business constraints
Many agentic workflows are synchronous or near‑real‑time. Even small increases in tail latency can cascade into failed workflows or wasted compute cycles in synchronous training loops. This is one reason hyperscalers are pursuing private optical backbones and fabric optimizations.
- Telemetry and observability must be machine‑grade
Human debugging is too slow. Operators need real‑time flow telemetry, intent‑aware controllers, and automated remediation that ties network telemetry into orchestration and policy engines. Cisco and other vendors are building workflows to fuse security and network telemetry at machine scale.
- Security and identity multiply
With more machine identities and agents invoking tools, identity and behavioral controls—zero trust applied to agents—are non‑negotiable. The industry is already highlighting the need to limit tools that can mutate state unless there is strong oversight.
- Costs and economics shift to the network plane
Egress, cross‑region transfers, and API invocations become direct line items. Without architectural mitigations—edge inference, model caching, or regional model placement—agentic workloads can produce runaway cloud bills. Analysts warn that poor cost models will kill many agentic pilots. A back‑of‑envelope sketch follows this list.
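To make the cost point concrete, here is a minimal back‑of‑envelope calculation. Every figure in it (fleet size, call rates, payload sizes, unit prices) is an illustrative assumption, not a quoted price; the takeaway is the shape of the math, in which per‑call API charges can easily dwarf raw egress.

```python
# Back-of-envelope estimate of monthly egress and API cost for an agent fleet.
# All constants below are illustrative assumptions, not quoted prices.

AGENTS = 500                     # concurrent agents in the fleet
CALLS_PER_AGENT_PER_DAY = 2_000  # tool/API invocations per agent per day
AVG_PAYLOAD_KB = 64              # request + response payload per call
EGRESS_PER_GB = 0.09             # $/GB cross-region egress (assumed)
COST_PER_1K_CALLS = 0.40         # $/1k API invocations (assumed)

calls_per_month = AGENTS * CALLS_PER_AGENT_PER_DAY * 30
egress_gb = calls_per_month * AVG_PAYLOAD_KB / 1024 / 1024
egress_cost = egress_gb * EGRESS_PER_GB
api_cost = calls_per_month / 1_000 * COST_PER_1K_CALLS

print(f"{calls_per_month:,} calls/month")
print(f"~{egress_gb:,.0f} GB egress -> ${egress_cost:,.2f}/month")
print(f"API invocation cost -> ${api_cost:,.2f}/month")
```

With these assumptions the fleet makes 30 million calls a month; the invocation charges alone run to five figures, which is why per‑agent quotas and caching appear in the mitigation list below.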
Real‑world patterns: how hyperscalers are re‑architecting for AI
Hyperscalers provide the clearest blueprint for what networks need to look like in the AI era. Key elements being deployed today include:
- Rack‑scale GPU domains with NVLink to enable very high intra‑rack GPU bandwidth and pooled memory domains—reducing interconnect pressure for tightly coupled training. Hyperscaler designs cite per‑rack GPU cross‑bandwidth numbers measured in terabytes per second.
- Dedicated optical AI WANs to stitch sites into a distributed supercomputer: private fiber backbones and controlled routing reduce congestion and jitter versus shared backbones. Microsoft and others describe private backbones and multi‑path routing as a way to guarantee the synchronous transfers distributed training demands.
- Custom congestion control and multipath protocols tuned for AI traffic. Some clouds are developing protocols whose goals include aggressive telemetry, rapid retransmits and packet spraying across multiple physical routes to maintain throughput for AI synchronization. These are early and proprietary moves, and experts caution about interoperability and vendor lock‑in risks.
- Higher link rates and 400GE fabrics for intra‑site and site‑to‑site backbones to move massive datasets quickly and avoid capacity bottlenecks. Large providers are upgrading from 100GE to 400GE in many pockets to keep pace with AI demand; industry reporting on supplier deals has noted this migration as an operational priority.
Practical mitigation strategies for enterprise networks
Network teams don’t need hyperscaler budgets to make effective changes. Practical, high‑impact steps include:
- Localize inference and cache models at the edge
- Host smaller models and caches near data sources to avoid repeated round trips for frequently asked queries. This reduces WAN and cloud egress costs and lowers latency (a minimal cache sketch follows this list).
- Adopt intent‑based networking and policy automation
- Tie agent identities, roles, and policies into an intent controller that can dynamically route or throttle agent traffic based on business intent and cost budgets.
- Implement agent‑aware rate limits and quotas
- Apply per‑agent rate controls and budgeted egress policies; enforce caps at the edge or at ingress/egress gateways (a token‑bucket sketch follows this list).
- Prioritize flows with programmable QoS and telemetry
- Use modern programmable switches and SmartNICs to mark and prioritize latency‑sensitive agent flows and gather per‑flow telemetry for automated responses (a host‑side marking sketch follows this list).
- Use private connectivity where it matters most
- For high‑value, synchronous workloads, consider private interconnects or dedicated circuits that avoid public internet variability.
- Harden machine identity, tool permissions, and audit trails
- Treat agents as first‑class identities under zero trust; instrument tool access and require attestations before permitting state‑changing actions (a gating sketch follows this list).
- Monitor cost signals and build chargeback models for agent usage
- Make agent activities observable in finance dashboards so teams can correlate agent behavior with cost and adjust incentive structures.
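For the edge‑localization step, here is a minimal sketch of a response cache hosted near the data source. It assumes a caller‑supplied `remote_infer` function standing in for whatever remote model endpoint is in use; the normalization and TTL policy are deliberately naive.

```python
import time
from collections import OrderedDict

class EdgeModelCache:
    """LRU cache with TTL for model responses served near the data source.

    Hits avoid a WAN round trip and its egress cost; misses fall through
    to the remote model. A sketch, not a production cache.
    """

    def __init__(self, max_entries=10_000, ttl_seconds=300):
        self._store = OrderedDict()   # key -> (expires_at, response)
        self.max_entries = max_entries
        self.ttl = ttl_seconds

    def get_or_infer(self, prompt, remote_infer):
        key = prompt.strip().lower()          # naive prompt normalization
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            self._store.move_to_end(key)      # refresh LRU position
            return entry[1]                   # cache hit: no WAN trip
        response = remote_infer(prompt)       # cache miss: pay the round trip
        self._store[key] = (time.monotonic() + self.ttl, response)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)   # evict least-recently-used
        return response
```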
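For agent‑aware rate limits, a minimal token‑bucket sketch keyed by agent identity; the agent ID and the numeric budgets are placeholders to be replaced by real policy values.

```python
import time
from collections import defaultdict

class AgentRateLimiter:
    """Per-agent token bucket: each agent identity gets its own call budget."""

    def __init__(self, rate_per_sec=50, burst=100):
        self.rate = rate_per_sec
        self.burst = burst
        # Each bucket holds [tokens_remaining, last_refill_timestamp].
        self._buckets = defaultdict(lambda: [burst, time.monotonic()])

    def allow(self, agent_id):
        bucket = self._buckets[agent_id]
        now = time.monotonic()
        tokens, last = bucket
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens < 1:
            bucket[0], bucket[1] = tokens, now
            return False          # over budget: throttle or queue the call
        bucket[0], bucket[1] = tokens - 1, now
        return True

limiter = AgentRateLimiter(rate_per_sec=20, burst=40)
if not limiter.allow("agent-billing-007"):   # hypothetical agent identity
    pass  # reject, delay, or deprioritize this micro-transaction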
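For flow prioritization, one host‑side illustration: marking a connection’s DSCP bits with standard socket options. This is only half of an end‑to‑end QoS design, since the switches and routers along the path must be configured to honor (rather than re‑mark or ignore) the bits, and platform support for `IP_TOS` varies.

```python
import socket

# DSCP Expedited Forwarding (EF, value 46) occupies the top six bits of the
# former IPv4 TOS byte, so the byte written to the socket is 46 << 2.
DSCP_EF = 46 << 2

def latency_sensitive_socket(host, port):
    """Open a TCP connection whose packets are marked for priority queuing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF)
    sock.connect((host, port))
    return sock
```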
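And for hardening tool access, a sketch of a gate that blocks state‑changing tool calls from unattested agents. The tool names, `verify_attestation`, `audit_log`, and the `agent` object are hypothetical stand‑ins for whatever identity, logging, and agent‑runtime services a given environment provides.

```python
# Tools that mutate state require attestation; read-only tools pass through.
STATE_CHANGING = {"db.write", "config.update", "payment.execute"}

def gated_tool_call(agent, tool_name, args, *, verify_attestation, audit_log):
    """Allow state-changing tool calls only for attested agent identities."""
    if tool_name in STATE_CHANGING:
        if not verify_attestation(agent.identity, tool_name):
            audit_log(agent.identity, tool_name, allowed=False)
            raise PermissionError(
                f"{agent.identity} lacks attestation for {tool_name}")
    audit_log(agent.identity, tool_name, allowed=True)  # preserve the trail
    return agent.invoke(tool_name, args)
```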
Strengths and opportunities in the network‑centric approach
- Performance gains when networks and compute are co‑designed
Putting networking on the critical path for AI yields lower latency, higher utilization, and fewer expensive compute stalls. Hyperscaler investments in fabrics and private fiber show the payoff at scale.
- New product and business models
Network providers and telcos can differentiate by offering AI‑optimized connectivity products—private AI WANs, regional inference hosting, and managed agent orchestration.
- Security and compliance advantages
Network‑level enforcement of agent identity and policy creates an extra layer of governance that is harder to bypass than application‑only controls. This is crucial as agents gain permission to act across systems.
- Operational efficiency through automation
Intents, automated remediation, and telemetry that links to orchestration reduce MTTR and allow machine‑grade operations that match agent speed.
Risks, vendor traps, and cautionary notes
- Proprietary protocol and lock‑in risk
Early vendor solutions—custom congestion protocols and private backbones—can accelerate vendor lock‑in if standards and interoperability don’t keep pace. Treat proprietary claims as design intent until independently verified.
- Hype vs. value: many agentic projects will fail
Analyst warnings about high failure rates for agentic pilots are a sober reminder that not all use cases produce clear ROI. Teams should run small, measurable pilots tied to cost and performance KPIs.
- Security risk from poorly governed tools and MCPs
Standards for agent‑to‑tool interfaces (the Model Context Protocol and the like) have been improving, but weak defaults or lax authentication can create systemic vulnerabilities—especially when agents have the capability to change systems or write data. Guard tool permissions tightly.
- Operational complexity and skills gap
Building and operating AI WANs, telemetry pipelines, and programmable fabrics requires skills that many IT teams lack today. Upskilling or partnering with managed providers should be part of the operational plan.
A practical roadmap for CIOs and network leaders
- Inventory current agent activity and simulate cost/latency exposure
- Map where agents will live, the data they will need, and which links they will traverse. Use sandboxing to estimate egress and API charges.
- Start small with edge localization and model caching pilots
- Deploy cached models near high‑volume data sources and measure reduction in egress and latency.
- Add per‑agent identity and least‑privilege tooling now
- Implement zero‑trust for agents as a foundational security step.
- Integrate network telemetry with orchestration systems
- Ensure controllers can see and act on flow telemetry in sub‑second windows (a minimal closed‑loop sketch follows this roadmap).
- Evaluate private connectivity for mission‑critical workloads
- For high‑value synchronous training or inference, private circuits or dedicated backbones may pay for themselves via reduced wasted compute and improved SLOs.
- Build chargeback and ROI practices for agentic workloads
- Make teams accountable for agent consumption and tie investment decisions to measurable outcomes.
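As a companion to the telemetry‑integration step above, a minimal closed‑loop sketch. `get_flow_stats` and `throttle_agent` are hypothetical stand‑ins for a telemetry collector and a controller API, and the budget value is an illustrative assumption.

```python
import time

EGRESS_BUDGET_MBPS = 200  # per-agent ceiling, assumed to come from intent policy

def remediation_loop(get_flow_stats, throttle_agent, interval=0.5):
    """Poll per-agent flow telemetry and throttle agents that exceed budget.

    get_flow_stats() is assumed to return {agent_id: current_mbps}; the
    sub-second polling interval is what lets remediation keep pace with
    machine-speed traffic shifts.
    """
    while True:
        for agent_id, mbps in get_flow_stats().items():
            if mbps > EGRESS_BUDGET_MBPS:
                throttle_agent(agent_id, limit_mbps=EGRESS_BUDGET_MBPS)
        time.sleep(interval)
```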
Conclusion
The Five Nine podcast put the challenge simply: agentic AI changes the network’s role from passive carrier to active participant in compute workflows. Networks must evolve—faster fabrics, private backbones where necessary, programmatic QoS and telemetry, and machine‑grade security and governance—to prevent agentic AI from becoming a catalyst for outages, runaway costs, and security incidents. The opportunity is large: operators and vendors that co‑design networks and compute for agentic workloads will unlock higher utilization, lower latency, and new revenue models. The risk is real: many pilots will fail if teams treat agentic AI as merely a software problem and ignore the network’s pivotal place in the stack. Acting now—focused on telemetry, local inference, intent‑driven policy, and cost governance—will separate those who harvest agentic AI’s productivity gains from those who learn the hard way that networks are the limiters of machine‑scale intelligence.
Source: Fierce Network https://www.fierce-network.com/cloud/five-nine-networks-era-ai/