Tipalti’s .NET Framework Monolith on EKS: Ops Modernization Cuts Cost 60%

Tipalti moved a legacy .NET Framework 4.7 payment-processing monolith from Amazon EC2 to Amazon EKS on Windows Server containers, using Kubernetes-based orchestration, RabbitMQ-driven autoscaling, centralized logging, and Windows node tuning to cut costs by 60 percent and improve performance by 50 percent. The story matters because it is not the usual cloud-native morality play in which the old system is burned down and replaced by microservices. It is a more useful kind of modernization tale: one where the monolith survives, but the operating model around it changes dramatically.

Infographic: a Kubernetes-based .NET Framework 4.7 legacy monolith on EKS, with KEDA scaling and RabbitMQ metrics.

The Monolith Did Not Fail Because It Was Old

There is a comforting myth in enterprise software that legacy systems become dangerous simply because they are legacy systems. The reality is usually less theatrical. Systems like Tipalti’s core payment-processing application often become problematic because success changes the workload faster than the architecture changes around it.
Tipalti’s application was built on Microsoft .NET Framework 4.7 and ran on Amazon EC2 instances. That is not an exotic setup; it is the default biography of a large slice of serious Windows server software. The company’s business, however, had moved into a different operating regime: billions of dollars in payments, thousands of customers, and predictable but demanding spikes around month-end processing.
The original EC2 deployment had three classic symptoms of a maturing monolith under load. Capacity increases required manual provisioning. Deployments could terminate active work abruptly. Logs lived in files scattered across instances and child processes, which meant troubleshooting depended too much on finding the right machine at the right time.
That diagnosis is important because it explains why Tipalti did not begin with a rewrite. The failure mode was not necessarily application logic. It was operational gravity.

AWS Sells a Middle Path Between Rehost and Rewrite

The migration path AWS describes is neither a lift-and-shift nor a full decomposition into microservices. Tipalti containerized the existing .NET Framework application, placed it on Amazon EKS with Windows Server nodes, and used Kubernetes to impose a more modern control plane around an old codebase.
That is the central argument of the case study: for some Windows workloads, containers are not primarily a developer convenience. They are an operational retrofit. The application can remain recognizably the same while deployment, scaling, logging, and lifecycle behavior are dragged into the present.
This is a particularly relevant message for WindowsForum readers because .NET Framework remains embedded in many line-of-business systems. Those applications often cannot move quickly to modern .NET, Linux containers, or a cloud-native service mesh without years of work and substantial business risk. Windows containers on EKS give such teams a way to change the platform contract first.
The bargain is not free. Windows containers on Kubernetes have historically carried more caveats than Linux containers, especially around networking, node density, image size, and host behavior. Tipalti’s experience is valuable because it does not pretend otherwise.

The First Win Was Packaging, Not Architecture

Tipalti began by creating a Windows Server 2019 Core container image for the .NET Framework 4.7 application and configuring an EKS cluster with Windows nodes. On paper, this sounds like the easy part. In practice, it is the first major cultural shift for any team used to mutable EC2 instances.
A virtual machine can accumulate local state, machine-specific fixes, file-based logs, and undocumented operational habits. A container image forces the team to declare what the runtime is supposed to be. That alone can flush out years of assumptions.
The AWS case study does not frame this as a glamorous refactoring project, and that is precisely why it is credible. The early work was about making the existing application runnable in a repeatable package. That is the unsexy foundation on which the later gains depended.
For Windows shops, this is often the hardest mental transition. Containerization is not merely “put the EXE in a box.” It requires answering how the application starts, how it stops, where it writes, what it assumes about the host, and how much of the old server identity was actually part of the application.

Graceful Shutdown Became a Payments Problem

The migration’s first major technical challenge appeared during pod restarts. The application process would terminate, but the Kubernetes pod could remain stuck in a terminating state. For a generic web service, that is annoying. For a payment-processing workflow, it is a reliability problem.
The underlying issue was how termination signals reach Windows containers. Kubernetes expects workloads to receive SIGTERM and shut down cleanly within a defined grace period, but Windows has no native SIGTERM, so the container runtime has to translate the stop request into a Windows shutdown event the process can handle. Tipalti and AWS Support found that the containerd version on the Windows nodes did not yet propagate that signal to Windows containers in the way the team needed.
The interim workaround was Kubernetes lifecycle hooks and explicit graceful termination periods. Later, once compatible EKS-optimized Windows AMIs became available, Tipalti moved the graceful shutdown logic into the application itself. That evolution is worth dwelling on because it shows the difference between a workaround and a durable platform adaptation.
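The interim approach can be sketched as a pod template. This is a minimal sketch, not Tipalti's manifest: it is written as a Python dictionary that mirrors the Kubernetes YAML, and the container name, image, `C:\app\drain.ps1` script, and 600-second grace period are all hypothetical. The real Kubernetes fields it relies on are `nodeSelector` with the well-known `kubernetes.io/os` label, `terminationGracePeriodSeconds`, and a `preStop` exec hook.

```python
import json


def payment_worker_pod_spec(image: str, grace_seconds: int = 600) -> dict:
    """Build a hypothetical pod template spec for a Windows payment worker."""
    return {
        # Schedule only onto Windows nodes; kubernetes.io/os is a well-known node label.
        "nodeSelector": {"kubernetes.io/os": "windows"},
        # Upper bound Kubernetes waits after termination starts before it
        # forcibly kills the container.
        "terminationGracePeriodSeconds": grace_seconds,
        "containers": [
            {
                "name": "payment-worker",  # hypothetical container name
                "image": image,
                "lifecycle": {
                    "preStop": {
                        "exec": {
                            # Hypothetical script that tells the app to stop taking
                            # new work and waits for in-flight work to finish.
                            "command": [
                                "powershell.exe", "-NoProfile", "-File",
                                "C:\\app\\drain.ps1",
                            ]
                        }
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(payment_worker_pod_spec("registry.example.internal/payments:1.0"), indent=2))
```

Time spent in the `preStop` hook counts against `terminationGracePeriodSeconds`, so the grace period has to cover both the hook and whatever shutdown work follows it. In a Deployment, this dictionary would sit under `spec.template.spec`.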
In a monolith, shutdown is often treated as a blunt event. In Kubernetes, shutdown is part of the application contract. The scheduler will move work, replace pods, and roll deployments; the workload must be able to cooperate without corrupting state or dropping in-flight tasks.
Tipalti’s reported zero data loss during deployments is therefore not a small implementation detail. It is the hinge between “we put a Windows app in Kubernetes” and “we can operate a payment system this way.”
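Moving shutdown into the application usually comes down to one rule: once a stop is requested, take no new work but finish what is already in hand. The sketch below illustrates that rule under stated assumptions: a standard-library queue stands in for the broker, `DrainingWorker` and `handle` are hypothetical names, and wiring `request_stop` to the real Windows shutdown notification is left out. With RabbitMQ and manual acknowledgements, messages that were never acknowledged are returned to the queue when the consumer disconnects, which is what lets other replicas pick them up.

```python
import queue
import threading


class DrainingWorker:
    """Sketch: stop taking new work on shutdown, but finish the current item."""

    def __init__(self, work: "queue.Queue[str]") -> None:
        self.work = work
        self.stop_requested = threading.Event()
        self.completed: list[str] = []

    def request_stop(self) -> None:
        # A real service would call this from its shutdown notification handler.
        self.stop_requested.set()

    def handle(self, item: str) -> None:
        # Placeholder for real payment processing.
        self.completed.append(item)

    def run(self) -> None:
        # The stop flag is checked only between items, so an item that has
        # been taken runs to completion before the loop can exit.
        while not self.stop_requested.is_set():
            try:
                item = self.work.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                self.handle(item)
            finally:
                self.work.task_done()
```

Anything still on the queue when `run` returns was never started, so it can be processed safely by another pod.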

Logging Was the Modernization Tax Nobody Could Avoid

The second major challenge was logging, and it may be the most familiar to administrators. Tipalti’s legacy application wrote logs to local files, a pattern that made sense in an EC2 world where engineers could inspect disks or aggregate files after the fact. In Kubernetes, that pattern fights the platform.
Container platforms want logs emitted to standard output and standard error so that node agents, daemonsets, and observability pipelines can collect them consistently. Tipalti initially used Microsoft’s LogMonitor, which can redirect Windows container logs from files or event logs to standard output. That let the migration proceed without immediately rewriting logging internals.
But the stopgap had a cost. An additional process layer introduced overhead and complexity. Tipalti ultimately refactored the application’s logging configuration to write directly to standard output, removing the middleware.
This is a useful lesson for modernization projects: compatibility shims can buy time, but they should not be mistaken for the destination. The best container migrations eventually make the application speak the platform’s native language. In this case, that language was stdout, centralized collection, and correlation across distributed pods.
The benefit was not only prettier dashboards. Tipalti moved from fragmented file logs across machines and child processes to a model where engineers could identify affected pods and processes quickly. That is the kind of operational improvement that rarely appears in architecture diagrams but changes the daily life of a support team.
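The target logging shape can be sketched with Python's standard `logging` module: one JSON object per line on stdout, carrying the pod name and process ID so a collector can correlate entries across pods and child processes. This illustrates the pattern rather than Tipalti's .NET configuration; the field names are assumptions, and `POD_NAME` is assumed to be injected through the Kubernetes downward API (`fieldRef: metadata.name`).

```python
import json
import logging
import os
import sys


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single JSON line for a node log agent."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Assumed to be set from the downward API; falls back if absent.
            "pod": os.environ.get("POD_NAME", "unknown"),
            # Process ID lets engineers tell child processes apart.
            "pid": record.process,
        }
        return json.dumps(payload)


def make_stdout_logger(name: str, stream=None) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]  # replace any file handlers with stdout only
    logger.propagate = False
    return logger
```

Inside a container, `make_stdout_logger("payments")` with no stream argument writes to stdout, where the node's log agent can collect it without tailing files.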

Autoscaling Worked Because the Queue Was the Business Signal

Once containerization and logging were stable, Tipalti introduced Kubernetes Event-Driven Autoscaling, better known as KEDA. The scaler watched self-hosted RabbitMQ queue depths and adjusted pod replicas based on workload demand. During quieter periods, the application could run at baseline capacity; during payment surges, it could scale out automatically.
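A queue-driven scaler of this kind is declared with a KEDA `ScaledObject`. The sketch below builds one as a Python dictionary. The `rabbitmq` trigger type and its `protocol`, `queueName`, `mode`, and `value` metadata are KEDA fields; the Deployment name, queue name, per-pod target, replica bounds, and the `rabbitmq-auth` TriggerAuthentication are hypothetical, since Tipalti's actual thresholds were not published.

```python
import json


def rabbitmq_scaled_object(deployment: str, queue_name: str,
                           messages_per_pod: int, min_pods: int, max_pods: int) -> dict:
    """Build a KEDA ScaledObject that scales a Deployment on RabbitMQ backlog."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler"},
        "spec": {
            "scaleTargetRef": {"name": deployment},  # the Deployment to scale
            "minReplicaCount": min_pods,
            "maxReplicaCount": max_pods,
            "triggers": [{
                "type": "rabbitmq",
                "metadata": {
                    "protocol": "amqp",
                    "queueName": queue_name,
                    "mode": "QueueLength",           # scale on backlog, not CPU
                    "value": str(messages_per_pod),  # target backlog per replica
                },
                # Hypothetical TriggerAuthentication holding the broker connection string.
                "authenticationRef": {"name": "rabbitmq-auth"},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(rabbitmq_scaled_object("payments", "payments.pending", 50, 10, 120), indent=2))
```

Because JSON is valid YAML, the printed document can be passed to `kubectl apply -f` once the referenced TriggerAuthentication exists.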
This was the right abstraction. CPU utilization is often a weak proxy for business pressure in queue-based systems. Queue depth tells the platform what users and upstream systems are actually experiencing: work is waiting.
Tipalti’s reported scaling pattern is dramatic. The application can move from roughly 10 pods to more than 100 during month-end processing periods, with scaling happening in minutes rather than hours. That is the difference between capacity planning as a human ritual and capacity response as a control loop.
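The arithmetic behind a 10-to-100-pod swing is simple. KEDA exposes a trigger to the Horizontal Pod Autoscaler as an average-value target by default, so the desired replica count is roughly the backlog divided by the per-pod target, rounded up and clamped to the configured bounds. The function below is a simplified model that ignores the HPA's tolerance band and stabilization windows; the 50-messages-per-pod target and the 10 to 120 bounds are assumptions.

```python
import math


def desired_replicas(queue_depth: int, target_per_pod: int,
                     min_pods: int, max_pods: int) -> int:
    """Simplified HPA math for an average-value external metric."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    # Replicas needed so each pod's share of the backlog is at or under target.
    needed = math.ceil(queue_depth / target_per_pod)
    # Clamp to the ScaledObject's minReplicaCount and maxReplicaCount.
    return max(min_pods, min(max_pods, needed))


print(desired_replicas(200, 50, 10, 120))    # quiet day: 10 (the floor)
print(desired_replicas(5200, 50, 10, 120))   # month-end backlog: 104
print(desired_replicas(9000, 50, 10, 120))   # capped at 120
```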
It also explains much of the cost reduction. If a team provisions EC2 instances for the peak, it pays for idle headroom. If it provisions for the average, it suffers during spikes. Event-driven autoscaling gives the system a fighting chance to follow demand more closely.
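The peak-versus-average trade-off can be made concrete with pod-days. The sketch compares paying for the peak every day with an idealized autoscaler that pays only for each day's demand. The 30-day profile (10 pods on most days, 100 on two month-end days) is hypothetical and only echoes the scale of the reported swing. Real bills are lumpier: nodes rather than pods are billed, pods never pack nodes perfectly, and scale-up lags demand, so this idealized result should not be compared with the reported 60 percent.

```python
def pod_days_billed(daily_demand: list[int], autoscaled: bool) -> int:
    """Pod-days paid for over a period.

    Static provisioning pays for the peak every day; idealized autoscaling
    pays only for each day's demand (no scale-up lag, perfect packing).
    """
    if autoscaled:
        return sum(daily_demand)
    return max(daily_demand) * len(daily_demand)


# Hypothetical 30-day month: quiet baseline, two heavy month-end days.
month = [10] * 28 + [100] * 2
static = pod_days_billed(month, autoscaled=False)   # 3000
elastic = pod_days_billed(month, autoscaled=True)   # 480
print(static, elastic)
```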
The subtle point is that Tipalti did not need to split the monolith into dozens of services to get this benefit. It needed to make the unit of deployment replicable, observable, and safe to terminate. Kubernetes then had something it could scale.

Windows Containers Still Made the Team Pay Attention to the Metal

The most sobering part of the migration is that Kubernetes did not erase infrastructure. It changed which infrastructure problems mattered. Tipalti discovered that Windows node startup and image pull performance were too slow for its operational needs.
New nodes took about seven minutes to join the cluster. Pulling Tipalti’s 4.7 GB container images initially took about four minutes. Combined, scale-up time could reach roughly 11 minutes, which is not ideal when payment-processing demand arrives in surges.
The team traced part of the problem to disk I/O. Doubling Amazon EBS throughput from 125 MB/s to 250 MB/s and IOPS from 3,000 to 6,000 helped. AWS-optimized AMIs with pre-cached base layers also reduced the cost of pulling large Windows images. Tipalti then went further by deploying an internal container image registry.
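The storage change maps onto an EC2 block device mapping. A minimal sketch, assuming the root volume is defined in an EC2 launch template used by the Windows node group: `DeviceName`, `VolumeType`, `Iops`, `Throughput`, and `DeleteOnTermination` are EC2 API fields, and `/dev/sda1` is the root device name Windows AMIs use. The function name and the validation are illustrative choices. Note that the API expresses gp3 throughput in MiB/s, and gp3 cannot be provisioned below 3,000 IOPS or 125 MiB/s.

```python
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_THROUGHPUT_MIBPS = 125


def windows_root_volume(iops: int = 6000, throughput_mibps: int = 250) -> dict:
    """Build a gp3 root-volume mapping for a Windows node launch template."""
    if iops < GP3_BASELINE_IOPS or throughput_mibps < GP3_BASELINE_THROUGHPUT_MIBPS:
        raise ValueError("gp3 cannot be provisioned below 3,000 IOPS / 125 MiB/s")
    return {
        "DeviceName": "/dev/sda1",  # root device name used by Windows AMIs
        "Ebs": {
            "VolumeType": "gp3",
            "Iops": iops,
            "Throughput": throughput_mibps,  # MiB/s in the EC2 API
            "DeleteOnTermination": True,
        },
    }
```

The returned dictionary would go in the `BlockDeviceMappings` list of a launch template's data, for example through boto3's `create_launch_template`.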
The result was a reduction in total scale-up time from 11 minutes to under seven minutes. That 36 percent improvement is meaningful, but it also reveals an uncomfortable truth: Windows container images are still heavy enough that storage, caching, and registry locality matter.
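The 36 percent figure follows directly from the reported timings. Treating "under seven minutes" as exactly seven gives the conservative end of the improvement; the split between node join and image pull after tuning was not reported, so the sketch does not decompose it.

```python
def total_scale_up_minutes(node_join_min: float, image_pull_min: float) -> float:
    """Scale-up time when a new node must join before the image is pulled."""
    return node_join_min + image_pull_min


before = total_scale_up_minutes(node_join_min=7, image_pull_min=4)  # 11 minutes, as reported
after = 7  # reported as "under seven minutes"; 7 is used as an upper bound
improvement = (before - after) / before
print(f"{improvement:.0%}")  # 36%
```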
This is where Linux-native Kubernetes intuitions can mislead teams. Windows nodes, Windows base images, and .NET Framework dependencies create a different performance envelope. The control plane may be Kubernetes, but the physics are still Windows.

The Zombie Pod Bug Was a Reminder That Windows Networking Is Its Own Discipline

Production introduced a more serious problem: pods getting stuck in the terminating state during deployments, effectively becoming “zombies.” Tipalti and AWS Support traced the issue to a race condition in the Windows Host Networking Service. When one pod was being created at the same moment another was terminating, the HNS registry could become corrupted, leaving a stale networking endpoint attached to a dead process.
The temporary mitigation was blunt but practical: rotate Windows nodes with a three-hour time-to-live. That forced periodic cleanup while the team waited for a better fix. AWS later released an updated Windows AMI that resolved the race condition, allowing Tipalti to remove the rotation workaround.
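A time-to-live rotation like this needs only node creation times and a threshold. The sketch below assumes the timestamps have already been read from each node's `metadata.creationTimestamp` and selects nodes at or past three hours old, oldest first; the function and node names are hypothetical. Actually replacing a node would still mean cordoning and draining it, which evicts pods through the Eviction API and so respects PodDisruptionBudgets and each pod's graceful termination period.

```python
from datetime import datetime, timedelta, timezone

NODE_TTL = timedelta(hours=3)


def nodes_due_for_rotation(created: dict[str, datetime], now: datetime,
                           ttl: timedelta = NODE_TTL) -> list[str]:
    """Return names of nodes at or past the TTL, oldest first."""
    expired = [(ts, name) for name, ts in created.items() if now - ts >= ttl]
    return [name for ts, name in sorted(expired)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fleet = {
        "win-node-a": now - timedelta(hours=4),
        "win-node-b": now - timedelta(minutes=50),
    }
    print(nodes_due_for_rotation(fleet, now))
```

Rotating one node at a time keeps capacity available while the stale networking state is cleared.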
This is the sort of detail that separates a credible migration report from a marketing diagram. Running 106 pods across 23 Windows nodes is not a toy deployment. At that scale, lifecycle edge cases become production incidents.
For administrators, the lesson is not “avoid Windows containers.” It is that Windows container operations require Windows-specific expertise, not just generic Kubernetes knowledge. HNS, ENA behavior, AMI versions, and PowerShell-based node bootstrap changes are part of the operational surface.

DNS Broke at the Edge of Pod Density

The second production issue was stranger. When pod density exceeded roughly 20 pods per node, containers began crashing with DNS resolution errors. Packet monitoring showed DNS queries reaching the host but being dropped by the virtual switch before they reached the containers.
The culprit was UDP Checksum Offload. The host's network adapter calculated the checksum, but the virtual switch interpreted it as invalid and dropped the packets. Tipalti disabled UDP Checksum Offload on the Amazon Elastic Network Adapter (ENA) using PowerShell in node user data, and the DNS errors disappeared.
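A user-data change of this kind can be generated alongside the rest of the node configuration. The exact commands Tipalti used were not published, so the sketch below is an assumption: it prepends a PowerShell step that disables UDP checksum offload with the `Disable-NetAdapterChecksumOffload` cmdlet, selects adapters by matching "Elastic Network Adapter" in their interface description, and passes the node group's existing bootstrap command through unchanged. The function name is hypothetical. The offload change runs first because changing adapter settings may briefly restart the adapter.

```python
def windows_node_user_data(bootstrap_command: str) -> str:
    """Wrap an existing bootstrap command with a UDP checksum offload change.

    bootstrap_command is whatever the node group already runs to join the
    cluster; it is inserted verbatim after the offload change.
    """
    offload_fix = "\n".join([
        # Select adapters backed by the Amazon Elastic Network Adapter driver.
        '$ena = Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic Network Adapter*" }',
        # Disable UDP checksum offload for IPv4 and IPv6 on those adapters.
        "Disable-NetAdapterChecksumOffload -Name $ena.Name -UdpIPv4 -UdpIPv6",
    ])
    # <powershell> tags mark the user data as a PowerShell script for EC2 launch agents.
    return f"<powershell>\n{offload_fix}\n{bootstrap_command}\n</powershell>\n"
```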
That fix is both satisfying and unsettling. It is satisfying because it is precise: identify the adapter, disable the offload behavior, stop the packet drops. It is unsettling because DNS failures at scale can look like application instability, service discovery trouble, or random platform flakiness until someone gets deep enough into packet behavior.
There was also an AWS-specific constraint in the background: each elastic network interface can send at most 1,024 packets per second to link-local services such as the Amazon-provided DNS resolver, a limit DNS-heavy workloads can reach. Modern Kubernetes applications often generate more DNS traffic than teams expect, and Windows networking adds its own translation layers.
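Whether an interface is close to that budget is a back-of-the-envelope calculation. AWS documents a limit of 1,024 packets per second per network interface to the Amazon-provided DNS server; the sketch below assumes each lookup leaving the interface costs one packet, and the pod counts and per-pod lookup rates are hypothetical. Caching in CoreDNS or a node-local cache reduces how many lookups reach the resolver, so this is a rough upper-bound check rather than a prediction.

```python
VPC_DNS_PPS_LIMIT_PER_ENI = 1024  # packets per second to the Amazon-provided DNS server


def dns_budget_headroom(pods_sharing_eni: int, lookups_per_pod_per_sec: float) -> float:
    """Fraction of the per-interface DNS packet budget left (negative = over budget)."""
    used = pods_sharing_eni * lookups_per_pod_per_sec
    return 1 - used / VPC_DNS_PPS_LIMIT_PER_ENI


print(f"{dns_budget_headroom(20, 25):.0%}")   # 500 pps used: 51%
print(f"{dns_budget_headroom(30, 40):.0%}")   # 1,200 pps: -17%, over budget
```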
This incident reinforces one of the oldest infrastructure lessons in a new costume. The higher-level platform only works when the lower-level network is behaving. Kubernetes does not make packet drops less real.

The Cost Reduction Came From Operations, Not Magic

Tipalti reports a 60 percent cost reduction after the migration. That number is attention-grabbing, but it should not be interpreted as a universal discount attached to Windows containers on EKS. It reflects a particular workload with variable demand, a previous EC2 deployment that required manual capacity management, and a successful shift to autoscaling.
The important point is that the savings came from matching capacity to demand. A payment processor with month-end spikes is an ideal candidate for event-driven scaling because the workload has measurable backlog and predictable burst patterns. Kubernetes and KEDA converted that signal into replica counts.
The 50 percent performance improvement is similarly workload-specific but plausible in context. Better scaling, cleaner deployment behavior, faster access to logs, and tuned storage all reduce operational drag. Performance is not only raw instruction throughput; it is also how quickly a system can add capacity, avoid disruption, and recover from bottlenecks.
Still, enterprises should resist reading the case study as “containers reduce cost.” Containers can raise costs if teams overbuild clusters, ignore image size, run too many nodes, or add observability and service layers without discipline. The modernization win came because Tipalti changed the operating model, not because Kubernetes is a discount coupon.
That distinction matters. The platform enabled the savings. Engineering work captured them.

The Deployment Cadence Changed the Risk Model

One of the quieter but more important results is that Tipalti moved from weekly deployments to multiple deployments per day. That kind of improvement often matters more than infrastructure cost over the long term. Faster deployment cadence changes how quickly teams can fix bugs, roll out operational improvements, and respond to customer needs.
But the reason it became possible is not simply that Kubernetes supports rolling updates. Tipalti first had to solve graceful shutdown. It had to solve logging. It had to understand pod termination behavior. It had to fix production networking issues.
This is why modernization sequencing matters. A team that chases deployment frequency before making termination safe can make outages more frequent. A team that chases autoscaling before making logs coherent can create a larger, harder-to-debug mess.
Tipalti’s migration reads as successful because the phases built on each other. Packaging came first. Shutdown semantics followed. Logging became platform-native. Autoscaling arrived after the workload could be safely replicated. Performance tuning and production debugging then made the system viable at scale.
There is a message here for IT leaders: the visible metrics arrive late. The less visible engineering hygiene comes first.

The Monolith Became More Observable Without Becoming Less Monolithic

Perhaps the most interesting claim in the AWS post is that engineers gained visibility into individual service components and child processes even though the core application remained a monolith. This is a reminder that observability and microservices are not the same thing.
A monolith can be opaque or observable. A microservices architecture can be elegantly instrumented or a distributed fog machine. The architectural style does not automatically determine the debugging experience.
Tipalti’s EC2 deployment had file-based logs fragmented across instances and child processes. Its EKS deployment centralized logs and metrics, making it easier to identify which pod and child process were affected. Custom metrics exposed health signals that were difficult or impossible to obtain in the old model.
For Windows administrators, this is a practical modernization path. You do not have to wait for a complete rewrite to improve incident response. You can containerize, standardize output, collect telemetry, and make the system more legible while the application remains structurally familiar.
That may not satisfy purists. It should satisfy anyone who has had to debug a production payment system at 2 a.m.

The Real Pattern Is “Operational Decomposition”

The phrase monolith modernization often implies code decomposition: identify bounded contexts, carve out services, replace old runtime dependencies, and gradually shrink the original application. That path is valid, but it is not the only path.
Tipalti’s story is closer to operational decomposition. The application logic remained largely intact, but responsibilities once tangled together on EC2 instances were separated into platform mechanisms. Scheduling moved to Kubernetes. Scaling moved to KEDA. Log collection moved to the cluster’s observability pipeline. Node lifecycle became an explicit part of infrastructure management.
This can be a powerful intermediate state. It reduces the blast radius of deployments, improves capacity elasticity, and gives teams better instrumentation before they attempt deeper refactoring. It can also reveal which parts of the monolith actually deserve to be split later.
The risk is that organizations stop there and declare victory. Containers can preserve legacy code so well that they remove pressure to modernize the codebase itself. That is fine if the business problem was primarily operational. It is less fine if the application remains brittle, tightly coupled, or hard to change.
The mature view is that containerizing a monolith is not an endpoint. It is a platform reset. Once the system is safer to deploy and easier to observe, the team has more room to make careful architectural changes.

Windows Shops Should Read This as Permission, Not a Prescription

There is a temptation to treat every successful case study as a blueprint. Tipalti’s migration is better understood as permission. It shows that a legacy .NET Framework monolith can be moved into a modern Kubernetes operating model without a full rewrite, but it does not prove that every such application should be.
Good candidates share a few traits. They have variable demand. They can be replicated safely once shutdown is handled. Their external dependencies can tolerate multiple instances. Their logging can be moved to standard output or collected reliably. Their teams are willing to learn the Windows-specific corners of Kubernetes operations.
Bad candidates are easy to imagine. Applications with hard machine affinity, undocumented local state, fragile licensing tied to host identity, or unsafe concurrent processing may not benefit quickly. Neither will organizations that think Kubernetes removes the need for infrastructure expertise.
For many Windows estates, Amazon ECS, Azure Kubernetes Service, App Service, traditional VM scale sets, or a direct move to modern .NET may be more appropriate. The right answer depends on operational pain, team skill, compliance boundaries, and the realistic lifespan of the application.
What makes Tipalti’s example valuable is not that EKS wins every comparison. It is that the company found a way to get cloud-native operational properties around a Windows workload that could not simply be wished into a new architecture.

The Windows Container Story Is Better Than It Was, But Not Boring Yet

Amazon EKS has supported Windows nodes for years, and AWS has continued improving areas such as managed node groups, optimized Windows AMIs, cached container layers, and higher pod density options. The platform is no longer experimental in the way it once was. But Tipalti’s production issues show that “supported” does not mean “frictionless.”
This is especially true for teams operating at meaningful scale. A demo cluster will not necessarily expose HNS race conditions, DNS packet drops, image pull bottlenecks, or graceful shutdown gaps. Production traffic has a way of finding the seams between abstractions.
The upside is that these seams are now better documented through real deployments. Tipalti’s mitigations are specific enough to be useful: lifecycle hooks before containerd support matures, direct stdout logging instead of LogMonitor when possible, EBS throughput tuning, pre-cached base layers, internal registries, AMI updates, and ENA checksum offload changes where needed.
That is a more honest kind of cloud story. It does not say the platform makes hard problems disappear. It says the platform gives teams better levers, if they are willing to pull them carefully.

The Tipalti Playbook Is Smaller Than a Rewrite and Larger Than a Lift-and-Shift

The most concrete lesson from this migration is that the middle path is real but demanding. Tipalti did not merely rehost a Windows application. It changed how the application is packaged, started, stopped, scaled, observed, and debugged.
  • Tipalti kept its .NET Framework 4.7 monolith intact while moving the runtime model from EC2 instances to Windows containers on Amazon EKS.
  • The company used RabbitMQ queue depth through KEDA as the autoscaling signal, allowing capacity to follow payment-processing demand more closely.
  • The migration required Windows-specific fixes around shutdown behavior, Host Networking Service race conditions, DNS packet drops, and network adapter offload settings.
  • Logging moved from fragmented local files to centralized container output, which materially improved troubleshooting and reduced time to resolution.
  • The reported gains were substantial: 60 percent lower cost, 50 percent better performance, zero data loss during deployments, and a move from weekly releases to multiple deployments per day.
  • The result is best understood as an operational modernization of the monolith, not a replacement for deeper application refactoring when the business eventually demands it.
The practical value of Tipalti’s migration is that it lowers the emotional temperature around legacy Windows modernization. Not every monolith needs to be condemned, and not every successful cloud migration begins with a rewrite. The more interesting future is one where old .NET Framework systems are first made safer, more observable, and more elastic—and only then, from a stronger operational footing, selectively rebuilt where the business case is real.

References

  1. Primary source: Amazon Web Services (AWS), published 2026-06-03
  2. Related coverage: docs.aws.amazon.com
  3. Related coverage: techtarget.com
 
