OpenAI Postgres Scaling on Azure: Writes, Replicas, and Read Optimization

OpenAI’s database story is unexpectedly simple on paper but fiendishly complex in practice: a single primary PostgreSQL writer, dozens of read-only replicas, and a relentless focus on smoothing writes, offloading work, and squeezing every millisecond out of reads — all running on Azure Database for PostgreSQL and pushed to the limit in real production traffic.

Central PostgreSQL primary with WAL streaming to global read replicas.

Background / Overview

PostgreSQL is often described as a developer-friendly, rock‑solid relational database — and for many workloads that’s true. But the common perception is that relational databases break under “web-scale” modern AI workloads. OpenAI’s experience proves a more pragmatic truth: with disciplined engineering and the right managed platform features, PostgreSQL can be a dependable foundation even at extraordinary scale.
OpenAI’s story — as told in a Microsoft for Startups blog that summarizes a POSETTE/PGConf presentation by Bohan Zhang of OpenAI — shows a classic lifecycle: start simple, hit bottlenecks, instrument and optimize, add managed platform features, and iterate. The company began with a single-writer, multi-reader Postgres topology on Azure Database for PostgreSQL, then added practices and platform extensions to address emerging bottlenecks as ChatGPT usage exploded.
This article unpacks that journey, verifies key technical claims against available documentation and conference sources, and provides practical, risk-aware guidance for teams that want to replicate OpenAI’s PostgreSQL strategy using Azure Database for PostgreSQL, PgBouncer, elastic clusters (Citus), and cascading read replicas.

Why OpenAI stayed with PostgreSQL

OpenAI’s decision to keep PostgreSQL at the center of critical systems comes down to three pragmatic advantages:
  • Familiar transactional semantics (ACID) for critical metadata.
  • Mature tooling and ecosystem (extensions, monitoring, schema migration patterns).
  • A managed platform (Azure Database for PostgreSQL) that removes heavy operational burden and provides enterprise features like automated backups, HA, and integrated replicas.
The company’s early architecture used a single primary for writes and dozens of read-only replicas for reads — a pattern that scales reads superbly while keeping application complexity low. Geographic replica placement reduced latency for global users without forcing distributed application logic. But as writes grew, the primary became a bottleneck — precisely the moment when OpenAI moved from “scale by throwing replicas at the problem” to focused engineering on write smoothing and platform-level scale-out.

The technical playbook: how OpenAI scaled Postgres on Azure

Below are the core strategies OpenAI used (explicitly described in the POSETTE / PGConf talk and the Microsoft for Startups blog), augmented with verification from Azure documentation and independent conference listings.

1) Offload and smooth writes — treat the primary like a scarce resource

Writes are the hardest part of scaling a single-primary Postgres cluster. OpenAI’s approach:
  • Reduce unnecessary writes at the source. Push filtering and deduplication upstream so that only essential operations hit the primary.
  • Introduce controlled timing (batching, rate-limits, windows) for background jobs and non-critical writes.
  • Offload heavy write patterns to purpose-built subsystems (e.g., object storage, caches, or specialized append-only stores) when strict transactional semantics are not required.
These measures keep WAL generation and table bloat under control, reduce the chance of write-related contention, and make replication more predictable. The POSETTE / PGConf abstracts and subsequent writeups corroborate this emphasis on write smoothing as a core lever for staying with a single writer.
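To make the write-smoothing idea concrete, here is a minimal Python sketch of client-side batching. It is not OpenAI's actual code: the table, columns, and thresholds are placeholders, and psycopg2 is assumed as the driver.

```python
# Minimal write-batching sketch (not OpenAI's actual code): buffer rows and
# flush them as one multi-row INSERT. Table, columns, and thresholds are
# placeholders; psycopg2 is assumed as the driver.
import time
import psycopg2
from psycopg2.extras import execute_values

BATCH_SIZE = 500       # flush when this many rows are buffered...
FLUSH_INTERVAL = 0.5   # ...or when this many seconds have passed

class WriteBatcher:
    def __init__(self, dsn: str):
        self.conn = psycopg2.connect(dsn)
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, row: tuple):
        """Buffer a row instead of issuing one INSERT per event."""
        self.buffer.append(row)
        if (len(self.buffer) >= BATCH_SIZE
                or time.monotonic() - self.last_flush >= FLUSH_INTERVAL):
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.conn.cursor() as cur:
            # One multi-row INSERT produces far less per-statement overhead and
            # smoother WAL traffic than hundreds of single-row INSERTs.
            execute_values(
                cur,
                "INSERT INTO events (user_id, kind, payload) VALUES %s",
                self.buffer,
            )
        self.conn.commit()
        self.buffer.clear()
        self.last_flush = time.monotonic()

# Usage: batcher = WriteBatcher("dbname=app"); batcher.add((42, "message", "{}"))
```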

2) Scale reads with replicas and smart routing

With writes under control, OpenAI invested heavily in read scaling:
  • Dozens of read replicas, including cross‑region replicas, to serve global traffic.
  • Priority-based routing: dedicate replica pools to high-priority queries to prevent noisy, low-priority workloads from affecting latency-sensitive reads.
  • Connection pooling (PgBouncer) to avoid exploding backend connections and to stabilize per-node resource usage.
  • Query optimization and slow‑query triage to eliminate pathological queries that can stall replicas.
Azure’s read replica features and the newer cascading read replicas capability are central here: the platform supports creating replicas asynchronously and — in preview — building second‑level replicas from existing ones to scale read capacity across regions more efficiently. Note the platform’s documented limits and caveats (replication is asynchronous, slots can cause WAL accumulation, certain intermediate replicas cannot be promoted).
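One way to implement priority-based routing is to keep separate replica pools per priority class and round-robin within each. The sketch below is illustrative only; the pool names, hostnames, and psycopg2 usage are assumptions, not OpenAI's implementation.

```python
# Illustrative priority-based replica routing (pool names, hostnames, and the
# psycopg2 driver are assumptions, not OpenAI's implementation).
import itertools
import psycopg2

REPLICA_POOLS = {
    # Latency-sensitive, user-facing reads get a dedicated pool.
    "high": ["host=replica-hi-1 dbname=app", "host=replica-hi-2 dbname=app"],
    # Batch/analytics reads are isolated so they cannot starve the hot path.
    "low":  ["host=replica-lo-1 dbname=app", "host=replica-lo-2 dbname=app"],
}
_round_robin = {name: itertools.cycle(dsns) for name, dsns in REPLICA_POOLS.items()}

def read_query(sql: str, params: tuple = (), priority: str = "high"):
    """Route a read-only query to the replica pool matching its priority."""
    dsn = next(_round_robin[priority])
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()

# Usage: a user-facing lookup stays on the high-priority pool, while a
# reporting scan is pushed to the low-priority pool.
# read_query("SELECT * FROM conversations WHERE id = %s", (42,), priority="high")
# read_query("SELECT count(*) FROM events", priority="low")
```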

3) Schema governance, transaction hygiene, and operational guardrails

Scaling requires discipline as much as horsepower:
  • Strict schema change policies (avoid full table rewrites on hot tables; use CONCURRENTLY for index changes).
  • Limit long-running transactions that block VACUUM and DDL operations.
  • Application- and DB-level rate limits for connections and queries.
  • Comprehensive observability and runbooks for failover, replication lag, and WAL management.
These are operational best practices reported by OpenAI and reinforced across conference materials. They’re necessary to keep large replica fleets healthy and to preserve the ability to perform online maintenance without surprise SEV0 outages.
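As a concrete illustration of two of these guardrails, the sketch below (hypothetical table and index names, psycopg2 assumed) builds an index with CONCURRENTLY under a lock timeout and lists transactions that have been open long enough to interfere with VACUUM and DDL.

```python
# Guardrail sketches: a lock-safe concurrent index build and a scan for
# long-running transactions (hypothetical table/index names; psycopg2 assumed).
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction block

with conn.cursor() as cur:
    # Fail fast instead of queueing behind existing lock holders on a hot table.
    # Note: an interrupted concurrent build leaves an INVALID index to drop.
    cur.execute("SET lock_timeout = '5s'")
    cur.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_user_id "
                "ON events (user_id)")

with conn.cursor() as cur:
    # Transactions open longer than five minutes block VACUUM and DDL; flag them.
    cur.execute("""
        SELECT pid, now() - xact_start AS duration, state, query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
          AND now() - xact_start > interval '5 minutes'
        ORDER BY duration DESC
    """)
    for pid, duration, state, query in cur.fetchall():
        print(pid, duration, state, (query or "")[:80])

conn.close()
```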

Platform features that made scale feasible on Azure

OpenAI’s engineering work mattered, but managed platform features in Azure Database for PostgreSQL made cluster operations faster to roll out, simpler to run, and more resilient. Three features stand out:

Elastic clusters (Citus) for horizontal sharding (preview)

Azure’s elastic clusters are a managed implementation of the Citus extension that enables horizontal sharding across multiple PostgreSQL nodes, offering:
  • Row‑based and schema‑based sharding models, selectable per workload.
  • A unified cluster endpoint and built-in node management, simplifying cluster-level DDL and distribution.
  • Support for coordinator and worker nodes with port-based routing for cluster-aware connection balancing.
Elastic clusters are available in preview and are explicitly designed to offload application-level sharding complexity into the managed service — a path OpenAI can take once single-writer constraints make sharding necessary for writes. Microsoft docs and the product blog detail the preview and its current limitations (region coverage, certain unsupported extensions, and preview-level constraints).
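For teams evaluating this path, the following sketch shows the standard Citus pattern of distributing a table by a sharding key via create_distributed_table. The cluster endpoint, table, and columns are hypothetical; treat it as a staging experiment rather than a production recipe.

```python
# Minimal sketch of row-based sharding with the Citus API that elastic clusters
# expose. The cluster endpoint, table, and distribution column are hypothetical;
# psycopg2 is assumed as the driver.
import psycopg2

conn = psycopg2.connect("host=my-elastic-cluster.example dbname=app")  # placeholder endpoint
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS conversations (
            user_id bigint NOT NULL,
            conversation_id bigint NOT NULL,
            created_at timestamptz DEFAULT now(),
            PRIMARY KEY (user_id, conversation_id)
        )
    """)
    # Citus spreads the table across worker nodes by user_id; queries filtering
    # on user_id route to a single shard, everything else fans out.
    cur.execute("SELECT create_distributed_table('conversations', 'user_id')")
```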

Cascading read replicas

To scale reads beyond a small fixed number of replicas, Azure introduced cascading read replicas (preview). With cascading replicas:
  • A primary can have up to 5 first-level replicas.
  • Each first-level replica may generate up to 5 second-level replicas, enabling up to 30 read servers in a two-level hierarchy.
  • This can reduce pressure on the primary and let teams create region-local replica chains to serve global traffic more efficiently.
There are trade-offs: replication remains asynchronous, intermediate replicas typically cannot be promoted, and the feature is subject to preview limitations and regional availability. These operational caveats are critical when designing failover and recovery plans.
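A simple way to respect those caveats operationally is to watch lag at every level of the chain. The sketch below (placeholder endpoints, psycopg2 assumed) queries each replica for its approximate replay lag using pg_last_xact_replay_timestamp().

```python
# Replication-lag check across a two-level replica chain (endpoint names are
# placeholders). On a standby, pg_last_xact_replay_timestamp() reports the
# commit time of the most recently replayed transaction; on a primary it is NULL.
import psycopg2

REPLICAS = [
    "host=replica-l1-a dbname=app",   # first-level replica of the primary
    "host=replica-l2-a1 dbname=app",  # second-level replica fed by replica-l1-a
]

LAG_SQL = "SELECT now() - pg_last_xact_replay_timestamp() AS approx_lag"

for dsn in REPLICAS:
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(LAG_SQL)
            (lag,) = cur.fetchone()
            # Lag compounds down the chain: a second-level replica is at least
            # as stale as its intermediate parent, so alert on each level.
            print(f"{dsn.split()[0]}: approximate replication lag {lag}")
    finally:
        conn.close()
```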

Connection pooling and PgBouncer

Connection pooling is a core piece of OpenAI’s posture. PgBouncer reduces backend connection counts, enabling thousands of client connections to be multiplexed efficiently against a much smaller set of Postgres backend processes. Azure’s elastic cluster guidance and OpenAI’s talks both recommend PgBouncer as a pragmatic way to avoid exhaustion of database connections and to reduce per-connection overhead.
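From the application's point of view, adopting PgBouncer is mostly a matter of pointing connections at the pooler instead of the database. The sketch below assumes a PgBouncer instance on its conventional port 6432 in transaction-pooling mode; hostnames and credentials are placeholders.

```python
# Application pointed at PgBouncer instead of Postgres directly (assumes a
# PgBouncer instance on its conventional port 6432 in transaction-pooling mode;
# hostnames and credentials are placeholders).
import psycopg2

# Thousands of client connections multiplex onto a small pool of backend
# connections that PgBouncer keeps open to the database.
conn = psycopg2.connect(
    host="pgbouncer.internal",  # the pooler, not the Postgres server itself
    port=6432,
    dbname="app",
    user="app_user",
    password="change-me",       # use a secret store or managed identity in practice
)

with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())

# In transaction-pooling mode, avoid per-session state (session-level SET,
# prepared statements, advisory locks): each transaction may be served by a
# different backend connection.
conn.close()
```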

What the numbers say — verification and caveats

The Microsoft for Startups blog and Bohan Zhang’s POSETTE / PGConf talks include several headline metrics that make the architecture story tangible:
  • “Millions of queries per second (combined reads and writes) across OpenAI’s PostgreSQL cluster.” This figure was reported by OpenAI in conference abstracts and repeated in the Microsoft blog; the claims originate with OpenAI’s infra team rather than an independent benchmark publication. POSETTE/PGConf session listings and third‑party writeups of the talk mirror the claim, but third‑party, reproducible benchmarks are not publicly available at the same granularity, so treat the QPS claim as OpenAI’s operational measurement rather than a verified independent test.
  • Read scale with “dozens” of replicas including cross-region placements. Azure documentation and product blogs show that the platform supports many read replicas (and now cascading replicas in preview), enabling broad geographic coverage. However, default architecture limits (e.g., five direct replicas per primary before cascading) come into play and require planning.
  • “Ten times faster response times — from ≈50ms to under 5ms for many queries” after connection pooling and query optimizations. This is a reported improvement by OpenAI and Microsoft; again, it reflects measured production results for specific query shapes and cannot be generalized without careful measurement in other environments. Still, the direction is credible: connection pooling + query tuning commonly yields order-of-magnitude improvements for short, frequently-executed queries.
  • Incident reduction: one Sev0 related to Postgres in nine months after the improvements. This is a reliability statistic reported by OpenAI/Microsoft. External verification is limited; the number is best treated as a vendor-provided operational metric.
In short: the claims are consistent across OpenAI presentations and Microsoft’s writeups, and they are plausible given the architecture and platform features. But independent, reproducible public benchmarks for the exact workload are not available, so readers should view the numbers as credible operator-reported outcomes, not as a neutral benchmark that can be copied blindly into another environment.

Critical analysis — strengths, trade-offs, and risks

OpenAI’s approach highlights key strengths but also surfaces nontrivial operational trade-offs that every engineering leader must weigh.

Strengths

  • Proven, maintainable stack. Using PostgreSQL means a wide pool of tools, operators, and expertise. OpenAI’s team could focus on specific bottlenecks instead of inventing a new storage system.
  • Scaling reads cheaply. As long as the workload remains read-dominant, adding replicas is a low-friction way to grow global capacity and reduce tail latency.
  • Managed platform speed. Azure mitigates much of the operational toil (patching, backups, availability), which lets a small team punch above its weight operationally.

Trade-offs and operational risks

  • Single-writer bottleneck for writes. At some point — particularly for write-heavy workloads — a single-writer architecture requires either batching+offload strategies or horizontal sharding (Citus/elastic clusters). The transition is nontrivial: sharding introduces complexity in schema design, cross-shard transactions, and tooling.
  • Asynchronous replication caveats. Replication lag, WAL accumulation due to inactive replication slots, and the inability to promote intermediate cascading replicas in some cases are real hazards. Misconfigured replication slots can cause primary storage to fill, triggering read-only fallbacks or outages. Azure documentation explicitly warns about WAL growth and recommends monitoring.
  • Operational surface area expands with scale. Dozens of replicas, PgBouncer instances, and region-distributed topologies require hardened observability, automated remediation, and SRE processes that many startups do not initially have.
  • Preview features and region limits. Elastic clusters and cascading replicas have preview status and are subject to regional availability and functional limitations (unsupported extensions, promotion restrictions). Building critical systems onto preview features without contingency plans is risky.
  • Vendor lock-in and escape complexity. Heavy reliance on managed features like Citus as implemented by a cloud provider can make future migration or multi-cloud operations more complex and costly.

Practical checklist: how to apply these lessons in your organization

Below is a pragmatic checklist distilled from OpenAI’s playbook and platform docs. Use it as a starting point for teams aiming for Postgres-based scale on Azure.
  • Prioritize measurement:
      • Define SLOs for latency and throughput.
      • Instrument query-level metrics, WAL generation rates, replication lag, and connection counts (a monitoring sketch follows this checklist).
  • Start simple, then iterate:
      • Deploy a single primary with read replicas for early growth.
      • Add PgBouncer to control connections before adding replicas.
  • Optimize writes:
      • Batch noisy writes; implement write windows for noncritical tasks.
      • Offload append-only or analytic workloads to purpose-built stores.
  • Harden schema change processes:
      • Allow only lightweight online schema changes on hot tables.
      • Use CONCURRENTLY for indexes; schedule heavy migrations with traffic throttling.
  • Manage replication and WAL:
      • Monitor replication slots and set alerts for WAL accumulation.
      • Test behavior when a replica lags or becomes unhealthy (do not assume automatic rebalancing).
  • Plan for sharding deliberately:
      • If writes cannot be reduced, evaluate elastic clusters (Citus) for horizontal sharding.
      • Prototype schema‑ or row‑based sharding in staging; measure cross-shard query patterns.
  • Use cascading replicas judiciously:
      • Reserve cascading replicas for read-heavy, geo-distributed workloads.
      • Understand that not all intermediate replicas are promotable; design DR accordingly.
  • Automate and document SRE playbooks:
      • Failure scenarios, replica promotion workflows, backup/restore tests, and capacity-growth runbooks.
      • Run periodic disaster recovery drills.
  • Security and compliance:
      • Use VNet integration and Private Endpoints where possible.
      • Ensure encryption, access control, and logging meet regulatory needs.
(Technical references for read replica limits, cascading replica behavior, and elastic cluster design are available in Azure’s PostgreSQL documentation and product updates.)
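As a starting point for the measurement and WAL-management items above, here is a small monitoring sketch. It assumes psycopg2, the pg_stat_statements extension, a role with monitoring privileges, and PostgreSQL 13+ column names; adapt it to your own fleet and export the results to your metrics system.

```python
# Monitoring sketch for the checklist above. Assumes psycopg2, the
# pg_stat_statements extension, a role with monitoring privileges, and
# PostgreSQL 13+ column names (mean_exec_time / total_exec_time).
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN

with conn.cursor() as cur:
    # Top statements by cumulative execution time: the usual starting point
    # for slow-query triage.
    cur.execute("""
        SELECT query, calls, mean_exec_time, total_exec_time
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)

    # WAL retained by each replication slot. An inactive slot with a large
    # value is the classic "primary disk slowly fills up" failure mode.
    cur.execute("""
        SELECT slot_name, active,
               pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
        FROM pg_replication_slots
    """)
    for slot_name, active, retained_wal in cur.fetchall():
        print(slot_name, active, retained_wal)

conn.close()
```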

When to consider sharding and managed distributed Postgres

Citus (elastic clusters) provides two sharding models that matter for design:
  • Row‑based sharding (best density and single-table performance) — requires schema changes and query adjustments.
  • Schema‑based sharding (per-tenant schemas) — easier to adopt for multi-tenant apps but provides fewer tenants per node.
Azure’s elastic clusters bring Citus into the managed space, simplifying node management, but remember preview limitations (extensions, region coverage, and some features). Migration to sharded clusters requires careful data modeling and testing; it’s not a drop-in fix for write-heavy problems.
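If you prototype row-based sharding in staging, co-location and cross-shard fan-out are the two behaviors worth measuring first. The sketch below (hypothetical tables, continuing the earlier conversations example) distributes a second table co-located on the same key and contrasts a single-shard query plan with a cross-shard one.

```python
# Staging prototype: co-locate a second table with `conversations` (distributed
# in the earlier sketch) so per-user joins stay on one shard, then compare
# query plans. Table names and the cluster endpoint are hypothetical.
import psycopg2

conn = psycopg2.connect("host=my-elastic-cluster.example dbname=app")  # placeholder endpoint
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            user_id bigint NOT NULL,
            message_id bigint NOT NULL,
            body text,
            PRIMARY KEY (user_id, message_id)
        )
    """)
    # Same distribution column and explicit co-location: joins on user_id run
    # locally on one worker instead of shuffling rows between nodes.
    cur.execute("SELECT create_distributed_table('messages', 'user_id', "
                "colocate_with => 'conversations')")

    # Single-shard query: routed to exactly one worker.
    cur.execute("EXPLAIN SELECT count(*) FROM messages WHERE user_id = 42")
    print("\n".join(r[0] for r in cur.fetchall()))

    # Cross-shard query: fans out to every worker, aggregated on the coordinator.
    cur.execute("EXPLAIN SELECT count(*) FROM messages")
    print("\n".join(r[0] for r in cur.fetchall()))
```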

Short case study: what the evidence shows (conservative reading)

  • Multiple independent conference listings and writeups document that Bohan Zhang presented OpenAI’s Postgres scaling approach at POSETTE and PGConf, claiming “millions of QPS” and describing the use of PgBouncer, replicas, and operational governance. These are first-hand operator claims.
  • Microsoft’s for Startups blog synthesizes that talk and details Azure features (elastic clusters, cascading replicas) that were developed and previewed concurrent with OpenAI’s scaling work. Azure docs and TechCommunity posts confirm the new features and their preview status.
  • Third‑party technical writeups and conference summaries (independent blogs and community pages) echo the core claims and add practical notes about replication slots, PgBouncer usage, and the trade‑offs of single-writer architectures. These independent writeups support the plausibility of the metrics while underscoring that the precise workload shapes that drove OpenAI’s numbers are not publicly reproducible.
Conclusion from evidence: OpenAI’s reported outcomes are credible operator-reported results backed by conference talks and platform features. They demonstrate that a relational database plus a disciplined operational model and managed cloud features can scale very far — but the practicalities (preview features, operational rigor, and the need to plan for sharding) remain real constraints.

Final verdict and guidance for WindowsForum readers

OpenAI’s experience is instructive for startups and engineering teams that want a low‑friction path to scale without reinventing the storage layer:
  • If your workload is read-heavy today, a single-writer Postgres with aggressive read-replica strategy, PgBouncer, and query optimization is a strong, pragmatic architecture. Azure Database for PostgreSQL offers features that help implement that model quickly.
  • If write scale becomes the bottleneck, prepare for a deliberate migration toward horizontal sharding using Citus/elastic clusters — this is available in preview on Azure, but you must model cross-shard queries, transactional needs, and toolchain differences.
  • Operational discipline matters more than exotic tech choices. Schema governance, connection pooling, rate limits, and observability are what turned Postgres from a fragile point of failure into a highly reliable component of OpenAI’s stack.
  • Treat vendor preview features carefully. Elastic clusters and cascading replicas can be powerful, but their preview caveats, region limits, and functional restrictions must factor into any production rollout plan.
OpenAI’s story is inspiring because it’s not about mystical infrastructure; it’s about incremental engineering rigor plus the right managed platform capabilities. For Windows-focused teams building AI-backed apps, this is a compelling counterexample to the “must use a purpose-built exotic database” narrative — but it’s also a reminder that scaling requires planning, automation, and SRE muscle before the traffic arrives.

Action plan (1–2–3 for teams that want to follow OpenAI’s playbook)

  • Baseline and instrument:
      • Measure current QPS, median and p95 latency, WAL generation, and connection counts (a baseline sketch follows this list).
  • Apply quick wins:
      • Deploy PgBouncer, enable read replicas, and add prioritized replica pools for critical reads.
  • Prepare the long game:
      • Harden schema-change policies and rate limiting.
      • Prototype elastic clusters (Citus) in staging and model cross-shard behavior.
      • Build DR playbooks that explicitly cover replication lag, WAL accumulation, and replica promotion scenarios.
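For the baseline step, a rough first cut can be as simple as sampling WAL position and connection counts, as in the sketch below (placeholder DSN, psycopg2 assumed); a real deployment would export these values to a metrics system rather than print them.

```python
# Baseline sketch: estimate WAL generation rate and count client connections
# (placeholder DSN; psycopg2 assumed). Export to your metrics system in practice.
import time
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN
conn.autocommit = True

def current_wal_lsn(cur):
    cur.execute("SELECT pg_current_wal_lsn()")
    return cur.fetchone()[0]

with conn.cursor() as cur:
    start = current_wal_lsn(cur)
    time.sleep(10)  # sampling window; lengthen for a steadier estimate
    end = current_wal_lsn(cur)

    cur.execute("SELECT pg_wal_lsn_diff(%s, %s)", (end, start))
    wal_bytes = cur.fetchone()[0]
    print(f"approx WAL generation: {wal_bytes / 10:.0f} bytes/sec")

    cur.execute("SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend'")
    print("client backend connections:", cur.fetchone()[0])

conn.close()
```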
Implementing these steps will not make Postgres “magically infinite,” but it will put you on the same engineering path that carried a company serving millions of users daily, and it will give your team the tools to measure when the next architectural transition (sharding, specialized stores) is truly required.

OpenAI’s experience shows a repeatable pattern: use a managed relational database to move fast, invest in operational optimizations to postpone complexity, and then adopt managed sharding features when real write scale forces a topology change. That is a model most startups and enterprise teams can copy — if they pair it with metrics, SRE discipline, and realistic contingency planning.

Source: Microsoft How OpenAI scaled with Azure Database for PostgreSQL - Microsoft for Startups Blog
 
