OpenAI’s database story is unexpectedly simple on paper but fiendishly complex in practice: a single primary PostgreSQL writer, dozens of read-only replicas, and a relentless focus on smoothing writes, offloading work, and squeezing every millisecond out of reads — all running on Azure Database for PostgreSQL and pushed to the limit in real production traffic.
Background / Overview
PostgreSQL is often described as a developer-friendly, rock‑solid relational database — and for many workloads that’s true. But the common perception is that relational databases break under “web-scale” modern AI workloads. OpenAI’s experience proves a more pragmatic truth: with disciplined engineering and the right managed platform features, PostgreSQL can be a dependable foundation even at extraordinary scale.
OpenAI’s story — as told in a Microsoft for Startups blog that summarizes a POSETTE/PGConf presentation by Bohan Zhang of OpenAI — shows a classic lifecycle: start simple, hit bottlenecks, instrument and optimize, add managed platform features, and iterate. The company began with a single-writer, multi-reader Postgres topology on Azure Database for PostgreSQL, then added practices and platform extensions to address emerging bottlenecks as ChatGPT usage exploded.
This article unpacks that journey, verifies key technical claims against available documentation and conference sources, and provides practical, risk-aware guidance for teams that want to replicate OpenAI’s PostgreSQL strategy using Azure Database for PostgreSQL, PgBouncer, elastic clusters (Citus), and cascading read replicas.
Why OpenAI stayed with PostgreSQL
OpenAI’s decision to keep PostgreSQL at the center of critical systems comes down to three pragmatic advantages:
- Familiar transactional semantics (ACID) for critical metadata.
- Mature tooling and ecosystem (extensions, monitoring, schema migration patterns).
- A managed platform (Azure Database for PostgreSQL) that removes heavy operational burden and provides enterprise features like automated backups, HA, and integrated replicas.
The technical playbook: how OpenAI scaled Postgres on Azure
Below are the core strategies OpenAI used (and explicitly described in the POSETTE / conference talk and the Microsoft for Startups blog), augmented with verification from Azure documentation and independent conference listings.
1) Offload and smooth writes — treat the primary like a scarce resource
Writes are the hardest part of scaling a single-primary Postgres cluster. OpenAI’s approach:
- Reduce unnecessary writes at the source. Push filtering and deduplication upstream so that only essential operations hit the primary.
- Introduce controlled timing (batching, rate-limits, windows) for background jobs and non-critical writes.
- Offload heavy write patterns to purpose-built subsystems (e.g., object storage, caches, or specialized append-only stores) when strict transactional semantics are not required.
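The sources describe these techniques at a high level rather than publishing code. As a minimal sketch of the batching and write-window idea, the Python snippet below (psycopg2, with a hypothetical user_events table and a placeholder connection string) stages non-critical rows in memory and flushes them to the primary in one multi-row INSERT per window instead of one round trip per event.

```python
import time
import psycopg2
from psycopg2.extras import execute_values

# Placeholder DSN and table name; adjust for your environment.
DSN = "host=primary.example.com dbname=app user=app password=secret"
FLUSH_INTERVAL_S = 5            # write "window" for non-critical events
buffer = []                     # in-memory staging for pending rows

def enqueue_event(user_id, event_type):
    """Stage a non-critical write instead of hitting the primary immediately."""
    buffer.append((user_id, event_type))

def flush(conn):
    """Send all buffered rows in one batched INSERT to smooth write load."""
    global buffer
    if not buffer:
        return
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO user_events (user_id, event_type) VALUES %s",
            buffer,
        )
    conn.commit()
    buffer = []

if __name__ == "__main__":
    conn = psycopg2.connect(DSN)
    enqueue_event(1, "page_view")
    enqueue_event(2, "page_view")
    time.sleep(FLUSH_INTERVAL_S)  # wait for the write window to close
    flush(conn)                   # one round trip instead of one per event
    conn.close()
```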
2) Scale reads with replicas and smart routing
With writes under control, OpenAI invested heavily in read scaling:
- Dozens of read replicas, including cross‑region replicas, to serve global traffic.
- Priority-based routing: dedicate replica pools to high-priority queries to prevent noisy, low-priority workloads from affecting latency-sensitive reads.
- Connection pooling (PgBouncer) to avoid exploding backend connections and to stabilize per-node resource usage.
- Query optimization and slow‑query triage to eliminate pathological queries that can stall replicas.
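OpenAI’s actual routing layer is not public, but the priority-routing idea can be sketched in a few lines: keep separate connection pools for high- and low-priority read traffic, each pointing at its own replicas (hostnames below are placeholders), so a burst of low-priority analytics cannot starve latency-sensitive reads.

```python
import psycopg2
from psycopg2.pool import ThreadedConnectionPool

# Placeholder replica endpoints; in practice each pool would point at a
# dedicated Azure read replica (or a group of replicas behind a load balancer).
POOLS = {
    "high": ThreadedConnectionPool(
        minconn=2, maxconn=20,
        dsn="host=replica-critical.example.com dbname=app user=app password=secret",
    ),
    "low": ThreadedConnectionPool(
        minconn=1, maxconn=5,
        dsn="host=replica-batch.example.com dbname=app user=app password=secret",
    ),
}

def run_read(query, params=None, priority="high"):
    """Route a read-only query to the replica pool matching its priority."""
    pool = POOLS[priority]
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(query, params)
            return cur.fetchall()
    finally:
        pool.putconn(conn)

# A latency-sensitive lookup stays on the dedicated pool...
user = run_read("SELECT * FROM users WHERE id = %s", (42,))
# ...while a heavy report is confined to the low-priority replicas.
report = run_read("SELECT count(*) FROM events", priority="low")
```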
3) Schema governance, transaction hygiene, and operational guardrails
Scaling requires discipline as much as horsepower:
- Strict schema change policies (avoid full table rewrites on hot tables; use CONCURRENTLY for index changes).
- Limit long-running transactions that block VACUUM and DDL operations.
- Application- and DB-level rate limits for connections and queries.
- Comprehensive observability and runbooks for failover, replication lag, and WAL management.
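As a small illustration of those guardrails (plain SQL issued from Python; the table, index, and role names are hypothetical), the snippet below builds an index without blocking writes and sets server-side timeouts that cut off runaway queries and idle transactions that would otherwise block VACUUM and DDL.

```python
import psycopg2

conn = psycopg2.connect("host=primary.example.com dbname=app user=admin password=secret")
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction block

with conn.cursor() as cur:
    # Build the index online instead of taking a long write-blocking lock.
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_user_id "
        "ON user_events (user_id)"
    )
    # Guardrails for the application role: cap query runtime and terminate
    # idle-in-transaction sessions that would block VACUUM and schema changes.
    cur.execute("ALTER ROLE app_rw SET statement_timeout = '30s'")
    cur.execute("ALTER ROLE app_rw SET idle_in_transaction_session_timeout = '60s'")

conn.close()
```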
Platform features that made scale feasible on Azure
OpenAI’s engineering work mattered — but the managed platform features in Azure Database for PostgreSQL made cluster operations faster to adopt, simpler to run, and more resilient. Three features stand out.
Elastic clusters (Citus) for horizontal sharding (preview)
Azure’s elastic clusters are a managed implementation of the Citus extension that enables horizontal sharding across multiple PostgreSQL nodes, offering:
- Row‑based and schema‑based sharding models, selectable per workload.
- A unified cluster endpoint and built-in node management, simplifying cluster-level DDL and distribution.
- Support for coordinator and worker nodes with port-based routing for cluster-aware connection balancing.
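A minimal sketch of row-based sharding with the Citus extension that elastic clusters manage is shown below; the table, distribution column, and cluster hostname are illustrative, and the exact provisioning steps should be taken from the Azure elastic clusters documentation.

```python
import psycopg2

# Connect to the cluster endpoint (coordinator); hostname is a placeholder.
conn = psycopg2.connect(
    "host=mycluster.postgres.database.azure.com dbname=app user=admin password=secret"
)
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            tenant_id bigint NOT NULL,
            event_id  bigserial,
            payload   jsonb,
            PRIMARY KEY (tenant_id, event_id)
        )
    """)
    # Row-based sharding: distribute rows across worker nodes by tenant_id,
    # so queries filtered on tenant_id route to a single shard.
    cur.execute("SELECT create_distributed_table('events', 'tenant_id')")

conn.close()
```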
Cascading read replicas
To scale reads beyond a small fixed number of replicas, Azure introduced cascading read replicas (preview). With cascading replicas:
- A primary can have up to 5 first-level replicas.
- Each first-level replica can in turn have up to 5 second-level replicas, enabling up to 30 read servers (5 + 25) in a two-level hierarchy.
- This can reduce pressure on the primary and let teams create region-local replica chains to serve global traffic more efficiently.
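Provisioning steps for cascading replicas belong to the Azure preview docs; what generalizes is verifying each hop of the chain. The sketch below (placeholder hostnames and credentials) uses standard Postgres views to list the standbys streaming from a given server and to estimate replay lag on a leaf replica.

```python
import psycopg2

def downstream_standbys(dsn):
    """List standbys streaming directly from this server (primary or mid-tier replica)."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT application_name, state, replay_lag FROM pg_stat_replication")
        return cur.fetchall()

def replay_lag_seconds(dsn):
    """Approximate how far a replica's replay is behind the last replayed commit."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0)"
        )
        return float(cur.fetchone()[0])

# Placeholder endpoints for a two-level hierarchy.
print(downstream_standbys("host=primary.example.com dbname=postgres user=monitor password=secret"))
print(downstream_standbys("host=replica-l1.example.com dbname=postgres user=monitor password=secret"))
print(replay_lag_seconds("host=replica-l2.example.com dbname=postgres user=monitor password=secret"))
```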
Connection pooling and PgBouncer
Connection pooling is a core piece of OpenAI’s posture. PgBouncer reduces backend connection counts, enabling thousands of client connections to be multiplexed efficiently against a much smaller set of Postgres backend processes. Azure’s elastic cluster guidance and OpenAI’s talks both recommend PgBouncer as a pragmatic way to avoid exhausting database connections and to reduce per-connection overhead.
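Neither source publishes OpenAI’s pooler configuration, but the application-side change is small: connect to the pooler port instead of Postgres directly. The sketch below assumes the built-in PgBouncer on Azure Database for PostgreSQL listening on its usual port 6432 (confirm the port and pooling mode for your deployment); in transaction pooling mode, session state such as temporary tables or session-level prepared statements should be avoided.

```python
import psycopg2

# Direct connection: each client occupies a dedicated Postgres backend (port 5432).
direct = psycopg2.connect(
    "host=myserver.postgres.database.azure.com port=5432 dbname=app user=app password=secret"
)

# Pooled connection: PgBouncer multiplexes many clients onto few backends.
# Port 6432 is assumed here as the built-in PgBouncer default.
pooled = psycopg2.connect(
    "host=myserver.postgres.database.azure.com port=6432 dbname=app user=app password=secret"
)

with pooled.cursor() as cur:
    cur.execute("SELECT 1")   # short, frequent queries benefit most from pooling
    print(cur.fetchone())

pooled.close()
direct.close()
```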
What the numbers say — verification and caveats
The Microsoft for Startups blog and Bohan Zhang’s POSETTE / PGConf talks include several headline metrics that make the architecture story tangible:
- “Millions of queries per second (combined reads and writes) across OpenAI’s PostgreSQL cluster.” This figure was reported by OpenAI in conference abstracts and repeated in the Microsoft blog; the claims originate with OpenAI’s infra team rather than an independent benchmark publication. POSETTE/PGConf session listings and third‑party writeups of the talk mirror the claim, but third‑party, reproducible benchmarks are not publicly available at the same granularity, so treat the QPS claim as OpenAI’s operational measurement rather than a verified independent test.
- Read scale with “dozens” of replicas including cross-region placements. Azure documentation and product blogs show that the platform supports many read replicas (and now cascading replicas in preview), enabling broad geographic coverage. However, default architecture limits (e.g., five direct replicas per primary before cascading) come into play and require planning.
- “Ten times faster response times — from ≈50ms to under 5ms for many queries” after connection pooling and query optimizations. This is a reported improvement by OpenAI and Microsoft; again, it reflects measured production results for specific query shapes and cannot be generalized without careful measurement in other environments. Still, the direction is credible: connection pooling + query tuning commonly yields order-of-magnitude improvements for short, frequently-executed queries.
- Incident reduction: one Sev0 related to Postgres in nine months after the improvements. This is a reliability statistic reported by OpenAI/Microsoft. External verification is limited; the number is best treated as a vendor-provided operational metric.
Critical analysis — strengths, trade-offs, and risks
OpenAI’s approach highlights key strengths but also surfaces nontrivial operational trade-offs that every engineering leader must weigh.
Strengths
- Proven, maintainable stack. Using PostgreSQL means a wide pool of tools, operators, and expertise. OpenAI’s team could focus on specific bottlenecks instead of inventing a new storage system.
- Scaling reads cheaply. As long as the workload remains read-dominant, adding replicas is a low-friction way to grow global capacity and reduce tail latency.
- Managed platform speed. Azure mitigates much of the operational toil (patching, backups, availability), which lets a small team punch above its weight operationally.
Trade-offs and operational risks
- Single-writer bottleneck for writes. At some point — particularly for write-heavy workloads — a single-writer architecture requires either batching+offload strategies or horizontal sharding (Citus/elastic clusters). The transition is nontrivial: sharding introduces complexity in schema design, cross-shard transactions, and tooling.
- Asynchronous replication caveats. Replication lag, WAL accumulation due to inactive replication slots, and the inability to promote intermediate cascading replicas in some cases are real hazards. Misconfigured replication slots can cause primary storage to fill, triggering read-only fallbacks or outages. Azure documentation explicitly warns about WAL growth and recommends monitoring.
- Operational surface area expands with scale. Dozens of replicas, PgBouncer instances, and region-distributed topologies require hardened observability, automated remediation, and SRE processes that many startups do not initially have.
- Preview features and region limits. Elastic clusters and cascading replicas have preview status and are subject to regional availability and functional limitations (unsupported extensions, promotion restrictions). Building critical systems onto preview features without contingency plans is risky.
- Vendor lock-in and escape complexity. Heavy reliance on managed features like Citus as implemented by a cloud provider can make future migration or multi-cloud operations more complex and costly.
Practical checklist: how to apply these lessons in your organization
Below is a pragmatic checklist distilled from OpenAI’s playbook and platform docs. Use it as a starting point for teams aiming for Postgres-based scale on Azure.
- Prioritize measurement:
- Define SLOs for latency and throughput.
- Instrument query-level metrics, WAL generation rates, replication lag, and connection counts.
- Start simple, then iterate:
- Deploy a single-primary with read replicas for early growth.
- Add PgBouncer to control connections before adding replicas.
- Optimize writes:
- Batch noisy writes; implement write windows for noncritical tasks.
- Offload append-only or analytic workloads to purpose-built stores.
- Harden schema change processes:
- Allow only lightweight online schema changes on hot tables.
- Use CONCURRENTLY for indexes; schedule heavy migrations with traffic throttling.
- Manage replication and WAL:
- Monitor replication slots and set alerts for WAL accumulation (see the monitoring sketch after this checklist).
- Test behavior when a replica lags or becomes unhealthy (do not assume automatic rebalancing).
- Plan for sharding deliberately:
- If writes cannot be reduced, evaluate elastic clusters (Citus) for horizontal sharding.
- Prototype schema‑ or row‑based sharding in staging; measure cross-shard query patterns.
- Use cascading replicas judiciously:
- Reserve cascading replicas for read-heavy, geo-distributed workloads.
- Understand that not all intermediate replicas are promotable; design DR accordingly.
- Automate and document SRE playbooks:
- Failure scenarios, replica promotion workflows, backup/restore tests, and capacity growth runbooks.
- Run periodic disaster recovery drills.
- Security and compliance:
- Use VNet and Private Endpoint where possible.
- Ensure encryption, access control, and logging meet regulatory needs.
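To make the replication and WAL item above concrete, here is a minimal monitoring sketch to run against the primary (the DSN and the alert threshold are placeholders). It flags inactive replication slots and estimates how much WAL each slot is forcing the primary to retain, which is the failure mode that can fill primary storage.

```python
import psycopg2

# Placeholder DSN and threshold; tune for your environment.
PRIMARY_DSN = "host=primary.example.com dbname=postgres user=monitor password=secret"
MAX_RETAINED_WAL_BYTES = 10 * 1024**3  # alert above ~10 GiB of retained WAL

QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots
"""

with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for slot_name, active, retained in cur.fetchall():
        if not active:
            print(f"WARNING: slot {slot_name} is inactive but still retains WAL")
        if retained is not None and retained > MAX_RETAINED_WAL_BYTES:
            print(f"WARNING: slot {slot_name} retains {retained / 1024**2:.0f} MiB of WAL")
```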
When to consider sharding and managed distributed Postgres
Citus (elastic clusters) provides two sharding models that matter for design:
- Row‑based sharding (best density and single-table performance) — requires schema changes and query adjustments.
- Schema‑based sharding (per-tenant schemas) — easier to adopt for multi-tenant apps, but it supports fewer tenants per node.
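For contrast with the row-based example shown earlier, schema-based sharding in recent Citus releases can be sketched as below; the tenant schema and table names are illustrative, and availability of these commands on elastic clusters should be confirmed against the Azure documentation.

```python
import psycopg2

conn = psycopg2.connect(
    "host=mycluster.postgres.database.azure.com dbname=app user=admin password=secret"
)
conn.autocommit = True

with conn.cursor() as cur:
    # Each tenant lives in its own schema; Citus places the whole schema on one node.
    cur.execute("SET citus.enable_schema_based_sharding TO on")
    cur.execute("CREATE SCHEMA IF NOT EXISTS tenant_42")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS tenant_42.orders (id bigserial PRIMARY KEY, total numeric)"
    )
    # Alternatively, distribute an existing schema explicitly:
    # cur.execute("SELECT citus_schema_distribute('tenant_42')")

conn.close()
```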
Short case study: what the evidence shows (conservative reading)
- Multiple independent conference listings and writeups document that Bohan Zhang presented OpenAI’s Postgres scaling approach at POSETTE and PGConf, claiming “millions of QPS” and describing the use of PgBouncer, replicas, and operational governance. These are first-hand operator claims.
- Microsoft’s for Startups blog synthesizes that talk and details Azure features (elastic clusters, cascading replicas) that were developed and previewed concurrent with OpenAI’s scaling work. Azure docs and TechCommunity posts confirm the new features and their preview status.
- Third‑party technical writeups and conference summaries (independent blogs and community pages) echo the core claims and add practical notes about replication slots, PgBouncer usage, and the trade‑offs of single-writer architectures. These independent writeups support the plausibility of the metrics while underscoring that the precise workload shapes that drove OpenAI’s numbers are not publicly reproducible.
Final verdict and guidance for WindowsForum readers
OpenAI’s experience is instructive for startups and engineering teams that want a low‑friction path to scale without reinventing the storage layer:
- If your workload is read-heavy today, a single-writer Postgres with an aggressive read-replica strategy, PgBouncer, and query optimization is a strong, pragmatic architecture. Azure Database for PostgreSQL offers features that help implement that model quickly.
- If write scale becomes the bottleneck, prepare for a deliberate migration toward horizontal sharding using Citus/elastic clusters — this is available in preview on Azure, but you must model cross-shard queries, transactional needs, and toolchain differences.
- Operational discipline matters more than exotic tech choices. Schema governance, connection pooling, rate limits, and observability are what turned Postgres from a fragile point of failure into a highly reliable component of OpenAI’s stack.
- Treat vendor preview features carefully. Elastic clusters and cascading replicas can be powerful, but their preview caveats, region limits, and functional restrictions must factor into any production rollout plan.
Action plan (1–2–3 for teams that want to follow OpenAI’s playbook)
1. Baseline and instrument:
- Measure current QPS, median and p95 latency, WAL generation, and connection counts (a baseline measurement sketch follows this plan).
2. Apply quick wins:
- Deploy PgBouncer, enable read replicas, add prioritized replica pools for critical reads.
3. Prepare the long game:
- Harden schema-change policies and rate limiting.
- Prototype elastic clusters (Citus) in staging and model cross-shard behavior.
- Build DR playbooks that explicitly cover replication lag, WAL accumulation, and replica promotion scenarios.
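A rough way to capture the step 1 baseline is to sample the built-in statistics views twice and difference them, as in the sketch below (the DSN is a placeholder; transactions per second is only a proxy for QPS, and pg_stat_statements, where enabled, gives better per-query timing).

```python
import time
import psycopg2

DSN = "host=primary.example.com dbname=postgres user=monitor password=secret"

SNAPSHOT = """
SELECT (SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database) AS txns,
       pg_current_wal_lsn()                                            AS wal_lsn,
       (SELECT count(*) FROM pg_stat_activity)                         AS connections
"""

def snapshot(cur):
    cur.execute(SNAPSHOT)
    return cur.fetchone()

conn = psycopg2.connect(DSN)
conn.autocommit = True  # keep each sample in its own transaction so stats views refresh

with conn.cursor() as cur:
    t0, (txns0, lsn0, conns0) = time.time(), snapshot(cur)
    time.sleep(60)                        # sampling window
    t1, (txns1, lsn1, conns1) = time.time(), snapshot(cur)

    cur.execute("SELECT pg_wal_lsn_diff(%s, %s)", (lsn1, lsn0))
    wal_bytes = cur.fetchone()[0]

    elapsed = t1 - t0
    print(f"transactions/sec: {(txns1 - txns0) / elapsed:.0f}")   # proxy for QPS
    print(f"WAL bytes/sec:    {float(wal_bytes) / elapsed:.0f}")
    print(f"connections now:  {conns1}")

conn.close()
```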
OpenAI’s experience shows a repeatable pattern: use a managed relational database to move fast, invest in operational optimizations to postpone complexity, and then adopt managed sharding features when real write scale forces a topology change. That is a model most startups and enterprise teams can copy — if they pair it with metrics, SRE discipline, and realistic contingency planning.
Source: Microsoft How OpenAI scaled with Azure Database for PostgreSQL - Microsoft for Startups Blog