Nebius’s new Nebius Token Factory, unveiled on November 5, 2025, is a full-stack production inference platform that explicitly targets enterprises tired of closed, proprietary AI stacks and hyperscaler lock‑in — promising support for more than 60 open‑source models, sub‑second inference latency, autoscaling throughput and a 99.9% uptime SLA while running on Nebius’s growing global AI cloud.
Background / Overview
Nebius began life as the non‑Russian remnant of Yandex and completed its separation in 2024. Since then the company has rebranded, raised large pools of capital and pivoted to building an AI‑native cloud business that combines custom hardware, GPU farms and software for model lifecycle management. The company’s SEC filings and major news coverage document its pivot from search assets to a dedicated AI infrastructure provider and a rapid footprint expansion in Europe, the United States and Israel. Token Factory is the latest product in that evolution: a platform Nebius describes as the successor to Nebius AI Studio that folds together inference, fine‑tuning, endpoint management, governance (teams, SSO, RBAC), compliance controls and dedicated performance SLAs into a single enterprise offering. Nebius positions the product as both a technical and commercial response to organizations that want open model freedom without trading away production guarantees.
What Nebius Token Factory actually offers
Feature set at launch
Nebius describes Token Factory as a production inference and model‑lifecycle platform built on Nebius AI Cloud 3.0 (“Aether”) with the following headline capabilities:
- Support for 60+ open‑source models across text, code and vision families, including named weights such as DeepSeek, GPT‑OSS, Meta Llama, NVIDIA Nemotron and Qwen.
- OpenAI‑compatible APIs to simplify migrations from proprietary endpoints (a minimal client sketch follows this list).
- Enterprise governance: team workspaces, SSO, unified billing, audit trails and region‑specific zero‑retention inference endpoints.
- Fine‑tuning and post‑training pipelines (LoRA and full‑model support) with one‑click promotion of tuned models to production endpoints.
- Performance SLAs: Nebius claims sub‑second latency for many workloads, autoscaling throughput to handle bursty traffic, and a 99.9% availability guarantee even at very high QPS.
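Because the endpoints are OpenAI‑compatible, migrating an existing integration is typically a matter of swapping the base URL, API key and model name. The sketch below illustrates that pattern with the official openai Python SDK; the endpoint URL, environment variable and model identifier are placeholders for illustration, not confirmed Token Factory values, so check Nebius’s documentation for the real ones.

```python
# Minimal migration sketch: point an existing OpenAI SDK integration at an
# OpenAI-compatible endpoint. URL, env var and model id below are
# illustrative placeholders, not confirmed Token Factory values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-token-factory.com/v1",  # hypothetical endpoint
    api_key=os.environ["TOKEN_FACTORY_API_KEY"],          # hypothetical env var
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example open-weight model id
    messages=[{"role": "user", "content": "Summarize our Q3 incident report."}],
)
print(response.choices[0].message.content)
```

The rest of the application code stays untouched, which is the practical substance of the “OpenAI‑compatible” claim.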
Packaging and availability
Token Factory is being rolled out as the successor to Nebius AI Studio. Existing Nebius AI customers are slated to be automatically upgraded to Token Factory, and Nebius reports early enterprise adopters and partners — including Prosus and Higgsfield AI — who participated in pilots or early deployments. Nebius also says it is collaborating with open‑model ecosystem players such as Hugging Face to improve developer access and model portability. Note that much of this information is published in Nebius press materials and syndication outlets.
The technical backbone: hardware, MLPerf and NVIDIA relationships
Nebius’s AI cloud and data‑center footprint
Nebius has been explicit about building a vertically integrated AI cloud: proprietary rack and chassis designs, custom ODM hardware choices, and plans for large build‑to‑suit campuses. Public filings state that Nebius operates a proprietary data center in Finland, co‑location clusters in Paris and Iceland, and new U.S. sites including Kansas City and a phased 300 MW New Jersey campus that will host large Blackwell (NVIDIA GB200/Blackwell‑class) clusters. Company filings also disclose that Nebius had on the order of tens of thousands of GPUs in service or planned expansion as of early‑to‑mid 2025.
Benchmarks and vendor certifications
Nebius claims leading MLPerf® Inference v5.1 submissions on NVIDIA GB200 NVL72 and HGX B200 systems and notes it has qualified as one of NVIDIA’s Exemplar Clouds — a designation and ecosystem program NVIDIA launched to signal partners that meet a high performance and integration bar for Blackwell‑class hardware. These claims are published on Nebius’s technical blog and in product announcements; MLPerf results are public and vendor‑submitted, so they reflect reproducible synthetic benchmarks under specified test configurations but may not map exactly to every real‑world workload. Important nuance: vendor MLPerf submissions are helpful for capacity comparisons and relative throughput, but synthetic benchmarks are not a substitute for customer‑specific performance testing. Nebius’s MLPerf achievement is a material engineering milestone, but enterprises should still benchmark token throughput, latency and tail latency using representative application workloads before committing SLA‑sensitive production traffic.
Competitive landscape — why Nebius thinks it can take share
Nebius positions Token Factory as a direct response to three market dynamics: (1) enterprises resisting hyperscaler lock‑in; (2) the rapid maturation of open‑model quality; and (3) rising demand for predictable, cost‑efficient inference at scale.
- The primary incumbents that most enterprises consider are Amazon Web Services (Bedrock/EC2), Microsoft Azure (Azure AI / Foundry / OpenAI on Azure) and Google Cloud Platform — these hyperscalers combine model catalogs with deeply integrated cloud services and global reach. Nebius’s pitch is differentiated on a single axis: open‑model freedom plus production‑grade SLAs on a dedicated AI‑native footprint.
- A second group of competitors are specialist model‑serving startups — Fireworks and Baseten among them — that offer developer‑friendly model deployment, autoscaling and low‑latency inference with a focus on open models and parameter‑efficient fine‑tuning. These startups compete on price/performance, developer ergonomics and a fast path to production for early‑stage teams. Nebius argues that its scale, custom hardware and validated performance give it an edge for large enterprises and very high‑QPS workloads.
- Finally, other “neocloud” AI infrastructure players (CoreWeave, Lambda and others) and NVIDIA’s DGX ecosystem — enabled through partnerships — are alternate choices for enterprises balancing cost, regional footprint and specific GPU availability. Nebius’s strategy of offering dedicated endpoints and regional zero‑retention inference is explicitly aimed at regulated industries and customers with data‑residency obligations.
Early customer claims and real‑world economics — take with a grain of salt
Nebius’s press materials and partner statements include notable claims: Prosus reportedly saw up to 26× cost reductions on certain workloads versus proprietary models; Higgsfield AI cites autoscaling and on‑demand economics as decisive; Hugging Face’s engineering leads are quoted as cooperating on developer access. These are persuasive case studies for marketing, but they are — by design — vendor‑provided testimonials and should be validated independently in any procurement process. Prosus’s cost reductions and similar multiplier claims tend to be highly workload‑specific (model sizes, request patterns, caching, and prompt engineering all change cost math dramatically). Enterprises evaluating Token Factory should ask for reproducible cost models, representative trial runs and a clear mapping between Nebius’s SLAs and their own operational observability and incident management expectations. Independent pilots remain the most reliable way to confirm claimed token‑per‑dollar improvements; the toy cost model sketched below shows how quickly those variables move the math.
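To see why single‑number multiplier claims rarely transfer between workloads, consider a deliberately simplified cost model. All prices, volumes and the cache behaviour below are invented for illustration; they are not Nebius or competitor rates.

```python
# Toy per-token cost model (all numbers invented for illustration).
# Shows how cache hit rate and prompt/completion mix change monthly cost,
# which is why "N× cheaper" claims are so workload-specific.

def monthly_cost(requests: int, prompt_toks: int, completion_toks: int,
                 price_in_per_m: float, price_out_per_m: float,
                 cache_hit_rate: float = 0.0) -> float:
    """Dollar cost, assuming cached requests are free (a simplification)."""
    billable = requests * (1.0 - cache_hit_rate)
    input_cost = billable * prompt_toks / 1e6 * price_in_per_m
    output_cost = billable * completion_toks / 1e6 * price_out_per_m
    return input_cost + output_cost

# Same traffic, two hypothetical providers, one with response caching.
traffic = dict(requests=10_000_000, prompt_toks=1_200, completion_toks=300)
proprietary = monthly_cost(**traffic, price_in_per_m=5.00, price_out_per_m=15.00)
open_model = monthly_cost(**traffic, price_in_per_m=0.60, price_out_per_m=1.80,
                          cache_hit_rate=0.4)
print(f"proprietary: ${proprietary:,.0f}  open+cache: ${open_model:,.0f}  "
      f"({proprietary / open_model:.1f}x difference)")
```

With these invented numbers the gap is roughly 14×; shift the prompt/completion mix or remove the cache and the multiplier changes substantially, which is exactly why buyers should demand a reproducible model rather than a headline ratio.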
Strengths — where Token Factory could matter for IT teams
- Model portability and escape routes. Open‑model support and OpenAI‑compatible endpoints make it easier to avoid long‑term dependence on a single proprietary API and give engineering teams the optionality to swap models as quality/economics change.
- Integrated lifecycle tooling. A single product that moves a model from fine‑tuning (LoRA or full) to optimized inference endpoints with governance reduces the engineering friction of building bespoke MLOps pipelines.
- High throughput hardware and MLPerf‑validated performance. Nebius’s MLPerf submissions and NVIDIA Exemplar Cloud participation indicate the provider can design and operate Blackwell‑class clusters and tune software to extract high token throughput. For large, latency‑sensitive services that demand predictable token throughput, that matters.
- Regional compliance options. Zero‑retention inference options and a distributed data‑center footprint help regulated customers meet data‑residency and audit requirements. Public filings and product announcements show Nebius’s explicit focus on local compliance in the EU, U.S. and Israel.
Risks, caveats and unanswered questions
1) Benchmarks vs. production behaviour
MLPerf and other synthetic benchmarks are useful reference points but do not guarantee performance on complex, multi‑tenant production workloads with variable prompt mixes, long‑context interactions or strict tail‑latency SLAs. Nebius’s own writing acknowledges this nuance; enterprises should require representative load tests and SLO verification before moving to mission‑critical deployments.
2) Vendor statements require independent verification
Nebius’s claims about sub‑second latency at hundreds of millions of requests per minute and large multipliers in cost reduction come from company press materials and partner testimonials published alongside the product launch. Treat them as vendor‑provided performance claims until you’ve run your own pilot or obtained third‑party audits. In other words, Nebius’s statements are plausible — but verifiability matters.
3) Operational maturity and support model
Large enterprises demand more than raw throughput: predictable change management, incident response, capacity reservations, predictable billing and legal terms that align with regulatory obligations. Nebius is young as a public, independent company (having reorganized from the Yandex era) and will need to demonstrate consistent operational maturity across multi‑region operations as customer scale grows. Public filings indicate an aggressive expansion and capital plan, but that growth itself brings operational risk.
4) Geopolitics and supply constraints
Access to the latest accelerators, export controls and regional supply chain constraints remain real factors for any vendor working with Blackwell‑class GPUs. Nebius’s pledge to bring Blackwell capacity to customers is contingent on supply, manufacturing allocation and the broader geopolitics of advanced AI accelerators. This is especially salient for organizations with strict locality or export‑control obligations.
5) Lock‑in risk in a different form
While Token Factory is explicitly marketed to reduce model lock‑in, adopting any managed inference platform creates new operational dependencies — notably data pipelines, metric collection, and governance workflows that are not always trivially portable. Contracts should include clear exit terms, data export guarantees (embeddings, vectors, logs), and tests for moving workloads off the platform if needed. Many vendor playbooks emphasize these negotiation points; enterprise buyers should insist on them, and a simple portability smoke test like the sketch below can be written directly into acceptance criteria.
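One cheap way to keep an exit route honest is to run the same prompts against two OpenAI‑compatible endpoints on a schedule and compare basic behaviour (latency, token counts, response quality). The sketch below assumes both providers expose OpenAI‑compatible chat endpoints; the URLs, environment variable names and model ids are hypothetical.

```python
# Portability smoke test sketch: send identical prompts to two
# OpenAI-compatible endpoints and compare basic behaviour.
# Endpoint URLs and environment variable names are hypothetical.
import os
import time
from openai import OpenAI

PROVIDERS = {
    "primary": OpenAI(base_url="https://api.primary.example/v1",
                      api_key=os.environ["PRIMARY_KEY"]),
    "fallback": OpenAI(base_url="https://api.fallback.example/v1",
                       api_key=os.environ["FALLBACK_KEY"]),
}
PROMPTS = ["Classify this ticket: 'VPN drops every 20 minutes.'"]

for name, client in PROVIDERS.items():
    for prompt in PROMPTS:
        t0 = time.perf_counter()
        resp = client.chat.completions.create(
            model=os.environ[f"{name.upper()}_MODEL"],  # per-provider model id
            messages=[{"role": "user", "content": prompt}],
        )
        latency = time.perf_counter() - t0
        print(f"{name}: {latency:.2f}s, "
              f"{resp.usage.completion_tokens} completion tokens")
```

If the fallback path stops working, you learn about it during a quiet week rather than during a migration under duress.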
Practical checklist — what to evaluate in a Token Factory trial
- Run a representative, end‑to‑end pilot with your exact prompt mix and data residency needs. Measure 95th/99th percentile latencies, cold‑start behaviour and multi‑tenant interference (a minimal measurement harness is sketched after this checklist).
- Validate the cost model: request a month‑long trial that includes realistic token volumes and the same caching/embedding strategies you plan to use. Compare per‑token cost across multiple query shapes.
- Confirm SLAs in writing and test failover: request a playbook for incident response, capacity reservation options and the escalation matrix for P1 incidents.
- Audit governance features: test RBAC, SSO, audit logs, and cross‑project billing. Ensure retention options meet your compliance needs (e.g., HIPAA, regional regulators).
- Verify performance claims in situ: MLPerf numbers are evidence of capacity, but only live trials reveal the true production fit. Ask for workload replay support and dedicated test windows.
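As a starting point for the latency items above, a minimal harness like the following can collect p95/p99 figures from your own prompt mix against any OpenAI‑compatible endpoint. It is a sketch, not a production load tester: the endpoint, env var and model id are placeholders, it runs sequentially, and a real trial should add concurrency, warm‑up separation and error accounting.

```python
# Minimal latency-percentile harness for an OpenAI-compatible endpoint.
# Sequential on purpose (a sketch); real trials need concurrency and warm-up.
import os
import statistics
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.example-token-factory.com/v1",  # placeholder
                api_key=os.environ["TOKEN_FACTORY_API_KEY"])          # placeholder

prompts = ["Draft a polite outage notice."] * 200  # replace with your real prompt mix
latencies = []
for prompt in prompts:
    t0 = time.perf_counter()
    client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # example model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    latencies.append(time.perf_counter() - t0)

latencies.sort()
p95 = latencies[int(0.95 * len(latencies)) - 1]
p99 = latencies[int(0.99 * len(latencies)) - 1]
print(f"median={statistics.median(latencies):.2f}s p95={p95:.2f}s p99={p99:.2f}s")
```

Run the same harness against your incumbent provider with identical prompts and token limits so the comparison is apples to apples.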
What Token Factory means for the market
Nebius’s Token Factory launch formalizes an existing market movement: enterprises want the choice to run open models with the same production controls they expect from hyperscalers. That trend creates an opening for neoclouds and specialized inference platforms that can legitimately demonstrate better throughput economics and regional compliance than general‑purpose clouds. If Nebius can sustain performance, operational reliability and competitive pricing at scale, it will compound the industry’s shift toward multi‑model, multi‑vendor inference architectures. However, market share gains are not automatic. Hyperscalers remain deeply embedded in enterprise stacks and offer bundling advantages (data, storage, analytics and networking) that are hard to displace. Nebius’s path to differentiation is realistic — but it depends on consistent execution, transparent cost economics and the ability to convert pilot wins into multi‑year contracts without compromising margins. Public filings show Nebius is prioritizing customer growth and product expansion alongside capital investment in capacity; the tradeoffs between margin and scale will shape how the company competes in 2026 and beyond.
Bottom line
Nebius Token Factory is a credible and well‑packaged attempt to give enterprises the best of two worlds: open‑model freedom and enterprise production guarantees. The product launch is backed by MLPerf submissions, NVIDIA ecosystem ties and a fast‑moving infrastructure expansion that together make the technical claims plausible. That said, many of the most important metrics — sustained latency at customer scales, real‑world cost per token across varying use cases, and contractual protections around data and exit — are buyer‑specific and should be validated through pilots and contractual negotiation. Enterprises evaluating Token Factory should treat Nebius’s announcements as a strong invitation to test, not an automatic replacement for hyperscaler contracts. For IT leaders building or buying inference platforms, the sensible next step is a short, rigorous proof‑of‑concept that exercises your worst‑case prompts and data‑residency constraints, validates Nebius’s billing model against your token distributions, and tests SLO behaviour under real load — only then will you be able to separate vendor marketing from operational reality.
Source: Rolling Out Nebius takes on Amazon and Microsoft with new AI platform