Overview
OpenAI announced the gpt-oss series, comprising two open-weight language models, gpt-oss-120b and gpt-oss-20b, released under the permissive Apache 2.0 license. Both models deliver strong real-world performance at low cost: they outperform similarly sized open models on reasoning benchmarks, demonstrate robust tool-use capabilities, and are optimized for deployment on consumer-grade hardware.
Significance of the release
A milestone in open-weight AI
This is OpenAI’s first open-weight language model release since GPT-2, marking a return to broad accessibility for state-of-the-art reasoning models. By using a mixture-of-experts architecture, gpt-oss-120b and gpt-oss-20b deliver performance on par with OpenAI’s proprietary o-series mini models, yet remain fully open under Apache 2.0, bridging the gap between closed and open AI systems.
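To make the mixture-of-experts idea concrete, here is a toy sketch of top-k expert routing in PyTorch. It is a minimal illustration, not the actual gpt-oss architecture: the layer sizes, expert count, and top-k value are all illustrative.

import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy MoE layer: a router picks top-k experts per token, so only a
    fraction of the total parameters are active for any given token."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only top-k experts
        weights = weights.softmax(dim=-1)              # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                 # only the chosen experts run
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([4, 64])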
Democratizing advanced AI research and deployment
These models lower the barrier to entry for organizations and developers who lack large-scale infrastructure budgets. With gpt-oss-20b able to run on a single 16 GB GPU and gpt-oss-120b on an 80 GB GPU, researchers in emerging markets, academic labs, and startups can now experiment with cutting-edge reasoning capabilities without incurring prohibitive costs.
Setting new open-model safety and transparency standards
Safety evaluations for gpt-oss include adversarial fine-tuning under OpenAI’s Preparedness Framework and external expert review, which provides an unprecedented level of scrutiny for open-weight models. By publishing full chain-of-thought traces and safety results, OpenAI invites the community to audit, reproduce, and build upon these safeguards.
Strengthening the broader AI ecosystem
The open-weight release complements OpenAI's hosted API offerings, giving developers the freedom to choose between managed services and self-hosted deployments. Partnerships with hardware vendors (NVIDIA, AMD, Cerebras, Groq) and platform providers (Azure, Hugging Face, vLLM, Ollama, LM Studio) ensure these models can be seamlessly integrated into diverse workflows, accelerating innovation across research, industry, and government.
Performance
- gpt-oss-120b achieves near parity with OpenAI o4-mini on core reasoning evaluations while running efficiently on a single 80 GB GPU. The model has 117 billion total parameters and activates 5.1 billion parameters per token.
- gpt-oss-20b delivers performance comparable to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory. It comprises 21 billion total parameters with 3.6 billion active parameters per token.
Both models excel at few-shot function calling, chain-of-thought reasoning, and agentic tool use across evaluations such as Tau-Bench and HealthBench.
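As a concrete illustration of function calling, the sketch below passes a tool definition through the Transformers chat template. The get_weather function is hypothetical, and the exact tool-calling format gpt-oss expects should be confirmed against the model card.

from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """
    Get the current weather in a given city.

    Args:
        city: The name of the city to look up.
    """
    return "sunny"

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],      # Transformers converts the typed, documented function into a JSON schema
    add_generation_prompt=True,
    tokenize=False,           # return the rendered prompt string instead of token IDs
)
print(prompt)                 # inspect how the tool schema is injected into the prompt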
License
The gpt-oss models are distributed under the Apache 2.0 license, which permits commercial use, modification, and redistribution without copyleft obligations and includes an explicit patent grant. This makes them well suited to experimentation, customization, and production deployment.
Usage
For the complete usage guide, see the Hugging Face model documentation.
Install dependencies
pip install -U transformers kernels torch
Load the model with Hugging Face Transformers
from transformers import pipeline

model_id = "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # spread the model across available devices
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # the last message is the assistant's reply
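The smaller checkpoint loads the same way; a minimal sketch, assuming the gpt-oss-20b weights fit on a single 16 GB GPU as noted above.

from transformers import pipeline

pipe_20b = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # smaller checkpoint for consumer hardware
    torch_dtype="auto",
    device_map="auto",
)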
Run an OpenAI-compatible server
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
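Once the server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official openai Python package, assuming the default endpoint at localhost:8000.

from openai import OpenAI

# Point the client at the local server; the API key is not checked locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}],
)
print(response.choices[0].message.content)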
For alternative inference setups, the gpt-oss models support vLLM, PyTorch/Triton backends, Ollama for consumer-hardware deployment, and LM Studio integrations.
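For example, here is a minimal offline-inference sketch with vLLM's Python API; the model choice and sampling settings are illustrative, and the 120b checkpoint still requires roughly 80 GB of GPU memory.

from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-120b")   # downloads and loads the checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)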