OpenAI gpt-oss: Open-Weight Models for Advanced AI

Written by Vince Jankovics, 2025-08-06

Overview

OpenAI has announced its gpt-oss series, comprising two open-weight language models, gpt-oss-120b and gpt-oss-20b, released under the permissive Apache 2.0 license. The models deliver strong real-world performance at low cost, outperform similarly sized open models on reasoning benchmarks, demonstrate robust tool-use capabilities, and are optimized for deployment on consumer-grade hardware.

Significance of the release

A milestone in open-weight AI

This is OpenAI’s first open-weight language model release since GPT-2, marking a return to broad accessibility for state-of-the-art reasoning models. By using a mixture-of-experts architecture, gpt-oss-120b and gpt-oss-20b deliver performance on par with OpenAI’s proprietary o-series mini models, yet remain fully open under Apache 2.0, bridging the gap between closed and open AI systems.

Democratizing advanced AI research and deployment

These models lower the barrier to entry for organizations and developers who lack large-scale infrastructure budgets. With gpt-oss-20b able to run on a single 16 GB GPU and gpt-oss-120b on an 80 GB GPU, researchers in emerging markets, academic labs, and startups can now experiment with cutting-edge reasoning capabilities without incurring prohibitive costs.
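To make the hardware claim concrete, here is a minimal sketch of loading the smaller model on a single GPU with Hugging Face Transformers. It assumes the openai/gpt-oss-20b checkpoint and a recent Transformers release, and lets device_map="auto" handle weight placement.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: load gpt-oss-20b on a single 16 GB GPU.
# device_map="auto" lets Accelerate place the weights automatically;
# the released checkpoints ship 4-bit (MXFP4) MoE weights, which is
# what keeps the memory footprint in reach of consumer hardware.
model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on the available GPU
)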

Setting new open-model safety and transparency standards

Safety evaluations for gpt-oss include adversarial fine-tuning under OpenAI’s Preparedness Framework and external expert review, which provides an unprecedented level of scrutiny for open-weight models. By publishing full chain-of-thought traces and safety results, OpenAI invites the community to audit, reproduce, and build upon these safeguards.

Strengthening the broader AI ecosystem

Open-weight release complements hosted API offerings, giving developers the freedom to choose between managed services and self-hosted deployments. Partnerships with hardware vendors (NVIDIA, AMD, Cerebras, Groq) and platform providers (Azure, Hugging Face, vLLM, Ollama, LM Studio) ensure these models can be seamlessly integrated into diverse workflows, accelerating innovation across research, industry, and government.

Performance

  • gpt-oss-120b achieves near parity with OpenAI o4-mini on core reasoning evaluations while running efficiently on a single 80 GB GPU. The model has 117 billion total parameters and activates 5.1 billion parameters per token.
  • gpt-oss-20b delivers performance comparable to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory. It comprises 21 billion total parameters with 3.6 billion active parameters per token.

Both models excel at few-shot function calling, chain-of-thought reasoning, and agentic tool use across evaluations such as Tau-Bench and HealthBench.
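The gap between total and active parameters comes from the mixture-of-experts layers, where only a handful of experts fire per token. One quick way to see the routing setup is to read the model config without downloading any weights; the field names below (num_local_experts, num_experts_per_tok) follow the usual Transformers MoE convention and are an assumption here, hence the defensive getattr calls.

from transformers import AutoConfig

# Sketch: inspect the MoE routing parameters from the config alone.
# The exact field names are assumptions; verify against the model card.
config = AutoConfig.from_pretrained("openai/gpt-oss-20b")
print(config.num_hidden_layers)                       # transformer depth
print(getattr(config, "num_local_experts", None))     # experts per MoE layer
print(getattr(config, "num_experts_per_tok", None))   # experts active per token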

License

The gpt-oss models are distributed under the Apache 2.0 license, which permits commercial use, modification, and redistribution with no copyleft obligations and an explicit patent grant. This makes them well suited to experimentation, customization, and production deployment.

Usage

For the complete usage guide, visit the Hugging Face model documentation.

Install dependencies

pip install -U transformers kernels torch

Load the model with Hugging Face Transformers

from transformers import pipeline

model_id = "openai/gpt-oss-120b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread layers across the available GPUs
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(messages, max_new_tokens=256)
# With chat-style input, "generated_text" holds the full conversation;
# the last entry is the assistant's reply.
print(outputs[0]["generated_text"][-1])

Run an OpenAI-compatible server

# start an OpenAI-compatible server (listens on localhost:8000 by default)
transformers serve

# in a second terminal, chat against it interactively
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
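Once the server is up, any OpenAI-compatible client can talk to it. The sketch below uses the official openai Python package; the base_url and the placeholder API key are assumptions for a default local setup.

from openai import OpenAI

# Point the standard OpenAI client at the local server.
# base_url and the dummy api_key are assumptions for a default setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)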

For alternative inference setups, gpt-oss-120b supports vLLM, PyTorch/Triton backends, Ollama for consumer-hardware deployment, and LM Studio integrations.
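For example, vLLM and Ollama each reduce serving to a one-liner; the model tags below are the commonly published ones and worth double-checking against each project's documentation.

# vLLM: expose the 120b model behind an OpenAI-compatible endpoint
vllm serve openai/gpt-oss-120b

# Ollama: pull and chat with the 20b model on consumer hardware
ollama run gpt-oss:20b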
