Don’t see your model? We’re always adding new models based on user needs. Contact our support team to request support for additional models.
- OpenAI
- Anthropic
- Google Gemini
GPT-5 Series
| Model | Input Price | Output Price |
|---|---|---|
| gpt-5 | $0.00125 per 1K tokens | $0.01 per 1K tokens |
| gpt-5-mini | $0.00025 per 1K tokens | $0.002 per 1K tokens |
| gpt-5-nano | $0.00005 per 1K tokens | $0.0004 per 1K tokens |
| gpt-5-pro | ChatGPT Pro subscription ($200/month) | N/A |
GPT-4 Series
| Model | Input Price | Output Price |
|---|---|---|
| gpt-4 | $0.03 per 1K tokens | $0.06 per 1K tokens |
| gpt-4-turbo | $0.01 per 1K tokens | $0.03 per 1K tokens |
| gpt-4o | $0.0025 per 1K tokens | $0.01 per 1K tokens |
| gpt-4o-mini | $0.00015 per 1K tokens | $0.0006 per 1K tokens |
| gpt-4.1 | $0.002 per 1K tokens | $0.008 per 1K tokens |
| gpt-4.1-mini | $0.0004 per 1K tokens | $0.0016 per 1K tokens |
| gpt-4.1-nano | $0.0001 per 1K tokens | $0.0004 per 1K tokens |
O-Series (Reasoning Models)
| Model | Input Price | Output Price |
|---|---|---|
| o1 | $0.015 per 1K tokens | $0.06 per 1K tokens |
| o1-pro | $0.15 per 1K tokens | $0.60 per 1K tokens |
| o3 | $0.002 per 1K tokens | $0.008 per 1K tokens |
| o3-mini | $0.0011 per 1K tokens | $0.0044 per 1K tokens |
| o4-mini | $0.0011 per 1K tokens | $0.0044 per 1K tokens |
Specialized Models
| Model | Input Price | Output Price |
|---|---|---|
| codex-mini-latest | $0.0015 per 1K tokens | $0.006 per 1K tokens |
| gpt-4o-mini-search-preview | gpt-4o-mini input pricing | gpt-4o-mini output pricing, plus web search fees |
| gpt-4o-search-preview | gpt-4o input pricing | gpt-4o output pricing, plus web search fees |
Understanding Pricing
Why Input and Output Pricing Differs
Output tokens cost more than input tokens because generation requires more computation than processing. The model reads and processes input tokens relatively quickly, but it generates output tokens sequentially, one token at a time. Applications that produce long outputs benefit from models with lower output costs, while applications that process large inputs should weigh input pricing carefully.
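To see how this asymmetry plays out, here is a minimal cost-estimation sketch using the gpt-4o prices from the table above. The helper function and workload sizes are illustrative, not part of the gateway API:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of a single request in USD, given per-1K-token prices."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)

# gpt-4o: $0.0025 input / $0.01 output per 1K tokens
# Summarization-style request (long input, short output):
summarize = estimate_cost(8000, 500, 0.0025, 0.01)   # $0.025
# Generation-style request (short input, long output):
generate = estimate_cost(500, 8000, 0.0025, 0.01)    # $0.08125
```

With the same total token count, the output-heavy request costs more than three times as much, which is why output length should drive model choice for generation workloads.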
Choosing Cost-Effective Models
Select models based on your specific use case:
- High-volume, simple tasks: Use the most cost-effective models like gpt-4o-mini, claude-haiku-4-5, or gemini-2.5-flash-lite
- Complex reasoning: Consider claude-sonnet-4-5, gpt-4o, or O-series models for tasks requiring careful analysis
- Long outputs: Prioritize models with lower output token costs, like Gemini Flash models
- Large context inputs: Look at input pricing; Gemini models often offer the best value for processing large documents
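One practical way to apply these guidelines is to rank candidate models by cost for your typical request shape. This sketch uses per-1K-token prices from the tables above; the model subset and workload numbers are illustrative:

```python
# (input $/1K tokens, output $/1K tokens), from the pricing tables above
PRICES = {
    "gpt-4o":       (0.0025, 0.01),
    "gpt-4o-mini":  (0.00015, 0.0006),
    "gpt-4.1-nano": (0.0001, 0.0004),
    "o3":           (0.002, 0.008),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request to `model`."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# Rank models for a typical request: 2,000 input tokens, 500 output tokens
for model in sorted(PRICES, key=lambda m: request_cost(m, 2000, 500)):
    print(f"{model}: ${request_cost(model, 2000, 500):.5f}")
```

Note that the cheapest model is not always the right choice: for tasks needing careful analysis, a reasoning model like o3 may be worth the premium.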
Monitoring Costs in Production
The gateway automatically tracks token usage and costs for every request. Use the ABV dashboard to:
- Monitor real-time spending across providers and models
- Identify expensive requests or usage patterns
- Compare costs between different providers for the same task
- Set up alerts when spending exceeds thresholds
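The dashboard handles this tracking natively, but the bookkeeping behind a spending alert can be sketched client-side. This example assumes the gateway returns OpenAI-compatible usage counts (prompt and completion tokens) on each response; the class and threshold are illustrative:

```python
class SpendTracker:
    """Accumulates per-request cost and flags when a spending threshold is crossed."""

    def __init__(self, threshold_usd: float):
        self.threshold = threshold_usd
        self.total = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int,
               input_price_per_1k: float, output_price_per_1k: float) -> bool:
        """Add one request's cost; returns True when the threshold is exceeded."""
        self.total += (prompt_tokens / 1000 * input_price_per_1k
                       + completion_tokens / 1000 * output_price_per_1k)
        return self.total > self.threshold

tracker = SpendTracker(threshold_usd=1.0)
# gpt-4o prices: $0.0025 input / $0.01 output per 1K tokens
alert = tracker.record(200_000, 60_000, 0.0025, 0.01)  # total $1.10 -> True
```

In production, prefer the dashboard's built-in alerts; a client-side tracker like this is mainly useful for per-feature or per-tenant attribution within your own application.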
Next Steps
Ready to start using these models? Here's where to go next:

- Quickstart: Get up and running with your first gateway request in 5 minutes
- TypeScript Guide: Learn how to use these models in TypeScript/JavaScript applications
- Python Guide: Learn how to use these models in Python applications
- LLM Gateway Overview: Understand the core concepts and architecture of the gateway