The LLM Gateway supports a comprehensive range of models from OpenAI, Anthropic, and Google Gemini. For the latest official model information and pricing, consult each provider's official documentation.
Don’t see your model? We’re always adding new models based on user needs. Contact our support team to request additional models.

GPT-5 Series

| Model | Input Price | Output Price |
| --- | --- | --- |
| gpt-5 | $0.00125 per 1K tokens | $0.01 per 1K tokens |
| gpt-5-mini | $0.00025 per 1K tokens | $0.002 per 1K tokens |
| gpt-5-nano | $0.00005 per 1K tokens | $0.0004 per 1K tokens |
| gpt-5-pro | ChatGPT Pro subscription ($200/month) | — |

GPT-4 Series

| Model | Input Price | Output Price |
| --- | --- | --- |
| gpt-4 | $0.03 per 1K tokens | $0.06 per 1K tokens |
| gpt-4-turbo | $0.01 per 1K tokens | $0.03 per 1K tokens |
| gpt-4o | $0.0025 per 1K tokens | $0.01 per 1K tokens |
| gpt-4o-mini | $0.00015 per 1K tokens | $0.0006 per 1K tokens |
| gpt-4.1 | $0.002 per 1K tokens | $0.008 per 1K tokens |
| gpt-4.1-mini | $0.0004 per 1K tokens | $0.0016 per 1K tokens |
| gpt-4.1-nano | $0.0001 per 1K tokens | $0.0004 per 1K tokens |

O-Series (Reasoning Models)

| Model | Input Price | Output Price |
| --- | --- | --- |
| o1 | $0.015 per 1K tokens | $0.06 per 1K tokens |
| o1-pro | $0.15 per 1K tokens | $0.60 per 1K tokens |
| o3 | $0.002 per 1K tokens | $0.008 per 1K tokens |
| o3-mini | $0.0011 per 1K tokens | $0.0044 per 1K tokens |
| o4-mini | $0.0011 per 1K tokens | $0.0044 per 1K tokens |

Specialized Models

| Model | Input Price | Output Price |
| --- | --- | --- |
| codex-mini-latest | $0.0015 per 1K tokens | $0.006 per 1K tokens |
| gpt-4o-mini-search-preview | gpt-4o-mini pricing + web search fees | |
| gpt-4o-search-preview | gpt-4o pricing + web search fees | |

Understanding Pricing

Input and output pricing differ because generation requires more computation than processing. The model reads and encodes input tokens in a single parallel pass, but output tokens are generated one at a time, each requiring its own forward pass through the model. Applications with long outputs benefit from models with lower output costs, while applications processing large inputs should weigh input pricing carefully.
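As a concrete illustration, the cost of a single request is just the input and output token counts weighted by their respective per-1K rates. The example below uses the gpt-4o prices from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total cost in USD for one request, given per-1K-token prices."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# gpt-4o: $0.0025 per 1K input tokens, $0.01 per 1K output tokens
cost = request_cost(input_tokens=2000, output_tokens=500,
                    input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"${cost:.4f}")  # → $0.0100
```

Note how the 500 output tokens cost as much here as the 2,000 input tokens: output rates dominate quickly for generation-heavy workloads.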
Select models based on your specific use case:
  • High-volume, simple tasks: Use the most cost-effective models like gpt-4o-mini, claude-haiku-4-5, or gemini-2.5-flash-lite
  • Complex reasoning: Consider claude-sonnet-4-5, gpt-4o, or O-series models for tasks requiring careful analysis
  • Long outputs: Prioritize models with lower output token costs like Gemini Flash models
  • Large context inputs: Look at input pricing—Gemini models often offer the best value for processing large documents
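To make the trade-off concrete, here is a small sketch that ranks a few of the OpenAI models priced on this page by total cost for a given workload shape (the per-1K rates are taken from the tables above; the workload numbers are illustrative):

```python
# Per-1K-token prices in USD, (input, output), from the tables above
PRICES = {
    "gpt-4o":       (0.0025,  0.01),
    "gpt-4o-mini":  (0.00015, 0.0006),
    "gpt-4.1-nano": (0.0001,  0.0004),
    "o4-mini":      (0.0011,  0.0044),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request against the given model's price table."""
    inp, out = PRICES[model]
    return (input_tokens / 1000) * inp + (output_tokens / 1000) * out

# A long-output workload: 1K tokens in, 10K tokens out per request
ranked = sorted(PRICES, key=lambda m: workload_cost(m, 1_000, 10_000))
for model in ranked:
    print(f"{model}: ${workload_cost(model, 1_000, 10_000):.4f}")
```

For this output-heavy shape the nano/mini tiers win by more than an order of magnitude; rerun with a large-input, short-output shape and the ranking is driven by input rates instead.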
The gateway automatically tracks token usage and costs for every request. Use the ABV dashboard to:
  • Monitor real-time spending across providers and models
  • Identify expensive requests or usage patterns
  • Compare costs between different providers for the same task
  • Set up alerts when spending exceeds thresholds
This visibility helps you optimize costs while maintaining the quality your application needs.

Next Steps

Ready to start using these models? Here’s where to go next: