ABV’s Playground and LLM-as-a-Judge evaluations call LLM provider APIs directly from the platform. These credentials are not supplied by your observability SDK; LLM connections are configured separately in project settings for features that make LLM calls on your behalf.

How LLM Connections Work

LLM connections associate LLM provider API keys with your ABV project:

Add LLM connection to project

Navigate to Project Settings > LLM Connections and create a new connection. Provide:
  • Connection name: Friendly identifier (e.g., “OpenAI Production”, “Claude for Evals”)
  • Provider: OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, etc.
  • API key: Your provider’s API key for authentication
  • Base URL (optional): Custom endpoint for proxies or alternative hosts
  • Custom headers (optional): Additional headers for authentication or routing
Keys are encrypted at rest and never exposed in ABV’s UI after creation.

Select connection in Playground or evaluations

When using the Playground or creating LLM-as-a-Judge evaluators:
  • Select the LLM connection from a dropdown
  • Choose the model to use (connections show only supported models for that provider)
  • Configure model parameters (temperature, max tokens, etc.)
ABV uses the selected connection’s credentials to call the LLM API.

ABV makes API calls on your behalf

When you test in Playground or run evaluations:
  • ABV constructs the API request using the connection’s credentials
  • Calls the LLM provider’s API directly from ABV’s servers
  • Returns responses to the Playground UI or evaluation results
Important: API calls consume tokens from your provider account (billed by the provider), not ABV. Monitor usage in your provider’s dashboard.

Manage and rotate credentials

Update LLM connections anytime in Project Settings:
  • Rotate API keys when compromised or for security hygiene
  • Update base URLs for proxy configuration changes
  • Delete unused connections
  • Add connections for new providers
Changes take effect immediately—no code deployment required.

Setting Up LLM Connections

Navigate to project settings

Open the project where you want to configure LLM connections. Go to Project Settings > LLM Connections.

Add new connection

Click the Add new LLM API key button.

Configure connection details

Connection name: Choose a descriptive name (e.g., “OpenAI GPT-4”, “Anthropic Claude for Evals”, “Azure OpenAI US East”)
Provider: Select from supported providers:
  • OpenAI
  • Azure OpenAI
  • Anthropic
  • Google AI Studio
  • Google Vertex AI
  • Amazon Bedrock
  • Custom (for proxies using supported API schemas)
API Key: Paste your provider’s API key
Advanced options (expand for additional configuration):
  • Base URL: Override default API endpoint (for proxies, custom deployments, regional endpoints)
  • Custom headers: Additional HTTP headers for authentication or routing (e.g., x-api-version, x-organization-id)
  • Custom model names: Add models not in ABV’s default list (for new models, fine-tuned models, or proxy-provided models)
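As a small illustration (all values below are hypothetical placeholders), the advanced fields for a proxy-style connection might be filled in like this:
Base URL: https://proxy.example.com/v1
Custom headers: x-api-version: <api-version>, x-organization-id: <organization-id>
Custom model names: my-fine-tuned-gpt-4, llama-3.1-70b-versatile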

Save connection

Click Save to create the connection. The API key is encrypted and stored securely.
Security: After saving, the API key is never displayed again in the UI. You can update or delete the connection but cannot view the key.

Verify connection

Test the connection in the Playground:
  • Create a new prompt or open an existing prompt
  • Select your newly created connection
  • Choose a model
  • Send a test request
A successful response confirms the connection works correctly.

Supported Providers and Models

ABV supports major LLM providers with extensive model coverage.

OpenAI and Azure OpenAI

Supported models:
  • o3 series: o3, o3-2025-04-16
  • o4 series: o4-mini, o4-mini-2025-04-16
  • GPT-4.1 series: gpt-4.1, gpt-4.1-2025-04-14, gpt-4.1-mini-2025-04-14, gpt-4.1-nano-2025-04-14
  • GPT-4o series: gpt-4o, gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini, gpt-4o-mini-2024-07-18
  • o3-mini series: o3-mini, o3-mini-2025-01-31
  • o1 series: o1-preview, o1-preview-2024-09-12
  • GPT-4 Turbo: gpt-4-turbo-preview, gpt-4-1106-preview, gpt-4-0125-preview
  • GPT-4: gpt-4, gpt-4-0613
  • GPT-3.5 Turbo: gpt-3.5-turbo, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-16k
Configuration:
  • OpenAI: API key from OpenAI platform, default base URL
  • Azure OpenAI: API key from Azure, custom base URL pointing to your Azure endpoint, API version in custom headers (see the sketch below)
Use cases: Playground testing, LLM-as-a-Judge evaluations, function calling evaluations
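For Azure OpenAI, the base URL points at your Azure resource and deployment rather than api.openai.com. A minimal sketch of verifying a deployment directly with curl, assuming a hypothetical resource name my-resource, deployment name gpt-4o, and API version 2024-02-01 (substitute your own values):
# hypothetical Azure OpenAI resource, deployment, and API version; substitute your own values
curl -X POST 'https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01' \
-H 'content-type: application/json' \
-H 'api-key: <your-azure-api-key>' \
-d '{
  "messages": [
    {"role": "user", "content": "Reply with OK"}
  ]
}'
A successful response confirms the key, endpoint, and deployment name before you enter them in the connection form.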

Anthropic

Supported models:
  • Claude Opus 4.1: claude-opus-4-1
  • Claude Opus 4.0: claude-opus-4-0
  • Claude Sonnet 4.5: claude-sonnet-4-5
  • Claude Sonnet 4.0: claude-sonnet-4-0
  • Claude Haiku 4.5: claude-haiku-4-5
Configuration: API key from Anthropic Console, default base URL
Use cases: High-quality LLM-as-a-Judge evaluations, complex reasoning evaluations, safety assessments
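To sanity-check an Anthropic key before adding it to ABV, a minimal sketch using Anthropic’s Messages API (the model name is taken from the list above; adjust as needed):
# direct Messages API call; x-api-key and anthropic-version headers are required
curl -X POST 'https://api.anthropic.com/v1/messages' \
-H 'content-type: application/json' \
-H 'x-api-key: <your-anthropic-api-key>' \
-H 'anthropic-version: 2023-06-01' \
-d '{
  "model": "claude-sonnet-4-5",
  "max_tokens": 64,
  "messages": [
    {"role": "user", "content": "Reply with OK"}
  ]
}'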

Google Vertex AI

Supported models:
  • Gemini 2.5: gemini-2.5-pro-exp-03-25
  • Gemini 2.0: gemini-2.0-pro-exp-02-05, gemini-2.0-flash-001, gemini-2.0-flash-lite-preview-02-05, gemini-2.0-flash-exp
  • Gemini 1.5: gemini-1.5-pro, gemini-1.5-flash
  • Gemini 1.0: gemini-1.0-pro
Configuration:
  • API Key: Service account key JSON from Google Cloud
  • Project ID: GCP project ID
  • Region: GCP region (e.g., us-central1, europe-west1)
  • Custom models: Add additional model names enabled in your GCP account via “Custom model names” field
Use cases: Multimodal evaluations (image + text), cost-effective evaluations (Flash models), enterprise GCP integrations
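As a sketch of the kind of call ABV makes on your behalf, a Gemini model on Vertex AI can be invoked directly with an access token; <project-id> and the region are placeholders matching the fields above (ABV itself authenticates with the service account key JSON rather than your local gcloud credentials):
# direct Vertex AI call using a gcloud access token; region and project ID are placeholders
curl -X POST \
-H "authorization: Bearer $(gcloud auth print-access-token)" \
-H 'content-type: application/json' \
'https://us-central1-aiplatform.googleapis.com/v1/projects/<project-id>/locations/us-central1/publishers/google/models/gemini-1.5-pro:generateContent' \
-d '{
  "contents": [
    {"role": "user", "parts": [{"text": "Reply with OK"}]}
  ]
}'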

Google AI Studio

Supported models:
  • Gemini 2.5: gemini-2.5-pro-exp-03-25
  • Gemini 2.0: gemini-2.0-pro-exp-02-05, gemini-2.0-flash-001, gemini-2.0-flash-lite-preview-02-05, gemini-2.0-flash-exp
  • Gemini 1.5: gemini-1.5-pro, gemini-1.5-flash
  • Gemini 1.0: gemini-1.0-pro
Configuration: API key from Google AI Studio (simpler setup than Vertex AI for non-enterprise use)
Use cases: Quick Playground testing, personal projects, development/staging evaluations
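A quick way to confirm an AI Studio key works before adding it is a minimal generateContent request (gemini-1.5-flash is chosen from the list above):
# Gemini API call authenticated with the AI Studio API key as a query parameter
curl -X POST 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=<your-api-key>' \
-H 'content-type: application/json' \
-d '{
  "contents": [
    {"parts": [{"text": "Reply with OK"}]}
  ]
}'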

Amazon Bedrock

Supported models: All Amazon Bedrock models including:
  • Anthropic Claude models via Bedrock
  • Amazon Titan models
  • Cohere models
  • Meta Llama models
  • Mistral models
  • Stability AI models
Configuration:
  • AWS Access Key ID and Secret Access Key: IAM credentials
  • AWS Region: Region where Bedrock is enabled
  • Required IAM Permission: bedrock:InvokeModel
Use cases: Enterprise AWS environments, compliance-driven deployments, Bedrock-specific models
Note: Add model names via “Custom model names” field when configuring the connection. A minimal IAM policy sketch for the required permission is shown below.
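A minimal IAM policy sketch granting only the permission required above; narrow the Resource to specific model ARNs if your security policy requires it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel"],
      "Resource": "*"
    }
  ]
}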

Custom providers and proxies

Supported providers (via OpenAI adapter):
  • Groq
  • OpenRouter
  • Vercel AI Gateway
  • LiteLLM
  • Hugging Face (OpenAI-compatible endpoints)
  • Mistral AI (OpenAI-compatible API)
  • Any proxy implementing OpenAI’s API schema
Configuration:
  • Provider: Select “OpenAI”
  • Base URL: Set to proxy’s endpoint (e.g., https://api.groq.com/openai/v1 for Groq)
  • API Key: Proxy provider’s API key
  • Custom model names: Add supported models (e.g., llama-3.1-70b-versatile for Groq)
  • Custom headers: If proxy requires additional headers
Requirements: Proxy must support:
  • OpenAI chat completions API format
  • Tool calling (for LLM-as-a-Judge evaluations)
See the example tool calling request below.

Advanced Configuration

Provider-Specific Options

Many LLM providers support parameters beyond standard model configuration (temperature, max_tokens, top_p). Examples include reasoning_effort, service_tier, and response_format.
Provider options are available when using the Playground or configuring LLM-as-a-Judge evaluators.
How to use:

Open model parameters

In the Playground or evaluator configuration, expand the model parameters section.

Find provider options field

Scroll to the bottom of model parameters to find the Provider options field.

Add JSON configuration

Enter a JSON object with provider-specific parameters:
{
  "reasoning_effort": "minimal",
  "service_tier": "auto"
}
These key-value pairs are passed directly to the provider’s API alongside standard parameters.
Pass-through support depends on the provider adapter. Example: force reasoning_effort: minimal on OpenAI o3 model calls.
Use cases:
  • Control reasoning depth for o-series models (minimal, medium, high)
  • Set service tier for OpenAI (auto, default)
  • Configure response format for structured outputs (see the sketch after this list)
  • Pass custom parameters to fine-tuned or custom models
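As a second illustration, a sketch of a Provider options value that requests structured output using OpenAI’s json_schema response format (this shape is OpenAI-specific; other providers may expect different parameters, and the schema below is hypothetical):
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "verdict",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "score": {"type": "number"},
          "reasoning": {"type": "string"}
        },
        "required": ["score", "reasoning"],
        "additionalProperties": false
      }
    }
  }
}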

Connecting via Proxies

Use LLM proxies to route Playground and LLM-as-a-Judge calls through centralized gateways for logging, rate limiting, cost management, or compliance.

Create LLM connection

Navigate to Project Settings > LLM Connections > Add new LLM API key.

Configure for proxy

  • Provider: Select the provider whose API schema your proxy implements (typically OpenAI for OpenAI-compatible proxies)
  • Base URL: Set to your proxy’s endpoint (e.g., https://proxy.company.com/v1)
  • API Key: Your proxy’s authentication token (or pass-through key for the underlying provider)
  • Custom headers (if needed): Add headers required by your proxy (e.g., x-tenant-id, x-route-to-region)
  • Custom model names: Add models available through your proxy

Verify tool calling support

For LLM-as-a-Judge evaluations, your proxy must support tool calling in the OpenAI API format.
Test with this sample request (replace placeholders with your proxy’s values):
curl -X POST 'https://<your-proxy-host>/chat/completions' \
-H 'accept: application/json' \
-H 'content-type: application/json' \
-H 'authorization: Bearer <your-api-key>' \
-H 'x-custom-header: <custom-header-value>' \
-d '{
  "model": "<model-name>",
  "temperature": 0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "extract",
        "parameters": {
          "type": "object",
          "properties": {
            "score": {"type": "string"},
            "reasoning": {"type": "string"}
          },
          "required": ["score", "reasoning"]
        }
      }
    }
  ],
  "tool_choice": {"type": "function", "function": {"name": "extract"}},
  "messages": [
    {"role": "user", "content": "Evaluate the correctness: ..."}
  ]
}'
A successful response containing a tool call indicates compatibility.

Test in Playground

Open Playground, select the proxy connection, and send test requests. Verify responses match expected behavior.
Common proxy use cases:
  • LiteLLM: Unified interface to 100+ LLM providers
  • OpenRouter: Access to multiple providers through a single API
  • Vercel AI Gateway: Caching, rate limiting, and observability for LLM calls
  • Corporate proxies: Centralized logging, compliance, and cost control
  • Regional routing: Route requests to geographically appropriate endpoints

Custom Model Names

ABV includes default model lists for supported providers, but providers constantly release new models, organizations use fine-tuned models, and proxies expose custom model identifiers.
How to add custom models:

Expand advanced options

When creating or editing an LLM connection, expand the Advanced options section.

Add custom model names

In the Custom model names field, enter model identifiers (one per line or comma-separated):
my-fine-tuned-gpt-4
llama-3.1-70b-versatile
gemini-pro-exp-0209

Save and use

After saving, custom model names appear in the model dropdown alongside default models when using this connection.
Use cases:
  • Newly released models not yet in ABV’s default list
  • Fine-tuned models from OpenAI, Azure, or other providers
  • Custom models deployed on Vertex AI or Bedrock
  • Proxy-provided models with non-standard names

Security Best Practices

Create dedicated API keys for ABV connections rather than reusing keys from production applications.
Benefits:
  • Easier rotation (disconnect doesn’t affect production)
  • Usage tracking (identify ABV-specific usage in provider dashboards)
  • Blast radius containment (compromised key doesn’t expose production systems)
Pattern: Create provider API keys named “ABV-Playground-Production”, “ABV-Evals-Staging”, etc.
Configure provider API keys with the minimal permissions necessary for Playground and evaluations.
  • OpenAI: Keys only need chat completion access (not fine-tuning, assistants, or file management)
  • AWS Bedrock: IAM role only needs the bedrock:InvokeModel permission (not model management or provisioned throughput)
  • Google Vertex AI: Service account only needs the Vertex AI User role (not Vertex AI Admin)
Benefit: Limits damage if keys are compromised.
Rotate LLM connection API keys periodically for security hygiene.
Recommended rotation schedule:
  • Every 90 days for production projects
  • Immediately if key compromise suspected
  • After team member departures (if they had access)
Process:
  1. Generate new API key in provider console
  2. Update ABV LLM connection with new key
  3. Verify Playground and evaluations still work
  4. Revoke old key in provider console
Benefit: ABV’s LLM connections make rotation painless—no code changes required.
ABV’s Playground and evaluation calls consume tokens from your provider account (billed by the provider).
Monitoring:
  • Check provider usage dashboards regularly (OpenAI Usage, AWS Cost Explorer, Google Cloud billing)
  • Set usage alerts in provider consoles
  • Identify unexpected usage spikes (runaway evaluations, excessive Playground testing)
Cost control:
  • Use cheaper models for development/testing (gpt-3.5-turbo, gemini-flash, claude-haiku)
  • Reserve expensive models (gpt-4o, claude-opus) for critical evaluations
  • Set rate limits or spending caps in provider consoles
Only users with appropriate permissions should create or modify LLM connections (requires Admin or Owner roles).
Access control:
  • Limit Owner/Admin roles to trusted team members
  • Use Viewer or Member roles for users who only need to use existing connections
  • Review permissions quarterly
Learn more about RBAC →

Troubleshooting

Symptoms: Playground requests fail, evaluations can’t run, error messages about authentication or connection.
Possible causes:
  1. Invalid API key: Key revoked, expired, or entered incorrectly
  2. Incorrect base URL: Typo in custom endpoint, wrong region
  3. Provider service issues: OpenAI, Anthropic, or other provider experiencing outages
  4. Rate limiting: Exceeded provider’s rate limits for your API key
  5. Insufficient permissions: API key lacks necessary permissions (e.g., Bedrock key without InvokeModel permission)
Resolution:
  • Verify API key is valid in provider console
  • Check base URL matches provider documentation
  • Review provider status pages for outages
  • Check provider usage/rate limit dashboards
  • Verify API key permissions (IAM role for Bedrock, service account role for Vertex AI)
Symptoms: Desired model doesn’t appear when selecting from the model dropdown.
Possible causes:
  1. Model not in default list: Newly released model or provider-specific model
  2. Wrong provider selected: Model belongs to different provider than selected connection
  3. Regional restrictions: Model not available in configured region (Vertex AI, Bedrock)
  4. Account limitations: Model access not enabled in your provider account
Resolution:
  • Add model to “Custom model names” field in connection configuration
  • Verify connection provider matches model’s provider
  • Check model availability in your configured region
  • Enable model access in provider console (Bedrock requires per-model enablement)
Symptoms: Evaluations fail with errors about tool calling, function calling, or JSON schema.
Possible causes:
  1. Model doesn’t support tool calling: Older models or certain providers lack function calling support
  2. Proxy doesn’t support tool calling: Custom proxy doesn’t implement OpenAI’s tool calling format
  3. Incorrect API format: Base URL points to non-compatible endpoint
Resolution:
  • Use models that support tool calling (gpt-4o, gpt-3.5-turbo-1106+, claude-4+, gemini-pro+)
  • Verify proxy implements OpenAI tool calling format (test with sample request above)
  • For Playground testing only (not evaluations), use any model—tool calling is only required for LLM-as-a-Judge
Symptoms: Provider bills higher than expected despite minimal application usage.
Possible causes:
  1. Extensive Playground testing: Manual testing consuming significant tokens
  2. Large-scale evaluations: Running evaluations on thousands of examples
  3. Expensive model selection: Using GPT-4o or Claude Opus for evaluations instead of cheaper alternatives
  4. Runaway evaluation jobs: Evaluations running longer than expected
Resolution:
  • Review provider usage dashboards to identify ABV-specific usage
  • Use cheaper models for development (gpt-3.5-turbo, claude-haiku, gemini-flash)
  • Monitor running evaluations and cancel if unexpectedly long
  • Set provider-side spending limits or rate limits
