ABV’s Playground and LLM-as-a-Judge evaluations call LLM provider APIs directly from the platform. These credentials are not supplied by your observability SDK; LLM connections are configured separately in project settings for features that make LLM calls on your behalf.

How LLM Connections Work

LLM connections associate LLM provider API keys with your ABV project:

Add LLM connection to project

Navigate to Project Settings > LLM Connections and create a new connection. Provide:
  • Connection name: Friendly identifier (e.g., “OpenAI Production”, “Claude for Evals”)
  • Provider: OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, etc.
  • API key: Your provider’s API key for authentication
  • Base URL (optional): Custom endpoint for proxies or alternative hosts
  • Custom headers (optional): Additional headers for authentication or routing
Keys are encrypted at rest and never exposed in ABV’s UI after creation.

Select connection in Playground or evaluations

When using the Playground or creating LLM-as-a-Judge evaluators:
  • Select the LLM connection from a dropdown
  • Choose the model to use (connections show only supported models for that provider)
  • Configure model parameters (temperature, max tokens, etc.)
ABV uses the selected connection’s credentials to call the LLM API.

ABV makes API calls on your behalf

When you test in Playground or run evaluations:
  • ABV constructs the API request using the connection’s credentials
  • Calls the LLM provider’s API directly from ABV’s servers
  • Returns responses to the Playground UI or evaluation results
Important: API calls consume tokens from your provider account (billed by the provider), not ABV. Monitor usage in your provider’s dashboard.

Manage and rotate credentials

Update LLM connections anytime in Project Settings:
  • Rotate API keys when compromised or for security hygiene
  • Update base URLs for proxy configuration changes
  • Delete unused connections
  • Add connections for new providers
Changes take effect immediately—no code deployment required.

Setting Up LLM Connections

Navigate to project settings

Open the project where you want to configure LLM connections. Go to Project Settings > LLM Connections.

Add new connection

Click the Add new LLM API key button.

Configure connection details

Connection name: Choose a descriptive name (e.g., “OpenAI GPT-4”, “Anthropic Claude for Evals”, “Azure OpenAI US East”)
Provider: Select from supported providers:
  • OpenAI
  • Azure OpenAI
  • Anthropic
  • Google AI Studio
  • Google Vertex AI
  • Amazon Bedrock
  • Custom (for proxies using supported API schemas)
API Key: Paste your provider’s API key
Advanced options (expand for additional configuration):
  • Base URL: Override default API endpoint (for proxies, custom deployments, regional endpoints)
  • Custom headers: Additional HTTP headers for authentication or routing (e.g., x-api-version, x-organization-id)
  • Custom model names: Add models not in ABV’s default list (for new models, fine-tuned models, or proxy-provided models)
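As a small illustration (all values below are hypothetical placeholders), the advanced fields for a proxy-style connection might be filled in like this:
Base URL: https://proxy.example.com/v1
Custom headers: x-api-version: <api-version>, x-organization-id: <organization-id>
Custom model names: my-fine-tuned-gpt-4, llama-3.1-70b-versatile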

Save connection

Click Save to create the connection. The API key is encrypted and stored securely.
Security: After saving, the API key is never displayed again in the UI. You can update or delete the connection but cannot view the key.

Verify connection

Test the connection in the Playground:
  • Create a new prompt or open an existing prompt
  • Select your newly created connection
  • Choose a model
  • Send a test request
A successful response confirms the connection works correctly.

Supported Providers and Models

ABV supports major LLM providers with extensive model coverage.

OpenAI and Azure OpenAI

Supported models:
  • o3 series: o3, o3-2025-04-16
  • o4 series: o4-mini, o4-mini-2025-04-16
  • GPT-4.1 series: gpt-4.1, gpt-4.1-2025-04-14, gpt-4.1-mini-2025-04-14, gpt-4.1-nano-2025-04-14
  • GPT-4o series: gpt-4o, gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini, gpt-4o-mini-2024-07-18
  • o3-mini series: o3-mini, o3-mini-2025-01-31
  • o1 series: o1-preview, o1-preview-2024-09-12
  • GPT-4 Turbo: gpt-4-turbo-preview, gpt-4-1106-preview, gpt-4-0125-preview
  • GPT-4: gpt-4, gpt-4-0613
  • GPT-3.5 Turbo: gpt-3.5-turbo, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-16k
Configuration:
  • OpenAI: API key from OpenAI platform, default base URL
  • Azure OpenAI: API key from Azure, custom base URL pointing to your Azure endpoint, API version in custom headers (see the sketch below)
Use cases: Playground testing, LLM-as-a-Judge evaluations, function calling evaluations
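For Azure OpenAI, the base URL points at your Azure resource and deployment rather than api.openai.com. A minimal sketch of verifying a deployment directly with curl, assuming a hypothetical resource name my-resource, deployment name gpt-4o, and API version 2024-02-01 (substitute your own values):
# hypothetical Azure OpenAI resource, deployment, and API version; substitute your own values
curl -X POST 'https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01' \
-H 'content-type: application/json' \
-H 'api-key: <your-azure-api-key>' \
-d '{
  "messages": [
    {"role": "user", "content": "Reply with OK"}
  ]
}'
A successful response confirms the key, endpoint, and deployment name before you enter them in the connection form.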

Anthropic

Supported models:
  • Claude Opus 4.1: claude-opus-4-1
  • Claude Opus 4.0: claude-opus-4-0
  • Claude Sonnet 4.5: claude-sonnet-4-5
  • Claude Sonnet 4.0: claude-sonnet-4-0
  • Claude Haiku 4.5: claude-haiku-4-5
Configuration: API key from Anthropic Console, default base URL
Use cases: High-quality LLM-as-a-Judge evaluations, complex reasoning evaluations, safety assessments
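To sanity-check an Anthropic key before adding it to ABV, a minimal sketch using Anthropic’s Messages API (the model name is taken from the list above; adjust as needed):
# direct Messages API call; x-api-key and anthropic-version headers are required
curl -X POST 'https://api.anthropic.com/v1/messages' \
-H 'content-type: application/json' \
-H 'x-api-key: <your-anthropic-api-key>' \
-H 'anthropic-version: 2023-06-01' \
-d '{
  "model": "claude-sonnet-4-5",
  "max_tokens": 64,
  "messages": [
    {"role": "user", "content": "Reply with OK"}
  ]
}'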

Google Vertex AI

Supported models:
  • Gemini 2.5: gemini-2.5-pro-exp-03-25
  • Gemini 2.0: gemini-2.0-pro-exp-02-05, gemini-2.0-flash-001, gemini-2.0-flash-lite-preview-02-05, gemini-2.0-flash-exp
  • Gemini 1.5: gemini-1.5-pro, gemini-1.5-flash
  • Gemini 1.0: gemini-1.0-pro
Configuration:
  • API Key: Service account key JSON from Google Cloud
  • Project ID: GCP project ID
  • Region: GCP region (e.g., us-central1, europe-west1)
  • Custom models: Add additional model names enabled in your GCP account via “Custom model names” field
Use cases: Multimodal evaluations (image + text), cost-effective evaluations (Flash models), enterprise GCP integrations
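As a sketch of the kind of call ABV makes on your behalf, a Gemini model on Vertex AI can be invoked directly with an access token; <project-id> and the region are placeholders matching the fields above (ABV itself authenticates with the service account key JSON rather than your local gcloud credentials):
# direct Vertex AI call using a gcloud access token; region and project ID are placeholders
curl -X POST \
-H "authorization: Bearer $(gcloud auth print-access-token)" \
-H 'content-type: application/json' \
'https://us-central1-aiplatform.googleapis.com/v1/projects/<project-id>/locations/us-central1/publishers/google/models/gemini-1.5-pro:generateContent' \
-d '{
  "contents": [
    {"role": "user", "parts": [{"text": "Reply with OK"}]}
  ]
}'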

Google AI Studio

Supported models:
  • Gemini 2.5: gemini-2.5-pro-exp-03-25
  • Gemini 2.0: gemini-2.0-pro-exp-02-05, gemini-2.0-flash-001, gemini-2.0-flash-lite-preview-02-05, gemini-2.0-flash-exp
  • Gemini 1.5: gemini-1.5-pro, gemini-1.5-flash
  • Gemini 1.0: gemini-1.0-pro
Configuration: API key from Google AI Studio (simpler setup than Vertex AI for non-enterprise use)
Use cases: Quick Playground testing, personal projects, development/staging evaluations
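A quick way to confirm an AI Studio key works before adding it is a minimal generateContent request (gemini-1.5-flash is chosen from the list above):
# Gemini API call authenticated with the AI Studio API key as a query parameter
curl -X POST 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=<your-api-key>' \
-H 'content-type: application/json' \
-d '{
  "contents": [
    {"parts": [{"text": "Reply with OK"}]}
  ]
}'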

Amazon Bedrock

Supported models: All Amazon Bedrock models including:
  • Anthropic Claude models via Bedrock
  • Amazon Titan models
  • Cohere models
  • Meta Llama models
  • Mistral models
  • Stability AI models
Configuration:
  • AWS Access Key ID and Secret Access Key: IAM credentials
  • AWS Region: Region where Bedrock is enabled
  • Required IAM Permission: bedrock:InvokeModel
Use cases: Enterprise AWS environments, compliance-driven deployments, Bedrock-specific models
Note: Add model names via “Custom model names” field when configuring the connection. A minimal IAM policy sketch for the required permission is shown below.
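A minimal IAM policy sketch granting only the permission required above; narrow the Resource to specific model ARNs if your security policy requires it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel"],
      "Resource": "*"
    }
  ]
}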

Custom providers and proxies

Supported providers (via OpenAI adapter):
  • Groq
  • OpenRouter
  • Vercel AI Gateway
  • LiteLLM
  • Hugging Face (OpenAI-compatible endpoints)
  • Mistral AI (OpenAI-compatible API)
  • Any proxy implementing OpenAI’s API schema
Configuration:
  • Provider: Select “OpenAI”
  • Base URL: Set to proxy’s endpoint (e.g., https://api.groq.com/openai/v1 for Groq)
  • API Key: Proxy provider’s API key
  • Custom model names: Add supported models (e.g., llama-3.1-70b-versatile for Groq)
  • Custom headers: If proxy requires additional headers
Requirements: Proxy must support:
  • OpenAI chat completions API format
  • Tool calling (for LLM-as-a-Judge evaluations)
See the example tool calling request below.

Advanced Configuration

Provider-Specific Options

Many LLM providers support parameters beyond standard model configuration (temperature, max_tokens, top_p). Examples include reasoning_effort, service_tier, and response_format.
Provider options are available when using the Playground or configuring LLM-as-a-Judge evaluators.
How to use:

Open model parameters

In the Playground or evaluator configuration, expand the model parameters section.

Find provider options field

Scroll to the bottom of model parameters to find the Provider options field.

Add JSON configuration

Enter a JSON object with provider-specific parameters:
{
  "reasoning_effort": "minimal",
  "service_tier": "auto"
}
These key-value pairs are passed directly to the provider’s API alongside standard parameters.
Pass-through support depends on the provider adapter. Example: force reasoning_effort: minimal on OpenAI o3 model calls.
Use cases:
  • Control reasoning depth for o-series models (minimal, medium, high)
  • Set service tier for OpenAI (auto, default)
  • Configure response format for structured outputs (see the sketch after this list)
  • Pass custom parameters to fine-tuned or custom models
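As a second illustration, a sketch of a Provider options value that requests structured output using OpenAI’s json_schema response format (this shape is OpenAI-specific; other providers may expect different parameters, and the schema below is hypothetical):
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "verdict",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "score": {"type": "number"},
          "reasoning": {"type": "string"}
        },
        "required": ["score", "reasoning"],
        "additionalProperties": false
      }
    }
  }
}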

Connecting via Proxies

Use LLM proxies to route Playground and LLM-as-a-Judge calls through centralized gateways for logging, rate limiting, cost management, or compliance.

Create LLM connection

Navigate to Project Settings > LLM Connections > Add new LLM API key.

Configure for proxy

  • Provider: Select the provider whose API schema your proxy implements (typically OpenAI for OpenAI-compatible proxies)
  • Base URL: Set to your proxy’s endpoint (e.g., https://proxy.company.com/v1)
  • API Key: Your proxy’s authentication token (or pass-through key for the underlying provider)
  • Custom headers (if needed): Add headers required by your proxy (e.g., x-tenant-id, x-route-to-region)
  • Custom model names: Add models available through your proxy

Verify tool calling support

For LLM-as-a-Judge evaluations, your proxy must support tool calling in the OpenAI API format.
Test with this sample request (replace placeholders with your proxy’s values):
curl -X POST 'https://<your-proxy-host>/chat/completions' \
-H 'accept: application/json' \
-H 'content-type: application/json' \
-H 'authorization: Bearer <your-api-key>' \
-H 'x-custom-header: <custom-header-value>' \
-d '{
  "model": "<model-name>",
  "temperature": 0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "extract",
        "parameters": {
          "type": "object",
          "properties": {
            "score": {"type": "string"},
            "reasoning": {"type": "string"}
          },
          "required": ["score", "reasoning"]
        }
      }
    }
  ],
  "tool_choice": {"type": "function", "function": {"name": "extract"}},
  "messages": [
    {"role": "user", "content": "Evaluate the correctness: ..."}
  ]
}'
A successful response containing a tool call indicates compatibility.

Test in Playground

Open Playground, select the proxy connection, and send test requests. Verify responses match expected behavior.
Common proxy use cases:
  • LiteLLM: Unified interface to 100+ LLM providers
  • OpenRouter: Access to multiple providers through a single API
  • Vercel AI Gateway: Caching, rate limiting, and observability for LLM calls
  • Corporate proxies: Centralized logging, compliance, and cost control
  • Regional routing: Route requests to geographically appropriate endpoints

Custom Model Names

ABV includes default model lists for supported providers, but providers constantly release new models, organizations use fine-tuned models, and proxies expose custom model identifiers.
How to add custom models:

Expand advanced options

When creating or editing an LLM connection, expand the Advanced options section.

Add custom model names

In the Custom model names field, enter model identifiers (one per line or comma-separated):
my-fine-tuned-gpt-4
llama-3.1-70b-versatile
gemini-pro-exp-0209

Save and use

After saving, custom model names appear in the model dropdown alongside default models when using this connection.
Use cases:
  • Newly released models not yet in ABV’s default list
  • Fine-tuned models from OpenAI, Azure, or other providers
  • Custom models deployed on Vertex AI or Bedrock
  • Proxy-provided models with non-standard names

Security Best Practices

Create dedicated API keys for ABV connections rather than reusing keys from production applications.
Benefits:
  • Easier rotation (disconnect doesn’t affect production)
  • Usage tracking (identify ABV-specific usage in provider dashboards)
  • Blast radius containment (compromised key doesn’t expose production systems)
Pattern: Create provider API keys named “ABV-Playground-Production”, “ABV-Evals-Staging”, etc.
Configure provider API keys with the minimal permissions necessary for Playground and evaluations.
  • OpenAI: Keys only need chat completion access (not fine-tuning, assistants, or file management)
  • AWS Bedrock: IAM role only needs the bedrock:InvokeModel permission (not model management or provisioned throughput)
  • Google Vertex AI: Service account only needs the Vertex AI User role (not Vertex AI Admin)
Benefit: Limits damage if keys are compromised.
Rotate LLM connection API keys periodically for security hygiene.
Recommended rotation schedule:
  • Every 90 days for production projects
  • Immediately if key compromise suspected
  • After team member departures (if they had access)
Process:
  1. Generate new API key in provider console
  2. Update ABV LLM connection with new key
  3. Verify Playground and evaluations still work
  4. Revoke old key in provider console
Benefit: ABV’s LLM connections make rotation painless—no code changes required.
ABV’s Playground and evaluation calls consume tokens from your provider account (billed by the provider).
Monitoring:
  • Check provider usage dashboards regularly (OpenAI Usage, AWS Cost Explorer, Google Cloud billing)
  • Set usage alerts in provider consoles
  • Identify unexpected usage spikes (runaway evaluations, excessive Playground testing)
Cost control:
  • Use cheaper models for development/testing (gpt-3.5-turbo, gemini-flash, claude-haiku)
  • Reserve expensive models (gpt-4o, claude-opus) for critical evaluations
  • Set rate limits or spending caps in provider consoles
Only users with appropriate permissions should create or modify LLM connections (requires Admin or Owner roles).
Access control:
  • Limit Owner/Admin roles to trusted team members
  • Use Viewer or Member roles for users who only need to use existing connections
  • Review permissions quarterly
Learn more about RBAC →

Troubleshooting

Symptoms: Playground requests fail, evaluations can’t run, error messages about authentication or connection.
Possible causes:
  1. Invalid API key: Key revoked, expired, or entered incorrectly
  2. Incorrect base URL: Typo in custom endpoint, wrong region
  3. Provider service issues: OpenAI, Anthropic, or other provider experiencing outages
  4. Rate limiting: Exceeded provider’s rate limits for your API key
  5. Insufficient permissions: API key lacks necessary permissions (e.g., Bedrock key without InvokeModel permission)
Resolution:
  • Verify API key is valid in provider console
  • Check base URL matches provider documentation
  • Review provider status pages for outages
  • Check provider usage/rate limit dashboards
  • Verify API key permissions (IAM role for Bedrock, service account role for Vertex AI)
Symptoms: Desired model doesn’t appear when selecting from the model dropdown.
Possible causes:
  1. Model not in default list: Newly released model or provider-specific model
  2. Wrong provider selected: Model belongs to different provider than selected connection
  3. Regional restrictions: Model not available in configured region (Vertex AI, Bedrock)
  4. Account limitations: Model access not enabled in your provider account
Resolution:
  • Add model to “Custom model names” field in connection configuration
  • Verify connection provider matches model’s provider
  • Check model availability in your configured region
  • Enable model access in provider console (Bedrock requires per-model enablement)
Symptoms: Evaluations fail with errors about tool calling, function calling, or JSON schema.
Possible causes:
  1. Model doesn’t support tool calling: Older models or certain providers lack function calling support
  2. Proxy doesn’t support tool calling: Custom proxy doesn’t implement OpenAI’s tool calling format
  3. Incorrect API format: Base URL points to non-compatible endpoint
Resolution:
  • Use models that support tool calling (gpt-4o, gpt-3.5-turbo-1106+, claude-4+, gemini-pro+)
  • Verify proxy implements OpenAI tool calling format (test with sample request above)
  • For Playground testing only (not evaluations), use any model—tool calling is only required for LLM-as-a-Judge
Symptoms: Provider bills higher than expected despite minimal application usage.
Possible causes:
  1. Extensive Playground testing: Manual testing consuming significant tokens
  2. Large-scale evaluations: Running evaluations on thousands of examples
  3. Expensive model selection: Using GPT-4o or Claude Opus for evaluations instead of cheaper alternatives
  4. Runaway evaluation jobs: Evaluations running longer than expected
Resolution:
  • Review provider usage dashboards to identify ABV-specific usage
  • Use cheaper models for development (gpt-3.5-turbo, claude-haiku, gemini-flash)
  • Monitor running evaluations and cancel if unexpectedly long
  • Set provider-side spending limits or rate limits
