Installation and Setup

The ABV client library installs via pip and works with Python 3.8 or later. Install it in your project or virtual environment:
pip install abvdev
The package includes both synchronous and asynchronous clients, so you don’t need separate packages for async support. Type stubs are included for better IDE support if you’re using type checkers like mypy or Pylance.

Client Initialization Patterns

How you initialize the ABV client affects your application’s structure. Let’s explore different patterns and their use cases. The simplest initialization provides your API key directly:
from abvdev import ABV

abv = ABV(api_key='sk_...')
This works for quick prototypes or scripts, but storing credentials in code isn’t recommended for production. Instead, use environment variables:
import os
from abvdev import ABV

abv = ABV(api_key=os.environ.get('ABV_API_KEY'))
The client checks for the ABV_API_KEY environment variable automatically, so you can simplify further:
from abvdev import ABV

abv = ABV()  # Automatically uses ABV_API_KEY from environment
This pattern keeps credentials out of your codebase and makes it easy to use different keys in different environments. For applications that need to support multiple regions, specify the region during initialization:
from abvdev import ABV

abv = ABV(region='eu')  # 'us' (default) or 'eu'
Region selection determines which ABV infrastructure handles your requests. Choose the region closest to your users or matching your data residency requirements.
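If the region differs between deployments, you can read it from an environment variable as well. A minimal sketch, assuming a variable named ABV_REGION (the name is just an example, not something the client reads automatically):
import os
from abvdev import ABV

# ABV_REGION is a name chosen for this example; only region= is the documented parameter.
abv = ABV(region=os.environ.get('ABV_REGION', 'us'))  # API key still comes from ABV_API_KEY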

Client Lifecycle and Module Organization

Creating an ABV client is lightweight, but you should generally create one client instance and reuse it throughout your application. Python’s module system makes this pattern natural. Create a module that initializes and exports the client:
# abv_client.py
from abvdev import ABV

abv = ABV()  # Uses ABV_API_KEY from environment
Then import this client wherever you need it:
# chat_handler.py
from abv_client import abv

def handle_chat_request(user_message: str) -> str:
    """Process a chat request and return the response."""
    response = abv.gateway.chat.completions.create(
        provider='openai',
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': user_message}]
    )

    return response['choices'][0]['message']['content']
This pattern ensures you reuse the same client instance across your application, which is more efficient and also simplifies testing: you can mock the imported client in your test files.

Working with Type Hints

Python’s type hints help catch errors during development and improve code readability. While the ABV client library works fine without type hints, adding them makes your code more maintainable. Use type hints to document function signatures that work with gateway responses:
from typing import Dict, List, Any

def extract_response(response: Dict[str, Any]) -> str:
    """Extract the text content from a gateway response."""
    return response['choices'][0]['message']['content']

def build_messages(system_prompt: str, user_message: str) -> List[Dict[str, str]]:
    """Build a messages array with system and user messages."""
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_message}
    ]
Type hints help your IDE provide better autocomplete and catch type errors before runtime. They also serve as documentation for other developers reading your code. For more sophisticated type checking, you can define TypedDict classes that represent the structure of gateway requests and responses:
from typing import TypedDict, List, Literal

class Message(TypedDict):
    role: Literal['system', 'user', 'assistant']
    content: str

class ChatRequest(TypedDict):
    provider: Literal['openai', 'anthropic', 'gemini']
    model: str
    messages: List[Message]
    temperature: float
    max_tokens: int

def create_completion(request: ChatRequest) -> Dict[str, Any]:
    """Create a chat completion with type-checked parameters."""
    return abv.gateway.chat.completions.create(**request)
This level of type safety catches more errors during development, though it requires more upfront definition effort.
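For example, a request built as a ChatRequest can be passed straight to create_completion (defined above), and a type checker will flag a misspelled key or a wrong value type before the code runs:
request: ChatRequest = {
    'provider': 'openai',
    'model': 'gpt-4o-mini',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is a context manager?'}
    ],
    'temperature': 0.7,
    'max_tokens': 500
}

# mypy or Pylance would reject a typo like 'opnai' or a string max_tokens here.
result = create_completion(request)
print(extract_response(result))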

Handling Streaming Responses

Streaming responses arrive as an iterator that yields chunks as the model generates tokens. Python’s iteration protocol makes working with streams natural. The basic streaming pattern uses a for loop to process chunks:
stream = abv.gateway.chat.completions.create(
    provider='openai',
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Tell me a story'}],
    stream=True
)

for chunk in stream:
    content = chunk.get('choices', [{}])[0].get('delta', {}).get('content')
    if content:
        print(content, end='', flush=True)
The nested .get() calls with default values handle the varying structure of chunks safely. Early chunks might not have content, and this pattern avoids KeyError exceptions. For applications that need both real-time display and the complete response, accumulate chunks while iterating:
stream = abv.gateway.chat.completions.create(
    provider='openai',
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Explain Python decorators'}],
    stream=True
)

full_response = ''

for chunk in stream:
    content = chunk.get('choices', [{}])[0].get('delta', {}).get('content')
    if content:
        full_response += content
        print(content, end='', flush=True)

print(f'\n\nComplete response length: {len(full_response)} characters')
This pattern works well for chatbots that display streaming text while also saving the complete conversation history.
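If you stream in several places, it can be convenient to wrap the accumulate-while-printing logic in a small helper. A minimal sketch, assuming the same chunk structure shown above:
from typing import Any, Dict, Iterable, Iterator

def iter_content(stream: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield only the text pieces from a stream of gateway chunks."""
    for chunk in stream:
        content = chunk.get('choices', [{}])[0].get('delta', {}).get('content')
        if content:
            yield content

def stream_and_collect(stream: Iterable[Dict[str, Any]]) -> str:
    """Print chunks as they arrive and return the complete response."""
    pieces = []
    for content in iter_content(stream):
        print(content, end='', flush=True)
        pieces.append(content)
    return ''.join(pieces)
The generator keeps the chunk-parsing details in one place, so both display code and history code consume plain strings.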

Async/Await for Concurrent Requests

Python’s asyncio support enables efficient concurrent processing of multiple AI requests. The ABV client provides async methods for applications built with asyncio. For async streaming, use the create_async method and async for to iterate:
import asyncio

async def stream_response():
    """Stream a response asynchronously."""
    stream = await abv.gateway.chat.completions.create_async(
        provider='openai',
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Tell me a story'}],
        stream=True
    )

    async for chunk in stream:
        content = chunk.get('choices', [{}])[0].get('delta', {}).get('content')
        if content:
            print(content, end='', flush=True)

# Run the async function
asyncio.run(stream_response())
The real power of async comes when processing multiple requests concurrently. This is much faster than processing them sequentially:
import asyncio

async def process_prompt(prompt: str) -> str:
    """Process a single prompt asynchronously."""
    response = await abv.gateway.chat.completions.create_async(
        provider='openai',
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response['choices'][0]['message']['content']

async def process_multiple_prompts():
    """Process multiple prompts concurrently."""
    prompts = [
        'What is Python?',
        'Explain asyncio',
        'What are decorators?',
        'How does type hinting work?'
    ]

    # Create tasks for all prompts and run them concurrently
    tasks = [process_prompt(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)

    for prompt, result in zip(prompts, results):
        print(f'\nPrompt: {prompt}')
        print(f'Response: {result}')

asyncio.run(process_multiple_prompts())
asyncio.gather schedules all the tasks at once, so total latency is roughly that of the slowest request rather than the sum of all of them. This pattern is particularly valuable for batch processing or applications that need to make multiple AI requests in response to a single user action.
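When the prompt list is large, unbounded concurrency can run into provider rate limits. A common refinement, sketched here with an arbitrary limit of 5, caps in-flight requests with an asyncio.Semaphore while reusing the process_prompt function above:
import asyncio
from typing import List

async def process_prompts_bounded(prompts: List[str], limit: int = 5) -> List[str]:
    """Run process_prompt over many prompts, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(prompt: str) -> str:
        async with semaphore:  # waits while `limit` requests are already in flight
            return await process_prompt(prompt)

    return await asyncio.gather(*[bounded(p) for p in prompts])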

Error Handling Strategies

Gateway requests can fail for various reasons, and handling these failures appropriately improves application reliability. Python’s exception system gives you several approaches to error handling. The basic pattern uses try-except to catch failures:
try:
    response = abv.gateway.chat.completions.create(
        provider='openai',
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Hello'}]
    )
    print(response['choices'][0]['message']['content'])
except Exception as error:
    print(f'Gateway request failed: {error}')
For production applications, distinguish between different error types to handle them appropriately. Rate limit errors need different handling than authentication errors:
def make_request_safe(messages: List[Dict[str, str]]) -> str:
    """Make a gateway request with comprehensive error handling."""
    try:
        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=messages
        )
        return response['choices'][0]['message']['content']
    except Exception as error:
        error_message = str(error).lower()

        if 'rate limit' in error_message:
            print('Rate limited - need to slow down requests')
            raise ValueError('Service temporarily unavailable')
        elif 'authentication' in error_message:
            print('Authentication failed - check API key')
            raise ValueError('Configuration error')
        elif 'invalid' in error_message:
            print('Invalid request parameters')
            raise ValueError('Invalid request')
        else:
            print(f'Unexpected error: {error}')
            raise ValueError('Request failed')
This pattern examines the error message to determine the failure type and responds appropriately. Different error types map to different user-facing messages or retry strategies. For applications requiring retry logic, implement exponential backoff to handle transient failures:
import time

def make_request_with_retry(
    messages: List[Dict[str, str]],
    max_retries: int = 3
) -> str:
    """Make a gateway request with exponential backoff retry logic."""
    last_error = None

    for attempt in range(max_retries):
        try:
            response = abv.gateway.chat.completions.create(
                provider='openai',
                model='gpt-4o-mini',
                messages=messages
            )
            return response['choices'][0]['message']['content']
        except Exception as error:
            last_error = error
            error_message = str(error).lower()

            # Don't retry authentication errors - they won't succeed
            if 'authentication' in error_message:
                raise error

            # On final attempt, give up
            if attempt == max_retries - 1:
                raise error

            # Wait before retrying, with exponential backoff
            delay = (2 ** attempt) * 1.0
            print(f'Retry {attempt + 1}/{max_retries} after {delay}s')
            time.sleep(delay)

    raise last_error or Exception('Max retries exceeded')
This retry logic handles temporary failures like network issues while avoiding endless loops on permanent failures. The exponential backoff prevents overwhelming the service with rapid retries.
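One common refinement, not shown above, is to add random jitter to each delay so that many clients retrying at the same time don't all hit the service in the same instant. A small helper you could substitute for the fixed delay calculation:
import random

def backoff_delay(attempt: int, base: float = 1.0, max_jitter: float = 1.0) -> float:
    """Exponential backoff delay with a random jitter component."""
    return (2 ** attempt) * base + random.uniform(0, max_jitter)
Replacing the fixed delay with backoff_delay(attempt) spreads retries out across clients without changing the overall backoff shape.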

Building Conversation Context

Most AI applications involve multi-turn conversations where the model needs context from previous messages. Managing this context is essential for building chat applications. The straightforward approach maintains a list of messages that grows with the conversation:
class Conversation:
    """Manages a conversation with context history."""

    def __init__(self, system_prompt: str):
        """Initialize conversation with a system prompt."""
        self.messages = [
            {'role': 'system', 'content': system_prompt}
        ]

    def send_message(self, user_message: str) -> str:
        """Send a message and get a response, maintaining context."""
        # Add user message to history
        self.messages.append({
            'role': 'user',
            'content': user_message
        })

        # Get response from model
        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=self.messages
        )

        assistant_message = response['choices'][0]['message']

        # Add assistant response to history
        self.messages.append(assistant_message)

        return assistant_message['content']

    def get_history(self) -> List[Dict[str, str]]:
        """Get a copy of the conversation history."""
        return self.messages.copy()
This class encapsulates conversation state and ensures history stays synchronized. Using it looks like this:
conversation = Conversation(
    'You are a helpful assistant that explains programming concepts clearly.'
)

print(conversation.send_message('What is a Python decorator?'))
print(conversation.send_message('Can you show me an example?'))
print(conversation.send_message('How is this different from a regular function?'))
Each message includes the full conversation history, enabling the model to reference earlier exchanges and maintain context. For long-running conversations, manage the context window size to avoid token limits:
class Conversation:
    """Manages a conversation with automatic history truncation."""

    def __init__(self, system_prompt: str, max_messages: int = 20):
        """Initialize conversation with history limit."""
        self.messages = [
            {'role': 'system', 'content': system_prompt}
        ]
        self.max_messages = max_messages

    def send_message(self, user_message: str) -> str:
        """Send a message with automatic history management."""
        self.messages.append({
            'role': 'user',
            'content': user_message
        })

        # Keep only recent messages (always preserve system message)
        if len(self.messages) > self.max_messages:
            self.messages = [
                self.messages[0],  # System message
                *self.messages[-(self.max_messages - 1):]  # Recent messages
            ]

        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=self.messages
        )

        assistant_message = response['choices'][0]['message']
        self.messages.append(assistant_message)

        return assistant_message['content']
This approach prevents conversations from exceeding token limits by keeping only recent messages. The tradeoff is losing access to earlier context.
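If losing earlier context is a problem, an alternative is to summarize the messages you are about to drop and carry the summary forward. A rough sketch (the prompt wording and summary length are arbitrary choices, not part of the Conversation class above):
from typing import Dict, List

def summarize_messages(messages: List[Dict[str, str]]) -> str:
    """Ask the model for a short summary of older conversation turns."""
    transcript = '\n'.join(f"{m['role']}: {m['content']}" for m in messages)
    response = abv.gateway.chat.completions.create(
        provider='openai',
        model='gpt-4o-mini',
        messages=[
            {'role': 'system', 'content': 'Summarize this conversation in a few sentences.'},
            {'role': 'user', 'content': transcript}
        ]
    )
    return response['choices'][0]['message']['content']
Before truncating, you could call summarize_messages on the turns being removed and append the result to the system prompt (or insert it as an extra system message) so the model keeps a compressed memory of the earlier exchange.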

Working with Dataclasses

Python’s dataclasses provide a clean way to structure data for AI applications. They’re particularly useful for managing conversation state and request parameters:
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class ChatMessage:
    """Represents a single message in a conversation."""
    role: str
    content: str

    def to_dict(self) -> Dict[str, str]:
        """Convert to dictionary for API requests."""
        return {'role': self.role, 'content': self.content}

@dataclass
class Conversation:
    """Manages a conversation with structured message handling."""
    system_prompt: str
    messages: List[ChatMessage] = field(default_factory=list)
    max_history: int = 20

    def __post_init__(self):
        """Initialize with system message."""
        self.messages.append(
            ChatMessage(role='system', content=self.system_prompt)
        )

    def send_message(self, content: str) -> str:
        """Send a user message and get a response."""
        user_message = ChatMessage(role='user', content=content)
        self.messages.append(user_message)

        # Manage history size
        if len(self.messages) > self.max_history:
            self.messages = [
                self.messages[0],  # System message
                *self.messages[-(self.max_history - 1):]
            ]

        # Convert to API format
        api_messages = [msg.to_dict() for msg in self.messages]

        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=api_messages
        )

        assistant_content = response['choices'][0]['message']['content']
        assistant_message = ChatMessage(role='assistant', content=assistant_content)
        self.messages.append(assistant_message)

        return assistant_content
Dataclasses make the structure explicit and provide useful methods like __repr__ automatically, which helps with debugging.
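Using the dataclass version looks much the same as before, and printing a stored message shows the generated __repr__:
conversation = Conversation(system_prompt='You explain Python concepts concisely.')

print(conversation.send_message('What does functools.lru_cache do?'))

# The generated __repr__ makes stored messages easy to inspect, e.g.
# ChatMessage(role='assistant', content='functools.lru_cache caches ...')
print(conversation.messages[-1])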

Framework Integration

The gateway integrates naturally with popular Python web frameworks. For Flask applications, create an endpoint that handles AI requests:
from flask import Flask, request, jsonify
from abv_client import abv

app = Flask(__name__)

@app.route('/api/chat', methods=['POST'])
def chat():
    """Handle chat requests."""
    try:
        data = request.get_json()
        user_message = data.get('message')

        if not user_message:
            return jsonify({'error': 'Message is required'}), 400

        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': user_message}]
        )

        return jsonify({
            'response': response['choices'][0]['message']['content']
        })
    except Exception as error:
        print(f'Chat error: {error}')
        return jsonify({'error': 'Failed to generate response'}), 500
For FastAPI applications, the pattern is similar but uses FastAPI’s async support:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from abv_client import abv

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    response: str

@app.post('/api/chat', response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Handle chat requests asynchronously."""
    try:
        response = await abv.gateway.chat.completions.create_async(
            provider='openai',
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': request.message}]
        )

        return ChatResponse(
            response=response['choices'][0]['message']['content']
        )
    except Exception as error:
        print(f'Chat error: {error}')
        raise HTTPException(
            status_code=500,
            detail='Failed to generate response'
        )
FastAPI’s async support and Pydantic models provide type safety and automatic validation, making it an excellent choice for AI-powered APIs. For Django applications, create a view that processes AI requests:
from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from django.views.decorators.csrf import csrf_exempt
import json
from abv_client import abv

@csrf_exempt
@require_http_methods(['POST'])
def chat_view(request):
    """Handle chat requests in Django."""
    try:
        data = json.loads(request.body)
        user_message = data.get('message')

        if not user_message:
            return JsonResponse(
                {'error': 'Message is required'},
                status=400
            )

        response = abv.gateway.chat.completions.create(
            provider='openai',
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': user_message}]
        )

        return JsonResponse({
            'response': response['choices'][0]['message']['content']
        })
    except Exception as error:
        print(f'Chat error: {error}')
        return JsonResponse(
            {'error': 'Failed to generate response'},
            status=500
        )
These patterns integrate the gateway into your existing framework naturally without requiring architectural changes.
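Streaming fits these frameworks too. As a sketch, reusing the app and ChatRequest model from the FastAPI example above and the create_async streaming pattern shown earlier, FastAPI's StreamingResponse can forward chunks to the client as they arrive:
from fastapi.responses import StreamingResponse
from abv_client import abv

@app.post('/api/chat/stream')
async def chat_stream(request: ChatRequest):
    """Stream the model's response to the client as plain text chunks."""
    async def generate():
        stream = await abv.gateway.chat.completions.create_async(
            provider='openai',
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': request.message}],
            stream=True
        )
        async for chunk in stream:
            content = chunk.get('choices', [{}])[0].get('delta', {}).get('content')
            if content:
                yield content

    return StreamingResponse(generate(), media_type='text/plain')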

Testing Strategies

Testing code that calls AI models requires different approaches than testing deterministic functions. You can’t assert exact outputs since model responses vary, but you can test your code’s structure and error handling. Mock the ABV client for unit tests to avoid making actual API calls:
from unittest.mock import patch
import pytest
from chat_handler import handle_chat_request

def test_handle_chat_request():
    """Test chat request handling with mocked client."""
    mock_response = {
        'choices': [{
            'message': {
                'role': 'assistant',
                'content': 'This is a test response'
            }
        }]
    }

    with patch('abv_client.abv.gateway.chat.completions.create') as mock_create:
        mock_create.return_value = mock_response

        result = handle_chat_request('Hello')

        assert result == 'This is a test response'
        mock_create.assert_called_once()
This approach tests your code’s logic without depending on external services, making tests fast and deterministic. For integration tests where you want to verify actual API behavior, make real requests but structure tests to be flexible about specific outputs:
import pytest
from abv_client import abv

def test_gateway_returns_valid_responses():
    """Test that gateway returns properly structured responses."""
    response = abv.gateway.chat.completions.create(
        provider='openai',
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': 'Say hello'}]
    )

    # Assert structure, not specific content
    assert 'choices' in response
    assert len(response['choices']) == 1
    assert 'message' in response['choices'][0]
    assert 'content' in response['choices'][0]['message']
    assert response['choices'][0]['message']['content']
    assert 'usage' in response
    assert response['usage']['total_tokens'] > 0
This approach verifies that your API integration works without depending on specific model outputs, making tests more robust.
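Because integration tests need a real API key and network access, it's common to skip them automatically when the key isn't available, for example with a pytest skipif marker:
import os
import pytest

# Skip integration tests when no key is configured (for example in CI without secrets).
requires_api_key = pytest.mark.skipif(
    not os.environ.get('ABV_API_KEY'),
    reason='ABV_API_KEY is not set'
)

@requires_api_key
def test_gateway_returns_valid_responses():
    ...  # body as shown above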

Next Steps

You now understand how to implement the gateway in Python applications, handle errors, manage conversations, and integrate with frameworks. Here’s where to go next: