# Python SDK - Advanced Usage
The Python SDK provides advanced usage options for your application. This includes data masking, logging, sampling, filtering, and more.

## Masking Sensitive Data

If your trace data (inputs, outputs, metadata) might contain sensitive information (PII, secrets), you can provide a `mask` function during client initialization. This function will be applied to all relevant data before it is sent to ABV.

The `mask` function should accept `data` as a keyword argument and return the masked data. The returned data must be JSON-serializable.

```python
from typing import Any
import re

from abvdev import Abv

def pii_masker(data: Any, **kwargs) -> Any:
    # Example: simple email masking. Implement your more robust logic here.
    if isinstance(data, str):
        return re.sub(
            r"[a-zA-Z0-9.+_-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+",
            "[EMAIL_REDACTED]",
            data,
        )
    elif isinstance(data, dict):
        return {k: pii_masker(data=v) for k, v in data.items()}
    elif isinstance(data, list):
        return [pii_masker(data=item) for item in data]
    return data

abv = Abv(mask=pii_masker)

# Now, any input/output/metadata will be passed through pii_masker
with abv.start_as_current_span(
    name="user-query",
    input={"email": "test@example.com", "query": "..."},
) as span:
    # The 'email' field in the input will be masked
    pass
```

## Logging

The ABV SDK uses Python's standard `logging` module. The main logger is named `"abv"`.

To enable detailed debug logging, you can either:

- set the `debug=True` parameter when initializing the ABV client,
- set the `ABV_DEBUG="True"` environment variable, or
- configure the `"abv"` logger manually:

```python
import logging

abv_logger = logging.getLogger("abv")
abv_logger.setLevel(logging.DEBUG)
```

The default log level for the `"abv"` logger is `logging.WARNING`.

## Sampling

You can configure the SDK to sample traces by setting the `sample_rate` parameter during client initialization (or via the `ABV_SAMPLE_RATE` environment variable). This value should be a float between 0.0 (sample 0% of traces) and 1.0 (sample 100% of traces). If a trace is not sampled, none of its observations (spans, generations) or associated scores will be sent to ABV.

```python
from abvdev import Abv

# Sample approximately 20% of traces
abv_sampled = Abv(sample_rate=0.2)
```

## Filtering by Instrumentation Scope

You can configure the SDK to filter out spans from specific instrumentation libraries by using the `blocked_instrumentation_scopes` parameter. This is useful when you want to exclude infrastructure spans while keeping your LLM and application spans.

```python
from abvdev import Abv

# Filter out database spans
abv = Abv(
    blocked_instrumentation_scopes=["sqlalchemy", "psycopg"]
)
```

### How it works

When third-party libraries create OpenTelemetry spans (through their instrumentation packages), each span has an associated "instrumentation scope" that identifies which library created it. The ABV SDK filters spans at the export level based on these scope names.

You can see the instrumentation scope name for any span in the ABV UI under the span's metadata (`metadata.scope.name`). Use this to identify which scopes you want to filter.
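For orientation, here is a minimal sketch of where scope names come from in OpenTelemetry. The library name `"my-db-library"` is a placeholder assumption; the name a real library registers is exactly what shows up in `metadata.scope.name`.

```python
from opentelemetry import trace

# The name passed to get_tracer() becomes the span's instrumentation scope.
# Instrumentation packages call this internally with their own library name.
tracer = trace.get_tracer("my-db-library")  # placeholder scope name

with tracer.start_as_current_span("db-query"):
    # This span's instrumentation scope is "my-db-library"; it would be
    # dropped if that name were listed in blocked_instrumentation_scopes.
    pass
```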
### Cross-library span relationships

When filtering instrumentation scopes, be aware that blocking certain libraries may break trace tree relationships if spans from blocked and non-blocked libraries are nested together.

For example, if you block parent spans but keep child spans from a separate library, you may see "orphaned" LLM spans whose parent spans were filtered out. This can make traces harder to interpret. Consider the impact on trace structure when choosing which scopes to filter.

## Isolated TracerProvider

You can configure a separate OpenTelemetry TracerProvider for use with ABV. This creates isolation between ABV tracing and your other observability systems.

Benefits of isolation:

- ABV spans won't be sent to your other observability backends (e.g. Datadog, Jaeger, Zipkin)
- Third-party library spans won't be sent to ABV
- Independent configuration and sampling rates

While TracerProviders are isolated, they share the same OpenTelemetry context for tracking active spans. This can cause span relationship issues where:

- a parent span from one TracerProvider might have children from another TracerProvider,
- some spans may appear "orphaned" if their parent spans belong to a different TracerProvider, and
- trace hierarchies may be incomplete or confusing.

Plan your instrumentation carefully to avoid confusing trace structures.

```python
from opentelemetry.sdk.trace import TracerProvider

from abvdev import Abv

abv_tracer_provider = TracerProvider()  # do not set as the global tracer provider to keep isolation

abv = Abv(tracer_provider=abv_tracer_provider)

abv.start_span(name="myspan").end()  # span will be isolated from remaining OTel instrumentation
```

## Using ThreadPoolExecutors or ProcessPoolExecutors

The `@observe` decorator uses Python's `contextvars` to store the current trace context and to ensure that observations are correctly associated with the current execution context. However, when using Python's `ThreadPoolExecutor` or `ProcessPoolExecutor` and spawning threads from inside a trace (i.e. the executor is run inside a decorated function), the decorator will not work correctly, because the `contextvars` are not copied to the new threads or processes. There is an existing issue in Python's standard library and a great explanation in the FastAPI repo that discuss this limitation.

The recommended workaround is to pass the parent observation ID and the trace ID as keyword arguments to each multithreaded execution, thus re-establishing the link to the parent span or trace:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from abvdev import get_client, observe

@observe
def execute_task(*args):
    return args

@observe
def execute_groups(task_args):
    trace_id = get_client().get_current_trace_id()
    observation_id = get_client().get_current_observation_id()

    with ThreadPoolExecutor(3) as executor:
        futures = [
            executor.submit(
                execute_task,
                *task_arg,
                abv_trace_id=trace_id,
                abv_parent_observation_id=observation_id,
            )
            for task_arg in task_args
        ]

        for future in as_completed(futures):
            future.result()

    return [f.result() for f in futures]

@observe()
def main():
    task_args = [["a", "b"], ["c", "d"]]
    execute_groups(task_args)

main()

get_client().flush()
```

## Distributed Tracing

To maintain the trace context across service or process boundaries, please rely on OpenTelemetry-native context propagation as much as possible (see the sketch below). Using the `trace_context` argument to "force" the parent-child relationship may lead to unexpected trace updates, as the resulting span will be treated as a root span server-side.

If you are using multiprocessing, see here for details on how to propagate the OpenTelemetry context.

If you are using Pydantic Logfire, please set `distributed_tracing` to `True`.
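A minimal sketch of OTel-native propagation across an HTTP boundary, assuming a plain dict of headers and your own request/handler wiring (none of this is part of the ABV SDK itself):

```python
from opentelemetry import context
from opentelemetry.propagate import extract, inject

# Caller side: inject the current trace context into outgoing request headers.
headers = {}
inject(headers)  # adds the W3C "traceparent" (and related) headers
# e.g. requests.post(url, json=payload, headers=headers)

# Callee side: extract the context from incoming headers and attach it,
# so spans created here join the caller's trace.
def handle_request(incoming_headers):
    ctx = extract(incoming_headers)
    token = context.attach(ctx)
    try:
        ...  # create spans / run @observe-decorated code here
    finally:
        context.detach(token)
```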
## Multi-Project Setup (Experimental)

Multi-project setups are experimental and have important limitations regarding third-party OpenTelemetry integrations.

The ABV Python SDK supports routing traces to different projects within the same application by using multiple API keys. This works because the ABV SDK adds a specific span attribute containing the API key to all spans it generates.

### How it works

- **Span attributes:** the ABV SDK adds a specific span attribute containing the API key to the spans it creates.
- **Multiple processors:** multiple span processors are registered on the global tracer provider, each with its own exporter bound to a specific API key.
- **Filtering:** within each span processor, spans are filtered based on the presence and value of the API key attribute.
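Conceptually, the filtering step works roughly like the sketch below. This is not the actual abvdev implementation, and the attribute name `"abv.api_key"` is a placeholder assumption; it only illustrates attribute-based filtering at export time.

```python
from typing import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class ApiKeyFilteringExporter(SpanExporter):
    """Forwards only spans tagged for one project to the wrapped exporter."""

    def __init__(self, wrapped: SpanExporter, api_key: str,
                 attribute: str = "abv.api_key"):  # placeholder attribute name
        self._wrapped = wrapped
        self._api_key = api_key
        self._attribute = attribute

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        selected = [
            span for span in spans
            # Spans without the attribute (e.g. from third-party instrumentation)
            # are forwarded to every exporter, which is the limitation described below.
            if (span.attributes or {}).get(self._attribute) in (self._api_key, None)
        ]
        if not selected:
            return SpanExportResult.SUCCESS
        return self._wrapped.export(selected)

    def shutdown(self) -> None:
        self._wrapped.shutdown()
```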
### Important limitation with third-party libraries

Third-party libraries that emit OpenTelemetry spans automatically (e.g. HTTP clients, databases, other instrumentation libraries) do not have the ABV API key span attribute. As a result:

- these spans cannot be routed to a specific project,
- they are processed by all span processors and sent to all projects, and
- all projects will receive these third-party spans.

### Why is this experimental?

This approach requires that the API key parameter be passed to all ABV SDK executions across all integrations to ensure proper routing, and third-party spans will appear in all projects.

### Initialization

To set up multiple projects, initialize separate ABV clients for each project:

```python
from abvdev import Abv

# Initialize clients for different projects
project_a_client = Abv(
    api_key="sk-abv-project-a-...",
    host="https://app.abv.dev"  # or https://eu.app.abv.dev
)

project_b_client = Abv(
    api_key="sk-abv-project-b-...",
    host="https://app.abv.dev"  # or https://eu.app.abv.dev
)
```

### Integration usage

For all integrations in multi-project setups, you must specify the API key parameter to ensure traces are routed to the correct project.

**Observe decorator:** pass `abv_api_key` as a keyword argument to the top-most observed function (not to the decorator). Nested decorated functions automatically pick up the API key from the execution context they run in. Calls to `get_client` are also aware of the current `abv_api_key` in the decorated function's execution context, so passing `abv_api_key` again there is not necessary.

```python
from abvdev import observe, get_client

@observe
def nested():
    # The get_client() call is context-aware:
    # if it runs inside another decorated function that has
    # abv_api_key passed, the key does not need to be passed here again
    get_client().update_current_trace(user_id="myuser")

@observe
def process_data_for_project_a(data):
    # Passing `abv_api_key` here again is not necessary
    # as it is stored in the execution context
    nested()
    return {"processed": data}

@observe
def process_data_for_project_b(data):
    # Passing `abv_api_key` here again is not necessary
    # as it is stored in the execution context
    nested()
    return {"enhanced": data}

# Route to project A
# The top-most decorated function needs the `abv_api_key` kwarg
result_a = process_data_for_project_a(
    data="input data",
    abv_api_key="sk-abv-project-a-..."
)

# Route to project B
# The top-most decorated function needs the `abv_api_key` kwarg
result_b = process_data_for_project_b(
    data="input data",
    abv_api_key="sk-abv-project-b-..."
)
```

### Important considerations

- Every ABV SDK execution across all integrations must include the appropriate API key parameter.
- Missing API key parameters may result in traces being routed to the default project or lost.
- Third-party OpenTelemetry spans (from HTTP clients, databases, etc.) will appear in all projects since they lack the `abv_api_key` attribute.

## Passing completion_start_time for TTFT Tracking

If you are using the Python SDK to manually create generations, you can pass the `completion_start_time` parameter. This allows ABV to calculate the time to first token (TTFT) for you.

```python
import datetime
import time

from abvdev import get_client

abv = get_client()

# Start an observation with a specific type
with abv.start_as_current_observation(
    as_type="generation",
    name="ttft-generation"
) as generation:
    # Simulate LLM time to first token
    time.sleep(3)

    # Update the generation with the time the model started to generate
    generation.update(
        completion_start_time=datetime.datetime.now(),
        output="some response",
    )

# Flush events in short-lived applications
abv.flush()
```

## Observation Types

ABV supports multiple observation types to provide context for different components of LLM applications. The full list of observation types is documented here: Observation Types.

### Setting observation types with the @observe decorator

By setting the `as_type` parameter in the `@observe` decorator, you can specify the observation type for a method:
```python
from abvdev import observe

# Tool calls to external services
@observe(as_type="tool")
def retrieve_context(query):
    results = vector_store.get(query)
    return results
```

The context-manager approach provides automatic resource cleanup:

```python
from abvdev import get_client

abv = get_client()

def process_with_context_managers():
    with abv.start_as_current_observation(
        as_type="chain",
        name="retrieval-pipeline",
    ) as chain:
        # Retrieval step
        with abv.start_as_current_observation(
            as_type="retriever",
            name="vector-search",
        ) as retriever:
            search_results = perform_vector_search("user question")
            retriever.update(output={"results": search_results})
```
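As a possible continuation of this pattern (a sketch only; `perform_vector_search` and `call_llm` are hypothetical helpers), a generation step can be nested under the same chain and the chain's output recorded once both steps have finished:

```python
from abvdev import get_client

abv = get_client()

def answer_question(question):
    with abv.start_as_current_observation(
        as_type="chain",
        name="qa-pipeline",
    ) as chain:
        # Retrieval step
        with abv.start_as_current_observation(
            as_type="retriever",
            name="vector-search",
        ) as retriever:
            docs = perform_vector_search(question)  # hypothetical helper
            retriever.update(output={"results": docs})

        # Generation step nested under the same chain
        with abv.start_as_current_observation(
            as_type="generation",
            name="answer-generation",
        ) as generation:
            answer = call_llm(question, docs)  # hypothetical helper
            generation.update(output=answer)

        chain.update(output={"answer": answer})
        return answer
```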