Basic Features
Masking of Sensitive LLM Data
11 min
masking is a feature that allows precise control over the tracing https //docs abv dev/docs/ce chig5lhx6g8whxfw /9akbmg1c2o3bzkr0t9xuj data sent to the abv server with custom masking functions, you can control and sanitize the data that gets traced and sent to the server whether it's for compliance reasons or to protect user privacy , masking sensitive data is a crucial step in responsible application development it enables you to redact sensitive information from trace or observation inputs and outputs customize the content of events before transmission implement fine grained data filtering based on your specific requirements learn more about abv's data security and privacy measures concerning the stored data in our security & compliance overview docid 1g7muwfq8b3a1nwejcdh how it works you define a custom masking function and pass it to the abv client constructor all event inputs and outputs are processed through this function the masked data is then sent to the abv server this approach ensures that you have complete control over the event input and output data traced by your application python sdk define a masking function the masking function will apply to all event inputs and outputs regardless of the abv maintained integration you are using def masking function(data any, kwargs) > any """function to mask sensitive data before sending to abv """ if isinstance(data, str) and data startswith("secret ") return "redacted" \# for more complex data structures elif isinstance(data, dict) return {k masking function(v) for k, v in data items()} elif isinstance(data, list) return \[masking function(item) for item in data] return data apply the masking function when initializing the abv client from abvdev import abv \# initialize with masking function abv = abv(mask=masking function) \# then get the client from abvdev import get client abv = get client() with the decorator from abvdev import observe abv = abv(mask=masking function) @observe() def my function() \# this data will be masked before being sent to abv return "secret data" result = my function() print(result) # original "secret data" \# the trace output in abv will have the output masked as "redacted" using context managers from abvdev import abv abv = abv(mask=masking function) with abv start as current span( name="sensitive operation", input="secret input data" ) as span \# processing span update(output="secret output data") \# both input and output will be masked as "redacted" in abv js/ts sdk to prevent sensitive data from being sent to abv, you can provide a mask function to the abvspanprocessor this function will be applied to the input , output , and metadata of every observation the function receives an object { data } , where data is the stringified json of the attribute’s value it should return the masked data instrumentation ts import { nodesdk } from "@opentelemetry/sdk node"; import { abvspanprocessor } from "@abvdev/otel"; const spanprocessor = new abvspanprocessor({ mask ({ data }) => { // a simple regex to mask credit card numbers const maskeddata = data replace( /\b\d{4}\[ ]?\d{4}\[ ]?\d{4}\[ ]?\d{4}\b/g, " masked credit card " ); return maskeddata; }, }); const sdk = new nodesdk({ spanprocessors \[spanprocessor], }); sdk start(); see typescript sdk overview docid\ j4sdnlmdmnfmk99ootgn7 for more details examples now, we'll show you examples how to use the masking feature we'll use the abv decorator for this, but you can also use the low level sdk or the js/ts sdk analogously example 1 redacting credit card numbers in this example, we'll demonstrate how to redact credit card numbers from strings using a regular expression https //docs python org/3/library/re html this helps in complying with pci dss by ensuring that credit card numbers are not transmitted or stored improperly abvs masking feature allows you to define a custom masking function with parameters, which you then pass to the abv client constructor this function is applied to all event inputs and outputs, processing each piece of data to mask or redact sensitive information according to your specifications by ensuring that all events are processed through your masking function before being sent, abv guarantees that only the masked data is transmitted to the abv server steps import necessary modules define a masking function that uses a regular expression to detect and replace credit card numbers configure the masking function in abv create a sample function to simulate processing sensitive data observe the trace to see the masked output import re from abvdev import abv, observe, get client \# step 2 define the masking function def masking function(data, kwargs) if isinstance(data, str) \# regular expression to match credit card numbers (visa, mastercard, amex, etc ) pattern = r'\b(? \d\[ ] ?){13,19}\b' data = re sub(pattern, '\[redacted credit card]', data) return data \# step 3 configure the masking function abv = abv(mask=masking function) \# step 4 create a sample function with sensitive data @observe() def process payment() \# simulated sensitive data containing a credit card number transaction info = "customer paid with card number 4111 1111 1111 1111 " return transaction info \# step 5 observe the trace result = process payment() print(result) \# output customer paid with card number \[redacted credit card] \# flush events in short lived applications abv flush() example 2 using the llm guard library in this example, we'll use the anonymize scanner from llm guard to remove personal names and other pii from the data this is useful for anonymizing user data and protecting privacy find our more about the llm guard library in their documentation https //llm guard com/ steps install the llm guard library import necessary modules initialize the vault and configure the anonymize scanner define a masking function that uses the anonymize scanner configure the masking function in abv create a sample function to simulate processing data with pii observe the trace to see the masked output pip install llm guard from abvdev import abv, observe, get client from llm guard vault import vault from llm guard input scanners import anonymize from llm guard input scanners anonymize helpers import bert large ner conf \# step 3 initialize the vault and configure the anonymize scanner vault = vault() def create anonymize scanner() scanner = anonymize( vault, recognizer conf=bert large ner conf, language="en" ) return scanner \# step 4 define the masking function def masking function(data, kwargs) if isinstance(data, str) scanner = create anonymize scanner() \# scan and redact the data sanitized data, is valid, risk score = scanner scan(data) return sanitized data return data \# step 5 configure the masking function abv = abv(mask=masking function) \# step 6 create a sample function with pii @observe() def generate report() \# simulated data containing personal names report = "john doe met with jane smith to discuss the project " return report \# step 7 observe the trace result = generate report() print(result) \# output \[redacted person] met with \[redacted person] to discuss the project \# flush events in short lived applications abv flush() example 3 masking email and phone numbers you can extend the masking function to redact other types of pii such as email addresses and phone numbers using regular expressions import re from abvdev import abv, observe, get client def masking function(data, kwargs) if isinstance(data, str) \# mask email addresses data = re sub(r'\b\[\w\ ]+?@\w+?\\ \w+?\b', '\[redacted email]', data) \# mask phone numbers data = re sub(r'\b\d{3}\[ ]?\d{3}\[ ]?\d{4}\b', '\[redacted phone]', data) return data abv = abv(mask=masking function) @observe() def contact customer() info = "please contact john at john doe\@example com or call 555 123 4567 " return info result = contact customer() print(result) \# output please contact john at \[redacted email] or call \[redacted phone] \# flush events in short lived applications abv flush()