Toxicity

This example demonstrates how to use the Toxicity scanner to detect prompts that include toxic or harmful language.

from test_savant.guard import TSGuard, Scanner, TSGuardInput
from test_savant.guard.input_scanners import Toxicity
import os

First, we define a placeholder LLM function (it simply echoes its input and stands in for a real model), read a sample article, set the environment variable for your API key, and initialize the TSGuardInput object:

def dummy_llm(input):
    # Placeholder LLM: echoes the input, standing in for a real model call.
    return input

with open("docs/article.txt", "r") as f:
    long_article = f.read()
print(len(long_article))

os.environ["TEST_SAVANT_API_KEY"] = "f9c18a9102aa91ba16fd4fb5061e294673718e2be82da800" 
api_key = os.environ.get("TEST_SAVANT_API_KEY")
assert api_key is not None, "Please set TEST_SAVANT_API_KEY environment variable"

ts_api = TSGuardInput(
    API_KEY=api_key,
    PROJECT_ID="3409d5dc-113a-402e-9884-47e72667eeb4",
    remote_addr="http://localhost:8080"
)

Now we register the Toxicity scanner:

toxicity = Toxicity(tag="base", threshold=0.5)
ts_api.add_scanner(toxicity)

Let’s run a few example prompts, some containing toxic language, and check how the scanner reacts:

toxic_prompt = "you fucking idiot, you are so stupid"
prompts = [
    toxic_prompt,
    toxic_prompt + " " + long_article,
    long_article + " " + toxic_prompt,
    "you are so smart",
    "you are so stupid",
    "you are so dumb"
]

for prompt in prompts:
    result = ts_api.scan(prompt, is_async=False)

    if result.is_valid:
        print("✅ Safe")
    else:
        print("❌ Blocked", result.reasons)

This scanner helps filter toxic content before it reaches your AI model.
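For instance, you could gate the placeholder dummy_llm defined above behind the scanner, so the model is only called when the prompt passes. This is a minimal sketch using only the objects defined earlier in this example; error handling is omitted:

def guarded_llm(prompt: str) -> str:
    # Scan the prompt first; only forward it to the model if it passes.
    result = ts_api.scan(prompt, is_async=False)
    if not result.is_valid:
        # Surface the scanner's reasons instead of calling the model.
        return f"Request blocked: {result.reasons}"
    return dummy_llm(prompt)

print(guarded_llm("you are so smart"))
print(guarded_llm(toxic_prompt))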


Parameters

  • tag: str (default: "base") — label to group rule configuration.
  • threshold: float (default: 0.5) — decision threshold; higher is stricter.
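
If the defaults don't fit your use case, you can construct the scanner with different values. The snippet below is an illustration only; the threshold value shown is not a recommendation, so tune it against your own traffic:

# Example only: adjust the threshold to tune sensitivity for your use case.
custom_toxicity = Toxicity(tag="base", threshold=0.8)
ts_api.add_scanner(custom_toxicity)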
