Toxicity
This example demonstrates how to use the Toxicity scanner to detect prompts that contain toxic or harmful language.
import os

from test_savant.guard import TSGuard, Scanner, TSGuardInput
from test_savant.guard.input_scanners import Toxicity
First, we read a sample article, set the environment variable for your API key, and initialize the TSGuardInput object:
# A stand-in for a real LLM call; it simply echoes the prompt back.
def dummy_llm(prompt):
    return prompt

# Read a long sample article so we can test scanning of large inputs.
with open("docs/article.txt", "r") as f:
    long_article = f.read()
print(len(long_article))
os.environ["TEST_SAVANT_API_KEY"] = "f9c18a9102aa91ba16fd4fb5061e294673718e2be82da800"
api_key = os.environ.get("TEST_SAVANT_API_KEY")
assert api_key is not None, "Please set TEST_SAVANT_API_KEY environment variable"
ts_api = TSGuardInput(
    API_KEY=api_key,
    PROJECT_ID="3409d5dc-113a-402e-9884-47e72667eeb4",
    remote_addr="http://localhost:8080",
)
Now we register the Toxicity scanner:
toxicity = Toxicity(tag="base", threshold=0.5)
ts_api.add_scanner(toxicity)
Let’s run a few example prompts, some containing toxic language, and check how the scanner reacts:
toxic_prompt = "you fucking idiot, you are so stupid"
prompts = [
toxic_prompt,
toxic_prompt + " " + long_article,
long_article + " " + toxic_prompt,
"you are so smart",
"you are so stupid",
"you are so dumb"
]
# Scan each prompt synchronously and report the verdict.
for prompt in prompts:
    result = ts_api.scan(prompt, is_async=False)
    if result.is_valid:
        print("✅ Safe")
    else:
        print("❌ Blocked", result.reasons)
This scanner helps filter toxic content before it reaches your AI model.
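To make that concrete, here is a minimal sketch of how you might gate a model call on the scan result, reusing the ts_api instance and the dummy_llm stand-in defined above (guarded_completion is just an illustrative helper name, not part of the library):

def guarded_completion(prompt):
    # Scan the prompt first; only forward it to the model if it passes.
    result = ts_api.scan(prompt, is_async=False)
    if not result.is_valid:
        return f"Request blocked: {result.reasons}"
    return dummy_llm(prompt)

print(guarded_completion("you are so smart"))
print(guarded_completion(toxic_prompt))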
Parameters
- tag: str (default: "base") — label to group rule configuration.
- threshold: float (default: 0.5) — decision threshold; higher is stricter.
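For example, a stricter configuration (per the description above, higher means stricter) can be registered the same way; this is only a sketch, and the exact threshold value is something to tune against your own prompts:

# Illustrative only: same constructor as above, with an adjusted threshold.
strict_toxicity = Toxicity(tag="base", threshold=0.8)
ts_api.add_scanner(strict_toxicity)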