	---
	license: other
	license_name: govtech-singapore
	license_link: LICENSE
	language:
	- en
	- ms
	- ta
	- zh
	pipeline_tag: text-classification
	tags:
	- classifier
	- safety
	- moderation
	- multilingual
	---

	# LionGuard 2
	LionGuard 2 is a multilingual content moderation classifier tuned for English/Singlish, Chinese, Malay, and Tamil in the Singapore context.

	It passes OpenAI `text-embedding-3-large` embeddings through a multi-head classifier and returns fine-grained scores for the following categories:
	- Overall safety (`binary`)
	- Hate (`hateful_l1`, `hateful_l2`)
	- Insults (`insults`)
	- Sexual content (`sexual_l1`, `sexual_l2`)
	- Physical violence (`physical_violence`)
	- Self-harm (`self_harm_l1`, `self_harm_l2`)
	- Other misconduct (`all_other_misconduct_l1`, `all_other_misconduct_l2`)

	Further details on the benchmark results and training methodology are in our [report](https://arxiv.org/abs/2507.15339).

	---

	# Taxonomy

	| S/N | Category | Level | Description | Example |
	|-----|----------|-------|-------------|---------|
	| 1 | Hate | 1 (Discriminatory) | Derogatory or generalized negative statements targeting a protected group. | “All Chinese people are so scheming.”<br>“Malays are always late and lazy.”<br>“Indians are too noisy; I avoid them whenever possible.” |
	| | | 2 (Hate Speech) | Explicit calls for harm or violence against a protected group. | “We should kick all Chinese people out of the country.”<br>“Burn down their temple…”<br>“Kill the Indians in Little India; they don’t belong here.” |
	| 2 | Insults | no sub-levels | Demeans, humiliates, or mocks without referencing a protected trait. | “You’re so dumb, you probably failed your PSLE.”<br>“Your singing is worse than karaoke at a dodgy KTV.” |
	| 3 | Sexual | 1 (Not appropriate for minors) | Mild-to-moderate sexual content. | “Let’s talk about how condoms work.”<br>“We had a one-night stand last weekend.” |
	| | | 2 (Not appropriate for all ages) | Explicit or graphic sexual content. | “Always think they very yandao … but actually is just one ** bin.” |
	| 4 | Physical Violence | no sub-levels | Glorification of violence or threats of physical harm. | “If I see him at Clarke Quay, I’m going to whack him.”<br>“Someone should plant a bomb at that office…” |
	| 5 | Self-Harm | 1 (Ideation) | Expressions of suicidal thoughts or encouragement of self-harm. | “I’m so stressed … I feel like ending it all.”<br>“Failing my poly exams made me want to cut myself.” |
	| | | 2 (Action / Suicide) | Descriptions of ongoing or imminent self-harm behavior. | “I’ve locked myself in my room and taken a bunch of pills.”<br>“I’m on the rooftop at Toa Payoh, ready to jump.” |
	| 6 | All Other Misconduct | 1 (Generally not socially accepted) | Unethical or immoral behavior not necessarily illegal. | “Let’s spread fake rumours about her …”<br>“How to secretly record someone’s private conversation?” |
	| | | 2 (Illegal activities) | Instructions or credible threats of serious harm; facilitation of crimes. | “Anyone know where to buy illegal knives in Geylang?”<br>“Let’s hack that e-commerce site to get credit card details.” |
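For downstream code it can help to keep the link between score names and taxonomy rows in one place. The lookup below is a minimal sketch, not part of the LionGuard 2 package: `SCORE_TAXONOMY` and `describe_score` are hypothetical names, and it assumes that the `_l1` / `_l2` suffixes on the score names correspond to Level 1 / Level 2 in the table above.

```python
# Hypothetical lookup: score name -> (taxonomy category, level).
# Assumption: "_l1"/"_l2" suffixes map to Level 1/Level 2 in the taxonomy table.
# Level is None for categories without sub-levels.
SCORE_TAXONOMY = {
    "hateful_l1": ("Hate", 1),
    "hateful_l2": ("Hate", 2),
    "insults": ("Insults", None),
    "sexual_l1": ("Sexual", 1),
    "sexual_l2": ("Sexual", 2),
    "physical_violence": ("Physical Violence", None),
    "self_harm_l1": ("Self-Harm", 1),
    "self_harm_l2": ("Self-Harm", 2),
    "all_other_misconduct_l1": ("All Other Misconduct", 1),
    "all_other_misconduct_l2": ("All Other Misconduct", 2),
}


def describe_score(name):
    """Return a human-readable label for a score name."""
    if name == "binary":
        # The overall safety score sits outside the six taxonomy categories.
        return "Overall safety"
    category, level = SCORE_TAXONOMY[name]
    return category if level is None else f"{category} (level {level})"
```

For example, `describe_score("self_harm_l2")` yields a label that combines the category name with its level, which is convenient when logging or displaying moderation results.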

	---

	# Usage

	```python
	import os
	import numpy as np
	from transformers import AutoModel
	from openai import OpenAI

	# Load model directly from HF
	model = AutoModel.from_pretrained(
	    "govtech/lionguard-2",
	    trust_remote_code=True,  # allow the custom model code in the repo to run
	)

	# Get OpenAI embeddings (set OPENAI_API_KEY in your environment)
	client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
	response = client.embeddings.create(
	    input="Hello, world!",  # replace with the text to moderate
	    model="text-embedding-3-large",
	    dimensions=3072,  # dimensions of the embedding
	)
	embeddings = np.array([data.embedding for data in response.data])

	# Run LionGuard 2
	results = model.predict(embeddings)
	```
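The example above embeds a single string. The OpenAI embeddings endpoint also accepts a list of strings, so several texts can be embedded in one request and passed to `model.predict` as one array. The helper below is a minimal sketch under that assumption; `embed_texts` is a hypothetical name, not part of LionGuard 2 or the OpenAI SDK.

```python
import numpy as np


def embed_texts(client, texts, model="text-embedding-3-large", dimensions=3072):
    """Embed several texts in one request.

    Returns an array of shape (len(texts), dimensions), one row per input text.
    `client` is expected to be an `openai.OpenAI` instance.
    """
    response = client.embeddings.create(
        input=list(texts),
        model=model,
        dimensions=dimensions,
    )
    # Each returned item carries the index of its input text;
    # sort on it so rows line up with the order of `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return np.array([item.embedding for item in ordered])


# Usage with the `client` and `model` objects from the example above:
# embeddings = embed_texts(client, ["Hello, world!", "Another message"])
# results = model.predict(embeddings)
```

Keeping the embedding step in its own function also makes it easy to substitute a stub client in tests, so the classifier can be exercised without network access.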