Model Card for annieske/bert-base-finnish-cased-toxicity

This is a toxicity identification model which classifies a text as either "toxic" or "non-toxic".

Model Details

Model Description

The model is a fine-tuned Finnish BERT model that classifies a Finnish text as either "toxic" or "non-toxic".

  • Developed by: Anni Eskelinen
  • Model type: Text classification
  • Language(s) (NLP): Finnish
  • Finetuned from model: TurkuNLP/bert-base-finnish-cased-v1

Use

This model is intended to be used as a helpful tool for content moderation.

Bias, Risks, and Limitations

The model can be overly sensitive to toxicity and may classify non-toxic texts as toxic (false positives).

How to Get Started with the Model

Use the code below to get started with the model.

>>> import transformers
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("annieske/bert-base-finnish-cased-toxicity")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1")
>>> pipe = transformers.pipeline(task="text-classification", model=model, tokenizer=tokenizer)
>>> pipe("This text is neutral!")
>>> pipe("You suck!")
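Because the model may flag non-toxic text as toxic (see Bias, Risks, and Limitations), a moderation workflow might require a minimum confidence before acting on a prediction. The following is a minimal sketch, not part of the released model. It assumes predictions are dictionaries with "label" and "score" keys, as returned by the transformers text-classification pipeline, and that the positive label is named "toxic". The helper name flag_for_review, the 0.9 threshold, and the sample scores are hypothetical.

```python
def flag_for_review(prediction, toxic_label="toxic", threshold=0.9):
    """Return True only when the predicted label is toxic and its score meets the threshold.

    The label name and threshold are assumptions; check the model's actual
    label names (model.config.id2label) before relying on them.
    """
    return prediction["label"] == toxic_label and prediction["score"] >= threshold


# Hypothetical predictions shaped like text-classification pipeline output
predictions = [
    {"label": "toxic", "score": 0.97},
    {"label": "toxic", "score": 0.62},
    {"label": "non-toxic", "score": 0.88},
]

# Only the first prediction is confident enough to be flagged
flags = [flag_for_review(p) for p in predictions]
```

A suitable threshold would need to be chosen on held-out data, since raising it trades fewer false positives for more missed toxic texts.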

Training Details

Training Data

The training data consisted of ten toxicity and related-task datasets that were machine translated to Finnish, with their labels unified.

The datasets can be found on GitHub.

Training Procedure

Preprocessing

No preprocessing was done on the training data.

Training Hyperparameters

  • learning rate 1e-05
  • batch size 8
  • sequence length 512
  • 5 epochs with early stopping
  • evaluation every 25,000 steps
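For readers who want to reproduce a similar setup, the values above can be collected into a configuration. This is a sketch under assumptions: the card does not state which training framework was used, so the mapping to Hugging Face TrainingArguments keyword names is illustrative, and the helper names are hypothetical. The sequence length is not a TrainingArguments option; it would be applied at tokenization time (for example max_length=512 with truncation).

```python
# Hyperparameters exactly as listed in the model card
hyperparameters = {
    "learning_rate": 1e-5,
    "batch_size": 8,
    "max_sequence_length": 512,
    "num_epochs": 5,
    "eval_every_steps": 25_000,
}


def to_trainer_kwargs(hp):
    """Map the card's values to TrainingArguments keyword names (assumed mapping)."""
    return {
        "learning_rate": hp["learning_rate"],
        "per_device_train_batch_size": hp["batch_size"],
        "num_train_epochs": hp["num_epochs"],
        "eval_steps": hp["eval_every_steps"],
        # Assumption: early stopping is usually paired with keeping the best checkpoint
        "load_best_model_at_end": True,
    }


def examples_between_evaluations(hp):
    """Training examples seen between evaluations, assuming one device and no gradient accumulation."""
    return hp["batch_size"] * hp["eval_every_steps"]


trainer_kwargs = to_trainer_kwargs(hyperparameters)
examples_per_eval = examples_between_evaluations(hyperparameters)  # 8 * 25,000 = 200,000
```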

Evaluation

Testing Data

The test set is a manually annotated Finnish dataset of 600 examples, sampled from the "TurkuNLP/Suomi24-toxicity-annotated" dataset. It includes 299 non-toxic examples and 301 toxic examples.

The dataset can be found on GitHub.

Metrics

  • Accuracy (corresponds to micro F1)
  • Precision (macro)
  • Recall (macro)
  • F1 (macro)
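The metrics listed above can be computed as in the sketch below. It is a plain-Python illustration, not the evaluation script used for this model. It takes macro F1 to be the mean of the per-class F1 scores, and the function name and toy labels are made up for the example. It also shows why accuracy corresponds to micro F1 in single-label classification: every misclassified example counts as one false positive and one false negative.

```python
def classification_metrics(y_true, y_pred, labels=("non-toxic", "toxic")):
    """Compute accuracy, micro F1, and macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn

    n = len(labels)
    # Micro F1 pools the counts over all classes before computing the score
    micro_f1 = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "micro_f1": micro_f1,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with one toxic text missed by the classifier
y_true = ["toxic", "toxic", "non-toxic", "non-toxic"]
y_pred = ["toxic", "non-toxic", "non-toxic", "non-toxic"]
scores = classification_metrics(y_true, y_pred)
```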

Results

  • Accuracy: 0.71
  • Precision: 0.73
  • Recall: 0.71
  • F1: 0.71

Citation

BibTeX:

Citation information coming later.
