Model Card for annieske/bert-base-finnish-cased-toxicity

This is a toxicity identification model that classifies Finnish text as either "toxic" or "non-toxic".

Model Details

Model Description

This is a toxicity identification model which classifies a text as either "toxic" or "non-toxic".

Developed by: Anni Eskelinen
Model type: Text classification
Language(s) (NLP): Finnish
Finetuned from model: TurkuNLP/bert-base-finnish-cased-v1

Use

This model is intended to be used as a supporting tool for content moderation.

Bias, Risks, and Limitations

The model can be overly sensitive to toxicity and may classify non-toxic texts as toxic (false positives).
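One common way to reduce false positives from an over-sensitive classifier is to flag a text only when the toxic score exceeds a stricter threshold. The sketch below is a minimal illustration, not part of the original model card. It assumes per-label scores in the list-of-dicts form that a text-classification pipeline can return (for example with top_k=None). The label name "toxic", the threshold 0.9, and the example scores are all hypothetical and should be checked against the model's actual output and tuned on validation data.

```python
def flag_toxic(scores, threshold=0.9, toxic_label="toxic"):
    """Return True only if the toxic score reaches the threshold.

    scores: list of {"label": str, "score": float} dicts for ONE text.
    threshold and toxic_label are illustrative assumptions, not values
    taken from the model card.
    """
    for entry in scores:
        if entry["label"] == toxic_label:
            return entry["score"] >= threshold
    # No entry for the toxic label: treat the text as not flagged.
    return False


# Hypothetical score lists, written by hand for illustration only.
borderline = [{"label": "toxic", "score": 0.62}, {"label": "non-toxic", "score": 0.38}]
clear_case = [{"label": "toxic", "score": 0.97}, {"label": "non-toxic", "score": 0.03}]

borderline_flag = flag_toxic(borderline)  # 0.62 is below 0.9, so not flagged
clear_flag = flag_toxic(clear_case)       # 0.97 is above 0.9, so flagged
```

Raising the threshold trades recall for precision, so borderline texts may be better routed to a human moderator than silently passed.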

How to Get Started with the Model

Use the code below to get started with the model.

>>> import transformers
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("annieske/bert-base-finnish-cased-toxicity")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1")
>>> pipe = transformers.pipeline(task="text-classification", model=model, tokenizer=tokenizer)
>>> pipe("This text is neutral!")
>>> pipe("You suck!")

Training Details

Training Data

The training data consisted of ten datasets for toxicity detection and related tasks. The datasets were machine-translated to Finnish and their labels were unified.

The datasets can be found on GitHub.

Training Procedure

Preprocessing

No preprocessing was done on the training data.

Training Hyperparameters

Learning rate: 1e-05
Batch size: 8
Sequence length: 512
Epochs: 5, with early stopping
Evaluation: every 25,000 steps
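For readers who want to reproduce a similar setup, the values above can be collected into keyword arguments. This is only a sketch: the card does not say which training API was used, so the mapping to transformers.TrainingArguments parameter names is an assumption, as are the use of truncation and the idea that the batch size is per device. The early stopping patience is not stated in the card and is deliberately left out.

```python
# Hyperparameters stated in the card, keyed by the
# transformers.TrainingArguments parameter names they would map to.
# The mapping itself is an assumption, not the author's confirmed setup.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,  # assumes "batch size 8" means per device
    "num_train_epochs": 5,
    "eval_steps": 25_000,
    # Assumption: needed if early stopping is done with
    # transformers.EarlyStoppingCallback, which requires this flag.
    "load_best_model_at_end": True,
}

# Tokenizer settings matching the stated sequence length.
# truncation=True is an assumption about how long texts were handled.
tokenizer_kwargs = {"truncation": True, "max_length": 512}
```

These dictionaries could then be unpacked into TrainingArguments(output_dir=..., **training_kwargs) and into the tokenizer call, along with an evaluation strategy set to step-based evaluation.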

Evaluation

Testing Data

The test set is a manually annotated Finnish dataset of 600 examples, sampled from the "TurkuNLP/Suomi24-toxicity-annotated" dataset. It includes 299 non-toxic and 301 toxic examples.

The dataset can be found on GitHub.

Metrics

Accuracy (corresponds to micro F1)
Precision (macro)
Recall (macro)
F1 (macro)
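The metrics listed above can be computed as in the following sketch, which also shows why accuracy coincides with micro F1 when every example has exactly one label. The function name and the toy labels are illustrative only. They are not the evaluation code or data behind the reported results.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, micro F1, and macro precision/recall/F1 for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[l], tp[l] + fp[l]) for l in labels]
    recalls = [safe_div(tp[l], tp[l] + fn[l]) for l in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    # Micro averaging pools the counts over all labels. Each error is one
    # false positive and one false negative, so micro P = micro R = accuracy.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = safe_div(total_tp, total_tp + total_fp)
    micro_r = safe_div(total_tp, total_tp + total_fn)

    return {
        "accuracy": safe_div(total_tp, len(y_true)),
        "micro_f1": safe_div(2 * micro_p * micro_r, micro_p + micro_r),
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
        "macro_f1": sum(f1s) / len(labels),
    }


# Toy labels for illustration only (not the test set described above).
toy_true = ["toxic", "toxic", "non-toxic", "non-toxic"]
toy_pred = ["toxic", "non-toxic", "non-toxic", "non-toxic"]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Macro averaging weights both classes equally, which is a reasonable choice for a test set that is close to balanced.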

Results

Accuracy: 0.71
Precision: 0.73
Recall: 0.71
F1: 0.71

Citation

BibTeX:

Citation information coming later.
