Model Card for annieske/bert-base-finnish-cased-toxicity
This is a toxicity identification model which classifies text as either "toxic" or "non-toxic".
Model Details
Model Description
This is a toxicity identification model which classifies text as either "toxic" or "non-toxic".
- Developed by: Anni Eskelinen
- Model type: Text classification
- Language(s) (NLP): Finnish
- Finetuned from model: TurkuNLP/bert-base-finnish-cased-v1
Use
This model is intended to be used as a helpful tool for content moderation.
Bias, Risks, and Limitations
The model is sometimes very sensitive to toxicity and might classify non-toxic texts as toxic.
How to Get Started with the Model
Use the code below to get started with the model.
>>> import transformers
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("annieske/bert-base-finnish-cased-toxicity")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1")
>>> pipe = transformers.pipeline(task="text-classification", model=model, tokenizer=tokenizer)
>>> pipe("This text is neutral!")
>>> pipe("You suck!")
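The pipeline returns a list of dicts with a `label` and a `score` per input. As a minimal sketch of turning those outputs into moderation decisions, the helper below applies a confidence threshold, which can also soften the over-sensitivity noted under "Bias, Risks, and Limitations". The label string "toxic" is an assumption here; check `model.config.id2label` for the actual mapping.

```python
# Hedged sketch: convert text-classification pipeline outputs into
# moderation decisions. The "toxic" label string is an assumption;
# inspect model.config.id2label for the real label names.
def flag_toxic(outputs, threshold=0.5):
    """Return True for each output labeled toxic with enough confidence.

    `outputs` is a list of {"label": str, "score": float} dicts as
    returned by a transformers text-classification pipeline.
    A higher threshold trades recall for fewer false positives.
    """
    return [
        o["label"].lower() == "toxic" and o["score"] >= threshold
        for o in outputs
    ]

# Example with mocked pipeline outputs (no model download needed):
mock = [
    {"label": "non-toxic", "score": 0.91},
    {"label": "toxic", "score": 0.97},
    {"label": "toxic", "score": 0.55},
]
print(flag_toxic(mock, threshold=0.8))  # [False, True, False]
```

Raising the threshold above 0.5 keeps only high-confidence toxic predictions, at the cost of letting some borderline toxic texts through.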
Training Details
Training Data
The training data combined ten different toxicity and related-task datasets that were machine translated into Finnish, with their labels unified into a single scheme.
The datasets can be found on GitHub.
Training Procedure
Preprocessing
No preprocessing was done on the training data.
Training Hyperparameters
- learning rate 1e-05
- batch size 8
- sequence length 512
- 5 epochs with early stopping
- evaluation every 25,000 steps
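As a non-authoritative sketch, the hyperparameters above can be mapped onto Hugging Face `TrainingArguments` with an early-stopping callback. The `output_dir` and `early_stopping_patience` values are illustrative assumptions, not taken from the card.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch only: maps the card's hyperparameters onto transformers' Trainer
# config. output_dir and early_stopping_patience are illustrative guesses.
args = TrainingArguments(
    output_dir="output",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    eval_strategy="steps",        # called evaluation_strategy in older versions
    eval_steps=25_000,
    save_steps=25_000,
    load_best_model_at_end=True,  # needed for early stopping on eval metrics
)
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
# Pass `args` and `callbacks=[early_stop]` to transformers.Trainer;
# the 512 sequence length is set on the tokenizer (max_length=512).
```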
Evaluation
Testing Data
A manually annotated Finnish dataset of 600 examples, sampled from the "TurkuNLP/Suomi24-toxicity-annotated" dataset. It includes 299 non-toxic and 301 toxic examples.
The dataset can be found on GitHub.
Metrics
- Accuracy (corresponds to micro F1)
- Precision (macro)
- Recall (macro)
- F1 (macro)
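The metrics above can be illustrated with a small pure-Python computation. For binary single-label classification, accuracy equals micro-averaged F1 (every prediction is counted once as correct or incorrect), which is why the card reports them together. The confusion counts below are made up for illustration, not the model's actual results.

```python
def binary_macro_metrics(tp, fp, fn, tn):
    """Accuracy plus macro precision/recall/F1 for a binary task,
    treating toxic as the positive class."""
    def prf(tp_, fp_, fn_):
        p = tp_ / (tp_ + fp_)
        r = tp_ / (tp_ + fn_)
        return p, r, 2 * p * r / (p + r)
    p_tox, r_tox, f_tox = prf(tp, fp, fn)       # toxic as positive class
    p_non, r_non, f_non = prf(tn, fn, fp)       # non-toxic as positive class
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # == micro F1 for binary tasks
    return {
        "accuracy": accuracy,
        "precision_macro": (p_tox + p_non) / 2,
        "recall_macro": (r_tox + r_non) / 2,
        "f1_macro": (f_tox + f_non) / 2,
    }

# Illustrative confusion counts (600 examples, like the test set):
print(binary_macro_metrics(tp=210, fp=80, fn=91, tn=219))
```

Macro averaging weights both classes equally regardless of their frequency, which matters when toxic examples are rarer than non-toxic ones.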
Results
- Accuracy: 0.71
- Precision: 0.73
- Recall: 0.71
- F1: 0.71
Citation
BibTeX:
Citation information coming later.