Model overview
This model is a binary text classifier designed to identify biodiversity-related commitments in corporate sustainability reports at the paragraph level. It distinguishes commitments from general or descriptive biodiversity-related statements in formal corporate disclosures.
The model classifies paragraphs into two categories:
Commitment (label=1): the paragraph contains a biodiversity-related action, target, or stated intention
Non-commitment (label=0): the paragraph mentions biodiversity but does not contain an action, target, or intention
The model is intended for research use in the analysis of corporate sustainability and ESG disclosures.
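The binary decision rule implied by the label scheme above can be made explicit with a short sketch. The `softmax` and `decode` helpers below are illustrative only (they are not part of the released model, which handles this internally), and the label strings are assumed from the labels described above:

```python
import math

# Label scheme from the model card: 1 = commitment, 0 = non-commitment
ID2LABEL = {0: "non-commitment", 1: "commitment"}

def softmax(logits):
    """Convert raw classifier logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Map a logit pair [non-commitment, commitment] to a label and its probability."""
    probs = softmax(logits)
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

# A paragraph whose 'commitment' logit dominates is labelled 1 (commitment)
label, prob = decode([-1.2, 2.3])
```

In practice these steps happen inside a `transformers` text-classification pipeline; the sketch only spells out how the two logits become one of the two labels.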
Training approach
The model was trained on a curated dataset of 2,000 manually annotated paragraphs extracted from sustainability reports of Fortune Global 500 companies.
Model architecture and training
The classifier is based on climatebert/distilroberta-base-climate-commitment, a DistilRoBERTa-based language model pre-trained on climate-related corpora and previously fine-tuned for commitment detection in environmental disclosures. This model was further fine-tuned for biodiversity-specific commitment classification at the paragraph level.
Key training characteristics include:
unit of analysis: paragraph
maximum sequence length: 256 tokens
task: binary sequence classification
loss function: cross-entropy
optimisation: supervised fine-tuning using the Hugging Face Trainer API
training regime: 5-fold stratified cross-validation
Training was performed on CPU using fixed hyperparameters selected prior to cross-validation. The released model checkpoint corresponds to the fold achieving the highest weighted F1 score.
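The 5-fold stratified regime above can be sketched with scikit-learn's `StratifiedKFold`. The toy paragraphs, labels, and random seed below are placeholders; the card does not specify the actual fold construction details:

```python
from sklearn.model_selection import StratifiedKFold

# Toy stand-ins for annotated paragraphs and their binary labels
texts = [f"paragraph {i}" for i in range(10)]
labels = [0, 1] * 5  # balanced toy labels; the real dataset has 2,000 paragraphs

# Stratification preserves the class ratio in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(texts, labels))

for train_idx, test_idx in folds:
    # Here each held-out fold contains one paragraph per class; in training,
    # the model would be fine-tuned on train_idx and evaluated on test_idx.
    held_out = sorted(labels[i] for i in test_idx)
```

Under this regime, each of the five fine-tuned models is scored on its held-out fold, and the released checkpoint is the fold with the best weighted F1.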
Recommended pipeline
First, use ESGBERT/EnvironmentalBERT-biodiversity to identify biodiversity-related paragraphs; then apply this model to those paragraphs to identify commitments.
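A minimal sketch of this two-stage filtering, with both classifiers stubbed out as plain callables. In real usage each stub would be replaced by a `transformers` text-classification pipeline loaded from the respective checkpoint; the stub keyword rules are illustrative only:

```python
def two_stage(paragraphs, is_biodiversity, is_commitment):
    """Stage 1 keeps biodiversity-related paragraphs;
    stage 2 keeps only those classified as commitments."""
    biodiversity = [p for p in paragraphs if is_biodiversity(p)]
    return [p for p in biodiversity if is_commitment(p)]

# Stub classifiers for illustration only; real usage would wrap, e.g.,
# pipeline("text-classification", model="ESGBERT/EnvironmentalBERT-biodiversity")
# for stage 1 and this model's checkpoint for stage 2.
stub_bio = lambda p: "biodiversity" in p.lower()
stub_commit = lambda p: "we commit" in p.lower() or "target" in p.lower()

paragraphs = [
    "We commit to no net loss of biodiversity by 2030.",
    "Biodiversity is the variety of life on Earth.",
    "Our revenue grew 4% year on year.",
]
hits = two_stage(paragraphs, stub_bio, stub_commit)
```

The second sentence survives stage 1 but is dropped at stage 2, mirroring the card's distinction between biodiversity mentions and biodiversity commitments.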
Evaluation
Performance is reported as averages across 5-fold cross-validation on the annotated dataset:
weighted F1 score: 0.928
weighted precision: 0.930
weighted recall: 0.929
AUC-ROC: 0.976
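For reference, the weighted scores above average per-class metrics weighted by class support. A minimal pure-Python sketch of weighted F1 (equivalent to scikit-learn's `f1_score(..., average="weighted")`), using toy labels rather than the model's actual predictions:

```python
def weighted_f1(y_true, y_pred):
    """F1 per class, averaged with weights proportional to class support."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = sum(t == c for t in y_true)
        score += f1 * support / total
    return score

# Toy example: one commitment paragraph misclassified as non-commitment
score = weighted_f1([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])  # → 0.8
```

Weighted precision and recall are computed the same way, substituting the per-class precision or recall for the per-class F1.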