Model overview

This model is a binary text classifier designed to identify biodiversity-related commitments in corporate sustainability reports at the paragraph level. It distinguishes commitments from general or descriptive biodiversity-related statements in formal corporate disclosures.

The model classifies paragraphs into two categories:

Commitment (label=1): the paragraph contains a biodiversity-related action, target, or stated intention

Non-commitment (label=0): the paragraph mentions biodiversity but does not contain an action, target, or intention

The model is intended for research use in the analysis of corporate sustainability and ESG disclosures.
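A minimal inference sketch follows. The repository id is a placeholder (the card does not state the published model id), and the label names mirror the two categories above; both are assumptions to adjust against the actual model config.

```python
# Placeholder repository id -- substitute the actual model id when known.
MODEL_ID = "your-org/biodiversity-commitment-classifier"  # hypothetical

# Label mapping as defined in this model card.
ID2LABEL = {0: "non-commitment", 1: "commitment"}

def classify(paragraph: str):
    """Score one paragraph, truncating to the 256-token training limit."""
    from transformers import pipeline  # lazy import; model downloads on first use
    clf = pipeline(
        "text-classification",
        model=MODEL_ID,
        truncation=True,
        max_length=256,
    )
    return clf(paragraph)
```

The `truncation`/`max_length` settings match the training configuration described below, so inference-time inputs are handled the same way as training examples.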

Training approach

The model was trained on a curated dataset of 2,000 manually annotated paragraphs extracted from the sustainability reports of Fortune Global 500 companies.

Model architecture and training

The classifier is based on climatebert/distilroberta-base-climate-commitment, a DistilRoBERTa-based language model pre-trained on climate-related corpora and previously fine-tuned for commitment detection in environmental disclosures. This model was further fine-tuned for biodiversity-specific commitment classification at the paragraph level.

Key training characteristics include:

unit of analysis: paragraph

maximum sequence length: 256 tokens

task: binary sequence classification

loss function: cross-entropy

optimisation: supervised fine-tuning using the Hugging Face Trainer API

training regime: 5-fold stratified cross-validation

Training was performed on CPU using fixed hyperparameters selected prior to cross-validation. The released model checkpoint corresponds to the fold achieving the highest weighted F1 score.
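The stratified regime above can be sketched with scikit-learn's StratifiedKFold. The label vector below is a toy stand-in (the true class balance of the 2,000 paragraphs is not stated in this card); in the actual procedure, a fresh Hugging Face Trainer run would fit each train split and the best fold's checkpoint would be released.

```python
from sklearn.model_selection import StratifiedKFold

# Toy label vector standing in for the 2,000 annotated paragraphs;
# the 1:3 class ratio here is an assumption for illustration only.
labels = [1] * 10 + [0] * 30

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(range(len(labels)), labels))

# Stratification keeps the class ratio identical across validation folds,
# which makes the per-fold weighted F1 scores directly comparable.
for train_idx, val_idx in folds:
    n_pos = sum(labels[i] for i in val_idx)
    print(len(val_idx), n_pos)  # 8 examples, 2 positives in every fold
```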

Recommended pipeline

First, use ESGBERT/EnvironmentalBERT-biodiversity to identify biodiversity-related paragraphs, then apply this model to flag commitments among them.

Evaluation

Performance is reported as the average across the five cross-validation folds on the annotated dataset:

weighted F1 score: 0.928

weighted precision: 0.930

weighted recall: 0.929

AUC-ROC: 0.976
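For reference, the per-fold metrics above would be computed with scikit-learn's weighted averaging, which weights each class's score by its support. The sketch below uses toy labels and scores, not data from this model.

```python
from sklearn.metrics import (
    f1_score, precision_score, recall_score, roc_auc_score,
)

# Toy fold: true labels, hard predictions, and positive-class scores.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
y_score = [0.9, 0.4, 0.3, 0.2]

weighted_f1 = f1_score(y_true, y_pred, average="weighted")
weighted_p = precision_score(y_true, y_pred, average="weighted")
weighted_r = recall_score(y_true, y_pred, average="weighted")
auc = roc_auc_score(y_true, y_score)  # needs scores, not hard labels

print(round(weighted_f1, 3), round(weighted_p, 3), round(weighted_r, 3), auc)
# 0.733 0.833 0.75 1.0
```

Note that AUC-ROC is computed from the model's positive-class probabilities, while the weighted F1/precision/recall come from thresholded predictions.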

Released checkpoint: safetensors, 82.3M parameters, F32.