IrwinD / log_sage_reward_model

@@ -2,29 +2,12 @@
 license: apache-2.0
 base_model: distilbert/distilbert-base-uncased
 tags:
-- trl
-- reward-trainer
 - generated_from_trainer
 datasets:
 - hdfs_rlhf_log_summary_dataset
-metrics:
-- accuracy
 model-index:
 - name: log_sage_reward_model
-  results:
-  - task:
-      name: Text Classification
-      type: text-classification
-    dataset:
-      name: hdfs_rlhf_log_summary_dataset
-      type: hdfs_rlhf_log_summary_dataset
-      config: default
-      split: None
-      args: default
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 0.8
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -34,8 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the hdfs_rlhf_log_summary_dataset dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7472
-- Accuracy: 0.8
 ## Model description
@@ -56,43 +38,56 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 1.41e-05
 - train_batch_size: 4
-- eval_batch_size: 16
 - seed: 42
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 40
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Accuracy |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|
-| No log        | 1.0   | 1    | 0.6909          | 1.0      |
-| No log        | 2.0   | 3    | 0.6899          | 0.8      |
-| No log        | 3.0   | 5    | 0.6896          | 0.8      |
-| No log        | 4.0   | 6    | 0.6889          | 0.8      |
-| No log        | 5.0   | 8    | 0.6890          | 0.8      |
-| 0.2839        | 6.0   | 10   | 0.6912          | 0.8      |
-| 0.2839        | 7.0   | 11   | 0.6931          | 0.8      |
-| 0.2839        | 8.0   | 13   | 0.6982          | 0.8      |
-| 0.2839        | 9.0   | 15   | 0.7055          | 0.8      |
-| 0.2839        | 10.0  | 16   | 0.7098          | 0.8      |
-| 0.2839        | 11.0  | 18   | 0.7184          | 0.8      |
-| 0.259         | 12.0  | 20   | 0.7245          | 0.8      |
-| 0.259         | 13.0  | 21   | 0.7259          | 0.8      |
-| 0.259         | 14.0  | 23   | 0.7268          | 0.8      |
-| 0.259         | 15.0  | 25   | 0.7285          | 0.8      |
-| 0.259         | 16.0  | 26   | 0.7294          | 0.8      |
-| 0.259         | 17.0  | 27   | 0.7304          | 0.8      |
-| 0.259         | 18.0  | 29   | 0.7333          | 0.8      |
-| 0.2339        | 19.0  | 31   | 0.7356          | 0.8      |
-| 0.2339        | 20.0  | 32   | 0.7373          | 0.8      |
-| 0.2339        | 21.0  | 34   | 0.7414          | 0.8      |
-| 0.2339        | 22.0  | 36   | 0.7442          | 0.8      |
-| 0.2339        | 23.0  | 37   | 0.7454          | 0.8      |
-| 0.2339        | 24.0  | 39   | 0.7469          | 0.8      |
-| 0.1594        | 24.73 | 40   | 0.7472          | 0.8      |
 ### Framework versions
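The removed `trl` / `reward-trainer` tags and the removed `Accuracy` column suggest that the previous revision was trained as a pairwise (chosen vs. rejected) reward model. The sketch below assumes the Bradley-Terry style objective used by TRL's `RewardTrainer`: the loss is -log σ(r_chosen - r_rejected) averaged over pairs, and accuracy is the share of pairs in which the chosen summary receives the higher reward. The helper names and reward values are hypothetical and are not taken from this repository. When the two rewards are indistinguishable, the loss equals ln 2 ≈ 0.693. That matches where the removed validation-loss column starts (0.6909).

```python
import math

def softplus(x):
    """log(1 + exp(x)), computed without overflowing for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_reward_loss(chosen, rejected):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs."""
    # -log(sigmoid(d)) == softplus(-d), which stays finite for large negative d
    losses = [softplus(-(rc - rr)) for rc, rr in zip(chosen, rejected)]
    return sum(losses) / len(losses)

def pairwise_accuracy(chosen, rejected):
    """Fraction of pairs where the chosen summary gets the strictly higher reward."""
    wins = sum(1 for rc, rr in zip(chosen, rejected) if rc > rr)
    return wins / len(chosen)

# Made-up rewards for illustration only
chosen = [2.0, 0.5, 1.0, 0.3, 1.2]    # rewards for preferred summaries
rejected = [1.0, 0.7, 0.2, 0.1, 0.4]  # rewards for rejected summaries
print(pairwise_reward_loss(chosen, rejected), pairwise_accuracy(chosen, rejected))
```

With equal rewards, for example `pairwise_reward_loss([0.0], [0.0])`, the result is `math.log(2)`. The made-up pairs above yield an accuracy of 4/5, because one rejected summary outscores its chosen counterpart.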

 license: apache-2.0
 base_model: distilbert/distilbert-base-uncased
 tags:
 - generated_from_trainer
 datasets:
 - hdfs_rlhf_log_summary_dataset
 model-index:
 - name: log_sage_reward_model
+  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the hdfs_rlhf_log_summary_dataset dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0005
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 1.41e-05
 - train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 40
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| No log        | 1.0   | 11   | 0.0022          |
+| No log        | 2.0   | 22   | 0.0049          |
+| No log        | 3.0   | 33   | 0.0006          |
+| No log        | 4.0   | 44   | 0.0006          |
+| No log        | 5.0   | 55   | 0.0008          |
+| No log        | 6.0   | 66   | 0.0003          |
+| No log        | 7.0   | 77   | 0.0005          |
+| No log        | 8.0   | 88   | 0.0010          |
+| No log        | 9.0   | 99   | 0.0008          |
+| No log        | 10.0  | 110  | 0.0007          |
+| No log        | 11.0  | 121  | 0.0007          |
+| No log        | 12.0  | 132  | 0.0006          |
+| No log        | 13.0  | 143  | 0.0006          |
+| No log        | 14.0  | 154  | 0.0004          |
+| No log        | 15.0  | 165  | 0.0007          |
+| No log        | 16.0  | 176  | 0.0007          |
+| No log        | 17.0  | 187  | 0.0006          |
+| No log        | 18.0  | 198  | 0.0004          |
+| No log        | 19.0  | 209  | 0.0005          |
+| No log        | 20.0  | 220  | 0.0006          |
+| No log        | 21.0  | 231  | 0.0006          |
+| No log        | 22.0  | 242  | 0.0006          |
+| No log        | 23.0  | 253  | 0.0009          |
+| No log        | 24.0  | 264  | 0.0006          |
+| No log        | 25.0  | 275  | 0.0007          |
+| No log        | 26.0  | 286  | 0.0005          |
+| No log        | 27.0  | 297  | 0.0005          |
+| No log        | 28.0  | 308  | 0.0004          |
+| No log        | 29.0  | 319  | 0.0004          |
+| No log        | 30.0  | 330  | 0.0005          |
+| No log        | 31.0  | 341  | 0.0005          |
+| No log        | 32.0  | 352  | 0.0005          |
+| No log        | 33.0  | 363  | 0.0005          |
+| No log        | 34.0  | 374  | 0.0004          |
+| No log        | 35.0  | 385  | 0.0004          |
+| No log        | 36.0  | 396  | 0.0005          |
+| No log        | 37.0  | 407  | 0.0005          |
+| No log        | 38.0  | 418  | 0.0005          |
+| No log        | 39.0  | 429  | 0.0005          |
+| No log        | 40.0  | 440  | 0.0005          |
 ### Framework versions
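The hyperparameter diff drops `gradient_accumulation_steps: 16` and the derived `total_train_batch_size: 64`. As a result, each optimizer step now sees `train_batch_size: 4` examples. The new results table logs 11 steps per epoch and 440 steps across 40 epochs. The following is a minimal sketch of what that implies for the `linear` scheduler. It assumes a single device and zero warmup, since the card lists no warmup. The function names are illustrative. The decay rule follows the linear-with-warmup schedule in Transformers, under which the rate falls from 1.41e-05 to 0 over the total step count.

```python
def effective_batch_size(per_device_batch, grad_accum_steps=1, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

old_batch = effective_batch_size(4, grad_accum_steps=16)  # previous revision: 64
new_batch = effective_batch_size(4)                        # this revision: 4
total_steps = 11 * 40  # steps per epoch (from the table) x num_epochs

for step in (0, 220, total_steps):
    print(step, linear_lr(step, 1.41e-05, total_steps))
```

Halfway through training (step 220), the rate is half the peak. Because accumulation was removed, the new run takes 16 times as many optimizer steps per example seen as the old configuration did.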

config.json CHANGED

@@ -20,6 +20,7 @@
   "n_heads": 12,
   "n_layers": 6,
   "pad_token_id": 0,
   "qa_dropout": 0.1,
   "seq_classif_dropout": 0.2,
   "sinusoidal_pos_embds": false,

   "n_heads": 12,
   "n_layers": 6,
   "pad_token_id": 0,
+  "problem_type": "regression",
   "qa_dropout": 0.1,
   "seq_classif_dropout": 0.2,
   "sinusoidal_pos_embds": false,

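The only config change is the new `problem_type` field, set to `regression`. In the Transformers sequence-classification heads, DistilBERT's included, this setting selects a mean-squared-error loss on float targets instead of cross-entropy. When the field is unset, the head infers the task from `num_labels` and the label dtype. `num_labels` is not visible in this diff. Below is a minimal plain-Python sketch of that dispatch, assuming it mirrors the library's behaviour. The function names are hypothetical and are not Transformers APIs. Read together with the README change, the evaluation `Loss: 0.0005` is presumably an MSE on regression targets rather than the pairwise loss reported by the previous revision.

```python
def resolve_problem_type(problem_type, num_labels, labels_are_integers):
    """Pick the task the way HF sequence-classification heads do when problem_type is unset."""
    if problem_type is not None:
        return problem_type  # an explicit config value wins
    if num_labels == 1:
        return "regression"
    if labels_are_integers:
        return "single_label_classification"
    return "multi_label_classification"

def mse_loss(predictions, targets):
    """Mean squared error, the loss used when problem_type == 'regression'."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

task = resolve_problem_type("regression", num_labels=1, labels_are_integers=False)
print(task, mse_loss([0.9, 0.1], [1.0, 0.0]))
```
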
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fd16d102d3bd0317100fa11c3a30af9375da5b27c584dec86620177cbc4db177
 size 267829484

 version https://git-lfs.github.com/spec/v1
+oid sha256:16869d0953d5b87a61040dee2caef1db68a03ed0adf7482b276d86381884e93c
 size 267829484

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eca786d42752b9e3500b1efcaf37454a613b921da12e7e7d7739edd9f92291dc
-size 4984

 version https://git-lfs.github.com/spec/v1
+oid sha256:8a519663b9a6387514f11ccce00d19ac348e481362fef0e7f53e66f3b08db7db
+size 4920
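Both binary files are stored through Git LFS, so the diff shows only their pointer text. Each pointer contains three `key value` lines: the spec `version`, the content `oid` (a SHA-256 digest), and the `size` in bytes. The new `model.safetensors` keeps the same size (267829484 bytes) under a new hash. That is consistent with an unchanged architecture carrying retrained weights. `training_args.bin` shrinks from 4984 to 4920 bytes. The following stdlib-only sketch parses a pointer and checks a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers and are not part of git-lfs or `huggingface_hub`.

```python
import hashlib
import os

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at `path` has the size and SHA-256 recorded in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]

# Pointer for the new model.safetensors, copied from the diff above
pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:16869d0953d5b87a61040dee2caef1db68a03ed0adf7482b276d86381884e93c
size 267829484
""")
print(pointer["size"], pointer["oid"][:12])
```

For a local copy of this revision's weights, `matches_pointer("model.safetensors", pointer)` would be expected to return `True`. A copy of the previous revision would fail the hash check even though its size is identical.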