Bonnief committed on
Commit 4eb9e1f · verified · 1 parent: cd068d9

End of training

README.md CHANGED
@@ -3,6 +3,8 @@ library_name: transformers
  base_model: castorini/afriberta_small
  tags:
  - generated_from_trainer
+ metrics:
+ - accuracy
  model-index:
  - name: afriberta-ti-finetuned
    results: []
@@ -14,6 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
  # afriberta-ti-finetuned

  This model is a fine-tuned version of [castorini/afriberta_small](https://huggingface.co/castorini/afriberta_small) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: nan
+ - Accuracy: 0.4430

  ## Model description

all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "epoch": 1.4121545049831776,
+   "eval_accuracy": 0.44298179472842897,
+   "eval_loss": NaN,
+   "eval_runtime": 641.3922,
+   "eval_samples": 141626,
+   "eval_samples_per_second": 220.81,
+   "eval_steps_per_second": 55.203,
+   "perplexity": NaN,
+   "total_flos": 1.1173396591784448e+16,
+   "train_loss": 3.454826737060547,
+   "train_runtime": 11942.9392,
+   "train_samples": 1133010,
+   "train_samples_per_second": 133.97,
+   "train_steps_per_second": 8.373
+ }
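A quick sanity check on the NaN values above: language-modeling training scripts conventionally report perplexity as exp of the mean evaluation loss, so a NaN `eval_loss` necessarily propagates into a NaN `perplexity`. A minimal sketch (constants copied from all_results.json; the exp-of-loss convention is an assumption about how these fields were produced):

```python
import math

# Values copied from all_results.json above.
eval_loss = float("nan")          # "eval_loss": NaN
train_loss = 3.454826737060547    # "train_loss"

# Perplexity is conventionally exp(mean cross-entropy loss),
# so a NaN loss propagates straight through:
print(math.exp(eval_loss))        # nan, matching "perplexity": NaN

# The training loss, by contrast, is finite:
print(round(math.exp(train_loss), 1))  # 31.7, a rough training-set perplexity
```

The NaN evaluation loss points to a numerical problem during evaluation (overflow is one common cause); the finite accuracy of 0.4430 by itself does not rule that out.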
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "epoch": 1.4121545049831776,
+   "eval_accuracy": 0.44298179472842897,
+   "eval_loss": NaN,
+   "eval_runtime": 641.3922,
+   "eval_samples": 141626,
+   "eval_samples_per_second": 220.81,
+   "eval_steps_per_second": 55.203,
+   "perplexity": NaN
+ }
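The throughput fields in eval_results.json are internally consistent, and their ratio hints at the evaluation batch size, which is not logged anywhere in these files. A back-of-envelope check, with all constants copied verbatim and the batch-size reading being my inference:

```python
# Consistency check of the evaluation metrics above (values copied verbatim).
eval_samples = 141626
eval_runtime = 641.3922          # seconds
eval_samples_per_second = 220.81
eval_steps_per_second = 55.203

# samples/s is just the sample count divided by the runtime:
assert round(eval_samples / eval_runtime, 2) == eval_samples_per_second

# The ratio of the two rates implies roughly 4 samples per evaluation step
# (my inference about the batch size, not stated in the files):
print(eval_samples_per_second / eval_steps_per_second)  # ≈ 4.0
```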
runs/May05_14-44-01_41efdc5d42b0/events.out.tfevents.1746468922.41efdc5d42b0.95320.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e03a71e3362fc2ad537a8974b30a144c61a6874a67b933e8ccba1ebbd13cec95
+ size 417
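For reference, a Git LFS pointer like the one above records only the SHA-256 and byte size of the tracked file (here a 417-byte TensorBoard event file); the payload itself lives in LFS storage. A sketch of how such a pointer is derived, using a stand-in payload since the real file's bytes are not part of this diff:

```python
import hashlib
import os
import tempfile

# Stand-in payload; the actual event file's contents are not in this commit view.
payload = b"example tensorboard event bytes"
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# An LFS pointer is just three lines: spec version, sha256 oid, byte size.
oid = hashlib.sha256(payload).hexdigest()
size = os.path.getsize(path)
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{oid}\n"
    f"size {size}\n"
)
print(pointer)
os.unlink(path)
```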
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 1.4121545049831776,
+   "total_flos": 1.1173396591784448e+16,
+   "train_loss": 3.454826737060547,
+   "train_runtime": 11942.9392,
+   "train_samples": 1133010,
+   "train_samples_per_second": 133.97,
+   "train_steps_per_second": 8.373
+ }
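train_results.json cross-checks cleanly against trainer_state.json's `global_step` of 100000, and the ratio of samples seen to optimizer steps suggests an effective batch size of about 16 (my inference; it is not logged in these files):

```python
# Cross-checking train_results.json against global_step = 100000
# from trainer_state.json (all values copied verbatim).
epoch = 1.4121545049831776
train_samples = 1133010
train_runtime = 11942.9392       # seconds
global_step = 100000

# Samples actually processed = dataset size * fraction of epochs completed:
samples_seen = train_samples * epoch
assert round(samples_seen / train_runtime, 2) == 133.97  # train_samples_per_second

# Optimizer steps per second:
assert round(global_step / train_runtime, 3) == 8.373    # train_steps_per_second

# Samples per step, i.e. the implied effective batch size:
print(samples_seen / global_step)  # ≈ 16
```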
trainer_state.json ADDED
@@ -0,0 +1,1443 @@
+ {
+   "best_global_step": null,
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 1.4121545049831776,
+   "eval_steps": 10000,
+   "global_step": 100000,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {"epoch": 0.007060825481106996, "grad_norm": 17.72658920288086, "learning_rate": 9.920000000000002e-06, "loss": 4.3846, "step": 500},
+     {"epoch": 0.014121650962213992, "grad_norm": 20.74139976501465, "learning_rate": 1.9920000000000002e-05, "loss": 4.206, "step": 1000},
+     {"epoch": 0.02118247644332099, "grad_norm": 16.99892234802246, "learning_rate": 1.989979797979798e-05, "loss": 4.2092, "step": 1500},
+     {"epoch": 0.028243301924427984, "grad_norm": 25.406461715698242, "learning_rate": 1.979878787878788e-05, "loss": 4.0983, "step": 2000},
+     {"epoch": 0.03530412740553498, "grad_norm": 18.238832473754883, "learning_rate": 1.969777777777778e-05, "loss": 4.0354, "step": 2500},
+     {"epoch": 0.04236495288664198, "grad_norm": 16.974328994750977, "learning_rate": 1.9596969696969698e-05, "loss": 4.0445, "step": 3000},
+     {"epoch": 0.049425778367748974, "grad_norm": 15.910529136657715, "learning_rate": 1.9495959595959597e-05, "loss": 4.0171, "step": 3500},
+     {"epoch": 0.05648660384885597, "grad_norm": 14.3439359664917, "learning_rate": 1.9394949494949497e-05, "loss": 3.9674, "step": 4000},
+     {"epoch": 0.06354742932996296, "grad_norm": 23.51453399658203, "learning_rate": 1.9293939393939396e-05, "loss": 3.949, "step": 4500},
+     {"epoch": 0.07060825481106996, "grad_norm": 18.576505661010742, "learning_rate": 1.9193131313131315e-05, "loss": 3.9888, "step": 5000},
+     {"epoch": 0.07766908029217696, "grad_norm": 14.903977394104004, "learning_rate": 1.909212121212121e-05, "loss": 3.9265, "step": 5500},
+     {"epoch": 0.08472990577328396, "grad_norm": 16.43514633178711, "learning_rate": 1.8991111111111114e-05, "loss": 3.8921, "step": 6000},
+     {"epoch": 0.09179073125439095, "grad_norm": 23.28466796875, "learning_rate": 1.889010101010101e-05, "loss": 3.9371, "step": 6500},
+     {"epoch": 0.09885155673549795, "grad_norm": 20.507205963134766, "learning_rate": 1.878909090909091e-05, "loss": 3.8411, "step": 7000},
+     {"epoch": 0.10591238221660494, "grad_norm": 16.7827205657959, "learning_rate": 1.868808080808081e-05, "loss": 3.8232, "step": 7500},
+     {"epoch": 0.11297320769771194, "grad_norm": 15.237981796264648, "learning_rate": 1.858707070707071e-05, "loss": 3.8742, "step": 8000},
+     {"epoch": 0.12003403317881893, "grad_norm": 19.958890914916992, "learning_rate": 1.8486262626262628e-05, "loss": 3.8373, "step": 8500},
+     {"epoch": 0.12709485865992592, "grad_norm": 17.775405883789062, "learning_rate": 1.8385252525252527e-05, "loss": 3.7978, "step": 9000},
+     {"epoch": 0.13415568414103293, "grad_norm": 16.42775535583496, "learning_rate": 1.8284242424242427e-05, "loss": 3.8275, "step": 9500},
+     {"epoch": 0.1412165096221399, "grad_norm": 15.955525398254395, "learning_rate": 1.8183232323232326e-05, "loss": 3.7711, "step": 10000},
+     {"epoch": 0.14827733510324692, "grad_norm": 23.253620147705078, "learning_rate": 1.8082222222222222e-05, "loss": 3.7913, "step": 10500},
+     {"epoch": 0.15533816058435393, "grad_norm": 17.985246658325195, "learning_rate": 1.7981212121212125e-05, "loss": 3.8015, "step": 11000},
+     {"epoch": 0.1623989860654609, "grad_norm": 18.045351028442383, "learning_rate": 1.788020202020202e-05, "loss": 3.7623, "step": 11500},
+     {"epoch": 0.16945981154656792, "grad_norm": 14.922831535339355, "learning_rate": 1.777919191919192e-05, "loss": 3.7497, "step": 12000},
+     {"epoch": 0.1765206370276749, "grad_norm": 13.136314392089844, "learning_rate": 1.767838383838384e-05, "loss": 3.7319, "step": 12500},
+     {"epoch": 0.1835814625087819, "grad_norm": 18.23394775390625, "learning_rate": 1.757737373737374e-05, "loss": 3.7274, "step": 13000},
+     {"epoch": 0.1906422879898889, "grad_norm": 17.231151580810547, "learning_rate": 1.747636363636364e-05, "loss": 3.6995, "step": 13500},
+     {"epoch": 0.1977031134709959, "grad_norm": 20.387908935546875, "learning_rate": 1.7375555555555558e-05, "loss": 3.7243, "step": 14000},
+     {"epoch": 0.2047639389521029, "grad_norm": 18.140853881835938, "learning_rate": 1.7274545454545454e-05, "loss": 3.7182, "step": 14500},
+     {"epoch": 0.21182476443320988, "grad_norm": 13.923667907714844, "learning_rate": 1.7173535353535357e-05, "loss": 3.7216, "step": 15000},
+     {"epoch": 0.2188855899143169, "grad_norm": 19.91632843017578, "learning_rate": 1.7072525252525253e-05, "loss": 3.7033, "step": 15500},
+     {"epoch": 0.22594641539542387, "grad_norm": 15.506410598754883, "learning_rate": 1.6971515151515152e-05, "loss": 3.6777, "step": 16000},
+     {"epoch": 0.23300724087653088, "grad_norm": 16.471162796020508, "learning_rate": 1.6870505050505052e-05, "loss": 3.6762, "step": 16500},
+     {"epoch": 0.24006806635763786, "grad_norm": 13.96306324005127, "learning_rate": 1.676949494949495e-05, "loss": 3.6377, "step": 17000},
+     {"epoch": 0.24712889183874487, "grad_norm": 18.485990524291992, "learning_rate": 1.666848484848485e-05, "loss": 3.6504, "step": 17500},
+     {"epoch": 0.25418971731985185, "grad_norm": 14.49156665802002, "learning_rate": 1.6567676767676767e-05, "loss": 3.6563, "step": 18000},
+     {"epoch": 0.2612505428009589, "grad_norm": 25.981748580932617, "learning_rate": 1.646666666666667e-05, "loss": 3.6884, "step": 18500},
+     {"epoch": 0.26831136828206587, "grad_norm": 17.442602157592773, "learning_rate": 1.6365656565656565e-05, "loss": 3.6264, "step": 19000},
+     {"epoch": 0.27537219376317285, "grad_norm": 17.210800170898438, "learning_rate": 1.6264646464646465e-05, "loss": 3.6124, "step": 19500},
+     {"epoch": 0.2824330192442798, "grad_norm": 15.370673179626465, "learning_rate": 1.6163838383838387e-05, "loss": 3.6067, "step": 20000},
+     {"epoch": 0.28949384472538686, "grad_norm": 18.17014503479004, "learning_rate": 1.6062828282828284e-05, "loss": 3.5987, "step": 20500},
+     {"epoch": 0.29655467020649384, "grad_norm": 15.005146026611328, "learning_rate": 1.5961818181818183e-05, "loss": 3.6615, "step": 21000},
+     {"epoch": 0.3036154956876008, "grad_norm": 18.63738441467285, "learning_rate": 1.5860808080808082e-05, "loss": 3.5629, "step": 21500},
+     {"epoch": 0.31067632116870786, "grad_norm": 20.25809097290039, "learning_rate": 1.576e-05, "loss": 3.5566, "step": 22000},
+     {"epoch": 0.31773714664981484, "grad_norm": 15.953923225402832, "learning_rate": 1.56589898989899e-05, "loss": 3.574, "step": 22500},
+     {"epoch": 0.3247979721309218, "grad_norm": 16.332191467285156, "learning_rate": 1.5557979797979797e-05, "loss": 3.5761, "step": 23000},
+     {"epoch": 0.3318587976120288, "grad_norm": 12.492526054382324, "learning_rate": 1.54569696969697e-05, "loss": 3.573, "step": 23500},
+     {"epoch": 0.33891962309313584, "grad_norm": 15.54623031616211, "learning_rate": 1.5355959595959596e-05, "loss": 3.5607, "step": 24000},
+     {"epoch": 0.3459804485742428, "grad_norm": 20.771549224853516, "learning_rate": 1.5255151515151517e-05, "loss": 3.5742, "step": 24500},
+     {"epoch": 0.3530412740553498, "grad_norm": 13.80363655090332, "learning_rate": 1.5154141414141415e-05, "loss": 3.587, "step": 25000},
+     {"epoch": 0.36010209953645683, "grad_norm": 22.49805450439453, "learning_rate": 1.5053131313131316e-05, "loss": 3.5951, "step": 25500},
+     {"epoch": 0.3671629250175638, "grad_norm": 18.43703269958496, "learning_rate": 1.4952121212121214e-05, "loss": 3.549, "step": 26000},
+     {"epoch": 0.3742237504986708, "grad_norm": 26.025529861450195, "learning_rate": 1.4851313131313133e-05, "loss": 3.5285, "step": 26500},
+     {"epoch": 0.3812845759797778, "grad_norm": 17.635419845581055, "learning_rate": 1.4750303030303032e-05, "loss": 3.5189, "step": 27000},
+     {"epoch": 0.3883454014608848, "grad_norm": 16.968599319458008, "learning_rate": 1.4649292929292932e-05, "loss": 3.5697, "step": 27500},
+     {"epoch": 0.3954062269419918, "grad_norm": 19.92640495300293, "learning_rate": 1.454828282828283e-05, "loss": 3.5111, "step": 28000},
+     {"epoch": 0.40246705242309877, "grad_norm": 15.327055931091309, "learning_rate": 1.4447676767676768e-05, "loss": 3.5431, "step": 28500},
+     {"epoch": 0.4095278779042058, "grad_norm": 20.179370880126953, "learning_rate": 1.434666666666667e-05, "loss": 3.512, "step": 29000},
+     {"epoch": 0.4165887033853128, "grad_norm": 19.772951126098633, "learning_rate": 1.4245656565656567e-05, "loss": 3.5076, "step": 29500},
+     {"epoch": 0.42364952886641977, "grad_norm": 14.148645401000977, "learning_rate": 1.4144646464646465e-05, "loss": 3.5435, "step": 30000},
+     {"epoch": 0.43071035434752675, "grad_norm": 18.644636154174805, "learning_rate": 1.4043636363636364e-05, "loss": 3.4785, "step": 30500},
+     {"epoch": 0.4377711798286338, "grad_norm": 17.168704986572266, "learning_rate": 1.3942828282828285e-05, "loss": 3.4956, "step": 31000},
+     {"epoch": 0.44483200530974076, "grad_norm": 19.559444427490234, "learning_rate": 1.3841818181818183e-05, "loss": 3.4984, "step": 31500},
+     {"epoch": 0.45189283079084774, "grad_norm": 13.651810646057129, "learning_rate": 1.374080808080808e-05, "loss": 3.5154, "step": 32000},
+     {"epoch": 0.4589536562719548, "grad_norm": 14.946928977966309, "learning_rate": 1.3639797979797982e-05, "loss": 3.5427, "step": 32500},
+     {"epoch": 0.46601448175306176, "grad_norm": 17.958921432495117, "learning_rate": 1.353878787878788e-05, "loss": 3.4956, "step": 33000},
+     {"epoch": 0.47307530723416874, "grad_norm": 17.883209228515625, "learning_rate": 1.343777777777778e-05, "loss": 3.5101, "step": 33500},
+     {"epoch": 0.4801361327152757, "grad_norm": 14.099715232849121, "learning_rate": 1.3336767676767677e-05, "loss": 3.4633, "step": 34000},
+     {"epoch": 0.48719695819638276, "grad_norm": 18.380918502807617, "learning_rate": 1.3235757575757578e-05, "loss": 3.4568, "step": 34500},
+     {"epoch": 0.49425778367748974, "grad_norm": 15.116865158081055, "learning_rate": 1.3134949494949496e-05, "loss": 3.4707, "step": 35000},
+     {"epoch": 0.5013186091585967, "grad_norm": 15.808046340942383, "learning_rate": 1.3033939393939395e-05, "loss": 3.4796, "step": 35500},
+     {"epoch": 0.5083794346397037, "grad_norm": 13.205853462219238, "learning_rate": 1.2932929292929294e-05, "loss": 3.447, "step": 36000},
+     {"epoch": 0.5154402601208107, "grad_norm": 18.99825096130371, "learning_rate": 1.2831919191919194e-05, "loss": 3.462, "step": 36500},
+     {"epoch": 0.5225010856019178, "grad_norm": 15.703340530395508, "learning_rate": 1.2731111111111111e-05, "loss": 3.5031, "step": 37000},
+     {"epoch": 0.5295619110830248, "grad_norm": 21.517732620239258, "learning_rate": 1.2630101010101011e-05, "loss": 3.4986, "step": 37500},
+     {"epoch": 0.5366227365641317, "grad_norm": 16.335920333862305, "learning_rate": 1.252909090909091e-05, "loss": 3.4656, "step": 38000},
+     {"epoch": 0.5436835620452387, "grad_norm": 16.09105110168457, "learning_rate": 1.242808080808081e-05, "loss": 3.462, "step": 38500},
+     {"epoch": 0.5507443875263457, "grad_norm": 18.461313247680664, "learning_rate": 1.2327272727272727e-05, "loss": 3.4566, "step": 39000},
+     {"epoch": 0.5578052130074527, "grad_norm": 16.069425582885742, "learning_rate": 1.2226464646464646e-05, "loss": 3.4643, "step": 39500},
+     {"epoch": 0.5648660384885597, "grad_norm": 19.436756134033203, "learning_rate": 1.2125454545454547e-05, "loss": 3.4813, "step": 40000},
+     {"epoch": 0.5719268639696667, "grad_norm": 17.705595016479492, "learning_rate": 1.2024444444444445e-05, "loss": 3.4515, "step": 40500},
+     {"epoch": 0.5789876894507737, "grad_norm": 16.269590377807617, "learning_rate": 1.1923434343434343e-05, "loss": 3.4066, "step": 41000},
+     {"epoch": 0.5860485149318807, "grad_norm": 15.331748008728027, "learning_rate": 1.1822626262626264e-05, "loss": 3.4035, "step": 41500},
+     {"epoch": 0.5931093404129877, "grad_norm": 14.219168663024902, "learning_rate": 1.1721616161616163e-05, "loss": 3.4631, "step": 42000},
+     {"epoch": 0.6001701658940947, "grad_norm": 15.630831718444824, "learning_rate": 1.1620606060606061e-05, "loss": 3.385, "step": 42500},
+     {"epoch": 0.6072309913752016, "grad_norm": 15.31431770324707, "learning_rate": 1.1519595959595959e-05, "loss": 3.4145, "step": 43000},
+     {"epoch": 0.6142918168563086, "grad_norm": 22.632341384887695, "learning_rate": 1.141858585858586e-05, "loss": 3.428, "step": 43500},
+     {"epoch": 0.6213526423374157, "grad_norm": 18.16883659362793, "learning_rate": 1.1317575757575758e-05, "loss": 3.4082, "step": 44000},
+     {"epoch": 0.6284134678185227, "grad_norm": 13.323620796203613, "learning_rate": 1.1216565656565657e-05, "loss": 3.4373, "step": 44500},
+     {"epoch": 0.6354742932996297, "grad_norm": 15.68764591217041, "learning_rate": 1.1115555555555557e-05, "loss": 3.4315, "step": 45000},
+     {"epoch": 0.6425351187807367, "grad_norm": 18.161067962646484, "learning_rate": 1.1014747474747476e-05, "loss": 3.4019, "step": 45500},
+     {"epoch": 0.6495959442618436, "grad_norm": 16.415048599243164, "learning_rate": 1.0913737373737374e-05, "loss": 3.4094, "step": 46000},
+     {"epoch": 0.6566567697429506, "grad_norm": 15.824934959411621, "learning_rate": 1.0812727272727273e-05, "loss": 3.4408, "step": 46500},
+     {"epoch": 0.6637175952240576, "grad_norm": 14.806204795837402, "learning_rate": 1.0711717171717173e-05, "loss": 3.4002, "step": 47000},
+     {"epoch": 0.6707784207051647, "grad_norm": 22.690622329711914, "learning_rate": 1.0610707070707072e-05, "loss": 3.4085, "step": 47500},
+     {"epoch": 0.6778392461862717, "grad_norm": 13.887063026428223, "learning_rate": 1.050969696969697e-05, "loss": 3.3532, "step": 48000},
+     {"epoch": 0.6849000716673787, "grad_norm": 14.655476570129395, "learning_rate": 1.0408686868686871e-05, "loss": 3.3489, "step": 48500},
+     {"epoch": 0.6919608971484856, "grad_norm": 16.351926803588867, "learning_rate": 1.0307676767676769e-05, "loss": 3.3768, "step": 49000},
+     {"epoch": 0.6990217226295926, "grad_norm": 23.84049415588379, "learning_rate": 1.0206868686868688e-05, "loss": 3.3656, "step": 49500},
+     {"epoch": 0.7060825481106996, "grad_norm": 20.276376724243164, "learning_rate": 1.0105858585858586e-05, "loss": 3.3996, "step": 50000},
+     {"epoch": 0.7131433735918066, "grad_norm": 19.38592529296875, "learning_rate": 1.0004848484848487e-05, "loss": 3.4112, "step": 50500},
+     {"epoch": 0.7202041990729137, "grad_norm": 17.104467391967773, "learning_rate": 9.903838383838385e-06, "loss": 3.3521, "step": 51000},
+     {"epoch": 0.7272650245540206, "grad_norm": 18.661767959594727, "learning_rate": 9.802828282828284e-06, "loss": 3.4185, "step": 51500},
+     {"epoch": 0.7343258500351276, "grad_norm": 17.60913848876953, "learning_rate": 9.702020202020203e-06, "loss": 3.3682, "step": 52000},
+     {"epoch": 0.7413866755162346, "grad_norm": 22.111852645874023, "learning_rate": 9.601010101010103e-06, "loss": 3.334, "step": 52500},
+     {"epoch": 0.7484475009973416, "grad_norm": 12.673060417175293, "learning_rate": 9.5e-06, "loss": 3.3648, "step": 53000},
+     {"epoch": 0.7555083264784486, "grad_norm": 15.114596366882324, "learning_rate": 9.3989898989899e-06, "loss": 3.3536, "step": 53500},
+     {"epoch": 0.7625691519595555, "grad_norm": 15.487835884094238, "learning_rate": 9.298181818181819e-06, "loss": 3.3503, "step": 54000},
+     {"epoch": 0.7696299774406626, "grad_norm": 18.261688232421875, "learning_rate": 9.197171717171719e-06, "loss": 3.3773, "step": 54500},
+     {"epoch": 0.7766908029217696, "grad_norm": 23.699548721313477, "learning_rate": 9.096161616161618e-06, "loss": 3.3165, "step": 55000},
+     {"epoch": 0.7837516284028766, "grad_norm": 15.470009803771973, "learning_rate": 8.995151515151516e-06, "loss": 3.3609, "step": 55500},
+     {"epoch": 0.7908124538839836, "grad_norm": 16.061735153198242, "learning_rate": 8.894141414141415e-06, "loss": 3.3722, "step": 56000},
+     {"epoch": 0.7978732793650906, "grad_norm": 18.046903610229492, "learning_rate": 8.793333333333334e-06, "loss": 3.3175, "step": 56500},
+     {"epoch": 0.8049341048461975, "grad_norm": 15.547821998596191, "learning_rate": 8.692323232323234e-06, "loss": 3.3002, "step": 57000},
+     {"epoch": 0.8119949303273045, "grad_norm": 15.785500526428223, "learning_rate": 8.591313131313132e-06, "loss": 3.3243, "step": 57500},
+     {"epoch": 0.8190557558084116, "grad_norm": 19.183290481567383, "learning_rate": 8.490303030303031e-06, "loss": 3.3658, "step": 58000},
+     {"epoch": 0.8261165812895186, "grad_norm": 17.534652709960938, "learning_rate": 8.38969696969697e-06, "loss": 3.3541, "step": 58500},
+     {"epoch": 0.8331774067706256, "grad_norm": 14.905038833618164, "learning_rate": 8.28868686868687e-06, "loss": 3.3344, "step": 59000},
+     {"epoch": 0.8402382322517326, "grad_norm": 18.182802200317383, "learning_rate": 8.187676767676769e-06, "loss": 3.3364, "step": 59500},
+     {"epoch": 0.8472990577328395, "grad_norm": 18.967235565185547, "learning_rate": 8.086666666666667e-06, "loss": 3.3446, "step": 60000},
+     {"epoch": 0.8543598832139465, "grad_norm": 18.88031768798828, "learning_rate": 7.985656565656566e-06, "loss": 3.3677, "step": 60500},
+     {"epoch": 0.8614207086950535, "grad_norm": 15.02967643737793, "learning_rate": 7.884646464646466e-06, "loss": 3.3527, "step": 61000},
+     {"epoch": 0.8684815341761606, "grad_norm": 13.849907875061035, "learning_rate": 7.783636363636365e-06, "loss": 3.3243, "step": 61500},
+     {"epoch": 0.8755423596572676, "grad_norm": 15.146381378173828, "learning_rate": 7.682828282828282e-06, "loss": 3.3019, "step": 62000},
+     {"epoch": 0.8826031851383745, "grad_norm": 20.377721786499023, "learning_rate": 7.581818181818183e-06, "loss": 3.3191, "step": 62500},
+     {"epoch": 0.8896640106194815, "grad_norm": 17.49558448791504, "learning_rate": 7.480808080808082e-06, "loss": 3.3441, "step": 63000},
+     {"epoch": 0.8967248361005885, "grad_norm": 16.852956771850586, "learning_rate": 7.37979797979798e-06, "loss": 3.2858, "step": 63500},
+     {"epoch": 0.9037856615816955, "grad_norm": 23.284814834594727, "learning_rate": 7.2787878787878795e-06, "loss": 3.2968, "step": 64000},
+     {"epoch": 0.9108464870628025, "grad_norm": 18.788135528564453, "learning_rate": 7.177777777777778e-06, "loss": 3.3394, "step": 64500},
+     {"epoch": 0.9179073125439096, "grad_norm": 15.641369819641113, "learning_rate": 7.076767676767678e-06, "loss": 3.3051, "step": 65000},
+     {"epoch": 0.9249681380250165, "grad_norm": 37.24586486816406, "learning_rate": 6.975757575757577e-06, "loss": 3.3256, "step": 65500},
+     {"epoch": 0.9320289635061235, "grad_norm": 20.65641975402832, "learning_rate": 6.874747474747475e-06, "loss": 3.3239, "step": 66000},
+     {"epoch": 0.9390897889872305, "grad_norm": 16.292463302612305, "learning_rate": 6.773939393939395e-06, "loss": 3.2797, "step": 66500},
+     {"epoch": 0.9461506144683375, "grad_norm": 14.888529777526855, "learning_rate": 6.673131313131314e-06, "loss": 3.2913, "step": 67000},
+     {"epoch": 0.9532114399494445, "grad_norm": 17.32676887512207, "learning_rate": 6.572121212121213e-06, "loss": 3.2906, "step": 67500},
+     {"epoch": 0.9602722654305514, "grad_norm": 21.099210739135742, "learning_rate": 6.471111111111111e-06, "loss": 3.3143, "step": 68000},
+     {"epoch": 0.9673330909116585, "grad_norm": 20.450485229492188, "learning_rate": 6.370101010101011e-06, "loss": 3.2968, "step": 68500},
+     {"epoch": 0.9743939163927655, "grad_norm": 17.318950653076172, "learning_rate": 6.269090909090909e-06, "loss": 3.2946, "step": 69000},
+     {"epoch": 0.9814547418738725, "grad_norm": 18.73086929321289, "learning_rate": 6.168080808080809e-06, "loss": 3.303, "step": 69500},
+     {"epoch": 0.9885155673549795, "grad_norm": 18.808128356933594, "learning_rate": 6.067070707070708e-06, "loss": 3.2916, "step": 70000},
+     {"epoch": 0.9955763928360865, "grad_norm": 19.83098793029785, "learning_rate": 5.966060606060606e-06, "loss": 3.2576, "step": 70500},
+     {"epoch": 1.0026266270789719, "grad_norm": 22.168275833129883, "learning_rate": 5.865252525252526e-06, "loss": 3.3076, "step": 71000},
+     {"epoch": 1.0096874525600787, "grad_norm": 14.332313537597656, "learning_rate": 5.764242424242425e-06, "loss": 3.2915, "step": 71500},
+     {"epoch": 1.0167482780411858, "grad_norm": 16.781579971313477, "learning_rate": 5.663232323232324e-06, "loss": 3.2836, "step": 72000},
+     {"epoch": 1.0238091035222927, "grad_norm": 18.176483154296875, "learning_rate": 5.562222222222222e-06, "loss": 3.2945, "step": 72500},
+     {"epoch": 1.0308699290033998, "grad_norm": 16.951435089111328, "learning_rate": 5.461414141414142e-06, "loss": 3.3042, "step": 73000},
+     {"epoch": 1.037930754484507, "grad_norm": 17.18007469177246, "learning_rate": 5.360404040404041e-06, "loss": 3.2748, "step": 73500},
+     {"epoch": 1.0449915799656138, "grad_norm": 15.710241317749023, "learning_rate": 5.25939393939394e-06, "loss": 3.2725, "step": 74000},
+     {"epoch": 1.0520524054467209, "grad_norm": 16.880611419677734, "learning_rate": 5.15858585858586e-06, "loss": 3.2747, "step": 74500},
+     {"epoch": 1.0591132309278277, "grad_norm": 14.415606498718262, "learning_rate": 5.057575757575758e-06, "loss": 3.2699, "step": 75000},
+     {"epoch": 1.0661740564089348, "grad_norm": 16.14620590209961, "learning_rate": 4.956565656565657e-06, "loss": 3.266, "step": 75500},
+     {"epoch": 1.0732348818900417, "grad_norm": 15.977879524230957, "learning_rate": 4.855555555555556e-06, "loss": 3.2763, "step": 76000},
+     {"epoch": 1.0802957073711488, "grad_norm": 21.809913635253906, "learning_rate": 4.754545454545455e-06, "loss": 3.3093, "step": 76500},
+     {"epoch": 1.0873565328522559, "grad_norm": 19.541011810302734, "learning_rate": 4.653535353535354e-06, "loss": 3.2863, "step": 77000},
+     {"epoch": 1.0944173583333627, "grad_norm": 20.050395965576172, "learning_rate": 4.5525252525252525e-06, "loss": 3.2623, "step": 77500},
+     {"epoch": 1.1014781838144698, "grad_norm": 15.69351577758789, "learning_rate": 4.451515151515152e-06, "loss": 3.2633, "step": 78000},
+     {"epoch": 1.1085390092955767, "grad_norm": 17.333478927612305, "learning_rate": 4.350707070707071e-06, "loss": 3.2609, "step": 78500},
+     {"epoch": 1.1155998347766838,
1113
+ "grad_norm": 21.0096435546875,
1114
+ "learning_rate": 4.24989898989899e-06,
1115
+ "loss": 3.2834,
1116
+ "step": 79000
1117
+ },
1118
+ {
1119
+ "epoch": 1.1226606602577907,
1120
+ "grad_norm": 18.113319396972656,
1121
+ "learning_rate": 4.148888888888889e-06,
1122
+ "loss": 3.2634,
1123
+ "step": 79500
1124
+ },
1125
+ {
1126
+ "epoch": 1.1297214857388977,
1127
+ "grad_norm": 11.683279037475586,
1128
+ "learning_rate": 4.047878787878788e-06,
1129
+ "loss": 3.2916,
1130
+ "step": 80000
1131
+ },
1132
+ {
1133
+ "epoch": 1.1367823112200046,
1134
+ "grad_norm": 16.4996280670166,
1135
+ "learning_rate": 3.946868686868687e-06,
1136
+ "loss": 3.2395,
1137
+ "step": 80500
1138
+ },
1139
+ {
1140
+ "epoch": 1.1438431367011117,
1141
+ "grad_norm": 16.545175552368164,
1142
+ "learning_rate": 3.845858585858586e-06,
1143
+ "loss": 3.2636,
1144
+ "step": 81000
1145
+ },
1146
+ {
1147
+ "epoch": 1.1509039621822188,
1148
+ "grad_norm": 18.139225006103516,
1149
+ "learning_rate": 3.744848484848485e-06,
1150
+ "loss": 3.2226,
1151
+ "step": 81500
1152
+ },
1153
+ {
1154
+ "epoch": 1.1579647876633257,
1155
+ "grad_norm": 18.051326751708984,
1156
+ "learning_rate": 3.643838383838384e-06,
1157
+ "loss": 3.2456,
1158
+ "step": 82000
1159
+ },
1160
+ {
1161
+ "epoch": 1.1650256131444328,
1162
+ "grad_norm": 13.863969802856445,
1163
+ "learning_rate": 3.543030303030303e-06,
1164
+ "loss": 3.2267,
1165
+ "step": 82500
1166
+ },
1167
+ {
1168
+ "epoch": 1.1720864386255396,
1169
+ "grad_norm": 18.651126861572266,
1170
+ "learning_rate": 3.442020202020202e-06,
1171
+ "loss": 3.2624,
1172
+ "step": 83000
1173
+ },
1174
+ {
1175
+ "epoch": 1.1791472641066467,
1176
+ "grad_norm": 20.253204345703125,
1177
+ "learning_rate": 3.3410101010101017e-06,
1178
+ "loss": 3.2335,
1179
+ "step": 83500
1180
+ },
1181
+ {
1182
+ "epoch": 1.1862080895877538,
1183
+ "grad_norm": 15.840412139892578,
1184
+ "learning_rate": 3.2400000000000003e-06,
1185
+ "loss": 3.2268,
1186
+ "step": 84000
1187
+ },
1188
+ {
1189
+ "epoch": 1.1932689150688607,
1190
+ "grad_norm": 16.637964248657227,
1191
+ "learning_rate": 3.1389898989898994e-06,
1192
+ "loss": 3.2617,
1193
+ "step": 84500
1194
+ },
1195
+ {
1196
+ "epoch": 1.2003297405499678,
1197
+ "grad_norm": 13.255701065063477,
1198
+ "learning_rate": 3.037979797979798e-06,
1199
+ "loss": 3.2882,
1200
+ "step": 85000
1201
+ },
1202
+ {
1203
+ "epoch": 1.2073905660310746,
1204
+ "grad_norm": 18.700159072875977,
1205
+ "learning_rate": 2.936969696969697e-06,
1206
+ "loss": 3.245,
1207
+ "step": 85500
1208
+ },
1209
+ {
1210
+ "epoch": 1.2144513915121817,
1211
+ "grad_norm": 15.274190902709961,
1212
+ "learning_rate": 2.8359595959595965e-06,
1213
+ "loss": 3.279,
1214
+ "step": 86000
1215
+ },
1216
+ {
1217
+ "epoch": 1.2215122169932886,
1218
+ "grad_norm": 16.90398597717285,
1219
+ "learning_rate": 2.735151515151515e-06,
1220
+ "loss": 3.221,
1221
+ "step": 86500
1222
+ },
1223
+ {
1224
+ "epoch": 1.2285730424743957,
1225
+ "grad_norm": 15.663399696350098,
1226
+ "learning_rate": 2.6343434343434343e-06,
1227
+ "loss": 3.2685,
1228
+ "step": 87000
1229
+ },
1230
+ {
1231
+ "epoch": 1.2356338679555026,
1232
+ "grad_norm": 15.966713905334473,
1233
+ "learning_rate": 2.5333333333333338e-06,
1234
+ "loss": 3.2385,
1235
+ "step": 87500
1236
+ },
1237
+ {
1238
+ "epoch": 1.2426946934366097,
1239
+ "grad_norm": 18.072582244873047,
1240
+ "learning_rate": 2.432323232323233e-06,
1241
+ "loss": 3.2613,
1242
+ "step": 88000
1243
+ },
1244
+ {
1245
+ "epoch": 1.2497555189177167,
1246
+ "grad_norm": 13.807425498962402,
1247
+ "learning_rate": 2.3313131313131315e-06,
1248
+ "loss": 3.2489,
1249
+ "step": 88500
1250
+ },
1251
+ {
1252
+ "epoch": 1.2568163443988236,
1253
+ "grad_norm": 22.68337059020996,
1254
+ "learning_rate": 2.2303030303030305e-06,
1255
+ "loss": 3.2249,
1256
+ "step": 89000
1257
+ },
1258
+ {
1259
+ "epoch": 1.2638771698799307,
1260
+ "grad_norm": 17.15445327758789,
1261
+ "learning_rate": 2.1292929292929296e-06,
1262
+ "loss": 3.255,
1263
+ "step": 89500
1264
+ },
1265
+ {
1266
+ "epoch": 1.2709379953610376,
1267
+ "grad_norm": 21.880109786987305,
1268
+ "learning_rate": 2.0282828282828286e-06,
1269
+ "loss": 3.2334,
1270
+ "step": 90000
1271
+ },
1272
+ {
1273
+ "epoch": 1.2779988208421447,
1274
+ "grad_norm": 19.8821964263916,
1275
+ "learning_rate": 1.9272727272727273e-06,
1276
+ "loss": 3.2677,
1277
+ "step": 90500
1278
+ },
1279
+ {
1280
+ "epoch": 1.2850596463232518,
1281
+ "grad_norm": 16.566150665283203,
1282
+ "learning_rate": 1.8264646464646466e-06,
1283
+ "loss": 3.2446,
1284
+ "step": 91000
1285
+ },
1286
+ {
1287
+ "epoch": 1.2921204718043586,
1288
+ "grad_norm": 18.1463680267334,
1289
+ "learning_rate": 1.7254545454545456e-06,
1290
+ "loss": 3.2346,
1291
+ "step": 91500
1292
+ },
1293
+ {
1294
+ "epoch": 1.2991812972854657,
1295
+ "grad_norm": 15.523479461669922,
1296
+ "learning_rate": 1.6244444444444447e-06,
1297
+ "loss": 3.261,
1298
+ "step": 92000
1299
+ },
1300
+ {
1301
+ "epoch": 1.3062421227665726,
1302
+ "grad_norm": 16.982173919677734,
1303
+ "learning_rate": 1.5234343434343435e-06,
1304
+ "loss": 3.253,
1305
+ "step": 92500
1306
+ },
1307
+ {
1308
+ "epoch": 1.3133029482476797,
1309
+ "grad_norm": 14.034078598022461,
1310
+ "learning_rate": 1.4226262626262626e-06,
1311
+ "loss": 3.2074,
1312
+ "step": 93000
1313
+ },
1314
+ {
1315
+ "epoch": 1.3203637737287865,
1316
+ "grad_norm": 20.547330856323242,
1317
+ "learning_rate": 1.3216161616161619e-06,
1318
+ "loss": 3.2563,
1319
+ "step": 93500
1320
+ },
1321
+ {
1322
+ "epoch": 1.3274245992098936,
1323
+ "grad_norm": 16.003023147583008,
1324
+ "learning_rate": 1.2206060606060607e-06,
1325
+ "loss": 3.2645,
1326
+ "step": 94000
1327
+ },
1328
+ {
1329
+ "epoch": 1.3344854246910005,
1330
+ "grad_norm": 22.99709701538086,
1331
+ "learning_rate": 1.1195959595959596e-06,
1332
+ "loss": 3.2235,
1333
+ "step": 94500
1334
+ },
1335
+ {
1336
+ "epoch": 1.3415462501721076,
1337
+ "grad_norm": 15.624763488769531,
1338
+ "learning_rate": 1.0187878787878789e-06,
1339
+ "loss": 3.2188,
1340
+ "step": 95000
1341
+ },
1342
+ {
1343
+ "epoch": 1.3486070756532147,
1344
+ "grad_norm": 13.590816497802734,
1345
+ "learning_rate": 9.177777777777778e-07,
1346
+ "loss": 3.2178,
1347
+ "step": 95500
1348
+ },
1349
+ {
1350
+ "epoch": 1.3556679011343216,
1351
+ "grad_norm": 16.545881271362305,
1352
+ "learning_rate": 8.167676767676769e-07,
1353
+ "loss": 3.2647,
1354
+ "step": 96000
1355
+ },
1356
+ {
1357
+ "epoch": 1.3627287266154287,
1358
+ "grad_norm": 16.860748291015625,
1359
+ "learning_rate": 7.157575757575757e-07,
1360
+ "loss": 3.271,
1361
+ "step": 96500
1362
+ },
1363
+ {
1364
+ "epoch": 1.3697895520965355,
1365
+ "grad_norm": 19.71615219116211,
1366
+ "learning_rate": 6.14949494949495e-07,
1367
+ "loss": 3.2211,
1368
+ "step": 97000
1369
+ },
1370
+ {
1371
+ "epoch": 1.3768503775776426,
1372
+ "grad_norm": 14.094860076904297,
1373
+ "learning_rate": 5.13939393939394e-07,
1374
+ "loss": 3.2333,
1375
+ "step": 97500
1376
+ },
1377
+ {
1378
+ "epoch": 1.3839112030587497,
1379
+ "grad_norm": 16.053531646728516,
1380
+ "learning_rate": 4.131313131313132e-07,
1381
+ "loss": 3.1983,
1382
+ "step": 98000
1383
+ },
1384
+ {
1385
+ "epoch": 1.3909720285398566,
1386
+ "grad_norm": 19.86145782470703,
1387
+ "learning_rate": 3.1212121212121213e-07,
1388
+ "loss": 3.2367,
1389
+ "step": 98500
1390
+ },
1391
+ {
1392
+ "epoch": 1.3980328540209637,
1393
+ "grad_norm": 16.2021427154541,
1394
+ "learning_rate": 2.1111111111111113e-07,
1395
+ "loss": 3.2032,
1396
+ "step": 99000
1397
+ },
1398
+ {
1399
+ "epoch": 1.4050936795020705,
1400
+ "grad_norm": 16.72536849975586,
1401
+ "learning_rate": 1.101010101010101e-07,
1402
+ "loss": 3.2313,
1403
+ "step": 99500
1404
+ },
1405
+ {
1406
+ "epoch": 1.4121545049831776,
1407
+ "grad_norm": 17.82211685180664,
1408
+ "learning_rate": 9.090909090909092e-09,
1409
+ "loss": 3.2355,
1410
+ "step": 100000
1411
+ },
1412
+ {
1413
+ "epoch": 1.4121545049831776,
1414
+ "step": 100000,
1415
+ "total_flos": 1.1173396591784448e+16,
1416
+ "train_loss": 3.454826737060547,
1417
+ "train_runtime": 11942.9392,
1418
+ "train_samples_per_second": 133.97,
1419
+ "train_steps_per_second": 8.373
1420
+ }
1421
+ ],
1422
+ "logging_steps": 500,
1423
+ "max_steps": 100000,
1424
+ "num_input_tokens_seen": 0,
1425
+ "num_train_epochs": 2,
1426
+ "save_steps": 10000,
1427
+ "stateful_callbacks": {
1428
+ "TrainerControl": {
1429
+ "args": {
1430
+ "should_epoch_stop": false,
1431
+ "should_evaluate": false,
1432
+ "should_log": false,
1433
+ "should_save": true,
1434
+ "should_training_stop": true
1435
+ },
1436
+ "attributes": {}
1437
+ }
1438
+ },
1439
+ "total_flos": 1.1173396591784448e+16,
1440
+ "train_batch_size": 4,
1441
+ "trial_name": null,
1442
+ "trial_params": null
1443
+ }