Commit a107131 (verified) by oneryalcin · Parent: 7165fdb

Upload 2-epoch fine-tuned sparse encoder for financial documents

README.md ADDED
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- asymmetric
- inference-free
- splade
- generated_from_trainer
- dataset_size:18247
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
base_model: opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
widget:
- text: '```markdown

    Interval UK Holdings Limited

    Balance sheet

    As at 31 December 2024


    || Note | 2024 £ | 2023 £ |

    |-----------------------|------|--------|--------|

    | Non current assets||||

    | Investments| 9| -| -|

    | Current assets||||

    | Cash at bank and in hand || 115,921| 111,205|

    ||| 115,921| 111,205|

    | Creditors: amounts falling due within one year | 10 | (100,017)| (100,344)|

    | Net current assets|| 15,904 | 10,861 |

    | Creditors: amounts falling due after more than one year | 11 | (430,000)|
    (430,000)|

    | Net liabilities|| (414,096)| (419,139)|

    | Capital and reserves ||||

    | Called-up share capital | 12 | 19,811,905| 19,811,905|

    | Profit and loss account || (20,226,001)| (20,231,044)|

    | Shareholder''s deficit || (414,096)| (419,139)|


    The financial statements of Interval UK Holdings Ltd (03000895) were approved
    by the Board on 25th September 2025

    and signed on its behalf by:


    D

    DP Ettridge

    Director


    10

    ```'
- text: '25 | Page


    2023 (£) 2022 (£) Raw materials and consumables 62,693 50,917 Finished goods and
    goods for resale 61,306 110,746 Total 123,999 161,663


    An impairment loss of £36,779 (2022: gain of £33,801) was recognised in cost of
    sales against stock during the year due to revaluation criteria. The value of
    stock at year end is not materially different to its replacement cost.


    Stock


    Of the amounts owed by group undertakings: £ 398,051 (2022: £1,972,482) are trade
    related, unsecured and have an interest charge of 0%. £ 3,900,000 (2022: £1,700,000)
    are a deposit, unsecured and have an interest charge of 5.22%.


    Stock recognised in cost of sales during the year as an expense was £241,612 (2022:
    £258,218).


    Cash at bank and in hand


    2023 (£) 2022 (£) Trade debtors 544,252 739,053 Amounts owed by group undertakings
    4,298,051 3,672,482 Other debtors 75,183 3,866 Corporation Tax receivable 38,958
    - Prepayments and accrued income 45,623 44,125 Total 5,002,067 4,459,526


    Trade debtors are stated after provisions for impairment of £217,545 (2022: £220,257)


    2023 (£) 2022 (£) Cash at bank and in hand 657,275 545,542


    Debtors'
- text: 'Page 23


    Nominal value: 2022 (€) 2021 (€) £1 120,924,800 120,924,800


    Employee benefit obligations


    Notes to the Financial Statements - continued for the year ended 31 December 2022


    The Scheme closed to the future accrual of benefits on 31 August 2015 and the
    4 members who were active at the closure date were granted deferred benefits in
    the Scheme. Since 1 April 2003, the Scheme has provided benefits on a Career Average
    Revalued Earnings ("CARE") basis with Pensionable Earnings revalued each year
    in line with the increase in Retail Price Index ("RPI") inflation. This method
    of revaluation is broadly consistent with the way that deferred pensions increase
    before retirement and therefore the closure of future accrual does not have a
    material impact on the value of members'' accrued benefits.


    Called up share capital


    Defined benefit pension plans 2022 (€''000) Interest cost 1,674 Interest income
    (1,697) Actual return on plan assets (23) 31,933


    Defined benefit pension plans 2022 (€''000) Present value of funded obligations
    (53,460) Fair value of plan assets 51,628 (Deficit)/surplus (1,832) Net (liability)/surplus
    (1,832)


    The amounts recognised in profit or loss are as follows:


    Number: 103,276,582 Class: Ordinary shares


    The amounts recognised in the balance sheet are as follows:


    The company operates a defined benefit pension scheme. An actuarial valuation
    was carried out at 31 March 2020 by a qualified independent actuary.


    Allotted, issued and fully paid:


    Dunlop International Europe Limited'
- text: 'Komline-Sanderson Ltd

    Notes to the financial statements

    For the year ended 31 March 2025


    3. Employees

    The average monthly number of employees, including directors, during the year
    was 4 (2024 - 4).


    4. Taxation

    Factors affecting tax charge for the year

    There was no current tax charge in either year as the Company made a taxable loss
    during the current reporting period and utilised brought forward trading losses
    in the prior period.

    Factors that may affect future tax charges

    The Company has tax losses of approximately £134,000 which will reduce future
    charges to corporation tax.


    5. Tangible fixed assets


    | Cost or valuation | Office equipment £ |

    |---|---|

    | At 1 April 2024 | 2,201 |

    | At 31 March 2025 | 2,201 |

    | Depreciation | |

    | At 1 April 2024 | 2,201 |

    | At 31 March 2025 | 2,201 |

    | Net book value | |

    | At 31 March 2025 | - |

    | At 31 March 2024 | - |


    6. Debtors


    | | 2025 £ | 2024 € |

    |---|---|---|

    | Trade debtors | 881 | 11,736 |

    | Other debtors | 134 | - |

    | | 1,015 | 11,736 |


    Page 3'
datasets:
- oneryalcin/financial-filings-sparse-retrieval-training
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
- avg_flops
model-index:
- name: Financial Domain Sparse Encoder (doc-v3-gte fine-tuned)
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.62
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.66
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.38
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.35600000000000004
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.314
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.047208191733070066
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.10033853359651303
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.12344673739385978
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.15735846776073342
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.37746727490101206
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5380793650793652
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.17865969564136092
      name: Dot Map@100
    - type: query_active_dims
      value: 4.760000228881836
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999844046909479
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1493.3485107421875
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9510730453200253
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 1.0107077360153198
      name: Avg Flops
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSciFact
      type: NanoSciFact
    metrics:
    - type: dot_accuracy@1
      value: 0.54
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.84
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.86
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.54
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.18799999999999997
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.53
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.82
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.845
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.89
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.7393057169200965
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6897222222222222
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6905205627705626
      name: Dot Map@100
    - type: query_active_dims
      value: 19.040000915527344
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999376187637916
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1752.0487060546875
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9425971854382187
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 4.766234874725342
      name: Avg Flops
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFiQA2018
      type: NanoFiQA2018
    metrics:
    - type: dot_accuracy@1
      value: 0.36
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2733333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.20800000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.11999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.18724603174603174
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.40712698412698417
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.49556349206349204
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5478095238095239
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.44010103944970097
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.46983333333333327
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.38215757668624306
      name: Dot Map@100
    - type: query_active_dims
      value: 12.0600004196167
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999604875158259
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1723.0986328125
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9435456840045706
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 4.071389198303223
      name: Avg Flops
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.4466666666666666
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.68
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.7200000000000001
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7866666666666667
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4466666666666666
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.31777777777777777
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.25066666666666665
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.17799999999999996
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.25481807449303395
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.44248850590783234
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4880034098191173
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5317226638567525
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5189580104236031
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5658783068783069
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.41711261169938885
      name: Dot Map@100
    - type: query_active_dims
      value: 11.953333854675293
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9996083699018847
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1666.2235158269893
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.945409097836741
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 2.325575590133667
      name: Avg Flops
---

# Financial Domain Sparse Encoder (doc-v3-gte fine-tuned)

This is an [Asymmetric Inference-free SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model fine-tuned from [opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte) on the [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** Asymmetric Inference-free SPLADE Sparse Encoder
- **Base model:** [opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte) <!-- at revision 1646fef40807937e8e130c66d327a26421c408d5 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

```
SparseEncoder(
  (0): Router(
    (sub_modules): ModuleDict(
      (query): Sequential(
        (0): SparseStaticEmbedding({'frozen': False}, dim=30522, tokenizer=TokenizersBackend)
      )
      (document): Sequential(
        (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'NewForMaskedLM'})
        (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'log1p_relu', 'word_embedding_dimension': 30522})
      )
    )
  )
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("sparse_encoder_model_id")
# Run inference
queries = [
    "How much did the company charge for depreciation of tangible assets in 2025?",
]
documents = [
    "CoolerAid Holdings Ltd\nNotes to the Consolidated Financial Statements (continued)\nYear ended 31 March 2025\n\n5. Operating profit\nOperating profit or loss is stated after charging/crediting\n\n|| 2025| 2024|\n|------------------------------------------------|---------|-----------|\n| Depreciation of tangible assets| 870,340 | 752,425 |\n| Loss/(gains) on disposal of tangible assets| 1,535 | (173,401) |\n| Impairment of trade debtors| 32,465 | 15,860|\n| Operating lease rentals| 112,292 | 139,908 |\n\n6. Auditor's remuneration\n\n|| 2025 | 2024 |\n|--------------------------------------------------------------|-------|-------|\n| Fees payable for the audit of the consolidated financial statements | 5,000 | 5,000 |\n\n7. Staff costs\nThe average number of persons employed by the group during the year, including the directors, amounted to:\n\n|| 2025 | 2024 |\n|---------------------|------|------|\n| Distribution staff | 44 | 42 |\n| Administrative staff| 14 | 13 |\n| Management staff| 2| 2|\n| Directors| 4| 4|\n|| 64 | 61 |\n\nThe aggregate payroll costs incurred during the year, relating to the above, were:\n\n|| 2025| 2024|\n|---------------------|-----------|-----------|\n| Wages and salaries | 2,828,063 | 2,596,870 |\n| Social security costs| 314,830 | 303,444 |\n| Other pension costs | 79,769| 68,413|\n|| 3,222,662 | 2,968,727 |\n\n8. Directors' remuneration\nThe directors' aggregate remuneration in respect of qualifying services was:\n\n|| 2025 | 2024 |\n|---------------|--------|--------|\n| Remuneration | 18,880 | 18,880 |\n\n- 20 -",
    'Kenneth Forbes (Holdings) Limited\nNotes to the Financial Statements (continued)\nYear ended 30 April 2025\n\n13. Tangible assets.\n\n| Group and company | Freehold property £ | Plant and machinery £ | Fixtures and fittings £ | Motor vehicles £ | Assets held under the course of construction £ | Total £ |\n|---|---|---|---|---|---|---|\n| Cost | | | | | | |\n| At 1 May 2024 | 3,941,316 | 3,684,443 | 1,326,032 | 294,882 | 41,948 | 9,288,621 |\n| Additions | 570,435 | 395,946 | 187,049 | 194,503 | 106,519 | 1,454,452 |\n| Disposals | | (317,265) | (214,785) | (164,244) | | (696,294) |\n| At 30 Apr 2025 | 4,511,751 | 3,763,124 | 1,298,296 | 325,141 | 148,467 | 10,046,779 |\n| Depreciation | | | | | | |\n| At 1 May 2024 | 1,158,100 | 3,257,887 | 904,088 | 156,223 | | 5,476,298 |\n| Charge for the year | 133,789 | 160,512 | 101,477 | 78,993 | | 474,771 |\n| Disposals | | (317,265) | (214,785) | (133,585) | | (665,635) |\n| At 30 Apr 2025 | 1,291,889 | 3,101,134 | 790,780 | 101,631 | | 5,285,434 |\n| Carrying amount | | | | | | |\n| At 30 Apr 2025 | 3,219,862 | 661,990 | 507,516 | 223,510 | 148,467 | 4,761,345 |\n| At 30 Apr 2024 | 2,783,216 | 426,556 | 421,944 | 138,659 | 41,948 | 3,812,323 |\n\nFreehold land amounting to £475,010 (2024: £475,010) is not depreciated.\nAll assets are held by the company but used by the group undertakings or associates.\n\nTangible assets held at valuation\n\nIn respect of tangible assets held at valuation, aggregate cost, depreciation and comparable carrying\namount that would have been recognised if the assets had been carried under the historical cost model are\nas follows:\n\nGroup and company\n\n| | Freehold property £ |\n|---|---|\n| At 30 April 2025 | |\n| Aggregate cost | 3,854,887 |\n| Aggregate depreciation | (1,963,225) |\n| Carrying value | 1,891,662 |\n| At 30 April 2024 | |\n| Aggregate cost | 3,854,887 |\n| Aggregate depreciation | (1,700,013) |\n| Carrying value | 2,154,874 |\n\n1\n-24-\n1',
    'WHOCANFIXMYCAR.COM LTD\nNOTES TO THE FINANCIAL STATEMENTS\nFOR THE YEAR ENDED 31 MARCH 2025\n\n7. Tangible fixed assets\n\n| | Fixtures and fittings £ | Computer equipment £ | Right of use assets £ | Total £ | £ |\n|---|---|---|---|---|---|\n| **Cost** | | | | | |\n| At 1 April 2024 | 61,966 | 131,878 | - | 193,844 | |\n| Additions | - | 3,187 | 287,367 | 290,554 | |\n| At 31 March 2025 | 61,966 | 135,065 | 287,367 | 484,398 | |\n| **Depreciation** | | | | | |\n| At 1 April 2024 | 61,966 | 99,890 | - | 161,856 | |\n| Charge for the year | - | 12,982 | 82,657 | 95,639 | |\n| At 31 March 2025 | 61,966 | 112,872 | 82,657 | 257,495 | |\n| **Net book value** | | | | | |\n| At 31 March 2025 | - | 22,193 | 204,710 | 226,903 | |\n| At 31 March 2024 | - | 31,988 | - | 31,988 | |\n\nThe net book value of owned and leased assets included as "Tangible fixed assets" in the Balance Sheet is as follows:\n\n| | 2025 £ | 2024 £ |\n|---|---|---|\n| Tangible fixed assets owned | 22,193 | 31,988 |\n| Right-of-use tangible fixed assets | 204,710 | - |\n| | 226,903 | 31,988 |\n\nInformation about right-of-use assets is summarised below:\n\nNet book value\n\n| | 2025 £ | 2024 £ |\n|---|---|---|\n| Office and computer equipment | 204,710 | - |\n| | 204,710 | - |\n\nPage 11',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[18.4437, 21.8515, 17.4189]])
```
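
Sparse embeddings are directly interpretable: each active dimension corresponds to a vocabulary token. As a minimal sketch, assuming the `model` and `query_embeddings` from the snippet above and a sentence-transformers version that exposes `SparseEncoder.decode`:

```python
# Map the sparse query vector back to (token, weight) pairs; top_k keeps the strongest.
decoded = model.decode(query_embeddings, top_k=10)
print(decoded[0])
# e.g. [('depreciation', ...), ('2025', ...), ...]  # illustrative, not actual output
```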

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Datasets: `NanoNFCorpus`, `NanoSciFact` and `NanoFiQA2018`
* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric                | NanoNFCorpus | NanoSciFact | NanoFiQA2018 |
|:----------------------|:-------------|:------------|:-------------|
| dot_accuracy@1        | 0.44         | 0.54        | 0.36         |
| dot_accuracy@3        | 0.62         | 0.84        | 0.58         |
| dot_accuracy@5        | 0.66         | 0.86        | 0.64         |
| dot_accuracy@10       | 0.76         | 0.9         | 0.7          |
| dot_precision@1       | 0.44         | 0.54        | 0.36         |
| dot_precision@3       | 0.38         | 0.3         | 0.2733       |
| dot_precision@5       | 0.356        | 0.188       | 0.208        |
| dot_precision@10      | 0.314        | 0.1         | 0.12         |
| dot_recall@1          | 0.0472       | 0.53        | 0.1872       |
| dot_recall@3          | 0.1003       | 0.82        | 0.4071       |
| dot_recall@5          | 0.1234       | 0.845       | 0.4956       |
| dot_recall@10         | 0.1574       | 0.89        | 0.5478       |
| **dot_ndcg@10**       | **0.3775**   | **0.7393**  | **0.4401**   |
| dot_mrr@10            | 0.5381       | 0.6897      | 0.4698       |
| dot_map@100           | 0.1787       | 0.6905      | 0.3822       |
| query_active_dims     | 4.76         | 19.04       | 12.06        |
| query_sparsity_ratio  | 0.9998       | 0.9994      | 0.9996       |
| corpus_active_dims    | 1493.3485    | 1752.0487   | 1723.0986    |
| corpus_sparsity_ratio | 0.9511       | 0.9426      | 0.9435       |
| avg_flops             | 1.0107       | 4.7662      | 4.0714       |

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "nfcorpus",
          "scifact",
          "fiqa2018"
      ],
      "dataset_id": "sentence-transformers/NanoBEIR-en"
  }
  ```

| Metric                | Value     |
|:----------------------|:----------|
| dot_accuracy@1        | 0.4467    |
| dot_accuracy@3        | 0.68      |
| dot_accuracy@5        | 0.72      |
| dot_accuracy@10       | 0.7867    |
| dot_precision@1       | 0.4467    |
| dot_precision@3       | 0.3178    |
| dot_precision@5       | 0.2507    |
| dot_precision@10      | 0.178     |
| dot_recall@1          | 0.2548    |
| dot_recall@3          | 0.4425    |
| dot_recall@5          | 0.488     |
| dot_recall@10         | 0.5317    |
| **dot_ndcg@10**       | **0.519** |
| dot_mrr@10            | 0.5659    |
| dot_map@100           | 0.4171    |
| query_active_dims     | 11.9533   |
| query_sparsity_ratio  | 0.9996    |
| corpus_active_dims    | 1666.2235 |
| corpus_sparsity_ratio | 0.9454    |
| avg_flops             | 2.3256    |

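These figures can be reproduced with the evaluator named above. A minimal sketch, assuming the fine-tuned model is loaded as under Usage (the placeholder id should be replaced with this repository's id):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator

model = SparseEncoder("sparse_encoder_model_id")  # replace with this repository's id

# Re-run the three NanoBEIR subsets reported above and print the headline metric.
evaluator = SparseNanoBEIREvaluator(dataset_names=["nfcorpus", "scifact", "fiqa2018"])
results = evaluator(model)
print(results["NanoBEIR_mean_dot_ndcg@10"])
```
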
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### financial-filings-sparse-retrieval-training

* Dataset: [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training) at [23e44ab](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training/tree/23e44abc3bfdb454da434ba8eb3e38bd1e01be84)
* Size: 18,247 training samples
* Columns: <code>query</code>, <code>positive</code>, <code>negative_0</code>, <code>negative_1</code>, <code>negative_2</code>, <code>negative_3</code>, <code>negative_4</code>, <code>negative_5</code>, and <code>negative_6</code>
* Approximate statistics based on the first 1000 samples:
  | | query | positive | negative_0 | negative_1 | negative_2 | negative_3 | negative_4 | negative_5 | negative_6 |
  |:--------|:------|:---------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
  | type | string | string | string | string | string | string | string | string | string |
  | details | <ul><li>min: 9 tokens</li><li>mean: 20.51 tokens</li><li>max: 79 tokens</li></ul> | <ul><li>min: 51 tokens</li><li>mean: 331.12 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 54 tokens</li><li>mean: 360.35 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 56 tokens</li><li>mean: 357.16 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 303.81 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 263.98 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 221.87 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 193.47 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 166.03 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
  | query | positive | negative_0 | negative_1 | negative_2 | negative_3 | negative_4 | negative_5 | negative_6 |
  |:------|:---------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
  | <code>What was the actuarial gain on defined benefit pension plans for the year 2021?</code> | <code>2021 £'000 2020 £'000 Loss for the financial year (9,731) (50,454) Other comprehensive income/(expense) for the financial period Actuarial gain/(loss) on defined benefit pension plans 31,200 (4,400) Deferred tax impact of actuarial gain/(loss) (10,920) 1,540 Other comprehensive income/(expense) 20,280 (2,860) Total comprehensive income/(expense) for the financial period 10,549 (53,314)<br><br>STATEMENT OF COMPREHENSIVE INCOME FOR THE YEAR ENDED 31 DECEMBER 2021<br><br>11</code> | <code>Defined benefit pension plans 2022 2021 Equities 12% 26% Bonds 56% 31% Liability driven investment 32% 39% Other - 4% 100% 100%<br><br>Defined benefit pension plans 2022 €'000 2021 €'000 Actuarial losses from changes in financial assumptions 36 391 Actuarial gain/(losses) 3,629 (4,830) 3,665 (4,439)<br><br>```markdown Dunlop International Europe Limited Notes to the Financial Statements - continued for the year ended 31 December 2022<br><br>Defined benefit pension plans 2022 €'000 2021 €'000 Opening fair value of scheme assets 90,460 85,266 Contributions by employer 1,129 1,189 Expected return 1,697 1,183 Actuarial (losses) (33,630) (591) Benefits paid (3,397) (3,408) Exchange differences on foreign plans (4,631) 6,821 51,628 90,460<br><br>Changes in the fair value of scheme assets are as follows:<br><br>Page 24 ```<br><br>The major categories of scheme assets as a percentage of total scheme assets are as follows:<br><br>Changes in the present value of the defined benefit obligation are as follows:<br><br>Defined benefit pension pla...</code> | <code>The components of net periodic pension benefit cost recognized in our Consolidated Statements of Operations for the periods presented are as follows:<br><br>Years Ended December 31, 2021 Projected benefit obligation, beginning of year $ 97,740 Service cost $ 1,282 Interest cost $ 1,452 Actuarial (gain) loss $ (8,682) Benefits paid $ (2,010) Translation adjustment $ (4,006) Projected benefit obligation, end of year $ 85,776 Fair value of plan assets, beginning of year $ 17,293 Actual return on plan assets $ 641 Contributions $ 1,775 Benefits paid $ (1,112) Actuarial gain $ 71 Translation adjustment $ (147) Fair value of plan assets, end of year $ 18,521 Funded status of plan $ (67,255)<br><br>Our projected benefit obligation and plan assets for defined benefit pension plans and the related assumptions used to determine the related liabilities are as follows:<br><br>Defined Benefit Plan<br><br>We maintain defined benefit pension plans for certain of our non-U.S. employees in the U.K., Germany, and Philippines. ...</code> | <code></code> | <code></code> | <code></code> | <code></code> | <code></code> |
  | <code>What is the interest rate for the GBP short term loan with Canadian Natural Resources Limited?</code> | <code>CREDITORS: amounts falling due within one year<br><br>33<br><br>DEBTORS<br><br>The carrying amount of debtors is a reasonable approximation to fair value. Trade debtors and other receivables are not overdue as payment terms have not been exceeded. The expected credit loss on the trade debtor's balance was negligible and therefore no adjustment has been applied.<br><br>Other creditors includes £4.9 million (2022 - £4.3 million) in respect of future share options (note 21). The short term loan with CNR International (U.K.) Developments Limited expired on 31 December 2023. Interest was generated at a rate of Secured Overnight Financing Rate (SOFR) + 1.75%.<br><br>Amounts owed by group undertakings are unsecured and repayable on demand. The GBP short term loan with Canadian Natural Resources Limited generates interest at a rate of Sterling Overnight Index Average (SONIA) + 1.15% per annum. Trading balances are interest free. Management considered the expected credit loss on amounts due from Group undertakings at 31 Dec...</code> | <code>for the year ended 31 December 2022<br><br>The share options creditor relates to amounts payable to the ultimate parent Canadian Natural Resources Limited relating to employee options to purchase stock in the aforementioned company. The provision is based on the specifics of the agreed plan with Canadian Natural Resources Limited. The current portion of this totalling £4.3 million (2021 - £3.7 million) is included in creditor amounts falling due within one year.<br><br>34<br><br>The Company settled an intercompany term loan on 24 October 2022. The amount drawn down by the Company at 24 October 2022 was US$440.0 million (2021 - US$440.0 million). Interest was charged at US$ LIBOR + 2.8175% per annum on amounts drawn down from this facility until settlement.<br><br>2022 (£'000) 2021 (£'000) Share options 3,550 4,276 Lease liabilities (note 13) 4,786 6,428 Total 8,336 10,704<br><br>The short term loan with CNR International (U.K.) Developments Limited generates interest at a rate of USS LIBOR + 1.75%.<br><br>18. CREDITORS: ...</code> | <code>18 Cash and cash equivalents<br><br>Group 31 Dec 2021 (£000) Group 31 Dec 2020 (£000) Company 31 Dec 2021 (£000) Company 31 Dec 2020 (£000) Sterling 19,641 37,349 19,498 37,347 United States Dollar 8,574 6,826 — — Euros 361 — — — Canadian Dollar 227 364 — — Polish Zloty 225 283 — — Singapore Dollar 29 — — — Japanese Yen 7 — — — Total 29,064 44,822 19,498 37,347<br><br>Cash and cash equivalents are denominated in the following currencies:<br><br>On 14 October 2021, the Group and Company entered into a loan agreement with Bank Of Ireland Group plc consisting of a £10 million term loan in addition to a revolving credit facility of £10 million. The loan is secured on the assets of the Group. Operating covenants are limited to the Group’s net debt leverage and interest cover. The term loan is repayable over five years with an initial 12-month repayment holiday followed by annual capital repayments of £1,250,000. At the end of the term, a bullet payment of £5 million is due. The loan is denominated in Pound S...</code> | <code></code> | <code></code> | <code></code> | <code></code> | <code></code> |
  | <code>What is the amount for Charges à imputer relative au personnel?</code> | <code>Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important<br><br>COMPTES DE RÉGULARISATION<br><br>Description Montant Charges à imputer relative au personnel 128.084.824,70 Charges à imputer: intérêts courus et non échus 37.536.987,21 Charges à imputer diverses 12.774.405,94 Charges à imputer: protocoles et conventions avec autres opérateurs et réseaux 7.240.507,84 Produits à reporter divers 7.841.755,07 Produits à reporter relatifs au trafic 116.123.554,80 Produis à reporter: Hello Belgium Railpass 21.396.182,64 Produis à reporter: NPV 20.088.866,29 Produits à reporter: financements alternatifs 15.285.835,38<br><br>72<br><br>First - C-Cap2022 - 38 / 82</code> | <code>```markdown N° 0203.430.576 C-Cap 6.9<br><br>Charges à imputer diverses Exercice Charges à imputer: protocoles et conventions avec autres opérateurs et reseaux 2.101.406,29 Charges à imputer relatives au personnel 1.378.996,29 Charges à imputer: intérêts courus et non échus 139.424.041,35 Produits à reporter divers 39.103.099,47 Produits à reporter relatifs au trafic 6.816.893,14 Produits à reporter: financements altematifs 146.853.054,23 Produis à reporter: NPV 9.576.751,43 15.185.940,48<br><br>71 Rapport annuel SNCB 2023 ```<br><br>COMPTES DE RÉGULARISATION<br><br>Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important</code> | <code>DETTES FISCALES, SALARIALES ET SOCIALES (€)<br><br>CODES BNB 2024 VENTILATION DE LA RUBRIQUE 492/3 DU PASSIF SI CELLE-CI REPRÉSENTE UN MONTANT IMPORTANT Intérêts crédits KBC à imputer Autres charges à imputer<br><br>ÉTAT DES DETTES (€)<br><br>COMPTES DE REGULARISATION (€)<br><br>12.4 Etat des dettes et comptes de régularisation du passif<br><br>CODES BNB 2024 VENTILATION DES DETTES A L'ORIGINE A PLUS D'UN AN, EN FONCTION DE LEUR DUREE RESDIEUELLE Dettes à plus d'un an échéant dans l'année Dettes financières 8801 Etablissements de crédit 8841 Total des dettes à plus d'un an échéant dans l'année (42) Dettes ayant plus d'un an mais 5 ans au plus à courir Dettes financières 8802 Etablissement de crédit 8842 Total des dettes ayant plus d'un an mais 5 ans au plus à courir 8912<br><br>12.5 Résultats d'exploitation (en milliers €)<br><br>DETTES GARANTIES (€)<br><br>CODES BNB 2024 Rémunérations et charges sociales Autres dettes salariales et sociales 9077<br><br>```markdown<br><br>CODES BNB 2023 2024 CHARGES D'EXPLOITATION Travailleurs pour lesquels la ...</code> | <code>Résultats financiers Charges financières récurrentes Ventilation des autres charges financières<br><br>11 Dettes fiscales, salariales et sociales<br><br>2022 2021 Impôts (rubriques 450/3 et 179 du passif) Dettes fiscales échues - - Dettes fiscales non échues - - Dettes fiscales estimées - - Rémunérations et charges sociales (rubriques 454/9 et 179 du passif) Dettes échues envers l'Office National de Sécurité Sociale - - Autres dettes salariales et sociales 28.500 25.000<br><br>FINANCIÈRE DE TUBIZE – RAPPORT FINANCIER ANNUEL 2022<br><br>Comptes de régularisation<br><br>FINANCIÈRE DE TUBIZE - RAPPORT ANNUEL 2022 40 41<br><br>2022 2021 Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important Charges à imputer: intérêts 345.843 40.556 Charges à imputer: commission de réservation 107.126 76.667<br><br>2022 2021 Charges d'exploitation Travailleurs pour lesquels la société a introduit une déclaration DIMONA ou qui sont inscrits au registre général du personnel - - Nombre total à la date de clôture - - Ef...</code> | <code>ANNEXES DES COMPTES DE LA SOCIÉTÉ AU 31 MAI 2025<br><br>SITUATION FISCALE DIFFEREE ET LATENTE<br><br>\| Accroissements de la dette future d'impôt \| Montant \|<br>\|---\|---\|<br>\| Impôt dû sur provisions réglementées: \| \|<br>\| Provisions pour hausse de prix \| \|<br>\| Provisions pour fluctuation des cours \| \|<br>\| Provisions pour investissements \| \|<br>\| Amortissements dérogatoires \| 3 548 \|<br>\| Subventions d'investissement \| \|<br>\| TOTAL ACCROISSEMENTS \| 3 548 \|<br><br>\| Allègements de la dette future d'impôt \| Montant \|<br>\|---\|---\|<br>\| Impôt payé d'avance sur: \| \|<br>\| Charges non déductibles temporairement (à déduire l'année suivante) \| 1 200 \|<br>\| Congés payés \| \|<br>\| Participation des salariés \| 53 \|<br>\| Autres \| \|<br>\| A déduire ultérieurement \| \|<br>\| Provisions pour propre assureur \| \|<br>\| Autres \| \|<br>\| TOTAL ALLÈGEMENTS \| 1 254 \|<br><br>SITUATION FISCALE DIFFÉRÉE NETTE<br>2 294<br><br>IMPÔT DÙ SUR: Plus-values différées<br>43 436<br><br>CREDIT A IMPUTER SUR: Déficits reportables<br><br>CREDIT A IMPUTER SUR: Moins-values à long terme<br><br>SITUATION FISCALE LATENTE NETT...</code> | <code>d'un mois au plus<br><br>Titres à revenu fixe émis par des établissements de crédit<br><br>AUTRES PLACEMENTS DE TRÉSORERIE<br><br>Autres placements de trésorerie non repris ci-avant<br><br>Codes Exercice Exercice précédent 51 8681 8682 8683 52 18.088.896,34 55.716.775,84 8684 53 226.519.496,99 136.305.308,18 8686 85.000.000,00 801.112,99 8687 1.599.690,84 8688 139.919.806,15 135.504.195,19 8689<br><br>Actions, parts et placements autres que placements à revenu fixe<br><br>Titres à revenu fixe<br><br>Actions et parts - Montant non appelé<br><br>Avec une durée résiduelle ou de préavis<br><br>Métaux précieux et œuvres d'art<br><br>Ventilation de la rubrique 490/1 de l'actif si celle-ci représente un montant important<br><br>PLACEMENTS DE TRÉSORERIE ET COMPTES DE RÉGULARISATION DE L'ACTIF<br><br>64 Rapport annuel SNCB 2023<br><br>Actions et parts - Valeur comptable augmentée du montant non appelé<br><br>Comptes à terme détenus auprès des établissements de crédit<br><br>de plus d'un mois à un an au plus<br><br>COMPTES DE RÉGULARISATION<br><br>Exercice Charges à reporter: redevance infrastru...</code> | <code>4.6. Accroissements et allégements de la dette future d'impôt<br><br>Les éléments entraînant un décalage d'imposition conduisent à un accroissement de la dette future d'impôt de 21 278K€ calculé au taux de 25.82%.<br><br>La situation fiscale latente s'analyse comme suit :<br><br>\| Base de calcul \| Montants en K€ \|<br>\|---\|---\|<br>\| BASE D'IMPOT SUR : \| \|<br>\| Provisions réglementées : \| \|<br>\| - Ecart de conversion Actif \| 0 \|<br>\| - Ecart de conversion Passif \| -4 \|<br>\| - Provision pour investissements \| \|<br>\| - Amortissements dérogatoires \| 94 455 \|<br>\| Subventions d'investissement \| 3 283 \|<br>\| Produits non imposables temporairement : \| \|<br>\| (à réintégrer l'année de leur acquisition) \| \|<br>\| - plafonnement TP \| \|<br>\| **TOTAL ACCROISSEMENTS** \| **97 734** \|<br>\| BASE D'IMPOT PAYE D'AVANCE SUR : \| \|<br>\| Charges non déductibles temporairement : \| \|<br>\| (à déduire l'année suivante) \| \|<br>\| - Provision pour risques et charges \| -928 \|<br>\| - Provision pour participation \| -4 083 \|<br>\| - Contribution solidarité \| -869 \|<br>\| - Provisions pou...</code> | <code></code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score', gather_across_devices=False)",
      "document_regularizer_weight": 3e-05,
      "query_regularizer_weight": 0.0
  }
  ```
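
Reassembled in code, the objective above looks roughly like the following. This is a sketch, not the exact training script: the `train` split name is an assumption, and the trainer wiring (batching, routing, learning rates) is shown under Training Hyperparameters below.

```python
from datasets import load_dataset
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.losses import (
    SparseMultipleNegativesRankingLoss,
    SpladeLoss,
)

model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte")
train_dataset = load_dataset(
    "oneryalcin/financial-filings-sparse-retrieval-training", split="train"
)

# FLOPS regularization is applied to the document side only: the query side is an
# inference-free static lookup, so it needs no sparsity pressure (weight 0.0).
loss = SpladeLoss(
    model=model,
    loss=SparseMultipleNegativesRankingLoss(model),
    document_regularizer_weight=3e-05,
    query_regularizer_weight=0.0,
)
```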

### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 2
- `learning_rate`: 2e-05
- `warmup_steps`: 114
- `weight_decay`: 0.01
- `gradient_accumulation_steps`: 4
- `bf16`: True
- `tf32`: True
- `eval_strategy`: steps
- `dataloader_num_workers`: 4
- `batch_sampler`: no_duplicates
- `router_mapping`: {'query': 'query', 'positive': 'document', 'negative_0': 'document', 'negative_1': 'document', 'negative_2': 'document', 'negative_3': 'document', 'negative_4': 'document', 'negative_5': 'document', 'negative_6': 'document'}
- `learning_rate_mapping`: {'sub_modules\\.query\\..*': 0.001}

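Continuing the loss sketch above, these values map onto `SparseEncoderTrainingArguments` roughly as follows. A sketch under stated assumptions: the output directory is illustrative, and the steps-based evaluation wiring (an evaluator plus `eval_strategy="steps"`) is omitted for brevity.

```python
from sentence_transformers.sparse_encoder import (
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="outputs/financial-sparse-encoder",  # illustrative path
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-05,
    warmup_steps=114,
    weight_decay=0.01,
    bf16=True,
    tf32=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    # Route the query column through the inference-free side of the Router and
    # the positive/negative document columns through the MLM side.
    router_mapping={
        "query": "query",
        "positive": "document",
        **{f"negative_{i}": "document" for i in range(7)},
    },
    # The static query embeddings are trained with a much higher learning rate.
    learning_rate_mapping={r"sub_modules\.query\..*": 0.001},
)

trainer = SparseEncoderTrainer(
    model=model,  # from the loss sketch above
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```
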
758
+ #### All Hyperparameters
759
+ <details><summary>Click to expand</summary>
760
+
761
+ - `per_device_train_batch_size`: 8
762
+ - `num_train_epochs`: 2
763
+ - `max_steps`: -1
764
+ - `learning_rate`: 2e-05
765
+ - `lr_scheduler_type`: linear
766
+ - `lr_scheduler_kwargs`: None
767
+ - `warmup_steps`: 114
768
+ - `optim`: adamw_torch_fused
769
+ - `optim_args`: None
770
+ - `weight_decay`: 0.01
771
+ - `adam_beta1`: 0.9
772
+ - `adam_beta2`: 0.999
773
+ - `adam_epsilon`: 1e-08
774
+ - `optim_target_modules`: None
775
+ - `gradient_accumulation_steps`: 4
776
+ - `average_tokens_across_devices`: True
777
+ - `max_grad_norm`: 1.0
778
+ - `label_smoothing_factor`: 0.0
779
+ - `bf16`: True
780
+ - `fp16`: False
781
+ - `bf16_full_eval`: False
782
+ - `fp16_full_eval`: False
783
+ - `tf32`: True
784
+ - `gradient_checkpointing`: False
785
+ - `gradient_checkpointing_kwargs`: None
786
+ - `torch_compile`: False
787
+ - `torch_compile_backend`: None
788
+ - `torch_compile_mode`: None
789
+ - `use_liger_kernel`: False
790
+ - `liger_kernel_config`: None
791
+ - `use_cache`: False
792
+ - `neftune_noise_alpha`: None
793
+ - `torch_empty_cache_steps`: None
794
+ - `auto_find_batch_size`: False
795
+ - `log_on_each_node`: True
796
+ - `logging_nan_inf_filter`: True
797
+ - `include_num_input_tokens_seen`: no
798
+ - `log_level`: passive
799
+ - `log_level_replica`: warning
800
+ - `disable_tqdm`: False
801
+ - `project`: huggingface
802
+ - `trackio_space_id`: trackio
803
+ - `eval_strategy`: steps
804
+ - `per_device_eval_batch_size`: 8
805
+ - `prediction_loss_only`: True
806
+ - `eval_on_start`: False
807
+ - `eval_do_concat_batches`: True
808
+ - `eval_use_gather_object`: False
809
+ - `eval_accumulation_steps`: None
810
+ - `include_for_metrics`: []
811
+ - `batch_eval_metrics`: False
812
+ - `save_only_model`: False
813
+ - `save_on_each_node`: False
814
+ - `enable_jit_checkpoint`: False
815
+ - `push_to_hub`: False
816
+ - `hub_private_repo`: None
817
+ - `hub_model_id`: None
818
+ - `hub_strategy`: every_save
819
+ - `hub_always_push`: False
820
+ - `hub_revision`: None
821
+ - `load_best_model_at_end`: False
822
+ - `ignore_data_skip`: False
823
+ - `restore_callback_states_from_checkpoint`: False
824
+ - `full_determinism`: False
825
+ - `seed`: 42
826
+ - `data_seed`: None
827
+ - `use_cpu`: False
828
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
829
+ - `parallelism_config`: None
830
+ - `dataloader_drop_last`: False
831
+ - `dataloader_num_workers`: 4
832
+ - `dataloader_pin_memory`: True
833
+ - `dataloader_persistent_workers`: False
834
+ - `dataloader_prefetch_factor`: None
835
+ - `remove_unused_columns`: True
836
+ - `label_names`: None
837
+ - `train_sampling_strategy`: random
838
+ - `length_column_name`: length
839
+ - `ddp_find_unused_parameters`: None
840
+ - `ddp_bucket_cap_mb`: None
841
+ - `ddp_broadcast_buffers`: False
842
+ - `ddp_backend`: None
843
+ - `ddp_timeout`: 1800
844
+ - `fsdp`: []
845
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
846
+ - `deepspeed`: None
847
+ - `debug`: []
848
+ - `skip_memory_metrics`: True
849
+ - `do_predict`: False
850
+ - `resume_from_checkpoint`: None
851
+ - `warmup_ratio`: None
852
+ - `local_rank`: -1
853
+ - `prompts`: None
854
+ - `batch_sampler`: no_duplicates
855
+ - `multi_dataset_batch_sampler`: proportional
856
+ - `router_mapping`: {'query': 'query', 'positive': 'document', 'negative_0': 'document', 'negative_1': 'document', 'negative_2': 'document', 'negative_3': 'document', 'negative_4': 'document', 'negative_5': 'document', 'negative_6': 'document'}
857
+ - `learning_rate_mapping`: {'sub_modules\\.query\\..*': 0.001}
858
+
859
+ </details>
860
+
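+ The `router_mapping` above routes each training column through the matching side of the asymmetric model (queries through the inference-free query module, the positive and all negative passages through the document encoder), while `learning_rate_mapping` trains the query-side sub-modules with a larger learning rate than the backbone. As a minimal sketch of how these values would be passed to the trainer (assuming the Sentence Transformers v5 training API and the column names listed above):
+ 
+ ```python
+ from sentence_transformers.sparse_encoder import SparseEncoderTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+ 
+ args = SparseEncoderTrainingArguments(
+     output_dir="output",  # hypothetical output directory
+     num_train_epochs=2,
+     per_device_train_batch_size=8,
+     gradient_accumulation_steps=4,
+     learning_rate=2e-5,
+     warmup_steps=114,
+     bf16=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+     # Route each dataset column to the query or document sub-module
+     router_mapping={
+         "query": "query",
+         "positive": "document",
+         **{f"negative_{i}": "document" for i in range(7)},
+     },
+     # Larger learning rate for the query-side parameters
+     learning_rate_mapping={r"sub_modules\.query\..*": 1e-3},
+ )
+ ```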
861
+ ### Training Logs
862
+ <details><summary>Click to expand</summary>
863
+
864
+ | Epoch | Step | Training Loss | NanoNFCorpus_dot_ndcg@10 | NanoSciFact_dot_ndcg@10 | NanoFiQA2018_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
865
+ |:------:|:----:|:-------------:|:------------------------:|:-----------------------:|:------------------------:|:-------------------------:|
866
+ | 0.0175 | 10 | 2.5846 | - | - | - | - |
867
+ | 0.0351 | 20 | 2.4596 | - | - | - | - |
868
+ | 0.0526 | 30 | 2.5787 | - | - | - | - |
869
+ | 0.0701 | 40 | 2.2135 | - | - | - | - |
870
+ | 0.0877 | 50 | 2.1444 | - | - | - | - |
871
+ | 0.1052 | 60 | 2.0011 | - | - | - | - |
872
+ | 0.1228 | 70 | 1.8179 | - | - | - | - |
873
+ | 0.1403 | 80 | 1.7744 | - | - | - | - |
874
+ | 0.1578 | 90 | 1.7054 | - | - | - | - |
875
+ | 0.1754 | 100 | 1.5427 | - | - | - | - |
876
+ | 0.1929 | 110 | 1.6134 | - | - | - | - |
877
+ | 0.2104 | 120 | 1.6381 | - | - | - | - |
878
+ | 0.2280 | 130 | 1.6946 | - | - | - | - |
879
+ | 0.2455 | 140 | 1.4456 | - | - | - | - |
880
+ | 0.2630 | 150 | 1.4302 | - | - | - | - |
881
+ | 0.2806 | 160 | 1.3097 | - | - | - | - |
882
+ | 0.2981 | 170 | 1.5755 | - | - | - | - |
883
+ | 0.3157 | 180 | 1.2906 | - | - | - | - |
884
+ | 0.3332 | 190 | 1.3424 | - | - | - | - |
885
+ | 0.3507 | 200 | 1.5477 | - | - | - | - |
886
+ | 0.3683 | 210 | 1.3442 | - | - | - | - |
887
+ | 0.3858 | 220 | 1.2810 | - | - | - | - |
888
+ | 0.4033 | 230 | 1.3157 | - | - | - | - |
889
+ | 0.4209 | 240 | 1.2839 | - | - | - | - |
890
+ | 0.4384 | 250 | 1.2428 | - | - | - | - |
891
+ | 0.4559 | 260 | 1.2376 | - | - | - | - |
892
+ | 0.4735 | 270 | 1.1353 | - | - | - | - |
893
+ | 0.4910 | 280 | 1.2513 | - | - | - | - |
894
+ | 0.5085 | 290 | 1.0490 | - | - | - | - |
895
+ | 0.5261 | 300 | 1.0669 | - | - | - | - |
896
+ | 0.5436 | 310 | 1.2219 | - | - | - | - |
897
+ | 0.5612 | 320 | 1.0313 | - | - | - | - |
898
+ | 0.5787 | 330 | 1.2846 | - | - | - | - |
899
+ | 0.5962 | 340 | 1.0939 | - | - | - | - |
900
+ | 0.6138 | 350 | 1.0299 | - | - | - | - |
901
+ | 0.6313 | 360 | 0.6464 | - | - | - | - |
902
+ | 0.6488 | 370 | 0.7067 | - | - | - | - |
903
+ | 0.6664 | 380 | 0.5505 | - | - | - | - |
904
+ | 0.6839 | 390 | 0.6885 | - | - | - | - |
905
+ | 0.7014 | 400 | 0.8663 | - | - | - | - |
906
+ | 0.7190 | 410 | 0.8602 | - | - | - | - |
907
+ | 0.7365 | 420 | 0.5517 | - | - | - | - |
908
+ | 0.7541 | 430 | 0.3781 | - | - | - | - |
909
+ | 0.7716 | 440 | 0.6533 | - | - | - | - |
910
+ | 0.7891 | 450 | 1.1145 | - | - | - | - |
911
+ | 0.8067 | 460 | 0.3240 | - | - | - | - |
912
+ | 0.8242 | 470 | 0.5818 | - | - | - | - |
913
+ | 0.8417 | 480 | 0.3394 | - | - | - | - |
914
+ | 0.8593 | 490 | 0.8986 | - | - | - | - |
915
+ | 0.8768 | 500 | 0.6177 | 0.3695 | 0.7388 | 0.3862 | 0.4982 |
916
+ | 0.8943 | 510 | 0.8443 | - | - | - | - |
917
+ | 0.9119 | 520 | 0.5454 | - | - | - | - |
918
+ | 0.9294 | 530 | 0.9840 | - | - | - | - |
919
+ | 0.9470 | 540 | 0.6111 | - | - | - | - |
920
+ | 0.9645 | 550 | 0.7095 | - | - | - | - |
921
+ | 0.9820 | 560 | 0.8391 | - | - | - | - |
922
+ | 0.9996 | 570 | 0.6461 | - | - | - | - |
923
+ | 1.0158 | 580 | 1.3053 | - | - | - | - |
924
+ | 1.0333 | 590 | 0.9817 | - | - | - | - |
925
+ | 1.0509 | 600 | 1.0531 | - | - | - | - |
926
+ | 1.0684 | 610 | 0.9087 | - | - | - | - |
927
+ | 1.0859 | 620 | 0.9186 | - | - | - | - |
928
+ | 1.1035 | 630 | 1.0373 | - | - | - | - |
929
+ | 1.1210 | 640 | 0.9417 | - | - | - | - |
930
+ | 1.1385 | 650 | 0.9963 | - | - | - | - |
931
+ | 1.1561 | 660 | 0.9058 | - | - | - | - |
932
+ | 1.1736 | 670 | 0.9252 | - | - | - | - |
933
+ | 1.1911 | 680 | 1.0170 | - | - | - | - |
934
+ | 1.2087 | 690 | 0.9957 | - | - | - | - |
935
+ | 1.2262 | 700 | 0.8720 | - | - | - | - |
936
+ | 1.2438 | 710 | 0.8776 | - | - | - | - |
937
+ | 1.2613 | 720 | 0.8562 | - | - | - | - |
938
+ | 1.2788 | 730 | 0.8772 | - | - | - | - |
939
+ | 1.2964 | 740 | 0.9591 | - | - | - | - |
940
+ | 1.3139 | 750 | 0.9495 | - | - | - | - |
941
+ | 1.3314 | 760 | 0.9933 | - | - | - | - |
942
+ | 1.3490 | 770 | 0.8449 | - | - | - | - |
943
+ | 1.3665 | 780 | 0.7833 | - | - | - | - |
944
+ | 1.3840 | 790 | 0.9574 | - | - | - | - |
945
+ | 1.4016 | 800 | 0.7727 | - | - | - | - |
946
+ | 1.4191 | 810 | 0.8997 | - | - | - | - |
947
+ | 1.4367 | 820 | 0.8796 | - | - | - | - |
948
+ | 1.4542 | 830 | 0.8535 | - | - | - | - |
949
+ | 1.4717 | 840 | 1.0049 | - | - | - | - |
950
+ | 1.4893 | 850 | 0.8912 | - | - | - | - |
951
+ | 1.5068 | 860 | 0.9883 | - | - | - | - |
952
+ | 1.5243 | 870 | 0.7190 | - | - | - | - |
953
+ | 1.5419 | 880 | 0.9274 | - | - | - | - |
954
+ | 1.5594 | 890 | 0.8372 | - | - | - | - |
955
+ | 1.5769 | 900 | 0.7986 | - | - | - | - |
956
+ | 1.5945 | 910 | 0.7205 | - | - | - | - |
957
+ | 1.6120 | 920 | 0.5797 | - | - | - | - |
958
+ | 1.6295 | 930 | 0.6741 | - | - | - | - |
959
+ | 1.6471 | 940 | 0.5253 | - | - | - | - |
960
+ | 1.6646 | 950 | 0.1963 | - | - | - | - |
961
+ | 1.6822 | 960 | 0.4864 | - | - | - | - |
962
+ | 1.6997 | 970 | 0.7439 | - | - | - | - |
963
+ | 1.7172 | 980 | 0.6164 | - | - | - | - |
964
+ | 1.7348 | 990 | 0.3680 | - | - | - | - |
965
+ | 1.7523 | 1000 | 0.5521 | 0.3775 | 0.7393 | 0.4401 | 0.5190 |
966
+ | 1.7698 | 1010 | 0.2149 | - | - | - | - |
967
+ | 1.7874 | 1020 | 0.5544 | - | - | - | - |
968
+ | 1.8049 | 1030 | 0.8062 | - | - | - | - |
969
+ | 1.8224 | 1040 | 0.2349 | - | - | - | - |
970
+ | 1.8400 | 1050 | 0.5362 | - | - | - | - |
971
+ | 1.8575 | 1060 | 0.8963 | - | - | - | - |
972
+ | 1.8751 | 1070 | 0.5910 | - | - | - | - |
973
+ | 1.8926 | 1080 | 0.3764 | - | - | - | - |
974
+ | 1.9101 | 1090 | 0.5331 | - | - | - | - |
975
+ | 1.9277 | 1100 | 1.0374 | - | - | - | - |
976
+ | 1.9452 | 1110 | 0.6087 | - | - | - | - |
977
+ | 1.9627 | 1120 | 0.4690 | - | - | - | - |
978
+ | 1.9803 | 1130 | 0.4651 | - | - | - | - |
979
+ | 1.9978 | 1140 | 0.5315 | - | - | - | - |
980
+
981
+ </details>
982
+
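+ The NanoBEIR columns above come from periodic evaluation on three Nano subsets during training. A minimal sketch for reproducing them (assuming the sparse NanoBEIR evaluator shipped with Sentence Transformers v5):
+ 
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator
+ 
+ model = SparseEncoder("path/to/this/model")  # hypothetical local path or Hub id
+ 
+ evaluator = SparseNanoBEIREvaluator(dataset_names=["nfcorpus", "scifact", "fiqa2018"])
+ results = evaluator(model)
+ print(results[evaluator.primary_metric])  # mean dot ndcg@10 across the three subsets
+ ```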
983
+ ### Framework Versions
984
+ - Python: 3.11.10
985
+ - Sentence Transformers: 5.2.3
986
+ - Transformers: 5.2.0
987
+ - PyTorch: 2.10.0+cu128
988
+ - Accelerate: 1.12.0
989
+ - Datasets: 4.5.0
990
+ - Tokenizers: 0.22.2
991
+
992
+ ## Citation
993
+
994
+ ### BibTeX
995
+
996
+ #### Sentence Transformers
997
+ ```bibtex
998
+ @inproceedings{reimers-2019-sentence-bert,
999
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1000
+ author = "Reimers, Nils and Gurevych, Iryna",
1001
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1002
+ month = "11",
1003
+ year = "2019",
1004
+ publisher = "Association for Computational Linguistics",
1005
+ url = "https://arxiv.org/abs/1908.10084",
1006
+ }
1007
+ ```
1008
+
1009
+ #### SpladeLoss
1010
+ ```bibtex
1011
+ @misc{formal2022distillationhardnegativesampling,
1012
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
1013
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
1014
+ year={2022},
1015
+ eprint={2205.04733},
1016
+ archivePrefix={arXiv},
1017
+ primaryClass={cs.IR},
1018
+ url={https://arxiv.org/abs/2205.04733},
1019
+ }
1020
+ ```
1021
+
1022
+ #### SparseMultipleNegativesRankingLoss
1023
+ ```bibtex
1024
+ @misc{henderson2017efficient,
1025
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
1026
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
1027
+ year={2017},
1028
+ eprint={1705.00652},
1029
+ archivePrefix={arXiv},
1030
+ primaryClass={cs.CL}
1031
+ }
1032
+ ```
1033
+
1034
+ #### FlopsLoss
1035
+ ```bibtex
1036
+ @article{paria2020minimizing,
1037
+ title={Minimizing flops to learn efficient sparse representations},
1038
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
1039
+ journal={arXiv preprint arXiv:2004.05665},
1040
+ year={2020}
1041
+ }
1042
+ ```
1043
+
1044
+ <!--
1045
+ ## Glossary
1046
+
1047
+ *Clearly define terms in order to be accessible across audiences.*
1048
+ -->
1049
+
1050
+ <!--
1051
+ ## Model Card Authors
1052
+
1053
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1054
+ -->
1055
+
1056
+ <!--
1057
+ ## Model Card Contact
1058
+
1059
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1060
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "5.2.3",
5
+ "transformers": "5.2.0",
6
+ "pytorch": "2.10.0+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
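
This config declares empty `query`/`document` prompts and dot-product similarity, so encoding and scoring need no extra arguments. A minimal sketch (assuming a local checkout of this repository):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder(".")  # hypothetical: path to this repository
q = model.encode_query("revenue growth 2024")
d = model.encode_document("Turnover increased to £5.2m in 2024.")
print(model.similarity(q, d))  # dot product, per `similarity_fn_name`
```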
document_0_MLMTransformer/config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "architectures": [
3
+ "NewForMaskedLM"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.0,
6
+ "auto_map": {
7
+ "AutoConfig": "configuration.NewConfig",
8
+ "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
9
+ "AutoModelForMaskedLM": "modeling.NewForMaskedLM",
10
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
11
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
12
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
13
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
14
+ },
15
+ "classifier_dropout": 0.1,
16
+ "dtype": "float32",
17
+ "hidden_act": "gelu",
18
+ "hidden_dropout_prob": 0.1,
19
+ "hidden_size": 768,
20
+ "initializer_range": 0.02,
21
+ "intermediate_size": 3072,
22
+ "layer_norm_eps": 1e-12,
23
+ "layer_norm_type": "layer_norm",
24
+ "logn_attention_clip1": false,
25
+ "logn_attention_scale": false,
26
+ "max_position_embeddings": 8192,
27
+ "model_type": "new",
28
+ "num_attention_heads": 12,
29
+ "num_hidden_layers": 12,
30
+ "pack_qkv": true,
31
+ "pad_token_id": 0,
32
+ "position_embedding_type": "rope",
33
+ "rope_parameters": null,
34
+ "rope_theta": 500000,
35
+ "transformers_version": "5.2.0",
36
+ "type_vocab_size": 0,
37
+ "unpad_inputs": false,
38
+ "use_memory_efficient_attention": false,
39
+ "vocab_size": 30522
40
+ }
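
Because the architecture is registered through `auto_map`, the masked-LM backbone only loads with remote code enabled. A minimal sketch (assuming a local copy of this subfolder):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "document_0_MLMTransformer"  # hypothetical: local subfolder of this repo
model = AutoModelForMaskedLM.from_pretrained(path, trust_remote_code=True)  # resolves NewForMaskedLM via auto_map
tokenizer = AutoTokenizer.from_pretrained(path)
```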
document_0_MLMTransformer/configuration.py ADDED
@@ -0,0 +1,145 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ NEW model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class NewConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+ vocab_size (`int`, *optional*, defaults to 30522):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
52
+ The dropout ratio for the attention probabilities.
53
+ max_position_embeddings (`int`, *optional*, defaults to 512):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+ type_vocab_size (`int`, *optional*, defaults to 2):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "new"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
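
Note that although the docstring in this file mentions linear and dynamic RoPE scaling, only the `ntk` strategy is wired up in the accompanying modeling.py (the linear and dynamic branches are commented out there). A minimal sketch of enabling it (hypothetical scaling factor):

```python
from configuration import NewConfig  # the module shipped alongside this checkpoint

# Hypothetical: extend the usable context via fixed NTK scaling of the RoPE frequencies
config = NewConfig(rope_scaling={"type": "ntk", "factor": 2.0})
```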
document_0_MLMTransformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a31dcc6d79cb8ce449a436f13b6031ba24d60ed20e074f5532d917b7559473dd
3
+ size 643355976
document_0_MLMTransformer/modeling.py ADDED
@@ -0,0 +1,1418 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """PyTorch NEW model."""
17
+
18
+ import math
19
+ from dataclasses import dataclass
20
+ from typing import List, Optional, Tuple, Union
21
+
22
+ import torch
23
+ import torch.utils.checkpoint
24
+ from torch import nn
25
+
26
+ from transformers.activations import ACT2FN
27
+ from transformers.modeling_outputs import (
28
+ BaseModelOutput,
29
+ BaseModelOutputWithPooling,
30
+ MaskedLMOutput,
31
+ MultipleChoiceModelOutput,
32
+ QuestionAnsweringModelOutput,
33
+ SequenceClassifierOutput,
34
+ ModelOutput,
35
+ )
36
+ from transformers.modeling_utils import PreTrainedModel
37
+ from transformers.utils import logging
38
+
39
+ try:
40
+ import xformers.ops as xops
41
+ except ImportError as e:
42
+ xops = None
43
+
44
+ from .configuration import NewConfig
45
+
46
+
47
+ logger = logging.get_logger(__name__)
48
+
49
+
50
+ # Adapted from https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/bert_padding.py
51
+ # Which was adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
52
+ class IndexFirstAxis(torch.autograd.Function):
53
+ @staticmethod
54
+ def forward(ctx, input, indices):
55
+ ctx.save_for_backward(indices)
56
+ assert input.ndim >= 2
57
+ ctx.first_axis_dim, other_shape = input.shape[0], input.shape[1:]
58
+ second_dim = other_shape.numel()
59
+ # TD [2022-03-04] For some reason torch.gather is a bit faster than indexing.
60
+ # return input[indices]
61
+ # return torch.gather(
62
+ # rearrange(input, "b ... -> b (...)"), 0, repeat(indices, "z -> z d", d=second_dim)
63
+ # ).reshape(-1, *other_shape)
64
+ return torch.gather(
65
+ input.view(ctx.first_axis_dim, second_dim),
66
+ 0,
67
+ indices.unsqueeze(-1).expand(indices.size(0), second_dim)
68
+ ).reshape(-1, *other_shape)
69
+
70
+ @staticmethod
71
+ def backward(ctx, grad_output):
72
+ (indices,) = ctx.saved_tensors
73
+ assert grad_output.ndim >= 2
74
+ other_shape = grad_output.shape[1:]
75
+ # grad_output = rearrange(grad_output, "b ... -> b (...)")
76
+ grad_output = grad_output.view(grad_output.size(0), other_shape.numel())
77
+ grad_input = torch.zeros(
78
+ [ctx.first_axis_dim, grad_output.shape[1]],
79
+ device=grad_output.device,
80
+ dtype=grad_output.dtype,
81
+ )
82
+ # TD [2022-03-04] For some reason torch.scatter is a bit faster than indexing.
83
+ # grad_input[indices] = grad_output
84
+ # grad_input.scatter_(0, repeat(indices, "z -> z d", d=grad_output.shape[1]), grad_output)
85
+ grad_input.scatter_(
86
+ 0, indices.unsqueeze(-1).expand(indices.size(0), grad_output.size(1)), grad_output
87
+ )
88
+ return grad_input.reshape(ctx.first_axis_dim, *other_shape), None
89
+
90
+
91
+ index_first_axis = IndexFirstAxis.apply
92
+
93
+
94
+ def unpad_input(hidden_states, attention_mask=None, indices=None):
95
+ """
96
+ Arguments:
97
+ hidden_states: (batch, seqlen, ...)
98
+ attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
99
+ indices: (total_nnz), the indices of non-masked tokens from the flattened input sequence.
100
+ Return:
101
+ hidden_states: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
102
+ """
103
+ if indices is None:
104
+ assert attention_mask is not None
105
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
106
+
107
+ # TD [2022-03-04] We don't want to index with a bool mask, because Pytorch will expand the
108
+ # bool mask, then call nonzero to get the indices, then index with those. The indices is @dim
109
+ # times larger than it needs to be, wasting memory. It's faster and more memory-efficient to
110
+ # index with integer indices. Moreover, torch's index is a bit slower than it needs to be,
111
+ # so we write custom forward and backward to make it a bit faster.
112
+ hidden_states = hidden_states.view(-1, *hidden_states.shape[2:])
113
+ return index_first_axis(hidden_states, indices)
114
+
115
+
116
+ class IndexPutFirstAxis(torch.autograd.Function):
117
+ @staticmethod
118
+ def forward(
119
+ ctx,
120
+ values: torch.Tensor,
121
+ indices: torch.Tensor,
122
+ first_axis_dim
123
+ ) -> torch.Tensor:
124
+ ctx.save_for_backward(indices)
125
+ assert indices.ndim == 1
126
+ assert values.ndim >= 2
127
+ output = torch.zeros(
128
+ first_axis_dim, *values.shape[1:], device=values.device, dtype=values.dtype
129
+ )
130
+ output[indices] = values
131
+ return output
132
+
133
+ @staticmethod
134
+ def backward(ctx, grad_output: torch.Tensor) -> Tuple[torch.Tensor, None, None]:
135
+ indices, = ctx.saved_tensors
136
+ grad_values = grad_output[indices]
137
+ return grad_values, None, None
138
+
139
+
140
+ index_put_first_axis = IndexPutFirstAxis.apply
141
+
142
+
143
+ def pad_input(inputs: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int) -> torch.Tensor:
144
+ """Add padding to sequences.
145
+
146
+ Arguments:
147
+ inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
148
+ indices: (total_nnz), `indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()`
149
+ batch: int batch_size
150
+ seqlen: int max sequence length
151
+
152
+ Returns:
153
+ inputs: (batch, seqlen, ...)
154
+ """
155
+ output = index_put_first_axis(inputs, indices, batch * seqlen)
156
+ return output.view(batch, seqlen, *inputs.shape[1:])
157
+
158
+
159
+ def rotate_half(x):
160
+ """Rotates half the hidden dims of the input."""
161
+ x1 = x[..., : x.shape[-1] // 2]
162
+ x2 = x[..., x.shape[-1] // 2 :]
163
+ return torch.cat((-x2, x1), dim=-1)
164
+
165
+
166
+ def apply_rotary_pos_emb(q, k, cos, sin):
167
+ """Applies Rotary Position Embedding to the query and key tensors.
168
+
169
+ Args:
170
+ q (`torch.Tensor`): The query tensor.
171
+ k (`torch.Tensor`): The key tensor.
172
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
173
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
174
+ Returns:
175
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
176
+ """
177
+ cos, sin = cos.to(q.dtype), sin.to(q.dtype)
178
+ q_embed = (q * cos) + (rotate_half(q) * sin)
179
+ k_embed = (k * cos) + (rotate_half(k) * sin)
180
+ return q_embed, k_embed
181
+
182
+
183
+ class RotaryEmbedding(torch.nn.Module):
184
+ def __init__(self, dim, max_position_embeddings=512, base=10000.0, device=None):
185
+ super().__init__()
186
+
187
+ self.dim = dim
188
+ self.max_position_embeddings = max_position_embeddings
189
+ self.base = base
190
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
191
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
192
+
193
+ # Build here to make `torch.jit.trace` work.
194
+ self._set_cos_sin_cache(
195
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
196
+ )
197
+
198
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
199
+ self.max_seq_len_cached = seq_len
200
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
201
+
202
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
203
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
204
+ emb = torch.cat((freqs, freqs), dim=-1)
205
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
206
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
207
+
208
+ def forward(self, x, seq_len=None):
209
+ # x: [bs, num_attention_heads, seq_len, head_size]
210
+ if seq_len > self.max_seq_len_cached:
211
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
212
+
213
+ return (
214
+ self.cos_cached[:seq_len, ...].to(dtype=x.dtype),
215
+ self.sin_cached[:seq_len, ...].to(dtype=x.dtype),
216
+ )
217
+
218
+
219
+ class NTKScalingRotaryEmbedding(RotaryEmbedding):
220
+ """RotaryEmbedding extended with fixed and mixed NTK scaling. https://kexue.fm/archives/9706 """
221
+
222
+ def __init__(self, dim, max_position_embeddings=512, base=10000, device=None, scaling_factor=1.0, mixed_b=None):
223
+ self.scaling_factor = scaling_factor
224
+ self.mixed_b = mixed_b
225
+ super().__init__(dim, max_position_embeddings, base, device)
226
+ max_position_embeddings = max_position_embeddings * self.scaling_factor
227
+ self._set_cos_sin_cache(max_position_embeddings, self.inv_freq.device, torch.get_default_dtype())
228
+
229
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
230
+ self.max_seq_len_cached = seq_len
231
+
232
+ if seq_len > self.max_position_embeddings:
233
+ base = self.base * (self.scaling_factor if self.mixed_b is None else 1)
234
+ inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
235
+
236
+ if self.mixed_b is None:
237
+ inv_freq = inv_freq / self.scaling_factor ** (2 / self.dim) # (6)
238
+ else:
239
+ a = torch.tensor(self.scaling_factor).log() / (self.dim / 2) ** self.mixed_b # (13)
240
+ lambda_1_m = (a * torch.arange(1, self.dim // 2 + 1).float().to(device) ** self.mixed_b).exp() # (12)
241
+ inv_freq = inv_freq / lambda_1_m # (10)
242
+
243
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
244
+
245
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
246
+
247
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
248
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
249
+ emb = torch.cat((freqs, freqs), dim=-1)
250
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
251
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
252
+
253
+
254
+ class RMSNorm(nn.Module):
255
+ def __init__(self, hidden_size, eps=1e-6):
256
+ """
257
+ RMSNorm is equivalent to T5LayerNorm
258
+ """
259
+ super().__init__()
260
+ self.weight = nn.Parameter(torch.ones(hidden_size))
261
+ self.variance_epsilon = eps
262
+
263
+ def forward(self, hidden_states):
264
+ input_dtype = hidden_states.dtype
265
+ hidden_states = hidden_states.to(torch.float32)
266
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
267
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
268
+ return self.weight * hidden_states.to(input_dtype)
269
+
270
+
271
+ LAYER_NORM = {
272
+ 'layer_norm': nn.LayerNorm,
273
+ 'rms_norm': RMSNorm
274
+ }
275
+
276
+
277
+ class NewEmbeddings(nn.Module):
278
+ """
279
+ Embedding and Unpadding.
280
+ """
281
+
282
+ def __init__(self, config: NewConfig):
283
+ super().__init__()
284
+ self.padding_idx = config.pad_token_id
285
+ self.word_embeddings = nn.Embedding(
286
+ config.vocab_size, config.hidden_size, padding_idx=self.padding_idx
287
+ )
288
+
289
+ self.position_embedding_type = config.position_embedding_type
290
+ if self.position_embedding_type == 'absolute':
291
+ self.position_embeddings = nn.Embedding(
292
+ config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
293
+ )
294
+ elif self.position_embedding_type == 'rope':
295
+ self._init_rope(config)
296
+ else:
297
+ raise ValueError
298
+
299
+ self.type_vocab_size = config.type_vocab_size
300
+ if self.type_vocab_size > 0:
301
+ self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
302
+
303
+ # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
304
+ # any TensorFlow checkpoint file
305
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
306
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
307
+ # position_ids is contiguous in memory and excluded when serialized
308
+ self.register_buffer(
309
+ "position_ids", torch.arange(config.max_position_embeddings), persistent=False
310
+ )
311
+
312
+ def _init_rope(self, config):
313
+ kwargs = dict(
314
+ dim=int(config.hidden_size / config.num_attention_heads),
315
+ max_position_embeddings=config.max_position_embeddings,
316
+ base=config.rope_theta
317
+ )
318
+ if config.rope_scaling is None:
319
+ self.rotary_emb = RotaryEmbedding(**kwargs)
320
+ else:
321
+ kwargs.update(scaling_factor=config.rope_scaling["factor"])
322
+ scaling_type = config.rope_scaling["type"]
323
+ if scaling_type == 'ntk':
324
+ kwargs.update(mixed_b=config.rope_scaling.get('mixed_b', None))
325
+ self.rotary_emb = NTKScalingRotaryEmbedding(**kwargs)
326
+ # elif scaling_type == "linear":
327
+ # self.rotary_emb = LinearScalingRotaryEmbedding(**kwargs)
328
+ # elif scaling_type == "dynamic":
329
+ # self.rotary_emb = DynamicNTKScalingRotaryEmbedding(**kwargs)
330
+ else:
331
+ raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
332
+
333
+ def forward(
334
+ self,
335
+ unpad_inputs: bool,
336
+ input_ids: Optional[torch.Tensor] = None,
337
+ attention_mask: Optional[torch.Tensor] = None,
338
+ length: Optional[List[int]] = None,
339
+ token_type_ids: Optional[torch.Tensor] = None,
340
+ position_ids: Optional[torch.Tensor] = None,
341
+ inputs_embeds: Optional[torch.Tensor] = None,
342
+ ) -> Tuple[torch.Tensor, torch.Tensor, Optional[Tuple], Optional[List[int]]]:
343
+ """
344
+ """
345
+ if inputs_embeds is None:
346
+ device, input_shape = input_ids.device, input_ids.shape
347
+ else:
348
+ device, input_shape = inputs_embeds.device, inputs_embeds.shape[:2]
349
+ batch_size, seq_length = input_shape
350
+
351
+ # Set attention_mask if it's None
352
+ if attention_mask is None:
353
+ attention_mask = torch.ones(input_shape, device=device)
354
+ if length is not None:
355
+ for i, l in enumerate(length):
356
+ attention_mask[i, l:] = 0
357
+
358
+ # Set attention_mask_bool for unpadding
359
+ if unpad_inputs:
360
+ attention_mask_bool = attention_mask.bool()
361
+ if length is None:
362
+ length = attention_mask.sum(-1).tolist()
363
+
364
+ # Get word embeddings
365
+ if inputs_embeds is None:
366
+ if unpad_inputs:
367
+ input_ids = input_ids[attention_mask_bool].unsqueeze(0)
368
+ inputs_embeds = self.word_embeddings(input_ids)
369
+ else:
370
+ if unpad_inputs:
371
+ inputs_embeds = inputs_embeds[attention_mask_bool].unsqueeze(0)
372
+ embeddings = inputs_embeds
373
+
374
+ # Set and unpad position_ids
375
+ if position_ids is None:
376
+ if seq_length > self.position_ids.size(0):
377
+ self.register_buffer(
378
+ "position_ids", torch.arange(seq_length, device=embeddings.device), persistent=False
379
+ )
380
+ if unpad_inputs:
381
+ # [1, cumsum_seq_len]
382
+ position_ids = torch.cat([self.position_ids[:l] for l in length]).unsqueeze(0)
383
+ else:
384
+ # [bs, seq_len]
385
+ position_ids = self.position_ids[:seq_length].expand(batch_size, -1)
386
+ elif unpad_inputs:
387
+ position_ids = position_ids[attention_mask_bool].unsqueeze(0) # [1, cumsum_seq_len]
388
+
389
+ # Compute rotary embedding
390
+ if self.position_embedding_type == 'rope':
391
+ rope_cos, rope_sin = self.rotary_emb(inputs_embeds, seq_len=seq_length)
392
+ rope_cos = rope_cos[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
393
+ rope_sin = rope_sin[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
394
+ rope_embeds = rope_cos, rope_sin
395
+ else:
396
+ rope_embeds = None
397
+
398
+ if self.type_vocab_size > 0:
399
+ if token_type_ids is None:
400
+ token_type_ids = position_ids.mul(0)
401
+ else:
402
+ if self.type_vocab_size < 2:
403
+ token_type_ids.mul_(0)
404
+ if unpad_inputs:
405
+ token_type_ids = token_type_ids[attention_mask_bool].unsqueeze(0)
406
+
407
+ token_type_embeddings = self.token_type_embeddings(token_type_ids)
408
+ embeddings = embeddings + token_type_embeddings
409
+
410
+ # BERT position
411
+ if self.position_embedding_type == "absolute":
412
+ position_embeddings = self.position_embeddings(position_ids)
413
+ embeddings = embeddings + position_embeddings
414
+
415
+ embeddings = self.LayerNorm(embeddings)
416
+ embeddings = self.dropout(embeddings)
417
+
418
+ return embeddings, attention_mask, rope_embeds, length
419
+
420
+
421
+ class NewAttention(nn.Module):
422
+ def __init__(self, config: NewConfig, pack_qkv=None, use_memory_efficient_attention=None):
423
+ super().__init__()
424
+ self.config = config
425
+ if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
426
+ raise ValueError(
427
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
428
+ f"heads ({config.num_attention_heads})"
429
+ )
430
+
431
+ self.hidden_size = config.hidden_size
432
+ self.num_attention_heads = config.num_attention_heads
433
+ self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
434
+ self.all_head_size = self.num_attention_heads * self.attention_head_size
435
+
436
+ if pack_qkv is None:
437
+ pack_qkv = config.pack_qkv
438
+ self.pack_qkv = pack_qkv
439
+
440
+ if self.pack_qkv:
441
+ self.qkv_proj = nn.Linear(config.hidden_size, self.all_head_size * 3, bias=True)
442
+ else:
443
+ self.q_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
444
+ self.k_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
445
+ self.v_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
446
+
447
+ self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
448
+ self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
449
+
450
+ if use_memory_efficient_attention is None:
451
+ use_memory_efficient_attention = self.config.use_memory_efficient_attention
452
+ self.use_memory_efficient_attention = use_memory_efficient_attention
453
+ self.memory_efficient_attention = None if xops is None else xops.memory_efficient_attention
454
+ if self.use_memory_efficient_attention:
455
+ assert self.memory_efficient_attention is not None, 'please install xformers'
456
+
457
+ def forward(
458
+ self,
459
+ hidden_states: torch.Tensor,
460
+ attention_bias: torch.FloatTensor,
461
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
462
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
463
+ attention_scale: Optional[torch.FloatTensor] = None,
464
+ head_mask: Optional[torch.FloatTensor] = None,
465
+ output_attentions: Optional[bool] = False,
466
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
467
+ ) -> Tuple[torch.Tensor, ...]:
468
+ shape_hd = (self.num_attention_heads, self.attention_head_size)
469
+ # qkv
470
+ if self.pack_qkv and qkv_inputs is None:
471
+ qkv_pack = self.qkv_proj(hidden_states).split(self.all_head_size, dim=-1)
472
+ else:
473
+ if qkv_inputs is None:
474
+ qkv_inputs = (hidden_states, hidden_states, hidden_states)
475
+ qkv_pack = [
476
+ getattr(self, n + '_proj')(s) for s, n in zip(qkv_inputs, 'qkv')
477
+ ]
478
+ query_states, key_states, value_states = [t.view(t.shape[:-1] + shape_hd) for t in qkv_pack]
479
+
480
+ if self.config.position_embedding_type == 'rope':
481
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, *rope_embeds)
482
+
483
+ dtype = query_states.dtype
484
+
485
+ if self.config.logn_attention_scale and attention_scale is not None:
486
+ # https://kexue.fm/archives/8823
487
+ query_states = query_states * attention_scale.to(dtype)
488
+
489
+ if padding_inputs is not None:
490
+ query_states = pad_input(query_states.squeeze(), *padding_inputs)
491
+ key_states = pad_input(key_states.squeeze(), *padding_inputs)
492
+ value_states = pad_input(value_states.squeeze(), *padding_inputs)
493
+
494
+ if self.use_memory_efficient_attention:
495
+ assert self.memory_efficient_attention is not None, "xformers is not loaded"
496
+ assert output_attentions is False, "memory_efficient_attention does not output attentions"
497
+ assert head_mask is None, "Not supported yet"
498
+ attention_probs = None
499
+ if torch.is_tensor(attention_bias):
500
+ attention_bias = attention_bias.to(dtype)
501
+ context_layer = self.memory_efficient_attention(
502
+ query_states,
503
+ key_states,
504
+ value_states,
505
+ attn_bias=attention_bias,
506
+ p=self.dropout.p
507
+ )
508
+ else:
509
+ if output_attentions and isinstance(self, NewSdpaAttention):
510
+ raise RuntimeError("SDPA do not output attentions")
511
+ context_layer, attention_probs = self._attention(
512
+ query_states, key_states, value_states, attention_bias, head_mask
513
+ )
514
+
515
+ if padding_inputs is not None:
516
+ context_layer = unpad_input(context_layer, indices=padding_inputs[0])
517
+
518
+ new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
519
+ context_layer = context_layer.view(new_context_layer_shape)
520
+
521
+ # output proj
522
+ attn_output = self.o_proj(context_layer)
523
+
524
+ # add attentions if we output them
525
+ outputs = (attn_output, attention_probs) if output_attentions else (attn_output,)
526
+ return outputs
527
+
528
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
529
+ """
530
+ Args:
531
+ q/k/v: (B, L, n_head, head_dim),
532
+ Returns:
533
+ attn_output: (B, L, n_head, head_dim)
534
+ """
535
+ query_states = query_states.transpose(1, 2)
536
+ key_states = key_states.transpose(1, 2)
537
+ value_states = value_states.transpose(1, 2)
538
+ # Take the dot product between "query" and "key" to get the raw attention scores.
539
+ attention_scores = torch.matmul(query_states, key_states.transpose(-1, -2))
540
+
541
+ attention_scores = attention_scores / math.sqrt(self.attention_head_size)
542
+ if attention_bias is not None:
543
+ # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
544
+ attention_scores = attention_scores + attention_bias
545
+
546
+ # Normalize the attention scores to probabilities.
547
+ attention_probs = nn.functional.softmax(attention_scores, dim=-1)
548
+
549
+ # This is actually dropping out entire tokens to attend to, which might
550
+ # seem a bit unusual, but is taken from the original Transformer paper.
551
+ if self.dropout.p > 0:
552
+ attention_probs = self.dropout(attention_probs)
553
+
554
+ # Mask heads if we want to
555
+ if head_mask is not None:
556
+ attention_probs = attention_probs * head_mask
557
+
558
+ context_layer = torch.matmul(attention_probs, value_states)
559
+
560
+ context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
561
+ return context_layer, attention_probs
562
+
563
+
564
+ class NewSdpaAttention(NewAttention):
565
+ """
566
+ New attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
567
+ `NewAttention` as the weights of the module stay untouched. The only changes are on the forward pass to adapt to the
568
+ SDPA API.
569
+ """
570
+ def __init__(self, config: NewConfig, **kwargs):
571
+ super().__init__(config, **kwargs)
572
+ # torch.backends.cuda.enable_mem_efficient_sdp(False)
573
+ # logger.warning(
574
+ # "Disable memory efficient attention kernel for `NewSdpaAttention`, you can set "
575
+ # "`use_memory_efficient_attention=True` if it expected to use."
576
+ # )
577
+
578
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
579
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
580
+ query_states.transpose(1, 2),
581
+ key_states.transpose(1, 2),
582
+ value_states.transpose(1, 2),
583
+ attn_mask=attention_bias,
584
+ dropout_p=self.dropout.p if self.training else 0.0,
585
+ )
586
+ attn_output = attn_output.permute(0, 2, 1, 3).contiguous()
587
+ return attn_output, None
588
+
589
+
590
+ NEW_ATTENTION_CLASSES = {
591
+ "eager": NewAttention,
592
+ # "flash_attention_2": , # TODO
593
+ "sdpa": NewSdpaAttention,
594
+ }
595
+
596
+
597
+ class NewGatedMLP(nn.Module):
598
+ """
599
+ GLU Variants Improve Transformer.
600
+ """
601
+
602
+ def __init__(self, config: NewConfig):
603
+ super().__init__()
604
+ self.intermediate_size = config.intermediate_size
605
+ self.up_gate_proj = nn.Linear(config.hidden_size, self.intermediate_size * 2, bias=False)
606
+ self.down_proj = nn.Linear(self.intermediate_size, config.hidden_size, bias=True)
607
+ self.act_fn = ACT2FN[config.hidden_act]
608
+ if config.hidden_dropout_prob > 0:
609
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
610
+ else:
611
+ self.hidden_dropout = None
612
+
613
+ def forward(self, hidden_states):
614
+ up_gate = self.up_gate_proj(hidden_states)
615
+ up_states, gate = torch.split(up_gate, self.intermediate_size, dim=-1)
616
+ gate = self.act_fn(gate)
617
+ gated_states = gate * up_states
618
+ if self.hidden_dropout is not None:
619
+ gated_states = self.hidden_dropout(gated_states)
620
+ down_states = self.down_proj(gated_states)
621
+ return down_states
622
+
623
+
624
+ class NewLayer(nn.Module):
625
+ def __init__(
626
+ self,
627
+ config: NewConfig,
628
+ pack_qkv=None,
629
+ use_memory_efficient_attention=None,
630
+ attn_implementation=None
631
+ ):
632
+ super().__init__()
633
+ if attn_implementation is None:
634
+ attn_implementation = config._attn_implementation
635
+ if use_memory_efficient_attention is None:
636
+ use_memory_efficient_attention = config.use_memory_efficient_attention
637
+ if use_memory_efficient_attention:
638
+ if attn_implementation != 'eager':
639
+ logger.warning_once(f"Override {attn_implementation=} to 'eager' as {use_memory_efficient_attention=}")
640
+ attn_implementation = 'eager' # Since it will be SDPA by default for torch>=2.1.1
641
+ self.attention = NEW_ATTENTION_CLASSES[attn_implementation](
642
+ config, pack_qkv=pack_qkv, use_memory_efficient_attention=use_memory_efficient_attention
643
+ )
644
+ self.mlp = NewGatedMLP(config)
645
+
646
+ ln_class = LAYER_NORM[config.layer_norm_type]
647
+ self.attn_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
648
+ self.mlp_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
649
+
650
+ if config.hidden_dropout_prob > 0:
651
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
652
+ else:
653
+ self.hidden_dropout = None
654
+
655
+ def forward(
656
+ self,
657
+ hidden_states: torch.Tensor,
658
+ attention_bias: torch.FloatTensor,
659
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
660
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
661
+ attention_scale: Optional[torch.FloatTensor] = None,
662
+ subset_indices: Optional[torch.LongTensor] = None,
663
+ head_mask: Optional[torch.FloatTensor] = None,
664
+ output_attentions: Optional[bool] = False,
665
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
666
+ ) -> Tuple[torch.Tensor, ...]:
667
+ # Multi head self attention
668
+ residual = hidden_states if qkv_inputs is None else qkv_inputs[0]
669
+ attention_outputs = self.attention(
670
+ hidden_states,
671
+ attention_bias,
672
+ rope_embeds,
673
+ padding_inputs,
674
+ attention_scale,
675
+ head_mask,
676
+ output_attentions=output_attentions,
677
+ qkv_inputs=qkv_inputs,
678
+ )
679
+ hidden_states = attention_outputs[0]
680
+ if self.hidden_dropout is not None:
681
+ hidden_states = self.hidden_dropout(hidden_states)
682
+ hidden_states = residual + hidden_states
683
+
684
+ # In pretraining, after the attention of last layer, we only need the masked tokens.
685
+ if subset_indices is not None:
686
+ hidden_states = hidden_states[subset_indices]
687
+
688
+ hidden_states = self.attn_ln(hidden_states)
689
+
690
+ # Fully Connected
691
+ residual = hidden_states
692
+ hidden_states = self.mlp(hidden_states)
693
+ if self.hidden_dropout is not None:
694
+ hidden_states = self.hidden_dropout(hidden_states)
695
+ hidden_states = residual + hidden_states
696
+ hidden_states = self.mlp_ln(hidden_states)
697
+
698
+ # add self attentions if we output attention weights
699
+ outputs = (hidden_states,) + attention_outputs[1:]
700
+ return outputs
701
+
702
+
703
+ class NewEncoder(nn.Module):
704
+ def __init__(self, config):
705
+ super().__init__()
706
+ self.config = config
707
+ self.layer = nn.ModuleList([NewLayer(config) for _ in range(config.num_hidden_layers)])
708
+ self.gradient_checkpointing = False
709
+
710
+ def forward(
711
+ self,
712
+ hidden_states: torch.Tensor,
713
+ attention_bias: Optional[torch.FloatTensor] = None,
714
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
715
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
716
+ attention_scale: Optional[torch.FloatTensor] = None,
717
+ subset_indices: Optional[torch.LongTensor] = None,
718
+ head_mask: Optional[torch.FloatTensor] = None,
719
+ output_attentions: Optional[bool] = False,
720
+ output_hidden_states: Optional[bool] = False,
721
+ return_dict: Optional[bool] = True,
722
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
723
+ all_hidden_states = () if output_hidden_states else None
724
+ all_self_attentions = () if output_attentions else None
725
+
726
+ for i, layer_module in enumerate(self.layer):
727
+ if output_hidden_states:
728
+ all_hidden_states = all_hidden_states + (hidden_states,)
729
+
730
+ if i >= len(self.layer) - 1:
731
+ layer_subset_indices = subset_indices
732
+ else:
733
+ layer_subset_indices = None
734
+
735
+ layer_head_mask = head_mask[i] if head_mask is not None else None
736
+
737
+ if self.gradient_checkpointing and self.training:
738
+ layer_outputs = self._gradient_checkpointing_func(
739
+ layer_module.__call__,
740
+ hidden_states,
741
+ attention_bias,
742
+ rope_embeds,
743
+ padding_inputs,
744
+ attention_scale,
745
+ layer_subset_indices,
746
+ layer_head_mask,
747
+ )
748
+ else:
749
+ layer_outputs = layer_module(
750
+ hidden_states,
751
+ attention_bias,
752
+ rope_embeds,
753
+ padding_inputs,
754
+ attention_scale,
755
+ layer_subset_indices,
756
+ layer_head_mask,
757
+ output_attentions,
758
+ )
759
+
760
+ hidden_states = layer_outputs[0]
761
+ if output_attentions:
762
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
763
+
764
+ if output_hidden_states:
765
+ all_hidden_states = all_hidden_states + (hidden_states,)
766
+
767
+ if not return_dict:
768
+ return tuple(
769
+ v
770
+ for v in [
771
+ hidden_states,
772
+ all_hidden_states,
773
+ all_self_attentions,
774
+ ]
775
+ if v is not None
776
+ )
777
+ return BaseModelOutput(
778
+ last_hidden_state=hidden_states,
779
+ hidden_states=all_hidden_states,
780
+ attentions=all_self_attentions,
781
+ )
782
+
783
+
784
+ # Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->New
785
+ class NewPooler(nn.Module):
786
+ def __init__(self, config):
787
+ super().__init__()
788
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
789
+ self.activation = nn.Tanh()
790
+
791
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
792
+ # We "pool" the model by simply taking the hidden state corresponding
793
+ # to the first token.
794
+ first_token_tensor = hidden_states[:, 0]
795
+ pooled_output = self.dense(first_token_tensor)
796
+ pooled_output = self.activation(pooled_output)
797
+ return pooled_output
798
+
799
+
800
+ class NewPreTrainedModel(PreTrainedModel):
801
+ """
802
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
803
+ models.
804
+ """
805
+
806
+ config_class = NewConfig
807
+ base_model_prefix = "new"
808
+ supports_gradient_checkpointing = True
809
+ _supports_sdpa = True
810
+
811
+ def _init_weights(self, module):
812
+ """Initialize the weights"""
813
+ if isinstance(module, nn.Linear):
814
+ # Slightly different from the TF version which uses truncated_normal for initialization
815
+ # cf https://github.com/pytorch/pytorch/pull/5617
816
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
817
+ if module.bias is not None:
818
+ module.bias.data.zero_()
819
+ elif isinstance(module, nn.Embedding):
820
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
821
+ if module.padding_idx is not None:
822
+ module.weight.data[module.padding_idx].zero_()
823
+ elif isinstance(module, nn.LayerNorm):
824
+ module.bias.data.zero_()
825
+ module.weight.data.fill_(1.0)
826
+
827
+
828
+ class NewModel(NewPreTrainedModel):
829
+ """
830
+ The bare New Model transformer outputting raw hidden-states without any specific head on top.
831
+ """
832
+
833
+ def __init__(self, config: NewConfig, add_pooling_layer=False):
834
+ super().__init__(config)
835
+ self.config = config
836
+
837
+ self.embeddings = NewEmbeddings(config)
838
+ self.encoder = NewEncoder(config)
839
+
840
+ self.pooler = NewPooler(config) if add_pooling_layer else None
841
+
842
+ # Initialize weights and apply final processing
843
+ self.post_init()
844
+
845
+ def get_input_embeddings(self):
846
+ return self.embeddings.word_embeddings
847
+
848
+ def set_input_embeddings(self, value):
849
+ self.embeddings.word_embeddings = value
850
+
851
+ def forward(
852
+ self,
853
+ input_ids: Optional[torch.Tensor] = None,
854
+ attention_mask: Optional[torch.Tensor] = None,
855
+ length: Optional[List[int]] = None,
856
+ subset_indices: Optional[torch.LongTensor] = None,
857
+ token_type_ids: Optional[torch.Tensor] = None,
858
+ position_ids: Optional[torch.Tensor] = None,
859
+ head_mask: Optional[torch.Tensor] = None,
860
+ inputs_embeds: Optional[torch.Tensor] = None,
861
+ output_attentions: Optional[bool] = None,
862
+ output_hidden_states: Optional[bool] = None,
863
+ return_dict: Optional[bool] = None,
864
+ unpad_inputs: Optional[bool] = None,
865
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]:
866
+ r"""
867
+ length (`list` of length `batch_size`, *optional*):
868
+ If `None`, return the padded `last_hidden_state`.
869
+ subset_indices (`torch.LongTensor`, *optional*):
870
+ Indices of tokens to keep after the last layer's attention (e.g. only the masked tokens during pretraining).
871
+ unpad_inputs (`bool`, *optional*):
872
+ Whether to strip padding before the encoder and re-pad the final output.
873
+ """
874
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
875
+ output_hidden_states = (
876
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
877
+ )
878
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
879
+ unpad_inputs = unpad_inputs if unpad_inputs is not None else self.config.unpad_inputs
880
+ output_padded = length is None
881
+
882
+ if input_ids is not None and inputs_embeds is not None:
883
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
884
+ elif input_ids is not None:
885
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
886
+ input_shape = input_ids.size()
887
+ elif inputs_embeds is not None:
888
+ input_shape = inputs_embeds.size()[:-1]
889
+ else:
890
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
891
+
892
+ # TODO: not used
893
+ # # Prepare head mask if needed
894
+ # # 1.0 in head_mask indicate we keep the head
895
+ # # attention_probs has shape bsz x n_heads x N x N
896
+ # # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
897
+ # # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
898
+ # head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
899
+
900
+ # Get embeddings, may unpad them
901
+ (embedding_output, attention_mask, rope_embeds, length) = self.embeddings(
902
+ unpad_inputs,
903
+ input_ids=input_ids,
904
+ attention_mask=attention_mask,
905
+ length=length,
906
+ token_type_ids=token_type_ids,
907
+ position_ids=position_ids,
908
+ inputs_embeds=inputs_embeds
909
+ )
910
+
911
+ batch_size, seq_length = input_shape
912
+ if unpad_inputs and self.config.use_memory_efficient_attention:
913
+ attention_bias = xops.fmha.attn_bias.BlockDiagonalMask.from_seqlens(length)
914
+ else:
915
+ # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
916
+ # ourselves in which case we just need to make it broadcastable to all heads.
917
+ attention_bias = self.get_extended_attention_mask(attention_mask, input_shape)
918
+ if self.config.use_memory_efficient_attention:
919
+ # Invalid shape for attention bias: torch.Size([48, 1, 1, 512]) (expected (48, 12, 512, 512))
920
+ attention_bias = attention_bias.expand(-1, self.config.num_attention_heads, seq_length, -1)
921
+
922
+ padding_inputs = None
923
+ if unpad_inputs and (output_padded or not self.config.use_memory_efficient_attention):
924
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
925
+ if not self.config.use_memory_efficient_attention:
926
+ padding_inputs = (indices, *input_shape)
927
+
928
+ attention_scale = None
929
+ if self.config.logn_attention_scale:
930
+ logger.warning_once("TODO: logn_attention_scale")
931
+ # # attention scale log_512(input_len)
932
+ # attention_scale = attention_mask.sum(1).log() / torch.tensor(self.config.max_position_embeddings).log()
933
+ # # inference-time logn scale need clip 1
934
+ # if self.config.logn_attention_clip1:
935
+ # attention_scale.clip_(1)
936
+ # attention_scale = attention_scale[:, None, None, None]
937
+ # else:
938
+ # attention_scale = None
939
+
940
+ encoder_outputs = self.encoder(
941
+ embedding_output,
942
+ attention_bias=attention_bias,
943
+ rope_embeds=rope_embeds,
944
+ padding_inputs=padding_inputs,
945
+ attention_scale=attention_scale,
946
+ subset_indices=subset_indices,
947
+ head_mask=head_mask,
948
+ output_attentions=output_attentions,
949
+ output_hidden_states=output_hidden_states,
950
+ return_dict=return_dict,
951
+ )
952
+ sequence_output = encoder_outputs[0]
953
+ if unpad_inputs and output_padded:
954
+ sequence_output = pad_input(
955
+ sequence_output.squeeze(), indices, batch_size, seq_length
956
+ )
957
+
958
+ pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
959
+
960
+ if not return_dict:
961
+ return (sequence_output, pooled_output) + encoder_outputs[1:]
962
+
963
+ return BaseModelOutputWithPooling(
964
+ last_hidden_state=sequence_output,
965
+ pooler_output=pooled_output,
966
+ hidden_states=encoder_outputs.hidden_states,
967
+ attentions=encoder_outputs.attentions,
968
+ )
969
+
970
+
971
+ class NewLMPredictionHead(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+         self.transform_act_fn = ACT2FN[config.hidden_act]
+         self.norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+         # The output weights are the same as the input embeddings, but there is
+         # an output-only bias for each token.
+         self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
+
+     def forward(self, hidden_states):
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.transform_act_fn(hidden_states)
+         hidden_states = self.norm(hidden_states)
+         hidden_states = self.decoder(hidden_states)
+         return hidden_states
+
+
+ class NewForMaskedLM(NewPreTrainedModel):
+     _tied_weights_keys = ["lm_head.decoder.bias", "lm_head.decoder.weight"]
+
+     def __init__(self, config: NewConfig):
+         super().__init__(config)
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.lm_head = NewLMPredictionHead(config)
+         self.loss_fct = nn.CrossEntropyLoss()
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def get_output_embeddings(self):
+         return self.lm_head.decoder
+
+     def set_output_embeddings(self, new_embeddings):
+         self.lm_head.decoder = new_embeddings
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
+             config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
+             the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if labels is None or not self.new.config.unpad_inputs:
+             length = None
+             subset_indices = None
+         else:
+             length = attention_mask.sum(-1).tolist()
+             # Flatten labels to the unpadded token stream and mark the masked positions,
+             # so the encoder only materializes outputs where the MLM loss is computed.
+             labels = labels[attention_mask.bool()].unsqueeze(0)
+             subset_indices = labels > -100
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             length=length,
+             subset_indices=subset_indices,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+         prediction_scores = self.lm_head(sequence_output)
+
+         masked_lm_loss = None
+         if labels is not None:
+             if subset_indices is None:
+                 mask = attention_mask.bool()
+                 prediction_scores = prediction_scores[mask]
+                 labels = labels[mask]
+             else:
+                 labels = labels[subset_indices]
+             masked_lm_loss = self.loss_fct(prediction_scores, labels)
+
+         if not return_dict:
+             output = (prediction_scores,) + outputs[2:]
+             return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
+
+         return MaskedLMOutput(
+             loss=masked_lm_loss,
+             logits=prediction_scores,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForSequenceClassification(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+         self.config = config
+
+         self.new = NewModel(config, add_pooling_layer=True)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
+             config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss);
+             if `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         pooled_output = outputs[1]
+
+         pooled_output = self.dropout(pooled_output)
+         logits = self.classifier(pooled_output)
+
+         loss = None
+         if labels is not None:
+             if self.config.problem_type is None:
+                 if self.num_labels == 1:
+                     self.config.problem_type = "regression"
+                 elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
+                     self.config.problem_type = "single_label_classification"
+                 else:
+                     self.config.problem_type = "multi_label_classification"
+
+             if self.config.problem_type == "regression":
+                 loss_fct = nn.MSELoss()
+                 if self.num_labels == 1:
+                     loss = loss_fct(logits.squeeze(), labels.squeeze())
+                 else:
+                     loss = loss_fct(logits, labels)
+             elif self.config.problem_type == "single_label_classification":
+                 loss_fct = nn.CrossEntropyLoss()
+                 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+             elif self.config.problem_type == "multi_label_classification":
+                 loss_fct = nn.BCEWithLogitsLoss()
+                 loss = loss_fct(logits, labels)
+
+         if not return_dict:
+             output = (logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return SequenceClassifierOutput(
+             loss=loss,
+             logits=logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForMultipleChoice(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+
+         self.new = NewModel(config, add_pooling_layer=True)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, 1)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
+             num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
+             `input_ids` above)
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+         num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
+
+         # Flatten (batch_size, num_choices, seq_len) inputs to (batch_size * num_choices, seq_len)
+         input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
+         attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
+         token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
+         position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
+         inputs_embeds = (
+             inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
+             if inputs_embeds is not None
+             else None
+         )
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         pooled_output = outputs[1]
+
+         pooled_output = self.dropout(pooled_output)
+         logits = self.classifier(pooled_output)
+         reshaped_logits = logits.view(-1, num_choices)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(reshaped_logits, labels)
+
+         if not return_dict:
+             output = (reshaped_logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return MultipleChoiceModelOutput(
+             loss=loss,
+             logits=reshaped_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ @dataclass
+ class NewTokenClassifierOutput(ModelOutput):
+     loss: Optional[torch.FloatTensor] = None
+     logits: torch.FloatTensor = None
+     last_hidden_state: torch.FloatTensor = None
+     hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
+     attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
+
+
+ class NewForTokenClassification(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], NewTokenClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         sequence_output = self.dropout(sequence_output)
+         logits = self.classifier(sequence_output)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+
+         if not return_dict:
+             output = (logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return NewTokenClassifierOutput(
+             loss=loss,
+             logits=logits,
+             last_hidden_state=sequence_output,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForQuestionAnswering(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         start_positions: Optional[torch.Tensor] = None,
+         end_positions: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
+         r"""
+         start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the start of the labelled span for computing the token classification
+             loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
+             sequence are not taken into account for computing the loss.
+         end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the end of the labelled span for computing the token classification
+             loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
+             sequence are not taken into account for computing the loss.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         logits = self.qa_outputs(sequence_output)
+         start_logits, end_logits = logits.split(1, dim=-1)
+         start_logits = start_logits.squeeze(-1).contiguous()
+         end_logits = end_logits.squeeze(-1).contiguous()
+
+         total_loss = None
+         if start_positions is not None and end_positions is not None:
+             # If we are on multi-GPU, split adds a dimension
+             if len(start_positions.size()) > 1:
+                 start_positions = start_positions.squeeze(-1)
+             if len(end_positions.size()) > 1:
+                 end_positions = end_positions.squeeze(-1)
+             # Sometimes the start/end positions are outside our model inputs; we ignore these terms
+             ignored_index = start_logits.size(1)
+             start_positions = start_positions.clamp(0, ignored_index)
+             end_positions = end_positions.clamp(0, ignored_index)
+
+             loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
+             start_loss = loss_fct(start_logits, start_positions)
+             end_loss = loss_fct(end_logits, end_positions)
+             total_loss = (start_loss + end_loss) / 2
+
+         if not return_dict:
+             output = (start_logits, end_logits) + outputs[2:]
+             return ((total_loss,) + output) if total_loss is not None else output
+
+         return QuestionAnsweringModelOutput(
+             loss=total_loss,
+             start_logits=start_logits,
+             end_logits=end_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
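
Of the task heads above, only the masked-LM path is exercised by this checkpoint: the `(batch, seq_len, vocab_size)` logits from `NewForMaskedLM` are what the SPLADE pooling configured further below turns into sparse document vectors. A minimal loading sketch, assuming the checkpoint directory ships this modeling code as remote code (the repo path is a placeholder, not the actual model id):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder path; substitute the actual repo id of this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")
model = AutoModelForMaskedLM.from_pretrained("path/to/checkpoint", trust_remote_code=True)

batch = tokenizer(["Net current assets of the company"], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # (batch, seq_len, vocab_size)
```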
document_0_MLMTransformer/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
document_0_MLMTransformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
document_0_MLMTransformer/tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "max_length": 8192,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "TokenizersBackend",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
document_1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "pooling_strategy": "max",
+   "activation_function": "log1p_relu",
+   "word_embedding_dimension": 30522
+ }
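
The two settings above fully determine how the MLM logits become a 30,522-dimensional sparse vector: a `log1p_relu` activation followed by `max` pooling over the sequence. A minimal sketch of that step, written independently of the library (the function name `splade_pool` is ours, not part of sentence-transformers):

```python
import torch

def splade_pool(logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """SPLADE pooling: log1p(relu(logits)), then a max over the sequence axis."""
    scores = torch.log1p(torch.relu(logits))         # (batch, seq_len, vocab_size)
    scores = scores * attention_mask.unsqueeze(-1)   # zero out padded positions
    return scores.max(dim=1).values                  # (batch, vocab_size), mostly zeros
```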
modules.json ADDED
@@ -0,0 +1,8 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Router"
+   }
+ ]
query_0_SparseStaticEmbedding/config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "frozen": false
+ }
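
The query side needs no transformer at all: `SparseStaticEmbedding` stores one scalar weight per vocabulary entry, and `"frozen": false` indicates those weights were trainable during the fine-tune. A rough sketch of the inference-free lookup (the function name and exact aggregation are our assumptions, not the library's implementation):

```python
import torch

def encode_query(input_ids: torch.Tensor, token_weights: torch.Tensor) -> torch.Tensor:
    """Inference-free query encoding: each token contributes its stored IDF-like weight."""
    query_vec = torch.zeros(token_weights.shape[0])    # vocab-sized, e.g. 30522
    query_vec[input_ids] = token_weights[input_ids]    # bag-of-tokens lookup, no forward pass
    return query_vec
```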
query_0_SparseStaticEmbedding/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db73ae4c08aa8a138704c73f4314296d6f6fe0e4bc2283d11de954db26f6f159
+ size 122168
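
The file size is consistent with one float32 weight per vocabulary token. A back-of-the-envelope check (attributing the remainder to the safetensors header is our inference, not documented):

```python
vocab_size = 30522
payload = vocab_size * 4      # 122,088 bytes of float32 weights
print(122168 - payload)       # 80 bytes left over for the safetensors header
```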
query_0_SparseStaticEmbedding/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
query_0_SparseStaticEmbedding/tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "max_length": 8192,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "TokenizersBackend",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
router_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "types": {
+     "query_0_SparseStaticEmbedding": "sentence_transformers.sparse_encoder.models.SparseStaticEmbedding.SparseStaticEmbedding",
+     "document_0_MLMTransformer": "sentence_transformers.sparse_encoder.models.MLMTransformer.MLMTransformer",
+     "document_1_SpladePooling": "sentence_transformers.sparse_encoder.models.SpladePooling.SpladePooling"
+   },
+   "structure": {
+     "query": [
+       "query_0_SparseStaticEmbedding"
+     ],
+     "document": [
+       "document_0_MLMTransformer",
+       "document_1_SpladePooling"
+     ]
+   },
+   "parameters": {
+     "default_route": "document",
+     "allow_empty_key": true
+   }
+ }
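
With this wiring, queries run through the static-embedding route and documents through the MLMTransformer plus SpladePooling stack, with `document` as the default route. An end-to-end usage sketch with sentence-transformers' `SparseEncoder` (the repo id is a placeholder):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("path/to/this-checkpoint")  # placeholder id

# encode_query / encode_document follow the routes defined in router_config.json
query_emb = model.encode_query(["creditors falling due within one year"])
doc_emb = model.encode_document(["Interval UK Holdings Limited balance sheet, 31 December 2024 ..."])

print(model.similarity(query_emb, doc_emb))  # dot-product over the shared vocab space
```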