Commit a107131 (verified) by oneryalcin · Parent: 7165fdb

Upload 2-epoch fine-tuned sparse encoder for financial documents

README.md ADDED
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- asymmetric
- inference-free
- splade
- generated_from_trainer
- dataset_size:18247
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
base_model: opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
widget:
- text: '```markdown

    Interval UK Holdings Limited

    Balance sheet

    As at 31 December 2024


    || Note | 2024 £ | 2023 £ |

    |-----------------------|------|--------|--------|

    | Non current assets||||

    | Investments| 9| -| -|

    | Current assets||||

    | Cash at bank and in hand || 115,921| 111,205|

    ||| 115,921| 111,205|

    | Creditors: amounts falling due within one year | 10 | (100,017)| (100,344)|

    | Net current assets|| 15,904 | 10,861 |

    | Creditors: amounts falling due after more than one year | 11 | (430,000)|
    (430,000)|

    | Net liabilities|| (414,096)| (419,139)|

    | Capital and reserves ||||

    | Called-up share capital | 12 | 19,811,905| 19,811,905|

    | Profit and loss account || (20,226,001)| (20,231,044)|

    | Shareholder''s deficit || (414,096)| (419,139)|


    The financial statements of Interval UK Holdings Ltd (03000895) were approved
    by the Board on 25th September 2025

    and signed on its behalf by:


    D

    DP Ettridge

    Director


    10

    ```'
- text: '25 | Page


    2023 (£) 2022 (£) Raw materials and consumables 62,693 50,917 Finished goods and
    goods for resale 61,306 110,746 Total 123,999 161,663


    An impairment loss of £36,779 (2022: gain of £33,801) was recognised in cost of
    sales against stock during the year due to revaluation criteria. The value of
    stock at year end is not materially different to its replacement cost.


    Stock


    Of the amounts owed by group undertakings: £ 398,051 (2022: £1,972,482) are trade
    related, unsecured and have an interest charge of 0%. £ 3,900,000 (2022: £1,700,000)
    are a deposit, unsecured and have an interest charge of 5.22%.


    Stock recognised in cost of sales during the year as an expense was £241,612 (2022:
    £258,218).


    Cash at bank and in hand


    2023 (£) 2022 (£) Trade debtors 544,252 739,053 Amounts owed by group undertakings
    4,298,051 3,672,482 Other debtors 75,183 3,866 Corporation Tax receivable 38,958
    - Prepayments and accrued income 45,623 44,125 Total 5,002,067 4,459,526


    Trade debtors are stated after provisions for impairment of £217,545 (2022: £220,257)


    2023 (£) 2022 (£) Cash at bank and in hand 657,275 545,542


    Debtors'
- text: 'Page 23


    Nominal value: 2022 (€) 2021 (€) £1 120,924,800 120,924,800


    Employee benefit obligations


    Notes to the Financial Statements - continued for the year ended 31 December 2022


    The Scheme closed to the future accrual of benefits on 31 August 2015 and the
    4 members who were active at the closure date were granted deferred benefits in
    the Scheme. Since 1 April 2003, the Scheme has provided benefits on a Career Average
    Revalued Earnings ("CARE") basis with Pensionable Earnings revalued each year
    in line with the increase in Retail Price Index ("RPI") inflation. This method
    of revaluation is broadly consistent with the way that deferred pensions increase
    before retirement and therefore the closure of future accrual does not have a
    material impact on the value of members'' accrued benefits.


    Called up share capital


    Defined benefit pension plans 2022 (€''000) Interest cost 1,674 Interest income
    (1,697) Actual return on plan assets (23) 31,933


    Defined benefit pension plans 2022 (€''000) Present value of funded obligations
    (53,460) Fair value of plan assets 51,628 (Deficit)/surplus (1,832) Net (liability)/surplus
    (1,832)


    The amounts recognised in profit or loss are as follows:


    Number: 103,276,582 Class: Ordinary shares


    The amounts recognised in the balance sheet are as follows:


    The company operates a defined benefit pension scheme. An actuarial valuation
    was carried out at 31 March 2020 by a qualified independent actuary.


    Allotted, issued and fully paid:


    Dunlop International Europe Limited'
- text: 'Komline-Sanderson Ltd

    Notes to the financial statements

    For the year ended 31 March 2025


    3. Employees

    The average monthly number of employees, including directors, during the year
    was 4 (2024 - 4).


    4. Taxation

    Factors affecting tax charge for the year

    There was no current tax charge in either year as the Company made a taxable loss
    during the current reporting period and utilised brought forward trading losses
    in the prior period.

    Factors that may affect future tax charges

    The Company has tax losses of approximately £134,000 which will reduce future
    charges to corporation tax.


    5. Tangible fixed assets


    | Cost or valuation | Office equipment £ |

    |---|---|

    | At 1 April 2024 | 2,201 |

    | At 31 March 2025 | 2,201 |

    | Depreciation | |

    | At 1 April 2024 | 2,201 |

    | At 31 March 2025 | 2,201 |

    | Net book value | |

    | At 31 March 2025 | - |

    | At 31 March 2024 | - |


    6. Debtors


    | | 2025 £ | 2024 € |

    |---|---|---|

    | Trade debtors | 881 | 11,736 |

    | Other debtors | 134 | - |

    | | 1,015 | 11,736 |


    Page 3'
datasets:
- oneryalcin/financial-filings-sparse-retrieval-training
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
- avg_flops
model-index:
- name: Financial Domain Sparse Encoder (doc-v3-gte fine-tuned)
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.62
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.66
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.38
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.35600000000000004
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.314
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.047208191733070066
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.10033853359651303
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.12344673739385978
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.15735846776073342
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.37746727490101206
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5380793650793652
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.17865969564136092
      name: Dot Map@100
    - type: query_active_dims
      value: 4.760000228881836
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999844046909479
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1493.3485107421875
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9510730453200253
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 1.0107077360153198
      name: Avg Flops
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSciFact
      type: NanoSciFact
    metrics:
    - type: dot_accuracy@1
      value: 0.54
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.84
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.86
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.54
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.18799999999999997
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.53
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.82
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.845
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.89
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.7393057169200965
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6897222222222222
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6905205627705626
      name: Dot Map@100
    - type: query_active_dims
      value: 19.040000915527344
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999376187637916
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1752.0487060546875
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9425971854382187
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 4.766234874725342
      name: Avg Flops
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFiQA2018
      type: NanoFiQA2018
    metrics:
    - type: dot_accuracy@1
      value: 0.36
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2733333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.20800000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.11999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.18724603174603174
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.40712698412698417
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.49556349206349204
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5478095238095239
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.44010103944970097
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.46983333333333327
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.38215757668624306
      name: Dot Map@100
    - type: query_active_dims
      value: 12.0600004196167
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999604875158259
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1723.0986328125
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9435456840045706
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 4.071389198303223
      name: Avg Flops
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.4466666666666666
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.68
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.7200000000000001
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7866666666666667
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4466666666666666
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.31777777777777777
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.25066666666666665
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.17799999999999996
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.25481807449303395
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.44248850590783234
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4880034098191173
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5317226638567525
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5189580104236031
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5658783068783069
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.41711261169938885
      name: Dot Map@100
    - type: query_active_dims
      value: 11.953333854675293
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9996083699018847
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 1666.2235158269893
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.945409097836741
      name: Corpus Sparsity Ratio
    - type: avg_flops
      value: 2.325575590133667
      name: Avg Flops
---

# Financial Domain Sparse Encoder (doc-v3-gte fine-tuned)

This is an [Asymmetric Inference-free SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model fine-tuned from [opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte) on the [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** Asymmetric Inference-free SPLADE Sparse Encoder
- **Base model:** [opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte) <!-- at revision 1646fef40807937e8e130c66d327a26421c408d5 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

```
SparseEncoder(
  (0): Router(
    (sub_modules): ModuleDict(
      (query): Sequential(
        (0): SparseStaticEmbedding({'frozen': False}, dim=30522, tokenizer=TokenizersBackend)
      )
      (document): Sequential(
        (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'NewForMaskedLM'})
        (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'log1p_relu', 'word_embedding_dimension': 30522})
      )
    )
  )
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("sparse_encoder_model_id")
# Run inference
queries = [
    "How much did the company charge for depreciation of tangible assets in 2025?",
]
documents = [
    "CoolerAid Holdings Ltd\nNotes to the Consolidated Financial Statements (continued)\nYear ended 31 March 2025\n\n5. Operating profit\nOperating profit or loss is stated after charging/crediting\n\n|| 2025| 2024|\n|------------------------------------------------|---------|-----------|\n| Depreciation of tangible assets| 870,340 | 752,425 |\n| Loss/(gains) on disposal of tangible assets| 1,535 | (173,401) |\n| Impairment of trade debtors| 32,465 | 15,860|\n| Operating lease rentals| 112,292 | 139,908 |\n\n6. Auditor's remuneration\n\n|| 2025 | 2024 |\n|--------------------------------------------------------------|-------|-------|\n| Fees payable for the audit of the consolidated financial statements | 5,000 | 5,000 |\n\n7. Staff costs\nThe average number of persons employed by the group during the year, including the directors, amounted to:\n\n|| 2025 | 2024 |\n|---------------------|------|------|\n| Distribution staff | 44 | 42 |\n| Administrative staff| 14 | 13 |\n| Management staff| 2| 2|\n| Directors| 4| 4|\n|| 64 | 61 |\n\nThe aggregate payroll costs incurred during the year, relating to the above, were:\n\n|| 2025| 2024|\n|---------------------|-----------|-----------|\n| Wages and salaries | 2,828,063 | 2,596,870 |\n| Social security costs| 314,830 | 303,444 |\n| Other pension costs | 79,769| 68,413|\n|| 3,222,662 | 2,968,727 |\n\n8. Directors' remuneration\nThe directors' aggregate remuneration in respect of qualifying services was:\n\n|| 2025 | 2024 |\n|---------------|--------|--------|\n| Remuneration | 18,880 | 18,880 |\n\n- 20 -",
    'Kenneth Forbes (Holdings) Limited\nNotes to the Financial Statements (continued)\nYear ended 30 April 2025\n\n13. Tangible assets.\n\n| Group and company | Freehold property £ | Plant and machinery £ | Fixtures and fittings £ | Motor vehicles £ | Assets held under the course of construction £ | Total £ |\n|---|---|---|---|---|---|---|\n| Cost | | | | | | |\n| At 1 May 2024 | 3,941,316 | 3,684,443 | 1,326,032 | 294,882 | 41,948 | 9,288,621 |\n| Additions | 570,435 | 395,946 | 187,049 | 194,503 | 106,519 | 1,454,452 |\n| Disposals | | (317,265) | (214,785) | (164,244) | | (696,294) |\n| At 30 Apr 2025 | 4,511,751 | 3,763,124 | 1,298,296 | 325,141 | 148,467 | 10,046,779 |\n| Depreciation | | | | | | |\n| At 1 May 2024 | 1,158,100 | 3,257,887 | 904,088 | 156,223 | | 5,476,298 |\n| Charge for the year | 133,789 | 160,512 | 101,477 | 78,993 | | 474,771 |\n| Disposals | | (317,265) | (214,785) | (133,585) | | (665,635) |\n| At 30 Apr 2025 | 1,291,889 | 3,101,134 | 790,780 | 101,631 | | 5,285,434 |\n| Carrying amount | | | | | | |\n| At 30 Apr 2025 | 3,219,862 | 661,990 | 507,516 | 223,510 | 148,467 | 4,761,345 |\n| At 30 Apr 2024 | 2,783,216 | 426,556 | 421,944 | 138,659 | 41,948 | 3,812,323 |\n\nFreehold land amounting to £475,010 (2024: £475,010) is not depreciated.\nAll assets are held by the company but used by the group undertakings or associates.\n\nTangible assets held at valuation\n\nIn respect of tangible assets held at valuation, aggregate cost, depreciation and comparable carrying\namount that would have been recognised if the assets had been carried under the historical cost model are\nas follows:\n\nGroup and company\n\n| | Freehold property £ |\n|---|---|\n| At 30 April 2025 | |\n| Aggregate cost | 3,854,887 |\n| Aggregate depreciation | (1,963,225) |\n| Carrying value | 1,891,662 |\n| At 30 April 2024 | |\n| Aggregate cost | 3,854,887 |\n| Aggregate depreciation | (1,700,013) |\n| Carrying value | 2,154,874 |\n\n1\n-24-\n1',
    'WHOCANFIXMYCAR.COM LTD\nNOTES TO THE FINANCIAL STATEMENTS\nFOR THE YEAR ENDED 31 MARCH 2025\n\n7. Tangible fixed assets\n\n| | Fixtures and fittings £ | Computer equipment £ | Right of use assets £ | Total £ | £ |\n|---|---|---|---|---|---|\n| **Cost** | | | | | |\n| At 1 April 2024 | 61,966 | 131,878 | - | 193,844 | |\n| Additions | - | 3,187 | 287,367 | 290,554 | |\n| At 31 March 2025 | 61,966 | 135,065 | 287,367 | 484,398 | |\n| **Depreciation** | | | | | |\n| At 1 April 2024 | 61,966 | 99,890 | - | 161,856 | |\n| Charge for the year | - | 12,982 | 82,657 | 95,639 | |\n| At 31 March 2025 | 61,966 | 112,872 | 82,657 | 257,495 | |\n| **Net book value** | | | | | |\n| At 31 March 2025 | - | 22,193 | 204,710 | 226,903 | |\n| At 31 March 2024 | - | 31,988 | - | 31,988 | |\n\nThe net book value of owned and leased assets included as "Tangible fixed assets" in the Balance Sheet is as follows:\n\n| | 2025 £ | 2024 £ |\n|---|---|---|\n| Tangible fixed assets owned | 22,193 | 31,988 |\n| Right-of-use tangible fixed assets | 204,710 | - |\n| | 226,903 | 31,988 |\n\nInformation about right-of-use assets is summarised below:\n\nNet book value\n\n| | 2025 £ | 2024 £ |\n|---|---|---|\n| Office and computer equipment | 204,710 | - |\n| | 204,710 | - |\n\nPage 11',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[18.4437, 21.8515, 17.4189]])
```
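
Sparse embeddings are directly interpretable: each active dimension corresponds to a vocabulary token. As a minimal sketch, assuming the `model` and `query_embeddings` from the snippet above and a sentence-transformers version that exposes `SparseEncoder.decode`:

```python
# Map the sparse query vector back to (token, weight) pairs; top_k keeps the strongest.
decoded = model.decode(query_embeddings, top_k=10)
print(decoded[0])
# e.g. [('depreciation', ...), ('2025', ...), ...]  # illustrative, not actual output
```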

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Datasets: `NanoNFCorpus`, `NanoSciFact` and `NanoFiQA2018`
* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric                | NanoNFCorpus | NanoSciFact | NanoFiQA2018 |
|:----------------------|:-------------|:------------|:-------------|
| dot_accuracy@1        | 0.44         | 0.54        | 0.36         |
| dot_accuracy@3        | 0.62         | 0.84        | 0.58         |
| dot_accuracy@5        | 0.66         | 0.86        | 0.64         |
| dot_accuracy@10       | 0.76         | 0.9         | 0.7          |
| dot_precision@1       | 0.44         | 0.54        | 0.36         |
| dot_precision@3       | 0.38         | 0.3         | 0.2733       |
| dot_precision@5       | 0.356        | 0.188       | 0.208        |
| dot_precision@10      | 0.314        | 0.1         | 0.12         |
| dot_recall@1          | 0.0472       | 0.53        | 0.1872       |
| dot_recall@3          | 0.1003       | 0.82        | 0.4071       |
| dot_recall@5          | 0.1234       | 0.845       | 0.4956       |
| dot_recall@10         | 0.1574       | 0.89        | 0.5478       |
| **dot_ndcg@10**       | **0.3775**   | **0.7393**  | **0.4401**   |
| dot_mrr@10            | 0.5381       | 0.6897      | 0.4698       |
| dot_map@100           | 0.1787       | 0.6905      | 0.3822       |
| query_active_dims     | 4.76         | 19.04       | 12.06        |
| query_sparsity_ratio  | 0.9998       | 0.9994      | 0.9996       |
| corpus_active_dims    | 1493.3485    | 1752.0487   | 1723.0986    |
| corpus_sparsity_ratio | 0.9511       | 0.9426      | 0.9435       |
| avg_flops             | 1.0107       | 4.7662      | 4.0714       |

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "nfcorpus",
          "scifact",
          "fiqa2018"
      ],
      "dataset_id": "sentence-transformers/NanoBEIR-en"
  }
  ```

| Metric                | Value     |
|:----------------------|:----------|
| dot_accuracy@1        | 0.4467    |
| dot_accuracy@3        | 0.68      |
| dot_accuracy@5        | 0.72      |
| dot_accuracy@10       | 0.7867    |
| dot_precision@1       | 0.4467    |
| dot_precision@3       | 0.3178    |
| dot_precision@5       | 0.2507    |
| dot_precision@10      | 0.178     |
| dot_recall@1          | 0.2548    |
| dot_recall@3          | 0.4425    |
| dot_recall@5          | 0.488     |
| dot_recall@10         | 0.5317    |
| **dot_ndcg@10**       | **0.519** |
| dot_mrr@10            | 0.5659    |
| dot_map@100           | 0.4171    |
| query_active_dims     | 11.9533   |
| query_sparsity_ratio  | 0.9996    |
| corpus_active_dims    | 1666.2235 |
| corpus_sparsity_ratio | 0.9454    |
| avg_flops             | 2.3256    |

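These figures can be reproduced with the evaluator named above. A minimal sketch, assuming the fine-tuned model is loaded as under Usage (the placeholder id should be replaced with this repository's id):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator

model = SparseEncoder("sparse_encoder_model_id")  # replace with this repository's id

# Re-run the three NanoBEIR subsets reported above and print the headline metric.
evaluator = SparseNanoBEIREvaluator(dataset_names=["nfcorpus", "scifact", "fiqa2018"])
results = evaluator(model)
print(results["NanoBEIR_mean_dot_ndcg@10"])
```
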
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### financial-filings-sparse-retrieval-training

* Dataset: [financial-filings-sparse-retrieval-training](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training) at [23e44ab](https://huggingface.co/datasets/oneryalcin/financial-filings-sparse-retrieval-training/tree/23e44abc3bfdb454da434ba8eb3e38bd1e01be84)
* Size: 18,247 training samples
* Columns: <code>query</code>, <code>positive</code>, <code>negative_0</code>, <code>negative_1</code>, <code>negative_2</code>, <code>negative_3</code>, <code>negative_4</code>, <code>negative_5</code>, and <code>negative_6</code>
* Approximate statistics based on the first 1000 samples:
  | | query | positive | negative_0 | negative_1 | negative_2 | negative_3 | negative_4 | negative_5 | negative_6 |
  |:--------|:------|:---------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
  | type | string | string | string | string | string | string | string | string | string |
  | details | <ul><li>min: 9 tokens</li><li>mean: 20.51 tokens</li><li>max: 79 tokens</li></ul> | <ul><li>min: 51 tokens</li><li>mean: 331.12 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 54 tokens</li><li>mean: 360.35 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 56 tokens</li><li>mean: 357.16 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 303.81 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 263.98 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 221.87 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 193.47 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 166.03 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
  | query | positive | negative_0 | negative_1 | negative_2 | negative_3 | negative_4 | negative_5 | negative_6 |
  |:------|:---------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
  | <code>What was the actuarial gain on defined benefit pension plans for the year 2021?</code> | <code>2021 £'000 2020 £'000 Loss for the financial year (9,731) (50,454) Other comprehensive income/(expense) for the financial period Actuarial gain/(loss) on defined benefit pension plans 31,200 (4,400) Deferred tax impact of actuarial gain/(loss) (10,920) 1,540 Other comprehensive income/(expense) 20,280 (2,860) Total comprehensive income/(expense) for the financial period 10,549 (53,314)<br><br>STATEMENT OF COMPREHENSIVE INCOME FOR THE YEAR ENDED 31 DECEMBER 2021<br><br>11</code> | <code>Defined benefit pension plans 2022 2021 Equities 12% 26% Bonds 56% 31% Liability driven investment 32% 39% Other - 4% 100% 100%<br><br>Defined benefit pension plans 2022 €'000 2021 €'000 Actuarial losses from changes in financial assumptions 36 391 Actuarial gain/(losses) 3,629 (4,830) 3,665 (4,439)<br><br>```markdown Dunlop International Europe Limited Notes to the Financial Statements - continued for the year ended 31 December 2022<br><br>Defined benefit pension plans 2022 €'000 2021 €'000 Opening fair value of scheme assets 90,460 85,266 Contributions by employer 1,129 1,189 Expected return 1,697 1,183 Actuarial (losses) (33,630) (591) Benefits paid (3,397) (3,408) Exchange differences on foreign plans (4,631) 6,821 51,628 90,460<br><br>Changes in the fair value of scheme assets are as follows:<br><br>Page 24 ```<br><br>The major categories of scheme assets as a percentage of total scheme assets are as follows:<br><br>Changes in the present value of the defined benefit obligation are as follows:<br><br>Defined benefit pension pla...</code> | <code>The components of net periodic pension benefit cost recognized in our Consolidated Statements of Operations for the periods presented are as follows:<br><br>Years Ended December 31, 2021 Projected benefit obligation, beginning of year $ 97,740 Service cost $ 1,282 Interest cost $ 1,452 Actuarial (gain) loss $ (8,682) Benefits paid $ (2,010) Translation adjustment $ (4,006) Projected benefit obligation, end of year $ 85,776 Fair value of plan assets, beginning of year $ 17,293 Actual return on plan assets $ 641 Contributions $ 1,775 Benefits paid $ (1,112) Actuarial gain $ 71 Translation adjustment $ (147) Fair value of plan assets, end of year $ 18,521 Funded status of plan $ (67,255)<br><br>Our projected benefit obligation and plan assets for defined benefit pension plans and the related assumptions used to determine the related liabilities are as follows:<br><br>Defined Benefit Plan<br><br>We maintain defined benefit pension plans for certain of our non-U.S. employees in the U.K., Germany, and Philippines. ...</code> | <code></code> | <code></code> | <code></code> | <code></code> | <code></code> |
  | <code>What is the interest rate for the GBP short term loan with Canadian Natural Resources Limited?</code> | <code>CREDITORS: amounts falling due within one year<br><br>33<br><br>DEBTORS<br><br>The carrying amount of debtors is a reasonable approximation to fair value. Trade debtors and other receivables are not overdue as payment terms have not been exceeded. The expected credit loss on the trade debtor's balance was negligible and therefore no adjustment has been applied.<br><br>Other creditors includes £4.9 million (2022 - £4.3 million) in respect of future share options (note 21). The short term loan with CNR International (U.K.) Developments Limited expired on 31 December 2023. Interest was generated at a rate of Secured Overnight Financing Rate (SOFR) + 1.75%.<br><br>Amounts owed by group undertakings are unsecured and repayable on demand. The GBP short term loan with Canadian Natural Resources Limited generates interest at a rate of Sterling Overnight Index Average (SONIA) + 1.15% per annum. Trading balances are interest free. Management considered the expected credit loss on amounts due from Group undertakings at 31 Dec...</code> | <code>for the year ended 31 December 2022<br><br>The share options creditor relates to amounts payable to the ultimate parent Canadian Natural Resources Limited relating to employee options to purchase stock in the aforementioned company. The provision is based on the specifics of the agreed plan with Canadian Natural Resources Limited. The current portion of this totalling £4.3 million (2021 - £3.7 million) is included in creditor amounts falling due within one year.<br><br>34<br><br>The Company settled an intercompany term loan on 24 October 2022. The amount drawn down by the Company at 24 October 2022 was US$440.0 million (2021 - US$440.0 million). Interest was charged at US$ LIBOR + 2.8175% per annum on amounts drawn down from this facility until settlement.<br><br>2022 (£'000) 2021 (£'000) Share options 3,550 4,276 Lease liabilities (note 13) 4,786 6,428 Total 8,336 10,704<br><br>The short term loan with CNR International (U.K.) Developments Limited generates interest at a rate of USS LIBOR + 1.75%.<br><br>18. CREDITORS: ...</code> | <code>18 Cash and cash equivalents<br><br>Group 31 Dec 2021 (£000) Group 31 Dec 2020 (£000) Company 31 Dec 2021 (£000) Company 31 Dec 2020 (£000) Sterling 19,641 37,349 19,498 37,347 United States Dollar 8,574 6,826 — — Euros 361 — — — Canadian Dollar 227 364 — — Polish Zloty 225 283 — — Singapore Dollar 29 — — — Japanese Yen 7 — — — Total 29,064 44,822 19,498 37,347<br><br>Cash and cash equivalents are denominated in the following currencies:<br><br>On 14 October 2021, the Group and Company entered into a loan agreement with Bank Of Ireland Group plc consisting of a £10 million term loan in addition to a revolving credit facility of £10 million. The loan is secured on the assets of the Group. Operating covenants are limited to the Group’s net debt leverage and interest cover. The term loan is repayable over five years with an initial 12-month repayment holiday followed by annual capital repayments of £1,250,000. At the end of the term, a bullet payment of £5 million is due. The loan is denominated in Pound S...</code> | <code></code> | <code></code> | <code></code> | <code></code> | <code></code> |
  | <code>What is the amount for Charges à imputer relative au personnel?</code> | <code>Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important<br><br>COMPTES DE RÉGULARISATION<br><br>Description Montant Charges à imputer relative au personnel 128.084.824,70 Charges à imputer: intérêts courus et non échus 37.536.987,21 Charges à imputer diverses 12.774.405,94 Charges à imputer: protocoles et conventions avec autres opérateurs et réseaux 7.240.507,84 Produits à reporter divers 7.841.755,07 Produits à reporter relatifs au trafic 116.123.554,80 Produis à reporter: Hello Belgium Railpass 21.396.182,64 Produis à reporter: NPV 20.088.866,29 Produits à reporter: financements alternatifs 15.285.835,38<br><br>72<br><br>First - C-Cap2022 - 38 / 82</code> | <code>```markdown N° 0203.430.576 C-Cap 6.9<br><br>Charges à imputer diverses Exercice Charges à imputer: protocoles et conventions avec autres opérateurs et reseaux 2.101.406,29 Charges à imputer relatives au personnel 1.378.996,29 Charges à imputer: intérêts courus et non échus 139.424.041,35 Produits à reporter divers 39.103.099,47 Produits à reporter relatifs au trafic 6.816.893,14 Produits à reporter: financements altematifs 146.853.054,23 Produis à reporter: NPV 9.576.751,43 15.185.940,48<br><br>71 Rapport annuel SNCB 2023 ```<br><br>COMPTES DE RÉGULARISATION<br><br>Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important</code> | <code>DETTES FISCALES, SALARIALES ET SOCIALES (€)<br><br>CODES BNB 2024 VENTILATION DE LA RUBRIQUE 492/3 DU PASSIF SI CELLE-CI REPRÉSENTE UN MONTANT IMPORTANT Intérêts crédits KBC à imputer Autres charges à imputer<br><br>ÉTAT DES DETTES (€)<br><br>COMPTES DE REGULARISATION (€)<br><br>12.4 Etat des dettes et comptes de régularisation du passif<br><br>CODES BNB 2024 VENTILATION DES DETTES A L'ORIGINE A PLUS D'UN AN, EN FONCTION DE LEUR DUREE RESDIEUELLE Dettes à plus d'un an échéant dans l'année Dettes financières 8801 Etablissements de crédit 8841 Total des dettes à plus d'un an échéant dans l'année (42) Dettes ayant plus d'un an mais 5 ans au plus à courir Dettes financières 8802 Etablissement de crédit 8842 Total des dettes ayant plus d'un an mais 5 ans au plus à courir 8912<br><br>12.5 Résultats d'exploitation (en milliers €)<br><br>DETTES GARANTIES (€)<br><br>CODES BNB 2024 Rémunérations et charges sociales Autres dettes salariales et sociales 9077<br><br>```markdown<br><br>CODES BNB 2023 2024 CHARGES D'EXPLOITATION Travailleurs pour lesquels la ...</code> | <code>Résultats financiers Charges financières récurrentes Ventilation des autres charges financières<br><br>11 Dettes fiscales, salariales et sociales<br><br>2022 2021 Impôts (rubriques 450/3 et 179 du passif) Dettes fiscales échues - - Dettes fiscales non échues - - Dettes fiscales estimées - - Rémunérations et charges sociales (rubriques 454/9 et 179 du passif) Dettes échues envers l'Office National de Sécurité Sociale - - Autres dettes salariales et sociales 28.500 25.000<br><br>FINANCIÈRE DE TUBIZE – RAPPORT FINANCIER ANNUEL 2022<br><br>Comptes de régularisation<br><br>FINANCIÈRE DE TUBIZE - RAPPORT ANNUEL 2022 40 41<br><br>2022 2021 Ventilation de la rubrique 492/3 du passif si celle-ci représente un montant important Charges à imputer: intérêts 345.843 40.556 Charges à imputer: commission de réservation 107.126 76.667<br><br>2022 2021 Charges d'exploitation Travailleurs pour lesquels la société a introduit une déclaration DIMONA ou qui sont inscrits au registre général du personnel - - Nombre total à la date de clôture - - Ef...</code> | <code>ANNEXES DES COMPTES DE LA SOCIÉTÉ AU 31 MAI 2025<br><br>SITUATION FISCALE DIFFEREE ET LATENTE<br><br>\| Accroissements de la dette future d'impôt \| Montant \|<br>\|---\|---\|<br>\| Impôt dû sur provisions réglementées: \| \|<br>\| Provisions pour hausse de prix \| \|<br>\| Provisions pour fluctuation des cours \| \|<br>\| Provisions pour investissements \| \|<br>\| Amortissements dérogatoires \| 3 548 \|<br>\| Subventions d'investissement \| \|<br>\| TOTAL ACCROISSEMENTS \| 3 548 \|<br><br>\| Allègements de la dette future d'impôt \| Montant \|<br>\|---\|---\|<br>\| Impôt payé d'avance sur: \| \|<br>\| Charges non déductibles temporairement (à déduire l'année suivante) \| 1 200 \|<br>\| Congés payés \| \|<br>\| Participation des salariés \| 53 \|<br>\| Autres \| \|<br>\| A déduire ultérieurement \| \|<br>\| Provisions pour propre assureur \| \|<br>\| Autres \| \|<br>\| TOTAL ALLÈGEMENTS \| 1 254 \|<br><br>SITUATION FISCALE DIFFÉRÉE NETTE<br>2 294<br><br>IMPÔT DÙ SUR: Plus-values différées<br>43 436<br><br>CREDIT A IMPUTER SUR: Déficits reportables<br><br>CREDIT A IMPUTER SUR: Moins-values à long terme<br><br>SITUATION FISCALE LATENTE NETT...</code> | <code>d'un mois au plus<br><br>Titres à revenu fixe émis par des établissements de crédit<br><br>AUTRES PLACEMENTS DE TRÉSORERIE<br><br>Autres placements de trésorerie non repris ci-avant<br><br>Codes Exercice Exercice précédent 51 8681 8682 8683 52 18.088.896,34 55.716.775,84 8684 53 226.519.496,99 136.305.308,18 8686 85.000.000,00 801.112,99 8687 1.599.690,84 8688 139.919.806,15 135.504.195,19 8689<br><br>Actions, parts et placements autres que placements à revenu fixe<br><br>Titres à revenu fixe<br><br>Actions et parts - Montant non appelé<br><br>Avec une durée résiduelle ou de préavis<br><br>Métaux précieux et œuvres d'art<br><br>Ventilation de la rubrique 490/1 de l'actif si celle-ci représente un montant important<br><br>PLACEMENTS DE TRÉSORERIE ET COMPTES DE RÉGULARISATION DE L'ACTIF<br><br>64 Rapport annuel SNCB 2023<br><br>Actions et parts - Valeur comptable augmentée du montant non appelé<br><br>Comptes à terme détenus auprès des établissements de crédit<br><br>de plus d'un mois à un an au plus<br><br>COMPTES DE RÉGULARISATION<br><br>Exercice Charges à reporter: redevance infrastru...</code> | <code>4.6. Accroissements et allégements de la dette future d'impôt<br><br>Les éléments entraînant un décalage d'imposition conduisent à un accroissement de la dette future d'impôt de 21 278K€ calculé au taux de 25.82%.<br><br>La situation fiscale latente s'analyse comme suit :<br><br>\| Base de calcul \| Montants en K€ \|<br>\|---\|---\|<br>\| BASE D'IMPOT SUR : \| \|<br>\| Provisions réglementées : \| \|<br>\| - Ecart de conversion Actif \| 0 \|<br>\| - Ecart de conversion Passif \| -4 \|<br>\| - Provision pour investissements \| \|<br>\| - Amortissements dérogatoires \| 94 455 \|<br>\| Subventions d'investissement \| 3 283 \|<br>\| Produits non imposables temporairement : \| \|<br>\| (à réintégrer l'année de leur acquisition) \| \|<br>\| - plafonnement TP \| \|<br>\| **TOTAL ACCROISSEMENTS** \| **97 734** \|<br>\| BASE D'IMPOT PAYE D'AVANCE SUR : \| \|<br>\| Charges non déductibles temporairement : \| \|<br>\| (à déduire l'année suivante) \| \|<br>\| - Provision pour risques et charges \| -928 \|<br>\| - Provision pour participation \| -4 083 \|<br>\| - Contribution solidarité \| -869 \|<br>\| - Provisions pou...</code> | <code></code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score', gather_across_devices=False)",
      "document_regularizer_weight": 3e-05,
      "query_regularizer_weight": 0.0
  }
  ```
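
Reassembled in code, the objective above looks roughly like the following. This is a sketch, not the exact training script: the `train` split name is an assumption, and the trainer wiring (batching, routing, learning rates) is shown under Training Hyperparameters below.

```python
from datasets import load_dataset
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.losses import (
    SparseMultipleNegativesRankingLoss,
    SpladeLoss,
)

model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte")
train_dataset = load_dataset(
    "oneryalcin/financial-filings-sparse-retrieval-training", split="train"
)

# FLOPS regularization is applied to the document side only: the query side is an
# inference-free static lookup, so it needs no sparsity pressure (weight 0.0).
loss = SpladeLoss(
    model=model,
    loss=SparseMultipleNegativesRankingLoss(model),
    document_regularizer_weight=3e-05,
    query_regularizer_weight=0.0,
)
```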

### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 2
- `learning_rate`: 2e-05
- `warmup_steps`: 114
- `weight_decay`: 0.01
- `gradient_accumulation_steps`: 4
- `bf16`: True
- `tf32`: True
- `eval_strategy`: steps
- `dataloader_num_workers`: 4
- `batch_sampler`: no_duplicates
- `router_mapping`: {'query': 'query', 'positive': 'document', 'negative_0': 'document', 'negative_1': 'document', 'negative_2': 'document', 'negative_3': 'document', 'negative_4': 'document', 'negative_5': 'document', 'negative_6': 'document'}
- `learning_rate_mapping`: {'sub_modules\\.query\\..*': 0.001}

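Continuing the loss sketch above, these values map onto `SparseEncoderTrainingArguments` roughly as follows. A sketch under stated assumptions: the output directory is illustrative, and the steps-based evaluation wiring (an evaluator plus `eval_strategy="steps"`) is omitted for brevity.

```python
from sentence_transformers.sparse_encoder import (
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="outputs/financial-sparse-encoder",  # illustrative path
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-05,
    warmup_steps=114,
    weight_decay=0.01,
    bf16=True,
    tf32=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    # Route the query column through the inference-free side of the Router and
    # the positive/negative document columns through the MLM side.
    router_mapping={
        "query": "query",
        "positive": "document",
        **{f"negative_{i}": "document" for i in range(7)},
    },
    # The static query embeddings are trained with a much higher learning rate.
    learning_rate_mapping={r"sub_modules\.query\..*": 0.001},
)

trainer = SparseEncoderTrainer(
    model=model,  # from the loss sketch above
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```
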
758
+ #### All Hyperparameters
759
+ <details><summary>Click to expand</summary>
760
+
761
+ - `per_device_train_batch_size`: 8
762
+ - `num_train_epochs`: 2
763
+ - `max_steps`: -1
764
+ - `learning_rate`: 2e-05
765
+ - `lr_scheduler_type`: linear
766
+ - `lr_scheduler_kwargs`: None
767
+ - `warmup_steps`: 114
768
+ - `optim`: adamw_torch_fused
769
+ - `optim_args`: None
770
+ - `weight_decay`: 0.01
771
+ - `adam_beta1`: 0.9
772
+ - `adam_beta2`: 0.999
773
+ - `adam_epsilon`: 1e-08
774
+ - `optim_target_modules`: None
775
+ - `gradient_accumulation_steps`: 4
776
+ - `average_tokens_across_devices`: True
777
+ - `max_grad_norm`: 1.0
778
+ - `label_smoothing_factor`: 0.0
779
+ - `bf16`: True
780
+ - `fp16`: False
781
+ - `bf16_full_eval`: False
782
+ - `fp16_full_eval`: False
783
+ - `tf32`: True
784
+ - `gradient_checkpointing`: False
785
+ - `gradient_checkpointing_kwargs`: None
786
+ - `torch_compile`: False
787
+ - `torch_compile_backend`: None
788
+ - `torch_compile_mode`: None
789
+ - `use_liger_kernel`: False
790
+ - `liger_kernel_config`: None
791
+ - `use_cache`: False
792
+ - `neftune_noise_alpha`: None
793
+ - `torch_empty_cache_steps`: None
794
+ - `auto_find_batch_size`: False
795
+ - `log_on_each_node`: True
796
+ - `logging_nan_inf_filter`: True
797
+ - `include_num_input_tokens_seen`: no
798
+ - `log_level`: passive
799
+ - `log_level_replica`: warning
800
+ - `disable_tqdm`: False
801
+ - `project`: huggingface
802
+ - `trackio_space_id`: trackio
803
+ - `eval_strategy`: steps
804
+ - `per_device_eval_batch_size`: 8
805
+ - `prediction_loss_only`: True
806
+ - `eval_on_start`: False
807
+ - `eval_do_concat_batches`: True
808
+ - `eval_use_gather_object`: False
809
+ - `eval_accumulation_steps`: None
810
+ - `include_for_metrics`: []
811
+ - `batch_eval_metrics`: False
812
+ - `save_only_model`: False
813
+ - `save_on_each_node`: False
814
+ - `enable_jit_checkpoint`: False
815
+ - `push_to_hub`: False
816
+ - `hub_private_repo`: None
817
+ - `hub_model_id`: None
818
+ - `hub_strategy`: every_save
819
+ - `hub_always_push`: False
820
+ - `hub_revision`: None
821
+ - `load_best_model_at_end`: False
822
+ - `ignore_data_skip`: False
823
+ - `restore_callback_states_from_checkpoint`: False
824
+ - `full_determinism`: False
825
+ - `seed`: 42
826
+ - `data_seed`: None
827
+ - `use_cpu`: False
828
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
829
+ - `parallelism_config`: None
830
+ - `dataloader_drop_last`: False
831
+ - `dataloader_num_workers`: 4
832
+ - `dataloader_pin_memory`: True
833
+ - `dataloader_persistent_workers`: False
834
+ - `dataloader_prefetch_factor`: None
835
+ - `remove_unused_columns`: True
836
+ - `label_names`: None
837
+ - `train_sampling_strategy`: random
838
+ - `length_column_name`: length
839
+ - `ddp_find_unused_parameters`: None
840
+ - `ddp_bucket_cap_mb`: None
841
+ - `ddp_broadcast_buffers`: False
842
+ - `ddp_backend`: None
843
+ - `ddp_timeout`: 1800
844
+ - `fsdp`: []
845
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
846
+ - `deepspeed`: None
847
+ - `debug`: []
848
+ - `skip_memory_metrics`: True
849
+ - `do_predict`: False
850
+ - `resume_from_checkpoint`: None
851
+ - `warmup_ratio`: None
852
+ - `local_rank`: -1
853
+ - `prompts`: None
854
+ - `batch_sampler`: no_duplicates
855
+ - `multi_dataset_batch_sampler`: proportional
856
+ - `router_mapping`: {'query': 'query', 'positive': 'document', 'negative_0': 'document', 'negative_1': 'document', 'negative_2': 'document', 'negative_3': 'document', 'negative_4': 'document', 'negative_5': 'document', 'negative_6': 'document'}
857
+ - `learning_rate_mapping`: {'sub_modules\\.query\\..*': 0.001}
858
+
859
+ </details>
860
+
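+ The `router_mapping` above routes each training column through the matching side of the asymmetric model (queries through the inference-free query module, the positive and all negative passages through the document encoder), while `learning_rate_mapping` trains the query-side sub-modules with a larger learning rate than the backbone. As a minimal sketch of how these values would be passed to the trainer (assuming the Sentence Transformers v5 training API and the column names listed above):
+ 
+ ```python
+ from sentence_transformers.sparse_encoder import SparseEncoderTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+ 
+ args = SparseEncoderTrainingArguments(
+     output_dir="output",  # hypothetical output directory
+     num_train_epochs=2,
+     per_device_train_batch_size=8,
+     gradient_accumulation_steps=4,
+     learning_rate=2e-5,
+     warmup_steps=114,
+     bf16=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+     # Route each dataset column to the query or document sub-module
+     router_mapping={
+         "query": "query",
+         "positive": "document",
+         **{f"negative_{i}": "document" for i in range(7)},
+     },
+     # Larger learning rate for the query-side parameters
+     learning_rate_mapping={r"sub_modules\.query\..*": 1e-3},
+ )
+ ```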
861
+ ### Training Logs
862
+ <details><summary>Click to expand</summary>
863
+
864
+ | Epoch | Step | Training Loss | NanoNFCorpus_dot_ndcg@10 | NanoSciFact_dot_ndcg@10 | NanoFiQA2018_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
865
+ |:------:|:----:|:-------------:|:------------------------:|:-----------------------:|:------------------------:|:-------------------------:|
866
+ | 0.0175 | 10 | 2.5846 | - | - | - | - |
867
+ | 0.0351 | 20 | 2.4596 | - | - | - | - |
868
+ | 0.0526 | 30 | 2.5787 | - | - | - | - |
869
+ | 0.0701 | 40 | 2.2135 | - | - | - | - |
870
+ | 0.0877 | 50 | 2.1444 | - | - | - | - |
871
+ | 0.1052 | 60 | 2.0011 | - | - | - | - |
872
+ | 0.1228 | 70 | 1.8179 | - | - | - | - |
873
+ | 0.1403 | 80 | 1.7744 | - | - | - | - |
874
+ | 0.1578 | 90 | 1.7054 | - | - | - | - |
875
+ | 0.1754 | 100 | 1.5427 | - | - | - | - |
876
+ | 0.1929 | 110 | 1.6134 | - | - | - | - |
877
+ | 0.2104 | 120 | 1.6381 | - | - | - | - |
878
+ | 0.2280 | 130 | 1.6946 | - | - | - | - |
879
+ | 0.2455 | 140 | 1.4456 | - | - | - | - |
880
+ | 0.2630 | 150 | 1.4302 | - | - | - | - |
881
+ | 0.2806 | 160 | 1.3097 | - | - | - | - |
882
+ | 0.2981 | 170 | 1.5755 | - | - | - | - |
883
+ | 0.3157 | 180 | 1.2906 | - | - | - | - |
884
+ | 0.3332 | 190 | 1.3424 | - | - | - | - |
885
+ | 0.3507 | 200 | 1.5477 | - | - | - | - |
886
+ | 0.3683 | 210 | 1.3442 | - | - | - | - |
887
+ | 0.3858 | 220 | 1.2810 | - | - | - | - |
888
+ | 0.4033 | 230 | 1.3157 | - | - | - | - |
889
+ | 0.4209 | 240 | 1.2839 | - | - | - | - |
890
+ | 0.4384 | 250 | 1.2428 | - | - | - | - |
891
+ | 0.4559 | 260 | 1.2376 | - | - | - | - |
892
+ | 0.4735 | 270 | 1.1353 | - | - | - | - |
893
+ | 0.4910 | 280 | 1.2513 | - | - | - | - |
894
+ | 0.5085 | 290 | 1.0490 | - | - | - | - |
895
+ | 0.5261 | 300 | 1.0669 | - | - | - | - |
896
+ | 0.5436 | 310 | 1.2219 | - | - | - | - |
897
+ | 0.5612 | 320 | 1.0313 | - | - | - | - |
898
+ | 0.5787 | 330 | 1.2846 | - | - | - | - |
899
+ | 0.5962 | 340 | 1.0939 | - | - | - | - |
900
+ | 0.6138 | 350 | 1.0299 | - | - | - | - |
901
+ | 0.6313 | 360 | 0.6464 | - | - | - | - |
902
+ | 0.6488 | 370 | 0.7067 | - | - | - | - |
903
+ | 0.6664 | 380 | 0.5505 | - | - | - | - |
904
+ | 0.6839 | 390 | 0.6885 | - | - | - | - |
905
+ | 0.7014 | 400 | 0.8663 | - | - | - | - |
906
+ | 0.7190 | 410 | 0.8602 | - | - | - | - |
907
+ | 0.7365 | 420 | 0.5517 | - | - | - | - |
908
+ | 0.7541 | 430 | 0.3781 | - | - | - | - |
909
+ | 0.7716 | 440 | 0.6533 | - | - | - | - |
910
+ | 0.7891 | 450 | 1.1145 | - | - | - | - |
911
+ | 0.8067 | 460 | 0.3240 | - | - | - | - |
912
+ | 0.8242 | 470 | 0.5818 | - | - | - | - |
913
+ | 0.8417 | 480 | 0.3394 | - | - | - | - |
914
+ | 0.8593 | 490 | 0.8986 | - | - | - | - |
915
+ | 0.8768 | 500 | 0.6177 | 0.3695 | 0.7388 | 0.3862 | 0.4982 |
916
+ | 0.8943 | 510 | 0.8443 | - | - | - | - |
917
+ | 0.9119 | 520 | 0.5454 | - | - | - | - |
918
+ | 0.9294 | 530 | 0.9840 | - | - | - | - |
919
+ | 0.9470 | 540 | 0.6111 | - | - | - | - |
920
+ | 0.9645 | 550 | 0.7095 | - | - | - | - |
921
+ | 0.9820 | 560 | 0.8391 | - | - | - | - |
922
+ | 0.9996 | 570 | 0.6461 | - | - | - | - |
923
+ | 1.0158 | 580 | 1.3053 | - | - | - | - |
924
+ | 1.0333 | 590 | 0.9817 | - | - | - | - |
925
+ | 1.0509 | 600 | 1.0531 | - | - | - | - |
926
+ | 1.0684 | 610 | 0.9087 | - | - | - | - |
927
+ | 1.0859 | 620 | 0.9186 | - | - | - | - |
928
+ | 1.1035 | 630 | 1.0373 | - | - | - | - |
929
+ | 1.1210 | 640 | 0.9417 | - | - | - | - |
930
+ | 1.1385 | 650 | 0.9963 | - | - | - | - |
931
+ | 1.1561 | 660 | 0.9058 | - | - | - | - |
932
+ | 1.1736 | 670 | 0.9252 | - | - | - | - |
933
+ | 1.1911 | 680 | 1.0170 | - | - | - | - |
934
+ | 1.2087 | 690 | 0.9957 | - | - | - | - |
935
+ | 1.2262 | 700 | 0.8720 | - | - | - | - |
936
+ | 1.2438 | 710 | 0.8776 | - | - | - | - |
937
+ | 1.2613 | 720 | 0.8562 | - | - | - | - |
938
+ | 1.2788 | 730 | 0.8772 | - | - | - | - |
939
+ | 1.2964 | 740 | 0.9591 | - | - | - | - |
940
+ | 1.3139 | 750 | 0.9495 | - | - | - | - |
941
+ | 1.3314 | 760 | 0.9933 | - | - | - | - |
942
+ | 1.3490 | 770 | 0.8449 | - | - | - | - |
943
+ | 1.3665 | 780 | 0.7833 | - | - | - | - |
944
+ | 1.3840 | 790 | 0.9574 | - | - | - | - |
945
+ | 1.4016 | 800 | 0.7727 | - | - | - | - |
946
+ | 1.4191 | 810 | 0.8997 | - | - | - | - |
947
+ | 1.4367 | 820 | 0.8796 | - | - | - | - |
948
+ | 1.4542 | 830 | 0.8535 | - | - | - | - |
949
+ | 1.4717 | 840 | 1.0049 | - | - | - | - |
950
+ | 1.4893 | 850 | 0.8912 | - | - | - | - |
951
+ | 1.5068 | 860 | 0.9883 | - | - | - | - |
952
+ | 1.5243 | 870 | 0.7190 | - | - | - | - |
953
+ | 1.5419 | 880 | 0.9274 | - | - | - | - |
954
+ | 1.5594 | 890 | 0.8372 | - | - | - | - |
955
+ | 1.5769 | 900 | 0.7986 | - | - | - | - |
956
+ | 1.5945 | 910 | 0.7205 | - | - | - | - |
957
+ | 1.6120 | 920 | 0.5797 | - | - | - | - |
958
+ | 1.6295 | 930 | 0.6741 | - | - | - | - |
959
+ | 1.6471 | 940 | 0.5253 | - | - | - | - |
960
+ | 1.6646 | 950 | 0.1963 | - | - | - | - |
961
+ | 1.6822 | 960 | 0.4864 | - | - | - | - |
962
+ | 1.6997 | 970 | 0.7439 | - | - | - | - |
963
+ | 1.7172 | 980 | 0.6164 | - | - | - | - |
964
+ | 1.7348 | 990 | 0.3680 | - | - | - | - |
965
+ | 1.7523 | 1000 | 0.5521 | 0.3775 | 0.7393 | 0.4401 | 0.5190 |
966
+ | 1.7698 | 1010 | 0.2149 | - | - | - | - |
967
+ | 1.7874 | 1020 | 0.5544 | - | - | - | - |
968
+ | 1.8049 | 1030 | 0.8062 | - | - | - | - |
969
+ | 1.8224 | 1040 | 0.2349 | - | - | - | - |
970
+ | 1.8400 | 1050 | 0.5362 | - | - | - | - |
971
+ | 1.8575 | 1060 | 0.8963 | - | - | - | - |
972
+ | 1.8751 | 1070 | 0.5910 | - | - | - | - |
973
+ | 1.8926 | 1080 | 0.3764 | - | - | - | - |
974
+ | 1.9101 | 1090 | 0.5331 | - | - | - | - |
975
+ | 1.9277 | 1100 | 1.0374 | - | - | - | - |
976
+ | 1.9452 | 1110 | 0.6087 | - | - | - | - |
977
+ | 1.9627 | 1120 | 0.4690 | - | - | - | - |
978
+ | 1.9803 | 1130 | 0.4651 | - | - | - | - |
979
+ | 1.9978 | 1140 | 0.5315 | - | - | - | - |
980
+
981
+ </details>
982
+
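+ The NanoBEIR columns above come from periodic evaluation on three Nano subsets during training. A minimal sketch for reproducing them (assuming the sparse NanoBEIR evaluator shipped with Sentence Transformers v5):
+ 
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator
+ 
+ model = SparseEncoder("path/to/this/model")  # hypothetical local path or Hub id
+ 
+ evaluator = SparseNanoBEIREvaluator(dataset_names=["nfcorpus", "scifact", "fiqa2018"])
+ results = evaluator(model)
+ print(results[evaluator.primary_metric])  # mean dot ndcg@10 across the three subsets
+ ```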
983
+ ### Framework Versions
984
+ - Python: 3.11.10
985
+ - Sentence Transformers: 5.2.3
986
+ - Transformers: 5.2.0
987
+ - PyTorch: 2.10.0+cu128
988
+ - Accelerate: 1.12.0
989
+ - Datasets: 4.5.0
990
+ - Tokenizers: 0.22.2
991
+
992
+ ## Citation
993
+
994
+ ### BibTeX
995
+
996
+ #### Sentence Transformers
997
+ ```bibtex
998
+ @inproceedings{reimers-2019-sentence-bert,
999
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1000
+ author = "Reimers, Nils and Gurevych, Iryna",
1001
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1002
+ month = "11",
1003
+ year = "2019",
1004
+ publisher = "Association for Computational Linguistics",
1005
+ url = "https://arxiv.org/abs/1908.10084",
1006
+ }
1007
+ ```
1008
+
1009
+ #### SpladeLoss
1010
+ ```bibtex
1011
+ @misc{formal2022distillationhardnegativesampling,
1012
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
1013
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
1014
+ year={2022},
1015
+ eprint={2205.04733},
1016
+ archivePrefix={arXiv},
1017
+ primaryClass={cs.IR},
1018
+ url={https://arxiv.org/abs/2205.04733},
1019
+ }
1020
+ ```
1021
+
1022
+ #### SparseMultipleNegativesRankingLoss
1023
+ ```bibtex
1024
+ @misc{henderson2017efficient,
1025
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
1026
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
1027
+ year={2017},
1028
+ eprint={1705.00652},
1029
+ archivePrefix={arXiv},
1030
+ primaryClass={cs.CL}
1031
+ }
1032
+ ```
1033
+
1034
+ #### FlopsLoss
1035
+ ```bibtex
1036
+ @article{paria2020minimizing,
1037
+ title={Minimizing flops to learn efficient sparse representations},
1038
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
1039
+ journal={arXiv preprint arXiv:2004.05665},
1040
+ year={2020}
1041
+ }
1042
+ ```
1043
+
1044
+ <!--
1045
+ ## Glossary
1046
+
1047
+ *Clearly define terms in order to be accessible across audiences.*
1048
+ -->
1049
+
1050
+ <!--
1051
+ ## Model Card Authors
1052
+
1053
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1054
+ -->
1055
+
1056
+ <!--
1057
+ ## Model Card Contact
1058
+
1059
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1060
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "5.2.3",
5
+ "transformers": "5.2.0",
6
+ "pytorch": "2.10.0+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
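
This config declares empty `query`/`document` prompts and dot-product similarity, so encoding and scoring need no extra arguments. A minimal sketch (assuming a local checkout of this repository):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder(".")  # hypothetical: path to this repository
q = model.encode_query("revenue growth 2024")
d = model.encode_document("Turnover increased to £5.2m in 2024.")
print(model.similarity(q, d))  # dot product, per `similarity_fn_name`
```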
document_0_MLMTransformer/config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "architectures": [
3
+ "NewForMaskedLM"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.0,
6
+ "auto_map": {
7
+ "AutoConfig": "configuration.NewConfig",
8
+ "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
9
+ "AutoModelForMaskedLM": "modeling.NewForMaskedLM",
10
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
11
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
12
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
13
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
14
+ },
15
+ "classifier_dropout": 0.1,
16
+ "dtype": "float32",
17
+ "hidden_act": "gelu",
18
+ "hidden_dropout_prob": 0.1,
19
+ "hidden_size": 768,
20
+ "initializer_range": 0.02,
21
+ "intermediate_size": 3072,
22
+ "layer_norm_eps": 1e-12,
23
+ "layer_norm_type": "layer_norm",
24
+ "logn_attention_clip1": false,
25
+ "logn_attention_scale": false,
26
+ "max_position_embeddings": 8192,
27
+ "model_type": "new",
28
+ "num_attention_heads": 12,
29
+ "num_hidden_layers": 12,
30
+ "pack_qkv": true,
31
+ "pad_token_id": 0,
32
+ "position_embedding_type": "rope",
33
+ "rope_parameters": null,
34
+ "rope_theta": 500000,
35
+ "transformers_version": "5.2.0",
36
+ "type_vocab_size": 0,
37
+ "unpad_inputs": false,
38
+ "use_memory_efficient_attention": false,
39
+ "vocab_size": 30522
40
+ }
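
Because the architecture is registered through `auto_map`, the masked-LM backbone only loads with remote code enabled. A minimal sketch (assuming a local copy of this subfolder):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "document_0_MLMTransformer"  # hypothetical: local subfolder of this repo
model = AutoModelForMaskedLM.from_pretrained(path, trust_remote_code=True)  # resolves NewForMaskedLM via auto_map
tokenizer = AutoTokenizer.from_pretrained(path)
```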
document_0_MLMTransformer/configuration.py ADDED
@@ -0,0 +1,145 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ NEW model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class NewConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+ vocab_size (`int`, *optional*, defaults to 30522):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
52
+ The dropout ratio for the attention probabilities.
53
+ max_position_embeddings (`int`, *optional*, defaults to 512):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+ type_vocab_size (`int`, *optional*, defaults to 2):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "new"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
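
Note that although the docstring in this file mentions linear and dynamic RoPE scaling, only the `ntk` strategy is wired up in the accompanying modeling.py (the linear and dynamic branches are commented out there). A minimal sketch of enabling it (hypothetical scaling factor):

```python
from configuration import NewConfig  # the module shipped alongside this checkpoint

# Hypothetical: extend the usable context via fixed NTK scaling of the RoPE frequencies
config = NewConfig(rope_scaling={"type": "ntk", "factor": 2.0})
```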
document_0_MLMTransformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a31dcc6d79cb8ce449a436f13b6031ba24d60ed20e074f5532d917b7559473dd
3
+ size 643355976
document_0_MLMTransformer/modeling.py ADDED
@@ -0,0 +1,1418 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """PyTorch NEW model."""
17
+
18
+ import math
19
+ from dataclasses import dataclass
20
+ from typing import List, Optional, Tuple, Union
21
+
22
+ import torch
23
+ import torch.utils.checkpoint
24
+ from torch import nn
25
+
26
+ from transformers.activations import ACT2FN
27
+ from transformers.modeling_outputs import (
28
+ BaseModelOutput,
29
+ BaseModelOutputWithPooling,
30
+ MaskedLMOutput,
31
+ MultipleChoiceModelOutput,
32
+ QuestionAnsweringModelOutput,
33
+ SequenceClassifierOutput,
34
+ ModelOutput,
35
+ )
36
+ from transformers.modeling_utils import PreTrainedModel
37
+ from transformers.utils import logging
38
+
39
+ try:
40
+ import xformers.ops as xops
41
+ except ImportError as e:
42
+ xops = None
43
+
44
+ from .configuration import NewConfig
45
+
46
+
47
+ logger = logging.get_logger(__name__)
48
+
49
+
50
+ # Adapted from https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/bert_padding.py
51
+ # Which was adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
52
+ class IndexFirstAxis(torch.autograd.Function):
53
+ @staticmethod
54
+ def forward(ctx, input, indices):
55
+ ctx.save_for_backward(indices)
56
+ assert input.ndim >= 2
57
+ ctx.first_axis_dim, other_shape = input.shape[0], input.shape[1:]
58
+ second_dim = other_shape.numel()
59
+ # TD [2022-03-04] For some reason torch.gather is a bit faster than indexing.
60
+ # return input[indices]
61
+ # return torch.gather(
62
+ # rearrange(input, "b ... -> b (...)"), 0, repeat(indices, "z -> z d", d=second_dim)
63
+ # ).reshape(-1, *other_shape)
64
+ return torch.gather(
65
+ input.view(ctx.first_axis_dim, second_dim),
66
+ 0,
67
+ indices.unsqueeze(-1).expand(indices.size(0), second_dim)
68
+ ).reshape(-1, *other_shape)
69
+
70
+ @staticmethod
71
+ def backward(ctx, grad_output):
72
+ (indices,) = ctx.saved_tensors
73
+ assert grad_output.ndim >= 2
74
+ other_shape = grad_output.shape[1:]
75
+ # grad_output = rearrange(grad_output, "b ... -> b (...)")
76
+ grad_output = grad_output.view(grad_output.size(0), other_shape.numel())
77
+ grad_input = torch.zeros(
78
+ [ctx.first_axis_dim, grad_output.shape[1]],
79
+ device=grad_output.device,
80
+ dtype=grad_output.dtype,
81
+ )
82
+ # TD [2022-03-04] For some reason torch.scatter is a bit faster than indexing.
83
+ # grad_input[indices] = grad_output
84
+ # grad_input.scatter_(0, repeat(indices, "z -> z d", d=grad_output.shape[1]), grad_output)
85
+ grad_input.scatter_(
86
+ 0, indices.unsqueeze(-1).expand(indices.size(0), grad_output.size(1)), grad_output
87
+ )
88
+ return grad_input.reshape(ctx.first_axis_dim, *other_shape), None
89
+
90
+
91
+ index_first_axis = IndexFirstAxis.apply
92
+
93
+
94
+ def unpad_input(hidden_states, attention_mask=None, indices=None):
95
+ """
96
+ Arguments:
97
+ hidden_states: (batch, seqlen, ...)
98
+ attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
99
+ indices: (total_nnz), the indices of non-masked tokens from the flattened input sequence.
100
+ Return:
101
+ hidden_states: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
102
+ """
103
+ if indices is None:
104
+ assert attention_mask is not None
105
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
106
+
107
+ # TD [2022-03-04] We don't want to index with a bool mask, because Pytorch will expand the
108
+ # bool mask, then call nonzero to get the indices, then index with those. The indices is @dim
109
+ # times larger than it needs to be, wasting memory. It's faster and more memory-efficient to
110
+ # index with integer indices. Moreover, torch's index is a bit slower than it needs to be,
111
+ # so we write custom forward and backward to make it a bit faster.
112
+ hidden_states = hidden_states.view(-1, *hidden_states.shape[2:])
113
+ return index_first_axis(hidden_states, indices)
114
+
115
+
116
+ class IndexPutFirstAxis(torch.autograd.Function):
117
+ @staticmethod
118
+ def forward(
119
+ ctx,
120
+ values: torch.Tensor,
121
+ indices: torch.Tensor,
122
+ first_axis_dim
123
+ ) -> torch.Tensor:
124
+ ctx.save_for_backward(indices)
125
+ assert indices.ndim == 1
126
+ assert values.ndim >= 2
127
+ output = torch.zeros(
128
+ first_axis_dim, *values.shape[1:], device=values.device, dtype=values.dtype
129
+ )
130
+ output[indices] = values
131
+ return output
132
+
133
+ @staticmethod
134
+ def backward(ctx, grad_output: torch.Tensor) -> Tuple[torch.Tensor, None, None]:
135
+ indices, = ctx.saved_tensors
136
+ grad_values = grad_output[indices]
137
+ return grad_values, None, None
138
+
139
+
140
+ index_put_first_axis = IndexPutFirstAxis.apply
141
+
142
+
143
+ def pad_input(inputs: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int) -> torch.Tensor:
144
+ """Add padding to sequences.
145
+
146
+ Arguments:
147
+ inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
148
+ indices: (total_nnz), `indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()`
149
+ batch: int batch_size
150
+ seqlen: int max sequence length
151
+
152
+ Returns:
153
+ inputs: (batch, seqlen, ...)
154
+ """
155
+ output = index_put_first_axis(inputs, indices, batch * seqlen)
156
+ return output.view(batch, seqlen, *inputs.shape[1:])
157
+
158
+
159
+ def rotate_half(x):
160
+ """Rotates half the hidden dims of the input."""
161
+ x1 = x[..., : x.shape[-1] // 2]
162
+ x2 = x[..., x.shape[-1] // 2 :]
163
+ return torch.cat((-x2, x1), dim=-1)
164
+
165
+
166
+ def apply_rotary_pos_emb(q, k, cos, sin):
167
+ """Applies Rotary Position Embedding to the query and key tensors.
168
+
169
+ Args:
170
+ q (`torch.Tensor`): The query tensor.
171
+ k (`torch.Tensor`): The key tensor.
172
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
173
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
174
+ Returns:
175
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
176
+ """
177
+ cos, sin = cos.to(q.dtype), sin.to(q.dtype)
178
+ q_embed = (q * cos) + (rotate_half(q) * sin)
179
+ k_embed = (k * cos) + (rotate_half(k) * sin)
180
+ return q_embed, k_embed
181
+
182
+
183
+ class RotaryEmbedding(torch.nn.Module):
184
+ def __init__(self, dim, max_position_embeddings=512, base=10000.0, device=None):
185
+ super().__init__()
186
+
187
+ self.dim = dim
188
+ self.max_position_embeddings = max_position_embeddings
189
+ self.base = base
190
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
191
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
192
+
193
+ # Build here to make `torch.jit.trace` work.
194
+ self._set_cos_sin_cache(
195
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
196
+ )
197
+
198
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
199
+ self.max_seq_len_cached = seq_len
200
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
201
+
202
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
203
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
204
+ emb = torch.cat((freqs, freqs), dim=-1)
205
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
206
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
207
+
208
+ def forward(self, x, seq_len=None):
209
+ # x: [bs, num_attention_heads, seq_len, head_size]
210
+ if seq_len > self.max_seq_len_cached:
211
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
212
+
213
+ return (
214
+ self.cos_cached[:seq_len, ...].to(dtype=x.dtype),
215
+ self.sin_cached[:seq_len, ...].to(dtype=x.dtype),
216
+ )
217
+
218
+
219
+ class NTKScalingRotaryEmbedding(RotaryEmbedding):
220
+ """RotaryEmbedding extended with fixed and mixed NTK scaling. https://kexue.fm/archives/9706 """
221
+
222
+ def __init__(self, dim, max_position_embeddings=512, base=10000, device=None, scaling_factor=1.0, mixed_b=None):
223
+ self.scaling_factor = scaling_factor
224
+ self.mixed_b = mixed_b
225
+ super().__init__(dim, max_position_embeddings, base, device)
226
+ max_position_embeddings = max_position_embeddings * self.scaling_factor
227
+ self._set_cos_sin_cache(max_position_embeddings, self.inv_freq.device, torch.get_default_dtype())
228
+
229
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
230
+ self.max_seq_len_cached = seq_len
231
+
232
+ if seq_len > self.max_position_embeddings:
233
+ base = self.base * (self.scaling_factor if self.mixed_b is None else 1)
234
+ inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
235
+
236
+ if self.mixed_b is None:
237
+ inv_freq = inv_freq / self.scaling_factor ** (2 / self.dim) # (6)
238
+ else:
239
+ a = torch.tensor(self.scaling_factor).log() / (self.dim / 2) ** self.mixed_b # (13)
240
+ lambda_1_m = (a * torch.arange(1, self.dim // 2 + 1).float().to(device) ** self.mixed_b).exp() # (12)
241
+ inv_freq = inv_freq / lambda_1_m # (10)
242
+
243
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
244
+
245
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
246
+
247
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
248
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
249
+ emb = torch.cat((freqs, freqs), dim=-1)
250
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
251
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
252
+
253
+
254
+ class RMSNorm(nn.Module):
255
+ def __init__(self, hidden_size, eps=1e-6):
256
+ """
257
+ RMSNorm is equivalent to T5LayerNorm
258
+ """
259
+ super().__init__()
260
+ self.weight = nn.Parameter(torch.ones(hidden_size))
261
+ self.variance_epsilon = eps
262
+
263
+ def forward(self, hidden_states):
264
+ input_dtype = hidden_states.dtype
265
+ hidden_states = hidden_states.to(torch.float32)
266
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
267
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
268
+ return self.weight * hidden_states.to(input_dtype)
269
+
270
+
271
+ LAYER_NORM = {
272
+ 'layer_norm': nn.LayerNorm,
273
+ 'rms_norm': RMSNorm
274
+ }
275
+
276
+
277
+ class NewEmbeddings(nn.Module):
278
+ """
279
+ Embedding and Unpadding.
280
+ """
281
+
282
+ def __init__(self, config: NewConfig):
283
+ super().__init__()
284
+ self.padding_idx = config.pad_token_id
285
+ self.word_embeddings = nn.Embedding(
286
+ config.vocab_size, config.hidden_size, padding_idx=self.padding_idx
287
+ )
288
+
289
+ self.position_embedding_type = config.position_embedding_type
290
+ if self.position_embedding_type == 'absolute':
291
+ self.position_embeddings = nn.Embedding(
292
+ config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
293
+ )
294
+ elif self.position_embedding_type == 'rope':
295
+ self._init_rope(config)
296
+ else:
297
+ raise ValueError
298
+
299
+ self.type_vocab_size = config.type_vocab_size
300
+ if self.type_vocab_size > 0:
301
+ self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
302
+
303
+ # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
304
+ # any TensorFlow checkpoint file
305
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
306
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
307
+ # position_ids is contiguous in memory and excluded when serialized
308
+ self.register_buffer(
309
+ "position_ids", torch.arange(config.max_position_embeddings), persistent=False
310
+ )
311
+
312
+ def _init_rope(self, config):
313
+ kwargs = dict(
314
+ dim=int(config.hidden_size / config.num_attention_heads),
315
+ max_position_embeddings=config.max_position_embeddings,
316
+ base=config.rope_theta
317
+ )
318
+ if config.rope_scaling is None:
319
+ self.rotary_emb = RotaryEmbedding(**kwargs)
320
+ else:
321
+ kwargs.update(scaling_factor=config.rope_scaling["factor"])
322
+ scaling_type = config.rope_scaling["type"]
323
+ if scaling_type == 'ntk':
324
+ kwargs.update(mixed_b=config.rope_scaling.get('mixed_b', None))
325
+ self.rotary_emb = NTKScalingRotaryEmbedding(**kwargs)
326
+ # elif scaling_type == "linear":
327
+ # self.rotary_emb = LinearScalingRotaryEmbedding(**kwargs)
328
+ # elif scaling_type == "dynamic":
329
+ # self.rotary_emb = DynamicNTKScalingRotaryEmbedding(**kwargs)
330
+ else:
331
+ raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
332
+
333
+ def forward(
334
+ self,
335
+ unpad_inputs: bool,
336
+ input_ids: Optional[torch.Tensor] = None,
337
+ attention_mask: Optional[torch.Tensor] = None,
338
+ length: Optional[List[int]] = None,
339
+ token_type_ids: Optional[torch.Tensor] = None,
340
+ position_ids: Optional[torch.Tensor] = None,
341
+ inputs_embeds: Optional[torch.Tensor] = None,
342
+ ) -> Tuple[torch.Tensor, torch.Tensor, Optional[Tuple], Optional[List[int]]]:
343
+ """
344
+ """
345
+ if inputs_embeds is None:
346
+ device, input_shape = input_ids.device, input_ids.shape
347
+ else:
348
+ device, input_shape = inputs_embeds.device, inputs_embeds.shape[:2]
349
+ batch_size, seq_length = input_shape
350
+
351
+ # Set attention_mask if it's None
352
+ if attention_mask is None:
353
+ attention_mask = torch.ones(input_shape, device=device)
354
+ if length is not None:
355
+ for i, l in enumerate(length):
356
+ attention_mask[i, l:] = 0
357
+
358
+ # Set attention_mask_bool for unpadding
359
+ if unpad_inputs:
360
+ attention_mask_bool = attention_mask.bool()
361
+ if length is None:
362
+ length = attention_mask.sum(-1).tolist()
363
+
364
+ # Get word embeddings
365
+ if inputs_embeds is None:
366
+ if unpad_inputs:
367
+ input_ids = input_ids[attention_mask_bool].unsqueeze(0)
368
+ inputs_embeds = self.word_embeddings(input_ids)
369
+ else:
370
+ if unpad_inputs:
371
+ inputs_embeds = inputs_embeds[attention_mask_bool].unsqueeze(0)
372
+ embeddings = inputs_embeds
373
+
374
+ # Set and unpad position_ids
375
+ if position_ids is None:
376
+ if seq_length > self.position_ids.size(0):
377
+ self.register_buffer(
378
+ "position_ids", torch.arange(seq_length, device=embeddings.device), persistent=False
379
+ )
380
+ if unpad_inputs:
381
+ # [1, cumsum_seq_len]
382
+ position_ids = torch.cat([self.position_ids[:l] for l in length]).unsqueeze(0)
383
+ else:
384
+ # [bs, seq_len]
385
+ position_ids = self.position_ids[:seq_length].expand(batch_size, -1)
386
+ elif unpad_inputs:
387
+ position_ids = position_ids[attention_mask_bool].unsqueeze(0) # [1, cumsum_seq_len]
388
+
389
+ # Compute rotary embedding
390
+ if self.position_embedding_type == 'rope':
391
+ rope_cos, rope_sin = self.rotary_emb(inputs_embeds, seq_len=seq_length)
392
+ rope_cos = rope_cos[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
393
+ rope_sin = rope_sin[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
394
+ rope_embeds = rope_cos, rope_sin
395
+ else:
396
+ rope_embeds = None
397
+
398
+ if self.type_vocab_size > 0:
399
+ if token_type_ids is None:
400
+ token_type_ids = position_ids.mul(0)
401
+ else:
402
+ if self.type_vocab_size < 2:
403
+ token_type_ids.mul_(0)
404
+ if unpad_inputs:
405
+ token_type_ids = token_type_ids[attention_mask_bool].unsqueeze(0)
406
+
407
+ token_type_embeddings = self.token_type_embeddings(token_type_ids)
408
+ embeddings = embeddings + token_type_embeddings
409
+
410
+ # BERT position
411
+ if self.position_embedding_type == "absolute":
412
+ position_embeddings = self.position_embeddings(position_ids)
413
+ embeddings = embeddings + position_embeddings
414
+
415
+ embeddings = self.LayerNorm(embeddings)
416
+ embeddings = self.dropout(embeddings)
417
+
418
+ return embeddings, attention_mask, rope_embeds, length
419
+
420
+
421
+ class NewAttention(nn.Module):
422
+ def __init__(self, config: NewConfig, pack_qkv=None, use_memory_efficient_attention=None):
423
+ super().__init__()
424
+ self.config = config
425
+ if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
426
+ raise ValueError(
427
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
428
+ f"heads ({config.num_attention_heads})"
429
+ )
430
+
431
+ self.hidden_size = config.hidden_size
432
+ self.num_attention_heads = config.num_attention_heads
433
+ self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
434
+ self.all_head_size = self.num_attention_heads * self.attention_head_size
435
+
436
+ if pack_qkv is None:
437
+ pack_qkv = config.pack_qkv
438
+ self.pack_qkv = pack_qkv
439
+
440
+ if self.pack_qkv:
441
+ self.qkv_proj = nn.Linear(config.hidden_size, self.all_head_size * 3, bias=True)
442
+ else:
443
+ self.q_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
444
+ self.k_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
445
+ self.v_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
446
+
447
+ self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
448
+ self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
449
+
450
+ if use_memory_efficient_attention is None:
451
+ use_memory_efficient_attention = self.config.use_memory_efficient_attention
452
+ self.use_memory_efficient_attention = use_memory_efficient_attention
453
+ self.memory_efficient_attention = None if xops is None else xops.memory_efficient_attention
454
+ if self.use_memory_efficient_attention:
455
+ assert self.memory_efficient_attention is not None, 'please install xformers'
456
+
457
+ def forward(
458
+ self,
459
+ hidden_states: torch.Tensor,
460
+ attention_bias: torch.FloatTensor,
461
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
462
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
463
+ attention_scale: Optional[torch.FloatTensor] = None,
464
+ head_mask: Optional[torch.FloatTensor] = None,
465
+ output_attentions: Optional[bool] = False,
466
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
467
+ ) -> Tuple[torch.Tensor, ...]:
468
+ shape_hd = (self.num_attention_heads, self.attention_head_size)
469
+ # qkv
470
+ if self.pack_qkv and qkv_inputs is None:
471
+ qkv_pack = self.qkv_proj(hidden_states).split(self.all_head_size, dim=-1)
472
+ else:
473
+ if qkv_inputs is None:
474
+ qkv_inputs = (hidden_states, hidden_states, hidden_states)
475
+ qkv_pack = [
476
+ getattr(self, n + '_proj')(s) for s, n in zip(qkv_inputs, 'qkv')
477
+ ]
478
+ query_states, key_states, value_states = [t.view(t.shape[:-1] + shape_hd) for t in qkv_pack]
479
+
480
+ if self.config.position_embedding_type == 'rope':
481
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, *rope_embeds)
482
+
483
+ dtype = query_states.dtype
484
+
485
+ if self.config.logn_attention_scale and attention_scale is not None:
486
+ # https://kexue.fm/archives/8823
487
+ query_states = query_states * attention_scale.to(dtype)
488
+
489
+ if padding_inputs is not None:
490
+ query_states = pad_input(query_states.squeeze(), *padding_inputs)
491
+ key_states = pad_input(key_states.squeeze(), *padding_inputs)
492
+ value_states = pad_input(value_states.squeeze(), *padding_inputs)
493
+
494
+ if self.use_memory_efficient_attention:
495
+ assert self.memory_efficient_attention is not None, "xformers is not loaded"
496
+ assert output_attentions is False, "memory_efficient_attention does not output attentions"
497
+ assert head_mask is None, "Not supported yet"
498
+ attention_probs = None
499
+ if torch.is_tensor(attention_bias):
500
+ attention_bias = attention_bias.to(dtype)
501
+ context_layer = self.memory_efficient_attention(
502
+ query_states,
503
+ key_states,
504
+ value_states,
505
+ attn_bias=attention_bias,
506
+ p=self.dropout.p
507
+ )
508
+ else:
509
+ if output_attentions and isinstance(self, NewSdpaAttention):
510
+ raise RuntimeError("SDPA do not output attentions")
511
+ context_layer, attention_probs = self._attention(
512
+ query_states, key_states, value_states, attention_bias, head_mask
513
+ )
514
+
515
+ if padding_inputs is not None:
516
+ context_layer = unpad_input(context_layer, indices=padding_inputs[0])
517
+
518
+ new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
519
+ context_layer = context_layer.view(new_context_layer_shape)
520
+
521
+ # output proj
522
+ attn_output = self.o_proj(context_layer)
523
+
524
+ # add attentions if we output them
525
+ outputs = (attn_output, attention_probs) if output_attentions else (attn_output,)
526
+ return outputs
527
+
528
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
529
+ """
530
+ Args:
531
+ q/k/v: (B, L, n_head, head_dim),
532
+ Returns:
533
+ attn_output: (B, L, n_head, head_dim)
534
+ """
535
+ query_states = query_states.transpose(1, 2)
536
+ key_states = key_states.transpose(1, 2)
537
+ value_states = value_states.transpose(1, 2)
538
+ # Take the dot product between "query" and "key" to get the raw attention scores.
539
+ attention_scores = torch.matmul(query_states, key_states.transpose(-1, -2))
540
+
541
+ attention_scores = attention_scores / math.sqrt(self.attention_head_size)
542
+ if attention_bias is not None:
543
+ # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
544
+ attention_scores = attention_scores + attention_bias
545
+
546
+ # Normalize the attention scores to probabilities.
547
+ attention_probs = nn.functional.softmax(attention_scores, dim=-1)
548
+
549
+ # This is actually dropping out entire tokens to attend to, which might
550
+ # seem a bit unusual, but is taken from the original Transformer paper.
551
+ if self.dropout.p > 0:
552
+ attention_probs = self.dropout(attention_probs)
553
+
554
+ # Mask heads if we want to
555
+ if head_mask is not None:
556
+ attention_probs = attention_probs * head_mask
557
+
558
+ context_layer = torch.matmul(attention_probs, value_states)
559
+
560
+ context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
561
+ return context_layer, attention_probs
562
+
563
+
564
+ class NewSdpaAttention(NewAttention):
565
+ """
566
+ New attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
567
+ `NewAttention` as the weights of the module stay untouched. The only changes are on the forward pass to adapt to the
568
+ SDPA API.
569
+ """
570
+ def __init__(self, config: NewConfig, **kwargs):
571
+ super().__init__(config, **kwargs)
572
+ # torch.backends.cuda.enable_mem_efficient_sdp(False)
573
+ # logger.warning(
574
+ # "Disable memory efficient attention kernel for `NewSdpaAttention`, you can set "
575
+ # "`use_memory_efficient_attention=True` if it expected to use."
576
+ # )
577
+
578
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
579
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
580
+ query_states.transpose(1, 2),
581
+ key_states.transpose(1, 2),
582
+ value_states.transpose(1, 2),
583
+ attn_mask=attention_bias,
584
+ dropout_p=self.dropout.p if self.training else 0.0,
585
+ )
586
+ attn_output = attn_output.permute(0, 2, 1, 3).contiguous()
587
+ return attn_output, None
588
+
589
+
590
+ NEW_ATTENTION_CLASSES = {
591
+ "eager": NewAttention,
592
+ # "flash_attention_2": , # TODO
593
+ "sdpa": NewSdpaAttention,
594
+ }
595
+
596
+
597
+ class NewGatedMLP(nn.Module):
598
+ """
599
+ GLU Variants Improve Transformer.
600
+ """
601
+
602
+ def __init__(self, config: NewConfig):
603
+ super().__init__()
604
+ self.intermediate_size = config.intermediate_size
605
+ self.up_gate_proj = nn.Linear(config.hidden_size, self.intermediate_size * 2, bias=False)
606
+ self.down_proj = nn.Linear(self.intermediate_size, config.hidden_size, bias=True)
607
+ self.act_fn = ACT2FN[config.hidden_act]
608
+ if config.hidden_dropout_prob > 0:
609
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
610
+ else:
611
+ self.hidden_dropout = None
612
+
613
+ def forward(self, hidden_states):
614
+ up_gate = self.up_gate_proj(hidden_states)
615
+ up_states, gate = torch.split(up_gate, self.intermediate_size, dim=-1)
616
+ gate = self.act_fn(gate)
617
+ gated_states = gate * up_states
618
+ if self.hidden_dropout is not None:
619
+ gated_states = self.hidden_dropout(gated_states)
620
+ down_states = self.down_proj(gated_states)
621
+ return down_states
622
+
623
+
624
+ class NewLayer(nn.Module):
625
+ def __init__(
626
+ self,
627
+ config: NewConfig,
628
+ pack_qkv=None,
629
+ use_memory_efficient_attention=None,
630
+ attn_implementation=None
631
+ ):
632
+ super().__init__()
633
+ if attn_implementation is None:
634
+ attn_implementation = config._attn_implementation
635
+ if use_memory_efficient_attention is None:
636
+ use_memory_efficient_attention = config.use_memory_efficient_attention
637
+ if use_memory_efficient_attention:
638
+ if attn_implementation != 'eager':
639
+ logger.warning_once(f"Override {attn_implementation=} to 'eager' as {use_memory_efficient_attention=}")
640
+ attn_implementation = 'eager' # Since it will be SDPA by default for torch>=2.1.1
641
+ self.attention = NEW_ATTENTION_CLASSES[attn_implementation](
642
+ config, pack_qkv=pack_qkv, use_memory_efficient_attention=use_memory_efficient_attention
643
+ )
644
+ self.mlp = NewGatedMLP(config)
645
+
646
+ ln_class = LAYER_NORM[config.layer_norm_type]
647
+ self.attn_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
648
+ self.mlp_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
649
+
650
+ if config.hidden_dropout_prob > 0:
651
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
652
+ else:
653
+ self.hidden_dropout = None
654
+
655
+ def forward(
656
+ self,
657
+ hidden_states: torch.Tensor,
658
+ attention_bias: torch.FloatTensor,
659
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
660
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
661
+ attention_scale: Optional[torch.FloatTensor] = None,
662
+ subset_indices: Optional[torch.LongTensor] = None,
663
+ head_mask: Optional[torch.FloatTensor] = None,
664
+ output_attentions: Optional[bool] = False,
665
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
666
+ ) -> Tuple[torch.Tensor, ...]:
667
+ # Multi head self attention
668
+ residual = hidden_states if qkv_inputs is None else qkv_inputs[0]
669
+ attention_outputs = self.attention(
670
+ hidden_states,
671
+ attention_bias,
672
+ rope_embeds,
673
+ padding_inputs,
674
+ attention_scale,
675
+ head_mask,
676
+ output_attentions=output_attentions,
677
+ qkv_inputs=qkv_inputs,
678
+ )
679
+ hidden_states = attention_outputs[0]
680
+ if self.hidden_dropout is not None:
681
+ hidden_states = self.hidden_dropout(hidden_states)
682
+ hidden_states = residual + hidden_states
683
+
684
+ # In pretraining, after the attention of last layer, we only need the masked tokens.
685
+ if subset_indices is not None:
686
+ hidden_states = hidden_states[subset_indices]
687
+
688
+ hidden_states = self.attn_ln(hidden_states)
689
+
690
+ # Fully Connected
691
+ residual = hidden_states
692
+ hidden_states = self.mlp(hidden_states)
693
+ if self.hidden_dropout is not None:
694
+ hidden_states = self.hidden_dropout(hidden_states)
695
+ hidden_states = residual + hidden_states
696
+ hidden_states = self.mlp_ln(hidden_states)
697
+
698
+ # add self attentions if we output attention weights
699
+ outputs = (hidden_states,) + attention_outputs[1:]
700
+ return outputs
701
+
702
+
703
+ class NewEncoder(nn.Module):
704
+ def __init__(self, config):
705
+ super().__init__()
706
+ self.config = config
707
+ self.layer = nn.ModuleList([NewLayer(config) for _ in range(config.num_hidden_layers)])
708
+ self.gradient_checkpointing = False
709
+
710
+ def forward(
711
+ self,
712
+ hidden_states: torch.Tensor,
713
+ attention_bias: Optional[torch.FloatTensor] = None,
714
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
715
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
716
+ attention_scale: Optional[torch.FloatTensor] = None,
717
+ subset_indices: Optional[torch.LongTensor] = None,
718
+ head_mask: Optional[torch.FloatTensor] = None,
719
+ output_attentions: Optional[bool] = False,
720
+ output_hidden_states: Optional[bool] = False,
721
+ return_dict: Optional[bool] = True,
722
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
723
+ all_hidden_states = () if output_hidden_states else None
724
+ all_self_attentions = () if output_attentions else None
725
+
726
+ for i, layer_module in enumerate(self.layer):
727
+ if output_hidden_states:
728
+ all_hidden_states = all_hidden_states + (hidden_states,)
729
+
730
+ if i >= len(self.layer) - 1:
731
+ layer_subset_indices = subset_indices
732
+ else:
733
+ layer_subset_indices = None
734
+
735
+ layer_head_mask = head_mask[i] if head_mask is not None else None
736
+
737
+ if self.gradient_checkpointing and self.training:
738
+ layer_outputs = self._gradient_checkpointing_func(
739
+ layer_module.__call__,
740
+ hidden_states,
741
+ attention_bias,
742
+ rope_embeds,
743
+ padding_inputs,
744
+ attention_scale,
745
+ layer_subset_indices,
746
+ layer_head_mask,
747
+ )
748
+ else:
749
+ layer_outputs = layer_module(
750
+ hidden_states,
751
+ attention_bias,
752
+ rope_embeds,
753
+ padding_inputs,
754
+ attention_scale,
755
+ layer_subset_indices,
756
+ layer_head_mask,
757
+ output_attentions,
758
+ )
759
+
760
+ hidden_states = layer_outputs[0]
761
+ if output_attentions:
762
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
763
+
764
+ if output_hidden_states:
765
+ all_hidden_states = all_hidden_states + (hidden_states,)
766
+
767
+ if not return_dict:
768
+ return tuple(
769
+ v
770
+ for v in [
771
+ hidden_states,
772
+ all_hidden_states,
773
+ all_self_attentions,
774
+ ]
775
+ if v is not None
776
+ )
777
+ return BaseModelOutput(
778
+ last_hidden_state=hidden_states,
779
+ hidden_states=all_hidden_states,
780
+ attentions=all_self_attentions,
781
+ )
782
+
783
+
784
+ # Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->New
785
+ class NewPooler(nn.Module):
786
+ def __init__(self, config):
787
+ super().__init__()
788
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
789
+ self.activation = nn.Tanh()
790
+
791
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
792
+ # We "pool" the model by simply taking the hidden state corresponding
793
+ # to the first token.
794
+ first_token_tensor = hidden_states[:, 0]
795
+ pooled_output = self.dense(first_token_tensor)
796
+ pooled_output = self.activation(pooled_output)
797
+ return pooled_output
798
+
799
+
800
+ class NewPreTrainedModel(PreTrainedModel):
801
+ """
802
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
803
+ models.
804
+ """
805
+
806
+ config_class = NewConfig
807
+ base_model_prefix = "new"
808
+ supports_gradient_checkpointing = True
809
+ _supports_sdpa = True
810
+
811
+ def _init_weights(self, module):
812
+ """Initialize the weights"""
813
+ if isinstance(module, nn.Linear):
814
+ # Slightly different from the TF version which uses truncated_normal for initialization
815
+ # cf https://github.com/pytorch/pytorch/pull/5617
816
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
817
+ if module.bias is not None:
818
+ module.bias.data.zero_()
819
+ elif isinstance(module, nn.Embedding):
820
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
821
+ if module.padding_idx is not None:
822
+ module.weight.data[module.padding_idx].zero_()
823
+ elif isinstance(module, nn.LayerNorm):
824
+ module.bias.data.zero_()
825
+ module.weight.data.fill_(1.0)
826
+
827
+
828
+ class NewModel(NewPreTrainedModel):
829
+ """
830
+ The bare New Model transformer outputting raw hidden-states without any specific head on top.
831
+ """
832
+
833
+ def __init__(self, config: NewConfig, add_pooling_layer=False):
834
+ super().__init__(config)
835
+ self.config = config
836
+
837
+ self.embeddings = NewEmbeddings(config)
838
+ self.encoder = NewEncoder(config)
839
+
840
+ self.pooler = NewPooler(config) if add_pooling_layer else None
841
+
842
+ # Initialize weights and apply final processing
843
+ self.post_init()
844
+
845
+ def get_input_embeddings(self):
846
+ return self.embeddings.word_embeddings
847
+
848
+ def set_input_embeddings(self, value):
849
+ self.embeddings.word_embeddings = value
850
+
851
+ def forward(
852
+ self,
853
+ input_ids: Optional[torch.Tensor] = None,
854
+ attention_mask: Optional[torch.Tensor] = None,
855
+ length: Optional[List[int]] = None,
856
+ subset_indices: Optional[torch.LongTensor] = None,
857
+ token_type_ids: Optional[torch.Tensor] = None,
858
+ position_ids: Optional[torch.Tensor] = None,
859
+ head_mask: Optional[torch.Tensor] = None,
860
+ inputs_embeds: Optional[torch.Tensor] = None,
861
+ output_attentions: Optional[bool] = None,
862
+ output_hidden_states: Optional[bool] = None,
863
+ return_dict: Optional[bool] = None,
864
+ unpad_inputs: Optional[bool] = None,
865
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]:
866
+ r"""
867
+ length (`list` of length `batch_size`, *optional*):
868
+ If `None`, return the padded `last_hidden_state`.
869
+ subset_indices (`torch.LongTensor`, *optional*):
870
+ Indices of tokens to keep after the last layer's attention (e.g. only the masked tokens during pretraining).
871
+ unpad_inputs (`bool`, *optional*):
872
+ Whether to strip padding before the encoder and re-pad the final output.
873
+ """
874
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
875
+ output_hidden_states = (
876
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
877
+ )
878
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
879
+ unpad_inputs = unpad_inputs if unpad_inputs is not None else self.config.unpad_inputs
880
+ output_padded = length is None
881
+
882
+ if input_ids is not None and inputs_embeds is not None:
883
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
884
+ elif input_ids is not None:
885
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
886
+ input_shape = input_ids.size()
887
+ elif inputs_embeds is not None:
888
+ input_shape = inputs_embeds.size()[:-1]
889
+ else:
890
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
891
+
892
+ # TODO: not used
893
+ # # Prepare head mask if needed
894
+ # # 1.0 in head_mask indicate we keep the head
895
+ # # attention_probs has shape bsz x n_heads x N x N
896
+ # # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
897
+ # # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
898
+ # head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
899
+
900
+ # Get embeddings, may unpad them
901
+ (embedding_output, attention_mask, rope_embeds, length) = self.embeddings(
902
+ unpad_inputs,
903
+ input_ids=input_ids,
904
+ attention_mask=attention_mask,
905
+ length=length,
906
+ token_type_ids=token_type_ids,
907
+ position_ids=position_ids,
908
+ inputs_embeds=inputs_embeds
909
+ )
910
+
911
+ batch_size, seq_length = input_shape
912
+ if unpad_inputs and self.config.use_memory_efficient_attention:
913
+ attention_bias = xops.fmha.attn_bias.BlockDiagonalMask.from_seqlens(length)
914
+ else:
915
+ # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
916
+ # ourselves in which case we just need to make it broadcastable to all heads.
917
+ attention_bias = self.get_extended_attention_mask(attention_mask, input_shape)
918
+ if self.config.use_memory_efficient_attention:
919
+ # Invalid shape for attention bias: torch.Size([48, 1, 1, 512]) (expected (48, 12, 512, 512))
920
+ attention_bias = attention_bias.expand(-1, self.config.num_attention_heads, seq_length, -1)
921
+
922
+ padding_inputs = None
923
+ if unpad_inputs and (output_padded or not self.config.use_memory_efficient_attention):
924
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
925
+ if not self.config.use_memory_efficient_attention:
926
+ padding_inputs = (indices, *input_shape)
927
+
928
+ attention_scale = None
929
+ if self.config.logn_attention_scale:
930
+ logger.warning_once("TODO: logn_attention_scale")
931
+ # # attention scale log_512(input_len)
932
+ # attention_scale = attention_mask.sum(1).log() / torch.tensor(self.config.max_position_embeddings).log()
933
+ # # inference-time logn scale need clip 1
934
+ # if self.config.logn_attention_clip1:
935
+ # attention_scale.clip_(1)
936
+ # attention_scale = attention_scale[:, None, None, None]
937
+ # else:
938
+ # attention_scale = None
939
+
940
+ encoder_outputs = self.encoder(
941
+ embedding_output,
942
+ attention_bias=attention_bias,
943
+ rope_embeds=rope_embeds,
944
+ padding_inputs=padding_inputs,
945
+ attention_scale=attention_scale,
946
+ subset_indices=subset_indices,
947
+ head_mask=head_mask,
948
+ output_attentions=output_attentions,
949
+ output_hidden_states=output_hidden_states,
950
+ return_dict=return_dict,
951
+ )
952
+ sequence_output = encoder_outputs[0]
953
+ if unpad_inputs and output_padded:
954
+ sequence_output = pad_input(
955
+ sequence_output.squeeze(), indices, batch_size, seq_length
956
+ )
957
+
958
+ pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
959
+
960
+ if not return_dict:
961
+ return (sequence_output, pooled_output) + encoder_outputs[1:]
962
+
963
+ return BaseModelOutputWithPooling(
964
+ last_hidden_state=sequence_output,
965
+ pooler_output=pooled_output,
966
+ hidden_states=encoder_outputs.hidden_states,
967
+ attentions=encoder_outputs.attentions,
968
+ )
969
+
970
+
971
+ class NewLMPredictionHead(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+         self.transform_act_fn = ACT2FN[config.hidden_act]
+         self.norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+         # The output weights are the same as the input embeddings, but there is
+         # an output-only bias for each token.
+         self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
+
+     def forward(self, hidden_states):
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.transform_act_fn(hidden_states)
+         hidden_states = self.norm(hidden_states)
+         hidden_states = self.decoder(hidden_states)
+         return hidden_states
+
+
+ class NewForMaskedLM(NewPreTrainedModel):
+     _tied_weights_keys = ["lm_head.decoder.bias", "lm_head.decoder.weight"]
+
+     def __init__(self, config: NewConfig):
+         super().__init__(config)
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.lm_head = NewLMPredictionHead(config)
+         self.loss_fct = nn.CrossEntropyLoss()
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def get_output_embeddings(self):
+         return self.lm_head.decoder
+
+     def set_output_embeddings(self, new_embeddings):
+         self.lm_head.decoder = new_embeddings
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
+             config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked);
+             the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if labels is None or not self.new.config.unpad_inputs:
+             length = None
+             subset_indices = None
+         else:
+             length = attention_mask.sum(-1).tolist()
+             # Flatten labels to the unpadded token stream and mark the masked positions,
+             # so the encoder only materializes outputs where the MLM loss is computed.
+             labels = labels[attention_mask.bool()].unsqueeze(0)
+             subset_indices = labels > -100
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             length=length,
+             subset_indices=subset_indices,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+         prediction_scores = self.lm_head(sequence_output)
+
+         masked_lm_loss = None
+         if labels is not None:
+             if subset_indices is None:
+                 mask = attention_mask.bool()
+                 prediction_scores = prediction_scores[mask]
+                 labels = labels[mask]
+             else:
+                 labels = labels[subset_indices]
+             masked_lm_loss = self.loss_fct(prediction_scores, labels)
+
+         if not return_dict:
+             output = (prediction_scores,) + outputs[2:]
+             return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
+
+         return MaskedLMOutput(
+             loss=masked_lm_loss,
+             logits=prediction_scores,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForSequenceClassification(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+         self.config = config
+
+         self.new = NewModel(config, add_pooling_layer=True)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
+             config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss);
+             if `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         pooled_output = outputs[1]
+
+         pooled_output = self.dropout(pooled_output)
+         logits = self.classifier(pooled_output)
+
+         loss = None
+         if labels is not None:
+             if self.config.problem_type is None:
+                 if self.num_labels == 1:
+                     self.config.problem_type = "regression"
+                 elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
+                     self.config.problem_type = "single_label_classification"
+                 else:
+                     self.config.problem_type = "multi_label_classification"
+
+             if self.config.problem_type == "regression":
+                 loss_fct = nn.MSELoss()
+                 if self.num_labels == 1:
+                     loss = loss_fct(logits.squeeze(), labels.squeeze())
+                 else:
+                     loss = loss_fct(logits, labels)
+             elif self.config.problem_type == "single_label_classification":
+                 loss_fct = nn.CrossEntropyLoss()
+                 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+             elif self.config.problem_type == "multi_label_classification":
+                 loss_fct = nn.BCEWithLogitsLoss()
+                 loss = loss_fct(logits, labels)
+
+         if not return_dict:
+             output = (logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return SequenceClassifierOutput(
+             loss=loss,
+             logits=logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForMultipleChoice(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+
+         self.new = NewModel(config, add_pooling_layer=True)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, 1)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
+             num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
+             `input_ids` above)
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+         num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
+
+         # Flatten (batch_size, num_choices, seq_len) inputs to (batch_size * num_choices, seq_len)
+         input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
+         attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
+         token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
+         position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
+         inputs_embeds = (
+             inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
+             if inputs_embeds is not None
+             else None
+         )
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         pooled_output = outputs[1]
+
+         pooled_output = self.dropout(pooled_output)
+         logits = self.classifier(pooled_output)
+         reshaped_logits = logits.view(-1, num_choices)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(reshaped_logits, labels)
+
+         if not return_dict:
+             output = (reshaped_logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return MultipleChoiceModelOutput(
+             loss=loss,
+             logits=reshaped_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ @dataclass
+ class NewTokenClassifierOutput(ModelOutput):
+     loss: Optional[torch.FloatTensor] = None
+     logits: torch.FloatTensor = None
+     last_hidden_state: torch.FloatTensor = None
+     hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
+     attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
+
+
+ class NewForTokenClassification(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], NewTokenClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         sequence_output = self.dropout(sequence_output)
+         logits = self.classifier(sequence_output)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+
+         if not return_dict:
+             output = (logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return NewTokenClassifierOutput(
+             loss=loss,
+             logits=logits,
+             last_hidden_state=sequence_output,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForQuestionAnswering(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         start_positions: Optional[torch.Tensor] = None,
+         end_positions: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
+         r"""
+         start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the start of the labelled span for computing the token classification
+             loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
+             sequence are not taken into account for computing the loss.
+         end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the end of the labelled span for computing the token classification
+             loss. Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the
+             sequence are not taken into account for computing the loss.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         logits = self.qa_outputs(sequence_output)
+         start_logits, end_logits = logits.split(1, dim=-1)
+         start_logits = start_logits.squeeze(-1).contiguous()
+         end_logits = end_logits.squeeze(-1).contiguous()
+
+         total_loss = None
+         if start_positions is not None and end_positions is not None:
+             # If we are on multi-GPU, split adds a dimension
+             if len(start_positions.size()) > 1:
+                 start_positions = start_positions.squeeze(-1)
+             if len(end_positions.size()) > 1:
+                 end_positions = end_positions.squeeze(-1)
+             # Sometimes the start/end positions are outside our model inputs; we ignore these terms
+             ignored_index = start_logits.size(1)
+             start_positions = start_positions.clamp(0, ignored_index)
+             end_positions = end_positions.clamp(0, ignored_index)
+
+             loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
+             start_loss = loss_fct(start_logits, start_positions)
+             end_loss = loss_fct(end_logits, end_positions)
+             total_loss = (start_loss + end_loss) / 2
+
+         if not return_dict:
+             output = (start_logits, end_logits) + outputs[2:]
+             return ((total_loss,) + output) if total_loss is not None else output
+
+         return QuestionAnsweringModelOutput(
+             loss=total_loss,
+             start_logits=start_logits,
+             end_logits=end_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
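
Of the task heads above, only the masked-LM path is exercised by this checkpoint: the `(batch, seq_len, vocab_size)` logits from `NewForMaskedLM` are what the SPLADE pooling configured further below turns into sparse document vectors. A minimal loading sketch, assuming the checkpoint directory ships this modeling code as remote code (the repo path is a placeholder, not the actual model id):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder path; substitute the actual repo id of this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")
model = AutoModelForMaskedLM.from_pretrained("path/to/checkpoint", trust_remote_code=True)

batch = tokenizer(["Net current assets of the company"], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # (batch, seq_len, vocab_size)
```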
document_0_MLMTransformer/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
document_0_MLMTransformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
document_0_MLMTransformer/tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "max_length": 8192,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "TokenizersBackend",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
document_1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "pooling_strategy": "max",
+   "activation_function": "log1p_relu",
+   "word_embedding_dimension": 30522
+ }
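
The two settings above fully determine how the MLM logits become a 30,522-dimensional sparse vector: a `log1p_relu` activation followed by `max` pooling over the sequence. A minimal sketch of that step, written independently of the library (the function name `splade_pool` is ours, not part of sentence-transformers):

```python
import torch

def splade_pool(logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """SPLADE pooling: log1p(relu(logits)), then a max over the sequence axis."""
    scores = torch.log1p(torch.relu(logits))         # (batch, seq_len, vocab_size)
    scores = scores * attention_mask.unsqueeze(-1)   # zero out padded positions
    return scores.max(dim=1).values                  # (batch, vocab_size), mostly zeros
```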
modules.json ADDED
@@ -0,0 +1,8 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Router"
+   }
+ ]
query_0_SparseStaticEmbedding/config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "frozen": false
+ }
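
The query side needs no transformer at all: `SparseStaticEmbedding` stores one scalar weight per vocabulary entry, and `"frozen": false` indicates those weights were trainable during the fine-tune. A rough sketch of the inference-free lookup (the function name and exact aggregation are our assumptions, not the library's implementation):

```python
import torch

def encode_query(input_ids: torch.Tensor, token_weights: torch.Tensor) -> torch.Tensor:
    """Inference-free query encoding: each token contributes its stored IDF-like weight."""
    query_vec = torch.zeros(token_weights.shape[0])    # vocab-sized, e.g. 30522
    query_vec[input_ids] = token_weights[input_ids]    # bag-of-tokens lookup, no forward pass
    return query_vec
```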
query_0_SparseStaticEmbedding/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db73ae4c08aa8a138704c73f4314296d6f6fe0e4bc2283d11de954db26f6f159
+ size 122168
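
The file size is consistent with one float32 weight per vocabulary token. A back-of-the-envelope check (attributing the remainder to the safetensors header is our inference, not documented):

```python
vocab_size = 30522
payload = vocab_size * 4      # 122,088 bytes of float32 weights
print(122168 - payload)       # 80 bytes left over for the safetensors header
```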
query_0_SparseStaticEmbedding/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
query_0_SparseStaticEmbedding/tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "max_length": 8192,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "TokenizersBackend",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
router_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "types": {
+     "query_0_SparseStaticEmbedding": "sentence_transformers.sparse_encoder.models.SparseStaticEmbedding.SparseStaticEmbedding",
+     "document_0_MLMTransformer": "sentence_transformers.sparse_encoder.models.MLMTransformer.MLMTransformer",
+     "document_1_SpladePooling": "sentence_transformers.sparse_encoder.models.SpladePooling.SpladePooling"
+   },
+   "structure": {
+     "query": [
+       "query_0_SparseStaticEmbedding"
+     ],
+     "document": [
+       "document_0_MLMTransformer",
+       "document_1_SpladePooling"
+     ]
+   },
+   "parameters": {
+     "default_route": "document",
+     "allow_empty_key": true
+   }
+ }
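
With this wiring, queries run through the static-embedding route and documents through the MLMTransformer plus SpladePooling stack, with `document` as the default route. An end-to-end usage sketch with sentence-transformers' `SparseEncoder` (the repo id is a placeholder):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("path/to/this-checkpoint")  # placeholder id

# encode_query / encode_document follow the routes defined in router_config.json
query_emb = model.encode_query(["creditors falling due within one year"])
doc_emb = model.encode_document(["Interval UK Holdings Limited balance sheet, 31 December 2024 ..."])

print(model.similarity(query_emb, doc_emb))  # dot-product over the shared vocab space
```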