ubergarm committed on
Commit
084e588
·
1 Parent(s): ba689ec

add some perplexity data

README.md CHANGED
@@ -11,11 +11,6 @@ tags:
11
  - step3p5
12
  ---
13
 
14
- ## WIP
15
- Only one test quant for now, a custom `IQ4_XS` which runs on both mainline llama.cpp and [ik_llama.cpp now that this was just merged to main](https://github.com/ikawrakow/ik_llama.cpp/pull/1240).
16
-
17
- I'm cooking imatrix now and planning to release some more ik_llama.cpp quants on Saturday!
18
-
19
  ## `ik_llama.cpp` imatrix Quantizations of stepfun-ai/Step-3.5-Flash
20
  *NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.
21
 
@@ -35,31 +30,72 @@ Perplexity computed against *wiki.test.raw*. (lower is "better")
35
 
36
  ![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.")
37
 
38
- These two are just a test quants for baseline perplexity comparison:
39
  * `BF16` 366.952 GiB (16.004 BPW)
40
- - TODO
41
  * `Q8_0` 195.031 GiB (8.506 BPW)
42
- - TODO
43
 
44
  *NOTE*: The first split file is much smaller on purpose since it only contains metadata; it's fine!
45
 
46
- ## IQ5_K TODO
47
- TODO
48
 
49
  <details>
50
 
51
  <summary>👈 Secret Recipe</summary>
52
 
53
  ```bash
54
- echo TODO
55
  ```
56
 
57
  </details>
58
 
59
  ## IQ4_XS 100.53 GiB (4.38 BPW)
60
- TODO
61
 
62
- *NOTE*: This is the first test quant and does not use imatrix. It is compatible with mainline llama.cpp as well.
63
 
64
  <details>
65
 
@@ -111,33 +147,61 @@ numactl -N ${SOCKET} -m ${SOCKET} \
111
 
112
  </details>
113
 
114
- ## IQ4_KSS TODO
115
- TODO
116
 
117
  <details>
118
 
119
  <summary>👈 Secret Recipe</summary>
120
 
121
  ```bash
122
- echo TODO
123
- ```
124
 
125
- </details>
126
 
127
- ## IQ3_KS TODO
128
- TODO
129
 
130
- <details>
131
 
132
- <summary>👈 Secret Recipe</summary>
133
 
134
- ```bash
135
- echo TODO
136
  ```
137
 
138
  </details>
139
 
140
- ## IQ2_KS TODO
141
  TODO
142
 
143
  <details>
@@ -185,9 +249,9 @@ numactl -N "$SOCKET" -m "$SOCKET" \
185
  --jinja
186
  ```
187
 
188
- For tool use you can always bring your own template with `--jinja --chat-template-file myTemplate.jinja` and might need `--special` etc. The chat template baked into these GGUFs from the [original one](https://huggingface.co/stepfun-ai/Step-3.5-Flash/blob/main/chat_template.jinja). However just for tool use, it is possible [to copy paste the line out of this one](https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4/blob/main/step3p5_flash_Q4_K_S-00001-of-00012.gguf) but seems to mess it up for normal usage.
189
 
190
- Another option is to check out [pwilkin's autoparser branch](https://github.com/ggml-org/llama.cpp/pull/18675) which might work best in many cases.
191
 
192
  ## References
193
  * [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
 
11
  - step3p5
12
  ---
13
 
14
  ## `ik_llama.cpp` imatrix Quantizations of stepfun-ai/Step-3.5-Flash
15
  *NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.
16
 
 
30
 
31
  ![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.")
32
 
33
+ These two are just test quants for baseline perplexity comparison and are not available for download here:
34
  * `BF16` 366.952 GiB (16.004 BPW)
35
+ - PPL over 561 chunks for n_ctx=512 = 2.4169 +/- 0.01107
36
  * `Q8_0` 195.031 GiB (8.506 BPW)
37
+ - PPL over 561 chunks for n_ctx=512 = 2.4188 +/- 0.01109
38
 
39
  *NOTE*: The first split file is much smaller on purpose since it only contains metadata; it's fine!
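The PPL figures quoted throughout this card follow the standard definition: perplexity is the exponential of the mean per-token negative log-likelihood over all evaluated chunks. A minimal sketch with made-up per-chunk NLL values (the numbers are illustrative, not from these runs):

```shell
# Illustrative per-chunk mean negative log-likelihoods in nats (made up);
# llama-perplexity reports exp(mean NLL) aggregated over all chunks.
nll="0.88 0.89 0.87 0.88"
ppl=$(awk -v vals="$nll" 'BEGIN {
  n = split(vals, a, " ")
  s = 0
  for (i = 1; i <= n; i++) s += a[i]
  printf "%.4f", exp(s / n)   # perplexity = exp(mean NLL)
}')
echo "PPL = $ppl"
```

Lower is better because a lower mean NLL means the model assigned higher probability to the reference text.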
40
 
41
+ ## IQ5_K 136.891 GiB (5.970 BPW)
42
+ PPL over 561 chunks for n_ctx=512 = 2.4304 +/- 0.01117
43
 
44
  <details>
45
 
46
  <summary>👈 Secret Recipe</summary>
47
 
48
  ```bash
49
+ #!/usr/bin/env bash
50
+
51
+ custom="
52
+ # 45 Repeating Layers [0-44]
53
+
54
+ # Attention [0-44] GPU
55
+ blk\..*\.attn_gate.*=q8_0
56
+ blk\..*\.attn_q.*=q8_0
57
+ blk\..*\.attn_k.*=q8_0
58
+ blk\..*\.attn_v.*=q8_0
59
+ blk\..*\.attn_output.*=q8_0
60
+
61
+ # First 3 Dense Layers [0-2] GPU
62
+ blk\..*\.ffn_down\.weight=q8_0
63
+ blk\..*\.ffn_(gate|up)\.weight=q8_0
64
+
65
+ # Shared Expert Layers [3-44] GPU
66
+ blk\..*\.ffn_down_shexp\.weight=q8_0
67
+ blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
68
+
69
+ # Routed Experts Layers [3-44] CPU
70
+ blk\..*\.ffn_down_exps\.weight=iq6_k
71
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq5_k
72
+
73
+ # Non-Repeating Layers
74
+ token_embd\.weight=q8_0
75
+ output\.weight=q8_0
76
+ "
77
+
78
+ custom=$(
79
+ echo "$custom" | grep -v '^#' | \
80
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
81
+ )
82
+
83
+ numactl -N ${SOCKET} -m ${SOCKET} \
84
+ ./build/bin/llama-quantize \
85
+ --custom-q "$custom" \
86
+ --imatrix /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat \
87
+ /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf \
88
+ /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ5_K.gguf \
89
+ IQ5_K \
90
+ 128
91
  ```
92
 
93
  </details>
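The `custom=$( ... )` pipeline in the recipe above just turns the commented, one-rule-per-line block into the single comma-separated list that `--custom-q` expects. A standalone sketch of that preprocessing with a two-rule example (GNU sed's `-z` null-separated mode is assumed, as in the recipe):

```shell
# Two example tensor-regex=qtype rules plus a comment, one per line.
custom="
# attention stays high precision
blk\..*\.attn_q.*=q8_0
blk\..*\.ffn_down_exps\.weight=iq6_k
"

# Drop comment lines, then collapse newlines into commas and trim the
# leading/trailing commas -- the same pipeline the quantization recipes use.
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
echo "$custom"
```

The result is a single `regex=qtype,regex=qtype,...` string suitable for `llama-quantize --custom-q`.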
94
 
95
  ## IQ4_XS 100.53 GiB (4.38 BPW)
96
+ PPL over 561 chunks for n_ctx=512 = 2.5181 +/- 0.01178
97
 
98
+ *NOTE*: This mainline-compatible quant does not use an imatrix.
99
 
100
  <details>
101
 
 
147
 
148
  </details>
149
 
150
+ ## smol-IQ3_KS 75.934 GiB (3.312 BPW)
151
+ PPL over 561 chunks for n_ctx=512 = 2.7856 +/- 0.01365
152
 
153
  <details>
154
 
155
  <summary>👈 Secret Recipe</summary>
156
 
157
  ```bash
158
+ #!/usr/bin/env bash
 
159
 
160
+ custom="
161
+ # 45 Repeating Layers [0-44]
162
 
163
+ # Attention [0-44] GPU
164
+ blk\..*\.attn_gate.*=iq6_k
165
+ blk\..*\.attn_q.*=iq6_k
166
+ blk\..*\.attn_k.*=iq6_k
167
+ blk\..*\.attn_v.*=iq6_k
168
+ blk\..*\.attn_output.*=iq6_k
169
 
170
+ # First 3 Dense Layers [0-2] GPU
171
+ blk\..*\.ffn_down\.weight=iq6_k
172
+ blk\..*\.ffn_(gate|up)\.weight=iq6_k
173
 
174
+ # Shared Expert Layers [3-44] GPU
175
+ blk\..*\.ffn_down_shexp\.weight=iq6_k
176
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq6_k
177
 
178
+ # Routed Experts Layers [3-44] CPU
179
+ blk\..*\.ffn_down_exps\.weight=iq3_ks
180
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks
181
+
182
+ # Non-Repeating Layers
183
+ token_embd\.weight=iq4_k
184
+ output\.weight=iq6_k
185
+ "
186
+
187
+ custom=$(
188
+ echo "$custom" | grep -v '^#' | \
189
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
190
+ )
191
+
192
+ numactl -N ${SOCKET} -m ${SOCKET} \
193
+ ./build/bin/llama-quantize \
194
+ --custom-q "$custom" \
195
+ --imatrix /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat \
196
+ /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf \
197
+ /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-smol-IQ3_KS.gguf \
198
+ IQ3_KS \
199
+ 128
200
  ```
201
 
202
  </details>
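As a sanity check, a quant's on-disk size can be estimated from its reported bits-per-weight: bytes ≈ params × BPW / 8. Using the ~196.956 B parameter count from the BF16 log and the IQ4_XS figure of 4.38 BPW (small rounding differences against the listed 100.53 GiB are expected, since BPW itself is rounded):

```shell
# Estimate model file size in GiB from parameter count and bits-per-weight.
params=196956000000   # ~196.956 B parameters, per the BF16 log below
bpw=4.38              # reported BPW for the IQ4_XS quant
size=$(awk -v p="$params" -v b="$bpw" \
    'BEGIN { printf "%.2f GiB", p * b / 8 / (1024 ^ 3) }')
echo "$size"
```

The same arithmetic works for any of the quants listed above.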
203
 
204
+ ## smol-IQ2_KS TODO
205
  TODO
206
 
207
  <details>
 
249
  --jinja
250
  ```
251
 
252
+ For tool use you can always bring your own template with `--chat-template-file myTemplate.jinja` and might need `--special` etc. The chat template baked into these GGUFs comes from the [original one](https://huggingface.co/stepfun-ai/Step-3.5-Flash/blob/main/chat_template.jinja).
253
 
254
+ Another option for mainline tool-calling users is to check out [pwilkin's autoparser branch](https://github.com/ggml-org/llama.cpp/pull/18675).
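As a concrete sketch of the bring-your-own-template route, you write any Jinja template to a file and hand it to the server with `--jinja --chat-template-file`. The template below is a deliberately trivial stand-in, not the real Step-3.5-Flash template:

```shell
# Write a minimal, purely illustrative chat template (NOT the real
# Step-3.5-Flash one) that could then be passed as e.g.:
#   llama-server --jinja --chat-template-file myTemplate.jinja ...
cat > myTemplate.jinja <<'EOF'
{%- for message in messages -%}
<|{{ message.role }}|>{{ message.content }}<|im_end|>
{%- endfor -%}
EOF
wc -c < myTemplate.jinja
```

`<|im_end|>` is used here because the BF16 log shows it is this model's EOG token; a real template must of course match the model's full expected formatting.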
255
 
256
  ## References
257
  * [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
images/perplexity.png ADDED

Git LFS Details

  • SHA256: dc7c397099cb15347d757c12a474fb9cacc5ba8f13c3e9922946b7c7c777d95e
  • Pointer size: 131 Bytes
  • Size of remote file: 208 kB
logs/imatrix-Step-3.5-Flash-BF16.log ADDED
@@ -0,0 +1,780 @@
1
+ numactl -N 0 -m 0 ./build/bin/llama-imatrix --model /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf -f ubergarm-imatrix-calibration-corpus-v02.txt -o /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat --no-fused-moe --no-fused-up-gate --no-fused-mul-multiadd --ctx-size 512 -ub 4096 -b 4096 --threads 96 --threads-batch 128 --no-mmap --numa numactl --verbosity 1 --layer-similarity
2
+
3
+ CPU: using device CPU - 0 MiB free
4
+ llama_model_loader: additional 8 GGUFs metadata loaded.
5
+ llama_model_loader: loaded meta data with 50 key-value pairs and 754 tensors from /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf (version GGUF V3 (latest))
6
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
7
+ llama_model_loader: - kv 0: general.architecture str = step35
8
+ llama_model_loader: - kv 1: general.type str = model
9
+ llama_model_loader: - kv 2: general.name str = Step 3.5 Flash
10
+ llama_model_loader: - kv 3: general.size_label str = 288x7.4B
11
+ llama_model_loader: - kv 4: general.license str = apache-2.0
12
+ llama_model_loader: - kv 5: general.base_model.count u32 = 1
13
+ llama_model_loader: - kv 6: general.base_model.0.name str = Step 3.5 Flash
14
+ llama_model_loader: - kv 7: general.base_model.0.organization str = Stepfun Ai
15
+ llama_model_loader: - kv 8: general.base_model.0.repo_url str = https://huggingface.co/stepfun-ai/ste...
16
+ llama_model_loader: - kv 9: step35.block_count u32 = 45
17
+ llama_model_loader: - kv 10: step35.context_length u32 = 262144
18
+ llama_model_loader: - kv 11: step35.embedding_length u32 = 4096
19
+ llama_model_loader: - kv 12: step35.feed_forward_length u32 = 11264
20
+ llama_model_loader: - kv 13: step35.attention.head_count arr[i32,45] = [64, 96, 96, 96, 64, 96, 96, 96, 64, ...
21
+ llama_model_loader: - kv 14: step35.rope.freq_base f32 = 5000000.000000
22
+ llama_model_loader: - kv 15: step35.rope.freq_base_swa f32 = 10000.000000
23
+ llama_model_loader: - kv 16: step35.expert_gating_func u32 = 2
24
+ llama_model_loader: - kv 17: step35.attention.key_length u32 = 128
25
+ llama_model_loader: - kv 18: step35.attention.value_length u32 = 128
26
+ llama_model_loader: - kv 19: general.file_type u32 = 32
27
+ llama_model_loader: - kv 20: step35.attention.head_count_kv arr[i32,45] = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...
28
+ llama_model_loader: - kv 21: step35.attention.sliding_window u32 = 512
29
+ llama_model_loader: - kv 22: step35.attention.sliding_window_pattern arr[i32,45] = [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ...
30
+ llama_model_loader: - kv 23: step35.expert_count u32 = 288
31
+ llama_model_loader: - kv 24: step35.expert_used_count u32 = 8
32
+ llama_model_loader: - kv 25: step35.expert_feed_forward_length u32 = 1280
33
+ llama_model_loader: - kv 26: step35.expert_shared_feed_forward_length u32 = 1280
34
+ llama_model_loader: - kv 27: step35.expert_weights_scale f32 = 3.000000
35
+ llama_model_loader: - kv 28: step35.expert_weights_norm bool = true
36
+ llama_model_loader: - kv 29: step35.leading_dense_block_count u32 = 3
37
+ llama_model_loader: - kv 30: step35.moe_every_n_layers u32 = 1
38
+ llama_model_loader: - kv 31: step35.attention.layer_norm_rms_epsilon f32 = 0.000010
39
+ llama_model_loader: - kv 32: step35.swiglu_clamp_exp arr[f32,45] = [0.000000, 0.000000, 0.000000, 0.0000...
40
+ llama_model_loader: - kv 33: step35.swiglu_clamp_shexp arr[f32,45] = [0.000000, 0.000000, 0.000000, 0.0000...
41
+ llama_model_loader: - kv 34: general.quantization_version u32 = 2
42
+ llama_model_loader: - kv 35: tokenizer.ggml.model str = gpt2
43
+ llama_model_loader: - kv 36: tokenizer.ggml.pre str = deepseek-v3
44
+ llama_model_loader: - kv 37: tokenizer.ggml.tokens arr[str,128896] = ["<|begin▁of▁sentence|>", "<�...
45
+ llama_model_loader: - kv 38: tokenizer.ggml.token_type arr[i32,128896] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
46
+ llama_model_loader: - kv 39: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
47
+ llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 0
48
+ llama_model_loader: - kv 41: tokenizer.ggml.eos_token_id u32 = 128007
49
+ llama_model_loader: - kv 42: tokenizer.ggml.padding_token_id u32 = 1
50
+ llama_model_loader: - kv 43: tokenizer.ggml.add_bos_token bool = true
51
+ llama_model_loader: - kv 44: tokenizer.ggml.add_sep_token bool = false
52
+ llama_model_loader: - kv 45: tokenizer.ggml.add_eos_token bool = false
53
+ llama_model_loader: - kv 46: tokenizer.chat_template str = {% macro render_content(content) %}{%...
54
+ llama_model_loader: - kv 47: split.no u16 = 0
55
+ llama_model_loader: - kv 48: split.count u16 = 9
56
+ llama_model_loader: - kv 49: split.tensors.count i32 = 754
57
+ llama_model_loader: - type f32: 266 tensors
58
+ llama_model_loader: - type bf16: 488 tensors
59
+ load: printing all EOG tokens:
60
+ load: - 128007 ('<|im_end|>')
61
+ load: special tokens cache size = 818
62
+ load: token to piece cache size = 0.8220 MB
63
+ llm_load_print_meta: format = GGUF V3 (latest)
64
+ llm_load_print_meta: arch = step35
65
+ llm_load_print_meta: n_ctx_train = 262144
66
+ llm_load_print_meta: n_embd = 4096
67
+ llm_load_print_meta: n_layer = 45
68
+ llm_load_print_meta: n_head = [64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64]
69
+ llm_load_print_meta: n_head_kv = 8
70
+ llm_load_print_meta: n_rot = 128
71
+ llm_load_print_meta: n_swa = 512
72
+ llm_load_print_meta: n_swa_pattern = 1
73
+ llm_load_print_meta: n_embd_head_k = 128
74
+ llm_load_print_meta: n_embd_head_v = 128
75
+ llm_load_print_meta: n_gqa = [8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8]
76
+ llm_load_print_meta: n_embd_k_gqa = 1024
77
+ llm_load_print_meta: n_embd_v_gqa = 1024
78
+ llm_load_print_meta: f_norm_eps = 0.0e+00
79
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-05
80
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
81
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
82
+ llm_load_print_meta: f_logit_scale = 0.0e+00
83
+ llm_load_print_meta: n_ff = 11264
84
+ llm_load_print_meta: n_expert = 288
85
+ llm_load_print_meta: n_expert_used = 8
86
+ llm_load_print_meta: causal attn = 1
87
+ llm_load_print_meta: pooling type = 0
88
+ llm_load_print_meta: rope type = 2
89
+ llm_load_print_meta: rope scaling = linear
90
+ llm_load_print_meta: freq_base_train = 5000000.0
91
+ llm_load_print_meta: freq_scale_train = 1
92
+ llm_load_print_meta: n_ctx_orig_yarn = 262144
93
+ llm_load_print_meta: rope_finetuned = unknown
94
+ llm_load_print_meta: ssm_d_conv = 0
95
+ llm_load_print_meta: ssm_d_inner = 0
96
+ llm_load_print_meta: ssm_d_state = 0
97
+ llm_load_print_meta: ssm_dt_rank = 0
98
+ llm_load_print_meta: model type = ?B
99
+ llm_load_print_meta: model ftype = BF16
100
+ llm_load_print_meta: model params = 196.956 B
101
+ llm_load_print_meta: model size = 366.952 GiB (16.004 BPW)
102
+ llm_load_print_meta: repeating layers = 364.986 GiB (16.004 BPW, 195.900 B parameters)
103
+ llm_load_print_meta: general.name = Step 3.5 Flash
104
+ print_info: vocab type = BPE
105
+ print_info: n_vocab = 128896
106
+ print_info: n_merges = 127741
107
+ print_info: BOS token = 0 '<|begin▁of▁sentence|>'
108
+ print_info: EOS token = 128007 '<|im_end|>'
109
+ print_info: EOT token = 128007 '<|im_end|>'
110
+ print_info: PAD token = 1 '<|end▁of▁sentence|>'
111
+ print_info: LF token = 201 'Ċ'
112
+ print_info: FIM PRE token = 128801 '<|fim▁begin|>'
113
+ print_info: FIM SUF token = 128800 '<|fim▁hole|>'
114
+ print_info: FIM MID token = 128802 '<|fim▁end|>'
115
+ print_info: EOG token = 128007 '<|im_end|>'
116
+ print_info: max token length = 256
117
+ llm_load_tensors: ggml ctx size = 0.31 MiB
118
+ llm_load_tensors: offloading 0 repeating layers to GPU
119
+ llm_load_tensors: offloaded 0/46 layers to GPU
120
+ llm_load_tensors: CPU buffer size = 375759.27 MiB
121
+ ....................................................................................................
122
+ llama_new_context_with_model: n_ctx = 512
123
+ llama_new_context_with_model: n_batch = 512
124
+ llama_new_context_with_model: n_ubatch = 512
125
+ llama_new_context_with_model: flash_attn = 1
126
+ llama_new_context_with_model: attn_max_b = 0
127
+ llama_new_context_with_model: fused_moe = 0
128
+ llama_new_context_with_model: grouped er = 0
129
+ llama_new_context_with_model: fused_up_gate = 0
130
+ llama_new_context_with_model: fused_mmad = 0
131
+ llama_new_context_with_model: rope_cache = 0
132
+ llama_new_context_with_model: graph_reuse = 1
133
+ llama_new_context_with_model: k_cache_hadam = 0
134
+ llama_new_context_with_model: split_mode_graph_scheduling = 0
135
+ llama_new_context_with_model: reduce_type = f16
136
+ llama_new_context_with_model: sched_async = 0
137
+ llama_new_context_with_model: ser = -1, 0
138
+ llama_new_context_with_model: freq_base = 5000000.0
139
+ llama_new_context_with_model: freq_scale = 1
140
+ llama_kv_cache_init: CPU KV buffer size = 90.00 MiB
141
+ llama_new_context_with_model: KV self size = 90.00 MiB, K (f16): 45.00 MiB, V (f16): 45.00 MiB
142
+ llama_new_context_with_model: CPU output buffer size = 0.49 MiB
143
+ llama_new_context_with_model: CPU compute buffer size = 259.75 MiB
144
+ llama_new_context_with_model: graph nodes = 2369
145
+ llama_new_context_with_model: graph splits = 1
146
+ XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload
147
+
148
+ system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
149
+ compute_imatrix: tokenizing the input ..
150
+ compute_imatrix: tokenization took 599.134 ms
151
+ compute_imatrix: computing over 812 chunks with batch_size 512
152
+ compute_imatrix: 4.10 seconds per pass - ETA 55.55 minutes
153
+ ===================================== llama_new_context_with_model: f16
154
+ ======================================= HAVE_FANCY_SIMD is defined
155
+ [1]92.2870,[2]15.6185,[3]9.0021,[4]5.2226,[5]3.8316,[6]3.1180,[7]2.6999,[8]2.4021,[9]2.2278,
156
+ save_imatrix: entry ' blk.43.ffn_up_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
157
+ save_imatrix: entry ' blk.42.ffn_down_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
158
+ save_imatrix: entry ' blk.39.ffn_gate_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
159
+ save_imatrix: entry ' blk.38.ffn_gate_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
160
+ save_imatrix: entry ' blk.39.ffn_down_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
161
+ save_imatrix: entry ' blk.37.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
162
+ save_imatrix: entry ' blk.36.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
163
+ save_imatrix: entry ' blk.40.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
164
+ save_imatrix: entry ' blk.35.ffn_down_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
165
+ save_imatrix: entry ' blk.35.ffn_gate_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
166
+ save_imatrix: entry ' blk.34.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
167
+ save_imatrix: entry ' blk.34.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
168
+ save_imatrix: entry ' blk.33.ffn_down_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
169
+ save_imatrix: entry ' blk.33.ffn_gate_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
170
+ save_imatrix: entry ' blk.39.ffn_up_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
171
+ save_imatrix: entry ' blk.32.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
172
+ save_imatrix: entry ' blk.32.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
173
+ save_imatrix: entry ' blk.34.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
174
+ save_imatrix: entry ' blk.31.ffn_down_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
175
+ save_imatrix: entry ' blk.31.ffn_gate_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
176
+ save_imatrix: entry ' blk.40.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
177
+ save_imatrix: entry ' blk.43.ffn_gate_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
178
+ save_imatrix: entry ' blk.30.ffn_down_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
179
+ save_imatrix: entry ' blk.30.ffn_gate_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
180
+ save_imatrix: entry ' blk.29.ffn_gate_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
181
+ save_imatrix: entry ' blk.29.ffn_up_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
182
+ save_imatrix: entry ' blk.42.ffn_up_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
183
+ save_imatrix: entry ' blk.28.ffn_gate_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
184
+ save_imatrix: entry ' blk.28.ffn_up_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
185
+ save_imatrix: entry ' blk.43.ffn_down_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
186
+ save_imatrix: entry ' blk.31.ffn_up_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
187
+ save_imatrix: entry ' blk.27.ffn_gate_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
188
+ save_imatrix: entry ' blk.26.ffn_gate_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
189
+ save_imatrix: entry ' blk.26.ffn_up_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
190
+ save_imatrix: entry ' blk.36.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
191
+ save_imatrix: entry ' blk.24.ffn_down_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
192
+ save_imatrix: entry ' blk.24.ffn_gate_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
193
+ save_imatrix: entry ' blk.28.ffn_down_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
194
+ save_imatrix: entry ' blk.23.ffn_down_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
195
+ save_imatrix: entry ' blk.23.ffn_gate_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
196
+ save_imatrix: entry ' blk.23.ffn_up_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
197
+ save_imatrix: entry ' blk.38.ffn_up_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
198
+ save_imatrix: entry ' blk.22.ffn_down_exps.weight' has partial data (89.58%) 30 out of 288 experts are missing data - skipping
199
+ save_imatrix: entry ' blk.22.ffn_gate_exps.weight' has partial data (89.58%) 30 out of 288 experts are missing data - skipping
200
+ save_imatrix: entry ' blk.25.ffn_gate_exps.weight' has partial data (90.97%) 26 out of 288 experts are missing data - skipping
201
+ save_imatrix: entry ' blk.15.ffn_down_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
202
+ save_imatrix: entry ' blk.7.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
203
+ save_imatrix: entry ' blk.11.ffn_up_exps.weight' has partial data (89.93%) 29 out of 288 experts are missing data - skipping
204
+ save_imatrix: entry ' blk.6.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
205
+ save_imatrix: entry ' blk.20.ffn_down_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
206
+ save_imatrix: entry ' blk.11.ffn_down_exps.weight' has partial data (89.93%) 29 out of 288 experts are missing data - skipping
207
+ save_imatrix: entry ' blk.16.ffn_up_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
208
+ save_imatrix: entry ' blk.41.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
209
+ save_imatrix: entry ' blk.33.ffn_up_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
210
+ save_imatrix: entry ' blk.4.ffn_up_exps.weight' has partial data (82.99%) 49 out of 288 experts are missing data - skipping
211
+ save_imatrix: entry ' blk.29.ffn_down_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
212
+ save_imatrix: entry ' blk.8.ffn_up_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
213
+ save_imatrix: entry ' blk.10.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
214
+ save_imatrix: entry ' blk.3.ffn_up_exps.weight' has partial data (99.31%) 2 out of 288 experts are missing data Storing **but be aware**
215
+ save_imatrix: entry ' blk.6.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
216
+ save_imatrix: entry ' blk.37.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
217
+ save_imatrix: entry ' blk.9.ffn_gate_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
218
+ save_imatrix: entry ' blk.36.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
219
+ save_imatrix: entry ' blk.3.ffn_down_exps.weight' has partial data (99.31%) 2 out of 288 experts are missing data Storing **but be aware**
220
+ save_imatrix: entry ' blk.12.ffn_down_exps.weight' has partial data (88.54%) 33 out of 288 experts are missing data - skipping
221
+ save_imatrix: entry ' blk.21.ffn_down_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
222
+ save_imatrix: entry ' blk.27.ffn_up_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
223
+ save_imatrix: entry ' blk.41.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
224
+ save_imatrix: entry ' blk.12.ffn_gate_exps.weight' has partial data (88.54%) 33 out of 288 experts are missing data - skipping
225
+ save_imatrix: entry ' blk.38.ffn_down_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
226
+ save_imatrix: entry ' blk.44.ffn_up_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
227
+ save_imatrix: entry ' blk.4.ffn_gate_exps.weight' has partial data (82.99%) 49 out of 288 experts are missing data - skipping
228
+ save_imatrix: entry ' blk.19.ffn_up_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
229
+ save_imatrix: entry ' blk.13.ffn_up_exps.weight' has partial data (83.33%) 48 out of 288 experts are missing data - skipping
230
+ save_imatrix: entry ' blk.44.ffn_down_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
231
+ save_imatrix: entry ' blk.7.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
232
+ save_imatrix: entry ' blk.30.ffn_up_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
233
+ save_imatrix: entry ' blk.5.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
234
+ save_imatrix: entry ' blk.18.ffn_up_exps.weight' has partial data (90.62%) 27 out of 288 experts are missing data - skipping
235
+ save_imatrix: entry ' blk.4.ffn_down_exps.weight' has partial data (82.99%) 49 out of 288 experts are missing data - skipping
236
+ save_imatrix: entry ' blk.17.ffn_up_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
237
+ save_imatrix: entry ' blk.41.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
238
+ save_imatrix: entry ' blk.9.ffn_down_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
239
+ save_imatrix: entry ' blk.25.ffn_up_exps.weight' has partial data (90.97%) 26 out of 288 experts are missing data - skipping
240
+ save_imatrix: entry ' blk.3.ffn_gate_exps.weight' has partial data (99.31%) 2 out of 288 experts are missing data Storing **but be aware**
241
+ save_imatrix: entry ' blk.8.ffn_gate_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
242
+ save_imatrix: entry ' blk.9.ffn_up_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
243
+ save_imatrix: entry ' blk.5.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
244
+ save_imatrix: entry ' blk.13.ffn_down_exps.weight' has partial data (83.33%) 48 out of 288 experts are missing data - skipping
245
+ save_imatrix: entry ' blk.16.ffn_gate_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.27.ffn_down_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.26.ffn_down_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.5.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.11.ffn_gate_exps.weight' has partial data (89.93%) 29 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.37.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.18.ffn_gate_exps.weight' has partial data (90.62%) 27 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.20.ffn_gate_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.13.ffn_gate_exps.weight' has partial data (83.33%) 48 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.40.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.14.ffn_up_exps.weight' has partial data (87.85%) 35 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.10.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_gate_exps.weight' has partial data (87.85%) 35 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_down_exps.weight' has partial data (87.85%) 35 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.8.ffn_down_exps.weight' has partial data (92.36%) 22 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.24.ffn_up_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.12.ffn_up_exps.weight' has partial data (88.54%) 33 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.42.ffn_gate_exps.weight' has partial data (88.89%) 32 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.10.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.15.ffn_up_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.15.ffn_gate_exps.weight' has partial data (90.28%) 28 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.16.ffn_down_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.17.ffn_gate_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.35.ffn_up_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.17.ffn_down_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.18.ffn_down_exps.weight' has partial data (90.62%) 27 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.21.ffn_up_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.25.ffn_down_exps.weight' has partial data (90.97%) 26 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.6.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.19.ffn_gate_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.19.ffn_down_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.20.ffn_up_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.22.ffn_up_exps.weight' has partial data (89.58%) 30 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.21.ffn_gate_exps.weight' has partial data (89.24%) 31 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.44.ffn_gate_exps.weight' has partial data (91.67%) 24 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.7.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.32.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: warning: storing only 418 out of 529 entries
+
+ save_imatrix: stored collected data after 10 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [10]2.1021,[11]2.3311,[12]2.4035,[13]2.3973,[14]2.4537,[15]2.3408,[16]2.2269,[17]2.1399,[18]2.0725,[19]2.0198,
+ save_imatrix: entry ' blk.43.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.42.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.37.ffn_gate_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.36.ffn_down_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.35.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.35.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.34.ffn_gate_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.34.ffn_up_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.33.ffn_down_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.33.ffn_gate_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.32.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.32.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.34.ffn_down_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.31.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.31.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.43.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.30.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.30.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.29.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.29.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.42.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.28.ffn_gate_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.28.ffn_up_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.43.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.31.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.27.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.26.ffn_gate_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.26.ffn_up_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.36.ffn_up_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.24.ffn_down_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.24.ffn_gate_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.28.ffn_down_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.23.ffn_down_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.23.ffn_gate_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.23.ffn_up_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.22.ffn_down_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.22.ffn_gate_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.25.ffn_gate_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.15.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.11.ffn_up_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.6.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.20.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.11.ffn_down_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.16.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.41.ffn_down_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.33.ffn_up_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_up_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.29.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.8.ffn_up_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.10.ffn_gate_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.6.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.37.ffn_down_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.9.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.36.ffn_gate_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.12.ffn_down_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.21.ffn_down_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.27.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.41.ffn_gate_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.12.ffn_gate_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.44.ffn_up_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_gate_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.19.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_up_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.44.ffn_down_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.30.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.5.ffn_down_exps.weight' has partial data (98.61%) 4 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.18.ffn_up_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.4.ffn_down_exps.weight' has partial data (93.06%) 20 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.17.ffn_up_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.41.ffn_up_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.9.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.25.ffn_up_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.8.ffn_gate_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.9.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.5.ffn_up_exps.weight' has partial data (98.61%) 4 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_down_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.16.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.27.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.26.ffn_down_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.5.ffn_gate_exps.weight' has partial data (98.61%) 4 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.11.ffn_gate_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.37.ffn_up_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.18.ffn_gate_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.20.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.13.ffn_gate_exps.weight' has partial data (91.32%) 25 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.10.ffn_down_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.14.ffn_gate_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_down_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.8.ffn_down_exps.weight' has partial data (96.88%) 9 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.24.ffn_up_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.12.ffn_up_exps.weight' has partial data (93.40%) 19 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.42.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.10.ffn_up_exps.weight' has partial data (97.57%) 7 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.15.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.15.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.16.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.17.ffn_gate_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.35.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.17.ffn_down_exps.weight' has partial data (94.10%) 17 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.18.ffn_down_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.21.ffn_up_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.25.ffn_down_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.6.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.19.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.19.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.20.ffn_up_exps.weight' has partial data (93.75%) 18 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.22.ffn_up_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.21.ffn_gate_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.44.ffn_gate_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.32.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: warning: storing only 478 out of 529 entries
+
+ save_imatrix: stored collected data after 20 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [20]1.9686,[21]1.9365,[22]1.8869,[23]1.8609,[24]1.8847,[25]1.8792,[26]1.8445,[27]1.9620,[28]2.0677,[29]2.1517,
+ save_imatrix: entry ' blk.43.ffn_up_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.42.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.43.ffn_gate_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.30.ffn_down_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.30.ffn_gate_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.29.ffn_gate_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.29.ffn_up_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.42.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.28.ffn_gate_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.28.ffn_up_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.43.ffn_down_exps.weight' has partial data (94.79%) 15 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.24.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.24.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.28.ffn_down_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.23.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.23.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.23.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.25.ffn_gate_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.15.ffn_down_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.20.ffn_down_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_up_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.29.ffn_down_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.12.ffn_down_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.21.ffn_down_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.12.ffn_gate_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.13.ffn_up_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.30.ffn_up_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.18.ffn_up_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_down_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.17.ffn_up_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.25.ffn_up_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_down_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.18.ffn_gate_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.20.ffn_gate_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_gate_exps.weight' has partial data (92.71%) 21 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_up_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_gate_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.14.ffn_down_exps.weight' has partial data (94.44%) 16 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.24.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.12.ffn_up_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.42.ffn_gate_exps.weight' has partial data (95.14%) 14 out of 288 experts are missing data - skipping
+ save_imatrix: entry ' blk.15.ffn_up_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.15.ffn_gate_exps.weight' has partial data (95.49%) 13 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.17.ffn_gate_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.17.ffn_down_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.18.ffn_down_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.21.ffn_up_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.25.ffn_down_exps.weight' has partial data (97.22%) 8 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.20.ffn_up_exps.weight' has partial data (95.83%) 12 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.21.ffn_gate_exps.weight' has partial data (96.53%) 10 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: warning: storing only 511 out of 529 entries
+
+ save_imatrix: stored collected data after 30 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [30]2.1645,[31]2.1882,[32]2.1900,[33]2.1667,[34]2.2092,[35]2.2241,[36]2.2535,[37]2.2534,[38]2.3061,[39]2.2955,
+ save_imatrix: entry ' blk.43.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.42.ffn_down_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.43.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.42.ffn_up_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.43.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.23.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.23.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.23.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_up_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_gate_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_up_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.4.ffn_down_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_down_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.13.ffn_gate_exps.weight' has partial data (96.18%) 11 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.14.ffn_up_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.14.ffn_gate_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.14.ffn_down_exps.weight' has partial data (97.92%) 6 out of 288 experts are missing data Storing **but be aware**
+ save_imatrix: entry ' blk.42.ffn_gate_exps.weight' has partial data (98.26%) 5 out of 288 experts are missing data Storing **but be aware**
+
+ save_imatrix: stored collected data after 40 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [40]2.3248,[41]2.3142,[42]2.2977,[43]2.3104,[44]2.3077,[45]2.3022,[46]2.3080,[47]2.2986,[48]2.2730,[49]2.2517,
+ save_imatrix: stored collected data after 50 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [50]2.2370,[51]2.2344,[52]2.2286,[53]2.2301,[54]2.2405,[55]2.2243,[56]2.2017,[57]2.2026,[58]2.1997,[59]2.2053,
+ save_imatrix: stored collected data after 60 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [60]2.1886,[61]2.2348,[62]2.2826,[63]2.3263,[64]2.3770,[65]2.4355,[66]2.4710,[67]2.5238,[68]2.5784,[69]2.6394,
+ save_imatrix: stored collected data after 70 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
+ [70]2.7132,[71]2.7529,[72]2.7870,[73]2.8046,[74]2.8227,[75]2.8730,[76]2.9207,[77]2.9354,[78]2.9547,[79]2.9834,
483
+ save_imatrix: stored collected data after 80 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
484
+ [80]3.0243,[81]3.0668,[82]3.1158,[83]3.1211,[84]3.1977,[85]3.2135,[86]3.2150,[87]3.2843,[88]3.3456,[89]3.4204,
485
+ save_imatrix: stored collected data after 90 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
486
+ [90]3.4383,[91]3.4328,[92]3.4361,[93]3.4487,[94]3.4527,[95]3.4903,[96]3.4952,[97]3.5381,[98]3.5657,[99]3.5372,
487
+ save_imatrix: stored collected data after 100 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
488
+ [100]3.5712,[101]3.6255,[102]3.6591,[103]3.7021,[104]3.7340,[105]3.7678,[106]3.8049,[107]3.7880,[108]3.7927,[109]3.7995,
489
+ save_imatrix: stored collected data after 110 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
490
+ [110]3.8065,[111]3.7946,[112]3.8325,[113]3.8564,[114]3.8682,[115]3.8427,[116]3.8048,[117]3.7913,[118]3.7995,[119]3.7751,
491
+ save_imatrix: stored collected data after 120 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
492
+ [120]3.7521,[121]3.7361,[122]3.7214,[123]3.7211,[124]3.7221,[125]3.7333,[126]3.7428,[127]3.7639,[128]3.7981,[129]3.8097,
493
+ save_imatrix: stored collected data after 130 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
494
+ [130]3.7767,[131]3.7424,[132]3.7104,[133]3.6785,[134]3.6792,[135]3.6723,[136]3.7012,[137]3.7375,[138]3.7522,[139]3.7524,
495
+ save_imatrix: stored collected data after 140 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
496
+ [140]3.7753,[141]3.8038,[142]3.8356,[143]3.8474,[144]3.8692,[145]3.8896,[146]3.9076,[147]3.9221,[148]3.9314,[149]3.9288,
497
+ save_imatrix: stored collected data after 150 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
498
+ [150]3.9329,[151]3.9521,[152]3.9701,[153]3.9697,[154]3.9747,[155]3.9853,[156]3.9910,[157]3.9969,[158]4.0021,[159]4.0095,
499
+ save_imatrix: stored collected data after 160 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
500
+ [160]4.0235,[161]4.0243,[162]4.0248,[163]4.0299,[164]4.0368,[165]4.0363,[166]4.0330,[167]4.0539,[168]4.0627,[169]4.0696,
501
+ save_imatrix: stored collected data after 170 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
502
+ [170]4.0906,[171]4.1067,[172]4.1002,[173]4.1049,[174]4.1072,[175]4.1209,[176]4.1278,[177]4.1407,[178]4.1391,[179]4.1392,
503
+ save_imatrix: stored collected data after 180 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
504
+ [180]4.1375,[181]4.1371,[182]4.1347,[183]4.1326,[184]4.1200,[185]4.1316,[186]4.1609,[187]4.1892,[188]4.2153,[189]4.2400,
505
+ save_imatrix: stored collected data after 190 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
506
+ [190]4.2773,[191]4.2866,[192]4.2995,[193]4.2804,[194]4.2936,[195]4.2837,[196]4.2593,[197]4.2321,[198]4.2527,[199]4.2750,
507
+ save_imatrix: stored collected data after 200 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
508
+ [200]4.2825,[201]4.2905,[202]4.3070,[203]4.3250,[204]4.3391,[205]4.3518,[206]4.3650,[207]4.3586,[208]4.3318,[209]4.3069,
509
+ save_imatrix: stored collected data after 210 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
510
+ [210]4.2806,[211]4.2549,[212]4.2300,[213]4.2044,[214]4.2076,[215]4.2338,[216]4.2205,[217]4.2112,[218]4.2377,[219]4.2507,
511
+ save_imatrix: stored collected data after 220 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
512
+ [220]4.2713,[221]4.2947,[222]4.3132,[223]4.3261,[224]4.3557,[225]4.3644,[226]4.3954,[227]4.4296,[228]4.4535,[229]4.4635,
513
+ save_imatrix: stored collected data after 230 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
514
+ [230]4.4711,[231]4.4786,[232]4.5008,[233]4.5066,[234]4.5144,[235]4.5423,[236]4.5473,[237]4.5815,[238]4.6125,[239]4.6244,
515
+ save_imatrix: stored collected data after 240 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
516
+ [240]4.6367,[241]4.6566,[242]4.6653,[243]4.6757,[244]4.6927,[245]4.7105,[246]4.7363,[247]4.7391,[248]4.7495,[249]4.7627,
517
+ save_imatrix: stored collected data after 250 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
518
+ [250]4.7765,[251]4.7811,[252]4.7926,[253]4.8027,[254]4.8115,[255]4.8230,[256]4.8389,[257]4.8518,[258]4.8654,[259]4.8754,
519
+ save_imatrix: stored collected data after 260 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
520
+ [260]4.8781,[261]4.8896,[262]4.8903,[263]4.9069,[264]4.9292,[265]4.9484,[266]4.9674,[267]4.9801,[268]4.9860,[269]4.9948,
521
+ save_imatrix: stored collected data after 270 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
522
+ [270]5.0074,[271]5.0283,[272]5.0497,[273]5.0671,[274]5.0720,[275]5.0726,[276]5.0887,[277]5.0971,[278]5.1116,[279]5.1267,
523
+ save_imatrix: stored collected data after 280 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
524
+ [280]5.1272,[281]5.1285,[282]5.1367,[283]5.1374,[284]5.1515,[285]5.1579,[286]5.1643,[287]5.1887,[288]5.2028,[289]5.2193,
525
+ save_imatrix: stored collected data after 290 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
526
+ [290]5.2383,[291]5.2501,[292]5.2751,[293]5.2875,[294]5.3043,[295]5.3194,[296]5.3327,[297]5.3388,[298]5.3604,[299]5.3687,
527
+ save_imatrix: stored collected data after 300 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
528
+ [300]5.3728,[301]5.3898,[302]5.4105,[303]5.4165,[304]5.4243,[305]5.4300,[306]5.4394,[307]5.4488,[308]5.4525,[309]5.4703,
529
+ save_imatrix: stored collected data after 310 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
530
+ [310]5.4786,[311]5.4932,[312]5.5118,[313]5.5282,[314]5.5483,[315]5.5213,[316]5.5219,[317]5.4985,[318]5.5149,[319]5.5228,
531
+ save_imatrix: stored collected data after 320 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
532
+ [320]5.5227,[321]5.5188,[322]5.5330,[323]5.5465,[324]5.5547,[325]5.5643,[326]5.5650,[327]5.5811,[328]5.5871,[329]5.6020,
533
+ save_imatrix: stored collected data after 330 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
534
+ [330]5.6110,[331]5.6185,[332]5.6276,[333]5.5984,[334]5.6102,[335]5.6322,[336]5.6522,[337]5.6740,[338]5.6883,[339]5.7097,
535
+ save_imatrix: stored collected data after 340 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
536
+ [340]5.7121,[341]5.7121,[342]5.7181,[343]5.7259,[344]5.7456,[345]5.7737,[346]5.7669,[347]5.7665,[348]5.7747,[349]5.7696,
537
+ save_imatrix: stored collected data after 350 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
538
+ [350]5.7718,[351]5.7753,[352]5.7696,[353]5.7763,[354]5.7891,[355]5.7864,[356]5.7856,[357]5.7662,[358]5.7433,[359]5.7300,
539
+ save_imatrix: stored collected data after 360 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
540
+ [360]5.7155,[361]5.6990,[362]5.6893,[363]5.6717,[364]5.6643,[365]5.6480,[366]5.6478,[367]5.6308,[368]5.6273,[369]5.6031,
541
+ save_imatrix: stored collected data after 370 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
542
+ [370]5.5829,[371]5.5735,[372]5.5600,[373]5.5399,[374]5.5243,[375]5.5158,[376]5.4972,[377]5.4881,[378]5.4873,[379]5.4862,
543
+ save_imatrix: stored collected data after 380 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
544
+ [380]5.4784,[381]5.4703,[382]5.4476,[383]5.4259,[384]5.4148,[385]5.4007,[386]5.3796,[387]5.3563,[388]5.3332,[389]5.3184,
545
+ save_imatrix: stored collected data after 390 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
546
+ [390]5.3114,[391]5.3143,[392]5.3078,[393]5.3050,[394]5.2956,[395]5.2810,[396]5.2605,[397]5.2439,[398]5.2358,[399]5.2181,
547
+ save_imatrix: stored collected data after 400 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
548
+ [400]5.2028,[401]5.1889,[402]5.1764,[403]5.1653,[404]5.1494,[405]5.1340,[406]5.1233,[407]5.1054,[408]5.0884,[409]5.0733,
549
+ save_imatrix: stored collected data after 410 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
550
+ [410]5.0606,[411]5.0529,[412]5.0454,[413]5.0384,[414]5.0269,[415]5.0170,[416]4.9984,[417]4.9798,[418]4.9609,[419]4.9444,
551
+ save_imatrix: stored collected data after 420 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
552
+ [420]4.9271,[421]4.9130,[422]4.8962,[423]4.8795,[424]4.8668,[425]4.8511,[426]4.8386,[427]4.8284,[428]4.8148,[429]4.7992,
553
+ save_imatrix: stored collected data after 430 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
554
+ [430]4.7833,[431]4.7701,[432]4.7664,[433]4.7576,[434]4.7630,[435]4.7521,[436]4.7382,[437]4.7269,[438]4.7143,[439]4.7061,
555
+ save_imatrix: stored collected data after 440 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
556
+ [440]4.6956,[441]4.6814,[442]4.6741,[443]4.6630,[444]4.6612,[445]4.6517,[446]4.6430,[447]4.6422,[448]4.6334,[449]4.6247,
557
+ save_imatrix: stored collected data after 450 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
558
+ [450]4.6132,[451]4.6061,[452]4.5944,[453]4.5835,[454]4.5715,[455]4.5610,[456]4.5472,[457]4.5362,[458]4.5256,[459]4.5127,
559
+ save_imatrix: stored collected data after 460 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
560
+ [460]4.5012,[461]4.4927,[462]4.4892,[463]4.4768,[464]4.4725,[465]4.4667,[466]4.4614,[467]4.4546,[468]4.4480,[469]4.4419,
561
+ save_imatrix: stored collected data after 470 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
562
+ [470]4.4352,[471]4.4286,[472]4.4222,[473]4.4156,[474]4.4099,[475]4.4034,[476]4.3970,[477]4.3924,[478]4.3810,[479]4.3720,
563
+ save_imatrix: stored collected data after 480 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
564
+ [480]4.3595,[481]4.3528,[482]4.3495,[483]4.3489,[484]4.3361,[485]4.3263,[486]4.3164,[487]4.3049,[488]4.2961,[489]4.2901,
565
+ save_imatrix: stored collected data after 490 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
566
+ [490]4.2823,[491]4.2764,[492]4.2675,[493]4.2613,[494]4.2508,[495]4.2465,[496]4.2389,[497]4.2302,[498]4.2205,[499]4.2204,
567
+ save_imatrix: stored collected data after 500 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
568
+ [500]4.2207,[501]4.2235,[502]4.2204,[503]4.2211,[504]4.2210,[505]4.2177,[506]4.2105,[507]4.2207,[508]4.2305,[509]4.2408,
569
+ save_imatrix: stored collected data after 510 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
570
+ [510]4.2493,[511]4.2572,[512]4.2653,[513]4.2721,[514]4.2796,[515]4.2847,[516]4.2921,[517]4.2970,[518]4.2971,[519]4.3143,
571
+ save_imatrix: stored collected data after 520 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
572
+ [520]4.3267,[521]4.3409,[522]4.3505,[523]4.3562,[524]4.3613,[525]4.3665,[526]4.3715,[527]4.3778,[528]4.3836,[529]4.3874,
573
+ save_imatrix: stored collected data after 530 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
574
+ [530]4.3930,[531]4.3979,[532]4.4008,[533]4.4036,[534]4.4078,[535]4.4051,[536]4.4066,[537]4.4142,[538]4.4193,[539]4.4241,
575
+ save_imatrix: stored collected data after 540 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
576
+ [540]4.4356,[541]4.4396,[542]4.4415,[543]4.4458,[544]4.4472,[545]4.4490,[546]4.4543,[547]4.4598,[548]4.4671,[549]4.4731,
577
+ save_imatrix: stored collected data after 550 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
578
+ [550]4.4794,[551]4.4874,[552]4.4927,[553]4.5002,[554]4.5035,[555]4.5077,[556]4.5118,[557]4.5194,[558]4.5195,[559]4.5247,
579
+ save_imatrix: stored collected data after 560 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
580
+ [560]4.5291,[561]4.5356,[562]4.5417,[563]4.5440,[564]4.5502,[565]4.5578,[566]4.5635,[567]4.5724,[568]4.5739,[569]4.5765,
581
+ save_imatrix: stored collected data after 570 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
582
+ [570]4.5767,[571]4.5807,[572]4.5769,[573]4.5728,[574]4.5711,[575]4.5738,[576]4.5735,[577]4.5762,[578]4.5760,[579]4.5796,
583
+ save_imatrix: stored collected data after 580 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
584
+ [580]4.5788,[581]4.5769,[582]4.5764,[583]4.5737,[584]4.5691,[585]4.5699,[586]4.5660,[587]4.5588,[588]4.5570,[589]4.5552,
585
+ save_imatrix: stored collected data after 590 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
586
+ [590]4.5495,[591]4.5451,[592]4.5398,[593]4.5345,[594]4.5309,[595]4.5299,[596]4.5265,[597]4.5261,[598]4.5232,[599]4.5186,
587
+ save_imatrix: stored collected data after 600 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
588
+ [600]4.5134,[601]4.5140,[602]4.5149,[603]4.5143,[604]4.5097,[605]4.5077,[606]4.5034,[607]4.5078,[608]4.5055,[609]4.5028,
589
+ save_imatrix: stored collected data after 610 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
590
+ [610]4.5024,[611]4.5070,[612]4.5081,[613]4.4981,[614]4.4911,[615]4.4818,[616]4.4730,[617]4.4655,[618]4.4565,[619]4.4458,
591
+ save_imatrix: stored collected data after 620 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
592
+ [620]4.4353,[621]4.4247,[622]4.4166,[623]4.4103,[624]4.4047,[625]4.4032,[626]4.3953,[627]4.3885,[628]4.3801,[629]4.3742,
593
+ save_imatrix: stored collected data after 630 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
594
+ [630]4.3734,[631]4.3750,[632]4.3698,[633]4.3645,[634]4.3603,[635]4.3512,[636]4.3435,[637]4.3355,[638]4.3274,[639]4.3192,
595
+ save_imatrix: stored collected data after 640 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
596
+ [640]4.3116,[641]4.3051,[642]4.2995,[643]4.2915,[644]4.2844,[645]4.2776,[646]4.2779,[647]4.2723,[648]4.2643,[649]4.2584,
597
+ save_imatrix: stored collected data after 650 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
598
+ [650]4.2522,[651]4.2453,[652]4.2373,[653]4.2301,[654]4.2238,[655]4.2182,[656]4.2116,[657]4.2119,[658]4.2109,[659]4.2123,
599
+ save_imatrix: stored collected data after 660 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
600
+ [660]4.2098,[661]4.2022,[662]4.1968,[663]4.1905,[664]4.1822,[665]4.1751,[666]4.1678,[667]4.1612,[668]4.1540,[669]4.1467,
601
+ save_imatrix: stored collected data after 670 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
602
+ [670]4.1401,[671]4.1332,[672]4.1270,[673]4.1204,[674]4.1135,[675]4.1059,[676]4.0989,[677]4.0931,[678]4.0862,[679]4.0799,
603
+ save_imatrix: stored collected data after 680 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
604
+ [680]4.0738,[681]4.0672,[682]4.0606,[683]4.0531,[684]4.0467,[685]4.0404,[686]4.0371,[687]4.0292,[688]4.0222,[689]4.0156,
605
+ save_imatrix: stored collected data after 690 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
606
+ [690]4.0082,[691]4.0020,[692]3.9976,[693]3.9952,[694]3.9912,[695]3.9880,[696]3.9845,[697]3.9813,[698]3.9780,[699]3.9749,
607
+ save_imatrix: stored collected data after 700 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
608
+ [700]3.9718,[701]3.9691,[702]3.9669,[703]3.9641,[704]3.9606,[705]3.9584,[706]3.9549,[707]3.9518,[708]3.9490,[709]3.9462,
609
+ save_imatrix: stored collected data after 710 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
610
+ [710]3.9466,[711]3.9471,[712]3.9477,[713]3.9479,[714]3.9485,[715]3.9479,[716]3.9493,[717]3.9502,[718]3.9503,[719]3.9497,
611
+ save_imatrix: stored collected data after 720 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
612
+ [720]3.9500,[721]3.9501,[722]3.9499,[723]3.9515,[724]3.9533,[725]3.9540,[726]3.9537,[727]3.9534,[728]3.9537,[729]3.9552,
613
+ save_imatrix: stored collected data after 730 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
614
+ [730]3.9562,[731]3.9560,[732]3.9550,[733]3.9541,[734]3.9559,[735]3.9573,[736]3.9575,[737]3.9584,[738]3.9593,[739]3.9593,
615
+ save_imatrix: stored collected data after 740 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
616
+ [740]3.9591,[741]3.9590,[742]3.9602,[743]3.9603,[744]3.9601,[745]3.9609,[746]3.9612,[747]3.9615,[748]3.9606,[749]3.9609,
617
+ save_imatrix: stored collected data after 750 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
618
+ [750]3.9599,[751]3.9609,[752]3.9604,[753]3.9600,[754]3.9608,[755]3.9604,[756]3.9608,[757]3.9618,[758]3.9614,[759]3.9625,
619
+ save_imatrix: stored collected data after 760 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
620
+ [760]3.9629,[761]3.9641,[762]3.9633,[763]3.9637,[764]3.9646,[765]3.9640,[766]3.9639,[767]3.9642,[768]3.9633,[769]3.9631,
621
+ save_imatrix: stored collected data after 770 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
622
+ [770]3.9635,[771]3.9626,[772]3.9624,[773]3.9617,[774]3.9616,[775]3.9632,[776]3.9632,[777]3.9638,[778]3.9640,[779]3.9624,
623
+ save_imatrix: stored collected data after 780 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
624
+ [780]3.9619,[781]3.9623,[782]3.9627,[783]3.9611,[784]3.9616,[785]3.9611,[786]3.9622,[787]3.9626,[788]3.9620,[789]3.9625,
625
+ save_imatrix: stored collected data after 790 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
626
+ [790]3.9628,[791]3.9645,[792]3.9663,[793]3.9661,[794]3.9649,[795]3.9648,[796]3.9660,[797]3.9665,[798]3.9659,[799]3.9668,
627
+ save_imatrix: stored collected data after 800 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
628
+ [800]3.9682,[801]3.9689,[802]3.9695,[803]3.9710,[804]3.9716,[805]3.9721,[806]3.9725,[807]3.9742,[808]3.9749,[809]3.9743,
629
+ save_imatrix: stored collected data after 810 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
630
+ [810]3.9744,[811]3.9747,[812]3.9755,
631
+ save_imatrix: stored collected data after 812 chunks in /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/imatrix-Step-3.5-Flash-BF16.dat
632
+
633
+ Final estimate: PPL = 3.9755 +/- 0.01997
634
+
635
+ ======================== sorted layer importances
636
+ 0: Layer 0, <cos_sim> = 0.191944
637
+ 1: Layer 44, <cos_sim> = 0.794719
638
+ 2: Layer 11, <cos_sim> = 0.880959
639
+ 3: Layer 15, <cos_sim> = 0.889736
640
+ 4: Layer 12, <cos_sim> = 0.892113
641
+ 5: Layer 19, <cos_sim> = 0.896638
642
+ 6: Layer 16, <cos_sim> = 0.902101
643
+ 7: Layer 14, <cos_sim> = 0.904488
644
+ 8: Layer 13, <cos_sim> = 0.904849
645
+ 9: Layer 18, <cos_sim> = 0.90949
646
+ 10: Layer 20, <cos_sim> = 0.912555
647
+ 11: Layer 17, <cos_sim> = 0.913825
648
+ 12: Layer 21, <cos_sim> = 0.916861
649
+ 13: Layer 43, <cos_sim> = 0.920321
650
+ 14: Layer 22, <cos_sim> = 0.920897
651
+ 15: Layer 7, <cos_sim> = 0.925641
652
+ 16: Layer 10, <cos_sim> = 0.928077
653
+ 17: Layer 9, <cos_sim> = 0.930262
654
+ 18: Layer 23, <cos_sim> = 0.930822
655
+ 19: Layer 8, <cos_sim> = 0.932862
656
+ 20: Layer 24, <cos_sim> = 0.936006
657
+ 21: Layer 3, <cos_sim> = 0.940002
658
+ 22: Layer 41, <cos_sim> = 0.945994
659
+ 23: Layer 25, <cos_sim> = 0.946426
660
+ 24: Layer 27, <cos_sim> = 0.946791
661
+ 25: Layer 42, <cos_sim> = 0.94737
662
+ 26: Layer 26, <cos_sim> = 0.948684
663
+ 27: Layer 36, <cos_sim> = 0.949698
664
+ 28: Layer 39, <cos_sim> = 0.949899
665
+ 29: Layer 37, <cos_sim> = 0.9515
666
+ 30: Layer 28, <cos_sim> = 0.951921
667
+ 31: Layer 38, <cos_sim> = 0.953373
668
+ 32: Layer 35, <cos_sim> = 0.955007
669
+ 33: Layer 29, <cos_sim> = 0.955639
670
+ 34: Layer 34, <cos_sim> = 0.955797
671
+ 35: Layer 31, <cos_sim> = 0.956181
672
+ 36: Layer 6, <cos_sim> = 0.956762
673
+ 37: Layer 33, <cos_sim> = 0.958702
674
+ 38: Layer 5, <cos_sim> = 0.959416
675
+ 39: Layer 40, <cos_sim> = 0.96006
676
+ 40: Layer 30, <cos_sim> = 0.960335
677
+ 41: Layer 32, <cos_sim> = 0.961425
678
+ 42: Layer 4, <cos_sim> = 0.963155
679
+ 43: Layer 1, <cos_sim> = 0.977383
680
+ 44: Layer 2, <cos_sim> = 0.981096
681
+
682
+ ======================== sorted attention importances
683
+ 0: Layer 3, <cos_sim> = 0.268473
684
+ 1: Layer 5, <cos_sim> = 0.445389
685
+ 2: Layer 1, <cos_sim> = 0.491229
686
+ 3: Layer 4, <cos_sim> = 0.507703
687
+ 4: Layer 2, <cos_sim> = 0.523524
688
+ 5: Layer 7, <cos_sim> = 0.546491
689
+ 6: Layer 6, <cos_sim> = 0.551201
690
+ 7: Layer 0, <cos_sim> = 0.657228
691
+ 8: Layer 9, <cos_sim> = 0.693649
692
+ 9: Layer 8, <cos_sim> = 0.693792
693
+ 10: Layer 10, <cos_sim> = 0.715702
694
+ 11: Layer 11, <cos_sim> = 0.738956
695
+ 12: Layer 13, <cos_sim> = 0.812073
696
+ 13: Layer 14, <cos_sim> = 0.819818
697
+ 14: Layer 12, <cos_sim> = 0.85671
698
+ 15: Layer 15, <cos_sim> = 0.860875
699
+ 16: Layer 17, <cos_sim> = 0.888072
700
+ 17: Layer 18, <cos_sim> = 0.89278
701
+ 18: Layer 16, <cos_sim> = 0.914259
702
+ 19: Layer 19, <cos_sim> = 0.931089
703
+ 20: Layer 21, <cos_sim> = 0.949091
704
+ 21: Layer 22, <cos_sim> = 0.955978
705
+ 22: Layer 20, <cos_sim> = 0.958918
706
+ 23: Layer 23, <cos_sim> = 0.963765
707
+ 24: Layer 24, <cos_sim> = 0.963995
708
+ 25: Layer 28, <cos_sim> = 0.965883
709
+ 26: Layer 43, <cos_sim> = 0.967174
710
+ 27: Layer 42, <cos_sim> = 0.969761
711
+ 28: Layer 26, <cos_sim> = 0.970181
712
+ 29: Layer 25, <cos_sim> = 0.971553
713
+ 30: Layer 39, <cos_sim> = 0.972275
714
+ 31: Layer 41, <cos_sim> = 0.975387
715
+ 32: Layer 29, <cos_sim> = 0.975487
716
+ 33: Layer 36, <cos_sim> = 0.977112
717
+ 34: Layer 32, <cos_sim> = 0.978462
718
+ 35: Layer 38, <cos_sim> = 0.979173
719
+ 36: Layer 27, <cos_sim> = 0.979313
720
+ 37: Layer 35, <cos_sim> = 0.980944
721
+ 38: Layer 34, <cos_sim> = 0.98212
722
+ 39: Layer 30, <cos_sim> = 0.982521
723
+ 40: Layer 33, <cos_sim> = 0.982989
724
+ 41: Layer 37, <cos_sim> = 0.983563
725
+ 42: Layer 40, <cos_sim> = 0.985181
726
+ 43: Layer 31, <cos_sim> = 0.985454
727
+ 44: Layer 44, <cos_sim> = 0.987712
728
+
729
+ ======================== sorted ffn importances
730
+ 0: Layer 0, <cos_sim> = 0.431108
731
+ 1: Layer 2, <cos_sim> = 0.44518
732
+ 2: Layer 3, <cos_sim> = 0.450093
733
+ 3: Layer 4, <cos_sim> = 0.471592
734
+ 4: Layer 5, <cos_sim> = 0.482406
735
+ 5: Layer 6, <cos_sim> = 0.559887
736
+ 6: Layer 1, <cos_sim> = 0.602544
737
+ 7: Layer 8, <cos_sim> = 0.643123
738
+ 8: Layer 7, <cos_sim> = 0.684008
739
+ 9: Layer 9, <cos_sim> = 0.708513
740
+ 10: Layer 10, <cos_sim> = 0.718472
741
+ 11: Layer 13, <cos_sim> = 0.770861
742
+ 12: Layer 12, <cos_sim> = 0.786273
743
+ 13: Layer 44, <cos_sim> = 0.811898
744
+ 14: Layer 14, <cos_sim> = 0.832882
745
+ 15: Layer 11, <cos_sim> = 0.841347
746
+ 16: Layer 16, <cos_sim> = 0.847809
747
+ 17: Layer 17, <cos_sim> = 0.867317
748
+ 18: Layer 18, <cos_sim> = 0.875668
749
+ 19: Layer 15, <cos_sim> = 0.886359
750
+ 20: Layer 19, <cos_sim> = 0.932629
751
+ 21: Layer 21, <cos_sim> = 0.935681
752
+ 22: Layer 20, <cos_sim> = 0.936905
753
+ 23: Layer 22, <cos_sim> = 0.94295
754
+ 24: Layer 23, <cos_sim> = 0.944582
755
+ 25: Layer 27, <cos_sim> = 0.947721
756
+ 26: Layer 24, <cos_sim> = 0.95027
757
+ 27: Layer 25, <cos_sim> = 0.952
758
+ 28: Layer 43, <cos_sim> = 0.953131
759
+ 29: Layer 35, <cos_sim> = 0.954686
760
+ 30: Layer 31, <cos_sim> = 0.954798
761
+ 31: Layer 38, <cos_sim> = 0.958932
762
+ 32: Layer 26, <cos_sim> = 0.960332
763
+ 33: Layer 37, <cos_sim> = 0.960368
764
+ 34: Layer 28, <cos_sim> = 0.96127
765
+ 35: Layer 29, <cos_sim> = 0.961706
766
+ 36: Layer 34, <cos_sim> = 0.962314
767
+ 37: Layer 36, <cos_sim> = 0.964392
768
+ 38: Layer 32, <cos_sim> = 0.965215
769
+ 39: Layer 33, <cos_sim> = 0.9656
770
+ 40: Layer 39, <cos_sim> = 0.965828
771
+ 41: Layer 41, <cos_sim> = 0.966507
772
+ 42: Layer 30, <cos_sim> = 0.966721
773
+ 43: Layer 42, <cos_sim> = 0.967369
774
+ 44: Layer 40, <cos_sim> = 0.970084
775
+
776
+ llama_print_timings: load time = 89422.20 ms
777
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
778
+ llama_print_timings: prompt eval time = 3082057.10 ms / 415744 tokens ( 7.41 ms per token, 134.89 tokens per second)
779
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
780
+ llama_print_timings: total time = 3182699.59 ms / 415745 tokens
logs/perplexity-Step-3.5-Flash-BF16.log ADDED
@@ -0,0 +1,202 @@
1
+ #!/usr/bin/env bash
2
+
3
+ # echo 0 | sudo tee /proc/sys/kernel/numa_balancing
4
+ # sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
5
+
6
+ model=/mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf
7
+ #model=/mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-Q8_0.gguf
8
+ #model=/mnt/data/models/stepfun-ai/Step-3.5-Flash-Int4/step3p5_flash_Q4_K_S-00001-of-00012.gguf
9
+ #model=/mnt/raid/hf/Step-3.5-Flash-GGUF/IQ4_XS/Step-3.5-Flash-IQ4_XS-00001-of-00004.gguf
10
+ #model=/mnt/raid/hf/Step-3.5-Flash-GGUF/IQ5_K/Step-3.5-Flash-IQ5_K-00001-of-00004.gguf
11
+ #model=/mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ3_KS.gguf
12
+ #model=/mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-smol-IQ3_KS.gguf
13
+ #model=/mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ2_KL.gguf
14
+
15
+ # Check if the SOCKET variable is unset or empty.
16
+ if [[ -z "${SOCKET}" ]]; then
17
+ # If it is, print an error to standard error and exit with a non-zero status.
18
+ echo "Error: The SOCKET environment variable is not set." >&2
19
+ exit 1
20
+ else
21
+ # If it is set, print its value and exit successfully.
22
+ echo "SOCKET is set to: ${SOCKET}"
23
+ fi
24
+ SOCKET="${SOCKET}"
25
+
26
+ numactl -N "$SOCKET" -m "$SOCKET" \
27
+ ./build/bin/llama-perplexity \
28
+ -m "$model" \
29
+ -f wiki.test.raw \
30
+ --seed 1337 \
31
+ --ctx-size 512 \
32
+ -ub 4096 -b 4096 \
33
+ --numa numactl \
34
+ --threads 96 \
35
+ --threads-batch 128 \
36
+ --validate-quants \
37
+ --no-mmap
38
+
39
+ SOCKET is set to: 1
40
+ main: build = 4186 (82c4f273)
41
+ main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
42
+ main: seed = 1337
43
+ CPU: using device CPU - 0 MiB free
44
+ llama_model_loader: additional 8 GGUFs metadata loaded.
45
+ llama_model_loader: loaded meta data with 50 key-value pairs and 754 tensors from /mnt/data/models/ubergarm/Step-3.5-Flash-GGUF/Step-3.5-Flash-288x7.4B-BF16-00001-of-00009.gguf (version GGUF V3 (latest))
46
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
47
+ llama_model_loader: - kv 0: general.architecture str = step35
48
+ llama_model_loader: - kv 1: general.type str = model
49
+ llama_model_loader: - kv 2: general.name str = Step 3.5 Flash
50
+ llama_model_loader: - kv 3: general.size_label str = 288x7.4B
51
+ llama_model_loader: - kv 4: general.license str = apache-2.0
52
+ llama_model_loader: - kv 5: general.base_model.count u32 = 1
53
+ llama_model_loader: - kv 6: general.base_model.0.name str = Step 3.5 Flash
54
+ llama_model_loader: - kv 7: general.base_model.0.organization str = Stepfun Ai
55
+ llama_model_loader: - kv 8: general.base_model.0.repo_url str = https://huggingface.co/stepfun-ai/ste...
56
+ llama_model_loader: - kv 9: step35.block_count u32 = 45
57
+ llama_model_loader: - kv 10: step35.context_length u32 = 262144
58
+ llama_model_loader: - kv 11: step35.embedding_length u32 = 4096
59
+ llama_model_loader: - kv 12: step35.feed_forward_length u32 = 11264
60
+ llama_model_loader: - kv 13: step35.attention.head_count arr[i32,45] = [64, 96, 96, 96, 64, 96, 96, 96, 64, ...
61
+ llama_model_loader: - kv 14: step35.rope.freq_base f32 = 5000000.000000
62
+ llama_model_loader: - kv 15: step35.rope.freq_base_swa f32 = 10000.000000
63
+ llama_model_loader: - kv 16: step35.expert_gating_func u32 = 2
64
+ llama_model_loader: - kv 17: step35.attention.key_length u32 = 128
65
+ llama_model_loader: - kv 18: step35.attention.value_length u32 = 128
66
+ llama_model_loader: - kv 19: general.file_type u32 = 32
+ llama_model_loader: - kv 20: step35.attention.head_count_kv arr[i32,45] = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...
+ llama_model_loader: - kv 21: step35.attention.sliding_window u32 = 512
+ llama_model_loader: - kv 22: step35.attention.sliding_window_pattern arr[i32,45] = [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ...
+ llama_model_loader: - kv 23: step35.expert_count u32 = 288
+ llama_model_loader: - kv 24: step35.expert_used_count u32 = 8
+ llama_model_loader: - kv 25: step35.expert_feed_forward_length u32 = 1280
+ llama_model_loader: - kv 26: step35.expert_shared_feed_forward_length u32 = 1280
+ llama_model_loader: - kv 27: step35.expert_weights_scale f32 = 3.000000
+ llama_model_loader: - kv 28: step35.expert_weights_norm bool = true
+ llama_model_loader: - kv 29: step35.leading_dense_block_count u32 = 3
+ llama_model_loader: - kv 30: step35.moe_every_n_layers u32 = 1
+ llama_model_loader: - kv 31: step35.attention.layer_norm_rms_epsilon f32 = 0.000010
+ llama_model_loader: - kv 32: step35.swiglu_clamp_exp arr[f32,45] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 33: step35.swiglu_clamp_shexp arr[f32,45] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 34: general.quantization_version u32 = 2
+ llama_model_loader: - kv 35: tokenizer.ggml.model str = gpt2
+ llama_model_loader: - kv 36: tokenizer.ggml.pre str = deepseek-v3
+ llama_model_loader: - kv 37: tokenizer.ggml.tokens arr[str,128896] = ["<|begin▁of▁sentence|>", "<�...
+ llama_model_loader: - kv 38: tokenizer.ggml.token_type arr[i32,128896] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
+ llama_model_loader: - kv 39: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
+ llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 0
+ llama_model_loader: - kv 41: tokenizer.ggml.eos_token_id u32 = 128007
+ llama_model_loader: - kv 42: tokenizer.ggml.padding_token_id u32 = 1
+ llama_model_loader: - kv 43: tokenizer.ggml.add_bos_token bool = true
+ llama_model_loader: - kv 44: tokenizer.ggml.add_sep_token bool = false
+ llama_model_loader: - kv 45: tokenizer.ggml.add_eos_token bool = false
+ llama_model_loader: - kv 46: tokenizer.chat_template str = {% macro render_content(content) %}{%...
+ llama_model_loader: - kv 47: split.no u16 = 0
+ llama_model_loader: - kv 48: split.count u16 = 9
+ llama_model_loader: - kv 49: split.tensors.count i32 = 754
+ llama_model_loader: - type f32: 266 tensors
+ llama_model_loader: - type bf16: 488 tensors
+ load: printing all EOG tokens:
+ load: - 128007 ('<|im_end|>')
+ load: special tokens cache size = 818
+ load: token to piece cache size = 0.8220 MB
103
+ llm_load_print_meta: format = GGUF V3 (latest)
+ llm_load_print_meta: arch = step35
+ llm_load_print_meta: n_ctx_train = 262144
+ llm_load_print_meta: n_embd = 4096
+ llm_load_print_meta: n_layer = 45
+ llm_load_print_meta: n_head = [64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64, 96, 96, 96, 64]
+ llm_load_print_meta: n_head_kv = 8
+ llm_load_print_meta: n_rot = 128
+ llm_load_print_meta: n_swa = 512
+ llm_load_print_meta: n_swa_pattern = 1
+ llm_load_print_meta: n_embd_head_k = 128
+ llm_load_print_meta: n_embd_head_v = 128
+ llm_load_print_meta: n_gqa = [8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8, 12, 12, 12, 8]
+ llm_load_print_meta: n_embd_k_gqa = 1024
+ llm_load_print_meta: n_embd_v_gqa = 1024
+ llm_load_print_meta: f_norm_eps = 0.0e+00
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-05
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ llm_load_print_meta: f_logit_scale = 0.0e+00
+ llm_load_print_meta: n_ff = 11264
+ llm_load_print_meta: n_expert = 288
+ llm_load_print_meta: n_expert_used = 8
+ llm_load_print_meta: causal attn = 1
+ llm_load_print_meta: pooling type = 0
+ llm_load_print_meta: rope type = 2
+ llm_load_print_meta: rope scaling = linear
+ llm_load_print_meta: freq_base_train = 5000000.0
+ llm_load_print_meta: freq_scale_train = 1
+ llm_load_print_meta: n_ctx_orig_yarn = 262144
+ llm_load_print_meta: rope_finetuned = unknown
+ llm_load_print_meta: ssm_d_conv = 0
+ llm_load_print_meta: ssm_d_inner = 0
+ llm_load_print_meta: ssm_d_state = 0
+ llm_load_print_meta: ssm_dt_rank = 0
+ llm_load_print_meta: model type = ?B
+ llm_load_print_meta: model ftype = BF16
+ llm_load_print_meta: model params = 196.956 B
+ llm_load_print_meta: model size = 366.952 GiB (16.004 BPW)
+ llm_load_print_meta: repeating layers = 364.986 GiB (16.004 BPW, 195.900 B parameters)
+ llm_load_print_meta: general.name = Step 3.5 Flash
144
+ print_info: vocab type = BPE
+ print_info: n_vocab = 128896
+ print_info: n_merges = 127741
+ print_info: BOS token = 0 '<|begin▁of▁sentence|>'
+ print_info: EOS token = 128007 '<|im_end|>'
+ print_info: EOT token = 128007 '<|im_end|>'
+ print_info: PAD token = 1 '<|end▁of▁sentence|>'
+ print_info: LF token = 201 'Ċ'
+ print_info: FIM PRE token = 128801 '<|fim▁begin|>'
+ print_info: FIM SUF token = 128800 '<|fim▁hole|>'
+ print_info: FIM MID token = 128802 '<|fim▁end|>'
+ print_info: EOG token = 128007 '<|im_end|>'
+ print_info: max token length = 256
+ llm_load_tensors: ggml ctx size = 0.31 MiB
+ llm_load_tensors: offloading 0 repeating layers to GPU
+ llm_load_tensors: offloaded 0/46 layers to GPU
+ llm_load_tensors: CPU buffer size = 375759.27 MiB
161
+ ....................................................................................................
+ llama_new_context_with_model: n_ctx = 4096
+ llama_new_context_with_model: n_batch = 4096
+ llama_new_context_with_model: n_ubatch = 4096
+ llama_new_context_with_model: flash_attn = 1
+ llama_new_context_with_model: attn_max_b = 0
+ llama_new_context_with_model: fused_moe = 1
+ llama_new_context_with_model: grouped er = 0
+ llama_new_context_with_model: fused_up_gate = 1
+ llama_new_context_with_model: fused_mmad = 1
+ llama_new_context_with_model: rope_cache = 0
+ llama_new_context_with_model: graph_reuse = 1
+ llama_new_context_with_model: k_cache_hadam = 0
+ llama_new_context_with_model: split_mode_graph_scheduling = 0
+ llama_new_context_with_model: reduce_type = f16
+ llama_new_context_with_model: sched_async = 0
+ llama_new_context_with_model: ser = -1, 0
+ llama_new_context_with_model: freq_base = 5000000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CPU KV buffer size = 720.00 MiB
+ llama_new_context_with_model: KV self size = 720.00 MiB, K (f16): 360.00 MiB, V (f16): 360.00 MiB
+ llama_new_context_with_model: CPU output buffer size = 3.93 MiB
+ llama_new_context_with_model: CPU compute buffer size = 2078.00 MiB
+ llama_new_context_with_model: graph nodes = 2201
+ llama_new_context_with_model: graph splits = 1
+ XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload
+
+ system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
+ perplexity: tokenizing the input ..
+ perplexity: tokenization took 723.567 ms
+ perplexity: calculating perplexity over 561 chunks, n_ctx=512, batch_size=4096, n_seq=8
+ perplexity: 15.47 seconds per pass - ETA 18.07 minutes
+ ===================================== llama_new_context_with_model: f16
+ ======================================= HAVE_FANCY_SIMD is defined
195
+ [1]1.5125,[2]1.9280,[3]1.6178,[4]1.4760,[5]1.4000,[6]1.3378,[7]1.3006,[8]1.2759,[9]1.2557,[10]1.2356,[11]1.2434,[12]1.2544,[13]1.2647,[14]1.3110,[15]1.3541,[16]1.3996,[17]1.5016,[18]1.5846,[19]1.5731,[20]1.5561,[21]1.5612,[22]1.5507,[23]1.5331,[24]1.5278,[25]1.5172,[26]1.5098,[27]1.5009,[28]1.4958,[29]1.4917,[30]1.4977,[31]1.4967,[32]1.4862,[33]1.4809,[34]1.4913,[35]1.4946,[36]1.5053,[37]1.5355,[38]1.5694,[39]1.6011,[40]1.6472,[41]1.6760,[42]1.6825,[43]1.7172,[44]1.7369,[45]1.7776,[46]1.8155,[47]1.8165,[48]1.8110,[49]1.8061,[50]1.7953,[51]1.8169,[52]1.8152,[53]1.8328,[54]1.8418,[55]1.8548,[56]1.8631,[57]1.8643,[58]1.8699,[59]1.8768,[60]1.8919,[61]1.8872,[62]1.9143,[63]1.9293,[64]1.9433,[65]1.9454,[66]1.9417,[67]1.9391,[68]1.9453,[69]1.9449,[70]1.9460,[71]1.9421,[72]1.9417,[73]1.9501,[74]1.9627,[75]1.9628,[76]1.9498,[77]1.9417,[78]1.9370,[79]1.9331,[80]1.9279,[81]1.9234,[82]1.9262,[83]1.9219,[84]1.9180,[85]1.9127,[86]1.9152,[87]1.9229,[88]1.9163,[89]1.9171,[90]1.9167,[91]1.9127,[92]1.9089,[93]1.9056,[94]1.9002,[95]1.9012,[96]1.9053,[97]1.9162,[98]1.9152,[99]1.9091,[100]1.9069,[101]1.9066,[102]1.9153,[103]1.9198,[104]1.9363,[105]1.9437,[106]1.9683,[107]1.9908,[108]2.0093,[109]2.0375,[110]2.0637,[111]2.0878,[112]2.0815,[113]2.0835,[114]2.0887,[115]2.0900,[116]2.0982,[117]2.0991,[118]2.1001,[119]2.0971,[120]2.0958,[121]2.0988,[122]2.0957,[123]2.0949,[124]2.0910,[125]2.0874,[126]2.0863,[127]2.0868,[128]2.0853,[129]2.0883,[130]2.0891,[131]2.0895,[132]2.0910,[133]2.1011,[134]2.1063,[135]2.1041,[136]2.1007,[137]2.0981,[138]2.0948,[139]2.0931,[140]2.0920,[141]2.0920,[142]2.0917,[143]2.0939,[144]2.0946,[145]2.0887,[146]2.0841,[147]2.0816,[148]2.0776,[149]2.0759,[150]2.0711,[151]2.0657,[152]2.0632,[153]2.0603,[154]2.0590,[155]2.0579,[156]2.0557,[157]2.0555,[158]2.0547,[159]2.0544,[160]2.0526,[161]2.0621,[162]2.0724,[163]2.0757,[164]2.0811,[165]2.0870,[166]2.0970,[167]2.0994,[168]2.1125,[169]2.1201,[170]2.1320,[171]2.1389,[172]2.1361,[173]2.1299,[174]2.1337,[175]2.1363,[176]2.1380,[177]2.1385,[178]2.1385,[179]2.1403,[180]2.1426,[181]2.1545,[182]2.1659,[183]2.1785,[184]2.1920,[185]2.2015,[186]2.2151,[187]2.2300,[188]2.2433,[189]2.2494,[190]2.2499,[191]2.2526,[192]2.2557,[193]2.2550,[194]2.2580,[195]2.2577,[196]2.2627,[197]2.2682,[198]2.2709,[199]2.2707,[200]2.2704,[201]2.2808,[202]2.2752,[203]2.2756,[204]2.2758,[205]2.2770,[206]2.2777,[207]2.2782,[208]2.2809,[209]2.2834,[210]2.2825,[211]2.2798,[212]2.2796,[213]2.2796,[214]2.2784,[215]2.2749,[216]2.2745,[217]2.2700,[218]2.2682,[219]2.2687,[220]2.2680,[221]2.2684,[222]2.2646,[223]2.2630,[224]2.2663,[225]2.2666,[226]2.2632,[227]2.2648,[228]2.2669,[229]2.2686,[230]2.2755,[231]2.2821,[232]2.2807,[233]2.2788,[234]2.2786,[235]2.2789,[236]2.2814,[237]2.2857,[238]2.2896,[239]2.2969,[240]2.3025,[241]2.3097,[242]2.3165,[243]2.3226,[244]2.3274,[245]2.3365,[246]2.3413,[247]2.3413,[248]2.3395,[249]2.3395,[250]2.3363,[251]2.3351,[252]2.3388,[253]2.3440,[254]2.3505,[255]2.3528,[256]2.3542,[257]2.3561,[258]2.3563,[259]2.3555,[260]2.3564,[261]2.3564,[262]2.3564,[263]2.3570,[264]2.3558,[265]2.3556,[266]2.3568,[267]2.3584,[268]2.3604,[269]2.3629,[270]2.3620,[271]2.3645,[272]2.3624,[273]2.3610,[274]2.3581,[275]2.3584,[276]2.3541,[277]2.3568,[278]2.3644,[279]2.3720,[280]2.3785,[281]2.3816,[282]2.3827,[283]2.3869,[284]2.3908,[285]2.3995,[286]2.3997,[287]2.4026,[288]2.4078,[289]2.4094,[290]2.4075,[291]2.4083,[292]2.4165,[293]2.4195,[294]2.4216,[295]2.4238,[296]2.4268,[297]2.4273,[298]2.4297,[299]2.4306,[300]2.4315,[301]2.4335,[302]2.4351,[303]2.4356,[304]2.4357,[305]2.4437,[306]2.4474,[307]2.4559,[308]2.4506,[309]2.4480,[310]2.4432,[311]2.4426,[312]2.4399,[313]2.4376,[314]2.4357,[315]2.4354,[316]2.4353,[317]2.4330,[318]2.4308,[319]2.4298,[320]2.4300,[321]2.4270,[322]2.4274,[323]2.4282,[324]2.4255,[325]2.4236,[326]2.4203,[327]2.4176,[328]2.4185,[329]2.4184,[330]2.4218,[331]2.4228,[332]2.4261,[333]2.4255,[334]2.4253,[335]2.4257,[336]2.4261,[337]2.4274,[338]2.4280,[339]2.4294,[340]2.4319,[341]2.4356,[342]2.4404,[343]2.4458,[344]2.4486,[345]2.4474,[346]2.4446,[347]2.4455,[348]2.4445,[349]2.4417,[350]2.4408,[351]2.4423,[352]2.4415,[353]2.4421,[354]2.4420,[355]2.4420,[356]2.4403,[357]2.4410,[358]2.4415,[359]2.4387,[360]2.4372,[361]2.4374,[362]2.4370,[363]2.4360,[364]2.4361,[365]2.4331,[366]2.4331,[367]2.4333,[368]2.4315,[369]2.4314,[370]2.4304,[371]2.4320,[372]2.4343,[373]2.4323,[374]2.4299,[375]2.4292,[376]2.4321,[377]2.4358,[378]2.4335,[379]2.4320,[380]2.4310,[381]2.4324,[382]2.4333,[383]2.4354,[384]2.4386,[385]2.4416,[386]2.4447,[387]2.4495,[388]2.4516,[389]2.4481,[390]2.4448,[391]2.4411,[392]2.4397,[393]2.4389,[394]2.4375,[395]2.4345,[396]2.4323,[397]2.4286,[398]2.4258,[399]2.4222,[400]2.4188,[401]2.4144,[402]2.4113,[403]2.4074,[404]2.4042,[405]2.4002,[406]2.3964,[407]2.3934,[408]2.3907,[409]2.3869,[410]2.3861,[411]2.3874,[412]2.3864,[413]2.3888,[414]2.3894,[415]2.3861,[416]2.3825,[417]2.3850,[418]2.3814,[419]2.3801,[420]2.3776,[421]2.3747,[422]2.3706,[423]2.3670,[424]2.3663,[425]2.3636,[426]2.3602,[427]2.3576,[428]2.3562,[429]2.3537,[430]2.3505,[431]2.3470,[432]2.3453,[433]2.3430,[434]2.3409,[435]2.3391,[436]2.3380,[437]2.3377,[438]2.3381,[439]2.3395,[440]2.3425,[441]2.3478,[442]2.3534,[443]2.3516,[444]2.3511,[445]2.3515,[446]2.3537,[447]2.3564,[448]2.3580,[449]2.3595,[450]2.3612,[451]2.3634,[452]2.3642,[453]2.3656,[454]2.3641,[455]2.3664,[456]2.3674,[457]2.3700,[458]2.3738,[459]2.3739,[460]2.3745,[461]2.3727,[462]2.3734,[463]2.3768,[464]2.3811,[465]2.3792,[466]2.3804,[467]2.3820,[468]2.3835,[469]2.3839,[470]2.3849,[471]2.3872,[472]2.3892,[473]2.3895,[474]2.3912,[475]2.3928,[476]2.3930,[477]2.3936,[478]2.3945,[479]2.3961,[480]2.3975,[481]2.3948,[482]2.3958,[483]2.3948,[484]2.3977,[485]2.4024,[486]2.4038,[487]2.4061,[488]2.4079,[489]2.4099,[490]2.4128,[491]2.4155,[492]2.4188,[493]2.4186,[494]2.4172,[495]2.4168,[496]2.4166,[497]2.4169,[498]2.4168,[499]2.4157,[500]2.4170,[501]2.4207,[502]2.4199,[503]2.4202,[504]2.4209,[505]2.4228,[506]2.4245,[507]2.4259,[508]2.4280,[509]2.4251,[510]2.4246,[511]2.4238,[512]2.4222,[513]2.4199,[514]2.4195,[515]2.4192,[516]2.4170,[517]2.4164,[518]2.4161,[519]2.4153,[520]2.4149,[521]2.4149,[522]2.4137,[523]2.4146,[524]2.4141,[525]2.4148,[526]2.4135,[527]2.4115,[528]2.4113,[529]2.4105,[530]2.4100,[531]2.4090,[532]2.4065,[533]2.4042,[534]2.4025,[535]2.4023,[536]2.4037,[537]2.4056,[538]2.4072,[539]2.4089,[540]2.4121,[541]2.4149,[542]2.4175,[543]2.4190,[544]2.4184,[545]2.4186,[546]2.4160,[547]2.4136,[548]2.4108,[549]2.4085,[550]2.4071,[551]2.4058,[552]2.4042,[553]2.4031,[554]2.4033,[555]2.4028,[556]2.4056,[557]2.4078,[558]2.4111,[559]2.4132,[560]2.4173,[561]2.4169,
+ llama_print_timings: load time = 168747.99 ms
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_print_timings: prompt eval time = 879771.94 ms / 287232 tokens ( 3.06 ms per token, 326.48 tokens per second)
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_print_timings: total time = 890647.44 ms / 287233 tokens
+
+ Final estimate: PPL over 561 chunks for n_ctx=512 = 2.4169 +/- 0.01107