Reporting successful run of MXFP4 using ik_llama.cpp
#4 by nimishchaudhari - opened
Hello,
Thanks for these quants. I tried MXFP4 since it had the lowest perplexity, and it seems to be working much better for me than the UD_Q5 from unsloth.
Command I ran:

```shell
${ik_llama_cpp} \
  -m ${models_dir}/LLMs/GLM-4.7-Flash-MXFP4.gguf \
  -a 'GLM_4.7_Flash' \
  -ger --special \
  --merge-qkv \
  -mla 3 -amb 512 \
  -ngl 99 \
  -c 100000 \
  --temp 0.7 \
  --top-p 1.0 \
  --min-p 0.01 \
  --jinja
```
Nice, thanks for testing!
I ran a few KLD benchmarks suggesting that MXFP4, despite having the lowest perplexity, diverges more from the full bf16 model than the other two quants I've released. Sorry, life is busy right now, so I'm slow getting back to folks.
Here is a quick data dump on my limited testing: https://github.com/Thireus/GGUF-Tool-Suite/issues/52#issuecomment-3795175551
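For anyone curious why KLD and perplexity can rank quants differently: perplexity only scores the probability assigned to the actual next token, while KLD compares the quantized model's whole output distribution against the bf16 reference at each position. A minimal sketch of the per-position computation (illustrative only; `p_ref` and `q_quant` are assumed to be softmaxed logits from the reference and quantized models, not anything from the actual tooling):

```python
import numpy as np

def kld(p_ref: np.ndarray, q_quant: np.ndarray, eps: float = 1e-10) -> float:
    """D_KL(p_ref || q_quant) between two token probability distributions,
    e.g. the softmaxed logits of the bf16 reference model and a quantized
    model at the same token position."""
    p = np.clip(p_ref, eps, 1.0)
    q = np.clip(q_quant, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical distributions give zero divergence.
uniform = np.full(4, 0.25)
print(round(kld(uniform, uniform), 6))

# A shifted distribution shows nonzero divergence even when both models
# still put decent mass on the correct token, so a quant can look fine
# on perplexity while drifting measurably from bf16 on KLD.
skewed = np.array([0.4, 0.3, 0.2, 0.1])
print(kld(uniform, skewed))
```

In practice one averages this over many token positions on a test corpus, which is roughly what llama.cpp-style KLD tooling reports.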