Reporting successful run of MXFP4 using ik_llama.cpp

#4
by nimishchaudhari - opened

Hello,

Thanks for these quants. I tried MXFP4 since it had the lowest perplexity, and it seems to be working much better for me than the UD_Q5 from unsloth.
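For context, perplexity is the exponential of the mean per-token negative log-likelihood, so lower means the model is less "surprised" by the evaluation text. A minimal sketch (the token log-likelihood values are made up for illustration):

```python
import math

def perplexity(nlls):
    # Perplexity = exp of the mean negative log-likelihood over tokens.
    # nlls is a hypothetical list of per-token negative log-likelihoods.
    return math.exp(sum(nlls) / len(nlls))

# Example: a uniform surprise of ln(2) per token gives perplexity ~2.
print(perplexity([math.log(2)] * 4))  # prints approximately 2.0
```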

Command I ran:

      ${ik_llama_cpp} \
      -m ${models_dir}/LLMs/GLM-4.7-Flash-MXFP4.gguf \
      -a 'GLM_4.7_Flash' \
      -ger --special \
      --merge-qkv \
      -mla 3 -amb 512 \
      -ngl 99 \
      -c 100000 \
      --temp 0.7 \
      --top-p 1.0 \
      --min-p 0.01 \
      --jinja
Owner

@nimishchaudhari

Nice, thanks for testing!

I ran a few KLD benchmarks suggesting that MXFP4, while having the lowest perplexity, diverges more from the full bf16 model than the other two quants I've released. Sorry, life is busy right now, so I'm slow getting back to folks.
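KLD here refers to the Kullback-Leibler divergence between the quantized model's next-token distribution and the bf16 reference: unlike perplexity, it directly measures how far the quant's predictions drift from full precision. A minimal sketch (the probability values are made up for illustration):

```python
import math

def kl_divergence(p, q):
    # D(P || Q): how much distribution q (quantized model) diverges from
    # reference distribution p (bf16). Zero iff the distributions match.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

ref   = [0.7, 0.2, 0.1]    # hypothetical bf16 next-token probabilities
quant = [0.6, 0.25, 0.15]  # hypothetical quantized-model probabilities
print(kl_divergence(ref, quant))  # small positive value; 0 means identical
```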

Here is a quick data dump on my limited testing: https://github.com/Thireus/GGUF-Tool-Suite/issues/52#issuecomment-3795175551
