Reporting successful run of MXFP4 using ik_llama.cpp
#4 by nimishchaudhari - opened
Hello,
Thanks for these quants. I tried MXFP4 since it had the lowest perplexity, and it seems to be working much better for me than the UD_Q5 from unsloth.
Command I ran:

```shell
${ik_llama_cpp} \
  -m ${models_dir}/LLMs/GLM-4.7-Flash-MXFP4.gguf \
  -a 'GLM_4.7_Flash' \
  -ger --special \
  --merge-qkv \
  -mla 3 -amb 512 \
  -ngl 99 \
  -c 100000 \
  --temp 0.7 \
  --top-p 1.0 \
  --min-p 0.01 \
  --jinja
```
Nice, thanks for testing!
I ran a few KLD benchmarks suggesting that MXFP4, despite having the lowest perplexity, diverges more from the full bf16 model than the other two quants I've released. Sorry, life is busy right now, so I'm slow getting back to folks.
Here is a quick data dump on my limited testing: https://github.com/Thireus/GGUF-Tool-Suite/issues/52#issuecomment-3795175551
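For anyone curious why KLD and perplexity can rank quants differently: perplexity only scores the probability assigned to the actual next token, while KLD compares the quantized model's whole output distribution against the bf16 reference at each position. A minimal sketch of the per-position computation (illustrative only; `p_ref` and `q_quant` are assumed to be softmaxed logits from the reference and quantized models, not anything from the actual tooling):

```python
import numpy as np

def kld(p_ref: np.ndarray, q_quant: np.ndarray, eps: float = 1e-10) -> float:
    """D_KL(p_ref || q_quant) between two token probability distributions,
    e.g. the softmaxed logits of the bf16 reference model and a quantized
    model at the same token position."""
    p = np.clip(p_ref, eps, 1.0)
    q = np.clip(q_quant, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical distributions give zero divergence.
uniform = np.full(4, 0.25)
print(round(kld(uniform, uniform), 6))

# A shifted distribution shows nonzero divergence even when both models
# still put decent mass on the correct token, so a quant can look fine
# on perplexity while drifting measurably from bf16 on KLD.
skewed = np.array([0.4, 0.3, 0.2, 0.1])
print(kld(uniform, skewed))
```

In practice one averages this over many token positions on a test corpus, which is roughly what llama.cpp-style KLD tooling reports.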