Content

This model area holds the public parts of GGUF models converted with Skipper (T3) or Mate (M8) technology. Future versions will also follow the nautical theme.

The T3 (and M8) project is a Proto Open Source project: it does NOT publish its code but applies the benefits ONLY to OSI models and some select Open Weights models. The goal is to strengthen the True Open Source model family; open-sourcing the code would also benefit proprietary models. There are currently demos on Hugging Face Spaces to try out model behavior under such extreme compression. Further variants for faster inference and local inference will follow.

Demo Spaces

  • Regular compression
    • Granite4family: all Granite4 models (small, tiny, micro, nano 1b and nano 350m)
  • T3 OSI compression
    • TOM@zero: demo of the next-generation 2 bpw compression (Skipper, aka T3) with high-quality Open Source (OSI) models:
      • Olmo3
      • Smol3
      • Apertus
  • T3 Open Weights compression
    • Granite4extreme: Granite 4 small hybrid 32b compressed to below 9 GB at fp16 quality (see the size check after this list)
    • Qwen3.5: tbd
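
A quick sanity check on the Granite4extreme size, assuming the ~2.2 bpw upper end of the T3 range listed under Versions below (an assumption, not a published per-model figure): 32 × 10⁹ weights × 2.2 bits ÷ 8 bits/byte ≈ 8.8 GB, consistent with "below 9 GB".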

Challenge: high-quality models in 1/2/4/8/.. GB sizes

  • Phone: 4 GB
  • Home: 8 GB
  • Game: 16 GB
  • Pro: 32 GB
  • Zero: 64-71 GB
  • Server: 128 GB+

| Quality vs. Size | Casual | Premium | Advanced | Frontier |
| --- | --- | --- | --- | --- |
| 64-71 GB | SOTA | SOTA | SOTA | BETA |
| 32 GB | SOTA | SOTA | SOTA+ | RESEARCH |
| 16 GB | SOTA | SOTA+ | BETA | - |
| 8 GB | SOTA | BETA | BETA | - |
| 4 GB | SOTA | RESEARCH | - | - |
| 2 GB | RESEARCH | - | - | - |
| 1 GB | - | - | - | - |

  • SOTA: K quants
  • SOTA+: UD quants
  • BETA: REAP + UD
  • RESEARCH: M8 and better
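
The matrix reads as a lookup from a memory budget to the achievable status per model tier. A minimal sketch in Python: only the matrix values come from the table above; the helper name best_tier and the None encoding for "-" are illustrative assumptions.

```python
# Quality-vs-size matrix from the table above.
# Keys are the lower bound of each size class in GB; None means "not available".
QUALITY_MATRIX = {
    64: {"Casual": "SOTA", "Premium": "SOTA", "Advanced": "SOTA", "Frontier": "BETA"},
    32: {"Casual": "SOTA", "Premium": "SOTA", "Advanced": "SOTA+", "Frontier": "RESEARCH"},
    16: {"Casual": "SOTA", "Premium": "SOTA+", "Advanced": "BETA", "Frontier": None},
    8: {"Casual": "SOTA", "Premium": "BETA", "Advanced": "BETA", "Frontier": None},
    4: {"Casual": "SOTA", "Premium": "RESEARCH", "Advanced": None, "Frontier": None},
    2: {"Casual": "RESEARCH", "Premium": None, "Advanced": None, "Frontier": None},
    1: {"Casual": None, "Premium": None, "Advanced": None, "Frontier": None},
}

def best_tier(budget_gb: float) -> dict:
    """Status row for the largest size class that fits a given memory budget."""
    fitting = [gb for gb in QUALITY_MATRIX if gb <= budget_gb]
    return QUALITY_MATRIX[max(fitting)] if fitting else {}

# Example: a 16 GB "Game" machine.
print(best_tier(16))  # {'Casual': 'SOTA', 'Premium': 'SOTA+', 'Advanced': 'BETA', 'Frontier': None}
```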

The tiers above are based on ELO (https://lmarena.ai/leaderboard/text).

Versions

| Version | Codename | File prefix | Typical bpw range | New feature |
| --- | --- | --- | --- | --- |
| 1.0 | Skipper | T3 and T2 | 0.8 .. 2.2 | introduce new compression method |
| 1.5 | Mate | M8 | 0.4 .. 2 | compression improvements |
| 2.0 | Cheng | Cx | 0.3 .. 2 | speed improvements |
| 2.5 | Cheng++ | Cy | 0.1 .. 2 | reduce compute requirements |
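
The file prefixes double as a version marker on published files. A minimal sketch of decoding them in Python; the prefix-to-codename mapping comes from the table, but the example filename format is a hypothetical assumption.

```python
# Map the file prefixes from the version table to (codename, version).
PREFIXES = {
    "T3": ("Skipper", "1.0"),
    "T2": ("Skipper", "1.0"),
    "M8": ("Mate", "1.5"),
    "Cx": ("Cheng", "2.0"),
    "Cy": ("Cheng++", "2.5"),
}

def identify(filename: str) -> tuple[str, str] | None:
    """Return (codename, version) for a model file starting with a known prefix."""
    for prefix, info in PREFIXES.items():
        if filename.startswith(prefix):
            return info
    return None

# Hypothetical filename, for illustration only:
print(identify("T3UD-granite4-small.gguf"))  # ('Skipper', '1.0')
```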

V1 reduces model size significantly at the same subjective quality, but leaves compute requirements high.

V2 will scale down compute requirements and support cheap NPUs.

Expected bpw (bits per weight)

Actual bpw values are higher for small models and lower for larger models. As with JPEG and video encoding, higher input quality opens more opportunity for compression. The size arithmetic behind these figures is sketched after the table below.

| Base | Mode | % | bpw @ 30b |
| --- | --- | --- | --- |
| Q5_K | T3UD | 95 | 2 .. 2.2 |
| Q4_K | T2UD | 90 | 1.4 .. 1.6 |
| Q2_K | T2UD2 | 75 | 1 .. 1.2 |
| Q2_K | T2UD1 | 60 | 0.8 |
| Q2_K | M8HQ | 75 | 0.8 |
| Q2_K | M8LQ | 60 | 0.4 .. 0.6 |
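
The table boils down to size ≈ parameters × bpw ÷ 8 bytes. A minimal sketch in Python; the function name and example figures are illustrative assumptions, and real GGUF files add some metadata overhead on top of the weights.

```python
# Rough GGUF size estimate from parameter count and bits per weight (bpw).
# Real files are slightly larger: metadata and some unquantized tensors.

def estimated_size_gb(n_params: float, bpw: float) -> float:
    """parameters * bits-per-weight / 8 bits per byte, in gigabytes."""
    return n_params * bpw / 8 / 1e9

# A 30B model at the T3UD upper bound of 2.2 bpw:
print(f"{estimated_size_gb(30e9, 2.2):.2f} GB")  # 8.25 GB

# The same model at 0.8 bpw (T2UD1 / M8HQ):
print(f"{estimated_size_gb(30e9, 0.8):.2f} GB")  # 3.00 GB
```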