Ruslan

uzvisa

AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a model 1 day ago

baidu/Unlimited-OCR

Image-Text-to-Text • 3B • Updated 1 day ago • 70.7k • 882

New activity in Qwen/Qwen3.6-35B-A3B about 2 months ago

how to enable non-thinking mode of this model in llama.cpp?

#54 opened about 2 months ago by daijava

reacted to eaddario's post with 👍🔥 about 2 months ago

Post

Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B.

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most and produces high-quality models that meet a precise global file-size target.
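A "global target" ultimately reduces to an average bits-per-weight figure implied by a file-size budget. The following is a minimal sketch of that arithmetic, assuming decimal gigabytes and the nominal parameter counts in the model names. `target_bpw` and its `overhead_bytes` allowance (for GGUF metadata or anything not counted per weight) are hypothetical. For an MoE model like 35B-A3B, every expert must still be stored, so the total parameter count is what matters for file size.

```python
def target_bpw(target_size_bytes: float, n_params: float,
               overhead_bytes: float = 0.0) -> float:
    """Average bits per weight that fills a file-size budget.

    overhead_bytes is a hypothetical allowance for GGUF metadata or anything
    else not counted in the per-weight figure.
    """
    usable_bits = (target_size_bytes - overhead_bytes) * 8
    if usable_bits <= 0:
        raise ValueError("overhead exceeds the target size")
    return usable_bits / n_params


# Nominal parameter counts from the model names; 24 GB as decimal bytes.
for name, params in [("Qwen3.6-27B", 27e9), ("Qwen3.6-35B-A3B", 35e9)]:
    print(f"{name}: {target_bpw(24e9, params):.2f} bpw")
```

A 24 GB file therefore allows about 7.1 bits per weight for the 27B model and about 5.5 for the 35B-A3B model. In practice, the VRAM budget also has to hold the KV cache and activations, so the file target should sit somewhat below the card's capacity.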

Key Advantages:
- VRAM Maximization: Can generate high-quality models sized to fit exact hardware constraints (e.g., fitting the model into exactly 24 GB of VRAM).
- Data-Driven Precision: The quantization mix is determined by measured weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-versus-size trade-offs.
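Choosing a per-tensor mix under a fixed bit budget is a constrained optimization. One simple way to approach it is greedy marginal-gain allocation, which is not necessarily what eaddario's tooling does. The greedy method starts every tensor at its cheapest type, then repeatedly upgrades whichever tensor buys the largest error reduction per extra bit until no affordable upgrade remains. In this sketch, `TensorOption`, `allocate`, the tensor names, the bit widths, and the error values are all hypothetical stand-ins for measured sensitivities. Errors are also assumed to add across tensors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TensorOption:
    name: str                  # hypothetical tensor name
    n_params: int
    error: dict[float, float]  # bpw -> estimated error at that width (hypothetical)


def allocate(tensors: list[TensorOption], budget_bits: float) -> dict[str, float]:
    """Greedy marginal-gain allocation of bit widths under a total-bit budget."""
    choice = {t.name: min(t.error) for t in tensors}
    used = sum(t.n_params * choice[t.name] for t in tensors)
    if used > budget_bits:
        raise ValueError("budget is below the cheapest configuration")
    while True:
        best = None  # (error reduction per extra bit, tensor, next bpw, extra bits)
        for t in tensors:
            cur = choice[t.name]
            higher = [b for b in t.error if b > cur]
            if not higher:
                continue  # already at the widest option
            nxt = min(higher)  # only consider the next step up
            extra = t.n_params * (nxt - cur)
            if used + extra > budget_bits:
                continue  # this upgrade would overshoot the budget
            ratio = (t.error[cur] - t.error[nxt]) / extra
            if best is None or ratio > best[0]:
                best = (ratio, t, nxt, extra)
        if best is None:
            return choice  # no affordable upgrade left
        _, t, nxt, extra = best
        choice[t.name] = nxt
        used += extra


tensors = [
    TensorOption("attn", 1_000, {4.5: 1.0, 6.5: 0.2, 8.5: 0.05}),
    TensorOption("ffn", 3_000, {4.5: 0.5, 6.5: 0.3, 8.5: 0.25}),
]
print(allocate(tensors, budget_bits=4_000 * 5.5))  # {'attn': 8.5, 'ffn': 4.5}
```

With an average budget of 5.5 bpw, the small, sensitive tensor receives all the extra bits while the larger, less sensitive one stays at 4.5. Greedy allocation is a heuristic for what is essentially a knapsack problem, so it is not guaranteed to find the optimal mix.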

Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and the methodology are in the model cards.

eaddario/Qwen3.6-27B-GGUF
eaddario/Qwen3.6-35B-A3B-GGUF
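PPL and KLD, two of the metrics the post cites, are standard ways to score a quantized model. Perplexity is the exponential of the mean negative log-likelihood of the observed tokens. KLD measures how far the quantized model's next-token distribution drifts from a reference distribution, typically that of the unquantized model, averaged over positions. The following is a minimal sketch that assumes raw logits are available from both models. The function names are hypothetical, and this is not the code used for the linked benchmarks.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) of the observed tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_kld(ref_logits: list[list[float]], quant_logits: list[list[float]]) -> float:
    """Mean KL(P_ref || P_quant) over token positions, in nats."""
    total = 0.0
    for r, q in zip(ref_logits, quant_logits):
        p, qd = softmax(r), softmax(q)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, qd) if pi > 0)
    return total / len(ref_logits)


print(perplexity([math.log(0.25)] * 4))  # a model uniform over 4 tokens
print(mean_kld([[0.0, 0.0]], [[0.0, math.log(3.0)]]))
```

A model that spreads probability uniformly over four tokens has a perplexity of 4, and identical reference and quantized logits give a KLD of 0. KLD is the more sensitive metric for quantization, because it catches distribution shifts that leave the top-1 token, and hence PPL, almost unchanged.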

updated a collection 3 months ago

smart-writing

Collection

14 items • Updated Apr 10

liked a model 3 months ago

Tesslate/OmniCoder-9B

Text Generation • 9B • Updated Mar 13 • 4.05k • 647

updated a collection 4 months ago

smart-writing

Collection

14 items • Updated Apr 10

liked 4 models 4 months ago

reacted to robtacconelli's post with 🔥 4 months ago

Post

🏆 Nacrith: a 135M model that out-compresses everything on natural language

What if a tiny LM could compress English text better than _every_ compressor out there, classical or neural, small or large?

Nacrith pairs SmolLM2-135M with an ensemble of online predictors and high-precision arithmetic coding.
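The post doesn't specify how Nacrith combines SmolLM2 with its online predictors. As a generic illustration only, here is a Bayesian-style mixture. The coder uses a weighted average of each predictor's next-token distribution. After each token, every weight is multiplied by the probability its predictor assigned to that token and then renormalized. `OnlineMixture`, `eta`, `floor`, and the toy predictions are hypothetical.

```python
class OnlineMixture:
    """Weighted mixture of next-token distributions from several predictors.

    After each coded token, every predictor's weight is multiplied by the
    probability it gave that token (raised to eta) and renormalized, so
    predictors that have been right recently dominate the mix.
    """

    def __init__(self, n_models: int, eta: float = 1.0, floor: float = 1e-6):
        self.weights = [1.0 / n_models] * n_models
        self.eta = eta
        self.floor = floor  # stops one zero-probability token from zeroing a weight

    def mix(self, dists: list[list[float]]) -> list[float]:
        vocab = len(dists[0])
        return [sum(w * d[s] for w, d in zip(self.weights, dists)) for s in range(vocab)]

    def update(self, dists: list[list[float]], token: int) -> None:
        new = [w * max(d[token], self.floor) ** self.eta
               for w, d in zip(self.weights, dists)]
        z = sum(new)
        self.weights = [x / z for x in new]


mixer = OnlineMixture(n_models=2)
llm, ngram = [0.9, 0.1], [0.5, 0.5]  # hypothetical predictions over a 2-token vocab
print(mixer.mix([llm, ngram]))
mixer.update([llm, ngram], token=0)
print(mixer.weights)
```

Because the mixed distribution is a convex combination of valid distributions, it always sums to 1 and can be fed directly to an arithmetic coder. The decoder can replay the same weight updates, since they depend only on already-decoded tokens.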

What's inside

The standard LLM+arithmetic coding approach wastes ~75% of CDF precision on large vocabularies. Our CDF-24 fix alone recovers 0.5 bpb. On top of that: a token N-gram that skips the GPU on predictable tokens, an adaptive bias head, a llama.cpp backend (7× faster than PyTorch), multi-GPU parallel compression, and a binary file format (NC06), the first LLM-based binary compressor we know of.
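Here is one plausible reading of the ~75% figure. Assume a 49,152-token vocabulary (our understanding of SmolLM2's tokenizer) and a 16-bit frequency table. An arithmetic coder must give every token an integer frequency of at least 1, so 49,152 of the 65,536 slots (exactly 75%) go to that floor before the model's probabilities are expressed at all. The sketch below is a generic quantizer written to illustrate the effect, not Nacrith's code. `quantize_probs`, `code_length_bits`, and the toy distribution are hypothetical.

```python
import math


def quantize_probs(probs: list[float], total_bits: int) -> list[int]:
    """Turn a probability vector into integer frequencies summing to 2**total_bits.

    Every symbol gets a frequency of at least 1 so it stays encodable; this
    floor is what eats precision when the vocabulary is large.
    """
    total = 1 << total_bits
    n = len(probs)
    free = total - n  # range left after giving each symbol its mandatory 1
    if free < 0:
        raise ValueError("vocabulary larger than the frequency range")
    freqs = [1 + math.floor(p * free) for p in probs]
    # Rounding down leaves a small remainder; give it to the most likely symbol.
    top = max(range(n), key=lambda i: probs[i])
    freqs[top] += total - sum(freqs)
    return freqs


def code_length_bits(freqs: list[int], symbol: int) -> float:
    """Ideal arithmetic-coding cost of `symbol` under the quantized model."""
    return -math.log2(freqs[symbol] / sum(freqs))


VOCAB = 49_152  # assumed vocabulary size
probs = [0.99] + [0.01 / (VOCAB - 1)] * (VOCAB - 1)  # toy: one very likely token
print(f"floor share of a 16-bit table: {VOCAB / 2**16:.0%}")
for bits in (16, 24):
    freqs = quantize_probs(probs, bits)
    print(bits, round(code_length_bits(freqs, 0), 4))
print("ideal:", round(-math.log2(0.99), 4))
```

With the 16-bit table, a token the model rates at 0.99 costs about 2 bits instead of the ideal of about 0.015. With a 24-bit table, the same token costs roughly 0.017 bits, because the floor now takes under 0.3% of the range.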

Runs on a GTX 1050 Ti. ~500 MB weights, ~1.2 GB VRAM per worker.

💻 Code: https://github.com/robtacconelli/Nacrith-GPU
⭐ Space: robtacconelli/Nacrith-GPU
📄 Paper: Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding (2602.19626)

Try it, break it, share your results — all feedback welcome. ⭐ on the repo appreciated!

Results across all systems we tested:
- alice29.txt → 0.918 bpb (−44% vs CMIX, −20% vs ts_zip) — below the 2nd-order Shannon entropy bound
- enwik8 (100 MB) → 0.9389 bpb (−8% vs FineZip/LLMZip's 8B model, −15% vs ts_zip)
- Unseen text → 0.723 bpb on a doc published after training cutoff — no memorization, 26% better than FineZip/LLMZip on the same model
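The results are reported in bits per byte (bpb): compressed size in bits divided by input size in bytes. Since enwik8 is 100,000,000 bytes, 0.9389 bpb corresponds to roughly 11.7 MB of output. The sketch below shows that arithmetic, the relative-change convention the "−44%" style figures appear to use (an assumption), and an order-k empirical entropy estimator. We read "2nd-order" as conditioning on the two preceding bytes, which is also an assumption. An order-k estimate only measures what a model with k bytes of context could achieve on that file, so a language model with much longer context can legitimately go below it. The function names are hypothetical.

```python
import math
from collections import Counter


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size in bits per byte of input."""
    return compressed_bytes * 8 / original_bytes


def relative_change(new: float, baseline: float) -> float:
    """Fractional change vs a baseline; -0.44 means 44% smaller."""
    return (new - baseline) / baseline


def empirical_entropy(data: bytes, order: int) -> float:
    """Order-k empirical entropy in bits per byte: H(X_i | previous `order` bytes)."""
    if len(data) <= order:
        return 0.0
    ctx_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for i in range(order, len(data)):
        ctx = data[i - order:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, data[i])] += 1
    n = len(data) - order
    h = 0.0
    for (ctx, _), c in pair_counts.items():
        # p(ctx, sym) * -log2 p(sym | ctx)
        h -= (c / n) * math.log2(c / ctx_counts[ctx])
    return h


print(bits_per_byte(11_736_250, 100_000_000))
text = b"ab" * 50
print(empirical_entropy(text, 0), empirical_entropy(text, 1))
```

The toy string shows why higher orders can only lower the estimate: "abab…" costs 1 bit per byte with no context, but 0 bits once the previous byte is known.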

SmolLM2-135M by