# Gemma 7B Instruct GGUF

Contains Q4 and Q8 quantized GGUFs for google/gemma.
## Performance

| Variant | Device | Throughput |
|---|---|---|
| Q4 | RTX 2070S | 22 tok/s |
| Q4 | M1 Pro 10-core GPU | 28 tok/s |
| Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) |
| Q8 | M1 Pro 10-core GPU | 17 tok/s |
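As a rough sketch of how these numbers were obtainable, the GGUFs can be run with llama.cpp's CLI. The filename below is an assumption (adjust to the actual file in this repo), and `-ngl 23` mirrors the partial GPU offload noted for Q8 on the RTX 2070S; use `-ngl 99` when the whole model fits in VRAM:

```shell
# Hypothetical filenames -- substitute the GGUF you downloaded from this repo.
# Q4 variant, full GPU offload:
./llama-cli -m gemma-7b-it.Q4_K_M.gguf -ngl 99 -p "Why is the sky blue?" -n 128

# Q8 variant on an 8 GB card: only 23 of 29 layers fit, so cap the offload.
./llama-cli -m gemma-7b-it.Q8_0.gguf -ngl 23 -p "Why is the sky blue?" -n 128
```

llama.cpp prints a tok/s summary after generation, which is where throughput figures like those in the table come from.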