Gemma 7B Instruct GGUF

This repository contains Q4 and Q8 quantized GGUF files for Gemma 7B Instruct (google/gemma).

Performance

Variant	Device	Throughput
Q4	RTX 2070S	22 tok/s
Q4	M1 Pro 10-core GPU	28 tok/s
Q8	RTX 2070S	7 tok/s (only 23 of 29 layers offloaded to GPU)
Q8	M1 Pro 10-core GPU	17 tok/s
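
The Q8 result on the RTX 2070S depends on partial GPU offload, where some layers run on the GPU and the rest stay on the CPU. The following is a minimal sketch of how such a run could be set up with llama.cpp's command-line tool, assuming a binary named `llama-cli` is on the PATH and using its `-m`, `-ngl`, `-p` and `-n` options. The helper name `build_llama_cli_args` and the model path are hypothetical placeholders; this is not necessarily how the numbers above were measured.

```python
import shlex


def build_llama_cli_args(model_path, n_gpu_layers, prompt,
                         n_predict=128, binary="llama-cli"):
    """Build an argv list for a llama.cpp CLI run with partial GPU offload.

    The binary name and model path are assumptions for illustration;
    adjust them to match the local llama.cpp build and downloaded file.
    """
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be non-negative")
    return [
        binary,
        "-m", model_path,            # GGUF file to load
        "-ngl", str(n_gpu_layers),   # number of layers to offload to the GPU
        "-p", prompt,                # prompt text
        "-n", str(n_predict),        # number of tokens to generate
    ]


if __name__ == "__main__":
    # 23 is the layer count that was offloaded for Q8 on the RTX 2070S.
    # "path/to/gemma-q8.gguf" is a placeholder, not a real file name.
    args = build_llama_cli_args("path/to/gemma-q8.gguf", 23, "Hello")
    print(shlex.join(args))
```

The printed command can be pasted into a shell, or the list can be passed directly to `subprocess.run`. Lowering the `-ngl` value trades speed for lower GPU memory use.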

GGUF

Model size: 9B params
Architecture: gemma
Hardware compatibility: 4-bit, 8-bit
