---
license: apache-2.0
library_name: transformers
tags:
- qwen
- vision-language
- awq
- int4
- vllm
base_model: Qwen/Qwen3-VL-8B-Instruct
---
# Qwen3-VL-8B-Instruct-AWQ

AWQ (W4A16) quantized version of `Qwen/Qwen3-VL-8B-Instruct`.

- **Quantization:** AWQ, 4-bit weights, `group_size=128`, `zero_point=true`, `version="gemm"`
- **modules_to_not_convert:** `["visual"]` (the vision tower is kept in full precision)
- Prepared with the LLM Compressor oneshot AWQ flow; quantization script adapted from https://github.com/vllm-project/llm-compressor/blob/main/examples/awq/qwen3-vl-30b-a3b-Instruct-example.py
The AWQ recipe used:

```python
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=[r"re:model.visual.*", r"re:visual.*"],  # lm_head dropped from ignore
    duo_scaling=True,
)
```
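To make the W4A16 storage format above concrete, here is a minimal NumPy sketch of asymmetric (zero-point) 4-bit group quantization with `group_size=128`. This illustrates only the quantize/dequantize arithmetic of the scheme, not AWQ itself (which additionally searches activation-aware per-channel scales before quantizing); `quantize_group` and `dequantize_group` are hypothetical helper names, not LLM Compressor APIs.

```python
import numpy as np

def quantize_group(w, num_bits=4):
    """Asymmetric (zero-point) quantization of one weight group to uint4 range."""
    qmax = 2**num_bits - 1                         # 15 for 4-bit
    scale = (w.max() - w.min()) / qmax             # one scale per group
    zero_point = np.round(-w.min() / scale)        # one zero point per group
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Reconstruct approximate fp32 weights from 4-bit codes."""
    return (q.astype(np.float32) - zero_point) * scale

# One row of a Linear weight matrix, split into groups of 128 columns.
rng = np.random.default_rng(0)
row = rng.normal(size=256).astype(np.float32)
recon = np.concatenate([
    dequantize_group(*quantize_group(g)) for g in np.split(row, 256 // 128)
])
max_err = np.abs(row - recon).max()  # bounded by roughly scale / 2 per group
```

At inference time the 4-bit codes plus the per-group `scale`/`zero_point` are dequantized on the fly while activations stay in 16-bit, which is what the "W4A16" label means.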