---
license: apache-2.0
library_name: transformers
tags:
- qwen
- vision-language
- awq
- int4
- vllm
base_model: Qwen/Qwen3-VL-8B-Instruct
---
# Qwen3-VL-8B-Instruct-AWQ

Quantization code adapted from the [LLM Compressor AWQ example](https://github.com/vllm-project/llm-compressor/blob/main/examples/awq/qwen3-vl-30b-a3b-Instruct-example.py).
AWQ (W4A16) quantized version of `Qwen/Qwen3-VL-8B-Instruct`.

- **Quantization:** AWQ, 4-bit weights, `group_size=128`, `zero_point=True`, `version="gemm"`
- **modules_to_not_convert:** `["visual"]` (the vision tower is kept in full precision)
- Prepared with LLM Compressor's `oneshot` AWQ flow.
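As an illustrative sketch only (not the library's implementation), asymmetric 4-bit quantization with a zero point over one group of 128 weights works roughly like this; the function names are my own:

```python
import numpy as np

def quantize_group(w, bits=4):
    """Asymmetric (zero-point) quantization of one weight group."""
    qmax = 2**bits - 1                       # 15 for 4-bit
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / qmax           # one scale per group
    zero_point = np.round(-w_min / scale)    # shifts the range to [0, qmax]
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize_group(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)  # one group of 128 weights
q, scale, zp = quantize_group(w)
w_hat = dequantize_group(q, scale, zp)
max_err = np.abs(w - w_hat).max()            # bounded by roughly the step size
print(int(q.min()), int(q.max()), float(max_err))
```

AWQ additionally searches for per-channel scaling that protects salient weights before this rounding step; the sketch above only shows the grouped integer mapping itself.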
The recipe passed to `oneshot`:

```python
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=[r"re:model.visual.*", r"re:visual.*"],  # drop lm_head from ignore
    duo_scaling=True,
)
```
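The `ignore` entries with the `re:` prefix are regexes matched against module names, so the vision tower is skipped while `lm_head` (absent from the list) gets quantized. A quick self-contained sketch of that matching; the module names below are illustrative, not the model's exact ones:

```python
import re

# The recipe's ignore patterns, minus the "re:" prefix.
patterns = [r"model.visual.*", r"visual.*"]

def is_ignored(name):
    """True if any ignore pattern matches the module name from the start."""
    return any(re.match(p, name) for p in patterns)

print(is_ignored("model.visual.blocks.0.attn.qkv"))               # True: vision tower skipped
print(is_ignored("model.language_model.layers.0.mlp.gate_proj"))  # False: quantized
print(is_ignored("lm_head"))                                      # False: lm_head is quantized
```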