---
base_model: fdtn-ai/Foundation-Sec-8B-Instruct
tags:
- quantized
- nvfp4
- tensorrt
- foundation-sec-8b-instruct
- cybersecurity
---

# Foundation-Sec-8B-Instruct-NVFP4-quantized

This repository contains an NVFP4-quantized version of the [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct) model, optimized for the NVIDIA DGX Spark using the NVIDIA TensorRT Model Optimizer.

Quantizing the Foundation-Sec-8B model to NVFP4 reduces its memory footprint by up to 3.5x, allowing it to run on hardware with less VRAM. It also increases inference speed by easing the memory-bandwidth bottleneck and by leveraging optimizations specific to NVIDIA's Blackwell architecture.

Read more in NVIDIA's post [Introducing NVFP4 for Efficient and Accurate Low-Precision Inference](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/).
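The up-to-3.5x figure follows from the NVFP4 layout that NVIDIA describes: 4-bit E2M1 weights plus one 8-bit scale shared by each block of 16 elements, i.e. 4.5 bits per weight versus 16 bits in BF16. The numpy sketch below illustrates that per-block fake-quantization; the nearest-value rounding here is an illustration, not the exact TensorRT kernel behavior.

```python
import numpy as np

# Representable E2M1 magnitudes of the FP4 format used by NVFP4 (plus signs).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_dequantize_block(block):
    """Fake-quantize one 16-element block: one shared scale + 16 FP4 codes."""
    amax = np.abs(block).max()
    scale = amax / 6.0 if amax > 0 else 1.0  # map block amax onto E2M1 max (6.0)
    scaled = block / scale
    # Round each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1[idx] * scale

w = np.array([0.01, -0.2, 0.5, 1.3, -2.7, 0.0, 0.9, -0.4,
              1.1, -1.9, 0.3, 2.2, -0.7, 1.6, -0.05, 0.8])
wq = quantize_dequantize_block(w)

# Storage per 16 weights: 16 x 4-bit codes + one 8-bit scale = 72 bits,
# i.e. 4.5 bits/weight vs 16 bits/weight in BF16 -> 256/72, roughly 3.5x smaller.
ratio = 16 * 16 / (16 * 4 + 8)
print(ratio)
```

The element with the largest magnitude in each block round-trips exactly (it defines the scale), while the rest land on the nearest representable grid point.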

See also the NVIDIA paper [Pretraining Large Language Models with NVFP4](https://arxiv.org/abs/2509.25149).

## Quantization Details

- Quantization Method: NVFP4
- Base Model: [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct)
- Tool: NVIDIA TensorRT Model Optimizer
- Environment: NVIDIA DGX Spark | NVIDIA-SMI 580.95.05 | Driver Version 580.95.05 | CUDA Version 13.0

## Loading

Refer to TensorRT-LLM or your deployment stack for loading NVFP4 artifacts.
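As one possibility, a minimal sketch using TensorRT-LLM's high-level Python `LLM` API is shown below. The checkpoint path, prompt, and sampling settings are placeholders, and the exact API surface depends on your TensorRT-LLM version; this is not the only supported deployment path.

```python
# Hypothetical loading sketch, assuming TensorRT-LLM's Python LLM API.
# "path/to/..." is a placeholder for your local NVFP4 checkpoint directory.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="path/to/Foundation-Sec-8B-Instruct-NVFP4")
params = SamplingParams(max_tokens=128, temperature=0.2)
outputs = llm.generate(["Summarize the impact of CVE-2021-44228."], params)
print(outputs[0].outputs[0].text)
```

Running this requires a Blackwell-class GPU with NVFP4 support and a matching TensorRT-LLM install.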

## License

This model inherits the license of the base model, [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct).

## Contacts

@guerilla7 | Ron F. Del Rosario | [LinkedIn](https://www.linkedin.com/in/ronaldfloresdelrosario/)