---
base_model: fdtn-ai/Foundation-Sec-8B-Instruct
tags:
- quantized
- nvfp4
- tensorrt
- foundation-sec-8b-instruct
- cybersecurity
---

# Foundation-Sec-8B-Instruct-NVFP4-quantized

This repository contains an NVFP4-quantized version of the
[fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct) model, optimized for the NVIDIA DGX Spark using the NVIDIA TensorRT Model Optimizer.

🚀 Quantizing the Foundation-Sec-8B model to NVFP4 reduces its memory footprint by up to ~3.5x relative to FP16, allowing it to run on hardware with less VRAM.
It also increases inference speed by easing the memory-bandwidth bottleneck and leveraging optimizations specific to NVIDIA's Blackwell architecture.
Read more in NVIDIA's blog post, [Introducing NVFP4 for Efficient and Accurate Low-Precision Inference](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/).

❇️ NVIDIA paper: [Pretraining Large Language Models with NVFP4](https://arxiv.org/abs/2509.25149)

## Quantization Details
- Quantization Method: NVFP4
- Base Model: [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct)
- Tool: NVIDIA TensorRT Model Optimizer
- Environment: NVIDIA DGX Spark (NVIDIA-SMI 580.95.05, Driver 580.95.05, CUDA 13.0)
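
For reference, NVFP4 post-training quantization with TensorRT Model Optimizer generally follows the pattern below. This is a minimal sketch of ModelOpt's documented PTQ workflow, not the exact script used to produce this checkpoint; the calibration prompts are placeholder assumptions.

```python
# Minimal sketch of NVFP4 PTQ with NVIDIA TensorRT Model Optimizer
# (nvidia-modelopt). Not the exact script used for this checkpoint;
# the calibration prompts below are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model_id = "fdtn-ai/Foundation-Sec-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

# A few representative prompts for calibration (placeholder data).
calib_prompts = [
    "Explain the MITRE ATT&CK framework.",
    "What is a buffer overflow vulnerability?",
]

def forward_loop(m):
    # Run calibration data through the model so ModelOpt can collect
    # the activation statistics used to compute NVFP4 scales.
    for prompt in calib_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
        m(**inputs)

# Apply NVFP4 quantization using ModelOpt's default NVFP4 config.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Export a Hugging Face-style quantized checkpoint.
export_hf_checkpoint(model, export_dir="Foundation-Sec-8B-Instruct-NVFP4")
```

In practice, a larger and more representative calibration set (e.g., a few hundred samples from a domain-relevant corpus) gives more reliable scales than the toy prompts above.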

## Loading
Refer to TensorRT-LLM or your deployment stack's documentation for loading NVFP4 artifacts; note that NVFP4 kernels target NVIDIA's Blackwell-generation GPUs.
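
As one possibility, TensorRT-LLM's Python `LLM` API can serve ModelOpt-quantized Hugging Face checkpoints directly. A minimal sketch; the repository ID below is an assumption, so substitute the actual repo path:

```python
# Minimal sketch: serving the NVFP4 checkpoint with TensorRT-LLM's LLM API.
# The repository ID below is an assumption; substitute the actual repo path.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="guerilla7/Foundation-Sec-8B-Instruct-NVFP4-quantized")

prompts = ["What are common indicators of a phishing email?"]
params = SamplingParams(max_tokens=256, temperature=0.2)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```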

## License
Inherited from the base model, [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct).

## Contacts
@guerilla7 | Ron F. Del Rosario | [LinkedIn](https://www.linkedin.com/in/ronaldfloresdelrosario/)