FAQ | Dell Enterprise Hub by Hugging Face

What is the Dell Enterprise Hub?

The Dell Enterprise Hub is an online portal making it easy to train and deploy the latest open AI models on-premise using Dell platforms, and securely build Generative AI applications. The Dell Enterprise Hub is the result of a deep engineering collaboration between Dell Technologies and Hugging Face, it includes:

Optimized Containers: Ready-to-use containers and scripts for model deployment and fine-tuning, optimized for each base model and tested for Dell hardware configurations.
Dell Platforms native support: Dell customers easily filter models by the Dell Platforms they have access to, and access tested and maintained configurations for training and inference.
Bring Your Own Model: Dell Enterprise Hub supports importing compatible fine-tuned models into the optimized containers.
Enterprise Security: Leverage Hugging Face Enterprise Hub advanced security features including Single Sign-On, fine-grained access controls, audit logs and malware scanning, along with the enterprise-curated models offered in Dell Enterprise Hub.

The Dell Enterprise hub provides a secure, streamlined experience for Dell customers to build Generative AI applications in confidence, taking full advantage of the computing power of the Dell Platforms at their disposal.

How does Dell Enterprise Hub strengthen supply chain security?

In order to enable users to easily run popular models from the Hugging Face Hub on a variety of devices on the Dell Enterprise Hub, we provide custom docker images. Security being a top concern of our customers, these docker images are not only tested for functionality and performance, but also for security. When it comes to these artifacts, there are two major components of security testing.

Docker image security: This refers to the security posture of packages within the container image itself. While docker images are static artifacts, the open source community finds and publishes new package vulnerabilities regularly via the National Vulnerability Database. For docker image security, we currently use Amazon Inspector to scan for vulnerabilities in our model containers. We follow the CVSS v3 ratings associated with a particular vulnerability. Images are scanned daily, and we are notified immediately when a critical vulnerability emerges.
Model security: This refers to the safety of the model itself, including its files and potential malicious functional behavior. Hugging Face performs malware and pickle scans on all model repositories with in-house scanners and third-party scanners, to assess model security. Reports can be found by navigating to the Files and versions tab on every model page. These scans protect enterprises against the risk of malicious content being introduced via models. For more information on the scanners used, please visit the Hugging Face Hub documentation.

DEH containers provide a complete, self-contained environment enabling on-premises model deployment. Once a container and model files are downloaded, no external internet connection is required. This ensures secure execution fully within your infrastructure, without needing to transmit data externally.

The security scan results provided are for informational purposes only. No representation or warranty, express or implied, is made regarding the accuracy, completeness, or reliability of the security scan results provided. The scan results are not a substitute for comprehensive security assessments or ongoing monitoring. You remain solely responsible for implementing appropriate security measures and for any decisions made based on the scan results. The providers of the scan results disclaim all liability and responsibility for any loss, damage, or harm arising from reliance on the scan results, including but not limited to direct, indirect, incidental, or consequential damages.

How do I deploy a Model?

Deploying a model on a Dell Platform is a simple 4 steps process:

Choose Your Model: Select a model from the Dell Model Catalog.
Configure Inference Settings: In the Model Card, click Deploy, then select a compatible Dell Platform and the desired configuration.
Run Deployment Command: Copy the generated command, and run it inside the Dell infrastructure.
Test Your Model: Once the container is set up and endpoints are up and running, test your model with the provided sample code snippets.

If you want to deploy a fine-tuned model instead of the curated models above, refer to How can I deploy a fine-tuned model?

The Dell Enterprise Hub inference containers leverage Hugging Face ML production technologies, including Text Generation Inference for Large Language Models. The predefined configurations provided can be easily adjusted to fit your needs, by changing the default values for:

NUM_SHARD: The number of shards, or tensor parallelism, used for the model.
(optional) MAX_INPUT_LENGTH: The maximum input length that the model can handle.
(optional) MAX_TOTAL_TOKENS: The maximum total tokens the model can generate.
(optional) MAX_BATCH_PREFILL_TOKENS: The maximum number of tokens to prefill the batch used for continuous batching.

More information can be found in the Hugging Face Text Generation Inference documentation, and also in the Text Generation Inference v3 release notes.

How do I migrate to the new container system?

From 1st February 2026, Dell Enterprise Hub will only offer access to containers that do not contain model weights inside, in order to reduce the image size and better decouple model and environment lifecycle. This is the foundation to offer a better security and compliance service with Dell Enterprise Hub.

All new model images added from now on (since October 2025) to the Dell Enterprise Hub (i.e. those labeled as New in the catalog, such as ibm-granite/granite-4.0-h-micro or openai/gpt-oss-20b), will not contain pre-downloaded model weights in any of the offered Docker image tags.

As for all previous model images (i.e. those not labeled as New), you can expect no breaking changes when using latest image tag, since those still contain model weights. However, as mentioned before, latest versions of these images will be replaced from next February onwards, so we encourage users to transition to this new container system. To do so, images without weights are already available under specific version tags (e.g: registry.dell.huggingface.co/enterprise-dell-inference-google-gemma-3-12b-it:tgi-3.3.6). When using these images, model weights will be downloaded on container runtime, so make sure to include both HF_TOKEN and MODEL_ID when running your Docker container to both bypass the gating (if applicable) and ensure you don't get rate-limited when pulling the weights from the Hugging Face Hub.

Using pre-downloaded model weights

To prevent from pulling the model weights from the Hugging Face Hub on container startup, we strongly recommend pulling those locally via the huggingface_hub Python CLI into a local directory with the following command: hf download <MODEL_ID> --local-dir <DOWNLOAD_DIR>.

Docker

Once the model weights are downloaded locally, you can mount those in the container and set MODEL_ID environment variable pointing to that path, so that those are not pulled on every container startup.

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v <DOWNLOAD_DIR>:/data \
  -e MODEL_ID=/data \
  registry.dell.huggingface.co/enterprise-dell-inference-<model>:<tag>

The docker run command above is just an example, what you should actually do is to copy the docker run command on the Dell Enterprise Hub (DEH) for a given model and then add the -v <DOWNLOAD_DIR>:/data and update the default value for the MODEL_ID environment variable to point to /data (or whatever path you chose within the container). Note that you should still respect and keep the rest of the configuration for the given model, meaning all the environment variables or container arguments that are defined in the original command.

Kubernetes

When working with Kubernetes, if you already downloaded and stored the model weights on a local path or in a shared network filesystem such as NFS, you can mount it into your Kubernetes pods, ensuring all nodes can access the directory (your storage must support at least ReadOnlyMany (ROX) access). To do so, you can use PersistentVolumes in your Kubernetes manifest.

This is an example of a Kubernetes manifest to deploy a model with pre-downloaded weights stored on an NFS server:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-weights-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: <NFS_SERVER_IP_OR_HOST>
    path: <DOWNLOAD_DIR> # path where model weights have been downloaded
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-weights-pvc
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 100Gi
  volumeName: model-weights-nfs-pv # binds to the PV above
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
        hf.co/model: <MODEL_ID>
        hf.co/task: text-generation
    spec:
      containers:
        - name: container
          image: registry.dell.huggingface.co/enterprise-dell-inference-<model>:<tag>
          resources:
            limits:
              nvidia.com/gpu: 1
          env:
            - name: MODEL_ID
              value: "/mnt/data"
            - name: NUM_SHARD
              value: "1"
          volumeMounts:
            - name: model-weights
              mountPath: /mnt/data
              readOnly: true
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: model-weights
          persistentVolumeClaim:
            claimName: model-weights-pvc
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3

Instead of using NFS, you can also use hostPath to mount a local path. In this case, make sure the model weights are available at the specified path on all nodes of your cluster. Then, just replace nfs section in the PersistentVolume definition with the following:

  hostPath:
    path: "<PATH>"
    type: Directory

Using model weights from Hugging Face cache

An alternative for downloading the model weights before running the container consists on re-using the default Hugging Face cache and mount it, so that the container points to the cache mounted path via the environment variable HF_HUB_CACHE. This way you can benefit from using your existing local Hugging Face cache for mounting those, whilst still keeping the MODEL_ID default value, given that Hugging Face will resolve the MODEL_ID to whatever is on the cache.

To download the model weights in the cache, you can either just run hf download <MODEL_ID> and those will automatically be downloaded in the cache (i.e. ~/.cache/huggingface/hub), or rather specigy the --cache-dir argument pointing to another directory to act as the Hugging Face cache. Note that the file structure and hierarchy is different when using the cache versus when just pulling locally via --local-dir as previously mentioned.

hf download <MODEL_ID> --cache-dir <DOWNLOAD_DIR>

This command will download the model weights into <DOWNLOAD_DIR>/models--<MODEL_ID>/snapshots/.... This is needed if we want the container to check if the specified MODEL_ID is already cached.

Kubernetes

You can update the previous Kubernetes manifest to use the Hugging Face cache as follows:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-weights-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  nfs:
    server: <NFS_SERVER_IP_OR_HOST>
    path: <DOWNLOAD_DIR> # or ~/.cache/huggingface/hub if downloaded to de default cache using `hf download <MODEL_ID>`.
---
# ...
          env:
            - name: MODEL_ID
              value: "<MODEL_ID>"
            - name: HF_HUB_CACHE # no need to set it if mounting to the default cache path as shown below.
              value: "/mnt/data"
          volumeMounts:
            - name: model-weights
              mountPath: /mnt/data # or /root/.cache/huggingface/hub to mount to the default cache path.
              readOnly: true
# ...

Docker

You can update and run your Docker container as follows:

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v <DOWNLOAD_DIR>:/data/ \
  -e HF_HUB_CACHE=/data/ \
  -e MODEL_ID=<MODEL_ID> \
  registry.dell.huggingface.co/enterprise-dell-inference-<model>:<tag>

How can I fine-tune a Model?

To start training one of the models available in the Dell Model Catalog, please follow the following steps:

Select Base Model: Start by choosing a trainable model in the Model Catalog. Currently the following models are available for training:
- Meta-Llama-3.1-8B
- Meta-Llama-3.1-70B
Configure Training Settings: From the Model Card, click Train, then select the Dell Platform you want to use. Next, set the local path of the CSV training dataset file, and the path to store the fine-tuned model. Learn below how to format and prepare your dataset at how should my dataset look. Finally, adjust the training configuration default settings to match your requirements.
Deploy Training Container: With Dell Enterprise Hub, model training jobs are configured within ready-to-use, optimized training containers. You can run your training job by deploying the container using the provided command, executed within your Dell environment.
Monitor Training Job: Track the progress of your training job to ensure optimal performance and results.

Training containers leverage Hugging Face autotrain, a powerful tool that simplifies the process of model training. Hugging Face autotrain supports a variety of configurations to customize training jobs, including:

lr: Initial learning rate for the training.
epochs: The number of training epochs.
batch_size: Size of the batches used during training.

More details on these configurations can be found in the Autotrain CLI documentation.

How should my dataset look?

To finetune LLMs your dataset should have a column with the formatted training samples. The column used for training is defined through the text-column argument when starting your training, below it would be text.

Example Format:

text
human: hello \n bot: hi nice to meet you
human: how are you \n bot: I am fine
human: What is your name? \n bot: My name is Mary
human: Which is the best programming language? \n bot: Python

You can use both CSV and JSONL files. For more details, refer to the original documentation.

How can I deploy a fine-tuned model?

To deploy a fine-tuned model on your Dell Platform, you can use the special "Bring Your Own Model" (BYOM) Dell inference container available in the Dell Enterprise Hub. This makes it easy to integrate fine-tuned models seamlessly into your Dell environment.

Select Base Model: In the Model Catalog, open the Model Card for the base model used for fine-tuning, then click "Deploy Fine-Tuned Model" to access the BYOM feature.
Configure Inference Settings: Select the Dell Platform you want to use, and the configuration options. Make sure to correctly set the Path to the local directory where your fine-tuned model is stored.
Run Deployment Command: Copy the generated command, and run it inside your Dell environment.
Test Your Model: Once the BYOM container is set up and endpoints are up and running, test your model with the provided sample code snippets.

Unlike direct deployment of models provided in the Dell Model Catalog, when you deploy a fine-tuned model, the model is mounted to the BYOM Dell inference container. It's important to make sure that the mounted directory contains the fine-tuned model and the provided path is correct.

Hardware Requirements

Gemma

For models fine-tuned from the Gemma base model, the following hardware configurations are recommended for deployment:

Dell Platforms	Number of Shards (GPUs)	Max Input Tokens	Max Total Tokens	Max Batch Prefill Tokens
xe9680-nvidia-h100	1	4000	4096	16182
xe9680-amd-mi300x	1	4000	4096	16182
xe8640-nvidia-h100	1	4000	4096	16182
r760xa-nvidia-h100	1	4000	4096	16182
r760xa-nvidia-l40s	2	4000	4096	8192
r760xa-nvidia-l40s	4	4000	4096	16182

Llama 3.1 8B

For models fine-tuned from the Llama 3.1 8B base model, the following SKUs are suitable:

Dell Platforms	Number of Shards (GPUs)	Max Input Tokens	Max Total Tokens	Max Batch Prefill Tokens
xe9680-nvidia-h100	1	8000	8192	32768
xe9680-amd-mi300x	1	8000	8192	32768
xe8640-nvidia-h100	1	8000	8192	32768
r760xa-nvidia-h100	1	4000	4096	16182
r760xa-nvidia-l40s	2	8000	8192	16182
r760xa-nvidia-l40s	4	8000	8192	32768

Llama 3.1 70B

For models fine-tuned from the Llama 3.1 70B base model, use these configurations for deployment:

Dell Platforms	Number of Shards (GPUs)	Max Input Tokens	Max Total Tokens	Max Batch Prefill Tokens
xe9680-nvidia-h100	4	8000	8192	16182
xe9680-nvidia-h100	8	8000	8192	16182
xe9680-amd-mi300x	4	8000	8192	16182
xe9680-amd-mi300x	8	8000	8192	16182
xe8640-nvidia-h100	4	8000	8192	8192

Mistral 7B

Hardware configurations for models fine-tuned from the Mistral 7B are as follows:

Dell Platforms	Number of Shards (GPUs)	Max Input Tokens	Max Total Tokens	Max Batch Prefill Tokens
xe9680-nvidia-h100	1	8000	8192	32768
xe9680-amd-mi300x	1	8000	8192	32768
xe8640-nvidia-h100	1	8000	8192	32768
r760xa-nvidia-h100	1	4000	4096	16182
r760xa-nvidia-l40s	2	8000	8192	16182
r760xa-nvidia-l40s	4	8000	8192	32768

Mixtral 8x7B

For models fine-tuned from the Mixtral base model, the deployment configurations are:

Dell Platforms	Number of Shards (GPUs)	Max Input Tokens	Max Total Tokens	Max Batch Prefill Tokens
xe9680-nvidia-h100	4	8000	8192	16182
xe9680-nvidia-h100	8	8000	8192	16182
xe9680-amd-mi300x	4	8000	8192	16182
xe9680-amd-mi300x	8	8000	8192	16182
xe8640-nvidia-h100	4	8000	8192	8192
r760xa-nvidia-h100	4	8000	8192	16182

What does deprecated mean?

A deprecated model status indicates the model is no longer actively maintained, but remains fully functional for inference and fine-tuning (if applicable), whilst it will no longer receive updates, or regular maintenance.

Deprecation typically occurs due to low usage metrics, customers can always continue using deprecated models, but we recommend migrating to actively maintained alternatives e.g. Meta Llama 3.1 may be deprecated, but Meta Llama 3.3 is available.

If we stop seeing usage from deprecated models for multiple months, we then remove them. For reference, here is the list of models we have removed from the catalog so far:

Model Name	Removal Date
Meta-Llama-3-70b	2025-10-05
google/gemma-7b (train only)	2025-10-05
mistralai/Mixtral-8x7B-v0.1 (train only)	2025-10-05
mistralai/Mistral-7B-v0.1 (train only)	2025-10-05
HuggingFaceH4/zephyr-7b-beta	2025-10-05
meta-llama/Meta-Llama-3.1-405B-Instruct	2025-10-05
meta-llama/meta-llama-3-70b-instruct	2025-10-05
google/paligemma2-3b-mix-448	2025-10-05
google/paligemma2-10b-mix-448	2025-10-05
google/paligemma2-28b-mix-448	2025-10-05

Why do I need to provide a HF Token?

Providing a Hugging Face token is the best practice to make sure your calls to the Hugging Face Hub are authenticated. Also, it lets you benefit from higher rate limits.

What is a gated model?

To give more control over how models are used, the Hub allows model authors to enable access requests for their models. A model with access requests enabled is called a gated model.

As a user, if you want to use a gated model, you will need to request access to it. This means that you must be logged in to a Hugging Face user account. Requesting access can only be done from your browser. Go to the model on the Hub and request access to the model provider.

Resources

Hugging Face Enterprise Hub documentation
Hugging Face Text Generation Inference documentation
Hugging Face AutoTrain documentation
Hugging Face Transformers documentation
vLLM Documentation