requires Pro
Getting this message when doing inference:
Server meta-llama/Meta-Llama-3-70B-Instruct does not seem to support chat completion. Error: Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query.
Just yesterday I was testing with the serverless API and it seemed to work.
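The error text asks for the HF token to be included in the query. As a rough sketch of what that means at the HTTP level, the snippet below builds (but does not send) a chat-completion request with a bearer token. It assumes the endpoint URL that the client reports in the error output in this thread, and a token stored in an environment variable named `HF_TOKEN`; the variable name and the helper `build_chat_request` are illustrative and not part of any library.

```python
import json
import os
import urllib.request

# Endpoint as reported by the client in this thread's error output.
API_URL = (
    "https://api-inference.huggingface.co/models/"
    "meta-llama/Meta-Llama-3-70B-Instruct/v1/chat/completions"
)


def build_chat_request(token: str, user_message: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request carrying the token."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# HF_TOKEN is an assumed variable name; the placeholder keeps the sketch runnable offline.
request = build_chat_request(os.environ.get("HF_TOKEN", "hf_xxx"), "Who are you?")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

If the server still answers with the Pro subscription message when a valid token is attached, the token itself is probably not the problem.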
I have the same problem too.
Token is valid (permission: fineGrained).
Your token has been saved in your configured git credential helpers (manager).
Your token has been saved to C:\Users\Administrator\.cache\huggingface\token
Login successful
Server https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-70B-Instruct/v1/chat/completions does not seem to support chat completion. Falling back to text generation. Error: (Request ID: k3O10Rur0ON-c9TLe0lGa)

Bad request:
Authorization header is correct, but the token seems invalid
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "C:\ProgramData\miniconda3\envs\cuda_env\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-70B-Instruct/v1/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 706, in chat_completion
    data = self.post(
           ^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 273, in post
    hf_raise_for_status(response)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\utils\_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError: (Request ID: k3O10Rur0ON-c9TLe0lGa)

Bad request:
Authorization header is correct, but the token seems invalid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "C:\ProgramData\miniconda3\envs\cuda_env\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-70B-Instruct

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Python\T5\T5Agent.py", line 37, in <module>
    response_content = llm_engine(messages)
                       ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\transformers\agents\llm_engine.py", line 85, in __call__
    response = self.client.chat_completion(messages, stop=stop_sequences, max_tokens=1500)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 738, in chat_completion
    return self.chat_completion(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 770, in chat_completion
    text_generation_output = self.text_generation(
                             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 2061, in text_generation
    raise_text_generation_error(e)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_common.py", line 460, in raise_text_generation_error
    raise http_error
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 2032, in text_generation
    bytes_output = self.post(json=payload, model=model, task="text-generation", stream=stream) # type: ignore
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\inference\_client.py", line 273, in post
    hf_raise_for_status(response)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python312\site-packages\huggingface_hub\utils\_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError: (Request ID: vUBetFgYdy5jSneTSQgy4)

Bad request:
Authorization header is correct, but the token seems invalid
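Two different server messages appear in this thread, and they seem to point at different causes. A small heuristic like the one below could help tell them apart when scanning logs. The function name `triage_inference_error` and the labels it returns are hypothetical; the matched phrases are copied from the messages quoted above, and the server may word them differently in other cases.

```python
def triage_inference_error(message: str) -> str:
    """Map a server error message to a likely cause (heuristic, thread-specific)."""
    text = message.lower()
    if "requires a pro subscription" in text:
        # The model is gated behind a paid plan on the serverless API.
        return "subscription"
    if "token seems invalid" in text:
        # The Authorization header was well-formed but the token was rejected.
        return "token"
    return "unknown"


first_post = (
    "Model requires a Pro subscription; check out hf.co/pricing to learn more. "
    "Make sure to include your HF token in your query."
)
second_post = "Authorization header is correct, but the token seems invalid"

print(triage_inference_error(first_post))   # subscription
print(triage_inference_error(second_post))  # token
```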
Is this for running the model on the API, or is a subscription also required to run it locally?
I get this error even after purchasing the Pro membership, which is $9 per month. Do I need a different license for the Serverless API?