Note of deprecation
Thank you for developing with Llama models. As part of the Llama 3.1 release, we’ve consolidated GitHub repos and added some additional repos as we’ve expanded Llama’s functionality into an end-to-end (e2e) Llama Stack. Please use the following repos going forward:
llama-models - Central repo for the foundation models including basic utilities, model cards, license and use policies
PurpleLlama - Key component of Llama Stack focusing on safety risks and inference time mitigations
llama-toolchain - Model development (inference/fine-tuning/safety shields/synthetic data generation) interfaces and canonical implementations
llama-agentic-system - E2E standalone Llama Stack system, along with opinionated underlying interface, that enables creation of agentic applications
llama-cookbook - Community driven scripts and integrations
If you have any questions, please feel free to file an issue on any of the above repos and we will do our best to respond in a timely manner.
Thank you!
(Deprecated) Meta Llama 3
We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.
This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models in sizes of 8B and 70B parameters.
This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-cookbook.
Download
To download the model weights and tokenizer, please visit the Meta Llama website and accept our License.
Once your request is approved, you will receive a signed URL over email. Then, run the download.sh script, passing the URL provided when prompted to start the download.
Prerequisites: Ensure you have wget and md5sum installed. Then run the script: ./download.sh.
Remember that the links expire after 24 hours and a certain number of downloads. You can always re-request a link if you start seeing errors such as 403: Forbidden.
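As a rough illustration of the kind of integrity check that md5sum performs after a download, the sketch below recomputes MD5 digests in Python and compares them against a checklist. It assumes a checklist file in the standard md5sum format (hex digest, whitespace, file name) stored next to the downloaded files. The function names and the checklist layout are hypothetical and are not taken from download.sh.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist: Path) -> dict[str, bool]:
    """Check every '<digest>  <file name>' entry against the file on disk.

    Assumption: entries are relative to the directory holding the checklist.
    """
    results = {}
    for line in checklist.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = checklist.parent / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

A file reported as False is either missing or corrupted. Re-running the download with a fresh link is the likely fix in both cases.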
Access to Hugging Face
We also provide downloads on Hugging Face, in both transformers and native llama3 formats. To download the weights from Hugging Face, please follow these steps:
Read and accept the license. Once your request is approved, you’ll be granted access to all the Llama 3 models. Note that requests have historically taken up to one hour to be processed.
To download the original native weights to use with this repo, click on the “Files and versions” tab and download the contents of the original folder. You can also download them from the command line if you pip install huggingface-hub:
You can follow the steps below to get up and running with Llama 3 models quickly. These steps will let you run quick inference locally. For more examples, see the Llama Cookbook repository.
Clone and download this repository in a conda env with PyTorch / CUDA.
Replace Meta-Llama-3-8B-Instruct/ with the path to your checkpoint directory and Meta-Llama-3-8B-Instruct/tokenizer.model with the path to your tokenizer model.
The --nproc_per_node value should be set to the MP value for the model you are using.
Adjust the max_seq_len and max_batch_size parameters as needed.
This example runs the example_chat_completion.py found in this repository, but you can change that to a different .py file.
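The notes above can be sketched as a small helper that assembles a torchrun invocation. This is only an illustration: the helper name is hypothetical, the MP values come from the Inference section of this README, and the script flag names (--ckpt_dir, --tokenizer_path, --max_seq_len, --max_batch_size) as well as the defaults of 512 and 6 are assumptions that should be checked against the example script you run.

```python
import shlex

# MP values as listed in the Inference section of this README.
MP_VALUES = {"8B": 1, "70B": 8}


def build_torchrun_command(
    model_size: str,
    ckpt_dir: str,
    tokenizer_path: str,
    script: str = "example_chat_completion.py",
    max_seq_len: int = 512,    # assumed default, adjust to your hardware
    max_batch_size: int = 6,   # assumed default, adjust to your hardware
) -> list[str]:
    """Return the argv list for launching an example script with torchrun."""
    return [
        "torchrun",
        "--nproc_per_node", str(MP_VALUES[model_size]),
        script,
        "--ckpt_dir", ckpt_dir,                # assumed flag name
        "--tokenizer_path", tokenizer_path,    # assumed flag name
        "--max_seq_len", str(max_seq_len),
        "--max_batch_size", str(max_batch_size),
    ]


command = build_torchrun_command(
    "8B",
    "Meta-Llama-3-8B-Instruct/",
    "Meta-Llama-3-8B-Instruct/tokenizer.model",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting issues.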
Inference
Different models require different model-parallel (MP) values:
| Model | MP |
|-------|----|
| 8B    | 1  |
| 70B   | 8  |
All models support a sequence length of up to 8192 tokens, but the cache is pre-allocated according to the max_seq_len and max_batch_size values, so set those according to your hardware.
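To get a feel for how max_seq_len and max_batch_size drive the pre-allocated cache, the sketch below estimates the size of a key/value cache. It assumes one key tensor and one value tensor per layer, each of shape (max_batch_size, max_seq_len, local KV heads, head dimension). The architecture figures in the usage lines (32 layers, 8 key/value heads, head dimension 128, 2-byte elements) are assumptions for an 8B-class model and are not stated in this README.

```python
def kv_cache_bytes(
    max_batch_size: int,
    max_seq_len: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_element: int = 2,      # assumption: 16-bit cache entries
    model_parallel_size: int = 1,
) -> int:
    """Estimate the per-device memory of a pre-allocated key/value cache."""
    # With model parallelism, each device holds only its share of the KV heads.
    local_kv_heads = n_kv_heads // model_parallel_size
    # The factor 2 covers one key tensor and one value tensor per layer.
    per_layer = 2 * max_batch_size * max_seq_len * local_kv_heads * head_dim
    return n_layers * per_layer * bytes_per_element


# Assumed 8B-class configuration: 32 layers, 8 KV heads, head dimension 128.
small = kv_cache_bytes(max_batch_size=6, max_seq_len=512,
                       n_layers=32, n_kv_heads=8, head_dim=128)
large = kv_cache_bytes(max_batch_size=1, max_seq_len=8192,
                       n_layers=32, n_kv_heads=8, head_dim=128)
print(f"{small / 2**20:.0f} MiB, {large / 2**20:.0f} MiB")
```

The estimate grows linearly in both parameters, so halving either one halves the cache. Model weights and activations come on top of this figure.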
Pretrained Models
These models are not fine-tuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt.
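One way to phrase a task as a natural continuation is a few-shot prompt that ends exactly where the answer should begin. The helper below is a hypothetical sketch of that idea. The function name, the "=>" separator, and the translation examples are illustrative choices and are not prescribed by this README.

```python
def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt whose most likely continuation is the answer to `query`."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # End on an open pattern so the model completes it instead of chatting.
    lines.append(f"{query} =>")
    return "\n".join(lines)


prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

Because the prompt stops right after the final "=>", a pretrained model is likely to continue with the translation rather than with unrelated text.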
See example_text_completion.py for some examples. To illustrate, see the command below to run it with the llama-3-8b model (nproc_per_node needs to be set to the MP value):
Instruction-tuned Models
The fine-tuned models were trained for dialogue applications. To get the expected features and performance from them, the specific formatting defined in ChatFormat needs to be followed: the prompt begins with a <|begin_of_text|> special token, after which one or more messages follow. Each message starts with the <|start_header_id|> tag, the role (system, user, or assistant), and the <|end_header_id|> tag. After a double newline \n\n, the message’s contents follow. The end of each message is marked by the <|eot_id|> token.
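The layout described above can be sketched at the string level as follows. This is not the repository's ChatFormat class, which works on token IDs. The function name is hypothetical, and two behaviors are assumptions not stated in the text: message contents are stripped of surrounding whitespace, and an open assistant header is appended so the model knows to answer next.

```python
ROLES = ("system", "user", "assistant")


def render_chat_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render a list of {'role', 'content'} messages in the Llama 3 chat layout."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        role = message["role"]
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        # Header with the role, then a double newline, then the contents.
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n")
        parts.append(message["content"].strip())  # assumption: contents are stripped
        parts.append("<|eot_id|>")
    if add_generation_prompt:
        # Assumption: leave an open assistant header for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


rendered = render_chat_prompt([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": " Hi "},
])
print(rendered)
```

In real use, the special tokens must be encoded as single token IDs by the tokenizer rather than as plain text, so treat the rendered string as a way to inspect the layout only.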
You can also deploy additional classifiers to filter out inputs and outputs that are deemed unsafe. See the llama-cookbook repo for an example of how to add a safety checker to the inputs and outputs of your inference code.
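A minimal sketch of such a filter is shown below, assuming you already have a generation function and some classifier that flags unsafe text. Both callables, the function name, and the refusal message are hypothetical placeholders. They do not correspond to a specific API in this repository or in llama-cookbook.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."  # placeholder wording


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    refusal: str = REFUSAL,
) -> str:
    """Run `generate` only on safe prompts and return only safe responses."""
    # Check the input before spending any compute on generation.
    if is_unsafe(prompt):
        return refusal
    response = generate(prompt)
    # Check the output as well, since a safe prompt can still yield unsafe text.
    if is_unsafe(response):
        return refusal
    return response


# Toy stand-ins for a real model and a real safety classifier.
def echo_model(prompt: str) -> str:
    return f"You said: {prompt}"


def keyword_classifier(text: str) -> bool:
    return "forbidden" in text.lower()


print(guarded_generate("hello", echo_model, keyword_classifier))
```

Checking both sides keeps the wrapper independent of the model, so the classifier can be swapped without touching the inference code.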
To use with transformers, the following pipeline snippet will download and cache the weights:
Quick Start
You can follow the steps below to get up and running with Llama 3 models quickly. These steps will let you run quick inference locally. For more examples, see the Llama Cookbook repository.
Clone and download this repository in a conda env with PyTorch / CUDA.
In the top-level directory run:
Visit the Meta Llama website and register to download the model(s).
Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.
Once you get the email, navigate to your downloaded llama repository and run the download.sh script.
Once the model(s) you want have been downloaded, you can run the model locally using the command below:
Examples using llama-3-8b-chat:
Llama 3 is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. To help developers address these risks, we have created the Responsible Use Guide.
Issues
Please report any software “bug” or other problems with the models through one of the following means:
Model Card
See MODEL_CARD.md.
License
Our model and weights are licensed for researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals and industry through this opportunity while fostering an environment of discovery and ethical AI advancements.
See the LICENSE file, as well as our accompanying Acceptable Use Policy.
Questions
For common questions, see the FAQ, which will be updated over time as new questions arise.