Ollama
Get up and running with large language models.
macOS
Download
Windows
Download
Linux
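Install with the official install script:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```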
Manual install instructions
Docker
The official Ollama Docker image `ollama/ollama` is available on Docker Hub.
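For example, a CPU-only container can be started with the command from the image's Docker Hub page:

```shell
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```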
Libraries

Community
Quickstart
To run and chat with Gemma 3:
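```shell
ollama run gemma3
```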
Model library
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Download |
| ------------------ | ------------------ |
| Gemma 3 (1B) | `ollama run gemma3:1b` |
| Gemma 3 | `ollama run gemma3` |
| Gemma 3 (12B) | `ollama run gemma3:12b` |
| Gemma 3 (27B) | `ollama run gemma3:27b` |
| QwQ | `ollama run qwq` |
| DeepSeek-R1 | `ollama run deepseek-r1` |
| DeepSeek-R1 (671B) | `ollama run deepseek-r1:671b` |
| Llama 4 Scout | `ollama run llama4:scout` |
| Llama 4 Maverick | `ollama run llama4:maverick` |
| Llama 3.3 | `ollama run llama3.3` |
| Llama 3.2 | `ollama run llama3.2` |
| Llama 3.2 (1B) | `ollama run llama3.2:1b` |
| Llama 3.2 Vision | `ollama run llama3.2-vision` |
| Llama 3.2 Vision (90B) | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | `ollama run llama3.1` |
| Llama 3.1 (405B) | `ollama run llama3.1:405b` |
| Phi 4 | `ollama run phi4` |
| Phi 4 Mini | `ollama run phi4-mini` |
| Mistral | `ollama run mistral` |
| Moondream | `ollama run moondream` |
| Neural Chat | `ollama run neural-chat` |
| Starling | `ollama run starling-lm` |
| Code Llama | `ollama run codellama` |
| Llama 2 Uncensored | `ollama run llama2-uncensored` |
| LLaVA | `ollama run llava` |
| Granite 3.3 | `ollama run granite3.3` |

Customize a model
Import from GGUF
Ollama supports importing GGUF models in the Modelfile:
1. Create a file named `Modelfile`, with a `FROM` instruction that points to the local filepath of the model you want to import.
2. Create the model in Ollama.
3. Run the model.

A sketch of these steps is shown below.
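The GGUF filename and the model name `example` here are illustrative:

```shell
# 1. Create a Modelfile pointing at a local GGUF file:
echo "FROM ./my-model.Q4_0.gguf" > Modelfile

# 2. Create the model in Ollama:
ollama create example -f Modelfile

# 3. Run the model:
ollama run example
```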
Import from Safetensors
See the guide on importing models for more information.
Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model, create a `Modelfile`, then create and run the new model, as sketched below.
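A minimal sketch; the system prompt and the model name `mario` are illustrative:

```shell
# Pull the base model:
ollama pull llama3.2

# Create a Modelfile that layers a system prompt on top of llama3.2:
cat > Modelfile <<'EOF'
FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
EOF

# Create and run the new model:
ollama create mario -f ./Modelfile
ollama run mario
```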
For more information on working with a Modelfile, see the Modelfile documentation.
CLI Reference
Create a model
`ollama create` is used to create a model from a Modelfile, for example `ollama create mymodel -f ./Modelfile` (the name `mymodel` is illustrative).

Pull a model
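```shell
ollama pull llama3.2
```

This command can also be used to update a local model; only the diff will be pulled.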
Remove a model
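```shell
ollama rm llama3.2
```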
Copy a model
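```shell
# The target name "my-model" is illustrative:
ollama cp llama3.2 my-model
```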
Multiline input
For multiline input, you can wrap text with `"""` at the start and end of your message.

Multimodal models

To include an image with a multimodal model such as LLaVA, pass the image path inside the prompt, e.g. `ollama run llava "What's in this image? ./art.png"` (the path is illustrative).
Pass the prompt as an argument
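```shell
ollama run llama3.2 "Summarize this file: $(cat README.md)"
```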
Show model information
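```shell
ollama show llama3.2
```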
List models on your computer
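```shell
ollama list
```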
List which models are currently loaded
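```shell
ollama ps
```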
Stop a model which is currently running
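```shell
ollama stop llama3.2
```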
Generate embeddings from the CLI
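One way to do this from a shell is via the documented `/api/embed` endpoint with `curl`; a sketch, assuming the server is running on the default port and using `all-minilm` as an illustrative embedding model:

```shell
curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": "Why is the sky blue?"
}'
```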
You can also pipe text for scripted workflows:
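```shell
# The filename is illustrative:
cat notes.txt | ollama run llama3.2 "Summarize this text"
```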
Start Ollama
`ollama serve` is used when you want to start Ollama without running the desktop application.

Building
See the developer guide
Running local builds
Next, start the server:
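```shell
./ollama serve
```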
Finally, in a separate shell, run a model:
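```shell
./ollama run llama3.2
```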
REST API
Ollama has a REST API for running and managing models.
Generate a response
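```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
```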
Chat with a model
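```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```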
See the API documentation for all endpoints.
Community Integrations
Web & Desktop
Cloud
Tutorial
Terminal
Apple Vision Pro
Database
Package managers
Libraries
Mobile
Extensions & Plugins
Supported backends
Observability
Security