Large Language Models (LLMs) are becoming increasingly accessible, and platforms like Ollama make it easier than ever to run these powerful models locally. Combining Ollama with Docker provides a clean and portable way to manage your LLM environment, and using an API client like Bruno allows you to interact with your running models effortlessly.
This guide will walk you through setting up Ollama within Docker and using Bruno to send requests to your local LLM.
Docker offers several benefits when running Ollama:
1. Isolation: Keeps your Ollama installation and its dependencies separate from your host system, preventing conflicts.
2. Portability: Easily move your Ollama environment to different machines.
3. Consistency: Ensures your Ollama setup is the same every time you run it.
4. Dependency Management: Simplifies managing the dependencies required by Ollama.
Bruno is a popular, open-source, Git-friendly API client that provides a clean and intuitive way to design and test APIs. It stores API collections in a plain-text markup language (Bru), making them easy to version and share. For interacting with Ollama's API, Bruno offers a user-friendly interface for constructing and sending requests.
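As a rough illustration of that Bru format, a saved request for Ollama's generate endpoint might look something like the sketch below (field names and layout can vary slightly between Bruno versions):

meta {
  name: Generate
  type: http
  seq: 1
}

post {
  url: http://localhost:11434/api/generate
  body: json
}

body:json {
  {
    "model": "tinyllama",
    "prompt": "Why is the sky blue?",
    "stream": false
  }
}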
Before you begin, make sure you have the following installed:
1. Docker
2. Bruno
Ollama provides official Docker images, making it straightforward to get started.
1. Pull the Ollama Docker Image: Open your terminal and run the following command:
docker pull ollama/ollama
2. Run the Ollama Container: Now, start the Ollama container. We'll map port 11434 on your host machine to the same port in the container, as this is the default port Ollama listens on. We'll also use a named volume to persist the models you download.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
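Optionally, confirm the container is up before pulling a model; Ollama's root endpoint responds with a short status message:
curl http://localhost:11434
You should see "Ollama is running" in the response.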
3. Pull a Model: With the container running, download a model. We'll use tinyllama, a small model that downloads quickly:
docker exec -it ollama ollama pull tinyllama
This will download the tinyllama model into the persistent volume.
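To double-check that the model is available, you can list the models inside the container or query Ollama's /api/tags endpoint:
docker exec -it ollama ollama list
curl http://localhost:11434/api/tags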
4. Create the Request in Bruno: Open Bruno, create a new request, set the method to POST and the URL to http://localhost:11434/api/generate, then add the following JSON body:
{
  "model": "tinyllama",
  "prompt": "What is Bruno, the API client?",
  "stream": false
}
5. Send the Request: Click the "Send" button.
You should see the response from the tinyllama model in the "Response" panel of Bruno.
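If you prefer to sanity-check the endpoint outside Bruno, the same request can be sent from the terminal with curl:
curl http://localhost:11434/api/generate -d '{"model": "tinyllama", "prompt": "What is Bruno, the API client?", "stream": false}'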
Ollama's API has other useful endpoints you can interact with using Bruno, such as /api/chat for multi-turn conversations, /api/tags for listing local models, and /api/embeddings for generating embeddings.
The JSON body for the /api/chat endpoint is more complex and involves an array of messages. Refer to the Ollama API documentation for the exact structure.
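As a rough illustration, a minimal chat request body might look like this (using the same tinyllama model; check the Ollama docs for the full set of supported fields):
{
  "model": "tinyllama",
  "messages": [
    { "role": "user", "content": "What is Bruno, the API client?" }
  ],
  "stream": false
}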
Running Ollama in Docker provides a robust and maintainable environment for your local LLMs. Coupled with Bruno's intuitive API client, you have a powerful setup for experimenting with and integrating LLMs into your workflows. This approach allows you to easily manage different models, test prompts, and build applications that leverage the capabilities of these powerful language models.
Learn more about Bruno: https://www.usebruno.com