This is a single consolidated guide for running local models with Ollama and connecting tools and agents (especially Goose) to your local Ollama server. It merges the core Desktop, CLI, and Goose workflows into one practical "happy path + operations" reference.

When to use this combined guide.
Mental model.
ollama --version
Pick one of these normal local patterns:
Run ollama serve yourself in a terminal or through a service manager.
For direct terminal hosting:
ollama serve
That starts the local API on http://localhost:11434.
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama pull downloads the model without opening a chat.
ollama run downloads it if needed and starts an interactive local session.
Check the server itself:
curl http://localhost:11434/api/version
Check inference:
curl http://localhost:11434/api/generate \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3.5:3b","prompt":"Say hello from local Ollama","stream":false}'
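Hand-written JSON in a curl command breaks as soon as the prompt contains a double quote or a backslash. The sketch below builds the same request body from shell variables. The helper names json_escape and generate_payload are hypothetical, and the escaping assumes the prompt has no newlines or other control characters.

```shell
#!/bin/sh
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumes the input has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print a non-streaming /api/generate request body
# for the given model name ($1) and prompt ($2).
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Print the body so it can be inspected before sending it anywhere.
generate_payload "qwen3.5:3b" 'Say "hello" from local Ollama'
```

To send the body, pipe it into curl with -d @- so curl reads the data from standard input, for example: generate_payload "qwen3.5:3b" "Say hello" | curl -s http://localhost:11434/api/generate -H 'Content-Type: application/json' -d @-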
If you use tools that expect OpenAI-compatible APIs, Ollama also exposes:
curl http://localhost:11434/v1/models
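In scripts it can help to turn the version check into a single value. The sketch below assumes the /api/version response is a small flat JSON object with a "version" field; extract_version is a hypothetical helper, and the sample body is illustrative rather than real server output.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON body on stdin and print its "version" value.
# A sed pattern is enough for a flat object; prefer a real JSON parser
# (for example jq) for anything nested.
extract_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Illustrative body only; the version number here is made up.
sample='{"version":"0.0.0"}'
printf '%s\n' "$sample" | extract_version
```

Against a running server, the same helper can be fed from curl: curl -fsS http://localhost:11434/api/version | extract_version. With -f, curl exits non-zero on an HTTP error, and it also fails if nothing is listening on the port, so the pipeline doubles as a basic health check.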
Start the server: ollama serve
Pull a model: ollama pull qwen3.5:3b
Run it: ollama run qwen3.5:3b
Verify the API: curl http://localhost:11434/api/version
Point tools at: http://localhost:11434 or http://localhost:11434/v1
Model name: qwen3.5:3b
Fast path
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
This is the simplest setup if you want Ollama always available but still want terminal control.
Use the terminal when you want explicit control over everything:
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama ps
This is the better fit for headless machines, scripted workflows, and debugging.
ollama ls
ollama ps
ollama pull
ollama show
ollama stop
ollama rm
Typical pattern:
ollama ls to see what is installed
ollama ps to see what is loaded in memory
ollama show to inspect size and family details
ollama stop to unload a model
ollama rm to reclaim disk space
Use a Modelfile when you want a reusable local model preset:
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
Build and run it:
ollama create my-local-coder -f Modelfile
ollama run my-local-coder
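If the preset should be reproducible from a script, the Modelfile can be generated instead of edited by hand. This is a minimal sketch: write_modelfile is a hypothetical helper, the temporary directory is only for illustration, and the build step is left commented out so the script has no side effects.

```shell
#!/bin/sh
# Hypothetical helper: write the preset from this guide into <dir>/Modelfile
# and print the resulting path.
write_modelfile() {
  dir=$1
  mkdir -p "$dir" || return 1
  cat > "$dir/Modelfile" <<'EOF'
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
EOF
  printf '%s\n' "$dir/Modelfile"
}

# Use a scratch directory for the example.
workdir=$(mktemp -d)
path=$(write_modelfile "$workdir")
echo "wrote $path"

# Build step from this guide, commented out so the sketch changes nothing:
# ollama create my-local-coder -f "$path"
```

Keeping the Modelfile contents in a quoted here-document means the shell does not expand anything inside it, so the file on disk matches the text exactly.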
Many local tools can connect through Ollama's OpenAI-compatible endpoint:
http://localhost:11434/v1/
Example:
export OPENAI_BASE_URL=http://localhost:11434/v1/
export OPENAI_API_KEY=ollama
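Clients differ on whether they want a trailing slash on the base URL, and a doubled /v1 suffix is an easy mistake when the value is assembled from other settings. The sketch below normalizes a base URL to end in exactly one /v1 with no trailing slash. openai_base_url is a hypothetical helper, and whether a given tool prefers the trailing slash is something to check in that tool's own documentation.

```shell
#!/bin/sh
# Hypothetical helper: turn an Ollama base URL into an OpenAI-style base URL
# that ends in exactly one "/v1". Defaults to the local server address.
openai_base_url() {
  base=${1:-http://localhost:11434}
  # Strip any trailing slashes.
  while [ "${base%/}" != "$base" ]; do
    base=${base%/}
  done
  # Append /v1 unless the caller already included it.
  case $base in
    */v1) ;;
    *) base=$base/v1 ;;
  esac
  printf '%s\n' "$base"
}

OPENAI_BASE_URL=$(openai_base_url "http://localhost:11434/")
# Placeholder key, same value as in the exports above.
OPENAI_API_KEY=ollama
export OPENAI_BASE_URL OPENAI_API_KEY
echo "$OPENAI_BASE_URL"
```

If a tool does require the trailing slash, append it when exporting rather than inside the helper, so the normalization stays predictable.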
Linux example:
sudo systemctl enable ollama
sudo systemctl start ollama
journalctl -u ollama -f
Use Docker when you want isolation or a predictable local deployment:
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it ollama ollama run qwen3.5:3b
The safe default is local-only on 127.0.0.1:11434.
If you intentionally need LAN access:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
Only do this on trusted networks, and preferably behind a proxy with auth.
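A small preflight check can catch an accidental non-local bind before the server starts. This is a sketch under simple assumptions: is_loopback_bind is a hypothetical helper, it only inspects the text of the OLLAMA_HOST value, and it treats anything it does not recognize as loopback (including a bare port) as exposed.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given OLLAMA_HOST value binds to
# a loopback address. An empty value counts as the local-only default.
is_loopback_bind() {
  host=${1:-127.0.0.1:11434}
  # Drop an optional scheme prefix.
  host=${host#http://}
  host=${host#https://}
  case $host in
    127.*|localhost|localhost:*|"[::1]"|"[::1]":*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_loopback_bind "${OLLAMA_HOST:-}"; then
  echo "local-only bind"
else
  echo "warning: OLLAMA_HOST=$OLLAMA_HOST may be reachable from other machines" >&2
fi
```

The check is deliberately conservative: it is meant as a reminder in a launch script, not as a replacement for a firewall or an authenticating proxy.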