Ollama Local Hosting — CLI + Desktop Guide

Index

Overview
Setup
Beginner usage
Pro usage
Cost savings guide
Privacy guide
Security guide
Appendix
Troubleshooting quick hits

Overview

When to use this combined guide.

You want one local-only Ollama reference instead of separate desktop and terminal docs.
You use the desktop app sometimes but still want CLI control for scripting and diagnostics.
You want a single workflow for hosting models that local agents, editors, and scripts can reuse.

Mental model.

The desktop app is the easiest way to keep Ollama running in the background.
The CLI is the control surface for pulling models, inspecting them, building custom models, and testing the API.
The local API is the shared interface your other tools use.

Setup

1) Install Ollama

Install the Ollama desktop app or your platform's preferred package.
Verify the install:

bash

ollama --version

2) Choose how you run it

Pick one of these normal local patterns:

Desktop-first: open the Ollama app and let it keep the local server running in the background.
CLI-first: run ollama serve yourself in a terminal or through a service manager.

For direct terminal hosting:

bash

ollama serve

That starts the local API on http://localhost:11434.
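
If the desktop app is already running, a second ollama serve will typically fail because the port is taken. The sketch below, which assumes curl is installed, resolves the base URL from OLLAMA_HOST (falling back to the default 127.0.0.1:11434) and starts a background server only when nothing answers. The function names ollama_base_url and ensure_ollama, and the log file path, are hypothetical choices for illustration.

```shell
# Print the base URL. Assumes OLLAMA_HOST is either host:port or a full URL.
ollama_base_url() {
  _host="${OLLAMA_HOST:-127.0.0.1:11434}"
  case "$_host" in
    *://*) printf '%s\n' "$_host" ;;
    *)     printf 'http://%s\n' "$_host" ;;
  esac
}

# Hypothetical helper: start `ollama serve` only if no server answers.
ensure_ollama() {
  _url="$(ollama_base_url)"
  if curl -fsS --max-time 2 "$_url/api/version" >/dev/null 2>&1; then
    echo "Ollama is already running at $_url"
  else
    echo "Starting ollama serve in the background"
    nohup ollama serve >"${TMPDIR:-/tmp}/ollama-serve.log" 2>&1 &
  fi
}
```

Call ensure_ollama at the top of a script that needs the API; it leaves a desktop-managed server untouched.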

3) Pull and run a starter model

bash

ollama pull qwen3.5:3b
ollama run qwen3.5:3b

ollama pull downloads the model without opening a chat.
ollama run downloads it if needed and starts an interactive local session.

4) Verify the local API

Check the server itself:

bash

curl http://localhost:11434/api/version

Check inference:

bash

curl http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.5:3b","prompt":"Say hello from local Ollama","stream":false}'
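
Hand-written JSON like the request above breaks as soon as the prompt contains a double quote. As a minimal sketch, the helpers below escape backslashes and double quotes before building the payload. They do not handle newlines or control characters, so a real JSON tool is the safer choice for arbitrary text. The function names and the OLLAMA_URL variable are hypothetical.

```shell
# Escape backslashes and double quotes so the value can sit inside a JSON string.
# Limitation: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the /api/generate request body for a model ($1) and a prompt ($2).
build_generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Hypothetical wrapper: send the payload to the local server.
ollama_generate() {
  curl -fsS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_generate_payload "$1" "$2")"
}
```

Usage: ollama_generate qwen3.5:3b 'Say "hello" from local Ollama'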

If you use tools that expect OpenAI-compatible APIs, Ollama also exposes:

bash

curl http://localhost:11434/v1/models

Beginner usage

Start Ollama with either the desktop app or ollama serve.
Pull a model: ollama pull qwen3.5:3b
Sanity-check it locally with ollama run qwen3.5:3b
Verify the API with curl http://localhost:11434/api/version
Point your agent or tool at http://localhost:11434 (native API) or http://localhost:11434/v1 (OpenAI-compatible API)
Use the exact local model name, for example qwen3.5:3b

Fast path

bash

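# Terminal 1: start the server (skip this if the desktop app is already running)
ollama serve
# Terminal 2: download the model, then chat with it
ollama pull qwen3.5:3b
ollama run qwen3.5:3b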
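
When the fast path is scripted, the pull can race the server startup. One way to avoid that, sketched here under the assumption that curl is available, is to poll /api/version until it answers. The function name wait_for_ollama and the default of 30 one-second attempts are illustrative, not Ollama defaults.

```shell
# Poll the version endpoint until it responds.
# $1 = base URL (default http://localhost:11434), $2 = max attempts (default 30).
# Returns 0 once the server answers, 1 if it never does.
wait_for_ollama() {
  _url="${1:-http://localhost:11434}"
  _attempts="${2:-30}"
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    if curl -fsS --max-time 2 "$_url/api/version" >/dev/null 2>&1; then
      return 0
    fi
    _i=$((_i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   ollama serve >/dev/null 2>&1 &
#   wait_for_ollama && ollama pull qwen3.5:3b
```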

Pro usage

Daily desktop workflow

Launch the Ollama app when you log in.
Let it keep the server running in the background.
Use the CLI when you need to pull, inspect, stop, or remove models.
Point your editors and agents at the same local API instead of running separate model hosts.

This is the simplest setup if you want Ollama always available but still want terminal control.

Daily CLI workflow

Use the terminal when you want explicit control over everything:

bash

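# Terminal 1: run the server in the foreground
ollama serve
# Terminal 2: pull, chat, then list what is loaded in memory
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama ps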

This is the better fit for headless machines, scripted workflows, and debugging.

Model lifecycle

bash

ollama ls
ollama ps
ollama pull <model>
ollama show <model>
ollama stop <model>
ollama rm <model>

Typical pattern:

ollama ls to see what is installed
ollama ps to see what is loaded in memory
ollama show to inspect size and family details
ollama stop to unload a model
ollama rm to reclaim disk space
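
For scripting, the model names are usually the only part of the ollama ls listing you need. The sketch below assumes the output is a header row followed by one row per model with the name in the first column; check that against your installed version. The function name model_names is hypothetical.

```shell
# Read `ollama ls` output on stdin and print one model name per line.
# Assumes a header row, then rows whose first whitespace-separated field is the name.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage:
#   ollama ls | model_names
```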

Custom models with Modelfile

Use a Modelfile when you want a reusable local model preset:

text

FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.

Build and run it:

bash

ollama create my-local-coder -f Modelfile
ollama run my-local-coder
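
If you keep several presets, generating the Modelfile from a script can be easier than editing it by hand. This is a minimal sketch that writes the same three instructions shown above. It assumes a single-line system prompt, and the function name render_modelfile is hypothetical.

```shell
# Print a three-line Modelfile.
# $1 = base model, $2 = temperature, $3 = single-line system prompt.
render_modelfile() {
  printf 'FROM %s\nPARAMETER temperature %s\nSYSTEM %s\n' "$1" "$2" "$3"
}

# Usage:
#   render_modelfile qwen3.5:3b 0.1 "You are a terse local coding assistant." > Modelfile
#   ollama create my-local-coder -f Modelfile
```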

OpenAI-compatible clients

Many local tools can connect through Ollama's OpenAI-compatible endpoint:

Base URL: http://localhost:11434/v1/
API key: often required by the client, but ignored by Ollama
Model: the exact local model name you installed

Example:

bash

export OPENAI_BASE_URL=http://localhost:11434/v1/
export OPENAI_API_KEY=ollama
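
To confirm those two variables work before wiring up a real client, you can send one request to the chat completions route by hand. The sketch below assumes curl is installed and that the model name contains no characters that need JSON escaping. The helper names join_url and chat_hello are hypothetical.

```shell
# Join a base URL and a path without doubling the slash between them.
join_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Hypothetical helper: send one fixed user message to the model named in $1.
chat_hello() {
  curl -fsS "$(join_url "${OPENAI_BASE_URL:-http://localhost:11434/v1/}" chat/completions)" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${OPENAI_API_KEY:-ollama}" \
    -d "{\"model\":\"$1\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello\"}]}"
}

# Usage:
#   chat_hello qwen3.5:3b
```

Stripping the trailing slash in join_url matters because the base URL above ends in a slash, and some clients append paths that start with one.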

Background service patterns

macOS/Windows: the desktop app is usually the easiest background service.
Linux: a system service is usually cleaner than a long-lived shell.

Linux example:

bash

sudo systemctl enable ollama
sudo systemctl start ollama
journalctl -u ollama -f

Docker hosting

Use Docker when you want isolation or a predictable local deployment:

bash

docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it ollama ollama run qwen3.5:3b
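
By default, Docker publishes -p 11434:11434 on every host interface, so the container's API can be reached from other machines. If you want the container to stay local-only, one option is to bind the published port to the loopback address. The sketch below only prints the command so you can review it first; the function name ollama_docker_cmd is hypothetical.

```shell
# Print a docker run command whose published port is bound to one host address.
# $1 = host address to bind (default 127.0.0.1, which keeps the API local-only).
ollama_docker_cmd() {
  printf 'docker run -d --name ollama -p %s:11434:11434 -v ollama:/root/.ollama ollama/ollama\n' \
    "${1:-127.0.0.1}"
}

# Usage: review the printed command, then run it.
#   ollama_docker_cmd
#   sh -c "$(ollama_docker_cmd)"
```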

LAN access (use sparingly)

The safe default is local-only on 127.0.0.1:11434.

If you intentionally need LAN access:

bash

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Only do this on trusted networks, and preferably behind a proxy with auth.
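
A small guard can make the LAN bind an explicit decision rather than a leftover environment variable. The sketch below assumes OLLAMA_HOST is written as host:port without a scheme, and it refuses any non-loopback bind unless ALLOW_LAN=1 is set. Both function names and the ALLOW_LAN variable are hypothetical, not part of Ollama.

```shell
# Return 0 if the bind address ($1, host:port) is a loopback address, 1 otherwise.
is_loopback_bind() {
  case "${1%:*}" in
    127.*|localhost|'[::1]') return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse a non-loopback bind unless ALLOW_LAN=1 is set.
serve_guarded() {
  _bind="${OLLAMA_HOST:-127.0.0.1:11434}"
  if ! is_loopback_bind "$_bind" && [ "${ALLOW_LAN:-0}" != "1" ]; then
    echo "Refusing to bind $_bind without ALLOW_LAN=1" >&2
    return 1
  fi
  OLLAMA_HOST="$_bind" ollama serve
}
```

Anything the check does not recognise as loopback is treated as exposed, so an unusual address format fails closed.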

Cost savings guide

Use smaller local models for routine tasks and larger ones only when needed.
Pre-pull the models you use often so the first request does not stall on a download.
Run one heavy model at a time on constrained machines.
Reuse one local Ollama host across your agents, editors, and scripts.
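
Pre-pulling is easy to script. The sketch below compares a wanted list against the installed model names and prints only what is missing, so repeated runs do not pull again. It assumes names are compared exactly as ollama ls prints them, tag included; the function name missing_models is hypothetical.

```shell
# Read installed model names on stdin (one per line) and print each
# wanted name (given as arguments) that is not in that list.
missing_models() {
  _installed="$(cat)"
  for _wanted in "$@"; do
    if ! printf '%s\n' "$_installed" | grep -qxF -e "$_wanted"; then
      printf '%s\n' "$_wanted"
    fi
  done
}

# Usage: pull only the models that are not installed yet.
#   ollama ls | awk 'NR > 1 { print $1 }' | missing_models qwen3.5:3b |
#     while read -r m; do ollama pull "$m"; done
```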

Privacy guide
