Building a Fully Local, Privacy-First AI Chat with Ollama and Open WebUI
Sometimes I want full control over the tools I use to think, build, and create. Not most of it. Total, end-to-end control.
That’s where Ollama and Open WebUI come in. This is my go-to local AI chat setup: a fully local, OpenAI-style interface that runs quietly on your machine without whispering a word to the cloud.
What I Use This For
This isn’t just an academic exercise in privacy. I use this:
- To draft content (like this post)
- To lint, test, and document code
- To chat with my documents
- As a scratchpad for ideas, code snippets, and architectural patterns
It feels like having a partner on hand who never leaves the room. It’s fast, runs on my hardware, and doesn’t phone home. That last part matters more and more.
Getting the Stack Running
You’ll need two components:
- Ollama: The engine that runs LLMs locally
- Open WebUI: The polished, responsive interface for chatting with them
This stack works beautifully on macOS, Linux, and even Windows via WSL2.
Requirements
- 8GB RAM minimum, 16GB+ ideal
- Decent CPU or Apple Silicon / GPU for acceleration
- Basic terminal literacy
- WebSocket support is required for Open WebUI to function correctly
Step 1: Install Ollama
Ollama handles downloading and running models locally. It’s shockingly simple.
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Once installed:
ollama run llama3
This kicks off the download and drops you into an interactive prompt.
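Before moving on, it’s worth confirming the local API is listening. Ollama serves a small HTTP API on port 11434 by default, which is the same endpoint Open WebUI will talk to:
# quick sanity check: returns a JSON list of the models you've pulled
curl http://localhost:11434/api/tags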
Step 2: Set Up Open WebUI
If Ollama is the engine, Open WebUI is the race car. It provides the clean, familiar chat interface for interacting with your models, but without the data trail.
Recommended: Docker-Based Installation
This is the approach suggested by the Open WebUI maintainers and the one I’ve come to rely on for a clean, repeatable setup.
1. Pull the Docker image
docker pull ghcr.io/open-webui/open-webui:main
2. Run the container
docker run -d -p 3000:8080 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
This spins up Open WebUI and binds it to localhost:3000. The named volume ensures your settings and chats persist.
3. Open your browser
http://localhost:3000
Create your admin account on first launch. Done.
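If the page doesn’t come up, two quick checks usually tell me why:
# is the container actually running?
docker ps --filter name=openwebui
# watch the startup logs for errors
docker logs -f openwebui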
Optional Configs
Enable GPU Support (if you’ve got an NVIDIA card):
docker run -d -p 3000:8080 \
--gpus all \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:cuda
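One caveat: --gpus all only works if the NVIDIA Container Toolkit is installed on the host. A quick sanity check before blaming the image (the plain ubuntu test is my habit, not an official recipe):
# host driver is healthy
nvidia-smi
# the toolkit can expose the GPU to containers
docker run --rm --gpus all ubuntu nvidia-smi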
Disable Auth for Single-User Mode (great for personal rigs):
docker run -d -p 3000:8080 \
-e WEBUI_AUTH=False \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
Connect to Remote Ollama:
docker run -d -p 3000:8080 \
-e OLLAMA_BASE_URL=http://<ollama-host>:11434 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
Replace <ollama-host> with your actual host IP or Docker bridge name.
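If Ollama is running on the Docker host itself, host.docker.internal usually resolves out of the box on macOS and Windows; on Linux you can map it explicitly. A sketch of the variant I use (same flags as above, just with the host mapping added):
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main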
Want to go further? You can wire this into a Docker Compose setup or reverse proxy it through Nginx. But this will get you up and running fast.
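For reference, here’s roughly what the Compose route looks like. This is a minimal sketch I keep around, not an official file, and the service and volume names are my own choices:
# write a minimal docker-compose.yml and start the stack
cat > docker-compose.yml <<'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    restart: always
volumes:
  open-webui:
EOF
docker compose up -d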
Pulling New Models
Want a coding-specific model? One tuned for reasoning? Just pull another:
ollama run codellama
ollama run mistral
ollama run gemma
ollama run your-mom
....
Note: Available models are listed in the Ollama library at ollama.com/library.
Open WebUI detects the models and lets you switch between them via the dropdown in the sidebar.
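If you just want to fetch a model without dropping into an interactive prompt, ollama pull does only the download:
ollama pull mistral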
A Few Use Cases I Keep Coming Back To
Code Review Companion
Drop in a code snippet and ask for clarity, edge case coverage, or performance optimizations. I’ve had it generate tests that caught things I missed.
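The pattern is simple enough to run straight from the terminal, too. Here’s the shape of it (the file path is just an example):
ollama run codellama "Review this function for edge cases and suggest tests I might be missing: $(cat src/parse_config.py)"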
Shell Task Generator
Ask for scripts. Really. It’s great at wiring together bash, systemctl, ufw… well, you name it.
curl http://localhost:11434/api/generate \
-d '{"model": "llama3", "prompt": "Write a cron job to clean /tmp every hour."}'
Documentation Generator
Paste your raw TypeScript and let it create JSDoc, full markdown doc pages, or even API reference notes.
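Same trick, different prompt; the path below is purely illustrative:
ollama run llama3 "Add JSDoc comments to every exported function in this TypeScript file and return only the annotated code: $(cat src/api.ts)"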
Commit Linting Assistant
Hook it into a Git commit hook to validate messages against conventional rules or your custom format.
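Here’s a rough sketch of what that hook can look like, assuming jq is installed, llama3 is pulled, and Ollama is on its default port; the PASS/FAIL convention and the prompt wording are my own:
#!/usr/bin/env sh
# .git/hooks/commit-msg - ask the local model to vet the commit message (sketch)
MSG=$(cat "$1")
# build the JSON payload safely, then ask for a single non-streamed response
PAYLOAD=$(jq -n --arg msg "$MSG" \
  '{model: "llama3", stream: false,
    prompt: ("Reply with exactly PASS or FAIL. Does this commit message follow Conventional Commits?\n\n" + $msg)}')
VERDICT=$(curl -s http://localhost:11434/api/generate -d "$PAYLOAD" | jq -r '.response')
case "$VERDICT" in
  *PASS*) exit 0 ;;
  *) echo "commit-msg hook: model flagged this message: $VERDICT"; exit 1 ;;
esac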
Tips From Real Use
- Prefer smaller quantized models like llama3:8b-q4_0 for performance
- Use ollama list to inspect installed models
- Clear up old models with ollama rm model-name
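For reference, the housekeeping commands from that list (the model name is just an example):
# what's installed and how big each model is
ollama list
# reclaim disk space from a model you no longer need
ollama rm codellama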
And if you’re paranoid like I am:
sudo ufw deny out to any from 127.0.0.1 port 11434
That’ll make sure nothing ever leaks even if the stack gets curious.
Why I Stick With It
I iterate faster with this setup than I ever did using cloud tools. There’s something freeing about having an LLM on tap with no rate limits, no UI latency, and no worry that a half-baked idea is being logged somewhere for future training fodder.
I can throw ideas at it without hesitation: malformed prompts, ugly code, speculative concepts. It’s like a whiteboard that answers back.
When something clicks, I turn it into a script, a library, or just a line of code that works a little better than before. And the whole time, I know the conversation stays local. That matters when your experiments are product ideas or internal IP.
This stack isn’t just a replacement for ChatGPT; it’s a sandbox that respects boundaries, encourages risk-free exploration, and rewards the habit of thinking out loud.
There’s power in knowing exactly what your tools are doing.
-Rob