Building a Fully Local, Privacy-First AI Chat with Ollama and Open WebUI
Sometimes I want full control over the tools I use to think, build, and create. Not most of it. Total, end-to-end control.
That’s where Ollama and Open WebUI come in. This is my go-to local AI chat setup: a fully local, OpenAI-style interface that runs quietly on your machine without whispering a word to the cloud.
What I Use This For
This isn’t just an academic exercise in privacy. I use this:
- To draft content (like this post)
- To lint, test, and document code
- To chat with my documents
- As a scratchpad for ideas, code snippets, and architectural patterns
It feels like having a partner on hand who never leaves the room. It’s fast, runs on my hardware, and doesn’t phone home. That last part matters more and more.
Getting the Stack Running
You’ll need two components:
- Ollama: The engine that runs LLMs locally
- Open WebUI: The polished, responsive interface for chatting with them
This stack works beautifully on macOS, Linux, and even Windows via WSL2.
Requirements
- 8GB RAM minimum, 16GB+ ideal
- Decent CPU or Apple Silicon / GPU for acceleration
- Basic terminal literacy
- WebSocket support is required for Open WebUI to function correctly
Step 1: Install Ollama
Ollama handles downloading and running models locally. It’s shockingly simple.
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Once installed:
ollama run llama3
This kicks off the download and drops you into an interactive prompt.
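Before moving on, it’s worth confirming the local API is listening. Ollama serves a small HTTP API on port 11434 by default, which is the same endpoint Open WebUI will talk to:
# quick sanity check: returns a JSON list of the models you've pulled
curl http://localhost:11434/api/tags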
Step 2: Set Up Open WebUI
If Ollama is the engine, Open WebUI is the race car. It provides the clean, familiar chat interface for interacting with your models, but without the data trail.
Recommended: Docker-Based Installation
This is the approach suggested by the Open WebUI maintainers and the one I’ve come to rely on for a clean, repeatable setup.
1. Pull the Docker image
docker pull ghcr.io/open-webui/open-webui:main
2. Run the container
docker run -d -p 3000:8080 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
This spins up Open WebUI and binds it to localhost:3000. The named volume ensures your settings and chats persist.
3. Open your browser
http://localhost:3000
Create your admin account on first launch. Done.
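If the page doesn’t come up, two quick checks usually tell me why:
# is the container actually running?
docker ps --filter name=openwebui
# watch the startup logs for errors
docker logs -f openwebui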
Optional Configs
Enable GPU Support (if you’ve got an NVIDIA card):
docker run -d -p 3000:8080 \
--gpus all \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:cuda
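One caveat: --gpus all only works if the NVIDIA Container Toolkit is installed on the host. A quick sanity check before blaming the image (the plain ubuntu test is my habit, not an official recipe):
# host driver is healthy
nvidia-smi
# the toolkit can expose the GPU to containers
docker run --rm --gpus all ubuntu nvidia-smi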
Disable Auth for Single-User Mode (great for personal rigs):
docker run -d -p 3000:8080 \
-e WEBUI_AUTH=False \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
Connect to Remote Ollama:
docker run -d -p 3000:8080 \
-e OLLAMA_BASE_URL=http://<ollama-host>:11434 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main
Replace <ollama-host> with your actual host IP or Docker bridge name.
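If Ollama is running on the Docker host itself, host.docker.internal usually resolves out of the box on macOS and Windows; on Linux you can map it explicitly. A sketch of the variant I use (same flags as above, just with the host mapping added):
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name openwebui \
--restart always \
ghcr.io/open-webui/open-webui:main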
Want to go further? You can wire this into a Docker Compose setup or reverse proxy it through Nginx. But this will get you up and running fast.
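For reference, here’s roughly what the Compose route looks like. This is a minimal sketch I keep around, not an official file, and the service and volume names are my own choices:
# write a minimal docker-compose.yml and start the stack
cat > docker-compose.yml <<'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    restart: always
volumes:
  open-webui:
EOF
docker compose up -d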
Pulling New Models
Want a coding-specific model? One tuned for reasoning? Just pull another:
ollama run codellama
ollama run mistral
ollama run gemma
ollama run your-mom
....
Note: Available models are listed in the Ollama library at ollama.com/library.
Open WebUI detects the models and lets you switch between them via the dropdown in the sidebar.
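If you just want to fetch a model without dropping into an interactive prompt, ollama pull does only the download:
ollama pull mistral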
A Few Use Cases I Keep Coming Back To
Code Review Companion
Drop in a code snippet and ask for clarity, edge case coverage, or performance optimizations. I’ve had it generate tests that caught things I missed.
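The pattern is simple enough to run straight from the terminal, too. Here’s the shape of it (the file path is just an example):
ollama run codellama "Review this function for edge cases and suggest tests I might be missing: $(cat src/parse_config.py)"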
Shell Task Generator
Ask for scripts. Really. It’s great at wiring together bash, systemctl, ufw… well, you name it.
curl http://localhost:11434/api/generate \
-d '{"model": "llama3", "prompt": "Write a cron job to clean /tmp every hour."}'
Documentation Generator
Paste your raw TypeScript and let it create JSDoc, full markdown doc pages, or even API reference notes.
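Same trick, different prompt; the path below is purely illustrative:
ollama run llama3 "Add JSDoc comments to every exported function in this TypeScript file and return only the annotated code: $(cat src/api.ts)"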
Commit Linting Assistant
Hook it into a Git commit hook to validate messages against conventional rules or your custom format.
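Here’s a rough sketch of what that hook can look like, assuming jq is installed, llama3 is pulled, and Ollama is on its default port; the PASS/FAIL convention and the prompt wording are my own:
#!/usr/bin/env sh
# .git/hooks/commit-msg - ask the local model to vet the commit message (sketch)
MSG=$(cat "$1")
# build the JSON payload safely, then ask for a single non-streamed response
PAYLOAD=$(jq -n --arg msg "$MSG" \
  '{model: "llama3", stream: false,
    prompt: ("Reply with exactly PASS or FAIL. Does this commit message follow Conventional Commits?\n\n" + $msg)}')
VERDICT=$(curl -s http://localhost:11434/api/generate -d "$PAYLOAD" | jq -r '.response')
case "$VERDICT" in
  *PASS*) exit 0 ;;
  *) echo "commit-msg hook: model flagged this message: $VERDICT"; exit 1 ;;
esac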
Tips From Real Use
- Prefer smaller quantized models like llama3:8b-q4_0 for performance
- Use ollama list to inspect installed models
- Clear up old models with ollama rm model-name
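For reference, the housekeeping commands from that list (the model name is just an example):
# what's installed and how big each model is
ollama list
# reclaim disk space from a model you no longer need
ollama rm codellama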
And if you’re paranoid like I am:
sudo ufw deny out to any from 127.0.0.1 port 11434
That’ll make sure nothing ever leaks even if the stack gets curious.
Why I Stick With It
I iterate faster with this setup than I ever did using cloud tools. There’s something freeing about having an LLM on tap with no rate limits, no UI latency, and no worry that a half-baked idea is being logged somewhere for future training fodder.
I can throw ideas at it without hesitation: malformed prompts, ugly code, speculative concepts. It’s like a whiteboard that answers back.
When something clicks, I turn it into a script, a library, or just a line of code that works a little better than before. And the whole time, I know the conversation stays local. That matters when your experiments are product ideas or internal IP.
This stack isn’t just a replacement for ChatGPT; it’s a sandbox that respects boundaries, encourages risk-free exploration, and rewards the habit of thinking out loud.
There’s power in knowing exactly what your tools are doing.
-Rob