How to Set Up a Local LLM with Ollama + Ngrok for Remote Access
2025-05-30 08:18
Chinese version of this post: 使用 Ollama + Ngrok 搭建本地 LLM,遠端存取 AI 模型教學
What This Guide Solves
AI tools like ChatGPT are super handy, but there’s always the concern that your work data might end up as training material. The recent buzz around DeepSeek and the bans it has faced in several countries only highlights these security worries.
I recently got a new Mac mini and decided to set up a local LLM (Large Language Model) for work. Once installed, I needed remote access too. After several failed attempts with NAT port forwarding, I opted for a much simpler method—Ngrok.
Installing and Using Ollama
Official site: https://ollama.com/
Just click “Download” to install it.
Once installed, you can download models like Llama, Phi, Gemma, and Mistral; most popular open-source models are available in the Ollama model library.
For reference, I’m using a Mac mini with:
- 10-core CPU
- 10-core GPU
- 16GB unified memory
Running Phi-4 14B is smooth, but larger models like Mistral Small 3 22B tend to lag.
Downloading a model is easy: find it on the site, copy the terminal command, and run it.

Ollama model download interface
Key Ollama commands:
- ollama serve – Start the Ollama server
- ollama run – Run a model
- ollama list – List all models
- ollama rm – Remove a model
- ollama create – Create a model from a Modelfile
- ollama show – Show model info
- ollama stop – Stop a running model
- ollama pull – Pull a model from Ollama
- ollama push – Push a model to Ollama
- ollama ps – Show running models
- ollama cp – Copy a model
- ollama help – Show help info
You’ll most often use the first four.
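For example, a typical first session looks something like this (a sketch; phi4 is just the model I used, so swap in whatever you downloaded):

ollama pull phi4   # download the model from the library
ollama run phi4    # start an interactive chat with it
ollama list        # confirm which models are installed locally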
Once your model is ready, you’ll see a basic chat interface:

Chat interface
Type /bye to exit.
Since I needed remote access from my work computer, I used the API. Luckily, Ollama’s HTTP API is enabled by default.
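A quick way to confirm the server is listening is to hit the root endpoint from a terminal (a sketch; 11434 is the default port):

curl http://localhost:11434   # replies with "Ollama is running" if the server is up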
Using the API
Docs: https://github.com/ollama/ollama/blob/main/docs/api.md
POST Endpoint:
http://localhost:11434/api/generate
Common parameters:
- model (required): model name
- prompt: input prompt
- suffix: text to append to response
- images: Base64-encoded image list (for multimodal models like Llava)
- format: response format (json or a JSON schema)
- options: model parameters like temperature
- system: overrides system message in Modelfile
- template: overrides prompt template in Modelfile
- stream: set to false to return a single object instead of a stream
- raw: set to true to disable prompt formatting
- keep_alive: how long to keep model in memory (default: 5 min)
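If you’d rather sanity-check from the terminal before opening Postman, a minimal request looks like this (a sketch, assuming the phi4 model from earlier is installed):

curl http://localhost:11434/api/generate \
  -d '{"model": "phi4", "prompt": "Why is the sky blue?", "stream": false}'

Setting stream to false returns one JSON object with a response field instead of a stream of chunks.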
Here’s a basic test using Postman:

Postman test
If you get a response, it’s working!
Installing and Using Ngrok
What’s Ngrok? Imagine you’ve got a “secret base” at home (your computer or server), but your coworkers—stuck in the office—can’t visit. Ngrok creates a magical tunnel that links your “secret base” to the web so they can connect from anywhere.
In plain terms: it turns your localhost into a public URL.
Official site: https://ngrok.com/
Sign up, and you’ll see several installation methods:

Ngrok installation options
This guide shows the macOS method. After choosing macOS, the site will show you the install commands:

Ngrok install for MacOS
- First command: install Ngrok
- Second command: set your auth token (needed to use Ngrok services)
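For reference, on macOS those two steps look roughly like this; treat it as a sketch, since the exact install command and your auth token come from the Ngrok dashboard:

brew install ngrok/ngrok/ngrok                # install Ngrok via Homebrew
ngrok config add-authtoken <YOUR_AUTHTOKEN>   # register your personal auth token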
Then run this command to tunnel Ollama’s local API (port 11434):
ngrok http 11434 --host-header="localhost:11434"
The --host-header option rewrites the Host header to localhost:11434, which Ollama expects for incoming requests; without it, requests coming through the tunnel can be rejected. This command works perfectly in my testing.
Your terminal should show something like this:

Ngrok running
The key line is the one labeled Forwarding:
https://fb55-118-233-2-60.ngrok-free.app -> http://localhost:11434
This means your local localhost:11434 is now publicly accessible via https://fb55-118-233-2-60.ngrok-free.app.
Now just update your Postman test with this new URL:

Postman with Ngrok URL
Success! You can now access your LLM API from your office.
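From the office machine, the Postman setup translates directly to a one-line request; a sketch reusing the example URL above (yours will differ, and phi4 is the model assumed throughout):

curl https://fb55-118-233-2-60.ngrok-free.app/api/generate \
  -d '{"model": "phi4", "prompt": "Hello from the office", "stream": false}'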
One heads-up: Ngrok’s free plan gives you a different public URL each time. For a consistent URL, you’ll need a paid plan.
Personally, I’m fine with the free version—works great for me!
Translated by ChatGPT.