Local LLM for Coding with Ollama on macOS

Published at 2025-08-04T16:43:39+03:00

With all the AI buzz around coding assistants, and being a bit concerned about becoming dependent on third-party cloud providers, I decided to explore the capabilities of local large language models (LLMs) using Ollama.

Ollama is a powerful tool that brings AI capabilities directly to your local hardware. By running models locally, you can enjoy the benefits of intelligent assistance without relying on cloud services. This document outlines my initial setup and experiences with Ollama, with a focus on coding tasks and agentic coding.

https://ollama.com/

Why Local LLMs?

Using local AI models through Ollama offers several advantages:

- Privacy: your code and prompts never leave your machine.
- Independence from third-party cloud providers and their availability.
- Offline use once the models have been downloaded.
- No usage-based costs or subscriptions.

Hardware Considerations

Running large language models locally is currently limited by consumer hardware capabilities:

- The model weights must fit into memory; on Apple Silicon, the GPU shares the machine's unified memory, so total RAM is the practical limit.
- Quantised variants (more on quantisation below) shrink the memory footprint considerably, which is what makes 14B-class models feasible on a MacBook Pro.
- The largest models (70B parameters and beyond) remain out of reach for typical consumer machines.

The model I'll be mainly using in this blog post (`qwen2.5-coder:14b-instruct`) is particularly interesting as:

- it is specialised for coding tasks,
- it is instruction-tuned, so it follows prompts well, and
- at 14B parameters it is small enough to run on a MacBook Pro.

https://ollama.com/library/qwen2.5-coder

https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct

For general thinking tasks, I found `deepseek-r1:14b` to be useful (in the future, I also want to try other `qwen` models here). For instance, I utilised `deepseek-r1:14b` to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within a minute or so.

https://ollama.com/library/deepseek-r1:14b

https://huggingface.co/deepseek-ai/DeepSeek-R1

A quantised LLM (as mentioned above) is one whose weights have been converted from high-precision representations (typically 16- or 32-bit floating point) to lower-precision formats, such as 8-bit integers. This reduces the overall memory footprint of the model, making it significantly smaller and enabling it to run more efficiently on hardware with limited resources, or allowing higher throughput on GPUs and CPUs. The benefits of quantisation include reduced storage and faster inference times due to simpler computations and better memory bandwidth utilisation. However, quantisation can introduce a drop in model accuracy, because the lower numerical precision means the model cannot represent parameter values as precisely. In some cases, it may lead to instability or unexpected outputs in specific tasks or edge cases.
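To get a feel for the numbers, here is a rough back-of-the-envelope calculation (a sketch in Go; it only covers the weights and ignores runtime overhead such as the KV cache and activations):

```go
package main

import "fmt"

func main() {
	const params = 14e9 // parameters in a 14B model

	// Approximate memory required just for the weights,
	// at different precisions (bytes per parameter).
	fmt.Printf("FP16  (2 bytes/param):   ~%.0f GB\n", params*2.0/1e9) // ~28 GB
	fmt.Printf("INT8  (1 byte/param):    ~%.0f GB\n", params*1.0/1e9) // ~14 GB
	fmt.Printf("4-bit (0.5 bytes/param): ~%.0f GB\n", params*0.5/1e9) // ~7 GB
}
```

This is why a quantised 14B model fits comfortably into the unified memory of a MacBook Pro, while the same model at full precision might not.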

Basic Setup and Manual Code Prompting

Installing Ollama and a Model

To install Ollama and start its server, I performed essentially these steps (this assumes that you have already installed Homebrew on your macOS system):
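```sh
brew install ollama
ollama serve
```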

This started up the Ollama server with something like the following (the screenshot already shows some requests that were made):

Ollama serving

And then, in a new terminal, I pulled the model with:
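```sh
ollama pull qwen2.5-coder:14b-instruct
```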

I was now ready to go, and it wasn't difficult at all. Let's see how I used this model for coding tasks.

Example Usage

I ran the following command to get a Go function for calculating Fibonacci numbers (the prompt below is paraphrased from memory):
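```sh
ollama run qwen2.5-coder:14b-instruct \
  "Write a Go function that calculates Fibonacci numbers"
```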

Note: after having written this blog post, I tried the same prompt with the newer model `qwen3-coder:30b-a3b-q4_K_M` (which had "just" come out; it's a quantised 30B model), and it was much faster:
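```sh
ollama pull qwen3-coder:30b-a3b-q4_K_M
ollama run qwen3-coder:30b-a3b-q4_K_M \
  "Write a Go function that calculates Fibonacci numbers"
```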

https://ollama.com/library/qwen3-coder:30b-a3b-q4_K_M

Agentic Coding with Aider

Installation

Aider is a tool that enables agentic coding by leveraging AI models (including local ones, as in our case). While setting up OpenAI Codex and OpenCode with Ollama proved challenging (those tools either didn't know how to use their "tools", i.e. the capability to execute external commands or to edit files, or didn't connect to Ollama at all for some reason), Aider worked smoothly.

To get started, all I had to do was install it via Homebrew, initialise a Git repository, and then start Aider with the Ollama model `ollama_chat/qwen2.5-coder:14b-instruct`, roughly like this:
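```sh
brew install aider
mkdir aitest && cd aitest
git init
# Tell Aider where the local Ollama server listens (default port):
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen2.5-coder:14b-instruct
```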

https://aider.chat

https://opencode.ai

https://github.com/openai/codex

Agentic coding prompt

This is the prompt I gave:

It then generated something, but the result did not work out of the box, as it had some issues with imports and package names. So I had to do a few follow-up prompts to fix those issues, with something like this:

Aider fixing the packages

Compilation & Execution

Once that was done, the project was ready and I could compile and run it:
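```sh
go build ./cmd/aitest
./aitest
```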

The code

The code it generated was simple, but functional. The `./cmd/aitest/main.go` file:

The `./internal/version.go` file:
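Something along these lines:

```go
package internal

// Version is the application's version string.
const Version = "0.1.0"
```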

The `./internal/count.go` file:
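And the counting helper, roughly:

```go
package internal

import "strings"

// Count returns the number of whitespace-separated words in s.
func Count(s string) int {
	return len(strings.Fields(s))
}
```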

The code is quite straightforward; especially for generating boilerplate code like this, it will be useful for many use cases!

In-Editor Code Completion

To leverage Ollama for real-time code completion in my editor, I have integrated it with Helix, my preferred text editor. Helix supports the LSP (Language Server Protocol), which enables advanced code completion features. `lsp-ai` is an LSP server that can interface with Ollama models for code completion tasks.

https://helix-editor.com

https://github.com/SilasMarvin/lsp-ai

Installation of `lsp-ai`

I installed `lsp-ai` via Rust's Cargo package manager (if you don't have Rust installed, you can install it via Homebrew as well):
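```sh
brew install rust
cargo install lsp-ai
```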

Helix Configuration

I edited `~/.config/helix/languages.toml` to include:
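Roughly the following (a sketch; the `gpt` server is provided by the `helix-gpt` project, and `gopls` is the regular Go language server):

```toml
[language-server.gpt]
command = "helix-gpt"

[language-server.lsp-ai]
command = "lsp-ai"

[[language]]
name = "go"
language-servers = ["gopls", "gpt", "lsp-ai"]
```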

Note that there is also a `gpt` language server configured, which is for GitHub Copilot, but that is out of scope for this blog post. Let's also configure the `lsp-ai` settings in the same file:
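A sketch of those settings, using lsp-ai's Ollama backend (the exact parameter values here are approximations):

```toml
[language-server.lsp-ai.config.memory]
file_store = {}

[language-server.lsp-ai.config.models.model1]
type = "ollama"
model = "qwen2.5-coder:14b-instruct"

[language-server.lsp-ai.config.models.model2]
type = "ollama"
model = "mistral-nemo"

[language-server.lsp-ai.config.models.model3]
type = "ollama"
model = "deepseek-r1:14b"

[language-server.lsp-ai.config.completion]
model = "model1"

[language-server.lsp-ai.config.completion.parameters]
max_context = 2048
options = { num_predict = 32 }

[[language-server.lsp-ai.config.completion.parameters.messages]]
role = "system"
content = "You are a code completion assistant. Replace <CURSOR> with the most likely code. Respond with code only."

[[language-server.lsp-ai.config.completion.parameters.messages]]
role = "user"
content = "{CODE}"
```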

As you can see, I have also added other models, such as Mistral Nemo and DeepSeek R1, so that I can switch between them in Helix. Other than that, the completion parameters are interesting: they define how the LLM should interact with the text in the editor, based on the example messages given.

If you want to see more `lsp-ai` configuration examples, there are some for Vim and Helix in the `lsp-ai` Git repository!

Code completion in action

The screenshot shows how Ollama's `qwen2.5-coder` model provides code completion suggestions within the Helix editor. LSP auto-completion is triggered by leaving the cursor at position `<CURSOR>` for a short period in the code snippet, and Ollama responds with relevant completions based on the context.

Completing the fib-function

In the LSP auto-completion, the one prefixed with `ai - ` was generated by `qwen2.5-coder`, the other ones are from other LSP servers (GitHub Copilot, Go linter, Go language server, etc.).

I found GitHub Copilot to still be faster than `qwen2.5-coder:14b`, but the local LLM is actually already workable for me. And, as mentioned earlier, things will likely improve in the future regarding local LLMs. So I am excited about the future of local LLMs and coding tools like Ollama and Helix.

After trying `qwen3-coder:30b-a3b-q4_K_M` (following the publication of this blog post), I found it to be significantly faster and more capable than the previous model, making it a promising option for local coding tasks. Honestly, even my current local setup already handles routine coding tasks pretty well, better than I expected.

Conclusion

Will there ever be a time when we can run larger models (60B, 100B, ... and larger) on consumer hardware, or even on our phones? We are not quite there yet, but I am optimistic that we will see improvements in the next few years. As hardware capabilities improve and/or become cheaper, and more efficient models are developed (or new techniques are invented to make language models more effective), the landscape of local AI coding assistants will continue to evolve.

For now, even the models listed in this blog post are already very promising, and they run on consumer-grade hardware (at least within the scope of the initial tests I've performed; the ones in this blog post are overly simplistic, but they were good for getting started with Ollama and for an initial demonstration). I will continue experimenting with Ollama and other local LLMs to see how they can enhance my coding experience. I may cancel my Copilot subscription, which I currently use only for in-editor auto-completion, at some point.

However, truth be told, I don't think the setup described in this blog post currently matches the performance of commercial models like Claude Code (Sonnet 4, Opus 4), Gemini 2.5 Pro, the OpenAI models and others. Maybe we could get close if we had the high-end hardware needed to run the largest Qwen Coder model available. But, as mentioned already, that is out of reach for occasional coders like me. Furthermore, I want to continue coding manually to some degree, as otherwise I will start to forget how to write for-loops, which would be awkward... However, do we always need the best model when AI can help generate boilerplate or repetitive tasks even with smaller models?

E-Mail your comments to `paul@nospam.buetow.org` :-)

Other related posts are:

2025-08-05 Local LLM for Coding with Ollama on macOS (You are currently reading this)

2025-06-22 Task Samurai: An agentic coding learning experiment

Back to the main site