Running ollama on RX 6600

Why Local LLMs?

I believe that local LLMs are the better future for privacy reasons.


OS NixOS Unstable
CPU R9 7950X
RAM 64 GB @ 6000 MHz
GPU RX 6600


As seen in my configuration.nix, I have ollama enabled as a service. The problem is that the service does not automatically use my GPU. While I could dig up how to change the configuration to include the required environment variable, it is easier to just run ollama from the command line.

  • OLLAMA_HOST is important here because the NixOS service already occupies the default port (11434), and the second instance must not conflict with it.
  • HSA_OVERRIDE_GFX_VERSION is the important environment variable to set, since it is what makes ROCm work with this GPU.
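The value you pass to HSA_OVERRIDE_GFX_VERSION depends on which gfx target your card reports. If rocminfo (part of the ROCm tools) is installed, you can check:

```shell
# Print the gfx target ROCm sees for your GPU; the RX 6600 reports gfx1032.
# Requires rocminfo and a ROCm-capable driver.
rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1
```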


Simply start a second server with the overrides set (the port 11435 here is arbitrary; any free port other than the service's 11434 works):

HSA_OVERRIDE_GFX_VERSION=10.3.0 OLLAMA_HOST=127.0.0.1:11435 ollama serve

and then, in another terminal, with your model of choice:

OLLAMA_HOST=127.0.0.1:11435 ollama run llama2:latest

The RX 6600 reports itself as gfx1032, which ROCm does not support out of the box; overriding to 10.3.0 makes it use the supported gfx1030 code path, which works fine on this card. If you have radeontop installed you should see the VRAM usage spike up to ~60%.
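radeontop can also run non-interactively; a one-shot sample in dump mode is handy if you want to log VRAM usage from a script rather than watch the TUI:

```shell
# Take a single radeontop sample and pull out the VRAM figure.
# -d - writes dump output to stdout; -l 1 stops after one sample.
radeontop -d - -l 1 | grep -o 'vram [0-9.]*%'
```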

Models that fit in VRAM

Here is a list of models I tested that fit in the RX 6600's VRAM as of [2024-04-13 Sat].

  • codegemma:7b
  • llama2:7b
  • zephyr:7b
  • gemma:instruct
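As a rough sanity check on why 7B models are about the ceiling here: ollama's default tags are typically q4_0 quantizations, i.e. roughly half a byte per weight, so a back-of-the-envelope estimate (ignoring KV cache and context overhead) looks like this:

```shell
# Back-of-the-envelope weight size for a 7B model at ~4-bit (q4_0)
# quantization: roughly 0.5 bytes per parameter. The RX 6600 has 8 GiB
# of VRAM, so the weights alone leave headroom for the KV cache.
params=7000000000
vram_bytes=$(( params / 2 ))                  # ~0.5 bytes per parameter
echo "$(( vram_bytes / 1024 / 1024 / 1024 )) GiB"   # prints "3 GiB"
```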

Models that do not work

These models do not seem to work, even though they should fit in VRAM:

  • phi:2.7b
  • wizardcoder:7b-python
  • wizardcoder:7b-python-q4_0

Have fun!