Fix NVIDIA Driver + Install Qwen3.5 27B Q8_0 via Ollama
Context
Run Qwen3.5 27B locally at Q8_0 quantization, split across 2x RTX 3090. The NVIDIA kernel modules do not match the running kernel, so that must be fixed before anything else.
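Rough sizing check (Q8_0 stores roughly 8.5 bits per weight once block scales are counted):
27e9 weights × 8.5 bits ≈ 28.7 GB of weights
28.7 GB ÷ 2 GPUs ≈ 14.4 GB per card, leaving ~9-10 GB each for KV cache and CUDA overhead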
Hardware Layout
┌─────────────────────────────────────────────────────┐
│ HOST SYSTEM │
│ CPU: x86_64 │ RAM: 128 GB │ Ollama 0.17.5 │
├─────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ GPU 0 (65:00) │ │ GPU 1 (b3:00) │ │
│ │ RTX 3090 │ │ RTX 3090 │ │
│ │ 24 GB VRAM │ │ 24 GB VRAM │ │
│ │ │ │ │ │
│ │ ┌────────────┐ │ │ ┌────────────┐ │ │
│ │ │ Q8_0 Layers│ │ │ │ Q8_0 Layers│ │ │
│ │ │ ~14 GB │ │ │ │ ~14 GB │ │ │
│ │ └────────────┘ │ │ └────────────┘ │ │
│ └──────────────────┘ └──────────────────┘ │
│ ▲ ▲ │
│ └────────┐ ┌────────────┘ │
│ │ │ │
│ ┌─────┴──┴─────┐ │
│ │ Ollama │ │
│ │ Tensor Split │ │
│ │ (automatic) │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────┘
Current Problem
Kernel: 6.17.0-19-generic ◄── running
Modules: 6.14.0-27-generic ◄── installed (MISMATCH!)
╰─► nvidia-smi FAILS
╰─► GPUs invisible to Ollama
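To confirm the mismatch before changing anything (standard Ubuntu commands; this assumes the distro-packaged NVIDIA driver, not the .run installer):
uname -r                               # running kernel
dpkg -l | grep linux-modules-nvidia    # NVIDIA module packages actually installed
dkms status                            # DKMS-built modules, if any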
Step 1: Fix NVIDIA Driver
Install the NVIDIA driver modules built for the running kernel:
sudo apt update
sudo apt install -y nvidia-driver-550 linux-modules-nvidia-550-generic-hwe-24.04
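If the HWE meta-package does not pull in modules for the new kernel, installing the per-kernel package directly is a reasonable fallback. The name below follows Ubuntu's linux-modules-nvidia-<driver>-<kernel> pattern; confirm the package exists before installing:
apt-cache search linux-modules-nvidia-550 | grep "$(uname -r)"
sudo apt install -y "linux-modules-nvidia-550-$(uname -r)"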
Reboot to load new modules:
sudo reboot
Verify after reboot:
nvidia-smi
Expected output (summarized; the real nvidia-smi table is more detailed):
+-------------------------+-------------------------+
| GPU 0: RTX 3090 | GPU 1: RTX 3090 |
| 24576 MiB VRAM | 24576 MiB VRAM |
| Driver: 550.163.01 | Driver: 550.163.01 |
+-------------------------+-------------------------+
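For a scriptable version of the same check, nvidia-smi's query interface prints one CSV line per GPU:
nvidia-smi --query-gpu=index,name,memory.total,driver_version --format=csv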
Step 2: Pull Qwen3.5 27B Q8_0 via Ollama
Check existing models first:
ollama list
If the Ollama library publishes the Q8_0 tag, pull it directly (Option A); otherwise download the GGUF from HuggingFace and create a Modelfile (Option B).
Option A: Ollama library has it
ollama pull qwen3.5:27b-q8_0
Option B: Custom GGUF from HuggingFace
- Download the .gguf file (~28 GB); a download sketch follows this list
- Create a Modelfile:
FROM ./qwen3.5-27b-q8_0.gguf
- Build the model:
ollama create qwen3.5-27b:q8_0 -f Modelfile
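A sketch of the Option B download using the huggingface_hub CLI. The repository ID is a placeholder, since the exact repo publishing this Q8_0 GGUF isn't pinned down here; large GGUFs are also sometimes sharded into multiple files:
pip install -U huggingface_hub
huggingface-cli download <publisher>/Qwen3.5-27B-GGUF qwen3.5-27b-q8_0.gguf --local-dir .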
Step 3: Run and Verify
Run the model (use qwen3.5-27b:q8_0 instead if you built it via Option B):
ollama run qwen3.5:27b-q8_0 "Hello, what model are you?"
Verify the GPU split in a second terminal while the model is running:
# Check GPU memory usage:
nvidia-smi
# Check Ollama logs for layer distribution:
journalctl -u ollama --no-pager | tail -30
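Exact log wording varies by Ollama version, but the runner normally reports how many layers were offloaded and how memory was split across GPUs, so filtering for that is a quick check:
journalctl -u ollama --no-pager | grep -iE 'offload|split'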
Expected GPU memory usage during inference:
┌──────────────────┐ ┌──────────────────┐
│ GPU 0: 3090 │ │ GPU 1: 3090 │
│ ~14 GB / 24 GB │ │ ~14 GB / 24 GB │
│ ████████░░░░ │ │ ████████░░░░ │
│ 58% utilized │ │ 58% utilized │
└──────────────────┘ └──────────────────┘
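To watch the split continuously through a longer generation rather than sampling once:
watch -n 1 'nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv'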
Verification Checklist
- nvidia-smi shows both GPUs after reboot
- ollama pull completes successfully (~28 GB download)
- ollama run responds correctly with GPU acceleration
- Both GPUs show VRAM usage during inference