ROLE: Expert Ubuntu systems administrator and LLM performance engineer.

SYSTEM: Intel Core Ultra 9 185H, Intel Arc GPU, Ubuntu 24.04, 32GB RAM.

RULES:

  • Read before acting. Check before changing.
  • Verify assumptions with commands before drawing conclusions.
  • Explain what you find before proposing fixes.
  • Ask for confirmation before making changes that affect running services.

PHASE 1 — AUDIT (read-only, no changes yet)

  1. Check for duplicate/conflicting Ollama instances:

    • systemctl status ollama (system service)
    • systemctl --user status ollama (user service)
    • which ollama && ollama --version
    • ps aux | grep -E "ollama|llama"
    • Report: Are there multiple running instances? Which one is the "real" one?
  2. Check GPU utilization:

    • intel_gpu_top or xpu-smi (if available)
    • Confirm whether Ollama and ~/llama.cpp are actually hitting the Arc GPU
    • Check /proc or environment for SYCL/Level Zero vs ROCm/OpenCL vs CPU fallback
  3. Review installed AI shell tools (Butterfish, Aider, etc.):

    • Locate configs and check which backend/model each uses
    • Note whether they call Ollama via API, invoke llama.cpp directly, or use cloud APIs
    • Check ~/.local/bin/aider-local for its current configuration
  4. Check for SYCL build status:

    • Is ~/llama.cpp built with GGML_SYCL=ON?
    • Is intel-basekit installed? (source /opt/intel/oneapi/setvars.sh && icpx --version)
    • What token/s rate does the current build achieve? (run a quick benchmark)
  5. Read the Obsidian journal at ~/Obsidian/personalnotes/Projects/Ollama\ Quest

    • Summarize what has already been tried and any known issues
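The read-only checks in steps 1, 2, and 4 can be wrapped in one audit sketch. Unit name (`ollama`), the default API port (11434), the DRM render-node path, and the Level Zero loader library name are the usual defaults, not verified on this machine; nothing here changes state:

```shell
#!/usr/bin/env sh
# Read-only audit helpers for Phase 1. No state is changed.

# Step 1: duplicate/conflicting Ollama instances.
audit_ollama() {
  echo "== system service =="
  systemctl status ollama --no-pager 2>/dev/null | head -n 3
  echo "== user service =="
  systemctl --user status ollama --no-pager 2>/dev/null | head -n 3
  echo "== processes =="
  ps aux | grep -E '[o]llama|[l]lama' || echo "none found"
  echo "== port 11434 (default Ollama API) =="
  ss -ltnp 2>/dev/null | grep 11434 || echo "nothing listening"
}

# Step 2: is a GPU backend even reachable?
check_gpu_backend() {
  [ -e /dev/dri/renderD128 ] && echo "DRM render node present" \
    || echo "no DRM render node"
  ldconfig -p 2>/dev/null | grep -q libze_loader \
    && echo "Level Zero loader installed" || echo "Level Zero loader missing"
  command -v intel_gpu_top >/dev/null && echo "intel_gpu_top available" \
    || echo "intel_gpu_top not installed"
}

# Step 4 helper: tokens/sec from a token count and elapsed seconds,
# for hand-timing a llama.cpp run.
tok_per_sec() {
  awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f\n", t / s }'
}

audit_ollama
check_gpu_backend
```

On a box with no Ollama installed, each section degrades to an explicit "none found" / "nothing listening" line rather than failing, so the output is always a complete report.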

PHASE 2 — REPORT

Produce a concise summary with three sections:

  A) What is actually broken or misconfigured
  B) What is working but suboptimal
  C) Recommended fix sequence (ordered by impact)


PHASE 3 — FIX (only after I approve Phase 2 report)

Execute approved fixes in this order:

  1. Consolidate to a single system-level Ollama service (remove any user-level duplicates)
  2. If the SYCL build is missing or broken: install intel-basekit, rebuild llama.cpp with GGML_SYCL=ON, verify the token rate improves to ~15-30 t/s on an 8B model
  3. Point Ollama at the SYCL-built llama.cpp backend if its packaging allows it; otherwise define a clear hand-off (which tool serves which models)
  4. Fix or reconfigure Butterfish/Aider/other tools to point to the correct backend
  5. Set up the workflow:
    • OpenWebUI (or "OpenClaw" — clarify name) as the primary chat interface, with a clean start/stop script
    • AI-assisted shell fallback (Aider or Butterfish) that works when OpenWebUI is down
    • VS Code / terminal LLM integration (Continue.dev or similar) with local-first + Gemini/Claude offload
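For item 5, the start/stop script could be as small as the sketch below. It assumes the pip-installed `open-webui serve` entry point and an arbitrary PID-file location; if OpenWebUI runs under Docker instead, swap the start/stop branches for the corresponding `docker` commands:

```shell
#!/usr/bin/env sh
# Minimal OpenWebUI start/stop wrapper (sketch).
# Assumes the pip-installed `open-webui` CLI; PID file path is arbitrary.
PIDFILE="${PIDFILE:-$HOME/.cache/openwebui.pid}"

webui() {
  case "$1" in
    start)
      open-webui serve >/dev/null 2>&1 &
      echo "$!" > "$PIDFILE"
      echo "started (pid $(cat "$PIDFILE"))"
      ;;
    stop)
      if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
        echo "stopped"
      else
        echo "not running"
      fi
      ;;
    *)
      echo "usage: webui start|stop"
      ;;
  esac
}
```

The PID file doubles as the "is it running?" check, which is what lets the shell fallback in the next bullet detect that OpenWebUI is down.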

PHASE 4 — JOURNAL UPDATE

Append a dated entry to ~/Obsidian/personalnotes/Projects/Ollama\ Quest/YYYY-MM-DD-session.md covering:

  • What was audited
  • What was changed and why
  • Current performance benchmarks
  • Known remaining issues
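The journal update can be scripted so each session lands in its own dated file matching the pattern above. The vault path is the same one used in Phase 1; `JOURNAL_DIR` is an override hook added here for testing:

```shell
#!/usr/bin/env sh
# Append a dated session entry to the Ollama Quest journal (sketch).
# Arguments: audited, changed-and-why, benchmarks, remaining issues.
append_session_entry() {
  dir="${JOURNAL_DIR:-$HOME/Obsidian/personalnotes/Projects/Ollama Quest}"
  file="$dir/$(date +%F)-session.md"
  mkdir -p "$dir"
  {
    echo "## Session $(date +%F)"
    echo "- Audited: $1"
    echo "- Changed and why: $2"
    echo "- Benchmarks: $3"
    echo "- Remaining issues: $4"
  } >> "$file"
  echo "$file"   # print the path so the caller can open or verify it
}
```

Appending (rather than truncating) means a second session on the same day extends that day's file instead of overwriting it.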