# Arc GPU Optimization Guide: Intel SYCL + Local LLM

This guide tracks the optimized local LLM setup for the Intel Core Ultra (Arc Graphics) using SYCL offloading. Reference the Discovery Log for context.
## Components

- **Back-end:** `llama.cpp` built with `GGML_SYCL=ON` and `icpx`.
- **Drivers:** `libze-intel-gpu` (Level Zero) + Intel oneAPI 2025.3.
- **Service:** `llama-serve.service` (user-level systemd unit).
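For reference, a user-level unit for `llama-serve.service` could look roughly like the sketch below. This is not the deployed unit: the binary path, model file, port, and `-ngl` value are assumptions.

```ini
# ~/.config/systemd/user/llama-serve.service (illustrative sketch)
[Unit]
Description=llama.cpp OpenAI-compatible server (SYCL / Arc)

[Service]
# Lets Level Zero report free GPU memory (see Troubleshooting)
Environment=ZES_ENABLE_SYSMAN=1
# Binary path, model file, port, and layer count are placeholders
ExecStart=%h/llama.cpp/build/bin/llama-server -m %h/models/model.gguf --port 8080 -ngl 99
Restart=on-failure

[Install]
WantedBy=default.target
```

After editing a unit like this, reload it with `systemctl --user daemon-reload && systemctl --user restart llama-serve.service`.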
## Daily Usage Commands

### Switching Workflows

Use the custom `toggle-ai.sh` script to manage memory and services:
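The script itself is not reproduced in this guide, but its four subcommands suggest a dispatcher of roughly this shape. This is a hypothetical sketch: the unit name comes from the Components list above, and the OpenClaw / Butterfish session handling is reduced to comments.

```shell
#!/bin/sh
# toggle-ai.sh (sketch) -- switch between AI workflows by starting/stopping
# the user-level llama.cpp service. Real session handling is omitted.
LLAMA_UNIT=llama-serve.service

run() {
    # Run a user-level systemctl action; fall back to printing it so the
    # sketch stays side-effect free outside a systemd session.
    if command -v systemctl >/dev/null 2>&1; then
        systemctl --user "$@" 2>/dev/null || true
    else
        echo "would run: systemctl --user $*"
    fi
}

MODE="${1:-status}"
case "$MODE" in
    openclaw) run start "$LLAMA_UNIT" ;;  # ...then launch the OpenClaw agent
    shell)    run start "$LLAMA_UNIT" ;;  # ...then Butterfish / Pi only
    status)   run is-active "$LLAMA_UNIT" ;;
    stop)     run stop "$LLAMA_UNIT" ;;   # frees the Arc's shared memory
    *) echo "usage: toggle-ai.sh {openclaw|shell|status|stop}"; exit 1 ;;
esac
echo "toggle-ai: handled '$MODE'"
```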
```sh
# Start OpenClaw for agentic work
toggle-ai.sh openclaw

# Switch to terminal-only AI (Butterfish / Pi)
toggle-ai.sh shell

# See what's currently active
toggle-ai.sh status

# Shut it all down
toggle-ai.sh stop
```

### High-Performance Aider
Launch Aider with the Arc-optimized backend (bypassing the default Ollama setup):
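An `aider-local` wrapper along these lines would do the job. This is a sketch under assumptions: the server port, model aliases, and API-key handling are placeholders, not the actual script.

```shell
#!/bin/sh
# aider-local (sketch) -- point Aider at the local llama-server instead of
# Ollama. Endpoint, key, and model names below are assumed placeholders.
export OPENAI_API_BASE="http://127.0.0.1:8080/v1"  # llama-server's OpenAI-compatible API
export OPENAI_API_KEY="local"                      # llama-server does not check the key

case "${1:-coder}" in
    coder) MODEL="openai/qwen2.5-coder-7b" ;;  # Qwen 2.5 Coder 7B
    llama) MODEL="openai/llama-3.1-8b" ;;      # Llama 3.1 8B
    *) echo "usage: aider-local {coder|llama}"; exit 1 ;;
esac

# The real wrapper would end with: exec aider --model "$MODEL"
echo "would launch: aider --model $MODEL"
```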
```sh
aider-local coder   # Launches Qwen 2.5 Coder 7B
aider-local llama   # Launches Llama 3.1 8B
```

## Troubleshooting
- **Memory Stats:** If the logs show warnings about `ext_intel_free_memory`, ensure `export ZES_ENABLE_SYSMAN=1` is set.
- **Service Logs:** `journalctl --user -u llama-serve.service -f`
- **Device Detection:** Run `sycl-ls` to verify the Level Zero GPU is recognized.
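The checks above can be rolled into a small one-shot script, e.g. saved as `gpu-check.sh`. This is a sketch; it only assumes the names already used in this guide.

```shell
#!/bin/sh
# gpu-check.sh (sketch) -- one-shot sanity check for the SYCL setup
WARNINGS=0

# 1. Level Zero reports free GPU memory only with SYSMAN enabled
if [ "${ZES_ENABLE_SYSMAN:-0}" != "1" ]; then
    echo "WARN: ZES_ENABLE_SYSMAN not set -> expect ext_intel_free_memory warnings"
    WARNINGS=$((WARNINGS + 1))
fi

# 2. The GPU should appear as a Level Zero device in sycl-ls
if command -v sycl-ls >/dev/null 2>&1; then
    sycl-ls | grep -qi "level.zero" \
        || { echo "WARN: no Level Zero device listed"; WARNINGS=$((WARNINGS + 1)); }
else
    echo "INFO: sycl-ls not on PATH; source /opt/intel/oneapi/setvars.sh first"
fi

# 3. The backend service should be active
if command -v systemctl >/dev/null 2>&1; then
    systemctl --user is-active --quiet llama-serve.service 2>/dev/null \
        || echo "INFO: llama-serve.service is not active"
fi

echo "gpu-check: done ($WARNINGS warning(s))"
```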
## Maintenance

If `llama.cpp` needs to be updated:

```sh
cd ~/llama.cpp
git pull                  # fetch the latest upstream sources
source /opt/intel/oneapi/setvars.sh
cd build
cmake .. -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
make -j$(nproc) llama-server llama-cli
```