It works on Windows, Linux, and might even work on macOS in the future.
Token generation in particular is slower with the Qwen 3.5 models on CUDA in the 1.109 versions.
current device: 0, in function ggml_backend_cuda_device_event_synchronize at ggml/src/ggml-cuda/ggml-cuda.cu:4947
This issue only seems to affect the Qwen 3.5 series ...