XDA Developers on MSN
I plugged a desktop GPU into my gaming handheld, and now it runs local LLMs
It works on Windows, Linux, and might even work on macOS in the future.
Token generation is slower, especially with the Qwen 3.5 models and CUDA on the 1.109 versions.
current device: 0, in function ggml_backend_cuda_device_event_synchronize at ggml/src/ggml-cuda/ggml-cuda.cu:4947 This issue only seems to affect the Qwen 3.5 series ...