The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.

    • Infinite@lemmy.zip · 16 days ago

      Probably 24 GB VRAM and 32-64 GB RAM for minimum specs with 4-bit quantization. This is a beefy boi.

    • TheCornCollector@piefed.zipOP · edited 9 days ago

      I’m running it with the UD_Q4_K_XL quant on a 24 GB 7900 XTX at ~120–130* tokens/s. Since it’s an MoE model, CPU inference with 32 GB RAM should be doable, but I won’t make any promises about speed.

      *Edit: I had a configuration issue in my llama.cpp setup that reduced performance. It was initially limited to 85 tk/s, but that was user error on my part.
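      For anyone wanting to try a similar setup, a minimal llama.cpp server launch might look like the sketch below. The model filename is a placeholder, not the exact file used above, and the context size is just an example, so treat this as a starting point rather than the poster’s actual configuration:

      ```shell
      # Sketch: serve a 4-bit GGUF quant with a Vulkan or ROCm build of llama.cpp.
      # The model path is a placeholder; -ngl 99 offloads all layers to the GPU,
      # and -c sets the context window (lower it if VRAM runs short).
      llama-server -m ./model-UD-Q4_K_XL.gguf -ngl 99 -c 8192 --port 8080
      ```

      With an MoE model, layers that don’t fit in VRAM can be left on the CPU by lowering `-ngl`, which is why CPU-heavy setups with enough system RAM can still be workable.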

      • fonix232@fedia.io · 15 days ago

        Wonder what the wombo-combo of Ryzen AI APU can do with this.

        Time to fire up the trusty 370.

      • venusaur@lemmy.world · 16 days ago

        Thanks! That sounds expensive. Hopefully 24GB VRAM gets cheaper or models get more efficient soon.

          • venusaur@lemmy.world · 15 days ago

            Thanks! I’m hoping to run at least a 20B model. Idk if I can do that fast enough without 24 GB; it seems to be the sweet spot.

      • ericwdhs@discuss.online · 12 days ago

        Can I ask you what GPU driver version you’re running? I’m running a 7900 XTX as well and recently encountered some stability issues after a driver update (trying to support gaming and AI stuff at the same time). The latest version I could find as a recommendation for similar issues was 24.12.1.

        • TheCornCollector@piefed.zipOP · 11 days ago

          Ah, I don’t know anything about Windows. I’m on Linux, and both the latest ROCm (7.2.2) and the latest Vulkan (26.0.5) packages work without issues for combined gaming and AI. For reference, my reported numbers were with Vulkan at zero context.

          • ericwdhs@discuss.online · 11 days ago

            Thanks! I’m migrating all my PCs to Linux anyway and just haven’t gotten to the AI stuff yet, so it sounds like that might fix itself.