Releases · ggml-org/llama.cpp

28 Sep 07:06

d8359f5

b6615 Latest

vulkan: 64-bit im2col (#16135)

* vulkan: 64-bit im2col

Add variants of the im2col shaders that use buffer_device_address/buffer_reference,
and use 64-bit address calculations. This is needed for large convolutions used in
stable-diffusion.cpp.

* fix validation error for large im2col
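A little arithmetic illustrates why 64-bit address calculations matter here. The sketch below is not the shader code: the convolution shape is a hypothetical one picked for illustration, and `im2col_shape` and `wrap_u32` are made-up helper names. It computes the element count of an im2col matrix and shows what a 32-bit unsigned offset would hold for the last element.

```python
def im2col_shape(batch, channels, height, width, kh, kw, stride=1, pad=0):
    """Return (rows, cols) of the im2col matrix for a 2D convolution."""
    out_h = (height + 2 * pad - kh) // stride + 1
    out_w = (width + 2 * pad - kw) // stride + 1
    rows = batch * out_h * out_w   # one row per output pixel
    cols = channels * kh * kw      # one column per kernel tap and channel
    return rows, cols


def wrap_u32(offset):
    """Emulate the value a 32-bit unsigned offset would hold."""
    return offset & 0xFFFFFFFF


# Hypothetical shape, chosen only so that the matrix exceeds 2**32 elements.
rows, cols = im2col_shape(batch=1, channels=512, height=1024, width=1024,
                          kh=3, kw=3, pad=1)
last_offset = rows * cols - 1      # element index of the final entry
print(rows, cols, last_offset, wrap_u32(last_offset))
```

With this shape the last element index no longer fits in 32 bits, so a 32-bit calculation would silently point somewhere else in the buffer.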

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-09-28T07:06:41Z

llama-b6615-bin-macos-arm64.zip
sha256:ef8b48de9a800820e3c1bc66cad3a1082fc971ec5b285622b6a80a467e676603
10.3 MB 2025-09-28T07:06:56Z

llama-b6615-bin-macos-x64.zip
sha256:c0a218d3fa2fddee455acdee27acd59924125fb686923ff348fd5feb320aaccc
27.7 MB 2025-09-28T07:06:58Z

llama-b6615-bin-ubuntu-vulkan-x64.zip
sha256:4e9a5a66f3032f0a734e24eb9234084f2aa51f3b5f3734736044a2b0db031292
25.6 MB 2025-09-28T07:06:59Z

llama-b6615-bin-ubuntu-x64.zip
sha256:151a4325ec4b7c22771c42854ebb1f8353748c9b3061021fa596926c5cc45279
12.3 MB 2025-09-28T07:07:01Z

llama-b6615-bin-win-cpu-arm64.zip
sha256:cd5289323a3c4d021462bd0f7ecf8691ea8478bd4263afd95f5460fbb8b9ee03
10.4 MB 2025-09-28T07:07:02Z

llama-b6615-bin-win-cpu-x64.zip
sha256:33cac9f34e88aac3a0de9ff8915076f5baf0c603a7caee6aabcfb69fe3d0a562
13.5 MB 2025-09-28T07:07:03Z

llama-b6615-bin-win-cuda-12.4-x64.zip
sha256:dc42cc5b82f681b37e59db4206d5a1097b90bcec6fe0fada798f205ac742c684
149 MB 2025-09-28T07:07:05Z

llama-b6615-bin-win-hip-radeon-x64.zip
sha256:d106039f45d5bac50ca3003fbca868c746c86dbcf9738b8e65944ca5d9e9aad9
313 MB 2025-09-28T07:07:12Z

llama-b6615-bin-win-opencl-adreno-arm64.zip
sha256:56a68a7e5019a24ba80f3e219f87235658de058a4f25dd4a0913c86c12c616af
10.8 MB 2025-09-28T07:07:22Z

Source code (zip)
2025-09-28T06:38:37Z

Source code (tar.gz)
2025-09-28T06:38:37Z
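
The sha256 values listed with each asset can be used to check a download. A minimal sketch using Python's standard `hashlib` follows; the helper names `sha256_of_file` and `matches_listed_digest` are illustrative and not part of any llama.cpp tooling, and the file path is whatever you saved the asset as.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>'."""
    algo, _, expected = listed.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `matches_listed_digest("llama-b6615-bin-ubuntu-x64.zip", "sha256:151a4325ec4b7c22771c42854ebb1f8353748c9b3061021fa596926c5cc45279")` should return `True` for an intact copy of that asset.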

28 Sep 06:55

github-actions

b6613

3b53634

metal : fuse non-sequential nodes (#16102)

* metal : fuse non-sequential nodes

* cont : add comment

* cont : simplify bounds checks

Assets 15

28 Sep 02:10

github-actions

b6612

1384abf

vulkan: handle mat_mul with A matrix > 4GB (#16176)

* vulkan: handle mat_mul with A matrix > 4GB

This change splits mat_mul operations with a huge A matrix into chunks along the M
dimension. This works well for stable-diffusion use cases where the im2col
matrix has a very large M.

Fix the order of setting the stride in mul_mm_cm2 - setting the dimension
clobbers the stride, so stride should be set after.

* build fixes
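
The chunking idea can be sketched on the CPU. This is not the Vulkan backend code: it assumes M counts the rows of A, uses nested lists instead of GPU buffers, and the names `split_rows`, `chunked_matmul`, and the `max_bytes` limit are hypothetical. Each slice of A stays under the size limit and fills the matching rows of the result.

```python
def split_rows(m, k, bytes_per_element, max_bytes):
    """Yield (start, count) row ranges so each slice of A stays under max_bytes."""
    row_bytes = k * bytes_per_element
    rows_per_chunk = max(1, max_bytes // row_bytes)
    for start in range(0, m, rows_per_chunk):
        yield start, min(rows_per_chunk, m - start)


def matmul(a, b):
    """Plain product of a (m x k) and b (k x n), both given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def chunked_matmul(a, b, bytes_per_element, max_bytes):
    """Multiply slice by slice; every chunk of A produces the same rows of the output."""
    k = len(a[0])
    out = []
    for start, count in split_rows(len(a), k, bytes_per_element, max_bytes):
        out.extend(matmul(a[start:start + count], b))
    return out
```

Because each output row depends on only one row of A, splitting along M needs no reduction across chunks, which is what makes this dimension convenient to split.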

Assets 15

27 Sep 21:05

github-actions

b6611

e6d65fb

vulkan: support arbitrary KV dimension in flash attention (#16160)

The "Clamp" spec constant is already based on whether KV is a multiple of Bc,
so use that to control whether bounds checking is performed. Add bounds checking
to the scalar and coopmat1 paths. Coopmat2 didn't need any changes (the K/V
tensors are already optionally clamped, nothing else needed to be changed).
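
To illustrate the role of the bounds check, here is a CPU sketch of block-wise attention for a single query, using the standard online-softmax accumulation. It is not the shader code: it assumes Bc is the block size along the KV dimension, models the "Clamp" constant as a plain boolean, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Unblocked softmax(q . K) V for one query, used as the ground truth."""
    scores = [dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def attention_blocked(q, keys, values, bc):
    """Walk KV in blocks of bc; bounds-check only when the last block is partial."""
    kv = len(keys)
    clamp = kv % bc != 0              # mirrors the idea behind the "Clamp" constant
    running_max, running_sum = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    for block_start in range(0, kv, bc):
        for j in range(block_start, block_start + bc):
            if clamp and j >= kv:
                break                 # skip out-of-range columns in the final block
            score = dot(q, keys[j])
            new_max = max(running_max, score)
            rescale = math.exp(running_max - new_max)
            weight = math.exp(score - new_max)
            running_sum = running_sum * rescale + weight
            acc = [a * rescale + weight * v for a, v in zip(acc, values[j])]
            running_max = new_max
    return [a / running_sum for a in acc]
```

When the KV length is a multiple of the block size, `clamp` is false and the inner loop never needs the check, which is why a single compile-time constant can switch the checking on or off.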

Assets 15

27 Sep 21:13

github-actions

b6610

8656f5d

vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…

Assets 15

27 Sep 18:08

github-actions

b6608

c0bfc57

CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…

Assets 15

27 Sep 18:02

github-actions

b6607

75a3a6c

CUDA: refactor and deduplicate vector FA kernels (#16208)

* CUDA: refactor and deduplicate vector FA kernels

Assets 15

27 Sep 17:09

github-actions

b6606

0499b29

vulkan: throw system error instead of SIGABRT during init on older de…

Assets 15

27 Sep 16:38

github-actions

b6605

234e2ff

server : remove old LLAMA_SERVER_SSL (#16290)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Assets 15

27 Sep 10:56

github-actions

b6604

3f81b4e

vulkan: support GET_ROWS for k-quants (#16235)

The dequantize functions are copy/pasted from mul_mm_funcs.comp with very few
changes - add a_offset and divide iqs by 2. It's probably possible to call
these functions from mul_mm_funcs and avoid the duplication, but I didn't go
that far in this change.
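
As a rough picture of what a GET_ROWS over quantized data does, the sketch below gathers rows by index and dequantizes each one on the way out. The block format here is a toy (one float scale plus small integers per block) and is not the k-quant layout; `quantize_row`, `dequantize_row`, and `get_rows` are hypothetical names, not functions from the shaders.

```python
def quantize_row(row, block_size=4):
    """Toy block quantizer: per block, one float scale plus small signed ints."""
    blocks = []
    for i in range(0, len(row), block_size):
        chunk = row[i:i + block_size]
        scale = max(abs(x) for x in chunk) / 7 or 1.0   # avoid a zero scale
        blocks.append((scale, [round(x / scale) for x in chunk]))
    return blocks


def dequantize_row(blocks):
    """Rebuild the float row from its (scale, ints) blocks."""
    return [scale * q for scale, qs in blocks for q in qs]


def get_rows(quantized_matrix, row_ids):
    """Gather the requested rows, dequantizing each one as it is read."""
    return [dequantize_row(quantized_matrix[i]) for i in row_ids]
```

Only the rows that are actually requested get dequantized, so the rest of the matrix can stay in its compact form.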

Assets 15
