Releases: ggml-org/llama.cpp

b6615

28 Sep 07:06
d8359f5
vulkan: 64-bit im2col (#16135)

* vulkan: 64-bit im2col

Add variants of the im2col shaders that use buffer_device_address/buffer_reference,
and use 64-bit address calculations. This is needed for large convolutions used in
stable-diffusion.cpp.
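To illustrate why 64-bit address calculations matter here, the following is a minimal host-side C++ sketch, not the shader code from this change. The helper names `flat_offset_u32` and `flat_offset_u64` and the row-major layout are assumptions made for the example; they only show how a flat element offset wraps once it exceeds what 32 bits can hold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from llama.cpp): flat element offset into a
// row-major [rows x cols] buffer, computed entirely in 32 bits.
// Unsigned 32-bit arithmetic wraps modulo 2^32 for large buffers.
inline uint32_t flat_offset_u32(uint32_t row, uint32_t cols, uint32_t col) {
    assert(col < cols);  // column index must lie inside the row
    return row * cols + col;
}

// Same offset, but the multiplication is widened to 64 bits first,
// so the result stays exact for buffers with more than 2^32 elements.
inline uint64_t flat_offset_u64(uint32_t row, uint32_t cols, uint32_t col) {
    assert(col < cols);
    return static_cast<uint64_t>(row) * cols + col;
}
```

For small buffers both helpers agree; for a hypothetical 100000 x 50000 matrix the 32-bit version wraps around and points at the wrong element.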

* fix validation error for large im2col

b6613

28 Sep 06:55
3b53634
metal : fuse non-sequential nodes (#16102)

* metal : fuse non-sequential nodes

* cont : add comment

* cont : simplify bounds checks

b6612

28 Sep 02:10
1384abf
vulkan: handle mat_mul with A matrix > 4GB (#16176)

* vulkan: handle mat_mul with A matrix > 4GB

This change splits mat_mul operations with a huge A matrix into chunks along
the M dimension. This works well for stable-diffusion use cases where the
im2col matrix has a very large M.
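The chunking idea can be sketched on the host side as follows. This is an illustrative sketch under stated assumptions, not the code from this change: `RowChunk`, `split_rows`, and the `max_bytes` limit are hypothetical names, and the real backend may pick chunk sizes differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one slice of the A matrix along M.
struct RowChunk {
    uint64_t first_row;  // index of the first row in this chunk
    uint64_t num_rows;   // number of rows covered by this chunk
};

// Split m rows of row_bytes each into chunks whose size does not exceed
// max_bytes (assumed limit, e.g. the largest range a single binding can
// address). A chunk always holds at least one row.
inline std::vector<RowChunk> split_rows(uint64_t m, uint64_t row_bytes,
                                        uint64_t max_bytes) {
    assert(row_bytes > 0);
    std::vector<RowChunk> chunks;
    const uint64_t rows_per_chunk =
        std::max<uint64_t>(1, max_bytes / row_bytes);
    for (uint64_t row = 0; row < m; row += rows_per_chunk) {
        chunks.push_back({row, std::min(rows_per_chunk, m - row)});
    }
    return chunks;
}
```

Each chunk would then be dispatched as its own matrix multiplication over the matching rows of A and of the output, since rows of the result are independent of one another.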

Fix the order of setting the stride in mul_mm_cm2 - setting the dimension
clobbers the stride, so stride should be set after.
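The ordering problem described above can be reproduced with a toy model. The `Layout2D` type below is hypothetical and is not the tensor layout API used by mul_mm_cm2; it only assumes, as the commit message states, that setting the dimension resets the stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout object: setting the dimension resets the stride to
// the tightly packed default, mirroring the "clobbering" described above.
struct Layout2D {
    uint32_t rows = 0;
    uint32_t cols = 0;
    uint32_t row_stride = 0;

    void set_dimension(uint32_t r, uint32_t c) {
        rows = r;
        cols = c;
        row_stride = c;  // side effect: any earlier custom stride is lost
    }
    void set_stride(uint32_t s) { row_stride = s; }
};

// Wrong order: the custom stride is overwritten by set_dimension.
inline Layout2D build_stride_first(uint32_t r, uint32_t c, uint32_t stride) {
    assert(stride >= c);
    Layout2D l;
    l.set_stride(stride);
    l.set_dimension(r, c);
    return l;
}

// Fixed order: the stride is set last, so it survives.
inline Layout2D build_stride_last(uint32_t r, uint32_t c, uint32_t stride) {
    assert(stride >= c);
    Layout2D l;
    l.set_dimension(r, c);
    l.set_stride(stride);
    return l;
}
```

With a padded stride larger than the column count, only `build_stride_last` keeps the requested value.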

* build fixes

b6611

27 Sep 21:05
e6d65fb
vulkan: support arbitrary KV dimension in flash attention (#16160)

The "Clamp" spec constant is already based on whether KV is a multiple of Bc,
so use that to control whether bounds checking is performed. Add bounds checking
to the scalar and coopmat1 paths. Coopmat2 didn't need any changes (the K/V
tensors are already optionally clamped, nothing else needed to be changed).
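As a rough host-side analogy of the approach described above, the sketch below enables bounds checking only when KV is not a multiple of the block size Bc. The names `needs_clamp` and `block_sum` are hypothetical, and summing a block stands in for the real attention math, which handles out-of-range positions in its own way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bounds checking is only needed when the KV length is not a multiple of
// the block size Bc (this plays the role of the "Clamp" spec constant).
inline bool needs_clamp(std::size_t kv, std::size_t bc) {
    return kv % bc != 0;
}

// Sum one block of bc values. With clamp enabled, indices past the end of
// the data are skipped; with clamp disabled, the caller guarantees that
// every index in the block is valid, so the check can be omitted.
inline float block_sum(const std::vector<float>& vals, std::size_t block,
                       std::size_t bc, bool clamp) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < bc; ++i) {
        const std::size_t idx = block * bc + i;
        if (clamp && idx >= vals.size()) {
            continue;  // out-of-range element in the last, partial block
        }
        assert(idx < vals.size());
        acc += vals[idx];
    }
    return acc;
}
```

Deriving the flag from the shapes means inputs whose KV length is a multiple of Bc skip the per-element check entirely, while all other lengths remain correct.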

b6610

27 Sep 21:13
8656f5d
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…

b6608

27 Sep 18:08
c0bfc57
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…

b6607

27 Sep 18:02
75a3a6c
CUDA: refactor and deduplicate vector FA kernels (#16208)

* CUDA: refactor and deduplicate vector FA kernels

b6606

27 Sep 17:09
0499b29
vulkan: throw system error instead of SIGABRT during init on older de…

b6605

27 Sep 16:38
234e2ff
server : remove old LLAMA_SERVER_SSL (#16290)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

b6604

27 Sep 10:56
3f81b4e
vulkan: support GET_ROWS for k-quants (#16235)

The dequantize functions are copy/pasted from mul_mm_funcs.comp with very few
changes - add a_offset and divide iqs by 2. It's probably possible to call
these functions from mul_mm_funcs and avoid the duplication, but I didn't go
that far in this change.