Releases: ggml-org/llama.cpp
b6615
vulkan: 64-bit im2col (#16135)
Add variants of the im2col shaders that use buffer_device_address/buffer_reference, and use 64-bit address calculations. This is needed for large convolutions used in stable-diffusion.cpp.
* fix validation error for large im2col
b6613
metal : fuse non-sequential nodes (#16102)
* metal : fuse non-sequential nodes
* cont : add comment
* cont : simplify bounds checks
b6612
vulkan: handle mat_mul with A matrix > 4GB (#16176)
This change splits mat_mul operations with a huge A matrix into chunks along the M dimension. This works well for stable-diffusion use cases, where the im2col matrix has a very large M.
Also fix the order of setting the stride in mul_mm_cm2: setting the dimension clobbers the stride, so the stride should be set afterwards.
* build fixes
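The M-dimension chunking can be sketched as below; `split_m` and the byte limit are illustrative names, not the actual llama.cpp/Vulkan backend API:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Split an M x K "A" matrix into row ranges so that each chunk's slice of A
// stays under max_bytes. Each entry is (start_row, n_rows); the mat_mul is
// then dispatched once per range.
std::vector<std::pair<int64_t, int64_t>> split_m(int64_t M, int64_t K,
                                                 int64_t elem_size,
                                                 int64_t max_bytes) {
    // Rows per chunk, at least one so a single oversized row still dispatches.
    int64_t rows = std::max<int64_t>(1, max_bytes / (K * elem_size));
    std::vector<std::pair<int64_t, int64_t>> chunks;
    for (int64_t m0 = 0; m0 < M; m0 += rows) {
        chunks.push_back({m0, std::min(rows, M - m0)});
    }
    return chunks;
}
```

For example, with a limit of 2 GiB an 8 GiB f32 A matrix would be dispatched as four row-range chunks, each individually addressable within 32-bit descriptor limits.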
b6611
vulkan: support arbitrary KV dimension in flash attention (#16160)
The "Clamp" spec constant is already based on whether KV is a multiple of Bc, so use that to control whether bounds checking is performed. Add bounds checking to the scalar and coopmat1 paths. Coopmat2 didn't need any changes (the K/V tensors are already optionally clamped, nothing else needed to be changed).
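A scalar sketch of that clamp logic; the names are illustrative (the real code is a Vulkan compute shader with "Clamp" as a specialization constant), but the structure is the same:

```cpp
#include <algorithm>
#include <cstdint>

// Mirrors the "Clamp" spec constant: bounds checking is only needed when the
// KV dimension is not a multiple of the block size Bc, i.e. when the final
// block of K/V columns would otherwise read out of range.
bool needs_clamp(int64_t KV, int64_t Bc) {
    return KV % Bc != 0;
}

// With clamping enabled, an out-of-range column index is clamped to the last
// valid one (out-of-range contributions are masked elsewhere). When every
// block is full, the check is skipped entirely and the access is direct.
int64_t clamp_col(int64_t col, int64_t KV, bool clamp) {
    return clamp ? std::min(col, KV - 1) : col;
}
```

Because the flag is a specialization constant, the driver can compile the no-clamp variant with the bounds check removed entirely, so the common multiple-of-Bc case pays no cost.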
b6610
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…
b6608
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…
b6607
CUDA: refactor and deduplicate vector FA kernels (#16208)
b6606
vulkan: throw system error instead of SIGABRT during init on older de…
b6605
server : remove old LLAMA_SERVER_SSL (#16290)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
b6604
vulkan: support GET_ROWS for k-quants (#16235)
The dequantize functions are copy/pasted from mul_mm_funcs.comp with very few changes: add a_offset and divide iqs by 2. It's probably possible to call these functions from mul_mm_funcs and avoid the duplication, but I didn't go that far in this change.