Releases: ggml-org/llama.cpp

b5252

01 May 22:35
d7a14c4
build : fix build info on windows (#13239)

* build : fix build info on windows

* fix cuda host compiler msg

b5250

01 May 21:21
e0f572c
llama-chat : update GLM4 chat template (#13238)

* update GLM4 chat template

* Update chat template

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

b5249

01 May 21:17
79f26e9
vulkan: Add bfloat16 support (#12554)

* vulkan: Add bfloat16 support

This adds bfloat16 matrix multiply support based on VK_KHR_shader_bfloat16.
The extension is required for coopmat multiply support, but matrix-vector
multiply trivially promotes bf16 to fp32 and doesn't require the extension.
The copy/get_rows shaders also don't require the extension.
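As an illustration of why the promotion is trivial: bfloat16 is the upper 16 bits of an IEEE 754 float32, so promoting it only needs a 16-bit left shift. The sketch below is a CPU-side illustration in Python, not the Vulkan shader code; the function names are hypothetical, and the accumulation runs in Python's double precision rather than fp32.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the bf16 bit pattern of x by truncating float32 to its upper 16 bits.

    Plain truncation is used here for simplicity; real converters usually round.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(h: int) -> float:
    """Promote a bf16 bit pattern to float by shifting it into the upper 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))
    return x

def matvec_bf16(rows, vec):
    """Matrix-vector product where bf16 weights are promoted before each multiply.

    rows: list of rows, each a list of bf16 bit patterns (ints).
    vec:  list of floats.
    """
    return [sum(bf16_bits_to_f32(w) * v for w, v in zip(row, vec)) for row in rows]

# Example: a 1x2 matrix holding 1.0 and 2.0 in bf16, multiplied by [3.0, 4.0].
weights = [[f32_to_bf16_bits(1.0), f32_to_bf16_bits(2.0)]]
result = matvec_bf16(weights, [3.0, 4.0])
```

Because every bf16 value is exactly representable in fp32, the promotion itself loses no precision, which is consistent with the mat-vec path not needing the extension.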

It's probably possible to fall back to non-coopmat and promote to fp32 when
the extension isn't supported, but this change doesn't do that.

The coopmat support also requires a glslc that supports the extension, which
currently requires a custom build.

* vulkan: Support bf16 tensors without the bf16 extension or coopmat support

Compile a variant of the scalar mul_mm shader that will promote the bf16
values to float, and use that when either the bf16 extension or the coopmat
extensions aren't available.
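The fallback rule described above can be sketched as a small selection function. This is only a reading of the commit message, not the backend's actual code: the function name and the returned labels are hypothetical, and the real implementation selects among compiled shader pipelines rather than strings.

```python
def pick_bf16_mul_mm_variant(has_bf16_ext: bool, has_coopmat: bool) -> str:
    """Choose a bf16 matrix-multiply path from device capabilities.

    Hypothetical labels:
      "coopmat_bf16"          - cooperative-matrix path, needs both features
      "scalar_promote_to_f32" - scalar mul_mm variant that promotes bf16 to float
    """
    if has_bf16_ext and has_coopmat:
        return "coopmat_bf16"
    # Either feature missing: use the scalar shader that promotes to float.
    return "scalar_promote_to_f32"
```

Under this reading, a device with cooperative-matrix support but without the bf16 extension still takes the scalar path.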

* vulkan: bfloat16 fixes (really works without bfloat16 support now)

* vulkan: fix spirv-val failure and reenable -O

b5248

01 May 20:21
fc727bc
vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul sha…

b5246

01 May 18:06
sync : ggml

ggml-ci

b5243

01 May 16:33
8936784
mtmd : add **vision** support for Mistral Small 3.1 (#13231)

* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test

b5242

01 May 10:31
13c9a33
arg : remove CURLINFO_EFFECTIVE_METHOD (#13228)

b5241

01 May 10:08
a70183e
llama-model : fix the reported size class for nomic-embed-text-v2-moe…

b5239

01 May 10:08
ggml : fix ggml_gallocr_ptr type (ggml/1205)

b5237

30 Apr 22:12
e1e8e09
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)