Releases · ggml-org/llama.cpp

01 May 22:35

d7a14c4

b5252

build : fix build info on windows (#13239)

* build : fix build info on windows

* fix cuda host compiler msg

01 May 21:21

github-actions

b5250

e0f572c

llama-chat : update GLM4 chat template (#13238)

* update GLM4 chat template

* Update chat template

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

01 May 21:17

github-actions

b5249

79f26e9

vulkan: Add bfloat16 support (#12554)

* vulkan: Add bfloat16 support

This adds bfloat16 matrix multiply support based on VK_KHR_shader_bfloat16.
The extension is required for coopmat multiply support, but matrix-vector
multiply trivially promotes bf16 to fp32 and doesn't require the extension.
The copy/get_rows shaders also don't require the extension.

It's probably possible to fall back to non-coopmat and promote to fp32 when
the extension isn't supported, but this change doesn't do that.

The coopmat support also requires a glslc that supports the extension, which
currently requires a custom build.

* vulkan: Support bf16 tensors without the bf16 extension or coopmat support

Compile a variant of the scalar mul_mm shader that will promote the bf16
values to float, and use that when either the bf16 extension or the coopmat
extensions aren't available.

* vulkan: bfloat16 fixes (really works without bfloat16 support now)

* vulkan: fix spirv-val failure and reenable -O
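The promotion described in these commit notes can be illustrated on the CPU. What follows is a minimal sketch, not the Vulkan shader code: it assumes the standard bfloat16 layout (the upper 16 bits of an IEEE-754 float32), and every name in it (`bf16_to_fp32`, `matvec_bf16`, `DeviceCaps`, `pick_bf16_matmul_path`) is hypothetical and chosen for illustration only. The path selection mirrors the rule stated above: the scalar promote-to-float variant is used when either the bf16 extension or coopmat support is missing.

```python
import struct
from dataclasses import dataclass


def bf16_to_fp32(bits: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by placing it in the upper 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def fp32_to_bf16_truncate(x: float) -> int:
    """Narrow a float32 to bfloat16 by dropping the low 16 bits (truncation, no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def matvec_bf16(rows, vec):
    """Matrix-vector product where the matrix holds bf16 bit patterns.

    Each weight is promoted to a float before the multiply, so no bf16
    arithmetic is needed.
    """
    return [sum(bf16_to_fp32(w) * v for w, v in zip(row, vec)) for row in rows]


@dataclass
class DeviceCaps:
    # Hypothetical capability flags, not the real backend structures.
    has_bf16_extension: bool
    has_coopmat: bool


def pick_bf16_matmul_path(caps: DeviceCaps) -> str:
    """Choose the coopmat path only when both capabilities are present."""
    if caps.has_bf16_extension and caps.has_coopmat:
        return "coopmat_bf16"
    return "scalar_promote_to_fp32"
```

For example, `bf16_to_fp32(0x3F80)` yields `1.0`, because `0x3F80` shifted left by 16 bits is the float32 bit pattern `0x3F800000`.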

01 May 20:21

github-actions

b5248

fc727bc

vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul sha…

01 May 18:06

github-actions

b5246

b1dd4d0

sync : ggml

ggml-ci

01 May 16:33

github-actions

b5243

8936784

mtmd : add **vision** support for Mistral Small 3.1 (#13231)

* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test

01 May 10:31

github-actions

b5242

13c9a33

arg : remove CURLINFO_EFFECTIVE_METHOD (#13228)

01 May 10:08

github-actions

b5241

a70183e

llama-model : fix the reported size class for nomic-embed-text-v2-moe…

01 May 10:08

github-actions

b5239

4254bb4

ggml : fix ggml_gallocr_ptr type (ggml/1205)

30 Apr 22:12

github-actions

b5237

e1e8e09

CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
