Releases: ggml-org/llama.cpp
b6569
llama: print memory breakdown on exit (#15860)
b6568
ggml : split graph allocations according to backend max buffer size (…
b6567
model : add label for LiquidAI LFM2-2.6B model (#16204)

HF link: [LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B). Support for GGUF conversion and inference was added in #14620. However, because the 2.6B checkpoint has an `n_embd` similar to the 1.2B one, it was labeled as a 1.2B model. Fix the label by using `n_ff` to identify the model instead.

Output of `llama-bench`:

```
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2 1.2B F16                  |   2.18 GiB |     1.17 B | CPU        |      10 |           pp512 |        223.97 ± 5.32 |
| lfm2 2.6B F16                  |   4.79 GiB |     2.57 B | CPU        |      10 |           pp512 |         92.53 ± 4.14 |
| lfm2 350M F16                  | 676.25 MiB |   354.48 M | CPU        |      10 |           pp512 |       725.52 ± 11.70 |
| lfm2 700M F16                  |   1.38 GiB |   742.49 M | CPU        |      10 |           pp512 |       336.22 ± 12.93 |
```

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
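For illustration, a minimal C++ sketch of the idea behind the fix: derive the size label from the feed-forward width rather than the embedding width, which the 1.2B and 2.6B checkpoints share. The `n_ff` values below are assumptions for the sketch, not the mapping in `src/llama-model.cpp`.

```cpp
// Illustrative sketch only: the real mapping lives in src/llama-model.cpp and
// the concrete n_ff values below are assumptions, not the upstream values.
#include <cstdint>
#include <cstdio>

enum class lfm2_type { T_350M, T_700M, T_1_2B, T_2_6B, UNKNOWN };

// Before the fix the label was derived from n_embd, which the 1.2B and 2.6B
// checkpoints share; keying on the feed-forward width disambiguates them.
static lfm2_type lfm2_type_from_n_ff(uint32_t n_ff) {
    switch (n_ff) {
        case  4608: return lfm2_type::T_350M;  // assumed value
        case  6912: return lfm2_type::T_700M;  // assumed value
        case  8192: return lfm2_type::T_1_2B;  // assumed value
        case 12288: return lfm2_type::T_2_6B;  // assumed value
        default:    return lfm2_type::UNKNOWN;
    }
}

int main() {
    std::printf("type id: %d\n", (int) lfm2_type_from_n_ff(12288));
    return 0;
}
```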
b6565
common : add missing chrono header for common.cpp (#16211)

Signed-off-by: Uilian Ries <uilianries@gmail.com>
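For context, an illustrative snippet (not the code in `common.cpp`) of the failure mode this addresses: `std::chrono` usage can compile on standard libraries that pull in `<chrono>` transitively through another header, and fail on those that do not, so the include must be spelled out.

```cpp
// Illustrative example: relying on a transitive include of <chrono> breaks on
// some standard-library implementations; the explicit include fixes it.
#include <chrono>   // the missing header added by this change
#include <cstdio>

int main() {
    const auto t0 = std::chrono::steady_clock::now();
    // ... do some work ...
    const auto t1 = std::chrono::steady_clock::now();
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("elapsed: %lld ms\n", (long long) ms);
    return 0;
}
```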
b6558
ggml-cpu: Respect cpumask settings (#16164)
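As a rough sketch of what "cpumask settings" refers to at the API level, assuming the `ggml_threadpool_params` struct (with its `cpumask` field) and `ggml_threadpool_params_default` from `ggml.h`, plus `ggml_threadpool_new`/`ggml_threadpool_free` from `ggml-cpu.h`; whether the CPU backend actually honours the mask is what this change addresses, and the field names should be checked against the headers in your tree.

```cpp
// Sketch only: assumes ggml_threadpool_params and ggml_threadpool_params_default
// from ggml.h, and ggml_threadpool_new/ggml_threadpool_free from ggml-cpu.h.
#include "ggml.h"
#include "ggml-cpu.h"

int main() {
    // Default parameters for 4 worker threads.
    struct ggml_threadpool_params params = ggml_threadpool_params_default(4);

    // Pin the workers to cores 0-3; an all-false mask means default affinity.
    for (int i = 0; i < 4; ++i) {
        params.cpumask[i] = true;
    }
    params.strict_cpu = true; // request strict placement on the masked cores

    struct ggml_threadpool * tp = ggml_threadpool_new(&params);
    // ... attach the threadpool to the compute context and run work ...
    ggml_threadpool_free(tp);
    return 0;
}
```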
b6557
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15…
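The bug class, sketched generically (this is not the actual quantization code, and the real fix inside `quantize_row_iq3_xxs_impl` may differ in detail): a local flag array is only conditionally written before it is read, so the read can see indeterminate values; explicit initialization makes the behavior deterministic.

```cpp
// Generic illustration of the uninitialized-local bug class.
#include <cstdio>

static bool grid_lookup_succeeded(int v) { return v % 2 == 0; } // stand-in helper

int main() {
    bool is_on_grid[2];               // BUG: indeterminate values
    // bool is_on_grid[2] = { false, false }; // FIX: explicit initialization

    for (int k = 0; k < 2; ++k) {
        if (grid_lookup_succeeded(k)) {
            is_on_grid[k] = true;     // only written on this path
        }
    }
    // With the buggy declaration, reading is_on_grid[1] here is undefined behavior.
    std::printf("%d %d\n", (int) is_on_grid[0], (int) is_on_grid[1]);
    return 0;
}
```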
b6556
zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor
* ggml-zdnn: rm static from funcs
* ggml-zdnn: update ggml-zdnn.h
* ggml-zdnn: change header files to hpp
* ggml-zdnn: switch to common.hpp
* ggml-zdnn: move mulmat forward around
* ggml-zdnn: rm inline from utils
* ggml-zdnn: code cleanup
* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b6550
ggml : implement set_rows with i32 index (#16159)

* implement set_rows with i32 index
* template fix
* test quantized path warnings--
* Apply suggestions from code review
* forgotten name change
* deduplicate cuda/sycl and test-fix
* indent++
* vulkan: support set_rows with i32 index type (#16162)
* disable i32 index for webgpu for now

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
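A minimal CPU-only sketch of what the new index type enables, assuming the `ggml_set_rows(ctx, dst, src, idx)` signature from `ggml.h` (destination, source rows, row indices) and the CPU helpers from `ggml-cpu.h`; before this change the index tensor had to be I64, and with it an I32 tensor is accepted as well. Check the headers in your tree for the exact API.

```cpp
// Sketch only: assumes ggml_set_rows(ctx, dst, src, idx) from ggml.h and
// ggml_graph_compute_with_ctx from ggml-cpu.h.
#include "ggml.h"
#include "ggml-cpu.h"
#include <cstdint>
#include <cstdio>

int main() {
    struct ggml_init_params ip = { /*.mem_size =*/ 16*1024*1024, /*.mem_buffer =*/ nullptr, /*.no_alloc =*/ false };
    struct ggml_context * ctx = ggml_init(ip);

    // Destination: 8 rows of 4 floats; source: 2 rows to scatter into it.
    struct ggml_tensor * dst = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8);
    struct ggml_tensor * src = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    // Row indices as I32 (previously only I64 was supported).
    struct ggml_tensor * idx = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);

    ((int32_t *) idx->data)[0] = 1;   // write src row 0 into dst row 1
    ((int32_t *) idx->data)[1] = 5;   // write src row 1 into dst row 5
    for (int i = 0; i < 4*2; ++i) ((float *) src->data)[i] = (float) i;
    for (int i = 0; i < 4*8; ++i) ((float *) dst->data)[i] = 0.0f;

    struct ggml_tensor * out = ggml_set_rows(ctx, dst, src, idx);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // dst row 1 now holds src row 0, so element (1,1) should be 1.0.
    std::printf("dst[1][1] = %.1f\n", ((float *) dst->data)[1*4 + 1]);
    ggml_free(ctx);
    return 0;
}
```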
b6549
codeowners : update + cleanup (#16174)

Co-authored-by: slaren <slarengh@gmail.com>
b6548
common : enable `--offline` mode without curl support (#16137)

* common : use the json parser
* common : enable --offline mode without CURL support

This change refactors the download logic to properly support offline mode even when the project is built without CURL. Without this commit, using `--offline` would give the following error, even if all the files are already cached:

```
error: built without CURL, cannot download model from the internet
```

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
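A hedged sketch of the control flow the change describes: resolve the file from the local cache first, and only treat a missing CURL build as an error when a network download is actually required. All names here (`cache_path_for`, `file_exists`, `download_with_curl`, the `LLAMA_USE_CURL` macro) are illustrative stand-ins, not the actual helpers in `common/`.

```cpp
// Hypothetical sketch of the described behavior; every helper and the macro
// name are illustrative only, not the real common/ implementation.
#include <cstdio>
#include <filesystem>
#include <functional>
#include <stdexcept>
#include <string>

static std::string cache_path_for(const std::string & url) {
    // stand-in for the real cache-path derivation
    return "/tmp/llama-cache/" + std::to_string(std::hash<std::string>{}(url));
}

static bool file_exists(const std::string & path) {
    return std::filesystem::exists(path);
}

static std::string download_with_curl(const std::string & /*url*/, const std::string & dest) {
    return dest; // stand-in: a CURL-enabled build would fetch the file here
}

// Resolve a model file, preferring the local cache so that --offline (and a
// build without CURL) still works when the file is already cached.
static std::string resolve_model(const std::string & url, bool offline) {
    const std::string cached = cache_path_for(url);

    if (file_exists(cached)) {
        return cached;                    // cached: no network access required
    }
    if (offline) {
        throw std::runtime_error("--offline: model not found in the local cache");
    }
#ifdef LLAMA_USE_CURL               // illustrative macro name
    return download_with_curl(url, cached);
#else
    // Only when a download is actually required is the missing CURL build an error.
    throw std::runtime_error("built without CURL, cannot download model from the internet");
#endif
}

int main() {
    try {
        const std::string path = resolve_model("https://example.org/model.gguf", /*offline =*/ true);
        std::printf("using %s\n", path.c_str());
    } catch (const std::exception & e) {
        std::fprintf(stderr, "error: %s\n", e.what());
    }
    return 0;
}
```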