Commit 56b6336

feat: add event gist for fine-grained retrieval (#102)
* feat: add event fill param
* feat: add event gist
* docs: update locomo
* docs: add search gists
* feat: add search gist memory
* feat: update search gist sdk
* feat: fix db warning and update tests
1 parent 7991dd5 commit 56b6336

File tree

27 files changed: +34754 -30051 lines changed


docs/experiments/locomo-benchmark/README.md

Lines changed: 15 additions & 15 deletions
````diff
@@ -9,7 +9,7 @@ This project contains the code of running benchmark results on [Locomo dataset](
 - zep
 - basic rag
 - naive LLM
-- Memobase (version [0.0.32-beta1](https://github.com/orgs/memodb-io/packages/container/memobase/408325731?tag=0.0.32-beta1) or later)
+- Memobase
 
 ## Result
 
@@ -22,10 +22,10 @@ This project contains the code of running benchmark results on [Locomo dataset](
 | Mem0 | **67.13** | 51.15 | 72.93 | 55.51 | 66.88 |
 | Mem0-Graph | 65.71 | 47.19 | 75.71 | 58.13 | 68.44 |
 | LangMem | 62.23 | 47.92 | 71.12 | 23.43 | 58.10 |
-| Zep | 61.70 | 41.35 | **76.60** | 49.31 | 65.99 |
+| Zep | 61.70 | 41.35 | 76.60 | 49.31 | 65.99 |
 | OpenAI | 63.79 | 42.92 | 62.29 | 21.71 | 52.90 |
-| Memobase(*v0.0.32*) | 63.83 | **52.08** | 71.82 | **80.37** | **70.91** |
-| Memobase(*v0.0.37-a1*) | **67.60** | 39.76 | **76.65** | 78.87 | **73.27** |
+| Memobase(*v0.0.32*) | 63.83 | **52.08** | 71.82 | 80.37 | 70.91 |
+| Memobase(*v0.0.37*) | **70.92** | 46.88 | **77.17** | **85.05** | **75.78** |
 
 > **What is LLM Judge Score?**
 >
@@ -37,14 +37,14 @@ We attached the artifacts of Memobase under `fixture/memobase/`:
   - `fixture/memobase/results_0503_3000.json`: predicted answers from Memobase Memory
   - `fixture/memobase/memobase_eval_0503_3000.json`: LLM Judge results of predicted answers
 
-- v0.0.37-a1
-  - `fixture/memobase/results_0709_3000.json`: predicted answers from Memobase Memory
-  - `fixture/memobase/memobase_eval_0709_3000.json`: LLM Judge results of predicted answers
+- v0.0.37
+  - `fixture/memobase/results_0710_3000.json`: predicted answers from Memobase Memory
+  - `fixture/memobase/memobase_eval_0710_3000.json`: LLM Judge results of predicted answers
 
 To generate the latest scorings, run:
 
 ```bash
-python generate_scores.py --input_path="fixture/memobase/memobase_eval_0709_3000.json"
+python generate_scores.py --input_path="fixture/memobase/memobase_eval_0710_3000.json"
 ```
 
 Output:
@@ -53,15 +53,15 @@ Output:
 Mean Scores Per Category:
           bleu_score  f1_score  llm_score  count         type
 category
-1             0.3048    0.4254     0.6760    250   single_hop
-2             0.4323    0.6052     0.7887    284     temporal
-3             0.1943    0.2616     0.3976     83    multi_hop
-4             0.4121    0.5207     0.7665    771  open_domain
+1             0.3516    0.4629     0.7092    282   single_hop
+2             0.4758    0.6423     0.8505    321     temporal
+3             0.1758    0.2293     0.4688     96    multi_hop
+4             0.4089    0.5155     0.7717    841  open_domain
 
 Overall Mean Scores:
-bleu_score    0.3839
-f1_score      0.5053
-llm_score     0.7327
+bleu_score    0.3978
+f1_score      0.5145
+llm_score     0.7578
 dtype: float64
 ```
````

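The tables printed by `generate_scores.py` have the shape of a pandas `groupby` aggregation (note the `dtype: float64` footer on the overall means). The sketch below reproduces that shape from a hypothetical list of per-question records with `category`, `type`, and the three score fields; the real eval JSON schema and the script's actual logic may differ.

```python
import pandas as pd

# Hypothetical per-question records; the real memobase_eval_*.json
# layout is assumed here, not taken from the script.
records = [
    {"category": 1, "type": "single_hop", "bleu_score": 0.30, "f1_score": 0.45, "llm_score": 1.0},
    {"category": 1, "type": "single_hop", "bleu_score": 0.40, "f1_score": 0.48, "llm_score": 0.0},
    {"category": 2, "type": "temporal",   "bleu_score": 0.50, "f1_score": 0.66, "llm_score": 1.0},
]
df = pd.DataFrame(records)

# Mean score per category plus a question count -- the same shape as
# the "Mean Scores Per Category" table above.
per_category = df.groupby("category").agg(
    bleu_score=("bleu_score", "mean"),
    f1_score=("f1_score", "mean"),
    llm_score=("llm_score", "mean"),
    count=("llm_score", "size"),
    type=("type", "first"),
)
print("Mean Scores Per Category:")
print(per_category)

# Unweighted mean over all questions -- the "Overall Mean Scores" shape.
print("Overall Mean Scores:")
print(df[["bleu_score", "f1_score", "llm_score"]].mean())
```

Because each per-category row carries a `count` column, a question-weighted overall average can also be recomputed from the per-category table alone.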