Commit a7557da

doc: Update README, add sequence diagram
1 parent e33988e commit a7557da

File tree

5 files changed

+110
-6
lines changed

README.md

Lines changed: 56 additions & 1 deletion
@@ -1,5 +1,12 @@
11
# Arcmind Vector DB
22

3+
Arcmind Vector DB is a high-performance, flexible, and ergonomic vector similarity search database for the [Internet Computer](https://internetcomputer.org). It is designed for general-purpose use across a wide range of AI-powered applications, including recommendation systems, search engines, [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) (RAG), and long-term memory for autonomous AI agents like [ArcMind AI](https://github.com/arcmindai/arcmindai).
4+
5+
## Architecture
6+
7+
Sequence Flow Diagram
8+
![ArcMind Vector DB](/diagram/architecture.png)
9+
310
## Prerequisites
411

512
- Install Rust Toolchain using Rustup
@@ -22,10 +29,48 @@ If you want to test your project locally, you can use the following commands:
2229
dfx start --background
2330

2431
# Deploys controller and brain canisters to the local replica
32+
# Set the CONTROLLER_PRINCIPAL environment variable using the output of: dfx identity get-principal
33+
2534
./scripts/provision.sh
2635
```
2736

28-
The provision script will deploy a `controller` canister and a `brain` canister which is owned solely by the `controller`
37+
The provision script will deploy an `arcmindvectordb` canister.
38+
39+
## API
40+
41+
See [Candid](/src/arcmindvectordb/arcmindvectordb.did) for the full API.
42+
43+
## Interacting with the canisters
44+
45+
Sample shell scripts are provided to interact with the canisters in the [interact](/interact/) directory.
46+
Sample content and the corresponding embedding vectors are provided in the [embeddings](/embeddings/) directory.
47+
48+
### Add a vector to the VectorStore
49+
50+
Open and edit the script:
51+
52+
```bash
53+
./interact/add_vector.sh
54+
```
55+
56+
Try adding multiple vectors of different topics to the VectorStore.
57+
58+
### Search the VectorStore
59+
60+
Then search for similar vectors by using one of the vectors you added as input.
61+
It should return that same vector as the closest match, followed by other similar vectors on the same topic.
62+
This demonstrates how the search captures semantic similarity between high-dimensional vectors.
63+
64+
Open and edit the script:
65+
66+
```bash
67+
./interact/search_vector.sh
68+
```
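
The search behaviour described above can be illustrated with a minimal brute-force sketch. It is not the canister's implementation: the README does not state which distance metric `arcmindvectordb` uses, so cosine similarity is an assumption here, and `cosine_similarity`, `search`, and the toy three-dimensional vectors are hypothetical names and data chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def search(store, query, k=3):
    """Return the k (doc, score) pairs most similar to query, best first."""
    scored = [(doc, cosine_similarity(vec, query)) for doc, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
store = [
    ("cats are small felines", [0.9, 0.1, 0.0]),
    ("dogs are loyal pets", [0.8, 0.3, 0.1]),
    ("stocks fell on Monday", [0.0, 0.2, 0.9]),
]

# Query with a vector that is already in the store.
results = search(store, [0.9, 0.1, 0.0], k=2)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

Querying with a stored vector ranks that vector first, and the other pet-related entry ranks above the unrelated one. A production store would typically replace the linear scan with an index.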
69+
70+
Note that the same embedding model must be used for adding and searching vectors.
71+
Use a single embedding model per VectorStore for consistent results.
72+
73+
The embeddings in /embeddings/ are generated using the [OpenAI text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings/embedding-models) model with its [Embedding API](https://platform.openai.com/docs/api-reference/embeddings).
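
One cheap guard against mixing embedding models is to check vector length on insert. The sketch below assumes a client-side wrapper. `CheckedStore` and `DimensionMismatch` are hypothetical names and not part of the `arcmindvectordb` API. The one concrete figure it relies on is that text-embedding-ada-002 returns 1536-dimensional vectors.

```python
ADA_002_DIMENSIONS = 1536  # vector length produced by text-embedding-ada-002


class DimensionMismatch(ValueError):
    """Raised when a vector does not match the store's expected length."""


class CheckedStore:
    """Hypothetical in-memory store that accepts vectors of one length only."""

    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.items = []

    def add(self, doc, vector):
        if len(vector) != self.dimensions:
            raise DimensionMismatch(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self.items.append((doc, list(vector)))


store = CheckedStore(ADA_002_DIMENSIONS)
store.add("from ada-002", [0.0] * ADA_002_DIMENSIONS)

try:
    # A vector from some other model with a different output size.
    store.add("from another model", [0.0] * 768)
except DimensionMismatch as err:
    print(f"rejected: {err}")
```

This check only catches models whose output sizes differ. Two models with the same dimensionality still produce incompatible vector spaces, so the model used should also be tracked explicitly.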
2974

3075
## Setting up Github Action CI / CD
3176

@@ -44,6 +89,14 @@ awk 'NF {sub(/\r/, ""); printf "%s\\r\\n",$0;}' ~/.config/dfx/identity/default/i
4489
cat ~/.config/dfx/identity/default/wallets.json
4590
```
4691

92+
## Roadmap
93+
94+
- [x] Backend - Research and implement primary canister as long-term VectorStore with Nearest Neighbours distance metric, embedding API and indexing
95+
- [x] Backend - Integrate with ArcMind AI Autonomous Agent for long-term memory
96+
- [ ] Doc - Add documentation for the VectorStore API
97+
- [ ] Backend - Self-hosted machine learning models for generating text (NLP), image and audio embeddings
98+
- [ ] Backend - Scalable storage buckets for large-scale vector data beyond the canister storage limit
99+
47100
## License
48101

49102
See the [License](LICENSE) file for license rights and limitations (MIT).
@@ -59,6 +112,8 @@ Code & Architecture: Henry Chan, [henry@arcmindai.app](mailto:henry@arcmindai.ap
59112
## References
60113

61114
- [Internet Computer](https://internetcomputer.org)
115+
- [Cloudflare - What is a Vector Database?](https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/)
116+
- [RAG](https://arxiv.org/abs/2005.11401)
62117
- [Open-source vector similarity search for Postgres](https://github.com/pgvector/pgvector)
63118
- [Spotify Annoy Library - Approximate Nearest Neighbors in C++/Python](https://github.com/spotify/annoy)
64119
- [What is similarity Search](https://www.pinecone.io/learn/what-is-similarity-search/)

diagram/architecture.drawio

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
<mxfile host="65bd71144e">
2+
<diagram id="a3mhpKqtOrk29XRP44x8" name="Page-1">
3+
<mxGraphModel dx="695" dy="597" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
4+
<root>
5+
<mxCell id="0"/>
6+
<mxCell id="1" parent="0"/>
7+
<mxCell id="2" value="Vector DB" style="shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
8+
<mxGeometry x="510" y="128" width="60" height="290" as="geometry"/>
9+
</mxCell>
10+
<mxCell id="4" style="edgeStyle=none;html=1;entryX=0;entryY=0;entryDx=0;entryDy=80;entryPerimeter=0;exitX=1;exitY=0.25;exitDx=0;exitDy=0;" edge="1" parent="1" source="12" target="2">
11+
<mxGeometry relative="1" as="geometry">
12+
<mxPoint x="70" y="223.33333333333337" as="sourcePoint"/>
13+
<mxPoint x="430" y="320" as="targetPoint"/>
14+
</mxGeometry>
15+
</mxCell>
16+
<mxCell id="5" value="2. Add resulting vector and doc text" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="4">
17+
<mxGeometry x="0.0112" y="3" relative="1" as="geometry">
18+
<mxPoint x="-128" y="-18" as="offset"/>
19+
</mxGeometry>
20+
</mxCell>
21+
<mxCell id="6" value="4. Search for similar docs using search &lt;br&gt;keyword embedding" style="edgeStyle=none;html=1;entryX=-0.05;entryY=0.769;entryDx=0;entryDy=0;entryPerimeter=0;exitX=1;exitY=0.75;exitDx=0;exitDy=0;" edge="1" parent="1" source="12" target="2">
22+
<mxGeometry x="-0.5081" y="20" relative="1" as="geometry">
23+
<mxPoint x="70" y="260" as="sourcePoint"/>
24+
<mxPoint as="offset"/>
25+
</mxGeometry>
26+
</mxCell>
27+
<mxCell id="9" style="edgeStyle=none;html=1;entryX=0.014;entryY=0.08;entryDx=0;entryDy=0;entryPerimeter=0;exitX=1;exitY=0.077;exitDx=0;exitDy=0;exitPerimeter=0;" edge="1" parent="1" source="12" target="8">
28+
<mxGeometry relative="1" as="geometry">
29+
<mxPoint x="62.5" y="165" as="sourcePoint"/>
30+
</mxGeometry>
31+
</mxCell>
32+
<mxCell id="10" value="1. Generate embedding from your doc text" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="9">
33+
<mxGeometry x="-0.2027" y="1" relative="1" as="geometry">
34+
<mxPoint x="26" y="-21" as="offset"/>
35+
</mxGeometry>
36+
</mxCell>
37+
<mxCell id="11" value="3. Generate embedding from your &lt;br&gt;search keywords text" style="edgeStyle=none;html=1;entryX=-0.014;entryY=0.478;entryDx=0;entryDy=0;entryPerimeter=0;" edge="1" parent="1" target="8">
38+
<mxGeometry x="-0.165" y="23" relative="1" as="geometry">
39+
<mxPoint x="60" y="273" as="sourcePoint"/>
40+
<mxPoint x="268" y="276" as="targetPoint"/>
41+
<mxPoint as="offset"/>
42+
</mxGeometry>
43+
</mxCell>
44+
<mxCell id="8" value="OpenAI Embedding API" style="rounded=1;whiteSpace=wrap;html=1;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;fillColor=#fff2cc;strokeColor=#d6b656;fontStyle=0" vertex="1" parent="1">
45+
<mxGeometry x="300" y="139" width="40" height="281" as="geometry"/>
46+
</mxCell>
47+
<mxCell id="12" value="User" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;strokeColor=#6c8ebf;" vertex="1" parent="1">
48+
<mxGeometry x="20" y="139" width="40" height="281" as="geometry"/>
49+
</mxCell>
50+
</root>
51+
</mxGraphModel>
52+
</diagram>
53+
</mxfile>

diagram/architecture.png

77.2 KB

diagram/architecture.svg

Lines changed: 1 addition & 0 deletions

old_prod/canister_ids.json

Lines changed: 0 additions & 5 deletions
This file was deleted.
