Can't load dataset #24

Open
@djklim87

Description

I am trying to import a 4 GB dataset into Meilisearch, but no documents end up in the index. Neither the engine nor the importer provides a clear error message explaining what is going wrong.

MRE

I am using the db-benchmarks dataset. Due to GitHub’s limitations, I cannot upload the file here, so I will outline the steps to download it locally:

  1. Clone the db-benchmarks repository:
git clone https://github.com/db-benchmarks/db-benchmarks.git
  2. Download the dataset:
cd db-benchmarks/tests/hn_small
./prepare_csv/prepare.sh | while IFS= read -r line; do echo -e "\t$line"; done
  3. Run Meilisearch:
docker run -it --rm -p 7700:7700 -d getmeili/meilisearch:v1.12
  4. Create and configure the index:
curl -s -X POST "http://localhost:7700/indexes" -H 'Content-Type: application/json' \
  --data-binary "{\"uid\": \"hn_small\", \"primaryKey\": \"id\"}"

curl -s -X PATCH "http://localhost:7700/indexes/hn_small/settings" \
  -H 'Content-Type: application/json' \
  --data-binary '{"pagination":{"maxTotalHits":2000},"searchableAttributes":["story_text","comment_text","story_author","comment_author"],"filterableAttributes":["comment_ranking","story_author"],"sortableAttributes":["comment_ranking","author_comment_count","story_id","comment_id"],"typoTolerance":{"enabled":false}}'
  5. Run the importer:
./meilisearch-importer --url 'http://localhost:7700' --index hn_small --files ./data/data.csv --batch-size 90MB
  6. Confirm that the data wasn’t uploaded:
curl -s http://localhost:7700/indexes/hn_small/stats

{"numberOfDocuments":0,"isIndexing":false,"fieldDistribution":{}}

Logs

From the logs, I can see errors, but they don’t provide any meaningful explanation of what went wrong:

2025-01-28T08:48:08.923880Z  INFO HTTP request{method=POST host="localhost:7700" route=/indexes/hn_small/documents query_parameters= user_agent=ureq/2.9.6 status_code=202}: meilisearch: close time.busy=17.4ms time.idle=1.50s
2025-01-28T08:48:08.927445343Z 2025-01-28T08:48:08.927360Z  INFO index_scheduler: A batch of tasks was successfully completed with 0 successful tasks and 1 failed tasks.
2025-01-28T08:48:12.498362053Z 2025-01-28T08:48:12.498246Z  INFO HTTP request{method=POST host="localhost:7700" route=/indexes/hn_small/documents query_parameters= user_agent=ureq/2.9.6 status_code=202}: meilisearch: close time.busy=19.7ms time.idle=1.49s
2025-01-28T08:48:12.501504720Z 2025-01-28T08:48:12.501430Z  INFO index_scheduler: A batch of tasks was successfully completed with 0 successful tasks and 1 failed tasks.
2025-01-28T08:48:15.633547596Z 2025-01-28T08:48:15.633356Z  INFO HTTP request{method=POST host="localhost:7700" route=/indexes/hn_small/documents query_parameters= user_agent=ureq/2.9.6 status_code=202}: meilisearch: close time.busy=19.8ms time.idle=1.30s
2025-01-28T08:48:15.637011471Z 2025-01-28T08:48:15.636957Z  INFO index_scheduler: A batch of tasks was successfully completed with 0 successful tasks and 1 failed tasks.
2025-01-28T08:48:31.814040173Z 2025-01-28T08:48:31.813684Z  INFO HTTP request{method=GET host="localhost:7700" route=/indexes/hn_small/stats query_parameters= user_agent=curl/8.7.1 status_code=200}: meilisearch: close time.busy=900µs time.idle=201µs
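
The 202 responses above only mean that each batch was accepted and queued. The reason a task failed is normally recorded on the task itself, not in these log lines. Below is a minimal sketch for listing the failed tasks of this index through the `/tasks` route. It assumes the instance from the steps above is still running on `localhost:7700` without a master key. The helper names `failed_tasks_url` and `show_failed_tasks` are hypothetical and not part of Meilisearch or the importer.

```shell
#!/bin/sh
# Hypothetical helper: build the URL that lists failed tasks for one index.
# It relies on the statuses, indexUids and limit query filters of GET /tasks.
failed_tasks_url() {
  base_url=$1
  index_uid=$2
  limit=${3:-3}   # default to the 3 most recent failed tasks
  printf '%s/tasks?statuses=failed&indexUids=%s&limit=%s\n' \
    "$base_url" "$index_uid" "$limit"
}

# Hypothetical helper: fetch the failed tasks as raw JSON.
# Each entry in "results" should carry an "error" object describing the failure.
show_failed_tasks() {
  curl -s "$(failed_tasks_url "$1" "$2" "${3:-3}")"
}
```

Running `show_failed_tasks http://localhost:7700 hn_small` after the importer finishes should print the failed tasks, and the `error.message` and `error.code` fields of each one are the place to look for the explanation missing from the logs.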
