comet-ml · juanferrub · Jul 15, 2025
@@ -2,17 +2,49 @@
 subtitle: Guides you through the process of creating and managing datasets
 ---
 
-Datasets can be used to track test cases you would like to evaluate your LLM on. Each dataset is made up of dictionary
+Datasets can be used to track test cases you would like to evaluate your LLM on. Each dataset is made up of dictionaries
 with any key value pairs. When getting started, we recommend having an `input` and optional `expected_output` fields for
 example. These datasets can be created from:
 
-- Python SDK: You can use the Python SDK to create an dataset and add items to it.
+- Python SDK: You can use the Python SDK to create a dataset and add items to it.
 - Traces table: You can add existing logged traces (from a production application for example) to a dataset.
 - The Opik UI: You can manually create a dataset and add items to it.
 
 Once a dataset has been created, you can run Experiments on it. Each Experiment will evaluate an LLM application based
 on the test cases in the dataset using an evaluation metric and report the results back to the dataset.
 
+## Creating a dataset via the UI
+
+The quickest way to create a dataset is directly in the Opik UI.
+This is ideal for bootstrapping datasets from CSV files without writing any code.
+
+Steps:
+1. Navigate to Evaluation > Datasets in the Opik UI.
+2. Click Create new dataset.
+3. In the pop-up modal:
+   - Provide a name and an optional description.
+   - Optionally, upload a CSV file with your data.
+4. Click Create dataset.
+
+<Frame>
+  <img src="/img/evaluation/create_dataset.png" />
+</Frame>
+
+CSV format requirements:
+- Your CSV must contain exactly two columns:
+    - `input`
+    - `output`
+- Maximum of 1,000 rows per upload.
+
+<Tip>
+  Creating a dataset in the UI has some limitations:
+    - The CSV can contain only two columns.
+    - Each upload is limited to 1,000 rows.
+    - Nested JSON structures in the CSV are not supported.
+
+  For datasets requiring rich metadata, complex schemas, or programmatic control, use the SDK instead (see the next section).
+</Tip>
+
 ## Creating a dataset using the SDK
 
 You can create a dataset and log items to it using the `get_or_create_dataset` method: