update enrichment tables doc with new structure and pipeline usage, update Streaming Search performance, and add upload and recovery guide for enrichment tables #108

1 change: 1 addition & 0 deletions docs/environment-variables.md
@@ -131,6 +131,7 @@ OpenObserve is configured through the use of below environment variables.
| ZO_SWAGGER_ENABLED | true | No | Generate SWAGGER API documentation by default. (since v0.10.8) |
| ZO_INGEST_ALLOWED_UPTO | 5 | No | Discards events older than the specified number of hours. By default, OpenObserve accepts data only if it is not older than 5 hours from the current ingestion time.|
| ZO_INGEST_ALLOWED_IN_FUTURE | 24 | No | Discards events dated beyond the specified number of future hours. By default, OpenObserve accepts data only if it is not timestamped more than 24 hours into the future.|
| ZO_QUERY_INDEX_THREAD_NUM | 0 | No | Controls thread count for Tantivy index search. Set to `0` to use default: `CPU cores × 4`. Set a positive integer to override. `0` does not mean unlimited.|

> For local mode, OpenObserve uses SQLite as the metadata store.
>
Binary file added docs/images/enable-disable-streaming-search.png
Binary file added docs/images/resume-scheduled-pipeline.png
Binary file added docs/images/streaming-search-access.png
Binary file added docs/images/upload-enrichment-table.png
Binary file added docs/images/use-enrichment-table.png
Binary file added docs/images/with-streaming-search-panel1.png
Binary file added docs/images/with-streaming-search-panel2.png
Binary file added docs/images/without-streaming-search-panel1.png
Binary file added docs/images/without-streaming-search-panel2.png
2 changes: 1 addition & 1 deletion docs/index.md
@@ -44,7 +44,7 @@ OpenObserve's architectural approach can transform how you handle observability
**Next Steps:**

- Explore our comprehensive [Feature List](../features/logs.md) to see all capabilities
- Check out [Getting Started Guide](../getting-started.md) to start exploring
- Check out [Getting Started Guide](../docs/getting-started/) to start exploring
- Join our [Community](https://github.com/openobserve/openobserve/discussions) to connect with other users

*Sleep better at night knowing your observability stack is both powerful and affordable*
1 change: 1 addition & 0 deletions docs/user-guide/.pages
@@ -9,6 +9,7 @@ nav:
- Dashboards: dashboards
- Actions: actions
- Functions: functions
- Enrichment Tables: enrichment-tables
- Real User Monitoring (RUM): rum.md
- Identity and Access Management (IAM): identity-and-access-management
- Management: management
5 changes: 5 additions & 0 deletions docs/user-guide/enrichment-tables/.pages
@@ -0,0 +1,5 @@
nav:

- Enrichment Tables Overview: index.md
- Enrichment Tables: enrichment.md
- Upload, Caching, and Restart Behavior: enrichment-table-upload-recovery.md
@@ -0,0 +1,56 @@
---
title: Enrichment Table Upload and Recovery Flow – OpenObserve
description: Explains enrichment table upload, caching, and recovery behavior in OpenObserve based on file size and system settings.
---
This page describes how OpenObserve handles enrichment table uploads, background synchronization, local disk caching, and recovery after node restarts.


## Upload Behavior
The upload flow depends on the file size relative to a threshold controlled by the environment variable `ZO_ENRICHMENT_TABLE_MERGE_THRESHOLD_MB`.
> Default value: 60 MB

### When the file is smaller than 60 MB
- The file is uploaded to the metadata and file list storage system, for example PostgreSQL.
- A background job runs at regular intervals to:

- Merge all enrichment files received in the last interval.
- Create a single Parquet file.
- Upload the merged file to the remote telemetry storage such as S3.

!!! info "To configure the interval:"
Set the `ZO_ENRICHMENT_TABLE_MERGE_INTERVAL` environment variable.

- This variable defines how frequently the merge job runs.
- The value is in seconds.
- Default: 600

### When the file is 60 MB or larger

- The upload skips the metadata and file list storage system.
- The file is uploaded directly to remote telemetry storage such as S3.
- No merging or background sync is involved in this path.



## Local Disk Cache
After every enrichment table upload, OpenObserve caches the data locally to allow quick recovery and reduce remote fetches.

- Default path: `/data/openobserve/cache/enrichment_table_cache`
- Configurable with: `ZO_ENRICHMENT_TABLE_CACHE_DIR`

This cache is the primary recovery source during node restarts.
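
To keep the settings in one place, the defaults documented above can be written as environment variable assignments. This is only a summary of values already listed on this page, shown as a plain env-style sketch:

```
ZO_ENRICHMENT_TABLE_MERGE_THRESHOLD_MB=60    # size threshold that selects the upload path
ZO_ENRICHMENT_TABLE_MERGE_INTERVAL=600       # merge job frequency, in seconds
ZO_ENRICHMENT_TABLE_CACHE_DIR=/data/openobserve/cache/enrichment_table_cache    # local disk cache location
```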


## Behavior on Node Restart
When a node restarts, OpenObserve restores the enrichment table in the following order:

### If local disk cache is available

- OpenObserve checks the local disk cache first and loads the enrichment table from it directly into memory.

### If local disk cache is missing

- OpenObserve sends a single search request to one of the querier nodes.
- The querier fetches the latest enrichment data from PostgreSQL or S3 and provides it to the restarting node.


148 changes: 148 additions & 0 deletions docs/user-guide/enrichment-tables/enrichment.md
@@ -0,0 +1,148 @@
---
title: Enrichment Table – OpenObserve
description: Learn how to enrich incoming or queried log data in OpenObserve using enrichment tables.
---
This page explains how to enrich incoming or queried log data in OpenObserve using enrichment tables.

## What Is an Enrichment Table
An enrichment table in OpenObserve is a reference table used to enhance your log data with additional context. It is typically a CSV file that maps keys from your logs to descriptive values.

You can use enrichment tables during:

- **Ingestion**: To add context as data is ingested.
- **Query time**: To enrich data dynamically while querying.

**Enrichment is performed using Vector Remap Language (VRL) functions.**

!!! note "Where to find"
To access the enrichment table interface:

1. Select the appropriate organization from the dropdown in the top-right corner.
2. Navigate to the left-hand menu.
3. Select **Pipelines > Enrichment Tables**.

This opens the enrichment table management interface, where you can view, create, and manage enrichment tables available to the selected organization.

!!! note "Who can access"
Access to enrichment tables is controlled via the **Enrichment Tables** module in the **IAM** settings, using **role-based access control (RBAC)**.

- **Root users** have full access by default.
- Other users must be assigned access through **Roles** in **IAM**.
- You can assign access to the entire **Enrichment Tables** module.
- You can also assign permissions to individual enrichment tables. This allows fine-grained control over who can use or modify specific enrichment tables.

## Common Use Cases for Enrichment

Enrichment tables are often used to add human-readable context or derived values to logs. Examples include:

- **Country code to country name**: Add a new field that maps `IN` to `India`, `US` to `United States`, etc.

- **Status code to status label**: Add a new field that maps status `1` to `success`, `2` to `failure`, and `3` to `unknown`.

- **Internal vs external IP**: Add a new field that classifies the IP address as `internal` or `external` based on private IP ranges.

- **Protocol number to protocol name**: Add a new field that maps `6` to `TCP` and `17` to `UDP` using a protocol lookup table.
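
For instance, the protocol lookup in the last example could be a small CSV like the one below. The column names are illustrative; use whatever headers match your logs:

```csv
protocol_number,protocol_name
1,ICMP
6,TCP
17,UDP
```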



## How to Create and Use an Enrichment Table

### Step 1: Identify the Field to Enrich
Review your log data and identify a field that contains codes or labels with limited context. <br>
**Example** <br>
The `log_iostream` field in the logs has values such as:

```json
"log_iostream": "stdout"
"log_iostream": "stderr"
```
> The goal is to create a new field, for example `stream_type_description`, that provides a readable explanation like:
```json
"log_iostream": "stderr"
"stream_type_description": "Standard Error – error or diagnostic logs"
```

### Step 2: Prepare the Enrichment Table
Create a CSV file containing the original values and their corresponding descriptive meanings. Use clear and consistent column headers.
Example CSV (`enrichment_reference.csv`):
```csv
log_iostream,stream_type_description
stdout,Standard Output – application logs
stderr,Standard Error – error or diagnostic logs
```

### Step 3: Upload the Enrichment Table

1. Go to **Pipelines > Enrichment Tables** in the OpenObserve UI.
2. Click **Add Enrichment Table**.
3. Set a name such as `log_stream_labels`.
4. Upload your CSV file.
5. Click **Save**.
<br>
![Upload the Enrichment Table](../../images/upload-enrichment-table.png)

The enrichment table is now available for use in VRL.

### Step 4: Use the Enrichment Table in a VRL Function
1. Go to the **Logs** page.
2. Select the relevant log stream.
3. In the **VRL Function Editor**, enter the following:

```js linenums="1"
record, err = get_enrichment_table_record("log_stream_labels", {"log_iostream": .log_iostream})
.stream_type_description = record.stream_type_description
.
```

!!! note "Explanation:"
**Line 1:** <br>

`record, err = get_enrichment_table_record("log_stream_labels", { "log_iostream": .log_iostream })`:

- This line searches the enrichment table named `log_stream_labels`.
- It matches the field `log_iostream` in your log event with the `log_iostream` column in the enrichment table.
- If a match is found, the corresponding row from the table is returned as `record`.
- If no match is found or an error occurs, `record` will be empty and `err` will contain the error.

**Line 2:** <br>
`.stream_type_description = record.stream_type_description`:

- This creates a new field called `stream_type_description` in your log event.
- The value is taken from the `stream_type_description` column in the enrichment table row returned above.
- If the enrichment table did not contain a matching entry, this field may not be added.

**Line 3:** <br>
`.`

- This tells OpenObserve to return the modified log event, including the newly added field.

**Optional** <br>
If you prefer to replace the original value instead of adding a new field, use:

```js linenums="1"
record, err = get_enrichment_table_record("log_stream_labels", {"log_iostream": .log_iostream})
.log_iostream = record.stream_type_description
.
```
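
If you want to guard against missing matches, one option is to check `err` before assigning the new field. A minimal sketch, assuming the same `log_stream_labels` table and a fallback label of `unknown`:

```js linenums="1"
record, err = get_enrichment_table_record("log_stream_labels", {"log_iostream": .log_iostream})
if err == null {
    # A matching row was found; copy the description from the table.
    .stream_type_description = record.stream_type_description
} else {
    # No match was found or the lookup failed; set an explicit fallback value.
    .stream_type_description = "unknown"
}
.
```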
### Step 5: Run the Query and View the Results
Click **Run Query**. A new field (such as `stream_type_description`) will appear in the results, containing the enriched meaning of the original value.
<br>
![Use the Enrichment Table](../../images/use-enrichment-table.png)


## Use Enrichment Tables in Pipelines
In addition to enriching data at query time, you can apply the same enrichment logic during ingestion using **Pipelines**. This allows you to permanently transform log records as they arrive, ensuring that enriched fields are stored along with the original data.

### How it works

- You define a pipeline with a **Transform** step that uses a VRL function.
- The VRL function reads from an enrichment table, just like in the **Logs** UI.
- The enriched field is added before the data is written to storage.

!!! note
Use query-time enrichment when you want flexibility. Use ingestion-time enrichment when you want consistency and speed.
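
For example, the **Transform** step can attach the same VRL function shown in Step 4. The sketch below assumes the `log_stream_labels` table from Step 3 has already been uploaded:

```js linenums="1"
# Same enrichment as in Step 4, applied at ingestion time by a pipeline Transform step.
record, err = get_enrichment_table_record("log_stream_labels", {"log_iostream": .log_iostream})
.stream_type_description = record.stream_type_description
.
```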

## Troubleshooting
- **Field not enriched:** Ensure the enrichment table column name matches the log field and that the data types are compatible.
- **No result added:** Check that the enrichment table was uploaded and saved correctly, and that a matching row exists.
- **Permission denied:** Ensure the user has the correct permissions in the IAM role to access the enrichment table.
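
When diagnosing the first two cases, it can help to temporarily surface the lookup result on the event itself. A debugging sketch, assuming the `log_stream_labels` table and hypothetical field names `enrichment_error` and `enrichment_match`:

```js linenums="1"
record, err = get_enrichment_table_record("log_stream_labels", {"log_iostream": .log_iostream})
# Expose the raw lookup result and any error so they appear in the query results.
.enrichment_error = err
.enrichment_match = record
.
```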
6 changes: 6 additions & 0 deletions docs/user-guide/enrichment-tables/index.md
@@ -0,0 +1,6 @@
Enrichment tables in OpenObserve allow you to add meaningful context to your log data by joining it with external reference data. These tables are uploaded as CSV files and can be used during ingestion or at query time to add or modify fields.

Learn more:

- [Enrichment Tables](../enrichment-tables/enrichment/)
- [Upload, Caching, and Restart Behavior](../enrichment-tables/enrichment-table-upload-recovery/)
2 changes: 1 addition & 1 deletion docs/user-guide/functions/.pages
@@ -1,4 +1,4 @@
nav:
- Functions Overview: index.md
- Functions in OpenObserve: functions-in-openobserve.md
- Enrichment: enrichment.md
