Update approx_topk docs (#94)

DebashisBorgohainO2 · web-flow · commit d66e3b593906 · 2025-07-14T15:07:10.000+05:30
* update approx_topk page and restructure SQL References by creating individual function pages under /sql-functions folder

* address review comments on sql functions pages

* update the approx_topk_distinct documentation

* fix metadata issue and remove old sql-references page from .pages
diff --git a/docs/.pages b/docs/.pages
@@ -5,8 +5,8 @@ nav:
 - Features: features
 - Enterprise Edition Installation Guide: openobserve-enterprise-edition-installation-guide.md
 - Releases: releases.md
+- SQL Functions: sql-functions
 - Example Queries: example-queries.md
-- SQL Function Reference: sql_reference.md
 - HA Deployment: ha_deployment.md
 - Environment Variables: environment-variables.md
 - Data Management: data-management
@@ -22,4 +22,4 @@ nav:
 - Telemetry: telemetry.md
 - zPlane: zplane.md
 - Work Group: work_group.md 
-- SQL Functions: sql-functions
+
diff --git a/docs/sql-functions/aggregate.md b/docs/sql-functions/aggregate.md
@@ -1,12 +1,9 @@
-
 ---
 title: histogram() Function in OpenObserve
-description: This page explains how to use the histogram() function in OpenObserve to group time-based log data into fixed intervals for trend analysis. It includes syntax options with or without interval specification, use with aggregate functions such as COUNT(), and guidance on interpreting the result. A detailed example shows how logs are grouped into 30-second time buckets, along with the output format. Users are advised to specify intervals explicitly to ensure consistent and predictable results. The page also includes a visual example to support understanding. 
+description: This page explains how to use the histogram() function in OpenObserve to group time-based log data into fixed intervals for trend analysis. It includes syntax options with or without interval specification, use with aggregate functions such as COUNT(), and guidance on interpreting the result. A detailed example shows how logs are grouped into 30-second time buckets, along with the output format. 
 ---
-
 Aggregate functions compute a single result from a set of input values. For usage of standard SQL aggregate functions such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`, refer to [PostgreSQL documentation](https://www.postgresql.org/docs/).
 
----
 
 ### `histogram`
 **Syntax**: histogram(field) or histogram(field, 'interval')
diff --git a/docs/sql-functions/approximate-aggregate/approx-topk-distinct.md b/docs/sql-functions/approximate-aggregate/approx-topk-distinct.md
@@ -2,7 +2,6 @@
 title: approx_topk_distinct() Function in OpenObserve
 description: This page explains how to use the approx_topk_distinct() function in OpenObserve to identify the top K values in one field based on the highest number of distinct values in another field. It introduces the combined use of HyperLogLog and Space-Saving algorithms to efficiently process large, high-cardinality datasets. The guide includes SQL syntax, a usage example, and demonstrates how to flatten the result using the unnest() function. It also provides a sample output to help users understand the structure and interpretation of the result. For top values based only on frequency, refer to the approx_topk() function.
 ---
-
 This page provides instructions on using the `approx_topk_distinct()` function. 
 If you only need to find the top K most frequently occurring values in a field, refer to the [approx_topk()](../approx-topk/) function.
 
@@ -75,8 +74,9 @@ For details on how this approach compares to traditional GROUP BY queries in ter
 ## Limitations
 The following are the known limitations of `approx_topk_distinct()` function:
 
-Results are approximate, not guaranteed to be exact. Not recommended when exact accuracy is critical for analysis or reporting.
-Accuracy depends on data distribution across partitions.
 
-![approx_topk_distinct](../../images/approx-topk-distinct.png)
+- Results are approximate, not guaranteed to be exact. 
+- Accuracy depends on data distribution across partitions.
+
+
 
diff --git a/docs/sql-functions/approximate-aggregate/approx-topk.md b/docs/sql-functions/approximate-aggregate/approx-topk.md
@@ -1,9 +1,7 @@
-
 ---
 title: approx_topk() Function in OpenObserve
-description: This page explains how to use the approx_topk() function in OpenObserve to identify the most frequent values in high-cardinality fields. It provides the SQL syntax, a usage example, result structure, and comparison with the traditional GROUP BY approach. The guide includes a detailed performance comparison and highlights memory efficiency in distributed query processing. It also demonstrates how to use approx_topk() with unnest() for flat output and explains scenarios where this function offers a practical advantage. Limitations and frequently asked questions are included to help users understand when to use this approximate method.
+description: This page explains how to use the approx_topk() function in OpenObserve to identify the most frequent values in high-cardinality fields. It provides the SQL syntax, a usage example, result structure, and comparison with the traditional GROUP BY approach. The guide includes a detailed performance comparison and highlights memory efficiency in distributed query processing. 
 ---
-
 This page provides instructions on using the `approx_topk()` function and explains its performance benefits compared to the traditional `GROUP BY` method.
 
 ## What is `approx_topk`?
diff --git a/docs/sql-functions/approximate-aggregate/index.md b/docs/sql-functions/approximate-aggregate/index.md
@@ -4,3 +4,4 @@ Learn more:
 
 - [approx_topk](../approximate-aggregate/approx-topk/)
 - [approx_topk_distinct](../approximate-aggregate/approx-topk-distinct/)
+
diff --git a/docs/sql-functions/array.md b/docs/sql-functions/array.md
@@ -2,7 +2,6 @@
 title: Array Functions in OpenObserve
 description: This page lists all supported array functions in OpenObserve, along with their syntax, descriptions, and usage examples. These functions operate on fields that contain stringified JSON arrays, enabling users to sort, count, extract subsets, join, and combine array elements. Functions such as arrsort, arrjoin, arrindex, arrzip, spath, and cast_to_arr help process and transform array data effectively. 
 ---
-
 This page lists the array functions supported in OpenObserve, along with their usage formats, descriptions, and examples.
 
 The array functions operate on fields that contain arrays. In OpenObserve, array fields are typically stored as stringified JSON arrays.
diff --git a/docs/sql-functions/full-text-search.md b/docs/sql-functions/full-text-search.md
@@ -2,10 +2,8 @@
 title: Full-Text Search Functions in OpenObserve
 description: This page describes the full-text search functions supported in OpenObserve, including their syntax, behavior, and examples. Functions such as str_match, str_match_ignore_case, match_all, re_match, and re_not_match allow users to filter logs based on exact string matches, case-insensitive searches, keyword searches across multiple fields, and pattern-based filtering using regular expressions. The guide also explains the role of inverted indexing and how to enable it for enhanced search coverage. Sample queries and output visuals are provided to help users apply these functions effectively in log analysis.
 ---
-
 The full-text search functions allow you to filter records based on keyword or pattern matches within one or more fields. <br>This page lists the full-text search functions supported in OpenObserve, along with their usage formats, descriptions, and examples.
 
----
 
 ### `str_match`
 
