Commit 77d6bc3

update sql function reference page
1 parent 78c6380 commit 77d6bc3

1 file changed

+7
-3
lines changed

docs/sql_reference.md

Lines changed: 7 additions & 3 deletions
@@ -494,7 +494,11 @@ ORDER BY request_count DESC
494494

495495
**Why Results Are Approximate** <br>
496496

497-
Results are approximate because some globally significant IPs might not appear in individual nodes' top 10 lists due to uneven data distribution across nodes. For example, an IP with moderate traffic across all nodes might have a high global total but not rank in any single node's top 10.
497+
The approx_topk function returns approximate results because it relies on each query node sending only its local top N entries to the leader. The leader combines these partial lists to produce the final result.
498+
499+
If a value appears frequently across all nodes but never ranks in the top N on any individual node, it is excluded. This can cause high-frequency values to be missed globally.
500+
501+
For example, if an IP receives 400, 450, and 500 requests across three nodes but ranks 11th on each, it will not appear in any node’s top 10. Even though the global total is 1,350, it will be missed.
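
The merge behavior described in the added lines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual `approx_topk` implementation: the helper names `local_top_n` and `merge_top_n`, the IP addresses, and the counts for the other IPs are hypothetical. Only the 400, 450, and 500 request counts come from the text.

```python
from collections import Counter

def local_top_n(counts, n):
    """Return the n most frequent (key, count) pairs seen on one node."""
    return Counter(counts).most_common(n)

def merge_top_n(partials, n):
    """Leader step: sum the partial lists and keep the global top n."""
    merged = Counter()
    for partial in partials:
        for key, count in partial:
            merged[key] += count
    return merged.most_common(n)

# Hypothetical data: each node has ten node-specific IPs that outrank
# the shared IP 10.0.0.99, so that IP ranks 11th on every node.
nodes = []
for node_id, moderate in enumerate([400, 450, 500]):
    counts = {f"192.168.{node_id}.{i}": 600 + 10 * i for i in range(10)}
    counts["10.0.0.99"] = moderate
    nodes.append(counts)

# Approximate result: the leader only sees each node's local top 10.
approx = merge_top_n([local_top_n(c, 10) for c in nodes], 10)

# Exact result: sum the full counts from every node before ranking.
exact = sum((Counter(c) for c in nodes), Counter()).most_common(10)

print("10.0.0.99 in approximate top 10:", "10.0.0.99" in dict(approx))
print("exact top entry:", exact[0])
```

In this sketch the shared IP has the highest exact total (1,350) yet never reaches the leader, because no node includes it in its local top 10.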
498502

499503
**Limitations** <br>
500504

@@ -515,7 +519,7 @@ ORDER BY request_count DESC
515519
- **field2**: The field to count distinct values of.
516520
- **k**: Number of top results to return.
517521

518-
- Uses HyperLogLog algorithm for efficient distinct counting and Space-Saving algorithm for top-K selection on high-cardinality data.
522+
- Uses [**HyperLogLog** algorithm] for efficient distinct counting and Space-Saving algorithm for top-K selection on high-cardinality data.
519523
- Results are approximate due to the probabilistic nature of both algorithms and distributed processing across partitions.
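
To illustrate why a HyperLogLog-based distinct count is approximate yet easy to combine across partitions, here is a minimal sketch. It is not the implementation behind this function: the class name `HyperLogLog`, the precision `p=12`, the SHA-1 hash, and the `user-…` values are assumptions chosen for illustration, and the Space-Saving top-K step is not shown.

```python
import hashlib
import math

class HyperLogLog:
    """Illustrative HyperLogLog sketch with 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Take a 64-bit hash: the first p bits select a register, the rest
        # give the rank (position of the leftmost 1-bit).
        digest = hashlib.sha1(value.encode("utf-8")).digest()[:8]
        h = int.from_bytes(digest, "big")
        width = 64 - self.p
        idx = h >> width
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketches from different partitions combine by register-wise max.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw

# Two partitions observe overlapping sets of values.
part_a, part_b = HyperLogLog(), HyperLogLog()
for i in range(6000):
    part_a.add(f"user-{i}")
for i in range(4000, 10000):
    part_b.add(f"user-{i}")

part_a.merge(part_b)
print("estimated distinct values:", round(part_a.estimate()))
```

The estimate lands close to, but usually not exactly on, the true distinct count of 10,000. Memory use stays fixed at `2**p` registers regardless of cardinality.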
520524

521525
**Example:**
@@ -568,7 +572,7 @@ ORDER BY distinct_count DESC
568572
{"clientip":"172.16.0.30","distinct_count":790}
569573
{"clientip":"192.168.1.150","distinct_count":690}
570574
```
571-
??? info "The HyperLogLog Algorithm Explained:"
575+
??? info "The HyperLogLog Algorithm Explained:"
572576
**Problem Statement**
573577

574578
Traditional `GROUP BY` operations with `DISTINCT` counts on high-cardinality fields can cause memory exhaustion in distributed systems. Consider this query:
