Skip to content

feat: topk in group by in counts api#1702

Merged
nikhilsinhaparseable merged 1 commit into
parseablehq:mainfrom
nikhilsinhaparseable:counts-groupby-topk
Jun 24, 2026
Merged

feat: topk in group by in counts api#1702
nikhilsinhaparseable merged 1 commit into
parseablehq:mainfrom
nikhilsinhaparseable:counts-groupby-topk

Conversation

@nikhilsinhaparseable

@nikhilsinhaparseable nikhilsinhaparseable commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

current: counts api returns the date binned result for each distinct value for the field(s) present in group by

if the field used is high cardinal, the response will be bloated
Prism uses this response to build a chart that might crash in render

change: add optional topk param to group by
default to 10
api returns topk results for the group by field(s)

Summary by CodeRabbit

  • New Features

    • Added support for limiting count and group-by results with a top-N setting.
    • Enabled flexible input names for the new setting in requests.
  • Bug Fixes

    • Improved SQL generation to correctly quote identifiers.
    • Adjusted grouped count queries to return only the highest-ranking groups.
    • Added validation to reject invalid zero-value top-N requests.

current: counts api returns the date binned result for each distinct value
for the field(s) present in group by

if the field used is high cardinal, the response will be bloated
Prism uses this response to build a chart that might crash in render

change: add optional topk param to group by
default to 10
api returns topk results for the group by field(s)
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

src/query/mod.rs adds a top_k parameter to CountConditions and reworks get_df_sql to emit WITH grouped_counts/top_groups CTEs when group-by columns are present, limiting results to the top N groups. A quote_identifier helper is added, a DEFAULT_COUNTS_TOP_K constant introduced, and zero-value validation enforced.

Changes

top_k support for grouped count queries

Layer / File(s) Summary
Constant, CountConditions.top_k field, and quote_identifier helper
src/query/mod.rs
Adds DEFAULT_COUNTS_TOP_K constant, extends CountConditions with top_k: Option<usize> accepting topk/top_k serde aliases, and adds private quote_identifier function for safe SQL identifier embedding.
get_df_sql identifier quoting and top-K grouped query generation
src/query/mod.rs
Quotes table name, time column, and group-by column identifiers in get_df_sql. For ungrouped queries emits a plain SELECT; for grouped queries builds WITH grouped_counts and top_groups CTEs, defaulting top_k to DEFAULT_COUNTS_TOP_K and rejecting top_k == 0.
Tests for deserialization, SQL shape, defaulting, and zero rejection
src/query/mod.rs
Adds tests covering topK/topk JSON deserialization into CountConditions.top_k, the generated SQL CTE structure for grouped queries, default top_k fallback behavior, and the error returned for top_k = 0.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • parseablehq/parseable#1601: Modifies the same get_df_sql function in src/query/mod.rs to add group_clause to the grouped counts SELECT/GROUP BY/ORDER BY, directly preceding the top-K restructuring introduced here.

Poem

🐇 Hippity-hop through SQL rows,
With top_k now, the count query flows!
A CTE here, a subquery there,
No zero allowed — we handle with care.
The top groups rise, the rest stand down,
This bunny built a query crown! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title matches the main change: adding topk support to counts group-by responses.
Description check ✅ Passed The description explains the problem, the proposed change, and the default behavior, though it omits the template's issue link and checklist details.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/query/mod.rs (1)

1063-1078: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the top_k alias in the deserialization test.

The struct explicitly accepts top_k, but this test only locks topK and topk. Add one assertion for the snake_case request key.

Suggested test addition
         let conditions: CountConditions = serde_json::from_value(json!({
             "groupBy": ["host"],
             "topk": 3
         }))
         .unwrap();
         assert_eq!(conditions.top_k, Some(3));
+
+        let conditions: CountConditions = serde_json::from_value(json!({
+            "groupBy": ["host"],
+            "top_k": 7
+        }))
+        .unwrap();
+        assert_eq!(conditions.top_k, Some(7));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/query/mod.rs` around lines 1063 - 1078, The CountConditions
deserialization test only verifies the camelCase and lowercase aliases, so add
one assertion for the snake_case request key to cover the explicitly supported
top_k field. Update test_count_conditions_accepts_top_k in src/query/mod.rs to
deserialize a payload using top_k and assert it maps to conditions.top_k,
alongside the existing topK and topk checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/query/mod.rs`:
- Around line 721-722: The top_groups subquery in the query-building logic uses
ORDER BY SUM("count") DESC LIMIT {top_k}, which can return different tied groups
across runs. Update the ordering in the top_groups SELECT to append the group
columns as a deterministic tie-breaker after the count sort, keeping the change
within the formatted SQL assembled in src/query/mod.rs near the
grouped_counts/top_groups query construction.

---

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 1063-1078: The CountConditions deserialization test only verifies
the camelCase and lowercase aliases, so add one assertion for the snake_case
request key to cover the explicitly supported top_k field. Update
test_count_conditions_accepts_top_k in src/query/mod.rs to deserialize a payload
using top_k and assert it maps to conditions.top_k, alongside the existing topK
and topk checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 705e634b-9fb7-4f48-8516-3ab0cd55a458

📥 Commits

Reviewing files that changed from the base of the PR and between 1998658 and aacfe91.

📒 Files selected for processing (1)
  • src/query/mod.rs

Comment thread src/query/mod.rs
@nikhilsinhaparseable nikhilsinhaparseable merged commit 5e7a0f2 into parseablehq:main Jun 24, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants