Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/12 16:45:52 UTC

[GitHub] [incubator-pinot] kkrugler opened a new issue #6676: Support multiple columns in distinct count aggregations

kkrugler opened a new issue #6676:
URL: https://github.com/apache/incubator-pinot/issues/6676


   Currently these [Pinot aggregations](https://docs.pinot.apache.org/users/user-guide-query/supported-aggregations) only work with a single column:
   
   - DISTINCTCOUNT
   - DISTINCTCOUNTHLL
   - DISTINCTCOUNTRAWHLL
   - DistinctCountThetaSketch
   - DistinctCountRawThetaSketch
   - DISTINCTCOUNTMV
   - DISTINCTCOUNTHLLMV
   - DISTINCTCOUNTRAWHLLMV
   
   This becomes a problem when you need the total number of groups produced by an aggregation (e.g. to support deeper paging in a dashboard UI) and the query groups by more than one column. For example, the query `select advertiser,publisher,sum(adSpend) from table group by advertiser,publisher order by sum(adSpend) desc limit 1000` groups by two columns (`advertiser,publisher`), so the number of groups is the distinct count of `(advertiser, publisher)` pairs.
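
To make the requirement concrete, here is a minimal Python sketch (not Pinot code) of why the total group count equals a distinct count over the grouping columns taken together. The sample rows, the `page_size` value, and all variable names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sample rows: (advertiser, publisher, adSpend).
rows = [
    ("acme", "news.example", 10.0),
    ("acme", "news.example", 5.0),
    ("acme", "blog.example", 7.5),
    ("globex", "news.example", 2.0),
]

# Rough equivalent of: GROUP BY advertiser, publisher with SUM(adSpend).
spend_by_group = defaultdict(float)
for advertiser, publisher, ad_spend in rows:
    spend_by_group[(advertiser, publisher)] += ad_spend

# The number of groups is the distinct count of (advertiser, publisher)
# pairs, which is what a multi-column distinct count would return.
total_groups = len({(advertiser, publisher) for advertiser, publisher, _ in rows})

# A UI can derive the page count from it (ceiling division).
page_size = 2  # assumed page size, for illustration only
total_pages = -(-total_groups // page_size)
```

A single-column distinct count on either `advertiser` or `publisher` alone would give 2 here, not the 3 groups the paging logic needs.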
   
   The current workaround is to use `concat` to build a single key, e.g. `select distinctcounthll(concat(advertiser, publisher, '|')) from table`, but that incurs the performance penalty of the `concat` scalar UDF.
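
As a rough illustration of what the workaround computes, the Python sketch below (again, not Pinot code) builds a single string key per row and counts distinct keys exactly. The helper `concat_key` and the sample pairs are hypothetical; the sketch also shows why the separator argument matters.

```python
def concat_key(advertiser: str, publisher: str, sep: str = "|") -> str:
    """Hypothetical stand-in for concat(advertiser, publisher, '|')."""
    return advertiser + sep + publisher

# Two different pairs that would collide without a separator,
# plus one repeated pair.
pairs = [("ab", "c"), ("a", "bc"), ("ab", "c")]

# Reference answer: distinct count over the pairs themselves.
exact = len(set(pairs))

# Workaround: distinct count over the concatenated keys.
with_sep = len({concat_key(a, p) for a, p in pairs})

# Without a separator, "ab" + "c" and "a" + "bc" both become "abc".
without_sep = len({concat_key(a, p, sep="") for a, p in pairs})
```

The separator only helps if it cannot occur in the column values themselves; otherwise distinct pairs can still map to the same key. Native multi-column support would avoid both this caveat and the per-row string construction.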
   




