Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/17 16:18:01 UTC

[GitHub] [druid] jobar opened a new issue #11264: SQL query uses TopN when grouping by time and other dimension

jobar opened a new issue #11264:
URL: https://github.com/apache/druid/issues/11264


   ### Description
   
   When a query sent to the Druid SQL endpoint groups by two fields and one of them is time-related, the TopN query type could be used instead of the GroupBy one, with the time grouping implemented as the query's "granularity". Currently such a query is planned as a groupBy query.
   
   Example:
   ```
   EXPLAIN PLAN FOR
   SELECT FLOOR("__time" TO MONTH) AS "__timestamp",
          "my_field" AS "my_field",
          SUM(my_value) AS "sum_my_value"
   FROM my_data_source
   WHERE "__time" >= '...'
     AND "__time" < '...'
   GROUP BY "my_field", FLOOR("__time" TO MONTH)
   ORDER BY sum_my_value
   LIMIT 500;
   
   DruidQueryRel(query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"my_data_source"}...
   ```
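   
   For illustration, here is a rough sketch of the kind of native TopN query the planner could emit for the SQL above. It is an assumption about the desired output, not something the planner produces today. The `doubleSum` aggregator type is a guess (the type of `my_value` is not stated), the `inverted` metric spec is used because the SQL orders ascending, and the interval is left as a placeholder just like the elided bounds in the `WHERE` clause.
   
   ```json
   {
     "queryType": "topN",
     "dataSource": {"type": "table", "name": "my_data_source"},
     "intervals": ["..."],
     "granularity": "month",
     "dimension": "my_field",
     "metric": {
       "type": "inverted",
       "metric": {"type": "numeric", "metric": "sum_my_value"}
     },
     "threshold": 500,
     "aggregations": [
       {"type": "doubleSum", "name": "sum_my_value", "fieldName": "my_value"}
     ]
   }
   ```
   
   One caveat with this sketch: a native TopN query with a granularity other than `all` applies its `threshold` within each time bucket, so the result may not match the SQL `LIMIT 500` exactly, which applies to the whole result set.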
   
   ### Motivation
   
   GroupBy queries are a lot more expensive than TopN queries, so this change would make such queries return results a lot faster and at lower cost.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


