Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/30 06:07:21 UTC

[GitHub] [incubator-druid] ArvinZheng opened a new issue #8609: Zero filling for TopN (and maybe GroupBy)

URL: https://github.com/apache/incubator-druid/issues/8609
 
 
   ### Description
   
   Druid supports **zero-filling** for Timeseries queries, which is very convenient for end users: they don't have to _make up_ the empty buckets when there is no data. We have been using Druid for years, and we are now looking for a way to implement zero-filling for **TopN** queries (and maybe **GroupBy** as well), to make life easier for our Druid users.
   
   ### Motivation
   
   Druid does not currently support joining fact data to dimension tables, so we cannot run a left join from dimension tables to fact data.
   Consider the following scenario for an advertising DSP platform:
   1. A user has created 15 campaigns in their account and wants to see the 10 campaigns with the least delivery in a specific time range.
   2. In that time range, only 7 campaigns have delivery.
   
   Currently, in our system, we have to build another layer (let's call it the Data Binding Layer) on top of Druid queries to achieve this:
   1. Run a TopN query with inverted metric order.
   2. We get 7 rows back in the case above, since only 7 out of 15 campaigns have delivery in the specified time range.
   3. We join the result to our dimension tables in the Data Binding Layer. For every campaign that does not match a row in the result, we zero-fill its metrics, and then sort the result again once all missing campaigns have been filled in.
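
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not our actual Data Binding Layer: the function name `zero_fill_bottom_n`, the `campaign_id` dimension and the `impressions` metric are assumptions made up for the example, and the Druid result is represented as a plain list of dicts.

```python
from typing import Dict, List


def zero_fill_bottom_n(
    rows: List[Dict],        # rows returned by the TopN query (step 2)
    known_ids: List[str],    # all dimension values from the dimension table
    metrics: List[str],      # metric names to zero-fill
    order_by: str,           # metric used for the ascending re-sort
    limit: int,              # how many rows the user asked for
    dimension: str = "campaign_id",  # hypothetical dimension name
) -> List[Dict]:
    """Left-join known dimension values to TopN rows, fill gaps with 0, re-sort."""
    by_id = {row[dimension]: row for row in rows}
    filled = []
    for dim_value in known_ids:
        row = by_id.get(dim_value)
        if row is None:
            # No delivery in the time range: build a zero-filled row.
            row = {dimension: dim_value, **{m: 0 for m in metrics}}
        filled.append(row)
    # Ascending order, because the user wants the *least* delivery first.
    # list.sort is stable, so zero-filled rows keep the order of known_ids.
    filled.sort(key=lambda r: r[order_by])
    return filled[:limit]


# Scenario from the issue: 15 campaigns, only 7 of them have delivery.
all_campaigns = [f"c{i}" for i in range(1, 16)]
druid_rows = [{"campaign_id": f"c{i}", "impressions": i * 100} for i in range(1, 8)]

bottom_10 = zero_fill_bottom_n(
    druid_rows, all_campaigns, ["impressions"], order_by="impressions", limit=10
)
```

With this input, the 8 campaigns without delivery come first with `impressions` set to 0, followed by the 2 lowest-delivering campaigns from the Druid result.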
   
   ### Proposal 
   We could implement zero-filling in the **TopNResultBuilder** by filling in 0 for all metrics. This feature would be available only when the query contains a list of input values for the dimension, i.e. an **IN filter**.
   When a user runs a TopN query with an IN filter, they can decide whether they want results for all of the input dimension values. By default the option could be set to false for backward compatibility.
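
To make the proposal more concrete, here is a rough sketch of what such a query might look like when built from client code. The TopN, inverted metric and IN filter parts follow the existing native query format. The `zeroFill` flag is purely hypothetical: it does not exist in Druid today, and both its name and its placement in the query context are assumptions for illustration. The data source, dimension and metric names are made up as well.

```python
import json
from typing import Dict, List


def build_bottom_n_query(
    data_source: str,
    dimension: str,
    metric: str,
    values: List[str],
    threshold: int,
    interval: str,
    zero_fill: bool = False,  # off by default for backward compatibility
) -> Dict:
    """Build a native TopN query ordered by the inverted metric, with an IN filter."""
    query = {
        "queryType": "topN",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        "dimension": dimension,
        "threshold": threshold,
        # Inverted ordering returns the smallest values first.
        "metric": {
            "type": "inverted",
            "metric": {"type": "numeric", "metric": metric},
        },
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
        # The IN filter supplies the list of dimension values to zero-fill.
        "filter": {"type": "in", "dimension": dimension, "values": list(values)},
    }
    if zero_fill:
        # HYPOTHETICAL flag: not an existing Druid option, shown only to
        # illustrate how a user could opt in to the proposed behaviour.
        query["context"] = {"zeroFill": True}
    return query


query = build_bottom_n_query(
    data_source="ad_delivery",    # made-up data source name
    dimension="campaign_id",      # made-up dimension name
    metric="impressions",         # made-up metric name
    values=[f"c{i}" for i in range(1, 16)],
    threshold=10,
    interval="2019-09-01/2019-09-30",
    zero_fill=True,
)
print(json.dumps(query, indent=2))
```

With zero-filling done on the Druid side, the extra join and re-sort in the Data Binding Layer would no longer be needed for this case.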
