Posted to dev@superset.apache.org by gi...@git.apache.org on 2017/09/15 20:55:59 UTC

[GitHub] fabianmenges commented on issue #3434: Feature/Fix: times_series limit for Druid

fabianmenges commented on issue #3434: Feature/Fix: times_series limit for Druid
URL: https://github.com/apache/incubator-superset/pull/3434#issuecomment-329899241
 
 
   We are running 0.10.1; we upgraded from 0.9.2 about 2 weeks ago and saw great performance gains across different query types.
   
   This changeset handles Druid behavior that is by design (it does not address or work around a bug in Druid). If you specify a threshold of 5 for a TopN query, Druid will always return the top 5 results per granularity bucket (let's ignore that they are not guaranteed to be the actual top 5, only likely to be).
   As an example of the default TopN behavior, let's say you want to query the top 5 campaigns with the most (ordered by) "Ad-Impression" on a daily level over a week. Druid will return the top 5 campaigns for Monday, the top 5 for Tuesday, etc. If every day you have a different set of 5 campaigns, you will end up with 7x5 = 35 different campaigns in your result set, each one with exactly one datapoint. (This is what you can see in the first screenshot.)
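
   To make the arithmetic concrete, here is a small sketch in plain Python (no Druid involved) that mimics a per-day top 5 over made-up data. The row layout, campaign names, and the helper `top_n_per_day` are assumptions for illustration only; the data is deliberately built so that every day has a different set of campaigns.

```python
from collections import defaultdict


def top_n_per_day(rows, n):
    """Return {day: [campaign, ...]} holding the n highest-impression campaigns per day.

    This only imitates the per-granularity behavior described above;
    it is not how Druid computes TopN internally.
    """
    by_day = defaultdict(list)
    for day, campaign, impressions in rows:
        by_day[day].append((impressions, campaign))
    return {
        day: [campaign for _, campaign in sorted(entries, reverse=True)[:n]]
        for day, entries in by_day.items()
    }


# Hypothetical data: 7 days, each day with its own 5 campaigns.
rows = [
    (day, f"campaign-{day}-{i}", 100 + i)
    for day in range(7)
    for i in range(5)
]

per_day = top_n_per_day(rows, 5)
distinct = {c for campaigns in per_day.values() for c in campaigns}
print(len(distinct))  # 35 distinct campaigns, one datapoint each
```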
   
   The argument for this changeset is that the behavior described is not actually useful, and is counterintuitive, when you want a line chart for the top 5 campaigns. What you expect to see is the change over time for the same top 5 campaigns, day by day over the course of the week.
   This is implemented by first running one TopN query with granularity "all" over the entire time range to find the top 5 campaigns, and then running a second TopN query with a granularity of 1 day, filtered to the campaigns from the first run.
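
   The two-phase approach can be sketched roughly as follows. This is not the code from the pull request; it only builds Druid native TopN query bodies as Python dicts. The function names, the datasource `ads`, and the `campaign` / `impressions` names are hypothetical, and the sketch assumes there is no pre-existing filter on the query.

```python
def build_phase_one(datasource, dimension, metric, threshold, intervals, aggregations):
    """Phase 1: one TopN over the whole range (granularity "all") to pick the top values."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "granularity": "all",
        "intervals": intervals,
        "aggregations": aggregations,
    }


def extract_top_values(response, dimension):
    """Collect dimension values from a TopN response.

    Assumes the usual TopN response shape:
    [{"timestamp": ..., "result": [{<dimension>: value, ...}, ...]}, ...]
    """
    return [row[dimension] for bucket in response for row in bucket["result"]]


def build_phase_two(phase_one, top_values, granularity="day"):
    """Phase 2: same TopN at the chart granularity, restricted to the phase 1 values."""
    query = dict(phase_one)  # shallow copy; phase_one itself is left unchanged
    query["granularity"] = granularity
    query["filter"] = {
        "type": "in",
        "dimension": phase_one["dimension"],
        "values": list(top_values),
    }
    return query


# Hypothetical usage with a hand-written phase 1 response.
phase_one = build_phase_one(
    datasource="ads",
    dimension="campaign",
    metric="impressions",
    threshold=5,
    intervals=["2017-09-04/2017-09-11"],
    aggregations=[{"type": "longSum", "name": "impressions", "fieldName": "impressions"}],
)
fake_response = [
    {
        "timestamp": "2017-09-04T00:00:00.000Z",
        "result": [
            {"campaign": "a", "impressions": 900},
            {"campaign": "b", "impressions": 800},
        ],
    }
]
top_values = extract_top_values(fake_response, "campaign")
phase_two = build_phase_two(phase_one, top_values)
```

   If the original query already carries a filter, the "in" filter would have to be combined with it (for example with an "and" filter) rather than replacing it.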
   
   We have been running this code for about 3 weeks and have not run into problems.
   
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services