Posted to dev@superset.apache.org by Fabian Menges <fm...@twitter.com.INVALID> on 2017/09/13 15:44:52 UTC

Series limit for Druid

Hi,

The series limit for grouped Druid time series is a little
counterintuitive and not super useful for our use case right now.

Let's say you want to look at a metric (e.g. ad impressions) grouped by
advertiser over a longer period of time, and your ads show very bursty
behavior at the per-advertiser level.
This leads to a different set of advertisers for every aggregation point in
time of your grouped time series chart. You specify a series limit of 5 but
end up with ~25 series (see the screenshots in my merge request).

https://github.com/apache/incubator-superset/pull/3434
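
To illustrate the effect, here is a minimal sketch in plain Python. It does not use Druid or Superset code; the rows, the advertiser names and the helper series_with_per_bucket_limit are made up for illustration. It applies the limit inside each time bucket and counts how many distinct series end up on the chart:

```python
from collections import defaultdict

# Hypothetical bursty data: (time bucket, advertiser, impressions).
rows = [
    ("day1", "a", 100), ("day1", "b", 90), ("day1", "c", 1),
    ("day2", "c", 100), ("day2", "d", 90), ("day2", "a", 1),
    ("day3", "e", 100), ("day3", "f", 90), ("day3", "a", 1),
]


def series_with_per_bucket_limit(rows, limit):
    """Apply the limit inside each time bucket; return every series that shows up."""
    by_bucket = defaultdict(list)
    for bucket, advertiser, value in rows:
        by_bucket[bucket].append((value, advertiser))

    shown = set()
    for entries in by_bucket.values():
        # Top `limit` advertisers for this bucket only.
        top = sorted(entries, reverse=True)[:limit]
        shown.update(advertiser for _, advertiser in top)
    return shown


shown = series_with_per_bucket_limit(rows, limit=2)
print(len(shown), sorted(shown))
```

With a limit of 2 and three buckets whose top advertisers never repeat, the chart ends up with more series than the limit asked for.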

What you would expect when you set the series limit is that it will show
the top five accounts over the selected period of time. This is also the
behavior that can already be observed when you group by multiple columns
with a Druid backend as well as the behavior with a SQL backend.

My patch makes a Druid TopN query across the entire time range to find
the top X elements returned by the 'group by', and then runs another
TopN query to return the 'group by' time series only for these elements.
The "druid-groupby" path was already implemented the exact same way.
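
The two-step idea can be sketched in plain Python over in-memory rows. This is only an illustration of the logic, not the code in the patch and not real Druid queries; the data and the helper names top_series and grouped_time_series are assumptions made up for the example:

```python
from collections import defaultdict

# Hypothetical bursty data: (time bucket, advertiser, impressions).
rows = [
    ("day1", "a", 100), ("day1", "b", 90), ("day1", "c", 1),
    ("day2", "c", 100), ("day2", "d", 90), ("day2", "a", 1),
    ("day3", "e", 100), ("day3", "f", 90), ("day3", "a", 1),
]


def top_series(rows, limit):
    """Step 1: rank advertisers by their total over the entire time range."""
    totals = defaultdict(int)
    for _, advertiser, value in rows:
        totals[advertiser] += value
    # Highest total first; ties broken by name so the result is deterministic.
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return ranked[:limit]


def grouped_time_series(rows, keep):
    """Step 2: build the per-bucket series only for the selected advertisers."""
    keep = set(keep)
    series = defaultdict(dict)
    for bucket, advertiser, value in rows:
        if advertiser in keep:
            series[advertiser][bucket] = series[advertiser].get(bucket, 0) + value
    return dict(series)


top = top_series(rows, limit=2)
result = grouped_time_series(rows, top)
print(top)
print(result)
```

The chart then shows exactly as many series as the limit, and they are the same advertisers at every point in time.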

I don't see a use case for the current behavior but I'm happy to add a flag
in the UI to switch between them if anybody is interested in it.

Let me know what you think,

Fabian