Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/04/08 18:01:21 UTC

[GitHub] [incubator-druid] leerho edited a comment on issue #7187: Improve topN algorithm

URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-480939628

@peferron
> top songs by unique viewers

Yes.

> additional aggregations could be computed at the same time

Yes, with a caveat. For the example you gave, `SELECT IPAddress, COUNT(DISTINCT UserID), COUNT(*) GROUP BY 1 ORDER BY 2 DESC LIMIT 10`, a sketch could be constructed to handle that specific query. However, handling arbitrary additional aggregations is more challenging: we would have to assume that the additional fields are simple counters and restrict the aggregation to simple addition.
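
To make the caveat concrete, here is a minimal Python illustration of why additive counters are the easy case. It is not the proposed sketch and does not use any Druid or DataSketches API: an exact `set` stands in for the distinct-count sketch, and `GroupState`, `aggregate`, `merge_all`, `top_n`, and the sample rows are all hypothetical names and data made up for this example.

```python
from collections import defaultdict

# Hypothetical rows of (ip_address, user_id); not taken from the issue.
ROWS = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "alice"),
    ("10.0.0.2", "carol"),
    ("10.0.0.3", "alice"),
    ("10.0.0.3", "bob"),
    ("10.0.0.3", "dave"),
]


class GroupState:
    """Per-group state: distinct users plus one simple additive counter."""

    def __init__(self):
        self.users = set()  # exact stand-in for a distinct-count sketch
        self.rows = 0       # COUNT(*): a plain counter

    def update(self, user_id):
        self.users.add(user_id)
        self.rows += 1

    def merge(self, other):
        # The distinct part merges by union; the extra aggregation merges
        # by addition. A non-additive aggregation would need its own rule.
        self.users |= other.users
        self.rows += other.rows


def aggregate(rows):
    """Build per-group state for one batch of rows."""
    groups = defaultdict(GroupState)
    for ip, user in rows:
        groups[ip].update(user)
    return groups


def merge_all(partials):
    """Combine per-batch results into one set of per-group states."""
    merged = defaultdict(GroupState)
    for partial in partials:
        for ip, state in partial.items():
            merged[ip].merge(state)
    return merged


def top_n(groups, n):
    """Rank groups by distinct users (descending), ties broken by key."""
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1].users), kv[0]))
    return [(ip, len(state.users), state.rows) for ip, state in ranked[:n]]


# Process two batches independently, then merge, as separate segments might.
merged = merge_all([aggregate(ROWS[:2]), aggregate(ROWS[2:])])
result = top_n(merged, 2)
print(result)
```

Because both parts of the state merge associatively, the split into batches does not change the answer. That property is what an arbitrary aggregation would not guarantee.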

The more generic we try to make this, the harder it will be to configure and the more performance will suffer.

1. Can we start with just the "top songs by unique users" and characterize that first?

2. Will you need an actual published artifact jar to test this, or would a jar generated from master be OK for your testing?
