Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/04/05 17:03:30 UTC

[GitHub] [incubator-druid] leerho commented on issue #7187: Improve topN algorithm

leerho commented on issue #7187: Improve topN algorithm
URL: https://github.com/apache/incubator-druid/issues/7187#issuecomment-480349933
 
 
   @peferron Thank you for your thoughtful comments.  Clearly, we have to leave the decision to an informed user, and all we can do is our best to make sure that he/she is informed.
   
   ----
   
   Bouncing back to the top of this thread, we are developing a new sketch that we are tentatively calling "Frequent Unique Nodes" (FUN).  
   
   Suppose you have a stream that contains pairs {IP address, UserID}, and you wish to identify the IP addresses that have the largest number of unique users.  In this context, think of a large graph where the IP addresses and users are nodes in the graph.  Let Node1 = IP and Node2 = ID; then we want to identify the Node1s that have the largest number of unique Node2s.  Conversely, it might also be interesting to identify the Node2s (IDs) that have the largest number of unique Node1s (IPs).   Conceptually, this can also be extended to more than just 2 nodes (although don't go nuts with this!).
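
To make the query concrete, here is a minimal exact (non-sketch) baseline in Python.  It is not the FUN sketch and says nothing about how that sketch works internally; it only shows the answer the sketch would be approximating.  The function name, the sample stream, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import defaultdict


def top_by_unique_neighbors(pairs, k):
    """Exact baseline: return the k Node1 values with the most distinct Node2 values.

    `pairs` is an iterable of (node1, node2) tuples.  The result is a list of
    (node1, distinct_count) tuples, largest count first; ties are broken by
    node1 in ascending order so the output is deterministic.
    """
    neighbors = defaultdict(set)
    for node1, node2 in pairs:
        # A set ignores repeated (node1, node2) pairs, so only uniques count.
        neighbors[node1].add(node2)
    ranked = sorted(neighbors.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(node1, len(seen)) for node1, seen in ranked[:k]]


# Hypothetical sample stream of (IP address, UserID) pairs.
stream = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "alice"),  # duplicate pair, must not be counted twice
    ("10.0.0.2", "carol"),
    ("10.0.0.3", "alice"),
    ("10.0.0.3", "bob"),
    ("10.0.0.3", "dave"),
]

# IPs with the most unique users.
top_ips = top_by_unique_neighbors(stream, 2)

# The converse query: swap the roles of Node1 and Node2.
top_users = top_by_unique_neighbors(((user, ip) for ip, user in stream), 1)
```

This baseline keeps every distinct pair in memory, so its footprint grows with the number of unique pairs in the stream.  That cost is what makes a bounded-size sketch attractive for this kind of query.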
   
   With this new sketch you will be able to perform these types of queries and have some guarantees of accuracy as well.
   
   If this is of interest, please let me know, as we could use your help in characterizing and performance testing this sketch, if possible.
   
   Lee.
   
   
   
   
   

