Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2015/01/20 18:02:34 UTC

[jira] [Commented] (STORM-632) New grouping for better load balancing

    [ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284015#comment-14284015 ] 

Robert Joseph Evans commented on STORM-632:
-------------------------------------------

[~azaroth]
I would love to see this pulled into Storm.  If you want to put up a pull request based on the code in your branch, that would be great.  My only comment is that it would be nice for the partial key grouping to match the fields grouping in how field names are passed in, but doing that cleanly would take some tighter integration with Storm.  If you don't feel comfortable making those changes yourself, please let me know.  I cannot promise I'll get to it any time soon, though.

> New grouping for better load balancing
> --------------------------------------
>
>                 Key: STORM-632
>                 URL: https://issues.apache.org/jira/browse/STORM-632
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: Gianmarco De Francisci Morales
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on what happens under key grouping when the key distribution of the stream is skewed.
> We developed a new stream partitioning scheme (which we call Partial Key Grouping). It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement with partial key grouping, and whose performance can benefit from it. We think that it might also be useful for a larger class of algorithms.
> We don't have experience in Clojure; however, partial key grouping is very easy to implement: it requires just a few lines of Java when implemented as a custom grouping in Storm [2].
> We believe it should be very easy to port from Java.
> For all these reasons, we believe it will be a nice addition to the standard groupings available in Storm. If the community thinks it's a good idea, we will be happy to offer support in the porting.
> References:
> [1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2] https://github.com/gdfm/partial-key-grouping
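
For readers who have not opened [1] or [2], here is a rough sketch of the idea as I understand it from the paper's title: each key gets two candidate tasks from two different hash functions, and the sender routes each tuple to whichever candidate it has sent fewer tuples to so far. This is not the code in [2]. The class name, the method names, and both hash functions are made up for illustration, and the class is deliberately standalone rather than an implementation of Storm's custom grouping interface, so it can be compiled and tried without Storm on the classpath.

```java
import java.util.List;

// Illustrative sketch only; hypothetical names, not the implementation in [2].
public class PartialKeyGroupingSketch {
    private final List<Integer> targetTasks; // downstream task ids
    private final long[] sentCounts;         // tuples this sender has routed to each target

    public PartialKeyGroupingSketch(List<Integer> targetTasks) {
        this.targetTasks = targetTasks;
        this.sentCounts = new long[targetTasks.size()];
    }

    // First candidate: the key's own hash code (assumed hash, for illustration).
    private int firstChoice(Object key) {
        return Math.floorMod(key.hashCode(), targetTasks.size());
    }

    // Second candidate: the same hash code scrambled by a multiply and a shift
    // (assumed hash, for illustration). It can coincide with the first candidate.
    private int secondChoice(Object key) {
        int h = key.hashCode() * 0x9E3779B9;
        h ^= h >>> 16;
        return Math.floorMod(h, targetTasks.size());
    }

    // Route one tuple: pick the candidate this sender has loaded less so far.
    public int chooseTask(Object key) {
        int a = firstChoice(key);
        int b = secondChoice(key);
        int chosen = (sentCounts[a] <= sentCounts[b]) ? a : b;
        sentCounts[chosen]++;
        return targetTasks.get(chosen);
    }
}
```

With this sketch, calling chooseTask("hot-key") repeatedly on an instance built from, say, tasks 10 to 13 only ever returns one of at most two task ids, which is why downstream state for a key has to be merged from at most two places. The load estimate is purely local to the sender, so no coordination between senders is assumed here.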


