Posted to issues@storm.apache.org by "Kevin Peek (JIRA)" <ji...@apache.org> on 2016/11/21 19:24:59 UTC

[jira] [Comment Edited] (STORM-2210) ShuffleGrouping does not produce even distribution

    [ https://issues.apache.org/jira/browse/STORM-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684479#comment-15684479 ] 

Kevin Peek edited comment on STORM-2210 at 11/21/16 7:24 PM:
-------------------------------------------------------------

When running the multithreaded version of the test above, I consistently get distributions similar to the ones below. Each list shows how many times the grouping returned each taskId.

ShuffleGrouping - with Collections.shuffle():
[50, 155, 10775, 284226, 665, 4129]

ShuffleGrouping - without Collections.shuffle():
[50000, 50000, 50000, 50000, 50000, 50000]
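A minimal sketch of a counting harness in the spirit of the test described above. It assumes 6 target tasks and 6 threads making 50,000 calls each; the comment does not state the thread count, and 300,000 is simply the total of each list. DistributionHarness and RoundRobinChooser are hypothetical names, not Storm's ShuffleGrouping: the chooser shuffles the task list once up front and then walks it with an AtomicInteger, roughly the "without Collections.shuffle()" variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class DistributionHarness {

    // Hypothetical stand-in for a shuffle grouping: the task order is shuffled
    // once at construction time and never reshuffled while callers are active.
    static final class RoundRobinChooser {
        private final List<Integer> choices;
        private final AtomicInteger current = new AtomicInteger(0);

        RoundRobinChooser(List<Integer> targetTasks) {
            List<Integer> copy = new ArrayList<>(targetTasks);
            Collections.shuffle(copy);
            this.choices = Collections.unmodifiableList(copy);
        }

        int choose() {
            // getAndIncrement is atomic, so every call claims a distinct value;
            // floorMod keeps the index in range even after the counter overflows.
            int n = current.getAndIncrement();
            return choices.get(Math.floorMod(n, choices.size()));
        }
    }

    // Runs `threads` workers that each call choose() `callsPerThread` times and
    // returns how often each task id 0..numTasks-1 was returned.
    static int[] countChoices(int numTasks, int threads, int callsPerThread) {
        List<Integer> tasks = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            tasks.add(t);
        }
        RoundRobinChooser chooser = new RoundRobinChooser(tasks);
        AtomicIntegerArray counts = new AtomicIntegerArray(numTasks);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    counts.incrementAndGet(chooser.choose());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int[] result = new int[numTasks];
        for (int t = 0; t < numTasks; t++) {
            result[t] = counts.get(t);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed setup: 6 threads x 50,000 calls = 300,000 calls over 6 tasks.
        int[] counts = countChoices(6, 6, 50_000);
        System.out.println(Arrays.toString(counts)); // prints [50000, 50000, 50000, 50000, 50000, 50000]
    }
}
```

Because every call claims a distinct counter value and the list is never mutated, the 300,000 values map onto the 6 slots exactly 50,000 times each, whatever the thread interleaving.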


was (Author: kevpeek):
When running the multithreaded version of the test above, I consistently get distributions similar to the ones below. This shows how many times the grouping returned each taskId.

ShuffleGrouping (original):
[50, 155, 10775, 284226, 665, 4129]

ShuffleGrouping (new):
[50000, 50000, 50000, 50000, 50000, 50000]

> ShuffleGrouping does not produce even distribution
> --------------------------------------------------
>
>                 Key: STORM-2210
>                 URL: https://issues.apache.org/jira/browse/STORM-2210
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.2
>            Reporter: Kevin Peek
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When tested in a multithreaded environment, ShuffleGrouping produces an extremely uneven distribution.
> This appears to be caused by the Collections.shuffle call here: https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58
> Because current is reset to zero before the shuffle runs, other threads can read from the ArrayList while it is still being shuffled in place.
> Stephen's gist includes a test that produces a very uneven distribution of taskIds from ShuffleGrouping: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde
> I would have expected the taskIds from the ShuffleGrouping to be almost uniformly distributed.
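If a fresh random order per round is still wanted, one alternative to dropping the reshuffle outright is to shuffle a private copy and publish it atomically, so readers never see a list that is mid-shuffle. The sketch below assumes that approach; CopyOnShuffleChooser is a hypothetical class, not Storm's API and not the change made for this issue. Under concurrency a slow thread from one round may read the next round's order, so per-round counts are only approximately even, but no reader ever observes a half-shuffled list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical chooser: a new random order every round, but the list that
// other threads read is never mutated after it is published.
public class CopyOnShuffleChooser {
    private final AtomicReference<List<Integer>> order;
    private final AtomicLong counter = new AtomicLong();
    private final Random random; // java.util.Random is safe to share between threads
    private final int size;

    public CopyOnShuffleChooser(List<Integer> targetTasks, Random random) {
        this.random = random;
        this.size = targetTasks.size();
        this.order = new AtomicReference<>(shuffledCopy(targetTasks));
    }

    // Shuffles a private copy only; the result is read-only once returned.
    private List<Integer> shuffledCopy(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        Collections.shuffle(copy, random);
        return Collections.unmodifiableList(copy);
    }

    public int choose() {
        long n = counter.getAndIncrement();
        int index = (int) (n % size);
        if (index == 0 && n > 0) {
            // Exactly one caller per round sees index 0. It builds the new order
            // completely before set() makes it visible to other threads.
            order.set(shuffledCopy(order.get()));
        }
        return order.get().get(index);
    }
}
```

With a single caller, every block of size consecutive picks is a permutation of the tasks, so 300,000 picks over 6 tasks yield exactly 50,000 per task.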



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)