You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by kevpeek <gi...@git.apache.org> on 2016/11/21 19:07:16 UTC

[GitHub] storm pull request #1790: STORM-2210 - remove array shuffle from ShuffleGrou...

GitHub user kevpeek opened a pull request:

    https://github.com/apache/storm/pull/1790

    STORM-2210 - remove array shuffle from ShuffleGrouping when counter resets to zero

    This Collection.shuffle is problematic for 2 reasons.
    
    (1) in the existing implementation, the code resets current to zero before doing the shuffle, which opens a window during which other threads may access the not-yet-reshuffled array (or the mid-shuffle array).
    (2) If we call Collection.shuffle() before resetting current to zero, there is still a chance that some other thread is being slow and trying to access one of the upper indexes. E.g. choices.get(size - 3). Shuffling the array first could cause this thread to get the post-shuffle value instead of the pre-shuffle value.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevpeek/storm STORM-2210

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1790
    
----
commit 980af8083aaf185fad3db476dd765abe4ea9cda4
Author: Kevin Peek <kp...@salesforce.com>
Date:   2016-11-21T18:53:33Z

    remove array shuffle from ShuffleGrouping when counter resets to zero

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @kevpeek No need, like I said it is a clean cherry pick.  We just have a rule that we need to merge code into master/newer versions before it goes into older versions.
    
    We also  have a 24 hour waiting period for merging in new code.  So I will likely pull this in tomorrow morning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @kevpeek 
    Could you squash your commits into one? The change is small enough and other commits are closer to code style. 
    I can handle it while merging, but PR for non-master branch is not affected by `commit message for automatic close issue/PR`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    NoneGrouping just runs Random.nextInt() for deciding 'tasks' (so it will follow uniform distribution), whereas ShuffleGrouping simply does Round-Robin.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by kevpeek <gi...@git.apache.org>.
Github user kevpeek commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @revans2 should I create a new PR against master?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by ohadedel <gi...@git.apache.org>.
Github user ohadedel commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    Sound reasonable :)
    
    The documentation in storm was a bit misleading, made us think that its being implemented the same.
    
    "None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible)."
    
    One last question @HeartSaVioR , we are running on v0.9.3 for the last two years, and i am working to upgrade us to v1.0.x 
    There are a few guys at my company that say: maybe we should wait and make sure the version will be stabler.
    This issue with the ShuffleGrouping is pretty small bug but it cause a very big performance issue and we were surprise to be one of the first to stumble on it, so at first we thought we did something wrong while installing.
    
    Do you have any better insight to the version? because i really want to bring all the new features that this version has.
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    I keep beating my head against this to think if there are issues with this, but I honestly don't see any.  Long term I would like to see more of a combining of shuffle grouping and the shuffle or local grouping so that it favors local when possible, but when overloaded it starts to send tuples elsewhere.  But until then I think this is fine.  +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by ohadedel <gi...@git.apache.org>.
Github user ohadedel commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    Hi, I don't know what storm release policy, but any chance to see a stable version with this bug fix release any time soon (v1.0.3).
    What we see in our environment is that the host with the spout is almost idle with the current implementation, thats a major problem.
    
    P.S. any reason that NoneShuffle works differently?
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @Crim 
    One thing to be clear, please don't consider my comment as on behalf of community's opinion. Opinions are completely on my own.
    
    To be honest, I'm not a big fan of test coverage for such a complicated thing. What we call 'critical' is often from race condition, and I'm not sure these things can be covered from tests. I agree that testable things should have tests so always good to have tests, but there're still edge cases.
    
    We have Travis CI for checking the unit tests and some integration tests. We're manually testing the patch before checking in unless there's a small change so don't need to be tested. Basic manual tests are done while voting for release. Two things we don't have are automatic system tests and automatic performance tests, as it's very hard to get stable result automatically, and also we don't have INFRA to construct.
    
    Along with Apache Storm community, there're enterprise customers as well and I guess service vendors are fixing reported bugs fast and these are also contributed to Apache Storm as well. More eyes and more early adopters can greatly increase project's health.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    The only other comment that I would have is that it would be nice to have a pull request against master instead of 1.x, but that is minor as it is a clean cherry-pick


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by kevpeek <gi...@git.apache.org>.
Github user kevpeek commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    Great. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1790: STORM-2210 - remove array shuffle from ShuffleGrou...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1790


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @ohadedel 
    I don't know the intention of 'equivalent' (implementation, or result), but if it means only the result, the result of twos would be similar if you think far. They provide uniform distribution.
    
    Regarding stable version, personally I had been thinking v1.0.2 is fairly stable since we fixed numerous issues on v1.0.1 and v1.0.2, and after v1.0.2 there're no performance-affecting critical bugs reported before this one.
    
    Below links point the issues for next bugfix (v1.0.3) and next minor (v1.1.0):
    * v1.0.3: https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%201.0.3%20AND%20project%20%3D%20STORM%20order%20by%20priority
    
    * v1.1.0: https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%201.1.0%20AND%20fixVersion%20not%20in%20(1.0.0%2C%201.0.1%2C%201.0.2)%20AND%20project%20%3D%20STORM%20order%20by%20priority
    
    There's no 'completely stable' in every projects since we're adding the code anyway and no one ensure there's no bug on that version. If the critical bugs are reported well from the community and the community reacts it faster and release the new version timely, IMHO it is another kind of stable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    I looked the implementation, and without locking array shouldn't be modified.
    Nice finding. +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by Crim <gi...@git.apache.org>.
Github user Crim commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    Your comment is a tad worrisome.  Is test coverage not at a place where project contributors can feel good about a release?  Or is each release essentially QA'd by early adopters?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by kevpeek <gi...@git.apache.org>.
Github user kevpeek commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    @HeartSaVioR  I created a PR on master with the squashed version of these commits. I hope that's flexible enough to do what you need. https://github.com/apache/storm/pull/1797


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1790: STORM-2210 - remove array shuffle from ShuffleGrou...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1790#discussion_r88975651
  
    --- Diff: storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java ---
    @@ -17,17 +17,13 @@
      */
     package org.apache.storm.grouping;
     
    -import java.io.Serializable;
    -import java.util.ArrayList;
    -import java.util.Arrays;
    -import java.util.Collections;
    -import java.util.List;
    -import java.util.Random;
    -import java.util.concurrent.atomic.AtomicInteger;
    -
     import org.apache.storm.generated.GlobalStreamId;
     import org.apache.storm.task.WorkerTopologyContext;
     
    +import java.io.Serializable;
    +import java.util.*;
    --- End diff --
    
    nit: we usually discourage widcard imports.  They don't always maintain compatibility.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1790: STORM-2210 - remove array shuffle from ShuffleGrouping wh...

Posted by Crim <gi...@git.apache.org>.
Github user Crim commented on the issue:

    https://github.com/apache/storm/pull/1790
  
    I feel you, tests can't cover everything.  Performance tests are even more difficult to put together something that's actually valid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---