You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by revans2 <gi...@git.apache.org> on 2015/09/21 22:03:54 UTC

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

GitHub user revans2 opened a pull request:

    https://github.com/apache/storm/pull/750

    STORM-350: Upgrade to newer version of disruptor

    This upgrades to version 3.3.2 of the Disruptor Queue.  There have been questions about stability in the past, and also out of order delivery.
    
    I really wanted to be sure that everything would be about the same.  I ran the DisruptorQueue related unit tests over the weekend and got no failures at all, with well over 10,000 runs.
    
    I did some performance tests too using the FastWordCountTopology I added as a part of this.  I ran 5 times with the original 0.11.0-SNAPSHOT this is based off of (e85921035fa9bb59d25f0347dc6d26002aac9fab) and with this branch.  By setting the topology.max.spout.pending to 200 I got essentially identical results.  The new Queue was slightly faster but it was small enough it could just be noise.
    
    Similarly when I did not set topology.max.spout.pending and relied on the automatic throttling I got very similar numbers, although the variance between the runs was much higher so having a real comparison there is much more difficult.
    
    @HeartSaVioR in the past you did some testing to see if out of order delivery was happening, I would love it if you could take a look at this patch and test it similarly.
    
    Anecdotally I have seen this version behave better than the current one we are using.  I have seen no NPEs from tuples disappearing and I have seen that show up in some of my stress testing using the old queue.  Again I don't know how often this happens so I cannot guarantee that it was a disruptor bug.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/revans2/incubator-storm disruptor-upgrade

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #750
    
----
commit 46df80cfc63f01c3a5587a879ce0d721fad3f07f
Author: Robert (Bobby) Evans <ev...@yahoo-inc.com>
Date:   2015-09-18T21:33:39Z

    STORM-350: Upgrade to newer version of disruptor

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142156078
  
    +1 LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142097964
  
    The travis failure looks unrelated.  I think we need to fix how we do ports with the DRPC test.  It seems that every so often we cannot bind to the port we want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-147728516
  
    @HeartSaVioR we started doing some stress/load testing on 0.10 + this patch and ran into the NPE issue.  I am going to close this pull request until the NPE can be resolved, but it looks like it is something with disruptor itself.  The exact same code but with the version of disruptor reverted back saw no issues.
    
    I will also update my batching pull request #765 to be based off of the original version of the disruptor queue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142128746
  
    @revans2 
    Patch looks good overall.
    I'll run performance test and see there's no failed tuple. If it doesn't show failed tuples I think we're safe to merge. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-143089582
  
    @revans2 
    I mailed you to attach jar file. If you can't receive it, I'll find another way to share. (via dropbox?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-143017314
  
    @HeartSaVioR 
    I added an in-order test case that I ran for over 50 mins on both Linux and my Mac and they both passed with no errors and each processed over 35 million tuples.  Any help you could give me in reproducing the failures would be greatly appreciated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142605839
  
    I pushed the fix, please try again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-143236543
  
    @HeartSaVioR Oh you are using my perftest https://github.com/yahoo/storm-perf-test.  I will try it out and see if there are any failures.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142601291
  
    @HeartSaVioR I figured out the failed tuples and will be pushing a new patch shortly.  This is the same null issue we were seeing previously.  Just with the newer disruptor it exacerbates the two issues.
    
    First MutableObject is not synchronized so it is possible for the data to not be flushed to memory and if the receiver thread is on a different core you can get a null out.
    
    Additionally I noticed that the non-bocking receive seems to not be putting the correct barriers in place, and the polling can try to read a MutableObject that has not been updated yet, and is still set to null.  The only way I could find to fix it was to implement it in terms of the blocking call.  From my performance tests I did not see any impact from these changes.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 closed the pull request at:

    https://github.com/apache/storm/pull/750


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-146045675
  
    @revans2 
    It just dropped throughput when tuples are being failed. 
    It occurred nearly end of performance test, so I'd like to test it again with longer test duration.
    If it occurs nearly end of performance test again, I also treat it to other issue, maybe transfer issue while worker shutdown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-143259024
  
    @HeartSaVioR I ran a number of tests, and I am fairly sure that what you are seeing is a failure in the messaging layer and not disruptor.  I limited the number of workers to just 1 so that I would be sure to isolate things.
    ```storm jar storm_perf_test.jar com.yahoo.storm.perftest.Main --ack --ackers 8 --bolt 8 --maxPending 200 --workers 1 --spout 8 --testTime 8000 --pollFreq 30```
    
    I didn't run for the full 8000 seconds, but part way through I got this screen shot.
    
    ![run3](https://cloud.githubusercontent.com/assets/3441321/10105106/a3230edc-6372-11e5-9f68-be0e5994f5d3.png)
    
    I hope 185 million tuples from this test plus the 35 million tuples from my in-order test all with no failures is enough to convince you that the new disruptor queue is solid.  If you want me to try and debug the messaging layer and try to reproduce the failures you saw I am fine with that, but I would like to do it as a part of another JIRA.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142935635
  
    @HeartSaVioR can you share the topology that you are running so I can try and reproduce/debug it locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142443503
  
    @revans2 
    
    I ran performance test from small cluster (three machines, each one has zookeeper and supervisor, and only one has zookeeper, nimbus, supervisor) with configuration below, 
    
    > com.yahoo.storm.perftest.Main --ack --bolt 4 --name test -l 1 -n 1 --workers 4 --spout 3 --testTimeSec 900 -c topology.max.spout.pending=1092 --messageSize 10
    
    Unfortunately I'm seeing failed tuples again.
    
    <img width="1264" alt="storm-ui-failed-tuple-appeared" src="https://cloud.githubusercontent.com/assets/1317309/10033834/d29c01d0-61c6-11e5-8cfd-381c693b1978.png">
    
    I packaged perf test with 0.10.0-rc of storm-core, but I think it doesn't matter cause its scope is provided.
    
    I still don't know why this happens. According to log files, there's no Netty connection issue.
    
    Could you test it too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-350: Upgrade to newer version of disrupt...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/750#issuecomment-142759183
  
    @revans2 
    Seems like it doesn't resolve the failed tuples. First time I can't see failed tuple, but at second trial it appears.
    <img width="1205" alt="storm_ui-suspicious-2" src="https://cloud.githubusercontent.com/assets/1317309/10061688/14af5f3e-6296-11e5-91d3-6d22a60f81e9.png">



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---