You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by HeartSaVioR <gi...@git.apache.org> on 2017/08/24 15:59:57 UTC

[GitHub] storm pull request #2293: STORM-2231 Fix multi-threads issue on executor sen...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/2293

    STORM-2231 Fix multi-threads issue on executor send queue

    [STORM-2231](https://issues.apache.org/jira/browse/STORM-2231) is an issue to report broken thread-safety on output collector. We mostly considered about grouper implementation, but there's a spot in other side as well.
    
    In DisruptorQueue, each incoming threads have each ThreadLocalBatcher, meaning that 'synchronized' in add() can synchronize incoming thread and background flusher thread, but not between incoming threads.
    Hopefully DisruptorQueue publish*() are implemented to respect thread-safety, but it relies on RingBuffer producer option (Sequencer in RingBuffer), and unfortunately we set producer to SINGLE, not MULTI for executor batch queue. It seems one of optimization but we seemed to miss a spot.  
    
    So there're two simple options to choose: just using same ThreadLocalBatcher instance for all incoming thread (class name should be changed) or just changing producer mode to MULTI. This patch addresses later option, because it is just simple, and expected to be faster than former.
    
    The change is expected to bring any kind of performance hit, so may also need to do some performance tests against the patch.
    
    @revans2 Could you take a look at? It is related to disruptor batching.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-2231

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2293
    
----
commit 2b15fc4fc7189f2f42a0fea13f2ca00a6675d6c7
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2017-08-24T11:11:41Z

    STORM-2231 Fix multi-threads issue on executor send queue
    
    * we have use cases which launches threads and emit/ack concurrently
    * launched threads will create ThreadLocalBatcher for each
      * hence 'synchronized' in add() wouldn't help unlike background flushes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    @kevinconaway 
    I think I'm late on commenting to JIRA issue. Performance hit is not avoidable in this case.
    
    As I commented earlier, if we have consensus that the use case is minor (initiating poll to user@/dev@, or any other approaches), we can consider about not supporting thread-safety on output collector for the benefit of major use cases. We could even make grouper code thread-unsafe and gain more performance improvement if we're OK to drop it.
    
    If we drop thread-safety of output collector, even wrapping output collector will not work if you have the case - N tasks : 1 executor, because output collector instance is created per each task but executor send queue is shared.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    FYI: also merged into 1.x version line (1.x, 1.1.x, 1.0.x) via cherry-picking https://github.com/apache/storm/commit/501ef1489813753b9bbec72c0b43b6b696277c4a


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by kevinconaway <gi...@git.apache.org>.
Github user kevinconaway commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    Can you explain the results a bit? From my reading, the number of tuples acked/second is nearly the same


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    Attached perf. test numbers.
    
    TVL, rate 100000, topology.max.spout.pending=5000
    
    > before patch
    
    uptime:  241 acked: 3,015,580 acked/sec: 100,519.33 failed:        0 99%:      48,562,175 99.9%:      68,943,871 min:       6,598,656 max:     113,115,135 mean:   16,473,567.56 stddev:    7,837,977.69 user:    209,500 sys:     31,370 gc:      4,059 mem:     603.67
    uptime:  241 acked: 3,015,560 acked/sec: 100,518.67 failed:        0 99%:      59,047,935 99.9%:     118,292,479 min:       6,529,024 max:     190,054,399 mean:   16,693,789.11 stddev:   10,039,823.68 user:    210,070 sys:     31,890 gc:      3,942 mem:     696.79
    uptime:  240 acked: 3,013,360 acked/sec: 100,445.33 failed:        0 99%:      43,450,367 99.9%:      60,948,479 min:       6,533,120 max:      95,354,879 mean:   16,021,801.15 stddev:    6,956,089.85 user:    209,140 sys:     32,040 gc:      3,699 mem:     565.77
    
    > after patch
    
    uptime:  240 acked: 3,013,520 acked/sec: 100,450.67 failed:        0 99%:      70,975,487 99.9%:     126,418,943 min:       6,533,120 max:     209,321,983 mean:   17,371,349.66 stddev:   11,903,746.03 user:    212,520 sys:     30,080 gc:      4,248 mem:     680.16
    uptime:  240 acked: 3,014,140 acked/sec: 100,471.33 failed:        0 99%:      87,425,023 99.9%:     148,504,575 min:       6,488,064 max:     240,517,119 mean:   18,471,646.86 stddev:   14,432,363.00 user:    213,850 sys:     30,490 gc:      4,334 mem:     656.84
    uptime:  240 acked: 3,015,380 acked/sec: 100,512.67 failed:        0 99%:      49,119,231 99.9%:      77,791,231 min:       6,656,000 max:     132,579,327 mean:   16,009,039.03 stddev:    7,829,363.14 user:    206,440 sys:     31,640 gc:      3,828 mem:   1,066.89
    
    It seems to be clear performance hit which is expected. 
    
    So maybe, 2 cents, if we have consensus that the use case is minor (initiating poll to user@/dev@, or any other approaches), we can consider about not supporting thread-safety on output collector for the benefit of major use cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #2293: STORM-2231 Fix multi-threads issue on executor sen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/2293


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    @kevinconaway
    TVL topology tries to keep up acked/second rate so unless topology can't keep up, it should be similar to what we set to rate. (100000)
    
    So this tries to lock throughput, and see latency and also CPU / memory usage. The result fluctuates, so we run the test multiple times.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #2293: STORM-2231 Fix multi-threads issue on executor send queue

Posted by kevinconaway <gi...@git.apache.org>.
Github user kevinconaway commented on the issue:

    https://github.com/apache/storm/pull/2293
  
    So is the performance hit not a concern here?
    
    Is the workaround to synchronize on the outputcollector?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---