You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by eshioji <gi...@git.apache.org> on 2015/05/29 12:11:36 UTC

[GitHub] storm pull request: STORM-839

GitHub user eshioji opened a pull request:

    https://github.com/apache/storm/pull/565

    STORM-839

    This fixes the reported deadlock between `disruptor-worker-transfer-queue` thread and `client-worker` thread, which seem to have been introduced by STORM-329.
    
    After reviewing the `v0.9.4` code, my conclusion was that the background flushing task can be removed without real change in the behavior. By doing this, the synchronization that was involved in the deadlock can be removed altogether.
    
    My reasoning is as follows:
     - One has three option to deal with Netty's buffer filling up:
      1. Discard incoming new messages
      2. Block client thread until there is space (back pressure)
      3. Keep buffering up until OOME is thrown
     - My guess is that the `v0.9.4` code attempted to implement option (i), but actually the behavior is option (iii). When Netty's `Channel.isWritable` returns false in `Client.send`, the thread avoids writing to the `Channel` and leaves messages in the `Channel.messageBatch` field, which the background task flushes when the `Channel.isWritable` starts to return true again. However, if `Client.send` is called again before this (which should be common), the content of `Channel.messageBatch` is written and buffered on the `Channel` anyways, because `flushMessages` does not check `Channel.isWritable`.
     - AFAIU we like option (ii), but this requires more work. Between option (i) and option (iii), IMO (iii) is superior because if OOME happens, the user can reduce MAX_PENDING_TUPLES to prevent this. Discarding messages would be harder to diagnose.
     - If we are content with option (ii), we can remove the background flushing task and related synchronization altogether, removing the source of deadlock. We'd simply write and buffer onto the unbounded buffer of `Channel`, and if there are too many pending messages, the worker will die of OOME, and the user should reduce this with indirect means like reducing MAX_PENDING_TUPLES until option (ii) is implemented in the future.
    
    Thoughts @miguno ?
    
    
    
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eshioji/storm STORM-839

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #565
    
----
commit 98f4e619d54052b73a309d23ab7214953e4c7774
Author: Sriharsha Chintalapani <ma...@harsha.io>
Date:   2015-02-05T18:08:05Z

    STORM-130: Supervisor getting killed due to java.io.FileNotFoundException: File '../stormconf.ser' does not exist.
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit a1e5893e1b94c224d39fedf11583b216c21351c8
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-02-24T20:46:12Z

    update changelog for STORM-130

commit 62788f295bb1fb1cc83b99c30f82beb40eea5f25
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-02-24T23:03:40Z

    port STORM-329 fix to 0.9.x

commit 81016c2ed7222da99138bc9971e335533d4cb518
Author: Michael G. Noll <mn...@verisign.com>
Date:   2015-02-16T09:01:27Z

    Track how many messages are being dropped when a connection is unavailable
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit 97a76fc896de508f015dbe32f1473ddbf10d736b
Author: Michael G. Noll <mn...@verisign.com>
Date:   2015-02-16T09:03:07Z

    Clarify name of method for dropping messages
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit 9138d9fc255639b4d0d43657379ce467591e8ef2
Author: Michael G. Noll <mn...@verisign.com>
Date:   2015-02-16T09:07:35Z

    Change log level for intentionally dropping messages from WARN to ERROR
    
    This change makes the log level for dropping messages consistent in
    Client.java.
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit 6b06d8468ff5e743fb12b85dd84fe0931041c2c3
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-02-24T23:18:43Z

    add STORM-329 to changelog

commit e63fb2af9086e2b2e688662ca42a4b4d0112274b
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-03-03T00:06:58Z

    STORM-693: when bolt fails to write tuple, it should report error instead of silently acking.
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit 92836de540ec8ab90d7591b96ba02126e80b5c3a
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T14:59:56Z

    add STORM-693 to changelog

commit c19e482b70f18d690ad165c78551860506486095
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-02-20T19:56:22Z

    STORM-682: supervisor should handle worker state corruption gracefully.
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit f0de11a20fe2f20dc1dc2f485549e0dc342f8680
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T15:05:30Z

    add STORM-682 to changelog

commit 835a410c879dc1eb02d9670410f65fe0be6f28c6
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-01-14T20:27:35Z

    STORM-559: ZkHosts in README should use 2181 as port.
    
    Signed-off-by: P. Taylor Goetz <pt...@gmail.com>

commit 30e0be8616c89cb1f8a51fcf462f76a075e6e964
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T15:11:16Z

    add STORM-559 to changelog

commit b1bbacb7134d17ff47c2e8b8857a66244a4d1d4f
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T15:28:11Z

    add missing import in supervisor.clj

commit edf596bac8feab0c8721f7de94474e5549858355
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T16:21:38Z

    [maven-release-plugin] prepare release v0.9.4

commit 48d10e20eb3c750fc41fcf0bef3d49501cf6d5a4
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T16:21:45Z

    [maven-release-plugin] prepare for next development iteration

commit 41f44f9914d4f27d0db3f211a85f88301533f09b
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T17:59:39Z

    Revert "[maven-release-plugin] prepare for next development iteration"
    
    This reverts commit 48d10e20eb3c750fc41fcf0bef3d49501cf6d5a4.

commit 233603c3cbd729fdfabd2759bfa7705811996aa4
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T18:00:11Z

    Revert "[maven-release-plugin] prepare release v0.9.4"
    
    This reverts commit edf596bac8feab0c8721f7de94474e5549858355.

commit 61e1b5c3e226122143c91bbd7527b605f8ed8727
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T18:05:51Z

    add Apache license header to ConnectionWithStatus.java

commit 00091d7952681a39281aa171adfad133a5e26330
Author: P. Taylor Goetz <pt...@gmail.com>
Date:   2015-03-18T18:13:57Z

    [maven-release-plugin] prepare release v0.9.4

commit ed8ab3ec194f19c75fc2f5c000609204f04b50e8
Author: Enno Shioji <es...@gmail.com>
Date:   2015-05-28T19:42:05Z

    Simplified the flow and removed the lock that was causing the deadlock

commit 91b8eb3840432e47b79f40abebec8304627732a8
Author: Enno Shioji <es...@gmail.com>
Date:   2015-05-28T19:46:17Z

    Bump version

commit b7d84bdc7fd3de34f45a94131cdbb6bfbd3763dc
Author: Enno Shioji <es...@gmail.com>
Date:   2015-05-28T21:27:31Z

    Remove background flushing because it doesn't seem necessary. Netty's Channel queues up written data on an unbounded buffer. The background flushing seems to have been added to avoid this, but in practice it was probably doing it anyways because flushMessages(), which is called by send() doesn't check for isWritable. Moreover, queuing on an unbounded buffer seems fine because back pressure is provided by MAX_PENDING_TUPLE. If OOME occurs due to this buffer overflowing, it seems reasonable that one has to reduce MAX_PENDING_TUPLE, rather than Storm trying to cope with it by dropping messages.

commit 679e42bc1e38f51c2759667b03cb45322c6a793b
Author: Enno Shioji <es...@gmail.com>
Date:   2015-05-28T21:31:35Z

    Change to a SNAPSHOT version for deployment purposes

commit 27a92e2aa3488c0203f500306e0583ff9e7e1e82
Author: Enno Shioji <es...@gmail.com>
Date:   2015-05-29T09:32:16Z

    Remove (now) dead comment and code

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-839

Posted by eshioji <gi...@git.apache.org>.
Github user eshioji closed the pull request at:

    https://github.com/apache/storm/pull/565


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-839

Posted by eshioji <gi...@git.apache.org>.
Github user eshioji commented on the pull request:

    https://github.com/apache/storm/pull/565#issuecomment-106765335
  
    Argh sorry I should have opened this against Storm v0.9.4, I worked off v0.9.4 tag.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---