Posted to dev@storm.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/06/24 21:52:14 UTC

[GitHub] storm pull request #2737: (1.x) STORM-3122 Avoid supervisor being crashed du...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/2737

    (1.x) STORM-3122 Avoid supervisor being crashed due to race condition between "async localizer" and "update blob" timer thread

    There's a race condition between the "async localizer" and the "update blob" timer thread.
    
    When a worker is shutting down, the reference count for its blobs drops to 0 and the supervisor removes the actual blob files. Meanwhile, the "update blob" timer thread tries to keep blobs up to date for downloaded topologies. While updating a topology, it reads some of the blob files it assumes were already downloaded and are still present. The async localizer breaks that assumption, because it can remove those files while the update is in progress.
    
    @arunmahadevan suggested an approach to fix this: "updateBlobsForTopology" can simply catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. This looks like the simplest fix, so I'm providing a patch based on that suggestion.
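A minimal Java sketch of the suggested fix, under a simplified model. Only the method name `updateBlobsForTopology` comes from this thread. The class `UpdateBlobsSketch`, the helper `readStormConf` and the Properties-based stormconf file are hypothetical stand-ins, not Storm's actual 1.x code. The update task catches FileNotFoundException and skips the topology when cleanup has already removed its stormconf, instead of letting the exception escape and crash the supervisor.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Properties;

final class UpdateBlobsSketch {

    // Hypothetical stand-in for reading a topology's stormconf from local disk.
    // Storm's real stormconf is not a Properties file; this is illustration only.
    static Properties readStormConf(File stormConfFile) throws IOException {
        Properties conf = new Properties();
        // FileInputStream throws FileNotFoundException when the file no longer exists.
        try (InputStream in = new FileInputStream(stormConfFile)) {
            conf.load(in);
        }
        return conf;
    }

    // Hypothetical timer-task body: returns true if the topology's blobs were
    // (notionally) updated, false if the update was skipped because the
    // stormconf had already been removed.
    static boolean updateBlobsForTopology(File stormConfFile) {
        try {
            Properties conf = readStormConf(stormConfFile);
            // A real implementation would enumerate the blobs listed in conf and
            // refresh each one; that part is omitted in this sketch.
            return true;
        } catch (FileNotFoundException e) {
            // The worker shut down, its reference count reached 0 and the files
            // were deleted. Nothing is left to update, so skip instead of crashing.
            return false;
        } catch (IOException e) {
            // Any other I/O failure is still surfaced to the caller.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("supervisor-blobs").toFile();
        File stormConf = new File(dir, "stormconf.properties");
        Files.write(stormConf.toPath(), "blobs=a,b\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(updateBlobsForTopology(stormConf)); // true: conf still present

        // Simulate cleanup racing ahead of the "update blob" timer thread.
        Files.delete(stormConf.toPath());
        System.out.println(updateBlobsForTopology(stormConf)); // false: skipped, no crash
        Files.delete(dir.toPath());
    }
}
```

Catching only FileNotFoundException keeps the change narrow. A missing stormconf is treated as "topology already cleaned up", while other I/O errors still propagate.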
    
    Btw, this doesn't apply to the master branch. There, all blobs are synced up separately (there's no need to read stormconf to enumerate topology-related blobs), and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).
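For contrast, here is a minimal sketch of the per-blob approach described for master. The names `PerBlobSyncSketch`, `BlobFetcher` and `syncOnce` are hypothetical and are not Storm's API. Each tracked blob is refreshed independently, and no stormconf is read to enumerate the blobs. A blob that can't be pulled keeps its current local version and is simply retried on the next sync round.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerBlobSyncSketch {

    // Hypothetical fetcher; a real one would query the blob store for the latest version.
    interface BlobFetcher {
        long latestVersion(String key) throws Exception;
    }

    // Local version per blob key; changed only when a pull succeeds.
    private final Map<String, Long> localVersions = new LinkedHashMap<>();

    void track(String key, long version) {
        localVersions.put(key, version);
    }

    // Assumes the key is tracked (unboxing a missing entry would throw).
    long localVersion(String key) {
        return localVersions.get(key);
    }

    // One sync round: every blob is handled on its own. A failure for one blob
    // is recorded and skipped, so it neither aborts the round nor crashes the caller.
    List<String> syncOnce(BlobFetcher fetcher) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, Long> entry : localVersions.entrySet()) {
            try {
                long remote = fetcher.latestVersion(entry.getKey());
                if (remote > entry.getValue()) {
                    entry.setValue(remote); // stands in for downloading the newer blob
                }
            } catch (Exception e) {
                skipped.add(entry.getKey()); // retried on the next round
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        PerBlobSyncSketch syncer = new PerBlobSyncSketch();
        syncer.track("dict.txt", 1L);
        syncer.track("model.bin", 1L);
        // "model.bin" can't be pulled this round; "dict.txt" has a newer version.
        List<String> skipped = syncer.syncOnce(key -> {
            if (key.equals("model.bin")) {
                throw new java.io.IOException("blob unavailable");
            }
            return 2L;
        });
        System.out.println(skipped);                          // [model.bin]
        System.out.println(syncer.localVersion("dict.txt"));  // 2
        System.out.println(syncer.localVersion("model.bin")); // 1
    }
}
```

Keeping failures local to one blob means a missing or temporarily unavailable blob can't abort the whole round.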

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-3122-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2737
    
----
commit 84d19c9ad66e2d24040c7e12dc96cef03ff7bcb3
Author: Jungtaek Lim <ka...@...>
Date:   2018-06-24T21:49:51Z

    STORM-3122 Avoid supervisor being crashed due to race condition between "async localizer" and "update blob" timer thread

----


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    The Travis build is failing, but I know the Travis build for the 1.x version line has been unstable for a long time.
    I'll track the build on my fork: https://travis-ci.org/HeartSaVioR/storm/builds/396178467


---

[GitHub] storm pull request #2737: (1.x) STORM-3122 Avoid supervisor being crashed du...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/2737


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by danny0405 <gi...@git.apache.org>.
Github user danny0405 commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    @HeartSaVioR 
    Seems to be related to this PR https://github.com/apache/storm/pull/2618.


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    +1, LGTM


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    +1


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    `storm-core` tests are passing in the link above. The `!storm-core` test for JDK 7 is failing due to a known issue (HDFS).


---

[GitHub] storm issue #2737: (1.x) STORM-3122 Avoid supervisor being crashed due to ra...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2737
  
    @danny0405 
    Yeah, I haven't had time to go through #2618 since the code change is quite large and the async localizer is complicated. This patch may look like a workaround to others, but it is a simple way to avoid the issue, since we expect the race condition not to happen again in the next update.


---