You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by harshach <gi...@git.apache.org> on 2015/01/29 19:23:54 UTC

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

GitHub user harshach opened a pull request:

    https://github.com/apache/storm/pull/401

    STORM-130: Supervisor getting killed due to java.io.FileNotFoundException: File '../stormconf.ser' does not exist.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harshach/incubator-storm STORM-130-V2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #401
    
----
commit 6b061aaab5a39e4b95670e3f0590bb48de4375fd
Author: Sriharsha Chintalapani <ma...@harsha.io>
Date:   2015-01-29T18:22:20Z

    STORM-130: Supervisor getting killed due to java.io.FileNotFoundException: File '../stormconf.ser' does not exist.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72131353
  
    @revans2 thanks for pointer on using cached assignment. I also spoke to @Parth-Brahmbhatt  about download-storm-code, added the lock to make sure there won't be multiple threads downloading the same code . Please take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72105884
  
    Ok So this is for an existing topology that moved and then moved back.  I understand now about the race.  I am +1 for the change, once we avoid hitting ZK quite so frequently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/401#discussion_r23801424
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/supervisor.clj ---
    @@ -349,6 +350,20 @@
                             (keys keepers))
                (zipmap (vals new-worker-ids) (keys new-worker-ids))
                ))
    +
    +    ;; check storm topology code dir exists before launching workers
    +    (doseq [[port assignment] reassign-executors]
    +      (let [storm-cluster-state (:storm-cluster-state supervisor)
    +            downloaded-storm-ids (set (read-downloaded-storm-ids conf))
    +            storm-id (:storm-id assignment)
    +            assignment-info (.assignment-info-with-version storm-cluster-state storm-id nil)
    --- End diff --
    
    This is going to cause a large load on zookeeper.  There is a reason we have the cache for the assignment info, and we only download it when it has changed.  We really should use the cache here if we can.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72726626
  
    @ptgoetz  will send a new PR against 0.9.3 branch. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-73105790
  
    @revans2 @ptgoetz  opened a PR against 0.9.3-branch https://github.com/apache/storm/pull/418 . Also updated the current PR to handle another case.
    This issue can happen if supervisors are down and user kills a topology , restart supervisors. These supervisors can crash with same issue as sync-process can start after cleaning up local deleted topology files . In general having try catch around launch-worker is good idea as there is no need to kill the supervisor itself if the launch-worker fails.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72235020
  
    +1 looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72078838
  
    @ptgoetz  @d2r @revans2 @Parth-Brahmbhatt  Please take a look at the new PR. I apologize for the whitespace issue. There are still few left in this patch. As Taylor suggested please ?w=1 to the URL.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by ptgoetz <gi...@git.apache.org>.
Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72725245
  
    +1. I'd also like to see this back-ported to the 0.9.3 branch, but that shouldn't block this from getting merged to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/401#discussion_r23801588
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/supervisor.clj ---
    @@ -349,6 +350,20 @@
                             (keys keepers))
                (zipmap (vals new-worker-ids) (keys new-worker-ids))
                ))
    +
    +    ;; check storm topology code dir exists before launching workers
    +    (doseq [[port assignment] reassign-executors]
    +      (let [storm-cluster-state (:storm-cluster-state supervisor)
    +            downloaded-storm-ids (set (read-downloaded-storm-ids conf))
    +            storm-id (:storm-id assignment)
    +            assignment-info (.assignment-info-with-version storm-cluster-state storm-id nil)
    --- End diff --
    
    @revans2 are you talking about .assignment-info-with-version?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/401#discussion_r23803166
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/supervisor.clj ---
    @@ -349,6 +350,20 @@
                             (keys keepers))
                (zipmap (vals new-worker-ids) (keys new-worker-ids))
                ))
    +
    +    ;; check storm topology code dir exists before launching workers
    +    (doseq [[port assignment] reassign-executors]
    +      (let [storm-cluster-state (:storm-cluster-state supervisor)
    +            downloaded-storm-ids (set (read-downloaded-storm-ids conf))
    +            storm-id (:storm-id assignment)
    +            assignment-info (.assignment-info-with-version storm-cluster-state storm-id nil)
    --- End diff --
    
    Yes.  We could replace it with ```@(:assignment-versions supervisor)``` and presumably get the same result without hitting zookeeper.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/401


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72103305
  
    @revans2 thanks for the review. I am looking for cached assignment code will send updated patch. 
    To address downloading the same code twice. It can happen in download-storm-code method I am checking if the topology code dir exists in the supervisor it won't overwrite and deletes the tmp dir.  More details on this comment on how this can happen https://issues.apache.org/jira/browse/STORM-130?focusedCommentId=14296241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14296241


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-130: Supervisor getting killed due to ja...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/401#issuecomment-72102590
  
    For the most part it looks good, I would like to see us use the cached assignment instead of going to zookeeper again to get it.  I would also like to understand better how we avoid issues with downloading the same thing in two different threads.  I'm not sure I understand the exact cause of what puts us in this situation to begin with, so I don't feel comfortable that we will not try to download the same thing twice and cause some issues when we do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---