Posted to issues@storm.apache.org by "Diogo Monteiro (Jira)" <ji...@apache.org> on 2019/08/28 13:35:00 UTC

[jira] [Created] (STORM-3501) Local Cluster worker restarts

Diogo Monteiro created STORM-3501:
-------------------------------------

             Summary: Local Cluster worker restarts
                 Key: STORM-3501
                 URL: https://issues.apache.org/jira/browse/STORM-3501
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-server
    Affects Versions: 2.0.0, 2.1.0
         Environment: Linux
            Reporter: Diogo Monteiro


I was trying to launch a topology that I'm developing (in 2.0.0) and noticed that the worker was getting restarted every ~30 seconds.
I placed a breakpoint in the _kill_ method of _LocalContainer_ ([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66]) to try and understand why the worker was getting restarted. 
 
The call stack was:
{{kill:66, LocalContainer (org.apache.storm.daemon.supervisor)}}
{{killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)}}
{{handleRunning:724, Slot (org.apache.storm.daemon.supervisor)}}
{{stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)}}
{{run:931, Slot (org.apache.storm.daemon.supervisor)}}
 
With this I can understand that the worker is killed because a blob has changed ([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724]). In fact, there's a changing blob in the _dynamicState_ at that point.
 
I checked the _AsyncLocalizer_, which downloads blobs, caches them locally, and notifies the Slot state machine of a changing blob.
 
I noticed this:
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142]
 * [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192]
 
These tell me that (correct me if I'm wrong):
 * The Supervisor tries to update blobs every 30 seconds.
 * The topology jar blob requires extraction of the resources directory (either from a jar or directly from a classpath URL). It does so in _fetchUnzipToTemp_, and its existence is checked in _isFullyDownloaded_.
 * The Slot is notified of a changing blob if:
 ** the remote version is different from the local version (the code has changed),
 ** OR the blob is not fully downloaded (fully downloaded means that both the jar and the extracted resources directory exist).
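
The conditions listed above can be sketched as a small standalone model. This is only an illustration of my reading of the code, not Storm's actual implementation: the class _BlobChangeSketch_, the nested _CachedJarBlob_, and all of their fields are hypothetical names, and only the method name _isFullyDownloaded_ is borrowed from the linked source.

```java
// Hypothetical model of the change check described above; NOT Storm's real classes.
final class BlobChangeSketch {

    /** Minimal stand-in for a locally cached topology jar blob (assumed fields). */
    static final class CachedJarBlob {
        final long localVersion;
        final long remoteVersion;
        final boolean jarExists;
        final boolean resourcesDirExists;

        CachedJarBlob(long localVersion, long remoteVersion,
                      boolean jarExists, boolean resourcesDirExists) {
            this.localVersion = localVersion;
            this.remoteVersion = remoteVersion;
            this.jarExists = jarExists;
            this.resourcesDirExists = resourcesDirExists;
        }

        /** Fully downloaded only if both the jar and the extracted resources dir exist. */
        boolean isFullyDownloaded() {
            return jarExists && resourcesDirExists;
        }

        /** The Slot is notified when the version differs OR the blob is incomplete. */
        boolean isChanging() {
            return localVersion != remoteVersion || !isFullyDownloaded();
        }
    }

    /**
     * Counts how many update rounds would report a changing blob, assuming the
     * blob state stays the same between rounds (as it does when there is no
     * resources directory to extract in the first place).
     */
    static int countChangeNotifications(CachedJarBlob blob, int updateRounds) {
        int notifications = 0;
        for (int i = 0; i < updateRounds; i++) {
            if (blob.isChanging()) {
                notifications++;
            }
        }
        return notifications;
    }
}
```

Under this model, a blob with matching versions but no resources directory reports a change on every update round, which would match the periodic restarts I observed.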

 
Well, I did not have a resources folder under the root of the classpath, and that's why the worker was being restarted every ~30 seconds, as the Slot was being notified of a changing blob every time _updateBlobs_ ran.
I created a resources folder (with dummy files) under the root of the classpath and the problem is now solved.
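
For reference, the workaround can be automated with something like the following. This is a minimal sketch under assumptions: the helper class _ResourcesDirWorkaround_, the method _ensureResourcesDir_, and the placeholder file name are all hypothetical, and the classpath root has to be supplied by the caller because it depends on the build layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of Storm.
final class ResourcesDirWorkaround {

    /**
     * Ensures that a "resources" directory containing at least one dummy file
     * exists under the given classpath root, and returns the directory path.
     */
    static Path ensureResourcesDir(Path classpathRoot) throws IOException {
        Path resources = classpathRoot.resolve("resources");
        Files.createDirectories(resources); // no-op if it already exists

        // Assumed placeholder name; any dummy file should serve the same purpose.
        Path placeholder = resources.resolve("placeholder.txt");
        if (Files.notExists(placeholder)) {
            Files.write(placeholder, "placeholder".getBytes(StandardCharsets.UTF_8));
        }
        return resources;
    }
}
```

It would need to run before the topology is submitted to the local cluster, so that the directory is already present when the jar blob is localized.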
 
However, if I understand correctly, the resources folder is only required for _multilang_. Our topologies do not use _multilang_, and this does not happen in Storm 1.1.3, for instance.


