Posted to issues@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2017/05/15 14:13:04 UTC

[jira] [Created] (STORM-2513) NPE possible in getLeader call

Robert Joseph Evans created STORM-2513:
------------------------------------------

             Summary: NPE possible in getLeader call
                 Key: STORM-2513
                 URL: https://issues.apache.org/jira/browse/STORM-2513
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.0, 2.0.0, 1.1.0
            Reporter: Robert Joseph Evans


The getLeader call actually reads data from two different locations:

https://github.com/apache/storm/blob/v1.1.0/storm-core/src/clj/org/apache/storm/daemon/nimbus.clj#L2371-L2385

One is /leader-lock and the other is /nimbuses.  There is a really rare possibility that these two can get out of sync: when the leader crashes, we can read from leader election that it is still the leader, and then find that its entry has already been removed from /nimbuses in ZK.  So we either need to make them not be separate entries, or we need to add in some kind of a retry when this happens.
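
A minimal sketch of the retry option, not the actual nimbus.clj code. It assumes the two ZK reads can be modelled as suppliers: one returning the host:port currently named in /leader-lock, the other returning a snapshot of the entries under /nimbuses. LeaderLookup, LeaderInfo, findLeader and the attempt/backoff parameters are all hypothetical names used for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

public class LeaderLookup {

    /** Hypothetical result type; this is not Storm's NimbusSummary. */
    public static final class LeaderInfo {
        public final String hostPort;
        public final String details;

        LeaderInfo(String hostPort, String details) {
            this.hostPort = hostPort;
            this.details = details;
        }
    }

    /**
     * Re-reads both locations until they agree or attempts run out.
     * leaderLock stands in for the /leader-lock read, nimbuses for /nimbuses.
     */
    public static Optional<LeaderInfo> findLeader(
            Supplier<String> leaderLock,
            Supplier<Map<String, String>> nimbuses,
            int maxAttempts,
            long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leader = leaderLock.get();
            if (leader != null) {
                String details = nimbuses.get().get(leader);
                if (details != null) {
                    return Optional.of(new LeaderInfo(leader, details));
                }
                // The elected leader has no /nimbuses entry: the two reads
                // are out of sync, so fall through and try again.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        // Callers get an explicit "no leader" instead of dereferencing null.
        return Optional.empty();
    }
}
```

Returning an empty Optional forces the caller to handle the "no consistent leader found" case explicitly rather than hitting a null.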

Also, NimbusClient has no retry built in.  Not all operations are idempotent, but we really should look at adding a retry, possibly switching to a new nimbus, for idempotent operations.
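
A rough sketch of what such a retry could look like, under stated assumptions. This is not NimbusClient's API: FailoverRetry, call, and the idea of passing the candidate nimbus hosts as a list are hypothetical, and the operation is modelled as a plain function from host to result. Non-idempotent operations are attempted exactly once.

```java
import java.util.List;
import java.util.function.Function;

public class FailoverRetry {

    /**
     * Runs op against the candidate hosts in order. Idempotent operations
     * move on to the next host after a failure, up to maxAttempts tries in
     * total; non-idempotent operations get a single attempt.
     */
    public static <T> T call(
            List<String> hosts,
            boolean idempotent,
            int maxAttempts,
            Function<String, T> op) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no nimbus hosts given");
        }
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int attempts = idempotent ? maxAttempts : 1;
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            // Rotate through the hosts so a retry lands on a different nimbus.
            String host = hosts.get(i % hosts.size());
            try {
                return op.apply(host);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        // Every allowed attempt failed; surface the most recent failure.
        throw last;
    }
}
```

Which operations count as idempotent would have to be decided per call; the flag here only shows where that decision plugs in.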



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)