Posted to issues@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2017/05/15 14:13:04 UTC
[jira] [Created] (STORM-2513) NPE possible in getLeader call
Robert Joseph Evans created STORM-2513:
------------------------------------------
Summary: NPE possible in getLeader call
Key: STORM-2513
URL: https://issues.apache.org/jira/browse/STORM-2513
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Affects Versions: 1.0.0, 2.0.0, 1.1.0
Reporter: Robert Joseph Evans
The getLeader call actually reads data from two different locations
https://github.com/apache/storm/blob/v1.1.0/storm-core/src/clj/org/apache/storm/daemon/nimbus.clj#L2371-L2385
One is /leader-lock and the other is /nimbuses. There is a really rare possibility that these two get out of sync: the leader crashes, we read from leader election that it is still the leader, and then its entry is removed from /nimbuses in ZK before we read it. So we either need to stop storing them as separate entries, or we need to add some kind of retry when this happens.
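A rough sketch of the retry option, in Java for illustration only (the code linked above is Clojure). The two ZK reads are modeled as plain suppliers, and LeaderLookup, findLeader, maxAttempts and backoffMs are all hypothetical names, not anything that exists in Storm:

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the Storm code base.
final class LeaderLookup {
    /**
     * Reads the leader id (standing in for /leader-lock), then looks it up in
     * the registered nimbus summaries (standing in for /nimbuses). If the
     * leader has no summary, the two reads were inconsistent, so both reads
     * are repeated, up to maxAttempts times.
     */
    static Optional<String> findLeader(Supplier<String> leaderIdReader,
                                       Supplier<Map<String, String>> nimbusesReader,
                                       int maxAttempts,
                                       long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leaderId = leaderIdReader.get();
            Map<String, String> nimbuses = nimbusesReader.get();
            // Guard against a missing leader instead of dereferencing null.
            String summary = (leaderId == null) ? null : nimbuses.get(leaderId);
            if (summary != null) {
                return Optional.of(summary);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs); // give leader election time to settle
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        // Still inconsistent after all attempts: report "no leader known".
        return Optional.empty();
    }
}
```

Returning an empty Optional instead of a null summary pushes the "no leader right now" case onto the caller explicitly. Whether a bounded retry like this is preferable to merging the two ZK entries is the open question above.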
Also, NimbusClient has no retry built in. Not all operations are idempotent, but we really should look at adding a retry, possibly switching to a new nimbus, for idempotent operations.
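One possible shape for that client-side retry, again only as a hedged Java sketch. NimbusClient has no such API today; FailoverCall, the candidates list, and the idempotent flag are assumptions made for illustration, and the client type is left generic:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: all names here are illustrative, not Storm APIs.
final class FailoverCall {
    /**
     * Runs the operation against the first candidate nimbus. Only when the
     * operation is marked idempotent is a failure retried on the next
     * candidate; non-idempotent operations fail on the first error.
     */
    static <C, R> R call(List<C> candidates, Function<C, R> operation, boolean idempotent) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no nimbus candidates");
        }
        if (!idempotent) {
            // A retry could apply the operation twice, so do not attempt one.
            return operation.apply(candidates.get(0));
        }
        RuntimeException last = null;
        for (C candidate : candidates) {
            try {
                return operation.apply(candidate);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next nimbus
            }
        }
        throw last; // every candidate failed; surface the most recent error
    }
}
```

Making the caller state idempotency per call keeps an operation that is not safe to repeat from being silently sent twice.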
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)