Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2017/08/24 07:09:00 UTC

[jira] [Updated] (STORM-1977) Leader Nimbus crashes with getClusterInfo when it doesn't have one or more replicated topology codes

     [ https://issues.apache.org/jira/browse/STORM-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated STORM-1977:
--------------------------------
    Fix Version/s:     (was: 1.1.0)
                   1.1.1

> Leader Nimbus crashes with getClusterInfo when it doesn't have one or more replicated topology codes
> ----------------------------------------------------------------------------------------------------
>
>                 Key: STORM-1977
>                 URL: https://issues.apache.org/jira/browse/STORM-1977
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.1.1
>
>
> While investigating STORM-1976, I found that there are cases where a Nimbus does not have all topology codes.
> Before BlobStore, only a Nimbus that had all topology codes could gain leadership; otherwise it gave up leadership immediately. This logic was removed when BlobStore was introduced.
> I don't know whether that was intended, but it allows a Nimbus that doesn't have the replicated topology code to gain leadership, and that Nimbus crashes when getClusterInfo is requested.
> The easiest way to reproduce:
> 1. Comment out cleanup-corrupt-topologies! in nimbus.clj (it's a quick workaround for STORM-1976), and patch the Storm cluster
> 2. Launch Nimbus 1 (leader)
> 3. Run a topology
> 4. Kill Nimbus 1
> 5. Launch Nimbus 2 on a different node
> 6. Nimbus 2 gains leadership
> 7. getClusterInfo is requested from Nimbus 2, and Nimbus 2 crashes
> Log:
> {code}
> 2016-07-17 08:47:48.378 o.a.s.b.FileBlobStoreImpl [INFO] Creating new blob store based in /grid/0/hadoop/storm/blobs
> ...
> 2016-07-17 08:47:48.619 o.a.s.zookeeper [INFO] Queued up for leader lock.
> 2016-07-17 08:47:48.651 o.a.s.zookeeper [INFO] <node1> gained leadership
> ...
> 2016-07-17 08:47:48.833 o.a.s.d.nimbus [INFO] Starting nimbus server for storm version '1.1.1-SNAPSHOT'
> 2016-07-17 08:47:49.295 o.a.s.t.ProcessFunction [ERROR] Internal error processing getClusterInfo
> KeyNotFoundException(msg:production-topology-2-1468745167-stormcode.ser)
>         at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:149)
>         at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:268)
> ...
>         at org.apache.storm.daemon.nimbus$get_blob_replication_count.invoke(nimbus.clj:498)
>         at org.apache.storm.daemon.nimbus$get_cluster_info$iter__9520__9524$fn__9525.invoke(nimbus.clj:1427)
> ...
>         at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1401)
>         at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__9612.getClusterInfo(nimbus.clj:1838)
>         at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3724)
>         at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3708)
>         at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39)
> ...
> 2016-07-17 08:47:49.397 o.a.s.b.BlobStoreUtils [ERROR] Could not download blob with keyproduction-topology-2-1468745167-stormconf.ser
> 2016-07-17 08:47:49.400 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyproduction-topology-2-1468745167-stormconf.ser
> 2016-07-17 08:47:49.402 o.a.s.d.nimbus [ERROR] Error when processing event
> KeyNotFoundException(msg:production-topology-2-1468745167-stormconf.ser)
>         at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:149)
>         at org.apache.storm.blobstore.LocalFsBlobStore.getBlob(LocalFsBlobStore.java:239)
>         at org.apache.storm.blobstore.BlobStore.readBlobTo(BlobStore.java:271)
>         at org.apache.storm.blobstore.BlobStore.readBlob(BlobStore.java:300)
> ...
>         at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
>         at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
>         at org.apache.storm.daemon.nimbus$read_storm_conf_as_nimbus.invoke(nimbus.clj:548)
>         at org.apache.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:555)
>         at org.apache.storm.daemon.nimbus$mk_assignments$iter__9205__9209$fn__9210.invoke(nimbus.clj:912)
> ...
>         at org.apache.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:911)
>         at clojure.lang.RestFn.invoke(RestFn.java:410)
>         at org.apache.storm.daemon.nimbus$fn__9769$exec_fn__1363__auto____9770$fn__9781$fn__9782.invoke(nimbus.clj:2216)
>         at org.apache.storm.daemon.nimbus$fn__9769$exec_fn__1363__auto____9770$fn__9781.invoke(nimbus.clj:2215)
>         at org.apache.storm.timer$schedule_recurring$this__1732.invoke(timer.clj:105)
>         at org.apache.storm.timer$mk_timer$fn__1715$fn__1716.invoke(timer.clj:50)
>         at org.apache.storm.timer$mk_timer$fn__1715.invoke(timer.clj:42)
> ...
> 2016-07-17 08:47:49.408 o.a.s.util [ERROR] Halting process: ("Error when processing an event")
> java.lang.RuntimeException: ("Error when processing an event")
>         at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
>         at org.apache.storm.daemon.nimbus$nimbus_data$fn__8727.invoke(nimbus.clj:205)
>         at org.apache.storm.timer$mk_timer$fn__1715$fn__1716.invoke(timer.clj:71)
>         at org.apache.storm.timer$mk_timer$fn__1715.invoke(timer.clj:42)
>         at clojure.lang.AFn.run(AFn.java:22)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-07-17 08:47:49.410 o.a.s.d.nimbus [INFO] Shutting down master
> {code}
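
For illustration, the pre-BlobStore rule described in the issue (a Nimbus keeps leadership only if it holds the code for every topology) might be sketched as follows. This is a minimal sketch under stated assumptions, not Storm's implementation: LeadershipCheck, missingBlobKeys, and shouldKeepLeadership are hypothetical names, and only the two blob key suffixes visible in the log above are checked.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, NOT part of Storm's API: decides whether a Nimbus
// that just won the leader lock holds every blob the active topologies need.
final class LeadershipCheck {

    // Key suffixes as they appear in the log above; a real check may
    // need to cover other blobs as well (assumption: these two suffice here).
    static final String CODE_SUFFIX = "-stormcode.ser";
    static final String CONF_SUFFIX = "-stormconf.ser";

    /** Returns the required blob keys that are not present locally, sorted. */
    static Set<String> missingBlobKeys(Set<String> activeTopologyIds,
                                       Set<String> localBlobKeys) {
        Set<String> missing = new TreeSet<>();
        for (String id : activeTopologyIds) {
            for (String suffix : new String[] {CODE_SUFFIX, CONF_SUFFIX}) {
                String key = id + suffix;
                if (!localBlobKeys.contains(key)) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }

    /** A Nimbus should keep leadership only when no required blob is missing. */
    static boolean shouldKeepLeadership(Set<String> activeTopologyIds,
                                        Set<String> localBlobKeys) {
        return missingBlobKeys(activeTopologyIds, localBlobKeys).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> active =
            Collections.singleton("production-topology-2-1468745167");
        // Nimbus 2 in the reproduction: freshly started, nothing replicated yet.
        Set<String> local = Collections.emptySet();
        System.out.println("keep leadership: " + shouldKeepLeadership(active, local));
        System.out.println("missing: " + missingBlobKeys(active, local));
    }
}
```

With a check of this kind, Nimbus 2 in the reproduction steps would give up leadership instead of serving getClusterInfo with missing blobs.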



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)