Posted to dev@storm.apache.org by "Damien Raude-Morvan (JIRA)" <ji...@apache.org> on 2014/05/05 12:33:14 UTC
[jira] [Created] (STORM-307) After host crash, supervisor is unable to restart itself
Damien Raude-Morvan created STORM-307:
-----------------------------------------
Summary: After host crash, supervisor is unable to restart itself
Key: STORM-307
URL: https://issues.apache.org/jira/browse/STORM-307
Project: Apache Storm (Incubating)
Issue Type: Bug
Affects Versions: 0.9.1-incubating
Environment: Debian Linux Wheezy
Zookeeper 3.3.3
Java 1.7.0_25
Reporter: Damien Raude-Morvan
Hi,
I've observed [multiple times|#links] that deserialization of the supervisor state can fail after a host crash or reboot. The supervisor is then unable to come up without manual intervention. AFAICT, the serialized supervisor state is invalid and can't be read at the next start.
Observed error in the supervisor log:
{noformat}
2014-04-29 19:38:35 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-04-29 19:38:35 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=127.0.0.1:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@18d055e0
2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Opening socket connection to server /127.0.0.1:2181
2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x145a7cc1c7e48b1, negotiated timeout = 20000
2014-04-29 19:38:35 b.s.d.supervisor [INFO] Starting supervisor with id 71b01216-9d00-4fb6-8538-6673058ab5ef at host storm
2014-04-29 19:38:36 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: java.io.EOFException
at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at clojure.lang.AFn.applyToHelper(AFn.java:161) ~[clojure-1.4.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323) ~[na:1.7.0_25]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792) ~[na:1.7.0_25]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:799) ~[na:1.7.0_25]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_25]
at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
... 11 common frames omitted
2014-04-29 19:38:36 b.s.util [INFO] Halting process: ("Error when processing an event")
{noformat}
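The "Caused by" frames above show the EOFException being thrown from ObjectInputStream.readStreamHeader, i.e. while reading the first bytes of the serialized data. A minimal sketch of how plain Java deserialization produces that exception follows, assuming the state data was empty or cut off inside the stream header (the report does not confirm the actual file contents). The class and method names are hypothetical and this is not Storm's own Utils.deserialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; it only uses standard java.io serialization.
public class EmptyStateRepro {

    // Serializes an object to a byte array with standard Java serialization.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserializes a byte array; the ObjectInputStream constructor reads the
    // stream header first, so it fails before readObject() is ever called
    // when the header is missing or incomplete.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true only if deserialization fails with an EOFException.
    static boolean failsWithEof(byte[] data) {
        try {
            deserialize(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = serialize("supervisor-state");
        byte[] empty = new byte[0];                    // assumed: zero-length state data
        byte[] truncated = new byte[] { valid[0] };    // assumed: cut off inside the header

        System.out.println("valid fails with EOF:     " + failsWithEof(valid));
        System.out.println("empty fails with EOF:     " + failsWithEof(empty));
        System.out.println("truncated fails with EOF: " + failsWithEof(truncated));
    }
}
```

If the assumption holds, a state file left empty or partially written at the time of the crash would explain why the supervisor halts on every start until the file is removed.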
Current workaround: fully stopping the supervisor daemon and deleting Storm's entire data/supervisor directory helped. After a restart, the supervisor is now running smoothly.
{anchor:links} Here are some references to very similar issues:
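Before deleting the whole directory, it may help to check which files look damaged. The sketch below lists zero-length regular files under a directory given on the command line. It assumes that corrupted state shows up as empty files, which this report does not confirm, and it needs Java 8 or later for Files.walk. The class name is hypothetical and the tool is not part of Storm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper; it only reads the directory tree and deletes nothing.
public class EmptyStateFileFinder {

    // Returns all zero-length regular files below root, in sorted order.
    static List<Path> findEmptyFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(EmptyStateFileFinder::isEmpty)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the file has a size of zero bytes.
    static boolean isEmpty(Path file) {
        try {
            return Files.size(file) == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java EmptyStateFileFinder <supervisor-data-directory>");
            return;
        }
        for (Path file : findEmptyFiles(Paths.get(args[0]))) {
            System.out.println(file);
        }
    }
}
```

Run it with the supervisor stopped and pass the data/supervisor directory of the affected host as the argument.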
* http://mail-archives.apache.org/mod_mbox/storm-user/201402.mbox/%3C23100d14e7ac4cef947f7236ef8963e1@BY2PR08MB144.namprd08.prod.outlook.com%3E
* https://groups.google.com/forum/#!topic/storm-user/SL9FK9XeoI8
* https://groups.google.com/forum/#!topic/storm-user/2gapTYTRrX8
Regards,
--
This message was sent by Atlassian JIRA
(v6.2#6252)