Posted to user@storm.apache.org by Dimitris Samaras <di...@gmail.com> on 2014/11/25 13:42:47 UTC

Storm cluster restart

Hi all,

We are currently testing the Storm framework with 4 VM nodes (1 nimbus, 3
supervisors) and a single-node ZooKeeper cluster for Storm cluster
management.
Everything works fine with topologies etc., up to the point where the Storm
cluster needs to be restarted.
In that case, for storm.sh (nimbus, super, ui) to run successfully on a node,
Storm has to be redeployed on that node and reconfigured (storm.yaml).

Any thoughts?
Thanks in advance,
Dimitris

Re: Storm cluster restart

Posted by Dimitris Samaras <di...@gmail.com>.
Will try,

thank you Harsha

2014-11-26 16:31 GMT+02:00 Harsha <st...@harsha.io>:

>  This could be due to your storm.local.dir getting corrupted. You can
> delete the contents of this dir and restart the storm cluster (nimbus,
> supervisor).

Re: Storm cluster restart

Posted by Harsha <st...@harsha.io>.
This could be due to your storm.local.dir getting corrupted. You can
delete the contents of this dir and restart the storm cluster (nimbus,
supervisor).
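
A minimal sketch of that cleanup in Python, under stated assumptions. The helper names find_suspect_state_files and clear_dir_contents are hypothetical and are not part of Storm. The directory to pass in is whatever storm.local.dir is set to in your storm.yaml. The 4-byte threshold comes from the Java serialization stream header (a 2-byte magic number plus a 2-byte version); the EOFException raised in ObjectInputStream.readStreamHeader in the posted trace would be consistent with a state file shorter than that, but this is an inference and is not confirmed in this thread. It is presumably safest to run this only while the Storm daemons on the node are stopped.

```python
import shutil
import tempfile
from pathlib import Path

# Java serialization streams start with a 4-byte header
# (2-byte magic 0xACED + 2-byte version), so a shorter file cannot be read.
JAVA_STREAM_HEADER_BYTES = 4


def find_suspect_state_files(local_dir, min_bytes=JAVA_STREAM_HEADER_BYTES):
    """Return regular files under local_dir that are shorter than min_bytes."""
    return sorted(
        p for p in Path(local_dir).rglob("*")
        if p.is_file() and p.stat().st_size < min_bytes
    )


def clear_dir_contents(local_dir):
    """Delete everything inside local_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in Path(local_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory, never on a real storm.local.dir.
    # The sub-directory and file names below are made up for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        state_dir = root / "supervisor" / "localstate"
        state_dir.mkdir(parents=True)
        (state_dir / "state-file").write_bytes(b"")  # simulated empty file
        print("suspect files:", find_suspect_state_files(root))
        print("entries removed:", clear_dir_contents(root))
```

Listing the suspect files first leaves a record of what was damaged before anything is deleted. After clearing, restart nimbus and the supervisors as described above.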


> 2014-11-25 17:53 GMT+02:00 Harsha <st...@harsha.io>:
>> Dimitris, can you give more details on this " Everything works fine
>> up with topologies etc, to the point that the Storm cluster needs to
>> be restarted. In that case for storm.sh (nimbus, super ,ui) to run
>> successfully on a node Storm has to be redeployed on that node and
>> reconfigured(storm.yaml)."
>>
>>
>> Is the cluster going down when you deploy a topology? "to run
>> successfully on a node Storm has to be redeployed on that node and
>> reconfigured(storm.yaml)."
>>
>> what you mean by reconfiguration do you change the storm.yaml values
>> from previous deployment.
>>
>> -Harsha
>>
>>
>> On Tue, Nov 25, 2014, at 06:24 AM, Samit Sasan wrote:
>>> can you share the logs
>>>
>>> -Samit
>>>
>>> On Tue, Nov 25, 2014 at 6:12 PM, Dimitris Samaras
>>> <di...@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> We are currently testing Storm framework with 4 VM nodes (1 nimbus
>>>> , 3 supervisors) and a single node zookeeper cluster for the Storm
>>>> cluster management. Everything works fine up with topologies etc,
>>>> to the point that the Storm cluster needs to be restarted. In that
>>>> case for storm.sh (nimbus, super ,ui) to run successfully on a node
>>>> Storm has to be redeployed on that node and
>>>> reconfigured(storm.yaml).
>>>>
>>>> Any thoughts? Thanks in advance, Dimitris
>>>
>>
>


Re: Storm cluster restart

Posted by Dimitris Samaras <di...@gmail.com>.
Hi all,

@Harsha, by :

"Everything works fine up with topologies etc, to the point that the Storm
cluster needs to be restarted.
In that case for storm.sh (nimbus, super ,ui) to run successfully on a node
Storm has to be redeployed on that node and reconfigured(storm.yaml)."

I mean that I can deploy a fully functional cluster and run/test the
topologies properly; everything is OK at runtime.
If a node gets restarted (it runs on a VM) due to a host PC restart etc.,
then when I execute "storm supervisor", for example, on a supervisor node to
restart it, it does not start!

@Samit, the supervisor.log is:

2014-11-26 11:26:16 b.s.d.supervisor [INFO] Starting supervisor with id ea561988-508d-4593-9873-00f15736a6bf at host Ubuntu14super1
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:host.name=Ubuntu14super1
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.7.0_72
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/usr/local/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm/lib/logback-classic-1.0.6.jar:/usr/local/storm/lib/chill-java-0.3.5.jar:/usr/local/storm/lib/compojure-1.1.3.jar:/usr/local/sto$
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd64
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.version=3.13.0-40-generic
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.name=dimsam
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.home=/home/dimsam
2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.dir=/usr/local/storm/bin
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:host.name=Ubuntu14super1
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.version=1.7.0_72
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.vendor=Oracle Corporation
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.class.path=/usr/local/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm/lib/logback-classic-1.0.6.jar:/usr/local/storm/lib/chill-java-0.3.5.jar:/usr/local/storm/lib/compojure-1.1.3.jar:/usr/l$
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA>
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.version=3.13.0-40-generic
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.name=dimsam
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/dimsam
2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/usr/local/storm/bin
2014-11-26 11:35:33 b.s.d.supervisor [INFO] Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.ma$
2014-11-26 11:35:34 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-11-26 11:35:34 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=195.251.117.209:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@4dddb4e
2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Opening socket connection to server themis.iti.gr/195.251.117.209:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Socket connection established to themis.iti.gr/195.251.117.209:2181, initiating session
2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Session establishment complete on server themis.iti.gr/195.251.117.209:2181, sessionid = 0x149eb6ae8d10006, negotiated timeout = 20000
2014-11-26 11:35:34 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-11-26 11:35:34 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-11-26 11:35:34 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-11-26 11:35:35 o.a.z.ZooKeeper [INFO] Session: 0x149eb6ae8d10006 closed
2014-11-26 11:35:35 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-11-26 11:35:35 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=195.251.117.209:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@4e451d76
2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Opening socket connection to server themis.iti.gr/195.251.117.209:2181. Will not attempt to authenticate using SASL (unknown error)
2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Socket connection established to themis.iti.gr/195.251.117.209:2181, initiating session
2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Session establishment complete on server themis.iti.gr/195.251.117.209:2181, sessionid = 0x149eb6ae8d10007, negotiated timeout = 20000
2014-11-26 11:35:35 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-11-26 11:35:35 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-11-26 11:35:35 b.s.d.supervisor [INFO] Starting supervisor with id ea561988-508d-4593-9873-00f15736a6bf at host Ubuntu14super1
2014-11-26 11:35:36 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: java.io.EOFException
        at backtype.storm.utils.Utils.deserialize(Utils.java:93) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
        at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2378.invoke(event.clj:39) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: java.io.EOFException: null
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) ~[na:1.7.0_72]
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) ~[na:1.7.0_72]
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) ~[na:1.7.0_72]
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_72]
        at backtype.storm.utils.Utils.deserialize(Utils.java:88) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 11 common frames omitted
2014-11-26 11:35:36 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: java.io.EOFException
        at backtype.storm.utils.Utils.deserialize(Utils.java:93) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.LocalState.get(LocalState.java:56)
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at
backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__6330.invoke(supervisor.clj:307)
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.event$event_manager$fn__2378.invoke(event.clj:39)
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: java.io.EOFException: null
        at
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
~[na:1.7.0_72]
        at
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
~[na:1.7.0_72]
        at
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
~[na:1.7.0_72]
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
~[na:1.7.0_72]
        at backtype.storm.utils.Utils.deserialize(Utils.java:88)
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 6 common frames omitted
2014-11-26 11:35:36 b.s.util [INFO] Halting process: ("Error when
processing an event")


The first line is from when the Storm supervisor was running properly.
After a node restart the supervisor will not start, and I get the rest of
the log.
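
A note on reading the trace above: the EOFException is thrown from the
ObjectInputStream constructor while it reads the stream header, which is
what Java does when the data being deserialized is empty or shorter than
the 4-byte header. That fits a local state file that was cut short when
the VM went down. Below is a minimal sketch for spotting such files. It
assumes the local state sits somewhere under the directory configured as
storm.local.dir; the function name and the 4-byte threshold are
illustrative choices, not part of Storm, and a file that passes this check
can still be corrupt in other ways.

```python
import os

# A Java serialization stream starts with a 4-byte header
# (2-byte magic 0xACED followed by 2-byte version 0x0005).
STREAM_HEADER_BYTES = 4


def find_truncated_files(root, min_bytes=STREAM_HEADER_BYTES):
    """Return sorted paths of regular files under root smaller than min_bytes.

    Hypothetical helper: root is meant to be the storm.local.dir value
    from storm.yaml. Files this small cannot hold a serialization header.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) < min_bytes:
                suspects.append(path)
    return sorted(suspects)
```

Calling find_truncated_files() with the storm.local.dir path while the
supervisor is stopped lists the candidates. The script only reports them;
it does not delete anything.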


By "to run successfully on a node, Storm has to be redeployed on that node
and reconfigured (storm.yaml)", I mean that in order to run the
supervisor/nimbus again I have to redeploy Storm on every node that fails
to start. I do not change the config in storm.yaml; I simply have to
rewrite it with the same values.


Thanks again!

2014-11-25 17:53 GMT+02:00 Harsha <st...@harsha.io>:

>
> Dimitris,
>        can you give more details on this "
> Everything works fine up with topologies etc, to the point that the Storm
> cluster needs to be restarted.
> In that case for storm.sh (nimbus, super ,ui) to run successfully on a
> node Storm has to be redeployed on that  node and reconfigured(storm.yaml)."
>
>    Is the cluster going down when you deploy a topology?
> "to run successfully on a node Storm has to be redeployed on that  node
> and reconfigured(storm.yaml)."
>   what you mean by reconfiguration do you change the storm.yaml values
> from previous deployment.
>
> -Harsha
>
> On Tue, Nov 25, 2014, at 06:24 AM, Samit Sasan wrote:
>
> can you share the logs
>
> -Samit
>
> On Tue, Nov 25, 2014 at 6:12 PM, Dimitris Samaras <
> dimitris.samaras1@gmail.com> wrote:
>
> Hi all,
>
> We are currently testing Storm framework with 4 VM nodes (1 nimbus , 3
> supervisors) and a single node zookeeper cluster for the Storm cluster
> management.
> Everything works fine up with topologies etc, to the point that the Storm
> cluster needs to be restarted.
> In that case for storm.sh (nimbus, super ,ui) to run successfully on a
> node Storm has to be redeployed on that  node and reconfigured(storm.yaml).
>
> Any thoughts?
> Thanks in advance,
> Dimitris

Re: Storm cluster restart

Posted by Harsha <st...@harsha.io>.

Dimitris, can you give more details on this: "Everything works fine up
with topologies etc, to the point that the Storm cluster needs to be
restarted. In that case for storm.sh (nimbus, super ,ui) to run
successfully on a node Storm has to be redeployed on that node and
reconfigured(storm.yaml)."

Is the cluster going down when you deploy a topology? Regarding "to run
successfully on a node Storm has to be redeployed on that node and
reconfigured(storm.yaml)": what do you mean by reconfiguration? Do you
change the storm.yaml values from the previous deployment?

-Harsha

On Tue, Nov 25, 2014, at 06:24 AM, Samit Sasan wrote:
> can you share the logs
>
> -Samit
>
> On Tue, Nov 25, 2014 at 6:12 PM, Dimitris Samaras
> <di...@gmail.com> wrote:
>> Hi all,
>>
>> We are currently testing Storm framework with 4 VM nodes (1 nimbus ,
>> 3 supervisors) and a single node zookeeper cluster for the Storm
>> cluster management. Everything works fine up with topologies etc, to
>> the point that the Storm cluster needs to be restarted. In that case
>> for storm.sh (nimbus, super ,ui) to run successfully on a node Storm
>> has to be redeployed on that node and reconfigured(storm.yaml).
>>
>> Any thoughts? Thanks in advance, Dimitris
>


Re: Storm cluster restart

Posted by Samit Sasan <sa...@gmail.com>.
Can you share the logs?

-Samit

On Tue, Nov 25, 2014 at 6:12 PM, Dimitris Samaras <
dimitris.samaras1@gmail.com> wrote:

> Hi all,
>
> We are currently testing Storm framework with 4 VM nodes (1 nimbus , 3
> supervisors) and a single node zookeeper cluster for the Storm cluster
> management.
> Everything works fine up with topologies etc, to the point that the Storm
> cluster needs to be restarted.
> In that case for storm.sh (nimbus, super ,ui) to run successfully on a
> node Storm has to be redeployed on that  node and reconfigured(storm.yaml).
>
> Any thoughts?
> Thanks in advance,
> Dimitris
>