Posted to user@storm.apache.org by Benjamin SOULAS <be...@gmail.com> on 2014/08/29 14:34:00 UTC
Supervisor always down 3s after execution
Hello everyone, I have a problem deploying Storm on a cluster (Grid'5000,
if anyone knows it). I took incubator-storm-master from the GitHub branch
with the sources and managed to build my own release (no code
modification, just fixes for the Maven errors that were getting in the way).
It works fine locally on my laptop: I modified the ExclamationTopology by
adding 40 more bolts, and I also configured this topology to allow 50
workers.
Now, when I try the same thing on a cluster, the supervisors go down
about 3 s after they start. Nimbus is fine, dev-zookeeper too, and so is
Storm UI.
I read somewhere on the Apache website that you need to run a real
ZooKeeper (not the one bundled with Storm).
Does someone know a good tutorial explaining how to run a ZooKeeper
server on a cluster for Storm?
I hope I am clear ...
Kind regards.
Benjamin SOULAS
Re: Supervisor always down 3s after execution
Posted by Benjamin SOULAS <be...@gmail.com>.
Hi everyone,
I found how to fix it. On my cluster we use NFS, and I had not changed my
storm.yaml, where I should have defined *storm.local.dir*. If you don't
define it, Storm takes the value from defaults.yaml, which sets
*storm.local.dir: "storm-local"*. The problem, when your cluster uses
NFS, is that the storm-local directory is created in your home directory,
and with NFS, as you probably know, every node has access to the same home.
So when Storm creates its "storm-local" directory, it creates a directory
named "supervisor" inside it. But every time you launch a supervisor, it
erases the one created before, so the previous supervisor gets stopped.
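The clash can be sketched in a few shell lines (purely illustrative, not Storm's actual code; the path and function name are made up for the demo): two "supervisors" started against the same shared storm.local.dir, where the second launch wipes the first one's state:

```shell
# Illustrative only: two supervisors pointing at ONE shared local dir,
# as happens when storm.local.dir lives in an NFS-shared home
LOCAL_DIR=/tmp/demo-shared-home/storm-local

start_supervisor() {   # $1 = node name (hypothetical helper)
  rm -rf "$LOCAL_DIR/supervisor"          # clobbers the other node's state
  mkdir -p "$LOCAL_DIR/supervisor"
  printf '%s' "$1" > "$LOCAL_DIR/supervisor/owner"
}

start_supervisor node-A
start_supervisor node-B
cat "$LOCAL_DIR/supervisor/owner"   # node-A's state is gone; prints node-B
```

With a node-local directory instead, each node would keep its own `supervisor` subtree and nothing would be clobbered.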
So the solution is to modify your storm.yaml and set a new
storm.local.dir. In my case, because I am on an NFS cluster, I used
*storm.local.dir: "/tmp/storm-local"*: /tmp is not in the home directory,
so it is not shared between the nodes. That way, every node launching a
supervisor gets its own supervisor directory.
I sincerely hope this is clear and can help someone.
Benjamin.
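For reference, this is the kind of storm.yaml fragment that fixes it (the path is from my setup; any node-local, non-shared directory works):

```yaml
# storm.yaml on every node: keep supervisor state on a node-local disk,
# never on an NFS-shared home directory
storm.local.dir: "/tmp/storm-local"
```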
2014-09-02 19:53 GMT+02:00 Benjamin SOULAS <be...@gmail.com>:
> Hi Harsha,
>
> You're right, I didn't export STORM_HOME ...
>
> I will do it, maybe this is the problem.
>
> Thanks
>
>
> 2014-09-02 18:08 GMT+02:00 Harsha <st...@harsha.io>:
>
>> Hi Benjamin,
>> Correct me if I missed it: in your config I don't see storm.local.dir
>> defined. If it's not defined in the config, Storm will create one in
>> the Storm installation dir, which seems to be
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/
>>
>> Are you running the supervisor and nimbus as user "bsoulas"? When you
>> run "storm nimbus" or "storm supervisor", which storm command is it
>> pointing to? Did you export
>> STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben
>> and also add it to PATH? I am checking whether you had a previous
>> installation of Storm and are invoking the storm command from that
>> installation.
>> Can you also check the ZooKeeper logs?
>> -Harsha
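Harsha's check can be run quickly on each node by exporting the variables and asking the shell which storm binary it resolves. The path below is a stand-in (substitute your own build's target directory), and the block fabricates a placeholder storm script so it is self-contained:

```shell
# Hypothetical install path for illustration -- point STORM_HOME at your
# own build's target directory on a real node
STORM_HOME=/tmp/apache-storm-0.9.3-ben
mkdir -p "$STORM_HOME/bin"
printf '#!/bin/sh\necho storm-ok\n' > "$STORM_HOME/bin/storm"
chmod +x "$STORM_HOME/bin/storm"

export STORM_HOME
export PATH="$STORM_HOME/bin:$PATH"

# Which storm does the shell resolve now? It should be the fresh build,
# not a stale installation earlier in PATH.
command -v storm
```

If `command -v storm` prints a path outside STORM_HOME, an older installation is shadowing the new one.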
>>
>> On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:
>>
>> Hi everyone,
>>
>> I followed your instructions for installing a ZooKeeper server: I
>> downloaded it from the website, extracted the tar file on a machine in
>> my cluster, and made these modifications in my zoo.cfg:
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
>> # the port at which the clients will connect
>> clientPort=2181
>> # the maximum number of client connections.
>> # increase this if you need to handle more clients
>> #maxClientCnxns=60
>> #
>> # Be sure to read the maintenance section of the
>> # administrator guide before turning on autopurge.
>> #
>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>> #
>> # The number of snapshots to retain in dataDir
>> #autopurge.snapRetainCount=3
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> #autopurge.purgeInterval=1
>>
>> In log4j.properties, I uncommented the rolling-log-file line:
>>
>> # Example with rolling log file
>> log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
>>
>> Then I went to my storm.yaml (located here in my case, because I built
>> from the sources):
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf
>>
>> This file contains this configuration:
>>
>> ########### These MUST be filled in for a storm configuration
>> storm.zookeeper.servers:
>>     - "paradent-4"
>> #   - "paradent-47"
>> #   - "paradent-48"
>> #
>> nimbus.host: "paradent-4"
>> #
>> # ##### These may optionally be filled in:
>> #
>> ## List of custom serializations
>> # topology.kryo.register:
>> #   - org.mycompany.MyType
>> #   - org.mycompany.MyType2: org.mycompany.MyType2Serializer
>> #
>> ## List of custom kryo decorators
>> # topology.kryo.decorators:
>> #   - org.mycompany.MyDecorator
>> #
>> ## Locations of the drpc servers
>> # drpc.servers:
>> #   - "server1"
>> #   - "server2"
>> #
>> ## Metrics Consumers
>> # topology.metrics.consumer.register:
>> #   - class: "backtype.storm.metric.LoggingMetricsConsumer"
>> #     parallelism.hint: 1
>> #   - class: "org.mycompany.MyMetricsConsumer"
>> #     parallelism.hint: 1
>> #     argument:
>> #       - endpoint: "metrics-collector.mycompany.org"
>>
>> dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
>>
>> storm.zookeeper.port: 2181
>>
>> To launch Storm on the cluster, I start it with *storm nimbus* (on a
>> machine named paradent-4), then my ZooKeeper server with *sh
>> zkServer.sh start* (on paradent-4 again), which creates a
>> *zookeeper_server.pid* file holding the ZooKeeper pid (I know, it's
>> obvious ...>_< ).
>>
>> Then I launch *storm ui* to get a view of my Storm app (on paradent-4).
>> Up to this point everything works fine. Next, logically, I launch my
>> supervisor on a different machine (here *paradent-39*) with *storm
>> supervisor*; it starts, but once again it is down 3 or 4 seconds later.
>>
>> So I looked at the supervisor.log, located at:
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs
>>
>> And here a tricky error appears:
>>
>> 2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection,
>> connectString=paradent-4:2181 sessionTimeout=20000
>> watcher=org.apache.curator.ConnectionState@220df4c8
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt
>> to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established
>> to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment
>> complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181,
>> sessionid = 0x14835a48ca90004, negotiated timeout = 20000
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no
>> ConnectionStateListeners registered.
>> 2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection,
>> connectString=paradent-4:2181/storm sessionTimeout=20000
>> watcher=org.apache.curator.ConnectionState@c6d625b
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt
>> to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established
>> to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment
>> complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181,
>> sessionid = 0x14835a48ca90005, negotiated timeout = 20000
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no
>> ConnectionStateListeners registered.
>> 2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id
>> 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
>>
>> 2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
>> java.io.FileNotFoundException: File
>> '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275'
>> does not exist
>> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
>> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
>> at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
>> at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
>> at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
>> at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>> at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>> 2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when
>> processing an event")
>>
>>
>> I understood that there was a missing file, my question is "why?????". If
>> i watch the rights with ls -l at this path :
>>
>>
>> /home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/
>>
>> I have this :
>>
>>
>> drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats
>>
>> So for me this is not the problem, can someone help me? I am really stuck
>> here :S
>>
>> I sincerely hope to be clear and precise enough ...
>>
>> Kind regards.
>>
>>
>>
>>
>>
>>
>> 2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:
>>
>>
>>
>> Hi Benjamin,
>> A Storm cluster needs a ZooKeeper quorum to function.
>> ExclamationTopology accepts command-line params to deploy on a Storm
>> cluster; if you don't pass any arguments it will use LocalCluster (a
>> simulated local cluster) to deploy.
>> I recommend you go through
>> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
>> for setting up ZooKeeper. Here is an excellent write-up on Storm
>> cluster setup along with ZooKeeper:
>> http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/
>> Hope that helps.
>> -Harsha
Re: Supervisor always down 3s after execution
Posted by Harsha <st...@harsha.io>.
Hi Benjamin,
Correct me if I missed it , in your config I don't
see storm.local.dir defined. If its not defined in config storm
will create one in the storm_installation dir which seems to
be
/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/
and are you running the supervisor and nimbus as user
"bsoulas". When you are running "storm nimbus or storm
supervisor" command which storm command its pointing. Did you
export
STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/bina
ry/target/apache-storm-0.9.3-ben" and also added it to PATH. I
am checking to see if you had any previous installation of
storm and invoking the storm command from previous
installation.
Can you also check zookeeper logs .
-Harsha
On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:
Hi everyone,
I followed your instructions for installing a zookeeper server,
i downloaded it on the website, extract the tar file somewhere
in a machine on my cluster, i made those modifications in my
zoo.cfg :
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
#
[1]http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#
sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
In the log4j.properties, i uncommented the line for the log
file :
# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
Then i went to my storm.yaml (located here in my case, because
i took the source version) :
/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf
This file contain this configuration :
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "paradent-4"
# - "paradent-47"
# - "paradent-48"
#
nimbus.host: "paradent-4"
#
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
# - org.mycompany.MyType
# - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
# - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
# - "server1"
# - "server2"
## Metrics Consumers
# topology.metrics.consumer.register:
# - class: "backtype.storm.metric.LoggingMetricsConsumer"
# parallelism.hint: 1
# - class: "org.mycompany.MyMetricsConsumer"
# parallelism.hint: 1
# argument:
# - endpoint: "[2]metrics-collector.mycompany.org"
dev.zookeeper.path:
"paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeep
er-3.4.6/"
storm.zookeeper.port: 2181
To launch storm on the cluster, i launch it thanks to storm
nimbus (on a machine named paradent-4), then my zookeeper
Server sh zkServer.sh start (on paradent-4 again)(which create
a zookeeper_server.pid where the pid of the zookeeper is
written, i know it's obvious ...>_< ).
After i launch my storm ui for having a visual of my storm app
(on paradent-4). Until now, everything work fine. Now, the
logical way implies i launch my supervisor, on a different
machine (here paradent-39) thanks to storm supervisor, it is
launched but once again, 3 or 4 seconds after it's down.
So i watched the supervisor.log located :
/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs
And here appear a tricky error :
2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")
I understand that a file is missing; my question is "why?". If I
check the permissions with ls -l at this path:
/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/
I get this:
drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats
So for me the permissions are not the problem. Can someone help
me? I am really stuck here :S
I sincerely hope this is clear and precise enough...
Kind regards.
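The failure path in the stack trace above can be pictured with a simplified model (plain Python, not Storm's actual Clojure/Java code): the supervisor first lists the worker heartbeat files, then reads each one, so a file that vanishes between the two steps — for instance because another supervisor sharing the same storm-local directory over NFS cleans it up — produces exactly this FileNotFoundException:

```python
import os
import tempfile

def read_heartbeats(hb_dir):
    """List-then-read, roughly like LocalState.snapshot(): any file
    deleted between listdir() and open() raises FileNotFoundError
    (the Python analogue of Java's FileNotFoundException)."""
    snapshot = {}
    for name in os.listdir(hb_dir):
        path = os.path.join(hb_dir, name)
        with open(path, "rb") as f:  # fails if the file vanished meanwhile
            snapshot[name] = f.read()
    return snapshot

# Simulate a second supervisor (on another NFS node) deleting the shared
# heartbeat file between the listing and the read.
hb_dir = tempfile.mkdtemp()
with open(os.path.join(hb_dir, "1409146760275"), "wb") as f:
    f.write(b"beat")

names = os.listdir(hb_dir)                        # step 1: list
os.remove(os.path.join(hb_dir, names[0]))         # another node deletes it
try:
    with open(os.path.join(hb_dir, names[0]), "rb") as f:  # step 2: read
        f.read()
except FileNotFoundError as e:
    print("supervisor would halt here:", e)
```

This is only a sketch of the race; the point is that the permissions on the `heartbeats` directory are irrelevant if another node removes the file first.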
2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:
Hi Benjamin,
A Storm cluster needs a ZooKeeper quorum to function.
ExclamationTopology accepts command-line parameters to deploy on a
Storm cluster; if you don't pass any arguments it will use
LocalCluster (a simulated local cluster) to deploy.
I recommend going through
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
for setting up ZooKeeper. Here is an excellent write-up on
Storm cluster setup along with ZooKeeper:
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
Hope that helps.
-Harsha
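Before pointing supervisors at a newly set-up ZooKeeper, it can help to confirm the server actually answers on its client port. ZooKeeper supports the "ruok" four-letter command for this; below is a small self-contained sketch (the host/port in the comment are this thread's values, not something you must use):

```python
import socket

def zk_ruok(host, port, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command on the client port
    (2181 by default); a healthy server replies 'imok'."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        s.shutdown(socket.SHUT_WR)   # signal end of request
        data = b""
        while True:
            chunk = s.recv(64)
            if not chunk:
                break
            data += chunk
    return data == b"imok"

# e.g. zk_ruok("paradent-4", 2181) should return True once zkServer.sh
# is running and reachable from the supervisor machines.
```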
On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:
Hello everyone, i have a problem during implementing storm on a
cluster (Grid 5000 if anyone knows). I took the
inubator-storm-master from the github branch with the sources,
i succeeded to create my own release (no code modification,
just for maven errors that were disturbing...)
It's working fine on my own laptop in local, i modified the
ExclamationTopology in adding 40 more bolts. I also modified
this Topology to allow 50 workers in the configuration.
Now on a cluster, when I try to do the same thing, supervisors
are down just 3s after their execution. Nimbus is ok,
dev-zookeeeper too, storm ui too.
I read somewhere on the apache website you need to implement a
real zookeeper (not the one in storm).
Please, does someone knows a good tutorial explaining how
running a zookeeper server on a cluster for storm?
I hope I am clear ...
Kind regards.
Benjamin SOULAS
Re: Supervisor always down 3s after execution
Posted by Benjamin SOULAS <be...@gmail.com>.
Hi Supun,
It works at first, but then it crashes again...
2014-09-02 16:43 GMT+02:00 Supun Kamburugamuva <su...@gmail.com>:
> Usually when this happens, we remove the storm directory from ZooKeeper
> using zkCli.sh, remove the storm-local directories and start fresh.
>
> Thanks,
> Supun..
Re: Supervisor always down 3s after execution
Posted by Supun Kamburugamuva <su...@gmail.com>.
Usually when this happens, we remove the storm directory from ZooKeeper
using zkCli.sh, remove the storm-local directories and start fresh.
Thanks,
Supun..
On Tue, Sep 2, 2014 at 6:39 AM, Benjamin SOULAS <benjamin.soulas45@gmail.com
> wrote:
--
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun06@gmail.com; Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com
Re: Supervisor always down 3s after execution
Posted by Benjamin SOULAS <be...@gmail.com>.
Hi everyone,
I followed your instructions for installing a ZooKeeper server: I
downloaded it from the website, extracted the tar file on a machine
of my cluster, and made these modifications in my zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
In log4j.properties, I uncommented the line for the rolling log file:
# Example with rolling log file
log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
Then I went to my storm.yaml (located here in my case, because I took
the source version):
/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf
This file contains this configuration:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "paradent-4"
# - "paradent-47"
# - "paradent-48"
#
nimbus.host: "paradent-4"
#
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
# - org.mycompany.MyType
# - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
# - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
# - "server1"
# - "server2"
## Metrics Consumers
# topology.metrics.consumer.register:
# - class: "backtype.storm.metric.LoggingMetricsConsumer"
# parallelism.hint: 1
# - class: "org.mycompany.MyMetricsConsumer"
# parallelism.hint: 1
# argument:
# - endpoint: "metrics-collector.mycompany.org"
dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:
~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
storm.zookeeper.port: 2181
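One setting is notably absent from the storm.yaml above: storm.local.dir. When unset, Storm falls back to the defaults.yaml value "storm-local", which on an NFS-shared home like this cluster's means every node's supervisor reads and writes the same state directory — the clash Benjamin eventually identifies elsewhere in this thread. A per-node local path avoids it; the path below is only an example:

```yaml
# storm.yaml addition — any node-local (non-NFS) path works; /tmp is an example
storm.local.dir: "/tmp/storm-local"
```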
To launch Storm on the cluster, I start *storm nimbus* on a machine named
paradent-4, then my ZooKeeper server with *sh zkServer.sh start* (on
paradent-4 again), which creates a *zookeeper_server.pid* file containing the
ZooKeeper PID (I know, it's obvious ...>_< ).
Then I launch *storm ui* to get a visual overview of my Storm app (also on
paradent-4). Up to this point, everything works fine. The logical next step is
to launch my supervisor on a different machine (here *paradent-39*) with
*storm supervisor*: it starts, but once again it goes down 3 or 4 seconds
later.
So I looked at the supervisor.log, located at:
/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs
And a tricky error appears there:
2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")
I understand that a file is missing; my question is: why? If I check the
permissions with ls -l at this path:
/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/
I get this:
drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats
So for me the permissions are not the problem. Can someone help me? I am
really stuck here :S
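For the record, one quick check in a situation like this is whether the storm-local directory lives on a shared filesystem such as NFS, in which case every node writes into the same supervisor state. A small sketch (assuming GNU df; the default path is this thread's storm-local location, substitute your own storm.local.dir):

```shell
# Print the filesystem type of the storm-local directory.
# An output of "nfs" or "nfs4" means all nodes share the same directory,
# so concurrently launched supervisors overwrite each other's state.
dir="${1:-$HOME/storm-local}"
mkdir -p "$dir"
df -PT "$dir" | awk 'NR==2 {print $2}'
```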
I sincerely hope this is clear and precise enough ...
Kind regards.
2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:
>
> Hi Benjamin,
> A Storm cluster needs a ZooKeeper quorum to function.
> ExclamationTopology accepts command-line params to deploy on a storm
> cluster. If you don't pass any arguments it will use LocalCluster (a
> simulated local cluster) to deploy.
> I recommend going through
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
> for setting up zookeeper. Here is an excellent write up on storm cluster
> setup along with zookeeper
> http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
> Hope that helps.
> -Harsha
>
> On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:
>
> Hello everyone, i have a problem during implementing storm on a cluster
> (Grid 5000 if anyone knows). I took the inubator-storm-master from the
> github branch with the sources, i succeeded to create my own release (no
> code modification, just for maven errors that were disturbing...)
>
> It's working fine on my own laptop in local, i modified the
> ExclamationTopology in adding 40 more bolts. I also modified this Topology
> to allow 50 workers in the configuration.
>
> Now on a cluster, when I try to do the same thing, supervisors are down
> just 3s after their execution. Nimbus is ok, dev-zookeeeper too, storm ui
> too.
>
> I read somewhere on the apache website you need to implement a real
> zookeeper (not the one in storm).
>
> Please, does someone knows a good tutorial explaining how running a
> zookeeper server on a cluster for storm?
>
> I hope I am clear ...
>
> Kind regards.
>
> Benjamin SOULAS
>
>
>
Re: Supervisor always down 3s after execution
Posted by Harsha <st...@harsha.io>.
Hi Benjamin,
Storm cluster needs a zookeeper quorum to function.
ExclamationTopology accepts command line params to deploy on a
storm cluster. If you don't pass any arguments it will use
LocalCluster(a simulated local cluster) to deploy.
I recommend going through
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
for setting up ZooKeeper. Here is an excellent write-up on
storm cluster setup along with ZooKeeper:
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
Hope that helps.
-Harsha
On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:
Hello everyone, i have a problem during implementing storm on a
cluster (Grid 5000 if anyone knows). I took the
inubator-storm-master from the github branch with the sources,
i succeeded to create my own release (no code modification,
just for maven errors that were disturbing...)
It's working fine on my own laptop in local, i modified the
ExclamationTopology in adding 40 more bolts. I also modified
this Topology to allow 50 workers in the configuration.
Now on a cluster, when I try to do the same thing, supervisors
are down just 3s after their execution. Nimbus is ok,
dev-zookeeeper too, storm ui too.
I read somewhere on the apache website you need to implement a
real zookeeper (not the one in storm).
Please, does someone knows a good tutorial explaining how
running a zookeeper server on a cluster for storm?
I hope I am clear ...
Kind regards.
Benjamin SOULAS