Posted to user@storm.apache.org by Benjamin SOULAS <be...@gmail.com> on 2014/08/29 14:34:00 UTC

Supervisor always down 3s after execution

Hello everyone, I have a problem implementing Storm on a cluster
(Grid'5000, if anyone knows it). I took incubator-storm-master from the
GitHub repository with the sources, and I succeeded in building my own release
(no code modifications, just fixes for Maven errors that were getting in the way...)

It works fine locally on my laptop. I modified the ExclamationTopology
by adding 40 more bolts, and I also configured this topology to allow
50 workers.

Now on a cluster, when I try to do the same thing, the supervisors go down
just 3 s after they are launched. Nimbus is OK, dev-zookeeper too, and so is
the Storm UI.

I read somewhere on the Apache website that you need to run a real
ZooKeeper (not the one bundled with Storm).

Please, does someone know a good tutorial explaining how to run a
ZooKeeper server on a cluster for Storm?

I hope I am clear ...

Kind regards.

Benjamin SOULAS

Re: Supervisor always down 3s after execution

Posted by Benjamin SOULAS <be...@gmail.com>.
Hi everyone,

I found how to fix it. My cluster uses NFS, and I had not changed my
storm.yaml, where I needed to define *storm.local.dir*.
If you don't define it, Storm takes the value from defaults.yaml,
which sets *storm.local.dir: "storm-local"*.

The problem, when your cluster uses NFS, is that Storm then creates the
storm-local directory in your home directory. But with NFS, as you probably know,
every node has access to the same home.
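A quick way to confirm which directories are NFS-shared (a minimal check; the exact output depends on your mount layout) is to compare the filesystem type behind the home directory with the one behind /tmp:

```shell
# Print "<mount point> <filesystem type>" for $HOME and for /tmp.
# On an NFS cluster the home row typically reports nfs or nfs4, while
# /tmp is a local filesystem (ext4, tmpfs, ...). Anything written under
# an nfs mount is visible to every node sharing that export.
df -PT "$HOME" /tmp | awk 'NR > 1 { print $7, $2 }'
```

Run it on each node: if the home rows all point at the same nfs export, the default storm-local directory is effectively one shared directory for the whole cluster.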

So when Storm creates its "storm-local" directory, it creates a
directory named "supervisor" inside it. But every time you launch a supervisor,
it erases the state of the previously launched one, so that supervisor gets stopped.

So the solution is to modify your storm.yaml and set a new
storm.local.dir. In my case, because I am on an NFS cluster, I used
*storm.local.dir: "/tmp/storm-local"*: it is not in the home directory,
so it is not shared between the nodes.

That way, every node launching a supervisor will have its own supervisor
directory.
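For reference, the fix is this single line in conf/storm.yaml (the /tmp path is just what worked on my cluster; any directory on a node-local, writable filesystem will do):

```yaml
# storm.yaml
# Override the defaults.yaml value "storm-local", which resolves under the
# NFS-shared home; point it at a node-local directory instead.
storm.local.dir: "/tmp/storm-local"
```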

I sincerely hope this is clear and can help someone.

Benjamin.

2014-09-02 19:53 GMT+02:00 Benjamin SOULAS <be...@gmail.com>:

> Hi Harsha,
>
> You're right, I didn't export STORM_HOME ...
>
> I will do it, maybe this is the problem.
>
> Thanks
>
>
> 2014-09-02 18:08 GMT+02:00 Harsha <st...@harsha.io>:
>
>>  Hi Benjamin,
>>          Correct me if I missed it, but in your config I don't see
>> storm.local.dir defined. If it's not defined in the config, Storm will create one
>> in the Storm installation dir, which seems to be
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/
>> Also, are you running the supervisor and nimbus as user "bsoulas"? When you
>> run the "storm nimbus" or "storm supervisor" command, which storm binary
>> is it pointing to? Did you export
>> STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben
>> and also add it to PATH? I am checking whether you had a previous
>> installation of Storm and are invoking the storm command from that
>> installation.
>>  Can you also check the ZooKeeper logs?
>> -Harsha
>>
>> On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:
>>
>> Hi everyone,
>>
>> I followed your instructions for installing a ZooKeeper server: I
>> downloaded it from the website, extracted the tar file on a machine in my
>> cluster, and made these modifications in my zoo.cfg:
>>
>>
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/
>> # the port at which the clients will connect
>> clientPort=2181
>> # the maximum number of client connections.
>> # increase this if you need to handle more clients
>> #maxClientCnxns=60
>> #
>> # Be sure to read the maintenance section of the
>> # administrator guide before turning on autopurge.
>> #
>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>> #
>> # The number of snapshots to retain in dataDir
>> #autopurge.snapRetainCount=3
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> #autopurge.purgeInterval=1
>>
>>
>> In log4j.properties, I uncommented the line for the log file:
>>
>>
>> # Example with rolling log file
>> log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
>>
>>
>> Then I went to my storm.yaml (located here in my case, because I took the
>> source version):
>>
>>
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf
>>
>>
>> This file contains this configuration:
>>
>>
>> ########### These MUST be filled in for a storm configuration
>>  storm.zookeeper.servers:
>>      - "paradent-4"
>> #     - "paradent-47"
>> #     - "paradent-48"
>> #
>>  nimbus.host: "paradent-4"
>> #
>> #
>> # ##### These may optionally be filled in:
>> #
>> ## List of custom serializations
>> # topology.kryo.register:
>> #     - org.mycompany.MyType
>> #     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
>> #
>> ## List of custom kryo decorators
>> # topology.kryo.decorators:
>> #     - org.mycompany.MyDecorator
>> #
>> ## Locations of the drpc servers
>> # drpc.servers:
>> #     - "server1"
>> #     - "server2"
>> ## Metrics Consumers
>> # topology.metrics.consumer.register:
>> #   - class: "backtype.storm.metric.LoggingMetricsConsumer"
>> #     parallelism.hint: 1
>> #   - class: "org.mycompany.MyMetricsConsumer"
>> #     parallelism.hint: 1
>> #     argument:
>> #       - endpoint: "metrics-collector.mycompany.org"
>>  dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"
>>  storm.zookeeper.port: 2181
>>
>> To launch Storm on the cluster, I start it with *storm nimbus* (on
>> a machine named paradent-4), then my ZooKeeper server with *sh zkServer.sh
>> start* (on paradent-4 again), which creates a *zookeeper_server.pid* file where
>> the pid of the ZooKeeper process is written (I know, it's obvious ...>_< ).
>>
>> After that I launch *storm ui* to get a visual of my Storm app (on
>> paradent-4). Until now, everything works fine. Now, logically, I
>> launch my supervisor on a different machine (here *paradent-39*)
>> with *storm supervisor*; it starts, but once again, 3 or 4
>> seconds later it is down.
>>
>> So I looked at supervisor.log, located here:
>>
>>
>>
>> /home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs
>>
>>
>> And here a tricky error appears:
>>
>>
>> 2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
>> 2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
>> 2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
>> 2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
>> 2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
>> 2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
>> java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
>> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
>> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
>> at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
>> at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
>> at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
>> at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>> at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>> at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
>> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>> 2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")
>>
>>
>> I understand that a file is missing; my question is why. If
>> I check the permissions with ls -l at this path:
>>
>>
>> /home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/
>>
>> I get this:
>>
>>
>> drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats
>>
>> So for me the permissions are not the problem. Can someone help me? I am really
>> stuck here :S
>>
>> I sincerely hope this is clear and precise enough ...
>>
>> Kind regards.
>>
>>
>>
>>
>>
>>
>> 2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:
>>
>>
>>
>> Hi Benjamin,
>>             A Storm cluster needs a ZooKeeper quorum to function.
>> ExclamationTopology accepts command-line params to deploy on a Storm
>> cluster. If you don't pass any arguments, it will use LocalCluster (a
>> simulated local cluster) to deploy.
>>  I recommend going through
>> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
>>  for setting up ZooKeeper. Here is an excellent write-up on Storm
>> cluster setup along with zookeeper
>> http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
>>  Hope that helps.
>> -Harsha
>>

Re: Supervisor always down 3s after execution

Posted by Harsha <st...@harsha.io>.
Hi Benjamin,

         Correct me if I missed it  , in your config  I don't
see storm.local.dir defined. If its not defined in config storm
will create one in the storm_installation dir which seems to
be

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/

and are you running the supervisor and nimbus as user
"bsoulas". When you are running "storm nimbus or storm
supervisor" command which storm command its pointing. Did you
export
STORM_HOME=/home/bsoulas/incubator-storm-master/storm-dist/bina
ry/target/apache-storm-0.9.3-ben" and also added it to PATH. I
am checking to see if you had any previous installation of
storm and invoking the storm command from previous
installation.

Can you also check zookeeper logs .

-Harsha



On Tue, Sep 2, 2014, at 03:39 AM, Benjamin SOULAS wrote:

Hi everyone,

I followed your instructions for installing a zookeeper server,
i downloaded it on the website, extract the tar file somewhere
in a machine on my cluster, i made those modifications in my
zoo.cfg :


# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/

# the port at which the clients will connect

clientPort=2181

# the maximum number of client connections.

# increase this if you need to handle more clients

#maxClientCnxns=60

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

#
[1]http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#
sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1


In the log4j.properties, i uncommented the line for the log
file :

# Example with rolling log file

log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE


Then i went to my storm.yaml (located here in my case, because
i took the source version) :

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf


This file contain this configuration :

########### These MUST be filled in for a storm configuration

 storm.zookeeper.servers:

     - "paradent-4"

#     - "paradent-47"

#     - "paradent-48"

#

 nimbus.host: "paradent-4"

#

#

# ##### These may optionally be filled in:

#

## List of custom serializations

# topology.kryo.register:

#     - org.mycompany.MyType

#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer

#

## List of custom kryo decorators

# topology.kryo.decorators:

#     - org.mycompany.MyDecorator

#

## Locations of the drpc servers

# drpc.servers:

#     - "server1"

#     - "server2"

## Metrics Consumers

# topology.metrics.consumer.register:

#   - class: "backtype.storm.metric.LoggingMetricsConsumer"

#     parallelism.hint: 1

#   - class: "org.mycompany.MyMetricsConsumer"

#     parallelism.hint: 1

#     argument:

#       - endpoint: "[2]metrics-collector.mycompany.org"

 dev.zookeeper.path:
"paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeep
er-3.4.6/"

 storm.zookeeper.port: 2181

To launch storm on the cluster, i launch it thanks to storm
nimbus (on a machine named paradent-4), then my zookeeper
Server sh zkServer.sh start (on paradent-4 again)(which create
a zookeeper_server.pid where the pid of the zookeeper is
written, i know it's obvious ...>_< ).

After i launch my storm ui for having a visual of my storm app
(on paradent-4). Until now, everything work fine. Now, the
logical way implies i launch my supervisor, on a different
machine (here paradent-39) thanks to storm supervisor, it is
launched but once again, 3 or 4 seconds after it's down.

So i watched the supervisor.log located :

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/a
pache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs


And here appear a tricky error :

2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting

2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session

2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000

2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED

2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.

2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none

2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down

2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting

2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session

2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000

2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED

2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.

2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr

2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event

java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist

at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]

at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]

at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]

at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]

at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]

at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]

at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]

at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]

at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]

at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]

at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]

at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]

at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]

at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]

at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]

at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]

at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]

at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")


I understand that a file is missing; my question is why. If I check
the permissions with ls -l at this path:

/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/

I get this:

drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats

So for me this is not the problem. Can someone help me? I am really
stuck here :S

I sincerely hope to be clear and precise enough ...

Kind regards.
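The resolution at the top of this thread traces this to storm.local.dir defaulting to "storm-local" under the NFS-shared home, so every supervisor in the cluster reuses the same state tree and erases the previous one's. A toy sketch of that failure mode (all paths and names below are hypothetical illustrations, not Storm's actual code):

```shell
# Toy model of the shared-storm.local.dir failure: two supervisors on
# two nodes see the same "storm-local" tree through the NFS home, and
# one supervisor's cleanup pass deletes the other's worker state.
tmp=$(mktemp -d)                       # stands in for the shared NFS home

# Supervisor A and supervisor B each create a worker heartbeat dir:
mkdir -p "$tmp/storm-local/workers/worker-a/heartbeats"
mkdir -p "$tmp/storm-local/workers/worker-b/heartbeats"
touch "$tmp/storm-local/workers/worker-a/heartbeats/1409146760275"

# Supervisor B's sync pass erases worker dirs that are not its own:
for w in "$tmp/storm-local/workers"/*; do
  if [ "$(basename "$w")" != "worker-b" ]; then rm -rf "$w"; fi
done

# Supervisor A can no longer read its heartbeat file -- the
# FileNotFoundException shown in the log above:
hb="$tmp/storm-local/workers/worker-a/heartbeats/1409146760275"
if [ -e "$hb" ]; then status="present"; else status="missing"; fi
echo "heartbeat file is $status"       # -> heartbeat file is missing
rm -rf "$tmp"
```

Giving each node its own storm.local.dir on a non-shared filesystem removes the collision.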






2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:


Hi Benjamin,
            A Storm cluster needs a ZooKeeper quorum to function.
ExclamationTopology accepts command-line params to deploy on a
storm cluster. If you don't pass any arguments, it will use
LocalCluster (a simulated local cluster) to deploy.
I recommend you go through
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
for setting up ZooKeeper. Here is an excellent write-up on
storm cluster setup along with zookeeper:
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
Hope that helps.
-Harsha



Re: Supervisor always down 3s after execution

Posted by Benjamin SOULAS <be...@gmail.com>.
Hi Supun,

It works at first, but then it crashes again ...




2014-09-02 16:43 GMT+02:00 Supun Kamburugamuva <su...@gmail.com>:

> Usually when this happens, we remove the storm directory from ZooKeeper
> using zkCli.sh, remove the storm-local directories and start fresh.
>
> Thanks,
> Supun..
>
>
> --
> Supun Kamburugamuva
> Member, Apache Software Foundation; http://www.apache.org
> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> Blog: http://supunk.blogspot.com
>
>

Re: Supervisor always down 3s after execution

Posted by Supun Kamburugamuva <su...@gmail.com>.
Usually when this happens, we remove the storm directory from ZooKeeper
using zkCli.sh, remove the storm-local directories and start fresh.

Thanks,
Supun..
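This reset can be sketched as follows. The connect string and paths are assumptions taken from this thread (default storm.local.dir, single ZooKeeper on paradent-4); adjust them to your cluster, and stop nimbus, the supervisors, and the ui on every node first.

```shell
# 1. Remove Storm's state from ZooKeeper (the /storm znode). With the
#    zkCli.sh shipped with ZooKeeper 3.4.x this would be:
#      zkCli.sh -server paradent-4:2181 rmr /storm

# 2. Remove the supervisor's local state. The default storm.local.dir
#    is "storm-local", created relative to where the daemon was started:
rm -rf "$HOME/storm-local"

# 3. Restart the daemons; both state trees are recreated from scratch.
[ -d "$HOME/storm-local" ] || echo "storm-local removed"
```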




-- 
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com

Re: Supervisor always down 3s after execution

Posted by Benjamin SOULAS <be...@gmail.com>.
Hi everyone,

I followed your instructions for installing a ZooKeeper server: I
downloaded it from the website, extracted the tar file on a machine in
my cluster, and made these modifications in my zoo.cfg:


# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/home/bsoulas/zookeeper/zookeeper-3.4.6/data/

# the port at which the clients will connect

clientPort=2181

# the maximum number of client connections.

# increase this if you need to handle more clients

#maxClientCnxns=60

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1


In the log4j.properties, i uncommented the line for the log file :

# Example with rolling log file

log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE


Then I went to my storm.yaml (located here in my case, because I took the
source version):

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/conf


This file contains this configuration:

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:
     - "paradent-4"
#     - "paradent-47"
#     - "paradent-48"
#
 nimbus.host: "paradent-4"
#
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"

 dev.zookeeper.path: "paradent-4.rennes.grid5000.fr:~/home/bsoulas/zookeeper/zookeeper-3.4.6/"

 storm.zookeeper.port: 2181
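One setting I did not override here is storm.local.dir; when it is absent, Storm falls back to defaults.yaml, which sets storm.local.dir: "storm-local", resolved relative to where the daemon runs. If you wanted each node to use its own local path instead, it would only take one more line in storm.yaml (the /tmp path below is just a hypothetical example):

# storm.yaml fragment (illustrative; choose a directory that exists on each node)
 storm.local.dir: "/tmp/storm-local"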

To launch Storm on the cluster, I start *storm nimbus* (on a machine named
paradent-4), then my ZooKeeper server with *sh zkServer.sh start* (on
paradent-4 again), which creates a *zookeeper_server.pid* file containing the
ZooKeeper pid (I know, it's obvious ... >_< ).

Then I launch *storm ui* to get a view of my Storm app (on paradent-4). Up to
this point, everything works fine. Next, logically, I launch my supervisor on
a different machine (here *paradent-39*) with *storm supervisor*: it starts,
but once again, 3 or 4 seconds later it is down.
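For completeness, here is the kind of pre-flight check I run before starting a supervisor; it is a hypothetical sketch, assuming the hostnames from my cluster, port 2181 from storm.yaml, and 6627 as Storm's default Nimbus thrift port:

```shell
# Check that the ZooKeeper client port and the Nimbus thrift port answer
# from this node before launching "storm supervisor".
for hostport in paradent-4:2181 paradent-4:6627; do
  host=${hostport%:*}; port=${hostport#*:}
  # /dev/tcp is a bash built-in pseudo-device for TCP connection tests
  if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$hostport reachable"
  else
    echo "$hostport NOT reachable"
  fi
done
```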

So I looked at supervisor.log, located at:

/home/bsoulas/incubator-storm-master/storm-dist/binary/target/apache-storm-0.9.3-ben/apache-storm-0.9.3-ben/logs


And here a tricky error appears:

2014-09-02 09:31:37 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:37 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@220df4c8
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:37 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90004, negotiated timeout = 20000
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:37 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:37 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Session: 0x14835a48ca90004 closed
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-09-02 09:31:38 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-02 09:31:38 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=paradent-4:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@c6d625b
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Opening socket connection to server paradent-4.rennes.grid5000.fr/172.16.97.4:2181. Will not attempt to authenticate using SASL (unknown error)
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Socket connection established to paradent-4.rennes.grid5000.fr/172.16.97.4:2181, initiating session
2014-09-02 09:31:38 o.a.z.ClientCnxn [INFO] Session establishment complete on server paradent-4.rennes.grid5000.fr/172.16.97.4:2181, sessionid = 0x14835a48ca90005, negotiated timeout = 20000
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2014-09-02 09:31:38 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-09-02 09:31:38 b.s.d.supervisor [INFO] Starting supervisor with id 280caffa-d6c5-4fd4-8282-7d8c1dec7e66 at host paradent-39.rennes.grid5000.fr
2014-09-02 09:31:39 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/heartbeats/1409146760275' does not exist
    at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
    at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
    at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_worker_heartbeat.invoke(supervisor.clj:77) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_worker_heartbeats$iter__6381__6385$fn__6386.invoke(supervisor.clj:90) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
    at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
    at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
    at clojure.lang.LazySeq.next(LazySeq.java:92) ~[clojure-1.5.1.jar:na]
    at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
    at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
    at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
    at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$read_worker_heartbeats.invoke(supervisor.clj:89) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$read_allocated_workers.invoke(supervisor.clj:106) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:209) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
    at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
    at backtype.storm.event$event_manager$fn__4687.invoke(event.clj:39) ~[storm-core-0.9.3-ben.jar:0.9.3-ben]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2014-09-02 09:31:39 b.s.util [INFO] Halting process: ("Error when processing an event")


I understand that a file is missing; my question is: why? If I check the
permissions with ls -l on this path:

/home/bsoulas/storm-local/workers/fc350518-ded6-48f4-abf9-da73cbaf7c5c/

I get this:

drwxr-xr-x 2 bsoulas users 4096 Aug 27 15:39 heartbeats

So to me the permissions are not the problem. Can someone help me? I am
really stuck here :S
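One hypothesis I can at least reason about: if several supervisors ever ended up sharing one storm-local directory (for example over a shared filesystem), a later daemon's startup could wipe an earlier one's heartbeat files and produce exactly this kind of FileNotFoundException. A toy sketch in plain Python (not Storm code; paths and the cleanup behavior are simplified assumptions):

```python
import os
import shutil
import tempfile

def start_supervisor(local_dir, sup_id):
    """Toy supervisor: wipes and recreates the shared workers dir on startup,
    then writes one heartbeat file, mimicking daemons that share local state."""
    workers = os.path.join(local_dir, "workers")
    shutil.rmtree(workers, ignore_errors=True)  # clobbers any previous state
    hb_dir = os.path.join(workers, sup_id, "heartbeats")
    os.makedirs(hb_dir)
    hb_file = os.path.join(hb_dir, "1409146760275")
    open(hb_file, "w").close()
    return hb_file

shared = tempfile.mkdtemp()                   # stands in for a shared storm-local
hb_a = start_supervisor(shared, "sup-a")
hb_b = start_supervisor(shared, "sup-b")      # second launch clobbers the first

print(os.path.exists(hb_a))  # False: sup-a's heartbeat file has vanished
print(os.path.exists(hb_b))  # True
```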

I sincerely hope this is clear and precise enough ...

Kind regards.






2014-08-29 16:47 GMT+02:00 Harsha <st...@harsha.io>:

>
> Hi Benjamin,
>             Storm cluster needs a zookeeper quorum to function.
> ExclamationTopology accepts command line params to deploy on a storm
> cluster. If you don't pass any arguments it will use LocalCluster(a
> simulated local cluster) to deploy.
>  I recommend you to go through
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
>  for setting up zookeeper. Here is an excellent write up on storm cluster
> setup along with zookeeper
> http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.
>  Hope that helps.
> -Harsha
>
> On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:
>
> Hello everyone, i have a problem during implementing storm on a cluster
> (Grid 5000 if anyone knows). I took the inubator-storm-master from the
> github branch with the sources, i succeeded to create my own release (no
> code modification, just for maven errors that were disturbing...)
>
> It's working fine on my own laptop in local, i modified the
> ExclamationTopology in adding 40 more bolts. I also modified this Topology
> to allow 50 workers in the configuration.
>
>  Now on a cluster, when I try to do the same thing, supervisors are down
> just 3s after their execution. Nimbus is ok, dev-zookeeeper too, storm ui
> too.
>
>  I read somewhere on the apache website you need to implement a real
> zookeeper (not the one in storm).
>
>  Please, does someone knows a good tutorial explaining how running a
> zookeeper server on a cluster for storm?
>
>  I hope I am clear ...
>
>  Kind regards.
>
>  Benjamin SOULAS
>
>
>

Re: Supervisor always down 3s after execution

Posted by Harsha <st...@harsha.io>.

Hi Benjamin,

            Storm cluster needs a zookeeper quorum to function.
ExclamationTopology accepts command line params to deploy on a
storm cluster. If you don't pass any arguments it will use
LocalCluster(a simulated local cluster) to deploy.

I recommend you to go through
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html

for setting up zookeeper. Here is an excellent write up on
storm cluster setup along with zookeeper
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/.

Hope that helps.

-Harsha



On Fri, Aug 29, 2014, at 05:34 AM, Benjamin SOULAS wrote:

Hello everyone, i have a problem during implementing storm on a
cluster (Grid 5000 if anyone knows). I took the
inubator-storm-master from the github branch with the sources,
i succeeded to create my own release (no code modification,
just for maven errors that were disturbing...)

It's working fine on my own laptop in local, i modified the
ExclamationTopology in adding 40 more bolts. I also modified
this Topology to allow 50 workers in the configuration.

Now on a cluster, when I try to do the same thing, supervisors
are down just 3s after their execution. Nimbus is ok,
dev-zookeeeper too, storm ui too.

I read somewhere on the apache website you need to implement a
real zookeeper (not the one in storm).

Please, does someone knows a good tutorial explaining how
running a zookeeper server on a cluster for storm?

I hope I am clear ...

Kind regards.

Benjamin SOULAS
