You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by "Nick R. Katsipoulakis" <ni...@gmail.com> on 2015/06/26 17:03:06 UTC

Re: Problems with Storm EC2 setup

Hello all,

I know that this mailing list might not be the right place to ask the
following question, but, I have searched through the archives and asked for
help in the users' mailing list and could not get an answer. Please, any
ideas for the following?

Thank you very much for your time.

Regards,
Nick

2015-06-25 15:57 GMT-04:00 Nick R. Katsipoulakis <ni...@gmail.com>:

> Hello all,
>
> I apologize for the long message, but I have no idea what is going wrong
> in my setup and I tried to give a lot of info about my cluster. I have the
> following EC2 setup:
>
> 1) 3x m4.xlarge nodes for a 3-node ZooKeeper ensemble and a nimbus
>
> 2) 4x m4.xlarge nodes for my Supervisors.
>
> All of the machines are running Ubuntu Linux v14, OpenJDK v1.7 and Apache
> Storm v0.9.4. The storm.yaml I am currently having in the nimbus node
> (only) has the following values:
>
> storm.home: "/opt/apache-storm-0.9.4"
> storm.local.dir: "/mnt/storm"
> storm.zookeeper.servers:
>      - "172.31.28.73"
>      - "172.31.38.251"
>      - "172.31.38.252"
> storm.zookeeper.port: 2181
> storm.zookeeper.root: "/storm"
> storm.zookeeper.session.timeout: 20000
> storm.zookeeper.connection.timeout: 15000
> storm.zookeeper.retry.times: 5
> storm.zookeeper.retry.interval: 1000
> storm.zookeeper.retry.invervalceiling.millis: 30000
> storm.cluster.mode: "distributed"
> storm.local.mode.zmq: false
> storm.thrift.transport:
> "backtype.storm.security.auth.SimpleTransportPlugin"
> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
>
> nimbus.host: "127.0.0.1"
> nimbus.thrift.port: 6627
> nimbus.thrift.max_buffer_size: 1048576
> nimbus.thrift.threads: 256
> nimbus.childopts: "-Xmx256m"
> nimbus.task.timeout.secs: 30
> nimbus.supervisor.timeout.secs: 60
> nimbus.monitor.freq.secs: 10
> nimbus.cleanup.inbox.freq.secs: 600
> nimbus.inbox.jar.expiration.secs: 3600
> nimbus.task.launch.secs: 120
> nimbus.reassign: true
> nimbus.file.copy.expiration.secs: 600
> nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
>
> ui.port: 8080
> ui.childopts: "-Xmx768m"
> logviewer.port: 8000
> logviewer.childopts: "-Xmx256m"
> logviewer.appender.name: "A1"
>
> drpc.port: 3772
> drpc.worker.threads: 64
> drpc.queue.size: 128
> drpc.invocations.port: 3773
> drpc.request.timeout.secs: 600
> drpc.childopts: "-Xmx768m"
>
> transactional.zookeeper.root: "/transactional"
> transactional.zookeeper.servers: null
> transactional.zookeeper.port: null
>
> supervisor.slots.ports:
>      - 6700
>      - 6701
>      - 6702
>      - 6703
> supervisor.childopts: "-Xmx256m"
> supervisor.worker.start.timeout.secs: 120
> supervisor.worker.timeout.secs: 30
> supervisor.monitor.frequency.secs: 3
> supervisor.heartbeat.frequency.secs: 5
> supervisor.enable: true
>
> worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:NewSize=128m
> -XX:CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled
> -Djava.net.preferIPv4Stack=true"
> worker.heartbeat.frequency.secs: 1
>
> task.heartbeat.frequency.secs: 3
> task.refresh.poll.secs: 10
>
> zmq.threads: 1
> zmq.linger.millis: 5000
> zmq.hwm: 0
>
> storm.messaging.netty.server_worker_threads: 4
> storm.messaging.netty.client_worker_threads: 4
> storm.messaging.netty.buffer_size: 10485760
> storm.messaging.netty.max_retries: 100
> storm.messaging.netty.max_wait_ms: 1000
> storm.messaging.netty.min_wait_ms: 100
> topology.enable.message.timeouts: true
> topology.debug: false
> topology.optimize: true
> topology.workers: 1
> topology.acker.executors: null
> topology.tasks: null
> topology.message.timeout.secs: 30
> topology.skip.missing.kryo.registrations: false
> topology.max.task.parallelism: null
> topology.max.spout.pending: null
> topology.state.synchronization.timeout.secs: 60
> topology.stats.sample.rate: 0.05
> topology.builtin.metrics.bucket.size.secs: 60
> topology.fall.back.on.java.serialization: true
> topology.worker.childopts: null
> topology.executor.receive.buffer.size: 1024
> topology.executor.send.buffer.size: 1024
> topology.receiver.buffer.size: 8
> topology.transfer.buffer.size: 1024
> topology.tick.tuple.freq.secs: null
> topology.worker.shared.thread.pool.size: 4
> topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
> topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
> topology.sleep.spout.wait.strategy.time.ms: 1
> topology.error.throttle.interval.secs: 10
> topology.max.error.report.per.interval: 5
> topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
> topology.tuple.serializer:
> "backtype.storm.serialization.types.ListDelegateSerializer"
> topology.trident.batch.emit.interval.millis: 500
>
> dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
>
> The problem is that every time I submit a topology, I got a lot of Netty
> messages in my worker logs (found in the supervisor machines) and
> many of them had similar to the following messages:
>
> 2015-06-25T19:42:32.534+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
> [5]
> 2015-06-25T19:42:32.625+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
> Starting
> 2015-06-25T19:42:32.629+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
> 172.31.38.252:2181 sessionTimeout=20000
> watcher=org.apache.storm.curator.ConnectionState@5172aa5a
> 2015-06-25T19:42:32.649+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-06-25T19:42:32.655+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 172.31.28.73/172.31.28.73:2181, initiating session
> 2015-06-25T19:42:32.670+0000 o.a.s.z.ClientCnxn [INFO] Session
> establishment complete on server 172.31.28.73/172.31.28.73:2181,
> sessionid = 0x14e2b0caa01005f, negotiated timeout = 20000
> 2015-06-25T19:42:32.672+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
> State change: CONNECTED
> 2015-06-25T19:42:32.674+0000 b.s.zookeeper [INFO] Zookeeper state update:
> :connected:none
> 2015-06-25T19:42:32.703+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut
> down
> 2015-06-25T19:42:32.703+0000 o.a.s.z.ZooKeeper [INFO] Session:
> 0x14e2b0caa01005f closed
> 2015-06-25T19:42:32.705+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries
> [5]
> 2015-06-25T19:42:32.706+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO]
> Starting
> 2015-06-25T19:42:32.716+0000 o.a.s.z.ZooKeeper [INFO] Initiating client
> connection, connectString=172.31.28.73:2181,172.31.38.251:2181,
> 172.31.38.252:2181/storm sessionTimeout=20000
> watcher=org.apache.storm.curator.ConnectionState@3f308697
> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Opening socket
> connection to server 172.31.28.73/172.31.28.73:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-06-25T19:42:32.727+0000 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 172.31.28.73/172.31.28.73:2181, initiating session
> 2015-06-25T19:42:32.733+0000 o.a.s.z.ClientCnxn [INFO] Session
> establishment complete on server 172.31.28.73/172.31.28.73:2181,
> sessionid = 0x14e2b0caa010061, negotiated timeout = 20000
> 2015-06-25T19:42:32.733+0000 o.a.s.c.f.s.ConnectionStateManager [INFO]
> State change: CONNECTED
> 2015-06-25T19:42:32.774+0000 b.s.d.worker [INFO] Reading Assignments.
> 2015-06-25T19:42:32.838+0000 b.s.m.TransportFactory [INFO] Storm peer
> transport plugin:backtype.storm.messaging.netty.Context
> 2015-06-25T19:42:32.971+0000 b.s.d.worker [INFO] Launching receive-thread
> for 58e551ba-f944-4aec-9c8f-5621053021dd:6703
> 2015-06-25T19:42:32.983+0000 b.s.m.n.Server [INFO] Create Netty Server
> Netty-server-localhost-6703, buffer_size: 10485760, maxWorkers: 4
> 2015-06-25T19:42:33.011+0000 b.s.m.loader [INFO] Starting receive-thread:
> [stormId: tpch-q5-top-6-1435261345, port: 6703, thread-id: 0 ]
> 2015-06-25T19:42:33.041+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6703, bufferSize:
> 10485760
> 2015-06-25T19:42:33.041+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.041+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.042+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6703 scheduled to run in 0 ms
> 2015-06-25T19:42:33.067+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.068+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6703 scheduled to run in 103 ms
> 2015-06-25T19:42:33.071+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6703, bufferSize:
> 10485760
> 2015-06-25T19:42:33.071+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.071+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.076+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6703 scheduled to run in 0 ms
> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6702, bufferSize:
> 10485760
> 2015-06-25T19:42:33.080+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.080+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.080+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6702 scheduled to run in 0 ms
> 2015-06-25T19:42:33.081+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6702, bufferSize:
> 10485760
> 2015-06-25T19:42:33.082+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.082+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.082+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6702 scheduled to run in 0 ms
> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-19-254.us-west-2.compute.internal:6701, bufferSize:
> 10485760
> 2015-06-25T19:42:33.084+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
> 2015-06-25T19:42:33.084+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.084+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6701 scheduled to run in 0 ms
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] creating Netty Client,
> connecting to ip-172-31-13-184.us-west-2.compute.internal:6701, bufferSize:
> 10485760
> 2015-06-25T19:42:33.162+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6703 scheduled to run in 103 ms
> 2015-06-25T19:42:33.163+0000 o.a.s.c.r.ExponentialBackoffRetry [WARN]
> maxRetries too large (100). Pinning to 29
>
> and
>
> 2015-06-25T19:42:33.176+0000 b.s.u.StormBoundedExponentialBackoffRetry
> [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries
> [100]
> 2015-06-25T19:42:33.176+0000 b.s.m.n.Client [INFO] connection attempt 1 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6700 scheduled to run in 0 ms
> 2015-06-25T19:42:33.178+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6700 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.189+0000 b.s.m.n.Client [ERROR] connection attempt 2
> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6703 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.190+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6700 scheduled to run in 103 ms
> 2015-06-25T19:42:33.191+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6703 scheduled to run in 105 ms
> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6700 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.195+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6700 scheduled to run in 102 ms
> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6702 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.196+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6702 scheduled to run in 102 ms
> 2015-06-25T19:42:33.197+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6700 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6700 scheduled to run in 103 ms
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6703 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [ERROR] connection attempt 1
> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6702 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6703 scheduled to run in 103 ms
> 2015-06-25T19:42:33.198+0000 b.s.m.n.Client [INFO] connection established
> to Netty-Client-ip-172-31-19-252.us-west-2.compute.internal/
> 172.31.19.252:6701
> 2015-06-25T19:42:33.206+0000 b.s.m.n.Client [INFO] connection attempt 2 to
> Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6702 scheduled to run in 102 ms
> 2015-06-25T19:42:33.205+0000 b.s.m.n.Client [INFO] connection established
> to Netty-Client-ip-172-31-19-253.us-west-2.compute.internal/
> 172.31.19.253:6701
> 2015-06-25T19:42:33.268+0000 b.s.m.n.Client [ERROR] connection attempt 2
> to Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6703 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.272+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-13-184.us-west-2.compute.internal/
> 172.31.13.184:6703 scheduled to run in 105 ms
> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [ERROR] connection attempt 2
> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6701 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.273+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6701 scheduled to run in 105 ms
> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [ERROR] connection attempt 2
> to Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6702 failed: java.lang.RuntimeException: Returned channel
> was actually not established
> 2015-06-25T19:42:33.274+0000 b.s.m.n.Client [INFO] connection attempt 3 to
> Netty-Client-ip-172-31-19-254.us-west-2.compute.internal/
> 172.31.19.254:6702 scheduled to run in 106 ms
> 2015-06-25T19:42:33.275+0000 b.s
>
> Why am I getting the above. Initially, I thought that the input rate of
> tuples in my topology is too high, and Netty's buffers are filled up too
> fast. However, I submitted a debug topology
> that sent one tuple every 1 second and I still got the above messages.
>
> Am I doing something wrong in my configuration? Why do I have the previous
> Netty messages, which obviously show that something is going wrong? Please,
> any hint on my setup will be really helpful.
>
> Regards,
> Nick
>



-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate