You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by "vishnu rao (JIRA)" <ji...@apache.org> on 2015/10/19 13:10:05 UTC

[jira] [Commented] (STORM-1022) disconnectiong between workers

    [ https://issues.apache.org/jira/browse/STORM-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963167#comment-14963167 ] 

vishnu rao commented on STORM-1022:
-----------------------------------

I came across the same exception while using STORM  '0.10.0.2.3.0.0-2557' in hortonworks data plaform 2.3.

Before this exception hits, a runtime exception was thrown which resulted in supervisor restarting workers , there by leading to 
'SEVERE: RuntimeException while executing runnable org.apache.storm.guava.util.concurrent.Futures'. 

Here is my scenario:

(1) i have registered a metrics consumer for my topology:
conf.registerMetricsConsumer(NewRelicMetricConsumer.class,10);

(2) the topology starts off  (in the example below at 2015-10-19T06:53)
2015-10-19T06:53:17.757-0400 b.s.d.supervisor [INFO] 6afff1fd-4741-49a8-bd38-769146e28d81 still hasn't started

(3) The metrics consumer threw a Runtime exception which results in storm "Async loop died" and 'Worker died'
-----------------------------------------------------------------------------------------------
015-10-19 06:54:31 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.Number
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.Number
	at com.pocketmath.nebula.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55) ~[stormjar.jar:na]
	at backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	... 6 common frames omitted
2015-10-19 06:54:31 b.s.d.executor [ERROR] 
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.Number
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.Number
	at com.test.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55) ~[stormjar.jar:na]
	at backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	... 6 common frames omitted
-----------------------------------------------------------------------------------------------

(3) the supervisor detects death and re-launches worker

2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Worker Process 6afff1fd-4741-49a8-bd38-769146e28d81 has died!
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down and clearing state for id 6afff1fd-4741-49a8-bd38-769146e28d81. Current supervisor time: 1445252082. State: :timed-out, Heartbeat: {:time-secs 1445252079, :storm-id "ambari_test3-27-1445251979", :executors [[41 41] [33 33] [1 1] [53 53] [65 65] [9 9] [57 57] [-1 -1] [61 61] [13 13] [21 21] [5 5] [29 29] [45 45] [37 37] [25 25] [49 49] [17 17]], :port 6700}
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down 5f60d8c8-4895-4dab-ae04-709434648e26:6afff1fd-4741-49a8-bd38-769146e28d81——

2015-10-19T06:54:43.567-0400 b.s.d.supervisor [INFO] Launching worker with command: ..........................

-----------------------------------------------------------------------------------------------
(4) moments later as we wait for worker startup.. we get the exception:

2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:Oct 19, 2015 7:03:15 AM org.apache.storm.guava.util.concurrent.ExecutionList executeListener
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:SEVERE: RuntimeException while executing runnable org.apache.storm.guava.util.concurrent.Futures$4@3bdb38ba with executor org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@351cd1b1
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:java.lang.RuntimeException: Failed to connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client.connect(Client.java:300)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client.access$1100(Client.java:66)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:289)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:267)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.FutureTask.set(FutureTask.java:233)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.FutureTask.run(FutureTask.java:274)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at java.lang.Thread.run(Thread.java:745)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:Caused by: java.lang.RuntimeException: Giving up to connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701 after 32 failed attempts
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at backtype.storm.messaging.netty.Client.connect(Client.java:295)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:       ... 19 more
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process cf9fc058-0bda-40f7-8450-cb9037cb05c3:
2015-10-19T07:03:41.748-0400 b.s.d.supervisor [INFO] Worker Process 4366cfef-65fa-4385-b36a-eb4c272f4324 exited with code: 20
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Worker Process 4366cfef-65fa-4385-b36a-eb4c272f4324 has died!
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Shutting down and clearing state for id 4366cfef-65fa-4385-b36a-eb4c272f4324. Current supervisor time: 1445252622. State: :timed-out, Heartbeat: {:time-secs 1445252620, :storm-id "ambari_test3-27-1445251979", :executors [[47 47] [7 7] [51 51] [3 3] [39 39] [35 35] [43 43] [63 63] [23 23] [11 11] [31 31] [-1 -1] [55 55] [19 19] [27 27] [59 59] [15 15]], :port 6701}

------------------------

On disabling my buggy metrics consumer, the exception no longer appeared.

Hope this helps.

---
Vishnu Rao

> disconnectiong between workers
> ------------------------------
>
>                 Key: STORM-1022
>                 URL: https://issues.apache.org/jira/browse/STORM-1022
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>            Reporter: Jackson Chung
>
> We upgraded to 0.9.5 ando ran into the following exception. The supervisors did go down:
> 1 caution in our upgrade is we started a new nimbus, without any supervisors attached. Then we deployed topologies (from CICD). Next we build new supervisors and the supervisors will start on startup. However, in between the network service is restarted (due to hostname changed during the build <- chef). Just wanna throw this out in case this makes a difference.
> In other word, it could be that supervisors started, picked up work,  then network restarted. 
> {code}
> SEVERE: RuntimeException while executing runnable org.apache.storm.guava.util.concurrent.Futures$4@445058b with executor org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@691bc565
> java.lang.RuntimeException: Failed to connect to Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700
> at backtype.storm.messaging.netty.Client.connect(Client.java:308)
> at backtype.storm.messaging.netty.Client.access$1100(Client.java:78)
> at backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:297)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:283)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
> at org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
> at org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
> at org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
> at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
> at java.util.concurrent.FutureTask.set(FutureTask.java:233)
> at java.util.concurrent.FutureTask.run(FutureTask.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Giving up to connect to Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700 after 102 failed attempts
> at backtype.storm.messaging.netty.Client.connect(Client.java:303)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)