Posted to user@storm.apache.org by Sahan Maldeniya <sa...@haulmatic.com> on 2017/08/07 04:33:10 UTC

Storm workers get killed in the middle and supervisor restarts

Hi,

We are using Apache Storm to analyze a GPS data stream by subscribing to
a RabbitMQ message channel. We are on Apache Storm 1.1.1.

We have deployed ZooKeeper, one Nimbus, one UI, and one supervisor across
three Amazon EC2 instances. We also have a local Storm supervisor that
points to the remote Nimbus in EC2 through the same ZooKeeper.


When we run the topology on a local cluster, or submit it to Storm while
only the local supervisor is running (we stop the remote supervisor
instance), everything works as expected.

The problem arises when we submit the topology to the remote supervisor
(a production-like environment). The supervisor keeps restarting, with the
logs below.


*worker.log*
2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700
    2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Terminating
messaging context
    2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down
executors
    2017-08-05 08:50:27.334 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor fuel-data-analyzer:[2 2]
    2017-08-05 08:50:27.335 o.a.s.util Thread-4-fuel-data-analyzer-executor[2
2] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.335 o.a.s.util Thread-3-disruptor-executor[2
2]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.341 o.a.s.d.executor Thread-19 [INFO] Shut down
executor fuel-data-analyzer:[2 2]
    2017-08-05 08:50:27.342 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor fuel-data-save-to-db:[3 3]
    2017-08-05 08:50:27.342 o.a.s.util Thread-6-fuel-data-save-to-db-executor[3
3] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.342 o.a.s.util Thread-5-disruptor-executor[3
3]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shut down
executor fuel-data-save-to-db:[3 3]
    2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor __acker:[1 1]
    2017-08-05 08:50:27.343 o.a.s.util Thread-8-__acker-executor[1 1]
[INFO] Async loop interrupted!
    2017-08-05 08:50:27.343 o.a.s.util Thread-7-disruptor-executor[1
1]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shut down
executor __acker:[1 1]
    2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor rabbit-mq-gps-reader-spout:[6 6]
    2017-08-05 08:50:27.344 o.a.s.util Thread-9-disruptor-executor[6
6]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.344 o.a.s.util
Thread-10-rabbit-mq-gps-reader-spout-executor[6
6] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shut down
executor rabbit-mq-gps-reader-spout:[6 6]
    2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor __system:[-1 -1]
    2017-08-05 08:50:27.348 o.a.s.util Thread-11-disruptor-executor[-1
-1]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.util Thread-12-__system-executor[-1 -1]
[INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shut down
executor __system:[-1 -1]
    2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor gps-data-logger:[5 5]
    2017-08-05 08:50:27.349 o.a.s.util Thread-13-disruptor-executor[5
5]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.util Thread-14-gps-data-logger-executor[5
5] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down
executor gps-data-logger:[5 5]
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shutting down
executor gps-data-devider:[4 4]
    2017-08-05 08:50:27.350 o.a.s.util Thread-16-gps-data-devider-executor[4
4] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.util Thread-15-disruptor-executor[4
4]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down
executor gps-data-devider:[4 4]
    2017-08-05 08:50:27.351 o.a.s.d.worker Thread-19 [INFO] Shut down
executors
    2017-08-05 08:50:27.353 o.a.s.d.worker Thread-19 [INFO] Shutting down
transfer thread
    2017-08-05 08:50:27.353 o.a.s.util
Thread-17-disruptor-worker-transfer-queue
[INFO] Async loop interrupted!
    2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down
transfer thread
    2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down
backpressure thread
    2017-08-05 08:50:27.355 o.a.s.d.worker Thread-19 [INFO] Shutting down
default resources
    2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Shut down
default resources
    2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Trigger any
worker shutdown hooks
    2017-08-05 08:50:27.363 o.a.s.d.worker Thread-19 [INFO] Disconnecting
from storm cluster state context
    2017-08-05 08:50:27.363 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl
Curator-Framework-0 [INFO] backgroundOperationsLoop exiting
    2017-08-05 08:50:27.365 o.a.s.s.o.a.z.ClientCnxn main-EventThread
[INFO] EventThread shut down
    2017-08-05 08:50:27.366 o.a.s.s.o.a.z.ZooKeeper Thread-19 [INFO]
Session: 0x15d7d7e305c03a6 closed
    2017-08-05 08:50:27.366 o.a.s.d.worker Thread-19 [INFO] Shut down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700


*supervisor.log*

2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_BLOB_LOCALIZATION msInState: 31 -> WAITING_FOR_WORKER_START msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
    2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700: Changing current assignment from null to LocalAssignment(topology_id:GpsDataAnalyticsTopology-6-1501922728, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0))
    2017-08-05 08:45:34.801 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_WORKER_START msInState: 4003 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c -> RUNNING msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
    2017-08-05 08:45:40.354 o.a.s.e.EventManagerImp Thread-4 [ERROR] {} Error when processing event
    java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_144]
        at org.apache.storm.localizer.Localizer.updateBlobs(Localizer.java:332) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:99) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:72) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) ~[storm-core-1.1.1.jar:1.1.1]
    2017-08-05 08:45:40.354 o.a.s.u.Utils Thread-4 [ERROR] Halting process: Error when processing an event
    java.lang.RuntimeException: Halting process: Error when processing an event
        at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) ~[storm-core-1.1.1.jar:1.1.1]
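The NullPointerException in the trace comes from ConcurrentHashMap.get: unlike HashMap, ConcurrentHashMap rejects null keys and values outright, so a null topology owner reaching Localizer.updateBlobs fails at the first map lookup. A minimal sketch of that failure mode (the map and its contents here are illustrative, not Storm's actual fields):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookup {
    public static void main(String[] args) {
        // Illustrative stand-in for a per-user cache like the one the Localizer keeps.
        ConcurrentHashMap<String, String> byUser = new ConcurrentHashMap<>();
        byUser.put("storm", "localized-blobs");

        String owner = null; // the topology owner that unexpectedly came back null

        try {
            byUser.get(owner); // ConcurrentHashMap throws NPE on a null key
            System.out.println("lookup succeeded");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on null key");
        }
    }
}
```

The NPE then propagates up to the event manager thread, which, as the second ERROR entry shows, responds by halting the whole supervisor process via Utils.exitProcess.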



Thank You

-- 
Thank You,
Best Regards.

Haul*Matic*
Technologies
*Sahan Maldeniya*
Software Craftsman
sahan@haulmatic.com | +94776306579 | +94114693330
HaulMatic Technologies (Pvt) Ltd | http://haulmatic.com
120, Highlevel Road, Colombo 05, Sri Lanka
<https://www.linkedin.com/company/haulmatic-technologies>
<https://www.facebook.com/haulmatic>  <https://twitter.com/haulmatic>

Re: Storm workers get killed in the middle and supervisor restarts

Posted by Sahan Maldeniya <sa...@haulmatic.com>.
Hi Bobby and Martin,

Thanks for responding so quickly. I am new to Storm and do not know how to
configure supervisor users. Since Martin has filed a JIRA, I will follow it
for updates.

Thanks again

On 7 August 2017 at 20:50, Martin Burian <ma...@gmail.com> wrote:

> Created as https://issues.apache.org/jira/browse/STORM-2682
> M.
>
> On Monday, 7 August 2017 at 16:56, Bobby Evans <ev...@yahoo-inc.com> wrote:
>
>> OK Please file JIRA (https://issues.apache.org/jira) under the STORM
>> project and I will see what I can do to reproduce the case/fix it.  I
>> assume that it is not going to be possible to get a snapshot of your
>> zookeeper while this is happening?
>>
>>
>>
>> - Bobby
>>
>>
>>
>> On Monday, August 7, 2017, 9:53:34 AM CDT, Martin Burian <
>> martin.burianjr@gmail.com> wrote:
>>
>>
>> The supervisor restarts, but crashes again after some 30 seconds. I tried
>> restarting the supervisor and clearing its data, but that did not help.
>> Martin
>>
>> On Monday, 7 August 2017 at 16:34, Bobby Evans <ev...@yahoo-inc.com> wrote:
>>
>> No you should not need to configure the users ahead of time.  In storm if
>> security is turned off the "user" or "owner" of the topology is stored
>> internally as the unix user running nimbus.  The fact that it is null
>> indicates that there is some kind of a bug.  I am just trying to gauge how
>> serious of a bug it is.  Does the supervisor recover after a little while
>> or is it stuck in a bad state?
>>
>>
>>
>> - Bobby
>>
>>
>>
>> On Monday, August 7, 2017, 9:28:46 AM CDT, Martin Burian <
>> martin.burianjr@gmail.com> wrote:
>>
>>
>> I am seeing exactly the same exception being thrown by supervisors in my
>> topology after an update from Storm 1.0.3 to 1.0.4.
>> A quick search took me to http://storm.apache.org/releases/1.0.4/SECURITY.html
>> where there are multiple references to some "users". Do I need to configure
>> one of those? I didn't have to do anything with users before.
>>
>> Martin
>>
>> On Monday, 7 August 2017 at 16:01, Bobby Evans <ev...@yahoo-inc.com> wrote:
>>
>> from the code
>>
>> https://github.com/apache/storm/blob/v1.1.1/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java?utf8=%E2%9C%93#L332
>>
>> it looks like your user is null when trying to update the resources for
>> the topology?  Did the supervisor and your topology recover when the
>> supervisor was relaunched?  If not please file a bug JIRA and we can look
>> into it.
>>
>>
>> - Bobby
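The lookup Bobby points to is keyed on the topology owner, so a null owner takes down the daemon. Purely as an illustration of an obvious guard (class, field, and method names here are hypothetical; this is not Storm's actual code nor the eventual STORM-2682 fix):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GuardedBlobLookup {
    // Illustrative per-owner map; names are hypothetical, not Storm internals.
    private final Map<String, String> blobsByOwner = new ConcurrentHashMap<>();

    String blobsFor(String owner) {
        if (owner == null) {
            // Report and skip instead of letting an NPE halt the supervisor.
            System.err.println("Topology owner is null; skipping blob update");
            return null;
        }
        return blobsByOwner.get(owner);
    }

    public static void main(String[] args) {
        GuardedBlobLookup lookup = new GuardedBlobLookup();
        System.out.println(lookup.blobsFor(null)); // warning on stderr, then "null"
    }
}
```

With the guard, a null owner is logged and skipped; without it, the NullPointerException escapes to the supervisor's event loop, which halts the process.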



Re: Storm workers get killed in the middle and supervisor restarts

Posted by Martin Burian <ma...@gmail.com>.
Created as https://issues.apache.org/jira/browse/STORM-2682
M.


Re: Storm workers get killed in the middle and supervisor restarts

Posted by Bobby Evans <ev...@yahoo-inc.com>.
OK Please file JIRA (https://issues.apache.org/jira) under the STORM project and I will see what I can do to reproduce the case/fix it.  I assume that it is not going to be possible to get a snapshot of your zookeeper while this is happening?


- Bobby


On Monday, August 7, 2017, 9:53:34 AM CDT, Martin Burian <ma...@gmail.com> wrote:

The supervisor restarts, but crashes again after some 30 seconds. I tried restarting the supervisor and clearing its data, it did not help.Martin
po 7. 8. 2017 v 16:34 odesílatel Bobby Evans <ev...@yahoo-inc.com> napsal:

No you should not need to configure the users ahead of time.  In storm if security is turned off the "user" or "owner" of the topology is stored internally as the unix user running nimbus.  The fact that it is null indicates that there is some kind of a bug.  I am just trying to gauge how serious of a bug it is.  Does the supervisor recover after a little while or is it stuck in a bad state?


- Bobby


On Monday, August 7, 2017, 9:28:46 AM CDT, Martin Burian <ma...@gmail.com> wrote:

I am seeing exactly the same exception being thrown by supervisors in my topology after an update from storm 1.0.3 to 1.0.4.A quick search took me to http://storm.apache.org/releases/1.0.4/SECURITY.html where there are multiple references to some "users". Do I need to configure one of those? I Didn't have to do anything with users before.
Martin

po 7. 8. 2017 v 16:01 odesílatel Bobby Evans <ev...@yahoo-inc.com> napsal:

From the code at
https://github.com/apache/storm/blob/v1.1.1/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java?utf8=%E2%9C%93#L332
it looks like your user is null when trying to update the resources for the topology.  Did the supervisor and your topology recover when the supervisor was relaunched?  If not, please file a bug JIRA and we can look into it.


- Bobby


On Sunday, August 6, 2017, 11:34:19 PM CDT, Sahan Maldeniya <sa...@haulmatic.com> wrote:


Hi,

We are using Apache Storm to analyze a GPS data stream by subscribing to a RabbitMQ message channel. We use Apache Storm 1.1.1.

We have deployed a ZooKeeper, 1 nimbus, 1 UI and a supervisor on 3 Amazon EC2 instances. We also have a local Storm supervisor which points to the remote nimbus in EC2 via the same ZooKeeper.

When we run the topology as a local cluster, or submit it to Storm when only the local supervisor is running (we stop the remote supervisor instance), everything works as expected.

The problem arises when we submit the topology to the remote supervisor (production-like env). This keeps restarting the supervisor with the logs below.

worker.log
2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700
    2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Terminating messaging context
    2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down executors
    2017-08-05 08:50:27.334 o.a.s.d.executor Thread-19 [INFO] Shutting down executor fuel-data-analyzer:[2 2]
    2017-08-05 08:50:27.335 o.a.s.util Thread-4-fuel-data-analyzer-executor[2 2] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.335 o.a.s.util Thread-3-disruptor-executor[2 2]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.341 o.a.s.d.executor Thread-19 [INFO] Shut down executor fuel-data-analyzer:[2 2]
    2017-08-05 08:50:27.342 o.a.s.d.executor Thread-19 [INFO] Shutting down executor fuel-data-save-to-db:[3 3]
    2017-08-05 08:50:27.342 o.a.s.util Thread-6-fuel-data-save-to-db-executor[3 3] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.342 o.a.s.util Thread-5-disruptor-executor[3 3]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shut down executor fuel-data-save-to-db:[3 3]
    2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shutting down executor __acker:[1 1]
    2017-08-05 08:50:27.343 o.a.s.util Thread-8-__acker-executor[1 1] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.343 o.a.s.util Thread-7-disruptor-executor[1 1]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shut down executor __acker:[1 1]
    2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shutting down executor rabbit-mq-gps-reader-spout:[6 6]
    2017-08-05 08:50:27.344 o.a.s.util Thread-9-disruptor-executor[6 6]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.344 o.a.s.util Thread-10-rabbit-mq-gps-reader-spout-executor[6 6] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shut down executor rabbit-mq-gps-reader-spout:[6 6]
    2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shutting down executor __system:[-1 -1]
    2017-08-05 08:50:27.348 o.a.s.util Thread-11-disruptor-executor[-1 -1]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.util Thread-12-__system-executor[-1 -1] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shut down executor __system:[-1 -1]
    2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shutting down executor gps-data-logger:[5 5]
    2017-08-05 08:50:27.349 o.a.s.util Thread-13-disruptor-executor[5 5]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.349 o.a.s.util Thread-14-gps-data-logger-executor[5 5] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down executor gps-data-logger:[5 5]
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shutting down executor gps-data-devider:[4 4]
    2017-08-05 08:50:27.350 o.a.s.util Thread-16-gps-data-devider-executor[4 4] [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.util Thread-15-disruptor-executor[4 4]-send-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down executor gps-data-devider:[4 4]
    2017-08-05 08:50:27.351 o.a.s.d.worker Thread-19 [INFO] Shut down executors
    2017-08-05 08:50:27.353 o.a.s.d.worker Thread-19 [INFO] Shutting down transfer thread
    2017-08-05 08:50:27.353 o.a.s.util Thread-17-disruptor-worker-transfer-queue [INFO] Async loop interrupted!
    2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down transfer thread
    2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down backpressure thread
    2017-08-05 08:50:27.355 o.a.s.d.worker Thread-19 [INFO] Shutting down default resources
    2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Shut down default resources
    2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Trigger any worker shutdown hooks
    2017-08-05 08:50:27.363 o.a.s.d.worker Thread-19 [INFO] Disconnecting from storm cluster state context
    2017-08-05 08:50:27.363 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl Curator-Framework-0 [INFO] backgroundOperationsLoop exiting
    2017-08-05 08:50:27.365 o.a.s.s.o.a.z.ClientCnxn main-EventThread [INFO] EventThread shut down
    2017-08-05 08:50:27.366 o.a.s.s.o.a.z.ZooKeeper Thread-19 [INFO] Session: 0x15d7d7e305c03a6 closed
    2017-08-05 08:50:27.366 o.a.s.d.worker Thread-19 [INFO] Shut down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700


supervisor.log

2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_BLOB_LOCALIZATION msInState: 31 -> WAITING_FOR_WORKER_START msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
    2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700: Changing current assignment from null to LocalAssignment(topology_id:GpsDataAnalyticsTopology-6-1501922728, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0))
    2017-08-05 08:45:34.801 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_WORKER_START msInState: 4003 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c -> RUNNING msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
    2017-08-05 08:45:40.354 o.a.s.e.EventManagerImp Thread-4 [ERROR] {} Error when processing event
    java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_144]
        at org.apache.storm.localizer.Localizer.updateBlobs(Localizer.java:332) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:99) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:72) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) ~[storm-core-1.1.1.jar:1.1.1]
    2017-08-05 08:45:40.354 o.a.s.u.Utils Thread-4 [ERROR] Halting process: Error when processing an event
    java.lang.RuntimeException: Halting process: Error when processing an event
        at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
        at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) ~[storm-core-1.1.1.jar:1.1.1]
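The first frame of that trace points at the immediate cause: ConcurrentHashMap does not permit null keys, so looking up a user that is unexpectedly null throws a NullPointerException right away. A minimal self-contained demonstration of that behavior (the map contents and names here are hypothetical, not Storm's actual code):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    // Mirrors the failing pattern: a user-keyed map looked up with a
    // user value that is unexpectedly null.
    static boolean lookupThrowsNpe(String user) {
        ConcurrentHashMap<String, String> localizedUsers = new ConcurrentHashMap<>();
        localizedUsers.put("storm", "/data/supervisor/usercache/storm"); // hypothetical entry
        try {
            localizedUsers.get(user); // ConcurrentHashMap rejects null keys
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null user throws NPE: " + lookupThrowsNpe(null));     // true
        System.out.println("known user throws NPE: " + lookupThrowsNpe("storm")); // false
    }
}
```

So the real question is not why the map lookup fails, but why the supervisor's notion of the topology owner is null in the first place.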



Thank You

-- 
Thank You, Best Regards.
Sahan Maldeniya
Software Craftsman, HaulMatic Technologies (Pvt) Ltd | http://haulmatic.com
sahan@haulmatic.com | +94776306579 | +94114693330
120, Highlevel Road, Colombo 05, Sri Lanka




Re: Storm workers get killed in the middle and supervisor restarts

Posted by Martin Burian <ma...@gmail.com>.
The supervisor restarts, but crashes again after some 30 seconds. I tried
restarting the supervisor and clearing its data; it did not help.
Martin


Re: Storm workers get killed in the middle and supervisor restarts

Posted by Bobby Evans <ev...@yahoo-inc.com>.
No, you should not need to configure the users ahead of time.  In Storm, if security is turned off, the "user" or "owner" of the topology is stored internally as the unix user running nimbus.  The fact that it is null indicates some kind of bug.  I am just trying to gauge how serious a bug it is.  Does the supervisor recover after a little while, or is it stuck in a bad state?


- Bobby
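The rule described in this reply (with security off, the owner defaults to the unix user running nimbus, so it should never be null) can be sketched as a defensive fallback. This is a hypothetical helper for illustration only, not Storm's actual code:

```java
public class OwnerFallback {
    // Hypothetical illustration: return the stored topology owner, or fall
    // back to the unix user running the current JVM when the stored value
    // is unexpectedly null.
    static String effectiveOwner(String storedOwner) {
        if (storedOwner != null) {
            return storedOwner;
        }
        // The JVM always populates "user.name" with the current OS user.
        return System.getProperty("user.name");
    }

    public static void main(String[] args) {
        System.out.println(effectiveOwner("storm")); // prints "storm"
        System.out.println(effectiveOwner(null));    // prints the current unix user
    }
}
```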




Re: Storm workers get killed in the middle and supervisor restarts

Posted by Martin Burian <ma...@gmail.com>.
I am seeing exactly the same exception being thrown by supervisors in my
topology after an update from Storm 1.0.3 to 1.0.4.
A quick search took me to
http://storm.apache.org/releases/1.0.4/SECURITY.html where there are
multiple references to some "users". Do I need to configure one of those? I
didn't have to do anything with users before.

Martin

>
>     2017-08-05 08:50:27.365 o.a.s.s.o.a.z.ClientCnxn main-EventThread
> [INFO] EventThread shut down
>     2017-08-05 08:50:27.366 o.a.s.s.o.a.z.ZooKeeper Thread-19 [INFO]
> Session: 0x15d7d7e305c03a6 closed
>     2017-08-05 08:50:27.366 o.a.s.d.worker Thread-19 [INFO] Shut down
> worker GpsDataAnalyticsTopology-6-150 1922728 a383ada8-62a4-418c-9c7c-4e5a5e
> 19f051 6700
>
>
>
> *supervisor.log*
>
> 2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_BLOB_LOCALIZATION msInState: 31 -> WAITING_FOR_WORKER_START msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
>     2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700: Changing current assignment from null to LocalAssignment(topology_id:GpsDataAnalyticsTopology-6-1501922728, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0))
>     2017-08-05 08:45:34.801 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_WORKER_START msInState: 4003 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c -> RUNNING msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
>     2017-08-05 08:45:40.354 o.a.s.e.EventManagerImp Thread-4 [ERROR] {} Error when processing event
>     java.lang.NullPointerException: null
>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_144]
>         at org.apache.storm.localizer.Localizer.updateBlobs(Localizer.java:332) ~[storm-core-1.1.1.jar:1.1.1]
>         at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:99) ~[storm-core-1.1.1.jar:1.1.1]
>         at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:72) ~[storm-core-1.1.1.jar:1.1.1]
>         at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) ~[storm-core-1.1.1.jar:1.1.1]
>     2017-08-05 08:45:40.354 o.a.s.u.Utils Thread-4 [ERROR] Halting process: Error when processing an event
>     java.lang.RuntimeException: Halting process: Error when processing an event
>         at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
>         at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) ~[storm-core-1.1.1.jar:1.1.1]
>
> Thank You
>
> --
> Thank You,
> Best Regards.
>
> HaulMatic Technologies
> *Sahan Maldeniya*
> Software Craftsman
> sahan@haulmatic.com | +94776306579 | +94114693330
> HaulMatic Technologies (Pvt) Ltd | http://haulmatic.com
> 120, Highlevel Road, Colombo 05, Sri Lanka

Re: Storm workers get killed in the middle and supervisor restarts

Posted by Bobby Evans <ev...@yahoo-inc.com>.
From the code at
https://github.com/apache/storm/blob/v1.1.1/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java?utf8=%E2%9C%93#L332
it looks like your user is null when the supervisor tries to update the resources for the topology. Did the supervisor and your topology recover once the supervisor was relaunched? If not, please file a bug JIRA and we can look into it.


- Bobby
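[Editor's note: the diagnosis above can be reproduced in isolation. The supervisor's stack trace dies inside ConcurrentHashMap.get, and unlike HashMap, ConcurrentHashMap throws NullPointerException the moment it is handed a null key. A minimal sketch, assuming nothing about Storm's internals beyond the stack trace (class and variable names here are illustrative only):]

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    // Returns "ok" if the lookup completes, "NPE" if the map rejects the key.
    static String classify(ConcurrentHashMap<String, String> map, String key) {
        try {
            map.get(key);
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        // Illustrative stand-in for a user -> localized-resources map.
        ConcurrentHashMap<String, String> userResources = new ConcurrentHashMap<>();
        userResources.put("storm", "GpsDataAnalyticsTopology blobs");
        System.out.println(classify(userResources, "storm")); // prints "ok"
        System.out.println(classify(userResources, null));    // prints "NPE"
    }
}
```

[This matches the trace: the get at ConcurrentHashMap.java:936 is the first place the null key is touched, so the NPE surfaces there rather than in Storm's own code.]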

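[Editor's note: one general way to make such a lookup robust is to guard against a null key before calling get. This is a hypothetical sketch of the pattern, not the actual fix applied in Storm; the names findResources and resourcesByUser are invented for illustration:]

```java
import java.util.concurrent.ConcurrentHashMap;

public class SafeLookup {
    // ConcurrentHashMap.get(null) throws NullPointerException, so a
    // null owner must be handled before the lookup, not after.
    static String findResources(ConcurrentHashMap<String, String> resourcesByUser,
                                String user) {
        if (user == null) {
            return null; // treat an unknown owner as "no resources"
        }
        return resourcesByUser.get(user);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> resourcesByUser = new ConcurrentHashMap<>();
        resourcesByUser.put("storm", "localized blobs");
        System.out.println(findResources(resourcesByUser, "storm")); // prints "localized blobs"
        System.out.println(findResources(resourcesByUser, null));    // prints "null", no crash
    }
}
```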

On Sunday, August 6, 2017, 11:34:19 PM CDT, Sahan Maldeniya <sa...@haulmatic.com> wrote:

