Posted to user@storm.apache.org by 姚驰 <ya...@163.com> on 2015/02/01 08:15:55 UTC

About the disallowed of a worker.

Hi everyone, yesterday one of my workers died under high CPU usage. After checking the log, I found that it was killed by the supervisor because its status changed to "disallowed".
Could anybody tell me what this status means and what might cause it?
Here is my log, I hope this will help:


worker:
2015-01-30 17:11:25 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13926ms for sessionid 0x14b16171294b383, closing socket connection and attempting reconnect
2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-01-30 17:11:26 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.251/10.x.xx.251:2181. Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.251/10.x.xx.251:2181, initiating session
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.x.xx.251/10.x.xx.251:2181, sessionid = 0x14b16171294b383, negotiated timeout = 20000
2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 33078ms for sessionid 0x14b16171294b383, closing socket connection and attempting reconnect
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-01-30 17:12:00 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.250/10.x.xx.250:2181. Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.250/10.x.xx.250:2181, initiating session
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: LOST
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x14b16171294b383 has expired, closing socket connection
2015-01-30 17:12:00 b.s.cluster [WARN] Received event :expired::none: with disconnected Zookeeper.
2015-01-30 17:12:00 o.a.s.c.ConnectionState [WARN] Session expired event received
2015-01-30 17:12:00 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.x.xx.249:2181,10.x.xx.250:2181,10.x.xx.251:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@501fdcfb
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.249/10.x.xx.249:2181. Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.249/10.x.xx.249:2181, initiating session
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.x.xx.249/10.x.xx.249:2181, sessionid = 0x14b16171294d177, negotiated timeout = 20000
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED


supervisor:
2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down and clearing state for id 835881ca-2d64-45b5-b6a3-a1b3562cb164. Current supervisor time: 1422609124. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1422609124, :storm-id "topo-rtmonitor-33-1422515858", :executors #{[66 66] [162 162] [258 258] [42 42] [138 138] [234 234] [18 18] [114 114] [210 210] [306 306] [90 90] [186 186] [282 282] [-1 -1]}, :port 6709}
2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
2015-01-30 17:12:05 b.s.d.supervisor [INFO] Shut down f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
2015-01-30 17:13:24 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "topo-rtmonitor-33-1422515858", :executors ([38 38] [134 134] [230 230] [326 326] [14 14] [110 110] [206 206] [302 302] [86 86] [182 182] [278 278] [62 62] [158 158] [254 254])} for this supervisor f04d65ae-13ce-486f-8e54-a95a16fe96c3 on port 6709 with id 80d9c045-3633-4534-87ed-2702fada89f4


Thanks for any response

Re:Re: About the disallowed of a worker.

Posted by 姚驰 <ya...@163.com>.
Got it! Thanks for the response.

At 2015-02-02 13:39:00, "Kosala Dissanayake" <um...@gmail.com> wrote:

'Disallowed means that Nimbus reassigned that worker somewhere else' https://groups.google.com/d/msg/storm-user/iylcrH4Vu40/iwNfRZDkKSEJ


Your worker was being starved of CPU and was not able to heartbeat with the supervisor often enough. The supervisor thought that the worker was dead and killed it. 


You have a problem with high CPU usage in a bolt. Look at the 'Capacity' column in the Storm UI for clues (any bolt with a capacity close to 1 is a red flag).


Re: About the disallowed of a worker.

Posted by Kosala Dissanayake <um...@gmail.com>.
'Disallowed means that Nimbus reassigned that worker somewhere else'
https://groups.google.com/d/msg/storm-user/iylcrH4Vu40/iwNfRZDkKSEJ

Your worker was being starved of CPU and was not able to heartbeat with the
supervisor often enough. The supervisor thought that the worker was dead
and killed it.
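
The worker log in the original post seems consistent with a starved process: the ZooKeeper client reports a silent gap of 33078 ms against a negotiated session timeout of 20000 ms, right before the session expires. Below is a minimal Python sketch that pulls those two numbers out of such log lines. The function name analyze_gaps is made up for illustration, and the comparison is only a rough indicator, since session expiry is decided by the ZooKeeper server rather than by the client.

```python
import re

# Three lines copied from the worker log in the original post.
LOG_LINES = [
    "2015-01-30 17:11:25 o.a.s.z.ClientCnxn [INFO] Client session timed out, "
    "have not heard from server in 13926ms for sessionid 0x14b16171294b383, "
    "closing socket connection and attempting reconnect",
    "2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Session establishment "
    "complete on server 10.x.xx.251/10.x.xx.251:2181, sessionid = "
    "0x14b16171294b383, negotiated timeout = 20000",
    "2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Client session timed out, "
    "have not heard from server in 33078ms for sessionid 0x14b16171294b383, "
    "closing socket connection and attempting reconnect",
]

GAP_RE = re.compile(r"have not heard from server in (\d+)ms")
TIMEOUT_RE = re.compile(r"negotiated timeout = (\d+)")


def analyze_gaps(lines):
    """Return (timeout_ms, [(gap_ms, exceeds_timeout), ...]).

    timeout_ms is the last negotiated timeout seen in the lines, or None
    if the excerpt does not contain one. Hypothetical helper, not a Storm
    or ZooKeeper API.
    """
    timeout_ms = None
    gaps = []
    for line in lines:
        timeout_match = TIMEOUT_RE.search(line)
        if timeout_match:
            timeout_ms = int(timeout_match.group(1))
        gap_match = GAP_RE.search(line)
        if gap_match:
            gaps.append(int(gap_match.group(1)))
    flagged = [(gap, timeout_ms is not None and gap > timeout_ms) for gap in gaps]
    return timeout_ms, flagged


if __name__ == "__main__":
    timeout, flagged = analyze_gaps(LOG_LINES)
    for gap, too_long in flagged:
        note = "longer than the session timeout" if too_long else "within the session timeout"
        print(f"gap {gap} ms vs timeout {timeout} ms: {note}")
```

A gap longer than the session timeout means the worker could not talk to ZooKeeper for that whole period, so anything it reports through ZooKeeper would have looked stale during that time.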

You have a problem with high CPU usage in a bolt. Look at the 'Capacity'
column in the Storm UI for clues (any bolt with a capacity close to 1 is a
red flag).
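
For reference, the capacity figure in the Storm UI is, as far as I know, the number of tuples executed multiplied by the average execute latency, divided by the length of the measurement window (the last 10 minutes). A rough Python sketch of that arithmetic follows. The function names, the sample numbers, and the 0.8 warning threshold are all assumptions for illustration, not values from this thread or from Storm.

```python
TEN_MINUTES_MS = 10 * 60 * 1000  # window used by the UI's "last 10m" column


def bolt_capacity(executed, execute_latency_ms, window_ms=TEN_MINUTES_MS):
    """Fraction of the window a bolt spent executing tuples.

    executed: tuples executed in the window
    execute_latency_ms: average time spent in execute() per tuple
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * execute_latency_ms / window_ms


def is_red_flag(capacity, threshold=0.8):
    """Hypothetical check: treat anything at or above the threshold as suspicious."""
    return capacity >= threshold


if __name__ == "__main__":
    # Made-up numbers: 120,000 tuples at 4.5 ms each over 10 minutes.
    capacity = bolt_capacity(120_000, 4.5)
    print(f"capacity = {capacity:.2f}, red flag: {is_red_flag(capacity)}")
```

A value near 1 means the bolt was busy for almost the entire window, so it has no headroom; raising its parallelism or making execute() cheaper are the usual ways to bring it down.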
