Posted to user@storm.apache.org by Tousif <to...@gmail.com> on 2015/02/17 15:03:58 UTC

worker disconnecting with ZK for long running bolt

Hello,

I have a bolt which uses a pool of large objects. When the pool
reinitialises (once every 4 hours), the bolt waits for a few seconds and
disconnects from ZooKeeper.

I have specified the following properties in the YAML config, but the worker still dies.

supervisor.worker.start.timeout.secs: 300
supervisor.worker.timeout.secs: 60


Here are the logs:

2015-02-17 04:35:01 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 15906ms for sessionid 0x14b9200ea400009, closing socket connection and attempting reconnect
2015-02-17 04:35:01 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-02-17 04:35:01 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2015-02-17 04:35:01 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2015-02-17 04:35:02 o.a.z.ClientCnxn [INFO] Opening socket connection to server realtimeanalytics.novalocal/10.0.0.11:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2015-02-17 04:35:02 o.a.z.ClientCnxn [INFO] Socket connection established to realtimeanalytics.novalocal/10.0.0.11:2181, initiating session
2015-02-17 04:35:02 o.a.z.ClientCnxn [INFO] Session establishment complete on server realtimeanalytics.novalocal/10.0.0.11:2181, sessionid = 0x14b9200ea400009, negotiated timeout = 20000
2015-02-17 04:35:02 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
2015-02-17 04:35:02 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2015-02-17 04:35:33 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13499ms for sessionid 0x14b9200ea400009, closing socket connection and attempting reconnect
2015-02-17 04:35:34 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-02-17 04:35:34 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2015-02-17 04:35:34 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.



-- 


Regards
Tousif Khazi

Re: worker disconnecting with ZK for long running bolt

Posted by Tousif <to...@gmail.com>.
Hello,

After increasing the config parameters I no longer see the ZooKeeper
SUSPENDED message, but the worker is restarted on another machine. Does
this have anything to do with the Netty connection settings?

 storm.zookeeper.session.timeout: 40000
 storm.zookeeper.connection.timeout: 30000


 storm.messaging.transport: "backtype.storm.messaging.netty.Context"
 storm.messaging.netty.buffer_size: 209715200
 storm.messaging.netty.max_retries: 10
 storm.messaging.netty.max_wait_ms: 5000
 storm.messaging.netty.min_wait_ms: 10000

2015-02-18 15:13:47 b.s.m.n.Client [INFO] New Netty Client, connect to realtimeslave1.novalocal, 6702, config: , buffer_size: 209715200
2015-02-18 15:13:47 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-realtimeslave1.novalocal/10.0.0.14:6702... [0]
2015-02-18 15:13:47 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-realtimeslave1.novalocal/10.0.0.14:6703
2015-02-18 15:13:47 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-realtimeslave1.novalocal/10.0.0.14:6703..., timeout: 600000ms, pendings: 0
2015-02-18 15:13:52 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-realtimeslave1.novalocal/10.0.0.14:6702... [1]
2015-02-18 15:13:57 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-realtimeslave1.novalocal/10.0.0.14:6702... [2]
2015-02-18 15:14:02 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-realtimeslave1.novalocal/10.0.0.14:6702... [3]
2015-02-18 15:14:03 b.s.m.n.Client [INFO] connection established to a remote host Netty-Client-realtimeslave1.novalocal/10.0.0.14:6702, [id: 0x320fd4e4, /10.0.0.11:48658 => realtimeslave1.novalocal/10.0.0.14:6702]
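
One detail in the Netty settings above: max_wait_ms (5000) is lower than min_wait_ms (10000), and the reconnect attempts in the log are spaced about 5 seconds apart. Whether that inversion matters depends on how the Storm version in use applies the two bounds, which this thread does not establish. A minimal sketch of a sanity check, assuming the settings have been loaded into a plain dict (the function name check_netty_backoff is hypothetical and not part of Storm):

```python
def check_netty_backoff(conf):
    """Return a list of warnings about the Netty reconnect settings in conf."""
    warnings = []
    min_wait = conf.get("storm.messaging.netty.min_wait_ms")
    max_wait = conf.get("storm.messaging.netty.max_wait_ms")
    # Only compare when both bounds are present in the config.
    if min_wait is not None and max_wait is not None and min_wait > max_wait:
        warnings.append(
            "min_wait_ms (%d) is greater than max_wait_ms (%d)" % (min_wait, max_wait)
        )
    return warnings


# Values copied from the message above.
conf = {
    "storm.messaging.netty.max_retries": 10,
    "storm.messaging.netty.max_wait_ms": 5000,
    "storm.messaging.netty.min_wait_ms": 10000,
}
for warning in check_netty_backoff(conf):
    print(warning)
```

Swapping the two values (min 5000, max 10000) makes the check return an empty list.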







-- 


Regards
Tousif Khazi

Re: worker disconnecting with ZK for long running bolt

Posted by Tousif <to...@gmail.com>.
Thanks,
I will try out these config properties.

Re: worker disconnecting with ZK for long running bolt

Posted by Harsha <st...@harsha.io>.

You might be losing the ZooKeeper connection. Try increasing these two
values:
storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
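
For reference, the log in the original message reports a negotiated timeout of 20000 ms, yet the client gave up after 15906 ms and 13499 ms of silence. That is consistent with the ZooKeeper Java client treating a connection as dead once it has heard nothing for two thirds of the negotiated session timeout. A minimal sketch of that arithmetic (the helper name zk_read_timeout_ms is made up for illustration; the numbers come from the log):

```python
def zk_read_timeout_ms(negotiated_session_timeout_ms):
    """Idle time after which the ZooKeeper Java client closes the socket.

    The client uses two thirds of the negotiated session timeout
    (integer arithmetic), not the full session timeout.
    """
    return negotiated_session_timeout_ms * 2 // 3


negotiated = 20000             # "negotiated timeout = 20000" in the log
idle_gaps_ms = [15906, 13499]  # "have not heard from server in ...ms"

limit = zk_read_timeout_ms(negotiated)
for gap in idle_gaps_ms:
    print(gap, "ms idle, limit", limit, "ms, exceeded:", gap > limit)
```

By the same rule, a 40000 ms session timeout would put the limit at roughly 26.7 seconds, assuming the ZooKeeper server actually negotiates the full 40000 ms rather than capping it.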

