Posted to user@curator.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2014/04/09 19:55:13 UTC

curator-2.4.0 cannot recover connection loss

Last night I did a rolling restart of ZooKeeper 3.4.5 to update its
configuration, and curator-2.4.0 could not recover from the connection loss.

ERROR 2014-04-09 17:48:15,231 [DaemonThreadFactory-2-thread-2] org.apache.curator.framework.imps.CuratorFrameworkImpl: Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:766)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

INFO  2014-04-09 17:48:15,276 [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager: State change: RECONNECTED
INFO  2014-04-09 17:48:15,382 [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager: State change: SUSPENDED
ERROR 2014-04-09 17:48:15,748 [DaemonThreadFactory-2-thread-2] org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
        at com.google.common.collect.Lists$TransformingSequentialList.<init>(Lists.java:527)
        at com.google.common.collect.Lists.transform(Lists.java:510)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.processChildren(PathChildrenCache.java:635)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.access$200(PathChildrenCache.java:68)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$4.processResult(PathChildrenCache.java:476)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:686)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:659)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:783)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

I am not sure whether this bug is in PathChildrenCache.

I need to restart all instances using curator-2.4.0, which is really bad.
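
For reference, the state changes in the log (RECONNECTED, then SUSPENDED about 100 ms later) can be followed with a small tracker. This is only a sketch in plain Java: the class ConnectionStateSketch and its methods are hypothetical names, and the enum is a local mirror of the state names printed by ConnectionStateManager, not Curator's own type. In a real client the same logic would presumably sit inside a ConnectionStateListener.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionStateSketch {
    // Local mirror of the state names seen in the log (assumption: illustration only,
    // this is not org.apache.curator.framework.state.ConnectionState).
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final List<State> history = new ArrayList<>();

    // Record each state change in arrival order; synchronized because
    // state events usually arrive on a separate event thread.
    synchronized void stateChanged(State newState) {
        history.add(newState);
    }

    // The client is considered usable only if the most recent state is a connected one.
    synchronized boolean isUsable() {
        if (history.isEmpty()) {
            return false;
        }
        State last = history.get(history.size() - 1);
        return last == State.CONNECTED || last == State.RECONNECTED;
    }

    // Defensive copy so callers cannot modify the recorded history.
    synchronized List<State> history() {
        return new ArrayList<>(history);
    }

    public static void main(String[] args) {
        // Replay the two state changes from the log above.
        ConnectionStateSketch tracker = new ConnectionStateSketch();
        tracker.stateChanged(State.RECONNECTED);
        tracker.stateChanged(State.SUSPENDED);
        System.out.println(tracker.history() + " usable=" + tracker.isUsable());
    }
}
```

Replaying the logged sequence leaves the tracker in SUSPENDED, which matches what was observed: the last state change reported was not a connected one.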

Thank you
Best, Jae

Re: curator-2.4.0 cannot recover connection loss

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.
Hi Jordan

I created https://issues.apache.org/jira/browse/CURATOR-103

Thank you
Best, Jae


On Sat, Apr 12, 2014 at 11:01 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> This looks like a bug in PathChildrenCache to me. I see an NPE there.
> Please open an issue in Jira for this.
>
> -Jordan

Re: curator-2.4.0 cannot recover connection loss

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
This looks like a bug in PathChildrenCache to me. I see an NPE there. Please open an issue in Jira for this.

-Jordan
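
One plausible reading of the trace, not confirmed anywhere in this thread, is that the background callback was handed a null child list after the retry gave up, and Lists.transform rejected it. The sketch below reproduces that failure mode with plain Java and shows a guard on the result code. Everything in it is hypothetical and for illustration only: ChildrenCallbackSketch, Event, and the process methods are made-up names, not Curator code or the actual fix. The two constants are the numeric values of ZooKeeper's KeeperException.Code.OK and CONNECTIONLOSS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

final class ChildrenCallbackSketch {
    // Numeric values of ZooKeeper's KeeperException.Code.OK and CONNECTIONLOSS.
    static final int OK = 0;
    static final int CONNECTION_LOSS = -4;

    // Hypothetical stand-in for a background event: a result code plus
    // a child list that is null when the operation failed.
    static final class Event {
        final int resultCode;
        final List<String> children;

        Event(int resultCode, List<String> children) {
            this.resultCode = resultCode;
            this.children = children;
        }
    }

    // Rejects a null list up front, as Guava's Lists.transform does in the trace.
    static List<String> toFullPaths(String parent, List<String> children) {
        Objects.requireNonNull(children, "children");
        List<String> paths = new ArrayList<>();
        for (String child : children) {
            paths.add(parent + "/" + child);
        }
        return paths;
    }

    // Unguarded callback: throws NullPointerException for a failed operation.
    static List<String> processUnguarded(String parent, Event event) {
        return toFullPaths(parent, event.children);
    }

    // Guarded callback: skips processing unless the operation succeeded.
    static Optional<List<String>> processGuarded(String parent, Event event) {
        if (event.resultCode != OK || event.children == null) {
            return Optional.empty();
        }
        return Optional.of(toFullPaths(parent, event.children));
    }

    public static void main(String[] args) {
        Event failed = new Event(CONNECTION_LOSS, null);
        Event ok = new Event(OK, Arrays.asList("a", "b"));
        System.out.println(processGuarded("/servers", failed).isPresent());
        System.out.println(processGuarded("/servers", ok).get());
    }
}
```

Returning an empty Optional rather than an empty list keeps "the operation failed" distinct from "the node has no children", so a cache built on it would not wrongly drop its entries after a connection loss.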


Re: curator-2.4.0 cannot recover connection loss

Posted by Osman <os...@gmail.com>.
You may use the attached one.

Regards.


On 10 April 2014 18:54, Bae, Jae Hyeon <me...@gmail.com> wrote:

> Could you send me your test code to test in my environment?
>
> Thank you
> Best, Jae
>
>
> On Thu, Apr 10, 2014 at 9:11 AM, Osman <os...@gmail.com> wrote:
>
>> Hi Jae;
>>
>> Just letting you know that, using ZooKeeper 3.4.6 and Curator 2.4.1, I
>> could not reproduce your case in my environment.
>> It would be nice if I could see this problem in my environment. How can I
>> reproduce it?
>>
>> After starting the application (using PathChildrenCacheListener), I
>> stopped ZooKeeper and restarted it 40 seconds later.
>> The application switched to the RECONNECTED state after the SUSPENDED
>> state, reporting ConnectionLoss.
>> (After 30 minutes of checking the logs, it had not gone back to the
>> SUSPENDED state; it was still connected and listening for child node changes.)
>>
>> java.io.IOException: An existing connection was forcibly closed by the remote host
>> 08:40:34.464 [main-EventThread] INFO  o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
>> 08:40:34.473 [PathChildrenCache-0] ERROR o.a.c.f.r.cache.PathChildrenCache -
>> 08:40:40.198 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState - Connection attempt unsuccessful after 2000 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
>> 08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
>> 08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
>> 08:40:42.344 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState - Connection attempt unsuccessful after 2146 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
>> 08:40:42.344 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState - reset
>> 08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
>> 08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
>> 08:40:42.403 [CuratorFramework-0-SendThread(127.0.0.1:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
>> 08:40:42.409 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background operation retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>> 08:40:42.410 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background retry gave up
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>> 08:40:46.920 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState - Connection attempt unsuccessful after 1389 (greater than max timeout of 500). Resetting connection and trying again with a new connection.
>> 08:40:46.920 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState - reset
>> 08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper - Closing session: 0x0
>> 08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn - Closing client for session: 0x0
>> 08:41:14.303 [CuratorFramework-0-SendThread(0:0:0:0:0:0:0:1:2181)] DEBUG o.a.zookeeper.ClientCnxnSocketNIO - Ignoring exception during shutdown input
>> java.net.SocketException: Socket is not connected
>>
>> Then, after the ZooKeeper instance starts, the PathChildrenCache continues
>> to get updated:
>> 08:41:15.804 [CuratorFramework-0-EventThread] INFO  o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
>>
>> Regards.
>>
>> --
>> Osman Sebati Çam
>>
>> https://twitter.com/osmanscam <https://twitter.com/#!/osmanscam>
>> http://osmanscam.blogspot.ie
>>
>>
>>
>>
>


-- 
Osman Sebati Çam

https://twitter.com/osmanscam <https://twitter.com/#!/osmanscam>
http://osmanscam.blogspot.ie

Re: curator-2.4.0 cannot recover connection loss

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.
Could you send me your test code to test in my environment?

Thank you
Best, Jae


Re: curator-2.4.0 cannot recover connection loss

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.
Hi Osman

Thank you for testing. If I can reproduce this problem, I will test the
ZooKeeper 3.4.6 and Curator 2.4.1 combination.


>>         at
>> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:783)
>>         at
>> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>>         at
>> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>>         at
>> org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>>         at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:724)
>>
>> I am not sure this bug is on PathChildrenCache.
>>
>> I need to restart all instances using curator-2.4.0, which is really bad.
>>
>> Thank you
>> Best, Jae
>>
>
>
>
> --
> Osman Sebati Çam
>
> https://twitter.com/osmanscam <https://twitter.com/#!/osmanscam>
> http://osmanscam.blogspot.ie
>
>
>
>

Re: curator-2.4.0 cannot recover connection loss

Posted by Osman <os...@gmail.com>.
Hi Jae,

Just letting you know that, using ZooKeeper 3.4.6 and Curator 2.4.1, I
could not reproduce your case in my environment.
I would like to see this problem in my environment. How can I reproduce
it?

After starting the application (which uses a PathChildrenCacheListener), I
stopped ZooKeeper and restarted it 40 seconds later.
The application switched to the RECONNECTED state after the SUSPENDED
state, reporting ConnectionLoss.
(After 30 minutes of checking the logs, it had not gone back to the
SUSPENDED state; it was still connected and listening for child node
changes.)

java.io.IOException: An existing connection was forcibly closed by the
remote host
08:40:34.464 [main-EventThread] INFO  o.a.c.f.state.ConnectionStateManager
- State change: SUSPENDED
08:40:34.473 [PathChildrenCache-0] ERROR o.a.c.f.r.cache.PathChildrenCache
-
08:40:40.198 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState
- Connection attempt unsuccessful after 2000 (greater than max timeout of
500). Resetting connection and trying again with a new connection.
08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper -
Closing session: 0x0
08:40:40.198 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn -
Closing client for session: 0x0
08:40:42.344 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState
- Connection attempt unsuccessful after 2146 (greater than max timeout of
500). Resetting connection and trying again with a new connection.
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState
- reset
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper -
Closing session: 0x0
08:40:42.344 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn -
Closing client for session: 0x0
08:40:42.403 [CuratorFramework-0-SendThread(127.0.0.1:2181)] INFO
 org.apache.zookeeper.ClientCnxn - Opening socket connection to server
127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL
(unknown error)
08:40:42.409 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl -
Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
08:40:42.410 [CuratorFramework-0] ERROR o.a.c.f.imps.CuratorFrameworkImpl -
Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
08:40:46.920 [CuratorFramework-0] WARN  org.apache.curator.ConnectionState
- Connection attempt unsuccessful after 1389 (greater than max timeout of
500). Resetting connection and trying again with a new connection.
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.curator.ConnectionState
- reset
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ZooKeeper -
Closing session: 0x0
08:40:46.920 [CuratorFramework-0] DEBUG org.apache.zookeeper.ClientCnxn -
Closing client for session: 0x0
08:41:14.303 [CuratorFramework-0-SendThread(0:0:0:0:0:0:0:1:2181)] DEBUG
o.a.zookeeper.ClientCnxnSocketNIO - Ignoring exception during shutdown input
java.net.SocketException: Socket is not connected

Then, after the ZooKeeper instance was started again, the
PathChildrenCache continued to get updated:
08:41:15.804 [CuratorFramework-0-EventThread] INFO
 o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
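
The manual log check described above (did the client end up connected, or
did it fall back to SUSPENDED?) could be sketched roughly as follows. This
is only an illustration using plain JDK classes, not part of Curator: the
class and method names (StateChangeCheck, extractStates, endsConnected) are
hypothetical, and it assumes the log text is unquoted and contains the
"State change: <STATE>" message exactly as printed by ConnectionStateManager
in the excerpts in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning a client log; not a Curator API.
class StateChangeCheck {
    // The state name may be wrapped onto the next line, so allow any whitespace.
    private static final Pattern STATE_CHANGE =
            Pattern.compile("State change:\\s+([A-Z_]+)");

    /** Returns the logged connection states in the order they appear. */
    public static List<String> extractStates(String log) {
        List<String> states = new ArrayList<>();
        Matcher m = STATE_CHANGE.matcher(log);
        while (m.find()) {
            states.add(m.group(1));
        }
        return states;
    }

    /** True if at least one state was logged and the last one is CONNECTED or RECONNECTED. */
    public static boolean endsConnected(List<String> states) {
        if (states.isEmpty()) {
            return false;
        }
        String last = states.get(states.size() - 1);
        return last.equals("CONNECTED") || last.equals("RECONNECTED");
    }

    public static void main(String[] args) {
        // Two state-change entries copied from the log excerpt above.
        String log = String.join("\n",
                "08:40:34.464 [main-EventThread] INFO  o.a.c.f.state.ConnectionStateManager",
                "- State change: SUSPENDED",
                "08:41:15.804 [CuratorFramework-0-EventThread] INFO",
                " o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED");
        List<String> states = extractStates(log);
        System.out.println(states + " endsConnected=" + endsConnected(states));
    }
}
```

For the sequence in the original report (RECONNECTED followed by SUSPENDED),
the same check returns false, which is the difference between the two
environments as far as the logs show.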

Regards.

On 9 April 2014 18:55, Bae, Jae Hyeon <me...@gmail.com> wrote:

> Last night, I rolling-restarted zookeeper 3.4.5 to update configuration
> and I saw curator-2.4.0 couldn't recover connection loss.
>
> ERROR 2014-04-09 17:48:15,231 [DaemonThreadFactory-2-thread-2]
> org.apache.curator.framework.imps.CuratorFrameworkImpl: Background retry
> gave up
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:766)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
> INFO  2014-04-09 17:48:15,276 [ServerInventoryView-0-EventThread]
> org.apache.curator.framework.state.ConnectionStateManager: State change:
> RECONNECTED
> INFO  2014-04-09 17:48:15,382 [ServerInventoryView-0-EventThread]
> org.apache.curator.framework.state.ConnectionStateManager: State change:
> SUSPENDED
> ERROR 2014-04-09 17:48:15,748 [DaemonThreadFactory-2-thread-2]
> org.apache.curator.framework.imps.CuratorFrameworkImpl: Background
> exception was not retry-able or retry gave up
> java.lang.NullPointerException
>         at
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
>         at
> com.google.common.collect.Lists$TransformingSequentialList.<init>(Lists.java:527)
>         at com.google.common.collect.Lists.transform(Lists.java:510)
>         at
> org.apache.curator.framework.recipes.cache.PathChildrenCache.processChildren(PathChildrenCache.java:635)
>         at
> org.apache.curator.framework.recipes.cache.PathChildrenCache.access$200(PathChildrenCache.java:68)
>         at
> org.apache.curator.framework.recipes.cache.PathChildrenCache$4.processResult(PathChildrenCache.java:476)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:686)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:659)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:783)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
> I am not sure this bug is on PathChildrenCache.
>
> I need to restart all instances using curator-2.4.0, which is really bad.
>
> Thank you
> Best, Jae
>



-- 
Osman Sebati Çam

https://twitter.com/osmanscam <https://twitter.com/#!/osmanscam>
http://osmanscam.blogspot.ie