Posted to user@zookeeper.apache.org by John Lindwall <jl...@yahoo.com.INVALID> on 2017/06/20 22:14:45 UTC

Client hangs waiting for connection

We are seeing some occasional incidents where a zookeeper java client 
will hang in CountDownLatch.await() while waiting for a connection to be 
established.  Our connect() code is pretty standard, I think, and it is 
similar to this:

     private ZooKeeper connect(String hosts, int sessionTimeout)
             throws IOException, InterruptedException {
         final CountDownLatch connectedSignal = new CountDownLatch(1);

         ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
             @Override
             public void process(WatchedEvent event) {
                 if (event.getState() == Event.KeeperState.SyncConnected) {
                     connectedSignal.countDown();
                 }
             }
         });

         connectedSignal.await();
         return zk;
     }

Has anyone else had an issue with the await() blocking forever like 
this?  Any advice?

As a "fix" I am considering adding a timeout to the CountDownLatch 
await() call; if we fail to connect within that timeout then retry the 
connection attempt. After, say, 3 retries, give up entirely.
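The timeout-and-retry idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from this thread: the BoundedConnect class and its Connector and Closer interfaces are hypothetical stand-ins so that the sketch compiles without the ZooKeeper jar. In real use the connector would call new ZooKeeper(hosts, sessionTimeout, watcher) with a watcher that counts the latch down on SyncConnected, and the closer would call close() on the client.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch: bounded wait for a "connected" signal with a limited number of retries. */
class BoundedConnect {

    /** Hypothetical: opens a client whose watcher counts the latch down once connected. */
    interface Connector<T> {
        T open(CountDownLatch connectedSignal) throws IOException;
    }

    /** Hypothetical cleanup hook, e.g. zk -> zk.close() for a ZooKeeper client. */
    interface Closer<T> {
        void close(T handle) throws InterruptedException;
    }

    static <T> T connectWithRetry(Connector<T> connector, Closer<T> closer,
                                  int maxAttempts, long timeoutMs)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CountDownLatch connectedSignal = new CountDownLatch(1);
            T handle = connector.open(connectedSignal);
            // await(timeout, unit) returns false if the latch was not counted down in time.
            if (connectedSignal.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return handle;
            }
            // Release the unconnected client before retrying so its threads do not leak.
            closer.close(handle);
        }
        throw new IOException("not connected after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        // Stand-in connector that only "connects" on the third attempt.
        int[] calls = {0};
        String handle = connectWithRetry(
                latch -> {
                    if (++calls[0] == 3) {
                        latch.countDown();
                    }
                    return "client-" + calls[0];
                },
                h -> { },
                3, 50);
        System.out.println(handle + " after " + calls[0] + " attempts");
    }
}
```

Giving up with an exception after the last attempt turns a silent hang into a visible failure that the caller can log or alert on.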

Thanks!
-- 
John Lindwall


Re: Client hangs waiting for connection

Posted by Dan Benediktson <db...@twitter.com.INVALID>.
Have you tried setting up log4j logging on the client application at a
moderately high verbosity level to see whether it is actually doing any work
under the hood while it appears to hang? ZOOKEEPER-2471
<https://issues.apache.org/jira/browse/ZOOKEEPER-2471> caused basically the
symptoms you describe for me. You need the right set of conditions to hit
that bug, so it's unlikely to be what you've got, but it's at least worth
checking whether there's anything to be dredged out of the logs; the ZK
client's logging is pretty thorough.
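
A minimal sketch of such a logging setup, assuming the client application uses log4j 1.2 and loads a log4j.properties file from its classpath. The appender name and output pattern are illustrative choices, not something taken from this thread:

```properties
# Keep everything else at INFO and send output to the console.
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n

# Raise only the ZooKeeper client packages to DEBUG.
log4j.logger.org.apache.zookeeper=DEBUG
```

Including the thread name (%t) in the pattern makes it easier to tell the client's background threads apart from the application thread that is blocked on the latch.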

On Tue, Jun 20, 2017 at 10:42 PM, Abraham Fine <af...@apache.org> wrote:

> Would it be possible to include the rest of the jstack? It appears that
> this is just the thread waiting on the latch, which doesn't tell us why the
> latch has not been counted down. Also, did ZK produce any interesting
> logs?
>
> Thanks,
> Abe

Re: Client hangs waiting for connection

Posted by John Lindwall <jl...@yahoo.com.INVALID>.
The problem has been solved; closing the loop here.

ZooKeeper was behaving properly.  A configuration issue caused the code 
to try opening a connection to a ZooKeeper server that a firewall did not 
permit it to reach, so the session never reached SyncConnected and the 
latch was never counted down.
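
A misconfiguration like this can be caught before the client is created with a plain TCP reachability check. The following is a rough sketch under stated assumptions: the ConnectStringCheck class and its method names are hypothetical, the connect string is assumed to be comma-separated host:port entries with an optional chroot suffix, port 2181 is assumed when none is given, and IPv6 literals are not handled. A successful TCP connect only shows that the port is reachable, not that the server is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Sketch: verify that each server in a connect string accepts TCP connections. */
class ConnectStringCheck {

    /** True if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out (for example dropped by a firewall), or unresolvable host.
            return false;
        }
    }

    /** Returns the entries of "host1:port1,host2:port2[/chroot]" that cannot be reached. */
    static List<String> unreachable(String connectString, int timeoutMs) {
        // Drop an optional chroot suffix such as "/app".
        int slash = connectString.indexOf('/');
        String servers = slash >= 0 ? connectString.substring(0, slash) : connectString;
        List<String> failed = new ArrayList<>();
        for (String entry : servers.split(",")) {
            String[] hostPort = entry.trim().split(":");
            // Assumption: fall back to port 2181 when the entry names no port.
            int port = hostPort.length > 1 ? Integer.parseInt(hostPort[1]) : 2181;
            if (!isReachable(hostPort[0], port, timeoutMs)) {
                failed.add(entry.trim());
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        // A local listening socket stands in for a reachable server.
        try (ServerSocket server = new ServerSocket(0)) {
            String hosts = "127.0.0.1:" + server.getLocalPort();
            System.out.println("unreachable: " + unreachable(hosts, 500));
        }
    }
}
```

Logging the unreachable entries at startup points directly at a network or configuration problem instead of leaving a thread parked on the latch.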

Thanks to everyone for chiming in!
John

Abraham Fine wrote:
> Would it be possible to include the rest of the jstack? It appears that
> this is just the thread waiting on the latch, which doesn't tell us why the
> latch has not been counted down. Also, did ZK produce any interesting
> logs?
>
> Thanks,
> Abe

-- 
John Lindwall


Re: Client hangs waiting for connection

Posted by Abraham Fine <af...@apache.org>.
Would it be possible to include the rest of the jstack? It appears that
this is just the thread waiting on the latch, which doesn't tell us why the
latch has not been counted down. Also, did ZK produce any interesting
logs?

Thanks,
Abe

On Tue, Jun 20, 2017, at 17:23, John Lindwall wrote:
> Thanks for the reply! I forgot to include the thread dump that I have 
> collected.  This process has been hung for almost a day so I'm guessing 
> it'll never connect properly ;-)  We actually had 2 such processes hung 
> today with the same stack trace (at least the same root cause as I show 
> below).  Please note that this problem is rare but supremely not good 
> when it does happen if we fail to detect it. We've been running this 
> code for many months now and this issue has only recently occurred.
> 
> Thread 4396: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=72, line=994 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1303 (Interpreted frame)
>  - java.util.concurrent.CountDownLatch.await() @bci=5, line=236 (Interpreted frame)
>  - com.mycode.ZooKeeperFactory.connect(java.lang.String, int) @bci=34, line=59 (Interpreted frame)
>  ...
> [remainder of stack trace omitted]
> 
> John

Re: Client hangs waiting for connection

Posted by John Lindwall <jl...@yahoo.com.INVALID>.
Thanks for the reply! I forgot to include the thread dump that I have 
collected.  This process has been hung for almost a day so I'm guessing 
it'll never connect properly ;-)  We actually had 2 such processes hung 
today with the same stack trace (at least the same root cause as I show 
below).  Please note that this problem is rare but supremely not good 
when it does happen if we fail to detect it. We've been running this 
code for many months now and this issue has only recently occurred.

Thread 4396: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=72, line=994 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1303 (Interpreted frame)
 - java.util.concurrent.CountDownLatch.await() @bci=5, line=236 (Interpreted frame)
 - com.mycode.ZooKeeperFactory.connect(java.lang.String, int) @bci=34, line=59 (Interpreted frame)
 ...
[remainder of stack trace omitted]

John


Michael Han wrote:
> Sounds like a deadlock in the client library. One idea is to instrument your
> client code and dump the thread stacks when the wait times out. The dump
> will hopefully contain the states of the various threads and provide some
> insights on what to look for next.

-- 
John Lindwall


Re: Client hangs waiting for connection

Posted by Michael Han <ha...@cloudera.com>.
Sounds like a deadlock in the client library. One idea is to instrument your
client code and dump the thread stacks when the wait times out. The dump
will hopefully contain the states of the various threads and provide some
insights on what to look for next.
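
One way to do that instrumentation is sketched below, using only the JDK's ThreadMXBean. The LatchDiagnostics class and its method names are hypothetical and the timeout value is arbitrary. In a real client the latch would be the one counted down by the connection watcher, and the dump would also show the ZooKeeper client's own background threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch: wait on a latch with a timeout and dump all thread stacks if it expires. */
class LatchDiagnostics {

    /** Returns a text dump of every live thread: name, state, and stack frames. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" state=")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Waits on the latch; if the wait times out, prints a full thread dump to stderr. */
    static boolean awaitOrDump(CountDownLatch latch, long timeoutMs) throws InterruptedException {
        boolean connected = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        if (!connected) {
            System.err.println("Timed out after " + timeoutMs + " ms; thread dump follows:");
            System.err.print(dumpAllThreads());
        }
        return connected;
    }

    public static void main(String[] args) throws InterruptedException {
        // A latch that nobody counts down simulates the hung connect().
        System.out.println("connected = " + awaitOrDump(new CountDownLatch(1), 100));
    }
}
```

Unlike a single stack captured from the blocked thread, this records every thread at the moment of the timeout, which is what is needed to see why the latch was never counted down.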

On Tue, Jun 20, 2017 at 3:14 PM, John Lindwall <jl...@yahoo.com.invalid>
wrote:

> We are seeing some occasional incidents where a zookeeper java client will
> hang in CountDownLatch.await() while waiting for a connection to be
> established.  Our connect() code is pretty standard, I think, and it is
> similar to this:
>
>     private ZooKeeper connect(String hosts, int sessionTimeout)
>             throws IOException, InterruptedException {
>         final CountDownLatch connectedSignal = new CountDownLatch(1);
>
>         ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
>             @Override
>             public void process(WatchedEvent event) {
>                 if (event.getState() == Event.KeeperState.SyncConnected) {
>                     connectedSignal.countDown();
>                 }
>             }
>         });
>
>         connectedSignal.await();
>         return zk;
>     }
>
> Has anyone else had an issue with the await() blocking forever like this?
> Any advice?
>
> As a "fix" I am considering adding a timeout to the CountDownLatch await()
> call; if we fail to connect within that timeout then retry the connection
> attempt. After, say, 3 retries, give up entirely.
>
> Thanks!
> --
> John Lindwall
>
>


-- 
Cheers
Michael.