You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by karthick rn <ka...@gmail.com> on 2020/03/30 23:11:52 UTC

Zookeeper & Netty framework

Hello dev team,

We are using Hadoop, Accumulo & Zookeeper in our environment, after
enabling TLS for ZK we noticed that starting Accumulo master service hangs
in an intermediate process as shown below and require to kill the process
in-order for Accumulo master to start.

[user1@host1 ~]$ jps -m

23314 JournalNode

23011 NameNode

23539 DFSZKFailoverController

*84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*

22590 QuorumPeerMain

89790 Jps -m


[user1@host1 ~]$ *kill -9 84118*

[user1@host1 ~]$ jps -m

23314 JournalNode

23011 NameNode

23539 DFSZKFailoverController

89892 Jps -m

*89847 Main master*

22590 QuorumPeerMain

[user1@host1 ~]$

Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
while the rest are daemon threads. Would like to check with the dev team if
"nioEventLoopGroup" threads are expected to be non-daemon? If so, any
thoughts on what else might be causing the issue?
I have copied only a portion of the jstack output, let me know in-case you
need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop 3.2.1 &
Accumulo 2.0. Let me know if you need any further details? Many thanks

"org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)" #25
daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s tid=0x0000000003a10800
nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for  <0x000000070f0acf38> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
/LockSupport.java:234)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
/AbstractQueuedSynchronizer.java:2123)
at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
/LinkedBlockingDeque.java:513)
at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
/LinkedBlockingDeque.java:675)
at
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)

   Locked ownable synchronizers:
- None

"org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for  <0x000000070f0f8af0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
/LockSupport.java:194)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
/LinkedBlockingQueue.java:433)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)

   Locked ownable synchronizers:
- None

"*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms elapsed=95.28s
tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
/EPollSelectorImpl.java:120)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
/SelectorImpl.java:124)
- locked <0x000000070f079a28> (a
io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
at
io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)

   Locked ownable synchronizers:
- None

"org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)" #28
daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800
nid=0x16259 waiting on condition  [0x00007f5c7b145000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for  <0x000000070f2acbc0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
/LockSupport.java:234)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
/AbstractQueuedSynchronizer.java:2123)
at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
/LinkedBlockingDeque.java:513)
at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
/LinkedBlockingDeque.java:675)
at
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)

   Locked ownable synchronizers:
- None

"org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
nid=0x1625a waiting on condition  [0x00007f5c7b044000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for  <0x000000070f2adff8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
/LockSupport.java:194)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
/LinkedBlockingQueue.java:433)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)

   Locked ownable synchronizers:
- None

"*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms elapsed=94.30s
tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
/EPollSelectorImpl.java:120)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
/SelectorImpl.java:124)
- locked <0x000000070f2ab868> (a
io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
at
io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)

   Locked ownable synchronizers:
- None

Re: Zookeeper & Netty framework

Posted by karthick rn <ka...@gmail.com>.
Christopher, I have created an issue (
https://github.com/apache/accumulo/issues/1578) on the github page. Fyi

Thanks,
Karthick

On Wed, 1 Apr 2020 at 11:58, karthick rn <ka...@gmail.com>
wrote:

> Hi Christopher, I have talked through this issue with Keith internally but
> haven't raised an official channel for discussion, because I suspected the
> issue could be related to ZK / Netty framework as only after enabling TLS
> we are seeing this.
> May be I should have opened an issue on Accumulo git first. I'm doing this
> now.
>
> Thanks,
> Karthick
>
> On Wed, 1 Apr 2020 at 08:51, Christopher <ct...@apache.org> wrote:
>
>> Karthick, I haven't seen this discussed in the Accumulo community. Can you
>> point me to the conversation there?
>>
>> On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:
>>
>> > Why would they need to be daemon threads?
>> > I’m not an expert of Java threading, but afaik I/O threads should not be
>> > daemon threads in most cases.
>> >
>> > Also those threads are Netty internal threads, so this question is
>> better
>> > to be asked in Netty community.
>> > ZK threads reported in jstack are just waiting for input to
>> send/receive.
>> > Do you know at which point Accumulo does stuck?
>> >
>> > Andor
>> >
>> >
>> >
>> > > On 2020. Mar 31., at 14:27, karthick rn <karthick.narendran@gmail.com
>> >
>> > wrote:
>> > >
>> > > Hi Enrico,
>> > >
>> > > Yes, I have already run this through Accumulo folks they have looked
>> at
>> > the
>> > > jstack output & advised to check with ZK devs if those 2 threads (#27
>> &
>> > > #30) are expected to be non-daemon threads.
>> > > Also, in this cluster we have wire encryption enabled only for ZK and
>> by
>> > > disabling it we don't encounter this issue.
>> > > There are no error messages reported on the ZK server log, below are
>> the
>> > > INFO messages when running the "accumulo-service master start" command
>> > >
>> > > 2020-03-31 12:16:28,626 [myid:2] - INFO
>> > > [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] -
>> Authenticated
>> > Id
>> > > 'CN=host2' for Scheme 'x509'
>> > >
>> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
>> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet
>> /<host2
>> > > IP>:46332
>> > >
>> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
>> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
>> > > IP>:46332
>> > >
>> > > This issue is reproducible everytime I start Accumulo master. Let me
>> know
>> > > for any further details?
>> > >
>> > > Many thanks
>> > >
>> > > Regards,
>> > > Karthick
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eo...@gmail.com>
>> > wrote:
>> > >
>> > >> Hi,
>> > >> Did you check with Accumulo community?
>> > >> Do you see errors or informational messages in ZK server logs?
>> > >>
>> > >> Enrico
>> > >>
>> > >> Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com>
>> ha
>> > >> scritto:
>> > >>
>> > >>> Hello dev team,
>> > >>>
>> > >>> We are using Hadoop, Accumulo & Zookeeper in our environment, after
>> > >>> enabling TLS for ZK we noticed that starting Accumulo master service
>> > >> hangs
>> > >>> in an intermediate process as shown below and require to kill the
>> > process
>> > >>> in-order for Accumulo master to start.
>> > >>>
>> > >>> [user1@host1 ~]$ jps -m
>> > >>>
>> > >>> 23314 JournalNode
>> > >>>
>> > >>> 23011 NameNode
>> > >>>
>> > >>> 23539 DFSZKFailoverController
>> > >>>
>> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
>> > >>>
>> > >>> 22590 QuorumPeerMain
>> > >>>
>> > >>> 89790 Jps -m
>> > >>>
>> > >>>
>> > >>> [user1@host1 ~]$ *kill -9 84118*
>> > >>>
>> > >>> [user1@host1 ~]$ jps -m
>> > >>>
>> > >>> 23314 JournalNode
>> > >>>
>> > >>> 23011 NameNode
>> > >>>
>> > >>> 23539 DFSZKFailoverController
>> > >>>
>> > >>> 89892 Jps -m
>> > >>>
>> > >>> *89847 Main master*
>> > >>>
>> > >>> 22590 QuorumPeerMain
>> > >>>
>> > >>> [user1@host1 ~]$
>> > >>>
>> > >>> Jstack collected during the hang shows 2 non-daemon threads (#27 &
>> #30)
>> > >>> while the rest are daemon threads. Would like to check with the dev
>> > team
>> > >> if
>> > >>> "nioEventLoopGroup" threads are expected to be non-daemon? If so,
>> any
>> > >>> thoughts on what else might be causing the issue?
>> > >>> I have copied only a portion of the jstack output, let me know
>> in-case
>> > >> you
>> > >>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
>> > >> 3.2.1 &
>> > >>> Accumulo 2.0. Let me know if you need any further details? Many
>> thanks
>> > >>>
>> > >>>
>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
>> > >> #25
>> > >>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
>> > >> tid=0x0000000003a10800
>> > >>> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
>> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
>> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>> - parking to wait for  <0x000000070f0acf38> (a
>> > >>>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
>> > >>> /LockSupport.java:234)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
>> > >>> /AbstractQueuedSynchronizer.java:2123)
>> > >>> at
>> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
>> > >>> /LinkedBlockingDeque.java:513)
>> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
>> > >>> /LinkedBlockingDeque.java:675)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>> > >>> at
>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26
>> daemon
>> > >>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
>> > >>> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
>> > >>>   java.lang.Thread.State: WAITING (parking)
>> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>> - parking to wait for  <0x000000070f0f8af0> (a
>> > >>>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
>> > >>> /LockSupport.java:194)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
>> > >>> /AbstractQueuedSynchronizer.java:2081)
>> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
>> > >>> /LinkedBlockingQueue.java:433)
>> > >>> at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
>> > >> elapsed=95.28s
>> > >>> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
>> > >>>   java.lang.Thread.State: RUNNABLE
>> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
>> > >>> /EPollSelectorImpl.java:120)
>> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
>> > >>> /SelectorImpl.java:124)
>> > >>> - locked <0x000000070f079a28> (a
>> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
>> > >>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
>> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
>> > >> /SelectorImpl.java:141)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>> > >>> at
>> > >>>
>> >
>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>>
>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
>> > >> #28
>> > >>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s
>> > tid=0x0000000005942800
>> > >>> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
>> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
>> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>> - parking to wait for  <0x000000070f2acbc0> (a
>> > >>>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
>> > >>> /LockSupport.java:234)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
>> > >>> /AbstractQueuedSynchronizer.java:2123)
>> > >>> at
>> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
>> > >>> /LinkedBlockingDeque.java:513)
>> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
>> > >>> /LinkedBlockingDeque.java:675)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>> > >>> at
>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29
>> daemon
>> > >>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
>> > >>> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
>> > >>>   java.lang.Thread.State: WAITING (parking)
>> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>> - parking to wait for  <0x000000070f2adff8> (a
>> > >>>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
>> > >>> /LockSupport.java:194)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
>> > >>> /AbstractQueuedSynchronizer.java:2081)
>> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
>> > >>> /LinkedBlockingQueue.java:433)
>> > >>> at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
>> > >> elapsed=94.30s
>> > >>> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
>> > >>>   java.lang.Thread.State: RUNNABLE
>> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
>> > >>> /EPollSelectorImpl.java:120)
>> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
>> > >>> /SelectorImpl.java:124)
>> > >>> - locked <0x000000070f2ab868> (a
>> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
>> > >>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
>> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
>> > >> /SelectorImpl.java:141)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>> > >>> at
>> > >>>
>> >
>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>> > >>> at
>> > >>>
>> > >>>
>> > >>
>> >
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>> > >>>
>> > >>>   Locked ownable synchronizers:
>> > >>> - None
>> > >>>
>> > >>
>> >
>> >
>>
>

Re: Zookeeper & Netty framework

Posted by karthick rn <ka...@gmail.com>.
Hi Christopher, I have talked through this issue with Keith internally but
haven't raised an official channel for discussion, because I suspected the
issue could be related to ZK / Netty framework as only after enabling TLS
we are seeing this.
May be I should have opened an issue on Accumulo git first. I'm doing this
now.

Thanks,
Karthick

On Wed, 1 Apr 2020 at 08:51, Christopher <ct...@apache.org> wrote:

> Karthick, I haven't seen this discussed in the Accumulo community. Can you
> point me to the conversation there?
>
> On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:
>
> > Why would they need to be daemon threads?
> > I’m not an expert of Java threading, but afaik I/O threads should not be
> > daemon threads in most cases.
> >
> > Also those threads are Netty internal threads, so this question is better
> > to be asked in Netty community.
> > ZK threads reported in jstack are just waiting for input to send/receive.
> > Do you know at which point Accumulo does stuck?
> >
> > Andor
> >
> >
> >
> > > On 2020. Mar 31., at 14:27, karthick rn <ka...@gmail.com>
> > wrote:
> > >
> > > Hi Enrico,
> > >
> > > Yes, I have already run this through Accumulo folks they have looked at
> > the
> > > jstack output & advised to check with ZK devs if those 2 threads (#27 &
> > > #30) are expected to be non-daemon threads.
> > > Also, in this cluster we have wire encryption enabled only for ZK and
> by
> > > disabling it we don't encounter this issue.
> > > There are no error messages reported on the ZK server log, below are
> the
> > > INFO messages when running the "accumulo-service master start" command
> > >
> > > 2020-03-31 12:16:28,626 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated
> > Id
> > > 'CN=host2' for Scheme 'x509'
> > >
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
> > > IP>:46332
> > >
> > > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > > [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
> > > IP>:46332
> > >
> > > This issue is reproducible everytime I start Accumulo master. Let me
> know
> > > for any further details?
> > >
> > > Many thanks
> > >
> > > Regards,
> > > Karthick
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> > >
> > >> Hi,
> > >> Did you check with Accumulo community?
> > >> Do you see errors or informational messages in ZK server logs?
> > >>
> > >> Enrico
> > >>
> > >> Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com>
> ha
> > >> scritto:
> > >>
> > >>> Hello dev team,
> > >>>
> > >>> We are using Hadoop, Accumulo & Zookeeper in our environment, after
> > >>> enabling TLS for ZK we noticed that starting Accumulo master service
> > >> hangs
> > >>> in an intermediate process as shown below and require to kill the
> > process
> > >>> in-order for Accumulo master to start.
> > >>>
> > >>> [user1@host1 ~]$ jps -m
> > >>>
> > >>> 23314 JournalNode
> > >>>
> > >>> 23011 NameNode
> > >>>
> > >>> 23539 DFSZKFailoverController
> > >>>
> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> > >>>
> > >>> 22590 QuorumPeerMain
> > >>>
> > >>> 89790 Jps -m
> > >>>
> > >>>
> > >>> [user1@host1 ~]$ *kill -9 84118*
> > >>>
> > >>> [user1@host1 ~]$ jps -m
> > >>>
> > >>> 23314 JournalNode
> > >>>
> > >>> 23011 NameNode
> > >>>
> > >>> 23539 DFSZKFailoverController
> > >>>
> > >>> 89892 Jps -m
> > >>>
> > >>> *89847 Main master*
> > >>>
> > >>> 22590 QuorumPeerMain
> > >>>
> > >>> [user1@host1 ~]$
> > >>>
> > >>> Jstack collected during the hang shows 2 non-daemon threads (#27 &
> #30)
> > >>> while the rest are daemon threads. Would like to check with the dev
> > team
> > >> if
> > >>> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> > >>> thoughts on what else might be causing the issue?
> > >>> I have copied only a portion of the jstack output, let me know
> in-case
> > >> you
> > >>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
> > >> 3.2.1 &
> > >>> Accumulo 2.0. Let me know if you need any further details? Many
> thanks
> > >>>
> > >>>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
> > >> #25
> > >>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
> > >> tid=0x0000000003a10800
> > >>> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f0acf38> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > >>> /LockSupport.java:234)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2123)
> > >>> at
> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:513)
> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:675)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26
> daemon
> > >>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> > >>> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f0f8af0> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > >>> /LockSupport.java:194)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2081)
> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > >>> /LinkedBlockingQueue.java:433)
> > >>> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
> > >> elapsed=95.28s
> > >>> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
> > >>>   java.lang.Thread.State: RUNNABLE
> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > >>> /EPollSelectorImpl.java:120)
> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > >>> /SelectorImpl.java:124)
> > >>> - locked <0x000000070f079a28> (a
> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
> > >>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> > >> /SelectorImpl.java:141)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>> at
> > >>>
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
> > >> #28
> > >>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s
> > tid=0x0000000005942800
> > >>> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
> > >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f2acbc0> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > >>> /LockSupport.java:234)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2123)
> > >>> at
> java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:513)
> > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > >>> /LinkedBlockingDeque.java:675)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > >>> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29
> daemon
> > >>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> > >>> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
> > >>>   java.lang.Thread.State: WAITING (parking)
> > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > >>> - parking to wait for  <0x000000070f2adff8> (a
> > >>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > >>> /LockSupport.java:194)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > >>> /AbstractQueuedSynchronizer.java:2081)
> > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > >>> /LinkedBlockingQueue.java:433)
> > >>> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
> > >> elapsed=94.30s
> > >>> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
> > >>>   java.lang.Thread.State: RUNNABLE
> > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > >>> /EPollSelectorImpl.java:120)
> > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > >>> /SelectorImpl.java:124)
> > >>> - locked <0x000000070f2ab868> (a
> > >>> io.netty.channel.nio.SelectedSelectionKeySet)
> > >>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> > >> /SelectorImpl.java:141)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > >>> at
> > >>>
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > >>> at
> > >>>
> > >>>
> > >>
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> > >>>
> > >>>   Locked ownable synchronizers:
> > >>> - None
> > >>>
> > >>
> >
> >
>

Re: Zookeeper & Netty framework

Posted by Christopher <ct...@apache.org>.
Karthick, I haven't seen this discussed in the Accumulo community. Can you
point me to the conversation there?

On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:

> Why would they need to be daemon threads?
> I’m not an expert of Java threading, but afaik I/O threads should not be
> daemon threads in most cases.
>
> Also those threads are Netty internal threads, so this question is better
> to be asked in Netty community.
> ZK threads reported in jstack are just waiting for input to send/receive.
> Do you know at which point Accumulo does stuck?
>
> Andor
>
>
>
> > On 2020. Mar 31., at 14:27, karthick rn <ka...@gmail.com>
> wrote:
> >
> > Hi Enrico,
> >
> > Yes, I have already run this through Accumulo folks they have looked at
> the
> > jstack output & advised to check with ZK devs if those 2 threads (#27 &
> > #30) are expected to be non-daemon threads.
> > Also, in this cluster we have wire encryption enabled only for ZK and by
> > disabling it we don't encounter this issue.
> > There are no error messages reported on the ZK server log, below are the
> > INFO messages when running the "accumulo-service master start" command
> >
> > 2020-03-31 12:16:28,626 [myid:2] - INFO
> > [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated
> Id
> > 'CN=host2' for Scheme 'x509'
> >
> > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
> > IP>:46332
> >
> > 2020-03-31 12:16:28,676 [myid:2] - INFO
> > [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
> > IP>:46332
> >
> > This issue is reproducible everytime I start Accumulo master. Let me know
> > for any further details?
> >
> > Many thanks
> >
> > Regards,
> > Karthick
> >
> >
> >
> >
> >
> >
> >
> > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eo...@gmail.com>
> wrote:
> >
> >> Hi,
> >> Did you check with Accumulo community?
> >> Do you see errors or informational messages in ZK server logs?
> >>
> >> Enrico
> >>
> >> Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com> ha
> >> scritto:
> >>
> >>> Hello dev team,
> >>>
> >>> We are using Hadoop, Accumulo & Zookeeper in our environment, after
> >>> enabling TLS for ZK we noticed that starting Accumulo master service
> >> hangs
> >>> in an intermediate process as shown below and require to kill the
> process
> >>> in-order for Accumulo master to start.
> >>>
> >>> [user1@host1 ~]$ jps -m
> >>>
> >>> 23314 JournalNode
> >>>
> >>> 23011 NameNode
> >>>
> >>> 23539 DFSZKFailoverController
> >>>
> >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> >>>
> >>> 22590 QuorumPeerMain
> >>>
> >>> 89790 Jps -m
> >>>
> >>>
> >>> [user1@host1 ~]$ *kill -9 84118*
> >>>
> >>> [user1@host1 ~]$ jps -m
> >>>
> >>> 23314 JournalNode
> >>>
> >>> 23011 NameNode
> >>>
> >>> 23539 DFSZKFailoverController
> >>>
> >>> 89892 Jps -m
> >>>
> >>> *89847 Main master*
> >>>
> >>> 22590 QuorumPeerMain
> >>>
> >>> [user1@host1 ~]$
> >>>
> >>> Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
> >>> while the rest are daemon threads. Would like to check with the dev
> team
> >> if
> >>> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> >>> thoughts on what else might be causing the issue?
> >>> I have copied only a portion of the jstack output, let me know in-case
> >> you
> >>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
> >> 3.2.1 &
> >>> Accumulo 2.0. Let me know if you need any further details? Many thanks
> >>>
> >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
> >> #25
> >>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
> >> tid=0x0000000003a10800
> >>> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
> >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> >>> - parking to wait for  <0x000000070f0acf38> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> >>> /LockSupport.java:234)
> >>> at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> >>> /AbstractQueuedSynchronizer.java:2123)
> >>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> >>> /LinkedBlockingDeque.java:513)
> >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> >>> /LinkedBlockingDeque.java:675)
> >>> at
> >>>
> >>>
> >>
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
> >>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> >>> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
> >>>   java.lang.Thread.State: WAITING (parking)
> >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> >>> - parking to wait for  <0x000000070f0f8af0> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> >>> /LockSupport.java:194)
> >>> at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> >>> /AbstractQueuedSynchronizer.java:2081)
> >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> >>> /LinkedBlockingQueue.java:433)
> >>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
> >> elapsed=95.28s
> >>> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
> >>>   java.lang.Thread.State: RUNNABLE
> >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> >>> /EPollSelectorImpl.java:120)
> >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> >>> /SelectorImpl.java:124)
> >>> - locked <0x000000070f079a28> (a
> >>> io.netty.channel.nio.SelectedSelectionKeySet)
> >>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> >> /SelectorImpl.java:141)
> >>> at
> >>>
> >>>
> >>
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> >>> at
> >>>
> >>>
> >>
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> >>> at
> >>>
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> >>> at
> >>>
> >>>
> >>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
> >> #28
> >>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s
> tid=0x0000000005942800
> >>> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
> >>>   java.lang.Thread.State: TIMED_WAITING (parking)
> >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> >>> - parking to wait for  <0x000000070f2acbc0> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> >>> /LockSupport.java:234)
> >>> at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> >>> /AbstractQueuedSynchronizer.java:2123)
> >>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> >>> /LinkedBlockingDeque.java:513)
> >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> >>> /LinkedBlockingDeque.java:675)
> >>> at
> >>>
> >>>
> >>
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
> >>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> >>> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
> >>>   java.lang.Thread.State: WAITING (parking)
> >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> >>> - parking to wait for  <0x000000070f2adff8> (a
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> >>> /LockSupport.java:194)
> >>> at
> >>>
> >>>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> >>> /AbstractQueuedSynchronizer.java:2081)
> >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> >>> /LinkedBlockingQueue.java:433)
> >>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
> >> elapsed=94.30s
> >>> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
> >>>   java.lang.Thread.State: RUNNABLE
> >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> >>> /EPollSelectorImpl.java:120)
> >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> >>> /SelectorImpl.java:124)
> >>> - locked <0x000000070f2ab868> (a
> >>> io.netty.channel.nio.SelectedSelectionKeySet)
> >>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> >> /SelectorImpl.java:141)
> >>> at
> >>>
> >>>
> >>
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> >>> at
> >>>
> >>>
> >>
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> >>> at
> >>>
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> >>> at
> >>>
> >>>
> >>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >>>
> >>>   Locked ownable synchronizers:
> >>> - None
> >>>
> >>
>
>

Re: Zookeeper & Netty framework

Posted by Andor Molnar <an...@apache.org>.
Why would they need to be daemon threads?
I’m not an expert of Java threading, but afaik I/O threads should not be daemon threads in most cases.

Also those threads are Netty internal threads, so this question is better to be asked in Netty community.
ZK threads reported in jstack are just waiting for input to send/receive. Do you know at which point Accumulo does stuck?

Andor



> On 2020. Mar 31., at 14:27, karthick rn <ka...@gmail.com> wrote:
> 
> Hi Enrico,
> 
> Yes, I have already run this through Accumulo folks they have looked at the
> jstack output & advised to check with ZK devs if those 2 threads (#27 &
> #30) are expected to be non-daemon threads.
> Also, in this cluster we have wire encryption enabled only for ZK and by
> disabling it we don't encounter this issue.
> There are no error messages reported on the ZK server log, below are the
> INFO messages when running the "accumulo-service master start" command
> 
> 2020-03-31 12:16:28,626 [myid:2] - INFO
> [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated Id
> 'CN=host2' for Scheme 'x509'
> 
> 2020-03-31 12:16:28,676 [myid:2] - INFO
> [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
> IP>:46332
> 
> 2020-03-31 12:16:28,676 [myid:2] - INFO
> [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
> IP>:46332
> 
> This issue is reproducible everytime I start Accumulo master. Let me know
> for any further details?
> 
> Many thanks
> 
> Regards,
> Karthick
> 
> 
> 
> 
> 
> 
> 
> On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eo...@gmail.com> wrote:
> 
>> Hi,
>> Did you check with Accumulo community?
>> Do you see errors or informational messages in ZK server logs?
>> 
>> Enrico
>> 
>> Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com> ha
>> scritto:
>> 
>>> Hello dev team,
>>> 
>>> We are using Hadoop, Accumulo & Zookeeper in our environment, after
>>> enabling TLS for ZK we noticed that starting Accumulo master service
>> hangs
>>> in an intermediate process as shown below and require to kill the process
>>> in-order for Accumulo master to start.
>>> 
>>> [user1@host1 ~]$ jps -m
>>> 
>>> 23314 JournalNode
>>> 
>>> 23011 NameNode
>>> 
>>> 23539 DFSZKFailoverController
>>> 
>>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
>>> 
>>> 22590 QuorumPeerMain
>>> 
>>> 89790 Jps -m
>>> 
>>> 
>>> [user1@host1 ~]$ *kill -9 84118*
>>> 
>>> [user1@host1 ~]$ jps -m
>>> 
>>> 23314 JournalNode
>>> 
>>> 23011 NameNode
>>> 
>>> 23539 DFSZKFailoverController
>>> 
>>> 89892 Jps -m
>>> 
>>> *89847 Main master*
>>> 
>>> 22590 QuorumPeerMain
>>> 
>>> [user1@host1 ~]$
>>> 
>>> Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
>>> while the rest are daemon threads. Would like to check with the dev team
>> if
>>> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
>>> thoughts on what else might be causing the issue?
>>> I have copied only a portion of the jstack output, let me know in-case
>> you
>>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
>> 3.2.1 &
>>> Accumulo 2.0. Let me know if you need any further details? Many thanks
>>> 
>>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
>> #25
>>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
>> tid=0x0000000003a10800
>>> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
>>>   java.lang.Thread.State: TIMED_WAITING (parking)
>>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>>> - parking to wait for  <0x000000070f0acf38> (a
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
>>> /LockSupport.java:234)
>>> at
>>> 
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
>>> /AbstractQueuedSynchronizer.java:2123)
>>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
>>> /LinkedBlockingDeque.java:513)
>>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
>>> /LinkedBlockingDeque.java:675)
>>> at
>>> 
>>> 
>> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
>>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
>>> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
>>>   java.lang.Thread.State: WAITING (parking)
>>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>>> - parking to wait for  <0x000000070f0f8af0> (a
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
>>> /LockSupport.java:194)
>>> at
>>> 
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
>>> /AbstractQueuedSynchronizer.java:2081)
>>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
>>> /LinkedBlockingQueue.java:433)
>>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
>> elapsed=95.28s
>>> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
>>>   java.lang.Thread.State: RUNNABLE
>>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
>>> /EPollSelectorImpl.java:120)
>>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
>>> /SelectorImpl.java:124)
>>> - locked <0x000000070f079a28> (a
>>> io.netty.channel.nio.SelectedSelectionKeySet)
>>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
>>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
>> /SelectorImpl.java:141)
>>> at
>>> 
>>> 
>> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>>> at
>>> 
>>> 
>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>> at
>>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>> at
>>> 
>>> 
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
>> #28
>>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800
>>> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
>>>   java.lang.Thread.State: TIMED_WAITING (parking)
>>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>>> - parking to wait for  <0x000000070f2acbc0> (a
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
>>> /LockSupport.java:234)
>>> at
>>> 
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
>>> /AbstractQueuedSynchronizer.java:2123)
>>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
>>> /LinkedBlockingDeque.java:513)
>>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
>>> /LinkedBlockingDeque.java:675)
>>> at
>>> 
>>> 
>> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
>>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
>>> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
>>>   java.lang.Thread.State: WAITING (parking)
>>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>>> - parking to wait for  <0x000000070f2adff8> (a
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
>>> /LockSupport.java:194)
>>> at
>>> 
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
>>> /AbstractQueuedSynchronizer.java:2081)
>>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
>>> /LinkedBlockingQueue.java:433)
>>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
>> elapsed=94.30s
>>> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
>>>   java.lang.Thread.State: RUNNABLE
>>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
>>> /EPollSelectorImpl.java:120)
>>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
>>> /SelectorImpl.java:124)
>>> - locked <0x000000070f2ab868> (a
>>> io.netty.channel.nio.SelectedSelectionKeySet)
>>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
>>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
>> /SelectorImpl.java:141)
>>> at
>>> 
>>> 
>> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>>> at
>>> 
>>> 
>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>> at
>>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>> at
>>> 
>>> 
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>>> 
>>>   Locked ownable synchronizers:
>>> - None
>>> 
>> 


Re: Zookeeper & Netty framework

Posted by karthick rn <ka...@gmail.com>.
Hi Enrico,

Yes, I have already run this through Accumulo folks they have looked at the
jstack output & advised to check with ZK devs if those 2 threads (#27 &
#30) are expected to be non-daemon threads.
Also, in this cluster we have wire encryption enabled only for ZK and by
disabling it we don't encounter this issue.
There are no error messages reported on the ZK server log, below are the
INFO messages when running the "accumulo-service master start" command

2020-03-31 12:16:28,626 [myid:2] - INFO
[nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated Id
'CN=host2' for Scheme 'x509'

2020-03-31 12:16:28,676 [myid:2] - INFO
[nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
IP>:46332

2020-03-31 12:16:28,676 [myid:2] - INFO
[nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
IP>:46332

This issue is reproducible everytime I start Accumulo master. Let me know
for any further details?

Many thanks

Regards,
Karthick







On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eo...@gmail.com> wrote:

> Hi,
> Did you check with Accumulo community?
> Do you see errors or informational messages in ZK server logs?
>
> Enrico
>
> Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com> ha
> scritto:
>
> > Hello dev team,
> >
> > We are using Hadoop, Accumulo & Zookeeper in our environment, after
> > enabling TLS for ZK we noticed that starting Accumulo master service
> hangs
> > in an intermediate process as shown below and require to kill the process
> > in-order for Accumulo master to start.
> >
> > [user1@host1 ~]$ jps -m
> >
> > 23314 JournalNode
> >
> > 23011 NameNode
> >
> > 23539 DFSZKFailoverController
> >
> > *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> >
> > 22590 QuorumPeerMain
> >
> > 89790 Jps -m
> >
> >
> > [user1@host1 ~]$ *kill -9 84118*
> >
> > [user1@host1 ~]$ jps -m
> >
> > 23314 JournalNode
> >
> > 23011 NameNode
> >
> > 23539 DFSZKFailoverController
> >
> > 89892 Jps -m
> >
> > *89847 Main master*
> >
> > 22590 QuorumPeerMain
> >
> > [user1@host1 ~]$
> >
> > Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
> > while the rest are daemon threads. Would like to check with the dev team
> if
> > "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> > thoughts on what else might be causing the issue?
> > I have copied only a portion of the jstack output, let me know in-case
> you
> > need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
> 3.2.1 &
> > Accumulo 2.0. Let me know if you need any further details? Many thanks
> >
> > "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
> #25
> > daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
> tid=0x0000000003a10800
> > nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f0acf38> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > /LockSupport.java:234)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2123)
> > at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > /LinkedBlockingDeque.java:513)
> > at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > /LinkedBlockingDeque.java:675)
> > at
> >
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
> > prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> > nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
> >    java.lang.Thread.State: WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f0f8af0> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > /LockSupport.java:194)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2081)
> > at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > /LinkedBlockingQueue.java:433)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
> elapsed=95.28s
> > tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
> >    java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > /EPollSelectorImpl.java:120)
> > at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > /SelectorImpl.java:124)
> > - locked <0x000000070f079a28> (a
> > io.netty.channel.nio.SelectedSelectionKeySet)
> > - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> > at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> /SelectorImpl.java:141)
> > at
> >
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > at
> >
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > at
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at
> >
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
> #28
> > daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800
> > nid=0x16259 waiting on condition  [0x00007f5c7b145000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f2acbc0> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > /LockSupport.java:234)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2123)
> > at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > /LinkedBlockingDeque.java:513)
> > at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > /LinkedBlockingDeque.java:675)
> > at
> >
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
> > prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> > nid=0x1625a waiting on condition  [0x00007f5c7b044000]
> >    java.lang.Thread.State: WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f2adff8> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > /LockSupport.java:194)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2081)
> > at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > /LinkedBlockingQueue.java:433)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
> elapsed=94.30s
> > tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
> >    java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > /EPollSelectorImpl.java:120)
> > at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > /SelectorImpl.java:124)
> > - locked <0x000000070f2ab868> (a
> > io.netty.channel.nio.SelectedSelectionKeySet)
> > - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> > at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> /SelectorImpl.java:141)
> > at
> >
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > at
> >
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > at
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at
> >
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >
> >    Locked ownable synchronizers:
> > - None
> >
>

Re: Zookeeper & Netty framework

Posted by Enrico Olivelli <eo...@gmail.com>.
Hi,
Did you check with Accumulo community?
Do you see errors or informational messages in ZK server logs?

Enrico

Il Mar 31 Mar 2020, 01:12 karthick rn <ka...@gmail.com> ha
scritto:

> Hello dev team,
>
> We are using Hadoop, Accumulo & Zookeeper in our environment, after
> enabling TLS for ZK we noticed that starting Accumulo master service hangs
> in an intermediate process as shown below and require to kill the process
> in-order for Accumulo master to start.
>
> [user1@host1 ~]$ jps -m
>
> 23314 JournalNode
>
> 23011 NameNode
>
> 23539 DFSZKFailoverController
>
> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
>
> 22590 QuorumPeerMain
>
> 89790 Jps -m
>
>
> [user1@host1 ~]$ *kill -9 84118*
>
> [user1@host1 ~]$ jps -m
>
> 23314 JournalNode
>
> 23011 NameNode
>
> 23539 DFSZKFailoverController
>
> 89892 Jps -m
>
> *89847 Main master*
>
> 22590 QuorumPeerMain
>
> [user1@host1 ~]$
>
> Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
> while the rest are daemon threads. Would like to check with the dev team if
> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> thoughts on what else might be causing the issue?
> I have copied only a portion of the jstack output, let me know in-case you
> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop 3.2.1 &
> Accumulo 2.0. Let me know if you need any further details? Many thanks
>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)" #25
> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s tid=0x0000000003a10800
> nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> - parking to wait for  <0x000000070f0acf38> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> /LockSupport.java:234)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:2123)
> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> /LinkedBlockingDeque.java:513)
> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> /LinkedBlockingDeque.java:675)
> at
>
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>
>    Locked ownable synchronizers:
> - None
>
> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
>    java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> - parking to wait for  <0x000000070f0f8af0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> /LockSupport.java:194)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:2081)
> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> /LinkedBlockingQueue.java:433)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>
>    Locked ownable synchronizers:
> - None
>
> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms elapsed=95.28s
> tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> /EPollSelectorImpl.java:120)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> /SelectorImpl.java:124)
> - locked <0x000000070f079a28> (a
> io.netty.channel.nio.SelectedSelectionKeySet)
> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
> at
>
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at
>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>
>    Locked ownable synchronizers:
> - None
>
> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)" #28
> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800
> nid=0x16259 waiting on condition  [0x00007f5c7b145000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> - parking to wait for  <0x000000070f2acbc0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> /LockSupport.java:234)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:2123)
> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> /LinkedBlockingDeque.java:513)
> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> /LinkedBlockingDeque.java:675)
> at
>
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>
>    Locked ownable synchronizers:
> - None
>
> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> nid=0x1625a waiting on condition  [0x00007f5c7b044000]
>    java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> - parking to wait for  <0x000000070f2adff8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> /LockSupport.java:194)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> /AbstractQueuedSynchronizer.java:2081)
> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> /LinkedBlockingQueue.java:433)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>
>    Locked ownable synchronizers:
> - None
>
> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms elapsed=94.30s
> tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> /EPollSelectorImpl.java:120)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> /SelectorImpl.java:124)
> - locked <0x000000070f2ab868> (a
> io.netty.channel.nio.SelectedSelectionKeySet)
> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
> at
>
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at
>
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>
>    Locked ownable synchronizers:
> - None
>