Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/04/01 12:44:38 UTC

[GitHub] [accumulo] karthick-rn opened a new issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578
 
 
   After enabling TLS for ZK, I noticed that starting the Accumulo master hangs at an intermediate process, as shown below, and that process has to be killed in order for the Accumulo master to start. 
   
   `[user1@host1 ~]$ jps -m
   23314 JournalNode
   23011 NameNode
   23539 DFSZKFailoverController
   **84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL**
   22590 QuorumPeerMain 
   89790 Jps -m
   
   [user1@host1 ~]$ **kill -9 84118**
   
   [user1@host1 ~]$ jps -m
   23314 JournalNode
   23011 NameNode
   23539 DFSZKFailoverController
   89892 Jps -m
   89847 **Main master**
   22590 QuorumPeerMain
   [user1@host1 ~]$ `
   
   Below is the jstack output collected during the hang
   
   `2020-04-01 12:16:50
   Full thread dump OpenJDK 64-Bit Server VM (11.0.6+10-LTS mixed mode, sharing):
   
   Threads class SMR info:
   _java_thread_list=0x00000000017945d0, length=18, elements={
   0x0000000000715800, 0x0000000000718000, 0x0000000000726800, 0x0000000000728800,
   0x000000000072b800, 0x0000000000735800, 0x0000000000858000, 0x0000000000869800,
   0x000000000066a800, 0x000000000180f800, 0x00000000018f4000, 0x0000000001d4e000,
   0x0000000001d4f800, 0x0000000001fdf800, 0x0000000001ed8800, 0x0000000003281000,
   0x0000000003282800, 0x00000000019ed000
   }
   
   "Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.28ms elapsed=179.54s tid=0x0000000000715800 nid=0x5f95 waiting on condition  [0x00007fa1630b4000]
      java.lang.Thread.State: RUNNABLE
   	at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.6/Native Method)
   	at java.lang.ref.Reference.processPendingReferences(java.base@11.0.6/Reference.java:241)
   	at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.6/Reference.java:213)
   
      Locked ownable synchronizers:
   	- None
   
   "Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.56ms elapsed=179.54s tid=0x0000000000718000 nid=0x5f96 in Object.wait()  [0x00007fa162fb3000]
      java.lang.Thread.State: WAITING (on object monitor)
   	at java.lang.Object.wait(java.base@11.0.6/Native Method)
   	- waiting on <0x000000070000a2b0> (a java.lang.ref.ReferenceQueue$Lock)
   	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.6/ReferenceQueue.java:155)
   	- waiting to re-lock in wait() <0x000000070000a2b0> (a java.lang.ref.ReferenceQueue$Lock)
   	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.6/ReferenceQueue.java:176)
   	at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.6/Finalizer.java:170)
   
      Locked ownable synchronizers:
   	- None
   
   "Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.46ms elapsed=179.53s tid=0x0000000000726800 nid=0x5f97 runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
   	- None
   
   "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=1496.64ms elapsed=179.53s tid=0x0000000000728800 nid=0x5f98 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
      No compile task
   
      Locked ownable synchronizers:
   	- None
   
   "C1 CompilerThread0" #8 daemon prio=9 os_prio=0 cpu=1079.64ms elapsed=179.53s tid=0x000000000072b800 nid=0x5f99 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
      No compile task
   
      Locked ownable synchronizers:
   	- None
   
   "Sweeper thread" #9 daemon prio=9 os_prio=0 cpu=72.66ms elapsed=179.53s tid=0x0000000000735800 nid=0x5f9a runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
   	- None
   
   "Service Thread" #10 daemon prio=9 os_prio=0 cpu=0.13ms elapsed=179.50s tid=0x0000000000858000 nid=0x5f9b runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
   	- None
   
   "Common-Cleaner" #11 daemon prio=8 os_prio=0 cpu=0.43ms elapsed=179.49s tid=0x0000000000869800 nid=0x5f9d in Object.wait()  [0x00007fa15c381000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
   	at java.lang.Object.wait(java.base@11.0.6/Native Method)
   	- waiting on <0x00000007000a7668> (a java.lang.ref.ReferenceQueue$Lock)
   	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.6/ReferenceQueue.java:155)
   	- waiting to re-lock in wait() <0x00000007000a7668> (a java.lang.ref.ReferenceQueue$Lock)
   	at jdk.internal.ref.CleanerImpl.run(java.base@11.0.6/CleanerImpl.java:148)
   	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
   	at jdk.internal.misc.InnocuousThread.run(java.base@11.0.6/InnocuousThread.java:134)
   
      Locked ownable synchronizers:
   	- None
   
   "DestroyJavaVM" #15 prio=5 os_prio=0 cpu=824.32ms elapsed=178.73s tid=0x000000000066a800 nid=0x5f90 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
   	- None
   
   "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #18 daemon prio=5 os_prio=0 cpu=0.16ms elapsed=178.29s tid=0x000000000180f800 nid=0x5fa5 in Object.wait()  [0x00007fa15b4f3000]
      java.lang.Thread.State: WAITING (on object monitor)
   	at java.lang.Object.wait(java.base@11.0.6/Native Method)
   	- waiting on <0x00000007077c4ab8> (a java.lang.ref.ReferenceQueue$Lock)
   	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.6/ReferenceQueue.java:155)
   	- waiting to re-lock in wait() <0x00000007077c4ab8> (a java.lang.ref.ReferenceQueue$Lock)
   	at java.lang.ref.ReferenceQueue.remove(java.base@11.0.6/ReferenceQueue.java:176)
   	at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3762)
   	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
   
      Locked ownable synchronizers:
   	- None
   
   "client DomainSocketWatcher" #19 daemon prio=5 os_prio=0 cpu=1.85ms elapsed=177.83s tid=0x00000000018f4000 nid=0x5fa6 runnable  [0x00007fa15a7cc000]
      java.lang.Thread.State: RUNNABLE
   	at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
   	at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
   	at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
   	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
   
      Locked ownable synchronizers:
   	- None
   
   "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-0:2281)" #25 daemon prio=5 os_prio=0 cpu=142.24ms elapsed=177.47s tid=0x0000000001d4e000 nid=0x5fab waiting on condition  [0x00007fa159ec3000]
      java.lang.Thread.State: TIMED_WAITING (parking)
   	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
   	- parking to wait for  <0x000000070eddc9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
   	at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
   	at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
   	at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
   	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
   
      Locked ownable synchronizers:
   	- None
   
   "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon prio=5 os_prio=0 cpu=1.06ms elapsed=177.47s tid=0x0000000001d4f800 nid=0x5fac waiting on condition  [0x00007fa159dc2000]
      java.lang.Thread.State: WAITING (parking)
   	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
   	- parking to wait for  <0x000000070ee25888> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
   	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
   	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
   
      Locked ownable synchronizers:
   	- None
   
   "nioEventLoopGroup-2-1" #27 prio=10 os_prio=0 cpu=776.37ms elapsed=177.24s tid=0x0000000001fdf800 nid=0x5fae runnable  [0x00007fa15b917000]
      java.lang.Thread.State: RUNNABLE
   	at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
   	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
   	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
   	- locked <0x000000070edab1a0> (a io.netty.channel.nio.SelectedSelectionKeySet)
   	- locked <0x000000070ed9edb8> (a sun.nio.ch.EPollSelectorImpl)
   	at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
   	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
   	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
   
      Locked ownable synchronizers:
   	- None
   
   "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-4:2281)" #28 daemon prio=5 os_prio=0 cpu=12.24ms elapsed=176.19s tid=0x0000000001ed8800 nid=0x5fb7 waiting on condition  [0x00007fa159357000]
      java.lang.Thread.State: TIMED_WAITING (parking)
   	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
   	- parking to wait for  <0x000000070efd03b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
   	at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
   	at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
   	at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
   	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
   
      Locked ownable synchronizers:
   	- None
   
   "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon prio=5 os_prio=0 cpu=0.34ms elapsed=176.19s tid=0x0000000003281000 nid=0x5fb8 waiting on condition  [0x00007fa159256000]
      java.lang.Thread.State: WAITING (parking)
   	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
   	- parking to wait for  <0x000000070efd17e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
   	at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
   	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
   
      Locked ownable synchronizers:
   	- None
   
   "nioEventLoopGroup-3-1" #30 prio=10 os_prio=0 cpu=342.16ms elapsed=176.18s tid=0x0000000003282800 nid=0x5fb9 runnable  [0x00007fa159155000]
      java.lang.Thread.State: RUNNABLE
   	at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
   	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
   	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
   	- locked <0x000000070efcf058> (a io.netty.channel.nio.SelectedSelectionKeySet)
   	- locked <0x000000070efcee30> (a sun.nio.ch.EPollSelectorImpl)
   	at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
   	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
   	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
   
      Locked ownable synchronizers:
   	- None
   
   "Attach Listener" #31 daemon prio=9 os_prio=0 cpu=0.74ms elapsed=0.14s tid=0x00000000019ed000 nid=0x6153 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
   	- None
   
   "VM Thread" os_prio=0 cpu=154.07ms elapsed=179.54s tid=0x0000000000712800 nid=0x5f94 runnable  
   
   "GC Thread#0" os_prio=0 cpu=28.68ms elapsed=179.56s tid=0x000000000067e800 nid=0x5f91 runnable  
   
   "CMS Main Thread" os_prio=0 cpu=1010.50ms elapsed=179.55s tid=0x00000000006ee800 nid=0x5f93 runnable  
   
   "CMS Thread#0" os_prio=0 cpu=1.06ms elapsed=179.55s tid=0x00000000006ec000 nid=0x5f92 runnable  
   
   "CMS Thread#1" os_prio=0 cpu=1.07ms elapsed=177.50s tid=0x0000000001d34000 nid=0x5faa runnable  
   
   "VM Periodic Task Thread" os_prio=0 cpu=103.05ms elapsed=179.49s tid=0x0000000000873800 nid=0x5f9c waiting on condition  
   
   JNI global refs: 19, weak refs: 0`
   
   @keith-turner and I discussed this issue and looked at the jstack output; we noticed there are 2 non-daemon threads (#27 and #30). Per the description at this [link](https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setDaemon(boolean)), the JVM exits when the only threads running are all daemon threads. 
   Based on that, we wanted to check with the ZK dev team whether those 2 threads are expected to be non-daemon threads. 
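   
   The behavior is easy to demonstrate with a minimal, self-contained Java sketch (illustrative only, not from Accumulo or ZK): a parked non-daemon thread keeps the JVM alive after `main()` returns, while a daemon thread does not.
   ```
   // A thread parked for a long time, like an idle netty event loop.
   public class DaemonDemo {
       public static void main(String[] args) {
           Thread parked = new Thread(() -> {
               try {
                   Thread.sleep(60_000);
               } catch (InterruptedException ignored) {
               }
           });
           // With setDaemon(false) -- the default -- the JVM would stay alive
           // for ~60s after main() returns, which is the hang seen above.
           parked.setDaemon(true);
           parked.start();
           System.out.println("daemon=" + parked.isDaemon()); // prints daemon=true
       }
   }
   ```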
   
   Let me know if anyone else has noticed this issue.
   I'm using ZK 3.5.7, Accumulo 2.0, Hadoop 3.2.1. 
   
   Thanks
   Karthick

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609907620
 
 
   > Also, if you can explain the specific steps you took to configure TLS on ZK, so we can reproduce it, that could be helpful to test in different environments.
   
   **Steps to configure TLS on ZK:**
   a) Generate certificates & keystores:
   1) Run the commands below on each host to generate a '.crt' file per host
   ```
   keytool -genkeypair -alias $(hostname -f) -keyalg RSA -keysize 2048 -dname "cn=$(hostname -f)" -keypass changeit -keystore keystore.jks -storepass changeit
   keytool -exportcert -alias $(hostname -f) -keystore keystore.jks -file $(hostname -f).crt -rfc -storepass changeit
   ```
   2) Copy the '*.crt' files generated on each host to host1 and generate truststore.jks as shown below 
   ```
   for i in `ls *.crt`; do
       name=$(echo $i | sed 's/\.crt//g')
       keytool -importcert -alias $name -file $name.crt -keystore truststore.jks -storepass changeit -noprompt
   done
   ```
   3) Copy "truststore.jks" to all the hosts
   ```
   for i in `cat host_list`; do 
       scp truststore.jks $i:/path/to/truststore/; 
   done
   ```
   where `host_list` is a file containing the FQDNs of all hosts
   
   4) Verify the contents of truststore.jks and ensure it contains all the hosts in the cluster
   `keytool -list -v -keystore truststore.jks`
   
   b) Configurations:
   
   1) Update the server & quorum configs in $ZOOKEEPER_HOME/conf/zoo.cfg
   ```
   # Server configuration
   secureClientPort=2281
   serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
   
   # Quorum configuration
   sslQuorum=true
   ssl.quorum.keyStore.location=/path/to/keystore.jks
   ssl.quorum.keyStore.password=changeit
   ssl.quorum.trustStore.location=/path/to/truststore.jks
   ssl.quorum.trustStore.password=changeit
   
   # the port at which the clients will connect
   #clientPort=2181 (Comment or remove the insecure client port)
   ```
   
   2) Update the client & server configs in $ZOOKEEPER_HOME/bin/zkEnv.sh
   ```
   SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   
   CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
     -Dzookeeper.client.secure=true \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   ```
   c) Testing
   1) Start the ZooKeeper service on the hosts running ZK
   `$ZOOKEEPER_HOME/bin/zkServer.sh start`
   
   2) The following messages in the ZK log confirm the ensemble is running on TLS
   ```
   INFO  [main:QuorumPeer@1779] - Using TLS encrypted quorum communication
   INFO  [main:QuorumPeer@1787] - Port unification disabled
   INFO  [QuorumPeerListener:QuorumCnxManager$Listener@894] - Creating TLS-only quorum server socket
   ```
   **Reference:**
   https://zookeeper.apache.org/doc/r3.5.7/zookeeperAdmin.html#Quorum+TLS
   https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
   


[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-611640060
 
 
   Following the changes to the `SingletonManager` & `SetGoalState` Java classes, the patch was tested cluster-wide, and we were able to successfully start all Accumulo services, including the master service, without any hiccups. However, stopping the services exhibited a similar hang, and @keith-turner helped me with the code changes to the `ZooZap` & `Admin` Java classes to fix those. With all these changes in place, we're now able to start/stop Accumulo successfully. The next step is to land these changes on the master branch.
   On a side note, I noticed the warning message below appear on the console during start & stop of services. Maybe I'll raise a separate issue to track what is causing it. Thanks!
   `2020-04-09 16:09:02,562 [zookeeper.ZooCache] WARN : Unhandled: WatchedEvent state:Closed type:None path:null` 


[GitHub] [accumulo] billierinaldi commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609897677
 
 
   I helped Karthick take a look at this. It seems like SetGoalState is using a ServerContext with SingletonReservation.noop(), so maybe that means that ZooKeeper is not automatically being closed. Should SetGoalState be using a client reservation instead?


[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610453403
 
 
   @keith-turner I have initiated an email thread with the ZK dev team, but not a JIRA. You can find the discussion [here](https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202004.mbox/browser) under the subject "Zookeeper & Netty framework". Feel free to comment with your thoughts. Thanks


[GitHub] [accumulo] ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610500976
 
 
   @karthick-rn BTW, thanks for your excellent and detailed guide for reproducing! I haven't had a chance to try it, but your guide left me with zero questions as to how to proceed. :smiley_cat: Once this is all figured out and working, it would make an excellent blog post (and possibly an integration test for accumulo-minicluster).


[GitHub] [accumulo] milleruntime commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609935522
 
 
   > That's the kind of thing I was thinking of being at the center of this. I don't know if closing `ServerContext` will achieve the desired result, though and clean up the ZooKeeper threads.
   
   This could be a separate issue with this utility. I currently don't know enough about TLS configuration to debug this deeper, since there seems to be an issue with it in the Master.


[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610614780
 
 
   @keith-turner and I made changes to the `SingletonManager` & `SetGoalState` Java classes as described [here](https://github.com/apache/accumulo/issues/1578#issuecomment-610425219) and did a quick test with the new jars. The Accumulo master now starts successfully without hanging. I'll put these new jars cluster-wide, perform some more tests, and keep you all posted. Thanks


[GitHub] [accumulo] karthick-rn edited a comment on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609907620
 
 
   > Also, if you can explain the specific steps you took to configure TLS on ZK, so we can reproduce it, that could be helpful to test in different environments.
   
   **Steps to configure TLS on ZK:**
   a) Generate certificates & keystores:
   1) Run the commands below on each host to generate a '.crt' file per host
   ```
   keytool -genkeypair -alias $(hostname -f) -keyalg RSA -keysize 2048 -dname "cn=$(hostname -f)" -keypass changeit -keystore keystore.jks -storepass changeit
   keytool -exportcert -alias $(hostname -f) -keystore keystore.jks -file $(hostname -f).crt -rfc -storepass changeit
   ```
   2) Copy the '*.crt' files generated on each host to host1 and generate truststore.jks as shown below 
   ```
   for i in `ls *.crt`; do
       name=$(echo $i | sed 's/\.crt//g')
       keytool -importcert -alias $name -file $name.crt -keystore truststore.jks -storepass changeit -noprompt
   done
   ```
   3) Copy "truststore.jks" to all the hosts
   ```
   for i in `cat host_list`; do 
       scp truststore.jks $i:/path/to/truststore/; 
   done
   ```
   where `host_list` is a file containing the FQDNs of all hosts
   
   4) Verify the contents of truststore.jks and ensure it contains all the hosts in the cluster
   `keytool -list -v -keystore truststore.jks`
   
   b) Configurations:
   
   1) Update the server & quorum configs in $ZOOKEEPER_HOME/conf/zoo.cfg
   ```
   # Server configuration
   secureClientPort=2281
   serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
   
   # Quorum configuration
   sslQuorum=true
   ssl.quorum.keyStore.location=/path/to/keystore.jks
   ssl.quorum.keyStore.password=changeit
   ssl.quorum.trustStore.location=/path/to/truststore.jks
   ssl.quorum.trustStore.password=changeit
   
   # the port at which the clients will connect
   #clientPort=2181 (Comment or remove the insecure client port)
   ```
   
   2) Update the client & server configs in $ZOOKEEPER_HOME/bin/zkEnv.sh
   ```
   SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   
   CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
     -Dzookeeper.client.secure=true \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   ```
   c) Testing
   1) Start the ZooKeeper service on the hosts running ZK
   `$ZOOKEEPER_HOME/bin/zkServer.sh start`
   
   2) The following messages in the ZK log confirm the ensemble is running on TLS
   ```
   INFO  [main:QuorumPeer@1779] - Using TLS encrypted quorum communication
   INFO  [main:QuorumPeer@1787] - Port unification disabled
   INFO  [QuorumPeerListener:QuorumCnxManager$Listener@894] - Creating TLS-only quorum server socket
   ```
   
   Accumulo (client connection)
   
   The following changes are required in $ACCUMULO_HOME/conf/accumulo-env.sh
   <pre>
    CLASSPATH="${conf}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:<b>${ZOOKEEPER_HOME}/lib/*"
   
   CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
     -Dzookeeper.client.secure=true \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   </b>
   JAVA_OPTS=("${JAVA_OPTS[@]}"
     "-Daccumulo.log.dir=${ACCUMULO_LOG_DIR}"
     "-Daccumulo.application=${cmd}${ACCUMULO_SERVICE_INSTANCE}_$(hostname)"
     "-Daccumulo.metrics.service.instance=${ACCUMULO_SERVICE_INSTANCE}" <b>$CLIENT_JVMFLAGS </b>)
   </pre>
   
   Also, change the ZK port from `2181` to `2281` in `accumulo.properties` & `accumulo-client.properties`
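   
   For reference, the port change looks like the following (hostnames are placeholders for your ZK ensemble; to the best of my knowledge the relevant keys are `instance.zookeeper.host` in `accumulo.properties` and `instance.zookeepers` in `accumulo-client.properties`):
   ```
   # accumulo.properties
   instance.zookeeper.host=host1:2281,host2:2281,host3:2281
   
   # accumulo-client.properties
   instance.zookeepers=host1:2281,host2:2281,host3:2281
   ```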
   
   
   **Reference:**
   https://zookeeper.apache.org/doc/r3.5.7/zookeeperAdmin.html#Quorum+TLS
   https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
   


[GitHub] [accumulo] keith-turner commented on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610576095
 
 
   > @keith-turner Do you suspect this is a ZK issue and not an Accumulo issue on how it is using ZK?
   
   Accumulo probably should be closing ZK. I suspected that ZK would want all of their threads to be daemon threads, because many of their threads are and daemon is not the default. In the ZK mailing list thread that @karthick-rn linked to, someone said it is a netty thread, so it is not under ZK's control.
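   
   For what it's worth, the usual knob for this is the thread factory a library hands to its pools (netty's default thread factory, for instance, accepts a daemon flag). A generic sketch using plain `java.util.concurrent`, not ZK or netty source:
   ```
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ThreadFactory;
   
   public class DaemonFactoryDemo {
       public static void main(String[] args) throws Exception {
           // Factory that marks every pool thread daemon, so an idle pool
           // cannot keep the JVM alive the way the netty event loop does here.
           ThreadFactory daemonFactory = r -> {
               Thread t = new Thread(r);
               t.setDaemon(true);
               return t;
           };
           ExecutorService pool = Executors.newSingleThreadExecutor(daemonFactory);
           boolean daemon = pool.submit(() -> Thread.currentThread().isDaemon()).get();
           System.out.println("pool thread daemon=" + daemon); // prints pool thread daemon=true
           pool.shutdown();
       }
   }
   ```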


[GitHub] [accumulo] ctubbsii edited a comment on issue #1578: Accumulo master hangs after TLS on ZK

URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609924738
 
 
   > > I helped Karthick take a look at this. It seems like SetGoalState is using a ServerContext with SingletonReservation.noop(), so maybe that means that ZooKeeper is not automatically being closed. Should SetGoalState be using a client reservation instead?
   
   @keith-turner is the expert on the reservation stuff. I think most of that was done to try to internally manage resources when using Connector vs. the more lifecycle-oriented AccumuloClient (which can clean itself up when it closes). I'm not sure how it applies here.
   
   > 
   > I noticed that SetGoalState is the only server utility using ServerContext that doesn't have a try-with-resources to automatically call close when finished. I think this might be a bug with SetGoalState
   
   That's the kind of thing I was thinking of as being at the center of this. I don't know whether closing `ServerContext` will achieve the desired result and clean up the ZooKeeper threads, though.


[GitHub] [accumulo] milleruntime commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609920935
 
 
   > I helped Karthick take a look at this. It seems like SetGoalState is using a ServerContext with SingletonReservation.noop(), so maybe that means that ZooKeeper is not automatically being closed. Should SetGoalState be using a client reservation instead?
   
   I noticed that SetGoalState is the only server utility using ServerContext that doesn't have a try-with-resources to automatically call close when finished.  I think this might be a bug with SetGoalState



[GitHub] [accumulo] keith-turner edited a comment on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610425219
 
 
   The try-with-resources for the ServerContext will not work because the SingletonManager never closes anything for servers currently.  Creating a ServerContext puts the SingletonManager in server mode.  One possible solution to this is to add a new Mode.CLOSED for SingletonManager.  Then the SetGoalState could do something like the following.
   
   ```java
   main() {
     try {
       // current code in main
     } finally {
       // force stop of all singletons and render future attempts to get a singleton inoperable.
       SingletonManager.setMode(Mode.CLOSED);
     }
   }
   ```
   
   I think it would also be good to open a ZooKeeper issue in their JIRA.  While not closing ZK may be an Accumulo issue, I strongly suspect that ZK would like all of its background threads to be daemon threads, so they may want to know they are creating non-daemon threads when using TLS.
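   
   A minimal, self-contained sketch of the proposed Mode.CLOSED semantics (the class and method names mimic the proposal above but are illustrative, not Accumulo's actual SingletonManager API):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Illustrative only: a manager that force-stops registered singletons
   // when moved to CLOSED and refuses further registrations, so that a
   // short-lived utility can exit cleanly even if a singleton spawned
   // non-daemon threads.
   public class SingletonManagerSketch {
       public enum Mode { CLIENT, SERVER, CLOSED }
   
       private static Mode mode = Mode.CLIENT;
       private static final List<Runnable> stopHooks = new ArrayList<>();
   
       public static synchronized void register(Runnable stopHook) {
           if (mode == Mode.CLOSED) {
               throw new IllegalStateException("manager is closed");
           }
           stopHooks.add(stopHook);
       }
   
       public static synchronized void setMode(Mode newMode) {
           mode = newMode;
           if (mode == Mode.CLOSED) {
               // Force-stop everything; future register() calls fail.
               stopHooks.forEach(Runnable::run);
               stopHooks.clear();
           }
       }
   
       public static synchronized Mode getMode() {
           return mode;
       }
   }
   ```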


[GitHub] [accumulo] ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610498375
 
 
   @keith-turner Do you suspect this is a ZK issue and not an Accumulo issue on how it is using ZK?
   In order to create a ZK issue in their JIRA, it would be nice to describe the behavior of ZK in terms of expected vs. actual. Currently, the discussion is centered around the behavior of Accumulo. To create a ZK issue, I think we'd want to be able to reproduce and describe the issue for an arbitrary ZK client, independently of Accumulo's behavior.


[GitHub] [accumulo] karthick-rn edited a comment on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
karthick-rn edited a comment on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609907620
 
 
   > Also, if you can explain the specific steps you took to configure TLS on ZK, so we can reproduce it, that could be helpful to test in different environments.
   
   **Steps to configure TLS on ZK:**
   a) Generate certificates & keystores:
   1) Run the commands below on each host to generate a '.crt' file per host
   ```
   keytool -genkeypair -alias $(hostname -f) -keyalg RSA -keysize 2048 -dname "cn=$(hostname -f)" -keypass changeit -keystore keystore.jks -storepass changeit
   keytool -exportcert -alias $(hostname -f) -keystore keystore.jks -file $(hostname -f).crt -rfc -storepass changeit
   ```
   2) Copy the '*.crt' files generated on each host to host1 and generate truststore.jks as shown below 
   ```
   for i in `ls *.crt`; do
       name=$(echo $i | sed 's/\.crt//g')
       keytool -importcert -alias $name -file $name.crt -keystore truststore.jks -storepass changeit -noprompt
   done
   ```
   3) Copy "truststore.jks" to all the hosts
   ```
   for i in `cat host_list`; do 
       scp truststore.jks $i:/path/to/truststore/; 
   done
   ```
   where `host_list` is a file containing the FQDN of every host
   
   4) Verify the contents of truststore.jks and ensure it contains an entry for every host in the cluster
   `keytool -list -v -keystore truststore.jks`
   
   b) Configurations:
   
   1) Update the server & quorum configs on $ZOOKEEPER_HOME/conf/zoo.cfg
   ```
   # Server configuration
   secureClientPort=2281
   serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
   
   # Quorum configuration
   sslQuorum=true
   ssl.quorum.keyStore.location=/path/to/keystore.jks
   ssl.quorum.keyStore.password=changeit
   ssl.quorum.trustStore.location=/path/to/truststore.jks
   ssl.quorum.trustStore.password=changeit
   
   # the port at which the clients will connect
   #clientPort=2181 (Comment or remove the insecure client port)
   ```
   
   2) Update the client & server configs on $ZOOKEEPER_HOME/bin/zkEnv.sh
   ```
   SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   
   CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
     -Dzookeeper.client.secure=true \
     -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=changeit \
     -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=changeit"
   ```
   c) Testing
   1) Start Zookeeper service on the hosts running ZK
   `$ZOOKEEPER_HOME/bin/zkServer.sh start`
   
   2) The following messages in the ZK log confirm the ensemble is running with TLS
   ```
   INFO  [main:QuorumPeer@1779] - Using TLS encrypted quorum communication
   INFO  [main:QuorumPeer@1787] - Port unification disabled
   INFO  [QuorumPeerListener:QuorumCnxManager$Listener@894] - Creating TLS-only quorum server socket
   ```
   
   Accumulo (client connection)
   
   The following changes are required to $ACCUMULO_HOME/conf/accumulo-env.sh
   <pre>
   CLASSPATH="${conf}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:<b>${ZOOKEEPER_HOME}/lib/*"
   
   CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
     -Dzookeeper.client.secure=true \
     -Dzookeeper.ssl.keyStore.location=$ZOOKEEPER_HOME/conf/ssl/keystore.jks \
     -Dzookeeper.ssl.keyStore.password=hadoop \
     -Dzookeeper.ssl.trustStore.location=$ZOOKEEPER_HOME/conf/ssl/truststore.jks \
     -Dzookeeper.ssl.trustStore.password=hadoop"
   </b>
   JAVA_OPTS=("${JAVA_OPTS[@]}"
     "-Daccumulo.log.dir=${ACCUMULO_LOG_DIR}"
     "-Daccumulo.application=${cmd}${ACCUMULO_SERVICE_INSTANCE}_$(hostname)"
     "-Daccumulo.metrics.service.instance=${ACCUMULO_SERVICE_INSTANCE}" <b>$CLIENT_JVMFLAGS </b>)
   </pre>
   
   Also, replace the ZK port `2181` with `2281` in `accumulo.properties` & `accumulo-client.properties`
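   
   Since both files typically list the same ZooKeeper endpoints, the port swap can be scripted. This is a hedged sketch assuming a standard layout under $ACCUMULO_HOME; adjust the paths to your install:
   
   ```shell
   # Switch every ZooKeeper endpoint from the insecure port 2181 to the
   # TLS port 2281 in both the server and client property files.
   for f in "$ACCUMULO_HOME"/conf/accumulo.properties \
            "$ACCUMULO_HOME"/conf/accumulo-client.properties; do
     [ -f "$f" ] && sed -i 's/:2181/:2281/g' "$f"
   done
   ```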
   
   
   **Reference:**
   https://zookeeper.apache.org/doc/r3.5.7/zookeeperAdmin.html#Quorum+TLS
   https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
   


[GitHub] [accumulo] ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-607416857
 
 
   @karthick-rn Have you looked at the SetGoalState to verify that Accumulo isn't leaving any ZK objects unclosed?


[GitHub] [accumulo] ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609854769
 
 
   Also, if you can explain the specific steps you took to configure TLS on ZK, so we can reproduce it, that could be helpful to test in different environments.


[GitHub] [accumulo] ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609853981
 
 
   @karthick-rn I'm not sure you answered my question about the `SetGoalState` Java class. In order to update the node in ZK, that class needs to create a connection to ZooKeeper, and issue the command to update the node. What I'm wondering is if the ZooKeeper connection object used for this is properly closed with a `.close()` call that might clean up the extra threads you saw. I'm just checking to see if you've ruled out this possibility of an unclosed ZooKeeper connection object in the Java code.


[GitHub] [accumulo] ctubbsii closed issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
ctubbsii closed issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578
 
 
   



[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-610285913
 
 
   I attempted try-with-resources for the ServerContext as shown below, and the Accumulo master still hangs.
   ```java
   try (ServerContext context = new ServerContext(new SiteConfiguration())) {
   ..
   ..
   }
   ```


[GitHub] [accumulo] karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK

Posted by GitBox <gi...@apache.org>.
karthick-rn commented on issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578#issuecomment-609731131
 
 
   > @karthick-rn Have you looked at the SetGoalState to verify that Accumulo isn't leaving any ZK objects unclosed?
   
   @ctubbsii Sorry for the delay. There is a 'goal_state' node that exists in ZK before starting the Accumulo master, and its value is 'NORMAL' as shown below. 
   FYI, this 'goal_state' node also exists on a non-TLS ZK cluster, where the Accumulo master doesn't hang.
   
   Before starting Accumulo master:
   ```
   [zk: kn-fix-0:2281(CONNECTED) 2] get -s -w /accumulo/7a7a53a5-077b-4506-a674-a010a273ba5b/masters/goal_state
   NORMAL
   cZxid = 0x10000006a
   ctime = Tue Mar 24 16:38:32 UTC 2020
   mZxid = 0x500000015
   mtime = Fri Apr 03 15:50:49 UTC 2020
   pZxid = 0x10000006a
   cversion = 0
   dataVersion = 39
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 6
   numChildren = 0
   [zk: kn-fix-0:2281(CONNECTED) 3] 
   WATCHER::
   ```
   Trying to start the Accumulo master overwrites 'goal_state', which can be seen from the change in "mtime" shown below 
   
   ```
   WatchedEvent state:SyncConnected type:NodeDataChanged path:/accumulo/7a7a53a5-077b-4506-a674-a010a273ba5b/masters/goal_state
   
   [zk: kn-fix-0:2281(CONNECTED) 3] get -s -w /accumulo/7a7a53a5-077b-4506-a674-a010a273ba5b/masters/goal_state
   NORMAL
   cZxid = 0x10000006a
   ctime = Tue Mar 24 16:38:32 UTC 2020
   mZxid = 0x50000b7f6
   mtime = Mon Apr 06 09:31:43 UTC 2020
   pZxid = 0x10000006a
   cversion = 0
   dataVersion = 40
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 6
   numChildren = 0
   ```
   FYI, below is the console output when starting the Accumulo master. The hang happens right after "Connected to HDFS", and the last line is the result of killing that intermediate process.
   
   ```
   [knarendran@kn-fix-0 ~]$ accumulo-service master start
   Starting master on kn-fix-0
   OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/opt/muchos/install/accumulo-2.0.0/lib/slf4j-log4j12-1.7.26.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/opt/muchos/install/apache-zookeeper-3.5.7-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   2020-04-06 09:31:41,210 [conf.SiteConfiguration] INFO : Found Accumulo configuration on classpath at /opt/muchos/install/accumulo-2.0.0/conf/accumulo.properties
   2020-04-06 09:31:42,095 [conf.ConfigurationTypeHelper] DEBUG: Loaded class : org.apache.accumulo.server.fs.RandomVolumeChooser
   2020-04-06 09:31:43,417 [server.ServerUtil] INFO : Attempting to talk to zookeeper
   2020-04-06 09:31:43,939 [server.ServerUtil] INFO : ZooKeeper connected and initialized, attempting to talk to HDFS
   2020-04-06 09:31:43,958 [server.ServerUtil] INFO : Connected to HDFS
   /opt/muchos/install/accumulo-2.0.0/bin/accumulo-service: line 57: 116969 Killed                  "${bin}/accumulo" org.apache.accumulo.master.state.SetGoalState NORMAL
   ```
   
   Let me know if there are any further checks you'd like me to perform.
   
   Thanks
