Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/07/17 10:43:00 UTC

[jira] [Commented] (DRILL-3751) Query hang when zookeeper is stopped

    [ https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886912#comment-16886912 ] 

ASF GitHub Bot commented on DRILL-3751:
---------------------------------------

vvysotskyi commented on issue #248: DRILL-3751: Reduce zookeeper's retry time to 10
URL: https://github.com/apache/drill/pull/248#issuecomment-512201647
 
 
   @hsuanyi, could you please rework your PR as discussed in https://lists.apache.org/thread.html/9e5b2e02453e14d69bd34977ef7f4ae232d56d55d61fae663a0b2b25@1446835382@%3Cdev.drill.apache.org%3E?
 


> Query hang when zookeeper is stopped
> ------------------------------------
>
>                 Key: DRILL-3751
>                 URL: https://issues.apache.org/jira/browse/DRILL-3751
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>             Fix For: Future
>
>
> I see an indefinite hang at the sqlline prompt: I issue a long-running query and then stop the ZooKeeper process while the query is still executing. The sqlline prompt never returns; it hangs after printing the stack trace below. I am on master.
> Steps to reproduce the problem:
> 1. Restart mapr-warden on the cluster nodes:
> clush -g khurram service mapr-warden stop
> clush -g khurram service mapr-warden start
> 2. Issue a long-running query from sqlline.
> 3. While the query is running, stop ZooKeeper using its script.
> To stop ZooKeeper:
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> The long-running query issued from sqlline:
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
> ...
> | 7.40907649723E8  | g    |
> | 1.12378007695E9  | d    |
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> 	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> 	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
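The Curator log above shows a connection attempt timing out (timeout 5000 ms, elapsed 5013 ms), and the linked PR's title proposes reducing ZooKeeper's retry setting to 10. The general idea behind a bounded retry policy can be sketched in plain Java (this is not the Curator API): retry a failing operation a fixed number of times with a pause between attempts, then rethrow the last failure so the caller can fail the query instead of waiting indefinitely. The class `BoundedRetry` and its `maxRetries`/`sleepMs` parameters are hypothetical; Curator ships its own policies such as `RetryNTimes`, and how Drill configures them is not shown in this report.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: run an operation, retrying a bounded number of times. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs op once, then retries up to maxRetries more times, sleeping sleepMs between
     * attempts. If every attempt fails, the last exception is rethrown to the caller.
     */
    static <T> T call(Callable<T> op, int maxRetries, long sleepMs) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure; retry only if attempts remain
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last; // bounded: give up instead of retrying forever
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        try {
            // Simulate an unreachable ZooKeeper: every attempt fails.
            call(() -> {
                attempts[0]++;
                throw new IllegalStateException("ConnectionLoss");
            }, 10, 100);
        } catch (IllegalStateException e) {
            System.out.println("gave up after " + attempts[0] + " attempts: " + e.getMessage());
        }
    }
}
```

Running `main` prints `gave up after 11 attempts: ConnectionLoss`: with `maxRetries = 10` the operation runs one initial time plus ten retries. A smaller bound surfaces an outage to the user sooner, at the cost of tolerating fewer transient connection blips.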
> Here is the thread dump (jstack) of the sqlline process:
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)
> "threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
> 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> 	at java.lang.Thread.run(Thread.java:744)
> "Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
>    java.lang.Thread.State: RUNNABLE
> 	at io.netty.channel.epoll.Native.epollWait0(Native Method)
> 	at io.netty.channel.epoll.Native.epollWait(Native.java:148)
> 	at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
> 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:744)
> "ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
> 	at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
> 	at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
> 	at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
> 	- locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at java.lang.Thread.run(Thread.java:744)
> "Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> 	- locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
> "Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:503)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> 	- locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
> 	at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
> 	at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
> 	at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
> 	at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
> 	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
> 	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> 	at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> 	at sqlline.SqlLine.print(SqlLine.java:1583)
> 	at sqlline.Commands.execute(Commands.java:852)
> 	at sqlline.Commands.sql(Commands.java:751)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:738)
> 	at sqlline.SqlLine.begin(SqlLine.java:612)
> 	at sqlline.SqlLine.start(SqlLine.java:366)
> 	at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
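In the dump above, the `main` thread is `TIMED_WAITING` inside `DrillResultSetImpl$ResultsListener.getNext`, parked in a timed `LinkedBlockingDeque.poll`. The dump alone does not show the surrounding logic, but assuming `getNext` simply repeats the timed poll until a batch or completion signal arrives, losing the ZooKeeper connection would leave it waiting forever. A minimal sketch of a consumer that cannot hang this way: it polls in short slices, checks a failure flag between slices, and gives up after an overall deadline. `ResultsQueue`, `fail`, and the timeout parameters are hypothetical names, not Drill's API.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical results queue whose getNext() cannot block forever. */
final class ResultsQueue<T> {
    private final BlockingDeque<T> batches = new LinkedBlockingDeque<>();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    /** Producer side: called when a result batch arrives. */
    void add(T batch) {
        batches.add(batch);
    }

    /** Called, e.g., by whatever observes the lost connection; first failure wins. */
    void fail(Throwable cause) {
        failure.compareAndSet(null, cause);
    }

    /**
     * Waits for the next batch in slices of at most pollMs, rechecking the failure flag
     * between slices, and throws TimeoutException once totalTimeoutMs has elapsed.
     * A recorded failure takes precedence over batches that are still queued.
     */
    T getNext(long pollMs, long totalTimeoutMs) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMs);
        while (true) {
            Throwable cause = failure.get();
            if (cause != null) {
                throw new IllegalStateException("query failed: " + cause.getMessage(), cause);
            }
            long remainingNs = deadline - System.nanoTime();
            long waitNs = Math.max(0L, Math.min(TimeUnit.MILLISECONDS.toNanos(pollMs), remainingNs));
            T next = batches.poll(waitNs, TimeUnit.NANOSECONDS);
            if (next != null) {
                return next;
            }
            if (deadline - System.nanoTime() <= 0) {
                throw new TimeoutException("no results within " + totalTimeoutMs + " ms");
            }
        }
    }
}
```

Letting a failure win over queued batches keeps the sketch simple; a real client might prefer to drain data it already received before reporting the error. The `fail` call would come from a component that watches the connection, for instance a Curator `ConnectionStateListener` reacting to a lost session.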



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)