Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/09/14 21:52:20 UTC

[jira] [Updated] (DRILL-3751) Query hang when zookeeper is stopped

     [ https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-3751:
-------------------------------
    Assignee: Sorabh Hamirwasia  (was: Sean Hsuan-Yi Chu)

> Query hang when zookeeper is stopped
> ------------------------------------
>
>                 Key: DRILL-3751
>                 URL: https://issues.apache.org/jira/browse/DRILL-3751
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Sorabh Hamirwasia
>            Priority: Critical
>             Fix For: Future
>
>
> I see an indefinite hang at the sqlline prompt when I issue a long-running query and then stop the zookeeper process while the query is still executing. The sqlline prompt never returns; it hangs, showing the stack trace below. I am on master.
> Steps to reproduce the problem:
> clush -g khurram service mapr-warden stop
> clush -g khurram service mapr-warden start
> Issue a long-running query from sqlline.
> While the query is running, stop zookeeper using the script.
> To stop zookeeper:
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> Issue the long-running query below from sqlline:
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
> ...
> | 7.40907649723E8  | g    |
> | 1.12378007695E9  | d    |
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> 	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> 	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
> Here is the stack for the sqlline process:
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)
> "threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
> 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> 	at java.lang.Thread.run(Thread.java:744)
> "Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
>    java.lang.Thread.State: RUNNABLE
> 	at io.netty.channel.epoll.Native.epollWait0(Native Method)
> 	at io.netty.channel.epoll.Native.epollWait(Native.java:148)
> 	at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
> 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:744)
> "ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
> 	at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
> 	at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
> 	at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
> 	- locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at java.lang.Thread.run(Thread.java:744)
> "Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> 	- locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
> "Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:503)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> 	- locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
> 	at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
> 	at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
> 	at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
> 	at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
> 	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
> 	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> 	at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> 	at sqlline.SqlLine.print(SqlLine.java:1583)
> 	at sqlline.Commands.execute(Commands.java:852)
> 	at sqlline.Commands.sql(Commands.java:751)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:738)
> 	at sqlline.SqlLine.begin(SqlLine.java:612)
> 	at sqlline.SqlLine.start(SqlLine.java:366)
> 	at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
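
In the dump above, the "main" thread is parked in a timed LinkedBlockingDeque.poll called from DrillResultSetImpl$ResultsListener.getNext. That is consistent with a client loop that keeps polling for the next result batch. The sketch below is a hypothetical illustration of that pattern, not Drill's actual code: the class PollLoopSketch, the method nextBatch, and the connectionLost flag are all invented names, and it assumes Java 8 or later. It shows how such a loop would spin forever when no batch arrives unless something signals failure, and how checking a failure flag on each pass lets the caller get control back.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from Drill.
class PollLoopSketch {

    // Waits for the next batch, polling with a timeout on each pass.
    // Without the connectionLost check, an empty queue would keep this
    // loop in TIMED_WAITING indefinitely.
    static String nextBatch(BlockingDeque<String> batches,
                            BooleanSupplier connectionLost,
                            long pollMillis) {
        while (true) {
            // Drain batches that already arrived before reporting failure.
            if (batches.isEmpty() && connectionLost.getAsBoolean()) {
                throw new IllegalStateException(
                        "connection lost while waiting for results");
            }
            try {
                String batch = batches.poll(pollMillis, TimeUnit.MILLISECONDS);
                if (batch != null) {
                    return batch;
                }
                // Timed out with no data: loop and re-check the flag.
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException(
                        "interrupted while waiting for results", e);
            }
        }
    }

    public static void main(String[] args) {
        BlockingDeque<String> batches = new LinkedBlockingDeque<>();
        batches.add("batch-1");

        // A queued batch is returned even though no failure is signalled.
        System.out.println(nextBatch(batches, () -> false, 50L));

        // With an empty queue and the flag set, the wait ends with an error
        // instead of hanging.
        try {
            nextBatch(batches, () -> true, 50L);
        } catch (IllegalStateException e) {
            System.out.println("stopped waiting: " + e.getMessage());
        }
    }
}
```

Checking the flag only when the queue is empty is a design choice in this sketch: batches that were delivered before the failure are still handed to the caller. Whether the real fix belongs in the JDBC result listener, the RPC layer, or the cluster coordinator is not something the report settles.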



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)