Posted to issues@drill.apache.org by "Sorabh Hamirwasia (JIRA)" <ji...@apache.org> on 2016/09/28 23:36:20 UTC

[jira] [Commented] (DRILL-3751) Query hang when zookeeper is stopped

    [ https://issues.apache.org/jira/browse/DRILL-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531240#comment-15531240 ] 

Sorabh Hamirwasia commented on DRILL-3751:
------------------------------------------

Hi Khurram,
I tried to reproduce the scenario (following the steps listed above) with both JSON and Parquet data to see why the sqlline client hangs, but was not able to. For me the query runs to completion, and the Web UI shows its state as Completed. Can you please try to reproduce it with the latest Drill version? (I am using a locally built 1.9.)

The exceptions seen at the sqlline prompt are expected: they are logged by the CuratorFramework threads, whose retry logic keeps trying to reconnect to ZooKeeper.
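To illustrate why these messages are noisy but not fatal by themselves, here is a minimal, stdlib-only Java sketch of retry with exponential backoff. It is not Curator's implementation (Curator ships its own retry policies, such as `ExponentialBackoffRetry`, which also randomize the sleep); the class `BackoffRetrySketch` and the method `callWithRetry` are hypothetical. Each failed attempt is recorded and followed by a longer sleep, so a stream of connection-loss errors is an expected side effect while ZooKeeper is unreachable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class BackoffRetrySketch {
    // Hypothetical helper (not Curator's API): runs op, retrying up to maxRetries
    // times after a failure and doubling the sleep between attempts.
    static <T> T callWithRetry(Supplier<T> op, int maxRetries, long baseSleepMs, List<String> log) {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Record the failure, much as a background thread would log it.
                log.add("attempt " + attempt + " failed: " + e.getMessage());
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the last error
                }
                try {
                    Thread.sleep(baseSleepMs << attempt); // 1x, 2x, 4x, ... baseSleepMs
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int[] calls = {0};
        // Simulated connection attempt that fails twice, then succeeds
        // (for example, ZooKeeper coming back up).
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new IllegalStateException("ConnectionLoss");
            }
            return "connected";
        }, 5, 10, log);
        log.forEach(System.out::println);
        System.out.println(result);
    }
}
```

Running `main` prints two "failed: ConnectionLoss" lines followed by `connected`; the errors are logged, but the caller only sees a failure once all retries are used up.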

> Query hang when zookeeper is stopped
> ------------------------------------
>
>                 Key: DRILL-3751
>                 URL: https://issues.apache.org/jira/browse/DRILL-3751
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>             Fix For: Future
>
>
> I see an indefinite hang at the sqlline prompt when I issue a long-running query and then stop the ZooKeeper process while the query is still executing. The sqlline prompt is never returned; it hangs, showing the stack trace below. I am on master.
> Steps to reproduce the problem:
> 1. clush -g khurram service mapr-warden stop
> 2. clush -g khurram service mapr-warden start
> 3. Issue a long-running query from sqlline.
> 4. While the query is running, stop ZooKeeper using the zkServer.sh script.
> To stop ZooKeeper:
> {code}
> [root@centos-01 bin]# ./zkServer.sh stop
> JMX enabled by default
> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
> Stopping zookeeper ... STOPPED
> {code}
> The long-running query issued from sqlline:
> {code}
> ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 8000000;
> ...
> | 7.40907649723E8  | g    |
> | 1.12378007695E9  | d    |
> 03:03:28.482 [CuratorFramework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.10.100.201:5181) and timeout (5000) / elapsed (5013)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> 	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> 	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
> Here is the thread dump of the sqlline process:
> {code}
> [root@centos-01 bin]# /usr/java/jdk1.7.0_45/bin/jstack 32136
> 2015-09-05 03:21:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007f8328003800 nid=0x27f1 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CuratorFramework-0-EventThread" daemon prio=10 tid=0x00000000012fd800 nid=0x26e1 waiting on condition [0x00007f8317c2e000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007e2117798> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
> "CuratorFramework-0-SendThread(centos-01.qa.lab:5181)" daemon prio=10 tid=0x0000000001109800 nid=0x26e0 waiting on condition [0x00007f8317b2d000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:995)
> "threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f833043b800 nid=0x7e16 waiting on condition [0x00007f831751f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
> 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> 	at java.lang.Thread.run(Thread.java:744)
> "Client-1" daemon prio=10 tid=0x00007f8378df7000 nid=0x7e15 runnable [0x00007f8317620000]
>    java.lang.Thread.State: RUNNABLE
> 	at io.netty.channel.epoll.Native.epollWait0(Native Method)
> 	at io.netty.channel.epoll.Native.epollWait(Native.java:148)
> 	at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:180)
> 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:205)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:744)
> "ServiceCache-0" daemon prio=10 tid=0x00007f8378d22000 nid=0x7e13 waiting on condition [0x00007f831792b000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9c658> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "CuratorFramework-0" daemon prio=10 tid=0x00007f8378c95800 nid=0x7e12 waiting on condition [0x00007f8317a2c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fff9ebd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
> 	at java.util.concurrent.DelayQueue.take(DelayQueue.java:68)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:781)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57)
> 	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "ConnectionStateManager-0" daemon prio=10 tid=0x00007f8378c60800 nid=0x7e0f waiting on condition [0x00007f8317d2f000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000006fffb2288> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> 	at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
> 	at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:208)
> 	at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:42)
> 	at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:110)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> "NonBlockingInputStreamThread" daemon prio=10 tid=0x00007f8378836000 nid=0x7de0 in Object.wait() [0x00007f83186ab000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at jline.internal.NonBlockingInputStream.run(NonBlockingInputStream.java:278)
> 	- locked <0x00000006fffb2438> (a jline.internal.NonBlockingInputStream)
> 	at java.lang.Thread.run(Thread.java:744)
> "Service Thread" daemon prio=10 tid=0x00007f83780c1000 nid=0x7dcd runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f83780be800 nid=0x7dcc waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f83780bb800 nid=0x7dcb waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007f83780b1800 nid=0x7dca runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007f837809a800 nid=0x7dc9 in Object.wait() [0x00007f832c574000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> 	- locked <0x00000006fffb2668> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
> "Reference Handler" daemon prio=10 tid=0x00007f8378091000 nid=0x7dc8 in Object.wait() [0x00007f832c675000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:503)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> 	- locked <0x00000006fffb2700> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007f8378011000 nid=0x7db4 waiting on condition [0x00007f837cac2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x0000000700d3a210> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
> 	at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl$ResultsListener.getNext(DrillResultSetImpl.java:1536)
> 	at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:175)
> 	at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:320)
> 	at net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
> 	at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:161)
> 	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
> 	at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> 	at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> 	at sqlline.SqlLine.print(SqlLine.java:1583)
> 	at sqlline.Commands.execute(Commands.java:852)
> 	at sqlline.Commands.sql(Commands.java:751)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:738)
> 	at sqlline.SqlLine.begin(SqlLine.java:612)
> 	at sqlline.SqlLine.start(SqlLine.java:366)
> 	at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
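In the dump above, the `main` thread is in TIMED_WAITING inside `LinkedBlockingDeque.poll`, called from `DrillResultSetImpl$ResultsListener.getNext`. One plausible reading, not confirmed in this thread, is that the client polls for the next batch with a timeout inside a loop, so if neither a batch nor a completion or failure event ever arrives, it keeps waiting indefinitely. The Java sketch below shows that polling pattern with an overall deadline added; `BoundedPollSketch`, `nextBatch` and its parameters are hypothetical and do not reflect Drill's actual code.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPollSketch {
    // Hypothetical consumer: waits for the next batch, polling in short slices,
    // but gives up once an overall deadline passes instead of looping forever.
    static String nextBatch(BlockingDeque<String> queue, long pollMs, long deadlineMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMs);
        while (true) {
            String batch = queue.poll(pollMs, TimeUnit.MILLISECONDS);
            if (batch != null) {
                return batch;
            }
            // An unbounded loop would simply poll again here, which keeps the
            // thread in TIMED_WAITING for as long as nothing is delivered.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("no batch within " + deadlineMs + " ms");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingDeque<String> queue = new LinkedBlockingDeque<>();
        queue.add("batch-1");
        try {
            System.out.println(nextBatch(queue, 10, 100)); // first batch is available
            nextBatch(queue, 10, 100); // nothing else arrives
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Running `main` prints `batch-1` and then `timed out: no batch within 100 ms`. Whether a client-side deadline is the right fix, as opposed to making sure a failure event reaches the listener when the cluster connection is lost, is a design question this sketch does not settle.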



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)