Posted to user@drill.apache.org by Chris Drawater <ch...@jdsu.com> on 2014/10/27 12:33:14 UTC

Still unable to run a Distributed Drill Query...

We have 3 * Ubuntu 14.04 VM nodes, each running a single Drill 0.6 Drillbit,
with 1 of the VMs also running a ZooKeeper 3.4.6 instance.

Each VM has an identical data directory structure on a local persistent
filesystem and hosts JSON files.

Zookeeper is aware of the 3 * Drillbits and 'select * from sys.drillbits' 
shows 3 Drillbits.

UDP Multicast for the VM subnet is supposedly enabled.



Using SQuirreL with the JDBC driver on a Windows machine, we can connect to
Drill (via the ZooKeeper instance) and run SQL queries against JSON files.

However, the queries only return rows from the 'foreman' Drillbit.



We have never managed to run a distributed query !



Indeed looking at the Drillbit logs we see :



Not a hint of any awareness of other Drillbits

No 'fragments' mentioned in any plan apart from 'fragment_id : 0'



but we do see this (client connected, via ZooKeeper on Node 1, to the Drillbit
on Node 3):



2014-10-27 10:42:48,914 [dbc13c29-bbd8-4890-93fa-a8a98f4cc8fd:frag:0:0] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
java.lang.RuntimeException: Failure while accessing Zookeeper
        at org.apache.drill.exec.store.sys.zk.ZkPStore.put(ZkPStore.java:111) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryStatus.updateCache(QueryStatus.java:125) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryStatus.update(QueryStatus.java:119) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.updateStatus(QueryManager.java:173) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.finished(QueryManager.java:189) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:162) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager$RootStatusHandler.statusChange(QueryManager.java:284) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.AbstractStatusReporter.finished(AbstractStatusReporter.java:101) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.AbstractStatusReporter.stateChanged(AbstractStatusReporter.java:73) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.updateState(FragmentExecutor.java:172) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:110) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.InterruptedException: null
        at java.lang.Object.wait(Native Method) ~[na:1.7.0_65]
        at java.lang.Object.wait(Object.java:503) ~[na:1.7.0_65]
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309) ~[zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036) ~[zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[curator-framework-2.5.0.jar:na]
        at org.apache.drill.exec.store.sys.zk.ZkPStore.put(ZkPStore.java:104) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        ... 14 common frames omitted



2014-10-27 10:42:48,926 [dbc13c29-bbd8-4890-93fa-a8a98f4cc8fd:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 6f41051f-af65-4be8-9cce-fe1895643d70: Failure while running fragment.
java.lang.InterruptedException: null
        at java.lang.Object.wait(Native Method) ~[na:1.7.0_65]
        at java.lang.Object.wait(Object.java:503) ~[na:1.7.0_65]
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309) ~[zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036) ~[zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[curator-framework-2.5.0.jar:na]
        at org.apache.drill.exec.store.sys.zk.ZkPStore.put(ZkPStore.java:104) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryStatus.updateCache(QueryStatus.java:125) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryStatus.update(QueryStatus.java:119) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.updateStatus(QueryManager.java:173) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.finished(QueryManager.java:189) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:162) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.foreman.QueryManager$RootStatusHandler.statusChange(QueryManager.java:284) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.AbstractStatusReporter.finished(AbstractStatusReporter.java:101) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.AbstractStatusReporter.stateChanged(AbstractStatusReporter.java:73) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.updateState(FragmentExecutor.java:172) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:110) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
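
The "no fragments other than 0" observation could be checked mechanically across the Drillbit logs from all three nodes. The following is only a sketch: it assumes every relevant log line carries a thread name of the form [<query-id>:frag:<major>:<minor>], as in the two excerpts above, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Thread names in the excerpts look like "[<query-id>:frag:<major>:<minor>]".
# Assumption: the query id is always a 36-character UUID.
FRAGMENT_THREAD = re.compile(r"\[([0-9a-f-]{36}):frag:(\d+):(\d+)\]")


def fragments_by_query(lines):
    """Map each query id to the set of (major, minor) fragment ids seen."""
    seen = defaultdict(set)
    for line in lines:
        for query_id, major, minor in FRAGMENT_THREAD.findall(line):
            seen[query_id].add((int(major), int(minor)))
    return dict(seen)


def single_fragment_queries(lines):
    """Return the query ids whose only logged fragment is 0:0."""
    by_query = fragments_by_query(lines)
    return sorted(q for q, frags in by_query.items() if frags == {(0, 0)})
```

Feeding it the concatenated lines of all three Drillbit logs matters here, since one node's log would presumably only show the fragments that ran on that node.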



Does anyone have any ideas or pointers regarding this?



Also, I have a few questions...



1. When do the Drillbits become 'aware' of each other ?

2. Is there any Drill tracing that can be switched on to reveal the (lack
of) communication between the Drillbits ?
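
Independently of any Drill tracing, basic TCP reachability between the nodes can be probed directly. This is a rough sketch, not a Drill facility: the ports 31010, 31011 and 31012 are assumed to be the usual Drillbit defaults (user, control, data) and may differ if drill-override.conf changes them; the helper names are invented for the example.

```python
import socket

# Assumed defaults: 31010 (user), 31011 (control), 31012 (data).
# Adjust if the Drillbit configuration overrides them.
DRILLBIT_PORTS = (31010, 31011, 31012)


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reachability(hosts, ports=DRILLBIT_PORTS, timeout=2.0):
    """Return {(host, port): bool} for every host/port combination."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}
```

Running reachability() on each VM against the host names that 'select * from sys.drillbits' reports would show whether every Drillbit can open connections to the others under the names registered in ZooKeeper.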



Any help once again gratefully received.



Thanks,

    Chris






Re: Still unable to run a Distributed Drill Query...

Posted by Ted Dunning <te...@gmail.com>.
I may have missed it as it went by, but what was the evidence that the zk quorum actually includes all the zookeeper nodes?  This could be answered by examination of the logs, but more definitive and simpler might be to configure to use only one zk node instead of three.

The rationale here is that if different drillbits talked to different zk nodes they could well not have known about each other.
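
One way to see what each ZooKeeper server reports about itself is the four-letter command srvr, whose reply includes a "Mode:" line (standalone, leader or follower). A minimal sketch, assuming the default client port 2181 and that the server closes the connection after replying; the function names are illustrative only.

```python
import socket


def four_letter_word(host, port, command, timeout=2.0):
    """Send a ZooKeeper four-letter command and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(host, port=2181, timeout=2.0):
    """Return the 'Mode:' value from `srvr`, or None if it is absent."""
    reply = four_letter_word(host, port, "srvr", timeout)
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

A single-node setup would be expected to answer "standalone"; in a three-node ensemble, one leader and two followers would suggest the quorum has actually formed.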

Sent from my iPhone

> On Oct 27, 2014, at 10:26, Chris Drawater <ch...@jdsu.com> wrote:
> 
> Ramana Inukonda <ri...@...> writes:
> 
> 
> 
> 
>> Could you look at the zookeeper logs and see if there is any information
> 
>> there? Zookeeper logs should be at zk install location/ logs. There should
> 
>> be two files. A .log and .out. Please check both.
> 
> 
>> Regards
> 
>> Ramana
> 
> 
> 
> 
> Thanks  Ramana.
> 
> 
> 
> We've now isolated our 3 * VMs onto their own private network...
> 
> 
> 
> Now we see the following in the DrillBit.log :
> 
> 
> 
> 
> 
> 2014-10-27 15:34:37,461 [d80e5b2c-3658-47ff-be30-fe884475feab:frag:0:0] WARN  o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
>        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:996) ~[na:1.7.0_65]
>        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) ~[na:1.7.0_65]
>        at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_65]
>        at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>        at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:134) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:109) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> 
> 
> 
> but no corresponding errors in the Zookeeper logs...
> 
> 
> 
> Chris
> 
> 
> 
> 
> 
> 
> 

Re: Still unable to run a Distributed Drill Query...

Posted by Ramana Inukonda <ri...@maprtech.com>.
Can you please share the query and type of data?
Are you able to query the sample drill data bundled with drill to verify
your setup?

https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-QuerySampleData

You can copy the sample files mentioned to the distributed file system


Regards
Ramana


On Mon, Oct 27, 2014 at 11:03 AM, Ted Dunning <te...@gmail.com> wrote:

>
> Argh.  Just reread the thread and see that you have a single zk node
> exactly as I suggested.
>
> Never mind.  Drill team is ahead of me several steps.
>
> Sent from my iPhone

Re: Still unable to run a Distributed Drill Query...

Posted by Ted Dunning <te...@gmail.com>.
Argh.  Just reread the thread and see that you have a single zk node exactly as I suggested. 

Never mind.  Drill team is ahead of me several steps.  

Sent from my iPhone


Re: Still unable to run a Distributed Drill Query...

Posted by Chris Drawater <ch...@jdsu.com>.
Ramana Inukonda <ri...@...> writes:



> Could you look at the zookeeper logs and see if there is any information
> there? Zookeeper logs should be at zk install location/ logs. There should
> be two files. A .log and .out. Please check both.
>
> Regards
> Ramana



Thanks  Ramana.



We've now isolated our 3 * VMs onto their own private network...



Now we see the following in the DrillBit.log :





2014-10-27 15:34:37,461 [d80e5b2c-3658-47ff-be30-fe884475feab:frag:0:0] WARN  o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:996) ~[na:1.7.0_65]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303) ~[na:1.7.0_65]
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_65]
        at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:134) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:109) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]



but no corresponding errors in the Zookeeper logs...



Chris








Re: Still unable to run a Distributed Drill Query...

Posted by Ramana Inukonda <ri...@maprtech.com>.
Could you look at the zookeeper logs and see if there is any information
there? Zookeeper logs should be at zk install location/ logs. There should
be two files. A .log and .out. Please check both.
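
If the log files are large, something like the following could pull out the lines worth reading first. It is only a sketch: the logs directory is passed in as an argument because its exact path depends on the install, and matching on the words WARN and ERROR is an assumption about how the ZooKeeper logs mark problems.

```python
from pathlib import Path


def suspicious_lines(log_dir, levels=("WARN", "ERROR")):
    """Yield (file name, line number, text) for .log/.out lines naming a level."""
    for path in sorted(Path(log_dir).iterdir()):
        # Only look at regular files ending in .log or .out.
        if not path.is_file() or path.suffix not in (".log", ".out"):
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(level in line for level in levels):
                    yield path.name, number, line.rstrip("\n")
```

Each hit can then be compared against the timestamps of the Drillbit errors.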

Regards
Ramana
