Posted to user@drill.apache.org by Chris Drawater <Ch...@jdsu.com> on 2014/10/22 11:38:35 UTC

Distributed SQL on Drill question

Hi,

We have started to evaluate Apache Drill.

Using multiple nodes/VMs, we wish to stream JSON data (via Apache Storm) to a persistent local filesystem (with a consistent directory structure so we can define various storage plugins) and then use Drill to run distributed SQL queries across these JSON files.
If at all possible, we don't wish to install Hadoop HDFS/HBase/Hive.
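
For reference, a file-based storage plugin pointing at a local directory might look roughly like the definition generated below. This is only a sketch under stated assumptions: the field names follow the file-type plugin layout documented for later Drill releases and may differ in 0.5/0.6, and the `/data/json` path and `events` workspace name are hypothetical. As noted above, it relies on the same directory structure being present on every node.

```python
import json


def local_json_plugin(root_dir, workspace="events"):
    """Build a file-type storage plugin definition for a local directory.

    root_dir and workspace are illustrative values, not taken from the thread.
    Field names are assumed from later Drill documentation.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": "file:///",  # local filesystem rather than HDFS
        "workspaces": {
            workspace: {
                "location": root_dir,
                "writable": False,
                "defaultInputFormat": "json",
            }
        },
        "formats": {"json": {"type": "json"}},
    }


if __name__ == "__main__":
    # Print the definition as JSON so it can be compared with an existing plugin.
    print(json.dumps(local_json_plugin("/data/json"), indent=2))
```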


SQL execution would hopefully be via ODBC and/or JDBC.

So far, embedded Drill experiments work just great, but the data processed is local to the sqlline.
Using SQL against JSON works really well.

Unfortunately, experiments to run distributed SQL queries have (so far) not been successful.
We have tried

*         a 3 node VM based system with a 3 node ZooKeeper quorum + 1 drillbit per node

*         a 3 node VM based system with a single ZooKeeper instance (covering all 3 nodes) + again 1 drillbit per node

and 'select * from sys.drillbits' from sqlline outputs all 3 drillbits in both cases. Likewise, ZooKeeper confirms the existence of the Drill cluster.
But we've not managed to run a distributed query.

We have tested against both the 0.5 release and a 0.6 build on 64-bit Ubuntu 14.04.

So far, we can only connect via ODBC to a specific drillbit.
We cannot get the JDBC driver (using SQuirreL) to work.
Also, ODBC to a ZooKeeper quorum doesn't appear to work, although we have tested client access using telnet IP_ADDR 2181 and that's OK.
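
A telnet connection only shows that the port is open. A slightly stronger check, sketched below, sends ZooKeeper's `ruok` four-letter command and expects `imok` back. The function name `zk_is_ok` is made up for this example, and the sketch assumes four-letter commands are enabled on the server; it says nothing about whether the drillbits are registered under the path the client expects.

```python
import socket


def zk_is_ok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's 'ruok' command; return True only if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = conn.recv(16)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    return reply == b"imok"
```

Running `zk_is_ok(IP_ADDR)` from the client machine against each quorum member would show whether every ZooKeeper node answers, not just the one tested by telnet.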

So my questions are:


*         Would we expect to be able to run a distributed SQL query by connecting via JDBC directly to a specific drillbit?

*         Would we expect to be able to run a distributed SQL query by connecting via ODBC directly to a specific drillbit?

*         Would we expect to be able to run a distributed SQL query by connecting via JDBC through a ZooKeeper quorum connection?

*         Would we expect to be able to run a distributed SQL query by connecting via ODBC through a ZooKeeper quorum connection?

*         How can we identify the use of multiple nodes by a distributed SQL query via explain plan output or the JSON QEP?

*         Any ideas as to why we can't connect via a ZooKeeper quorum connection?
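
To make the two JDBC connection styles in the questions above concrete, here is a small sketch that builds both URL forms. The helper name `drill_jdbc_url` is hypothetical, the URL syntax is the one documented for later Drill releases (it may differ in 0.5/0.6), and the `drill` root, `drillbits1` cluster id and port 31010 are assumed defaults that would have to match the drill-override.conf on the drillbits.

```python
def drill_jdbc_url(zk_hosts=None, drillbit=None, zk_root="drill", cluster_id="drillbits1"):
    """Return a Drill JDBC URL for either a ZooKeeper quorum or a single drillbit.

    zk_hosts: list of "host:port" strings for the ZooKeeper quorum.
    drillbit: "host:port" of one drillbit (31010 is the assumed default user port).
    zk_root / cluster_id: assumed defaults; they must match the server configuration.
    """
    if (zk_hosts is None) == (drillbit is None):
        raise ValueError("pass exactly one of zk_hosts or drillbit")
    if drillbit is not None:
        return "jdbc:drill:drillbit=" + drillbit
    if not zk_hosts:
        raise ValueError("zk_hosts must contain at least one host:port entry")
    return "jdbc:drill:zk={}/{}/{}".format(",".join(zk_hosts), zk_root, cluster_id)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(drill_jdbc_url(zk_hosts=["vm1:2181", "vm2:2181", "vm3:2181"]))
    print(drill_jdbc_url(drillbit="vm1:31010"))
```

If the cluster id or ZooKeeper root in the URL differs from what the drillbits registered under, a quorum connection could fail even though port 2181 is reachable, so that is one thing worth comparing.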

Any help or insights you can give would be most appreciated.

Thanks.
   Chris

Chris Drawater
Database Architect
Office +44 1635 232470  |  Fax +44 1635 232471
Email chris.drawater@jdsu.com  |  Web www.arieso.com


Re: Distributed SQL on Drill question

Posted by Timothy Chen <tn...@gmail.com>.
Hi Chris,

It should work as you described, so can you tell us what error message
you're getting?

Tim
