Posted to dev@drill.apache.org by "kulkarni.swarnim@gmail.com" <ku...@gmail.com> on 2012/10/18 00:26:13 UTC

Drill query parser and Hive

Hello,

For the past few days I have been reading about Drill, and it surely seems like
a very interesting project. However, as far as I understand, the query
syntax seems to be very similar to what Apache Hive uses. I was wondering
why we couldn't take the Hive query analyzer as a starting point and
try to refactor it according to our needs. Hive also uses an ANTLR-based
parser to parse the queries and generate an Abstract Syntax Tree for the
queries. I understand that the part where the AST gets converted to M/R
tasks would be different, but it seems like the first part might be reusable.

Surely seems like I am missing something here.

Thanks,
-- 
Swarnim

Re: Drill development

Posted by Ted Dunning <te...@gmail.com>.

Go for it.

Raw numbers for various execution options will be very helpful soon.

On Wed, Oct 17, 2012 at 8:15 PM, Philip Haynes <
philip.haynes@virtualnation.com.au> wrote:

> So to execute the above, my steps were:
> A) create a PB structure and then injector which reads the Wikipedia data
> sets and converts it into PB format, and reports other ingestion
> information to aid subsequent reads.
> B) Write a C++ program that loads the above data set into supersonic and
> tests out various supersonic queries  and their performance.
> C) Do above with a java infrastructure to enable comparative performance.
> D) Write a LLVM interpreter with interpretive script fragments to execute
> the above.
>
> If this of interest, then please let me know.
>

Re: Drill development

Posted by Ted Dunning <te...@gmail.com>.

That ratio should obviously have been most lacking / what is hard for you.

Oops.

On Wed, Oct 17, 2012 at 4:54 PM, Ted Dunning <te...@gmail.com> wrote:

> Step 1 is to subscribe to the mailing list by sending email to:
> drill-dev-subscribe@incubator.apache.org
>
> Step 2 is to look at what little documentation exists (mostly slide shows
> and the beginning of the spec on the physical and execution plan languages)
> and possibly some of the mailing list archives.  Take note of things that
> look very right or very wrong.
>
> Step 3 is to decide what you think is most lacking / what you most can do.
>  Whatever has the highest ratio is what you should start with.  Jump on it!
>
>
> On Wed, Oct 17, 2012 at 4:33 PM, Christopher Bartos <
> bartosenator@gmail.com> wrote:
>
>> Hey,
>>
>> I've been interested in Drill for awhile. I was looking at Big Data / Big
>> Query for some time
>> for my job and that's when I stumbled upon Drill. Now that some
>> development is underway
>> I would love to contribute.
>>
>> I work as a programmer and web developer with a CS degree and in dire
>> need of a side project
>> to work on that interests me.
>>
>> How does one get started? Since, development seems to have just begun,
>> there is probably
>> not a whole lot I'd be able to do. But, maybe there is.
>>
>> Please let me know!
>>
>> Thanks,
>>
>> --
>> Christopher Bartos
>> Columbus, OH
>> (330) 324-0018
>>
>>
>

Re: Drill development

Posted by Ted Dunning <te...@gmail.com>.

Step 1 is to subscribe to the mailing list by sending email to:
drill-dev-subscribe@incubator.apache.org

Step 2 is to look at what little documentation exists (mostly slide shows
and the beginning of the spec on the physical and execution plan languages)
and possibly some of the mailing list archives.  Take note of things that
look very right or very wrong.

Step 3 is to decide what you think is most lacking / what you most can do.
 Whatever has the highest ratio is what you should start with.  Jump on it!

On Wed, Oct 17, 2012 at 4:33 PM, Christopher Bartos
<ba...@gmail.com> wrote:

> Hey,
>
> I've been interested in Drill for awhile. I was looking at Big Data / Big
> Query for some time
> for my job and that's when I stumbled upon Drill. Now that some
> development is underway
> I would love to contribute.
>
> I work as a programmer and web developer with a CS degree and in dire need
> of a side project
> to work on that interests me.
>
> How does one get started? Since, development seems to have just begun,
> there is probably
> not a whole lot I'd be able to do. But, maybe there is.
>
> Please let me know!
>
> Thanks,
>
> --
> Christopher Bartos
> Columbus, OH
> (330) 324-0018
>
>

Re: Drill development

Posted by Ted Dunning <te...@gmail.com>.

On Sun, Oct 21, 2012 at 9:19 AM, Ashwin Aravind <as...@gmail.com> wrote:

> I had a query with respect to query execution plan of Drill.
> Since Drill will not be using map reduce and the query execution flow will
> be that of normal databases (except that schema is  nested and storage is
> column oriented). How different will be query plan to that of relational
> DBs?
>

Well, the big difference is that Drill is oriented around the full table
scan of a single driving table with the minor exception of hash-joins
against small tables.

This makes the query planning massively simpler.
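
As a rough illustration of that shape (a sketch only, not Drill code; the
table names, column names and helper functions below are all hypothetical),
a plan of this kind boils down to one pass over the big driving table with
lookups into an in-memory hash table built from the small one:

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def build_hash_table(small_table: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """Index the small table by join key so each probe is a dictionary lookup."""
    table: Dict[object, List[Row]] = {}
    for row in small_table:
        table.setdefault(row[key], []).append(row)
    return table


def scan_with_hash_join(driving_table: Iterable[Row],
                        small_table: Iterable[Row],
                        key: str) -> Iterator[Row]:
    """Scan the driving table once; emit one joined row per match (inner join)."""
    lookup = build_hash_table(small_table, key)
    for row in driving_table:
        for match in lookup.get(row[key], []):
            # Columns from the driving row win if names collide.
            yield {**match, **row}


# Hypothetical data: a large fact-like table and a small lookup table.
page_views = [
    {"page_id": 1, "views": 10},
    {"page_id": 2, "views": 5},
    {"page_id": 1, "views": 7},
    {"page_id": 3, "views": 2},  # no matching page, dropped by the inner join
]
pages = [
    {"page_id": 1, "title": "Drill"},
    {"page_id": 2, "title": "Hive"},
]

joined = list(scan_with_hash_join(page_views, pages, "page_id"))
```

There is no join ordering or index selection to decide here, which is the
sense in which planning gets simpler: the planner only has to pick the
driving table and confirm the other side fits in memory.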

> Also is there some reference which anyone could share - to convert Query to
> Query plan which might be useful in this case?
>

I don't have one. Perhaps somebody else does.

Re: Drill development

Posted by Ashwin Aravind <as...@gmail.com>.

Hi All-
I have a question about Drill's query execution plan.
Since Drill will not be using MapReduce and the query execution flow will
be that of a normal database (except that the schema is nested and storage
is column-oriented), how different will the query plan be from that of a
relational DB?

Also, is there some reference anyone could share on converting a query to
a query plan, which might be useful in this case?

Regards
Ashwin

Re: Drill development

Posted by Philip Haynes <ph...@virtualnation.com.au>.

Hey Christopher,

Not sure what floats your boat, or how relevant this is to the Drill project
(that is for others to decide), but this is what I am doing. If you
could help out or at least keep me honest (my baby twins do distract) I
would appreciate it.

I am thinking that creating an LLVM interpreter that the Drill parser could
plug into could be quite a straightforward task (using Clang to help
figure out the assembly). I find Google's new Supersonic very
interesting and am keen to test its performance and, if it proves
adequate, to kick queries off via interpretive requests as above.

Now the Tables in supersonic seem to have Protocol Buffer (PB) like
structures. Having worked with PB a bit, it seems that one could inspect a
PB dataset to automagically populate supersonic tables that could then be
queried. Now I need a relevant data set to test this out.
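
A minimal sketch of that inspection idea, under loud assumptions: plain
Python dicts stand in for decoded PB messages, the field names are made up,
and nothing here touches the real protobuf or Supersonic APIs. It walks
nested records, derives dotted column names, and lays the values out
column by column:

```python
from typing import Dict, List


def flatten(record: dict, prefix: str = "") -> Dict[str, object]:
    """Turn a nested record into {dotted.path: value} pairs."""
    flat: Dict[str, object] = {}
    for name, value in record.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into nested messages, extending the path.
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Infer the column set from the data and build one value list per column."""
    flat_records = [flatten(r) for r in records]
    names = sorted({name for flat in flat_records for name in flat})
    # Missing optional fields become None so all columns have equal length.
    return {name: [flat.get(name) for flat in flat_records] for name in names}


# Hypothetical records shaped loosely like article revisions.
records = [
    {"title": "Drill", "revision": {"id": 1, "contributor": {"username": "a"}}},
    {"title": "Hive", "revision": {"id": 2}},
]

columns = to_columns(records)
```

Repeated fields are deliberately left out of this sketch; handling them
properly is the hard part of any nested columnar layout.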

Google's BigQuery team gave a presentation here last week. They used a
data set from Wikipedia
(https://developers.google.com/bigquery/docs/dataset-wikipedia). I thought
this might be a good test case for Drill and comparative performance data
would be useful no matter which way the project technically evolves.

So to execute the above, my steps were:
A) Create a PB structure and then an injector which reads the Wikipedia data
sets, converts them into PB format, and reports other ingestion
information to aid subsequent reads.
B) Write a C++ program that loads the above data set into Supersonic and
tests various Supersonic queries and their performance.
C) Do the above with a Java infrastructure to enable performance comparison.
D) Write an LLVM interpreter with interpretive script fragments to execute
the above.

If this is of interest, then please let me know.

Cheers
Philip

On 18/10/12 9:33 AM, "Christopher Bartos" <ba...@gmail.com> wrote:

>Hey,
>
>I've been interested in Drill for awhile. I was looking at Big Data / Big
>Query for some time
>for my job and that's when I stumbled upon Drill. Now that some
>development is underway
>I would love to contribute.
>
>I work as a programmer and web developer with a CS degree and in dire
>need of a side project
>to work on that interests me.
>
>How does one get started? Since, development seems to have just begun,
>there is probably
>not a whole lot I'd be able to do. But, maybe there is.
>
>Please let me know!
>
>Thanks,
>
>--
>Christopher Bartos
>Columbus, OH
>(330) 324-0018
>

Drill development

Posted by Christopher Bartos <ba...@gmail.com>.

Hey,

I've been interested in Drill for a while. I was looking at Big Data / Big Query for some time
for my job and that's when I stumbled upon Drill. Now that some development is underway
I would love to contribute.

I work as a programmer and web developer with a CS degree and am in dire need of a side project
to work on that interests me.

How does one get started? Since development seems to have just begun, there is probably
not a whole lot I'd be able to do. But maybe there is.

Please let me know!

Thanks,

--
Christopher Bartos
Columbus, OH
(330) 324-0018

Re: Drill query parser and Hive

Posted by Ted Dunning <te...@gmail.com>.

I don't think you are off-base at all.

We (the current contributors) are targeting Dremel as a first language
syntax because it has a number of convenient features for the analysis of
nested data.  But having Dremel syntax be a first target doesn't mean that
other targets are out of scope at all.

Building a parser that understands a subset of what Hive can do would be
a worthy goal as well.  It just isn't what the current contributors are
working on.  That doesn't mean that a Hive compatible parser is bad, just
that it isn't what those folks are doing.  If you would like to do it, it
would be a great way to start contributing.  And if you start generating
compatible intermediate language before the Dremel parser does, so be it.

The general tenor of the Drill project is to enable alternatives, not
fixate on single implementations.  That is the motive for having a textual
intermediate language and that is the goal for separating parsing,
optimizing and executing steps so strenuously.
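
To make that separation concrete, here is a toy sketch. Everything in it is
an assumption for illustration: JSON stands in for the textual intermediate
language (the real format is not shown here), and the function names, the
tiny query grammar and the single optimizer rule are all invented. The point
is only that each stage consumes and produces text, so any front end that
emits the same text can reuse the later stages:

```python
import json
from typing import Dict, List


def parse(query: str) -> str:
    """Toy front end for 'SELECT <col> FROM <table> [WHERE <col> > <int>]'."""
    tokens = query.split()
    plan = [{"op": "scan", "table": tokens[3]}]
    if len(tokens) > 4:
        plan.append({"op": "filter", "column": tokens[5], "gt": int(tokens[7])})
    plan.append({"op": "project", "column": tokens[1]})
    return json.dumps(plan)


def optimize(plan_text: str) -> str:
    """Text in, text out: merge adjacent filters on the same column."""
    optimized: List[dict] = []
    for step in json.loads(plan_text):
        prev = optimized[-1] if optimized else None
        if (step["op"] == "filter" and prev is not None
                and prev["op"] == "filter"
                and prev["column"] == step["column"]):
            # Keep only the stricter lower bound.
            prev["gt"] = max(prev["gt"], step["gt"])
        else:
            optimized.append(step)
    return json.dumps(optimized)


def execute(plan_text: str, tables: Dict[str, List[dict]]) -> list:
    """Interpret the plan steps in order against in-memory tables."""
    rows: list = []
    for step in json.loads(plan_text):
        if step["op"] == "scan":
            rows = list(tables[step["table"]])
        elif step["op"] == "filter":
            rows = [r for r in rows if r[step["column"]] > step["gt"]]
        elif step["op"] == "project":
            rows = [r[step["column"]] for r in rows]
    return rows


# Hypothetical table contents.
tables = {"pages": [{"title": "Drill", "views": 10}, {"title": "Hive", "views": 5}]}

result = execute(optimize(parse("SELECT title FROM pages WHERE views > 5")), tables)
```

A Hive-compatible parser would replace only `parse` in this picture; as long
as it emits the same plan text, `optimize` and `execute` stay untouched.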

On Wed, Oct 17, 2012 at 3:26 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

> Hello,
>
> For past few days I have been reading about drill and it surely seems like
> a very interesting project. However, as far as I understood, the query
> syntax seems to be very similar to what Apache Hive uses. I was wondering
> why we couldn't really take the hive query analyzer as a starting point and
> try to refactor it according to our needs? Hive also uses the ANTLR based
> parser to parse the queries and generate a Abstract Syntax Tree for the
> queries. I understand that the part where the AST gets converted to M/R
> tasks would be different, but seems like the first part might be reusable.
>
> Surely seems like I am missing something here.
>
> Thanks,
> --
> Swarnim
>