Posted to dev@drill.apache.org by Ed Kohlwey <ek...@gmail.com> on 2012/11/30 18:01:22 UTC

Drill Query Abstraction

Hi,
I was talking to Keys at his presentation to the DC HUG last night and was
excited to hear how much work is going into building good abstraction
mechanisms into Drill.

I had a thought which Keys suggested I share on the mailing list. Even
though I'm not likely to have the time to implement it in the near future, I
think others might be interested.

There is a proliferation of query planners, job coordinators, execution
engines, metadata discovery, and query optimizers in the Hadoop ecosystem
which I believe to be harmful to Hadoop as a whole.

There are projects such as Cascading and Oozie, as well as query languages
like Hive and Pig, and now Drill and Impala. Each has its own set of
services that do these tasks and each of them does them only moderately
well.

It would be nice to see a project that provides an abstraction mechanism,
perhaps an intermediate query "bytecode" language that can be further
compiled to the appropriate job type based on how data is represented and
what additional frameworks are available to process data. I think Drill is
already embarking on some elements of this and it is something others might
be interested in.

Re: Drill Query Abstraction

Posted by Jacques Nadeau <ja...@gmail.com>.
I think this is exactly the hope.  To me, it is a classic case of giving
domain-specific researchers reasonable implementation mechanisms.  If
everything is wrapped together and there are no clear API surfaces, only the
original implementor can generate the components.

We have a short write-up of our current thinking around our logical plan
syntax, which serves the first level of this purpose [1].  The goal is to
use this vocabulary to allow multiple query language implementations.
We're also looking at having another clear interface on the back side, at
the physical plan level.  Our hope is that these will allow effective
reimplementations of, and experiments with, the planner and/or execution
engines.
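
As a rough illustration of the layering described above, here is a minimal
Python sketch.  It is not Drill's logical plan syntax: the operator classes
(Scan, Filter, Project), the two toy front ends, and both planners are
hypothetical names invented for this example.  It only tries to show two
different query front ends producing the same logical plan, and two
interchangeable planners consuming it behind one interface.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple, Union


# Hypothetical logical operators (illustrative only, not Drill's syntax).
@dataclass(frozen=True)
class Scan:
    source: str


@dataclass(frozen=True)
class Filter:
    child: "LogicalOp"
    predicate: str


@dataclass(frozen=True)
class Project:
    child: "LogicalOp"
    columns: Tuple[str, ...]


LogicalOp = Union[Scan, Filter, Project]


def from_sql_like(table, columns, where):
    """Toy front end 1: builds a plan from already-parsed SQL-style pieces."""
    return Project(Filter(Scan(table), where), tuple(columns))


def from_pipeline(steps):
    """Toy front end 2: builds a plan from a list of (step, argument) pairs."""
    plan = None
    for name, arg in steps:
        if name == "load":
            plan = Scan(arg)
        elif name == "filter":
            plan = Filter(plan, arg)
        elif name == "select":
            plan = Project(plan, tuple(arg))
        else:
            raise ValueError(f"unknown step: {name}")
    return plan


class Planner(Protocol):
    """The back-side interface: logical plan in, physical steps out."""

    def plan(self, op: LogicalOp) -> List[str]: ...


class NaivePlanner:
    """Emits one physical step per logical operator, bottom-up."""

    def plan(self, op):
        if isinstance(op, Scan):
            return [f"scan({op.source})"]
        if isinstance(op, Filter):
            return self.plan(op.child) + [f"filter({op.predicate})"]
        if isinstance(op, Project):
            return self.plan(op.child) + [f"project({', '.join(op.columns)})"]
        raise TypeError(f"unsupported operator: {op!r}")


class PushdownPlanner(NaivePlanner):
    """Same interface, but folds a filter directly above a scan into the scan."""

    def plan(self, op):
        if isinstance(op, Filter) and isinstance(op.child, Scan):
            return [f"scan({op.child.source}, where={op.predicate})"]
        return super().plan(op)


# Two front ends, one logical plan.
sql_plan = from_sql_like("logs", ["user", "ts"], "status = 500")
pipe_plan = from_pipeline(
    [("load", "logs"), ("filter", "status = 500"), ("select", ["user", "ts"])]
)

# Two planners, one interface.
naive_steps = NaivePlanner().plan(sql_plan)
pushdown_steps = PushdownPlanner().plan(pipe_plan)
```

Because both front ends target the same operator vocabulary, either planner
can be swapped in without touching them, which is the kind of
experimentation the two interfaces are meant to allow.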

[1]
https://docs.google.com/a/maprtech.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit



Re: Drill Query Abstraction

Posted by Ted Dunning <te...@gmail.com>.
Ed,

Good to hear from you.

On Fri, Nov 30, 2012 at 9:01 AM, Ed Kohlwey <ek...@gmail.com> wrote:

> ...
> There is a proliferation of query planners, job coordinators, execution
> engines, metadata discovery, and query optimizers in the Hadoop ecosystem
> which I believe to be harmful to Hadoop as a whole.
>

Even worse, there is a proliferation of not very good query planners and
such.


> There are projects such as Cascading and Oozie, as well as query languages
> like Hive and Pig, and now Drill and Impala. Each has its own set of
> services that do these tasks and each of them does them only moderately
> well.
>

Yes.


> It would be nice to see a project that provides an abstraction mechanism,
> perhaps an intermediate query "bytecode" language that can be further
> compiled to the appropriate job type based on how data is represented and
> what additional frameworks are available to process data.


Exactly.


> I think Drill is
> already embarking on some elements of this and it is something others might
> be interested in.
>

Please spread the word.  Drill is very much interested in building solid
planning and execution components and supporting the integration of other
components.  The broad interest in this is exactly why there are quite a
few companies and developers interested.  You mentioned Cascading ... note
that Chris W is a founding committer.

We aren't so much going towards a byte-code per se, but I take your meaning
as more general.  The Drill plan syntax that Jacques is refining will fill
exactly that niche.