Posted to dev@drill.apache.org by Jacques Nadeau <ja...@gmail.com> on 2012/12/06 19:36:05 UTC

What do you want out of Apache Drill?

Hello all,

It would be extremely helpful if we built a community list of target user
flows for Apache Drill.  Having a clear set of targets like this will make
it substantially easier to validate our design decisions and implementations
along the way.   Think of this as an equivalent activity to defining the
classic Word Count (Hadoop, Pig), Northwind query example (MSSQL), hello
world (...), etc.  To that end, please spend some time sharing your use
cases on this thread.  As part of this, please let us know whether these
are hypothetical or real.

Once created, getting community comments across each user flow will help to
define what features are most useful to which groups of users.   However,
if people are critical of early use cases that are shared, others will be
less likely to contribute.  As such, I'd prefer that this sharing exercise
be more about expanding the possibility space rather than prioritizing it.

While comments like "Faster Hive" or "more compliant SQL" are useful, it
would be more helpful if you spent more time describing the particular data
flows that you run today (including data sources), what your pain points
are and things that you can't do today but would like to do.

I'd also love to have someone raise their hand to collate this data and
continue to drive collection and later efforts around this.

So, What do you want Apache Drill to help you with?

Thanks,
Jacques

RE: What do you want out of Apache Drill?

Posted by Andrew Psaltis <An...@Webtrends.com>.

Yes, in the same spirit as what Camuel presented in his Apache Drill presentation: an executable script.

-----Original Message-----
From: Timothy Chen [mailto:tnachen@gmail.com] 
Sent: Thursday, December 06, 2012 5:13 PM
To: drill-dev@incubator.apache.org
Cc: drill-dev@incubator.apache.org; drill-user@incubator.apache.org
Subject: Re: What do you want out of Apache Drill?

So, to be specific, you mean the ability to translate DrQL into a Spark execution plan?

Tim

Sent from my iPad

On Dec 6, 2012, at 1:33 PM, Andrew Psaltis <An...@Webtrends.com> wrote:

>> So, What do you want Apache Drill to help you with?
> 
> I also want what Julian wants with respect to #1 (A SQL interface in addition to DrQL interface), but, Santa, I would also like the following:
> 
> 1. An ability to generate a plan for a MapReduce framework; more specifically, a plan that can be executed by Spark. That plan could start with the stack at a low level, which we could then walk to do the "correct" thing in Spark.
> 2. A follow-on to #1 would be to generate the Java or Scala code that could be used to drive the generation of a Spark RDD.
> 
> I realize that this is perhaps outside the scope of Drill, but I would be willing and ready to work on making this gift become a reality.
> 
> Thanks,
> Andrew
> 
> 
> Dear Santa,
> 
> Here's what I'd like:
> 
> 
> -----Original Message-----
> From: Julian Hyde [mailto:julianhyde@gmail.com] 
> Sent: Thursday, December 06, 2012 12:44 PM
> To: drill-dev@incubator.apache.org
> Cc: drill-user@incubator.apache.org
> Subject: Re: What do you want out of Apache Drill?
> 
> On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com> wrote:
> 
>> So, What do you want Apache Drill to help you with?
> 
> Dear Santa,
> 
> Here's what I'd like:
> 
> 1 A SQL interface (in addition to DrQL interface)
> 2 JDBC driver
> 3 Access to the stack at a lower level (i.e. a way to use the high-performance scan operators without writing a query)
> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of primitives or nio buffers)
> 
> 1+2 so that I can run Mondrian on Drill.
> 3 so that I can use Optiq to combine Drill data with data from other systems.
> 4 so that I can change Mondrian's cache implementation from "java objects" to "in-memory database", whose blocks are managed by a cache such as jboss infinispan
> 
> I know some of these are outside of Drill's scope. If so, feel free to disregard. But if you don't ask, you don't get. :)
> 
> Julian

Re: What do you want out of Apache Drill?

Posted by Timothy Chen <tn...@gmail.com>.

So, to be specific, you mean the ability to translate DrQL into a Spark execution plan?

Tim

Sent from my iPad

On Dec 6, 2012, at 1:33 PM, Andrew Psaltis <An...@Webtrends.com> wrote:

>> So, What do you want Apache Drill to help you with?
> 
> I also want what Julian wants with respect to #1 (A SQL interface in addition to DrQL interface), but, Santa, I would also like the following:
> 
> 1. An ability to generate a plan for a MapReduce framework; more specifically, a plan that can be executed by Spark. That plan could start with the stack at a low level, which we could then walk to do the "correct" thing in Spark.
> 2. A follow-on to #1 would be to generate the Java or Scala code that could be used to drive the generation of a Spark RDD.
> 
> I realize that this is perhaps outside the scope of Drill, but I would be willing and ready to work on making this gift become a reality.
> 
> Thanks,
> Andrew
> 
> 
> Dear Santa,
> 
> Here's what I'd like:
> 
> 
> -----Original Message-----
> From: Julian Hyde [mailto:julianhyde@gmail.com] 
> Sent: Thursday, December 06, 2012 12:44 PM
> To: drill-dev@incubator.apache.org
> Cc: drill-user@incubator.apache.org
> Subject: Re: What do you want out of Apache Drill?
> 
> On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com> wrote:
> 
>> So, What do you want Apache Drill to help you with?
> 
> Dear Santa,
> 
> Here's what I'd like:
> 
> 1 A SQL interface (in addition to DrQL interface)
> 2 JDBC driver
> 3 Access to the stack at a lower level (i.e. a way to use the high-performance scan operators without writing a query)
> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of primitives or nio buffers)
> 
> 1+2 so that I can run Mondrian on Drill.
> 3 so that I can use Optiq to combine Drill data with data from other systems.
> 4 so that I can change Mondrian's cache implementation from "java objects" to "in-memory database", whose blocks are managed by a cache such as jboss infinispan
> 
> I know some of these are outside of Drill's scope. If so, feel free to disregard. But if you don't ask, you don't get. :)
> 
> Julian

RE: What do you want out of Apache Drill?

Posted by Andrew Psaltis <An...@Webtrends.com>.

> So, What do you want Apache Drill to help you with?

I also want what Julian wants with respect to #1 (A SQL interface in addition to DrQL interface), but, Santa, I would also like the following:

1. An ability to generate a plan for a MapReduce framework; more specifically, a plan that can be executed by Spark. That plan could start with the stack at a low level, which we could then walk to do the "correct" thing in Spark.
2. A follow-on to #1 would be to generate the Java or Scala code that could be used to drive the generation of a Spark RDD.

I realize that this is perhaps outside the scope of Drill, but I would be willing and ready to work on making this gift become a reality.

Thanks,
Andrew

Dear Santa,

Here's what I'd like:

-----Original Message-----
From: Julian Hyde [mailto:julianhyde@gmail.com] 
Sent: Thursday, December 06, 2012 12:44 PM
To: drill-dev@incubator.apache.org
Cc: drill-user@incubator.apache.org
Subject: Re: What do you want out of Apache Drill?

On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com> wrote:

> So, What do you want Apache Drill to help you with?

Dear Santa,

Here's what I'd like:

1 A SQL interface (in addition to DrQL interface)
2 JDBC driver
3 Access to the stack at a lower level (i.e. a way to use the high-performance scan operators without writing a query)
4 Ability to query in-memory Java data in a compact form (e.g. arrays of primitives or nio buffers)

1+2 so that I can run Mondrian on Drill.
3 so that I can use Optiq to combine Drill data with data from other systems.
4 so that I can change Mondrian's cache implementation from "java objects" to "in-memory database", whose blocks are managed by a cache such as jboss infinispan

I know some of these are outside of Drill's scope. If so, feel free to disregard. But if you don't ask, you don't get. :)

Julian
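
Andrew's item 1 amounts to walking a low-level plan tree and emitting, for each node, the matching operation on another engine. The Java sketch below shows only that walk, under stated assumptions: the three node types (`Scan`, `Filter`, `Project`) and the `PlanTranslator` class are hypothetical and are not Drill's plan classes, and `java.util.stream` stands in for Spark RDD transformations so the example runs without Spark. It needs Java 16 or later (records and `instanceof` patterns).

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical logical-plan nodes; not Drill's actual plan classes.
interface PlanNode {}
record Scan(List<Object[]> rows) implements PlanNode {}
record Filter(PlanNode input, Predicate<Object[]> condition) implements PlanNode {}
record Project(PlanNode input, Function<Object[], Object[]> expr) implements PlanNode {}

class PlanTranslator {
    // Walk the plan recursively, translating each child before applying the
    // parent's operation. A Spark backend would call the matching RDD
    // transformation here instead of the java.util.stream one.
    static Stream<Object[]> translate(PlanNode node) {
        if (node instanceof Scan s) {
            return s.rows().stream();
        } else if (node instanceof Filter f) {
            return translate(f.input()).filter(f.condition());
        } else if (node instanceof Project p) {
            return translate(p.input()).map(p.expr());
        }
        throw new IllegalArgumentException("Unsupported plan node: " + node);
    }

    public static void main(String[] args) {
        // Roughly: SELECT name FROM rows WHERE score > 3, as a plan tree.
        PlanNode plan = new Project(
            new Filter(
                new Scan(List.of(new Object[]{"a", 1}, new Object[]{"b", 5}, new Object[]{"c", 9})),
                row -> (Integer) row[1] > 3),
            row -> new Object[]{row[0]});
        List<Object> names = translate(plan).map(row -> row[0]).collect(Collectors.toList());
        System.out.println(names); // prints [b, c]
    }
}
```

Spark's RDD API does offer `filter` and `map`, so a Spark-targeting version could keep the same recursion and swap only the per-node calls; item 2 (generating Java or Scala source) could reuse the same walk to emit text instead of building the pipeline directly.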

Re: What do you want out of Apache Drill?

Posted by David Alves <da...@gmail.com>.

@Jacques: +1 on pretty much all you said. I, personally, will be focusing on those as soon as I'm able to get something running.
@Ted: good to know there is no major sentiment against large joins. The infrastructure required for performant large joins should also allow for performant cogroups.

-david

On Mar 13, 2013, at 11:42 AM, Jacques Nadeau <ja...@apache.org> wrote:

> I have a feeling that large joins will be dealt with sooner rather than
> later (especially with interest and work from people like you).  If you
> look at large queries, things are dominated by large sorts, large joins and
> large group-by aggregations.  We need to make sure those are performant in
> large clusters before we focus on the prettier things.  Hopefully we can
> leverage Google Compute Engine to ensure this.
> 
> 
> 
> On Wed, Mar 13, 2013 at 7:07 AM, David Alves <da...@gmail.com> wrote:
> 
>> Hi All
>> 
>>        Sorry to revive an old thread…
>>        I was going through the list looking for the current stance on
>> joins and I found Ted's answer.
>>        What is the main point behind not doing large joins on Drill?
>>        Is it just simplicity (as in optimizer, etc.) or is there
>> something else?
>>        I mention this because I'm particularly interested in large self
>> joins (I can volunteer to work on them myself, of course).
>>        I'm not against leaving them out of any optimizer goals, if one
>> can explicitly select an identity optimizer that will just follow the
>> logical plan, but they are a big requirement for me.
>>        Thoughts?
>> 
>> Best
>> David
>> 
>> On Dec 6, 2012, at 7:33 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>>> Drill is explicitly designed (at this time) with the option of not doing
>>> large joins.  Triple stores pretty much assume lots of large joins.
>>> 
>>> That said, if you could write some suggested typical queries, it would help
>>> the discussion along.  If you could go so far as to translate to a logical
>>> plan, that would be even cooler.
>>> 
>>> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <mk...@gmail.com> wrote:
>>> 
>>>> I would very much be interested in having a SPARQL interface, though I am
>>>> not sure how well Drill will handle many joins.
>>>> 
>>>> 
>>>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <te...@gmail.com> wrote:
>>>> 
>>>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:
>>>>> 
>>>>>> ...
>>>>>> 1 A SQL interface (in addition to DrQL interface)
>>>>>> 
>>>>> 
>>>>> With your help, this may arrive before DrQL is integrated.
>>>>> 
>>>>> 
>>>>>> 2 JDBC driver
>>>>>> 
>>>>> 
>>>>> Should be pretty straightforward.  Not on anybody's task list just yet, I
>>>>> don't think.
>>>>> 
>>>>> 
>>>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>>>> high-performance scan operators without writing a query)
>>>>>> 
>>>>> 
>>>>> Definitely going to happen.
>>>>> 
>>>>> 
>>>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>>>> primitives or nio buffers)
>>>>>> 
>>>>> 
>>>>> I wonder if this is just a matter of writing a special scanner or a special
>>>>> flavor of join at the execution point.  The scanner for the case where the
>>>>> in-memory compact form is only readable in sequential form. The
>>>>> join-operator if the memory can be accessed at random.
>>>>> 
>>>>> ...
>>>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>>>> disregard. But if you don't ask, you don't get. :)
>>>>>> 
>>>>> 
>>>>> They all look pretty reasonable to me.
>>>>> 
>>>> 
>> 
>>

Re: What do you want out of Apache Drill?

Posted by Jacques Nadeau <ja...@apache.org>.

I have a feeling that large joins will be dealt with sooner rather than
later (especially with interest and work from people like you).  If you
look at large queries, things are dominated by large sorts, large joins and
large group-by aggregations.  We need to make sure those are performant in
large clusters before we focus on the prettier things.  Hopefully we can
leverage Google Compute Engine to ensure this.



On Wed, Mar 13, 2013 at 7:07 AM, David Alves <da...@gmail.com> wrote:

> Hi All
>
>         Sorry to revive an old thread…
>         I was going through the list looking for the current stance on
> joins and I found Ted's answer.
>         What is the main point behind not doing large joins on Drill?
>         Is it just simplicity (as in optimizer, etc.) or is there
> something else?
>         I mention this because I'm particularly interested in large self
> joins (I can volunteer to work on them myself, of course).
>         I'm not against leaving them out of any optimizer goals, if one
> can explicitly select an identity optimizer that will just follow the
> logical plan, but they are a big requirement for me.
>         Thoughts?
>
> Best
> David
>
> On Dec 6, 2012, at 7:33 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > Drill is explicitly designed (at this time) with the option of not doing
> > large joins.  Triple stores pretty much assume lots of large joins.
> >
> > That said, if you could write some suggested typical queries, it would help
> > the discussion along.  If you could go so far as to translate to a logical
> > plan, that would be even cooler.
> >
> > On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <mk...@gmail.com> wrote:
> >
> >> I would very much be interested in having a SPARQL interface, though I am
> >> not sure how well Drill will handle many joins.
> >>
> >>
> >> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <te...@gmail.com> wrote:
> >>
> >>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:
> >>>
> >>>> ...
> >>>> 1 A SQL interface (in addition to DrQL interface)
> >>>>
> >>>
> >>> With your help, this may arrive before DrQL is integrated.
> >>>
> >>>
> >>>> 2 JDBC driver
> >>>>
> >>>
> >>> Should be pretty straightforward.  Not on anybody's task list just yet, I
> >>> don't think.
> >>>
> >>>
> >>>> 3 Access to the stack at a lower level (i.e. a way to use the
> >>>> high-performance scan operators without writing a query)
> >>>>
> >>>
> >>> Definitely going to happen.
> >>>
> >>>
> >>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
> >>>> primitives or nio buffers)
> >>>>
> >>>
> >>> I wonder if this is just a matter of writing a special scanner or a special
> >>> flavor of join at the execution point.  The scanner for the case where the
> >>> in-memory compact form is only readable in sequential form. The
> >>> join-operator if the memory can be accessed at random.
> >>>
> >>> ...
> >>>> I know some of these are outside of Drill's scope. If so, feel free to
> >>>> disregard. But if you don't ask, you don't get. :)
> >>>>
> >>>
> >>> They all look pretty reasonable to me.
> >>>
> >>
>
>

Re: What do you want out of Apache Drill?

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Mar 13, 2013 at 7:07 AM, David Alves <da...@gmail.com> wrote:

>         I was going through the list looking for the current stance on
> joins and I found Ted's answer.
>         What is the main point behind not doing large joins on Drill?
>

Not doing large joins *yet*.

>          Is it just simplicity (as in optimizer, etc.) or is there
> something else?
>

Simplicity in early implementation.

>          I mention this because I'm particularly interested in large self
> joins (I can volunteer to work on them myself, of course).
>

I would love to see large self joins.  In Pig notation, I would be
interested in co-group of multiple fields on a single key field followed by
counting of all pairs in each of the groups.  Counting the cross-group
pairs is also interesting.  Saying this concisely in SQL is hard for me,
especially since I would like to down-sample each of the groups.  I can say
it with many queries and multiple temp tables, but I expect that this would
be difficult for the optimizer to understand.  I can also say it concisely
in Drill's intermediate language.
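
Ted's self-join example (co-group on a key, down-sample each group, then count all pairs within each group) can be sketched on a single machine in Java. This is an illustration under assumptions, not Drill code: input records are hypothetical (key, value) string pairs, down-sampling uses reservoir sampling to cap each group at `maxPerGroup` values, and only within-group pairs of distinct values are counted. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class CoGroupPairCounter {
    // Co-group step: collect all values that share a key.
    static Map<String, List<String>> coGroup(List<String[]> records) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String[] r : records) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return groups;
    }

    // Reservoir sampling: keep at most maxPerGroup values, each with equal probability.
    static List<String> downSample(List<String> values, int maxPerGroup, Random rnd) {
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (i < maxPerGroup) {
                sample.add(values.get(i));
            } else {
                int j = rnd.nextInt(i + 1);
                if (j < maxPerGroup) sample.set(j, values.get(i));
            }
        }
        return sample;
    }

    // Count each unordered pair of distinct values co-occurring in a group, summed over groups.
    static Map<String, Integer> countPairs(List<String[]> records, int maxPerGroup, long seed) {
        Random rnd = new Random(seed);
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> group : coGroup(records).values()) {
            List<String> sample = downSample(group, maxPerGroup, rnd);
            for (int a = 0; a < sample.size(); a++) {
                for (int b = a + 1; b < sample.size(); b++) {
                    String x = sample.get(a), y = sample.get(b);
                    if (x.equals(y)) continue;
                    String pair = x.compareTo(y) < 0 ? x + "," + y : y + "," + x;
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[]{"u1", "apple"}, new String[]{"u1", "pear"},
            new String[]{"u2", "apple"}, new String[]{"u2", "pear"}, new String[]{"u2", "plum"});
        System.out.println(countPairs(records, 100, 42L));
        // prints {apple,pear=2, apple,plum=1, pear,plum=1}
    }
}
```

A group of n values yields n(n-1)/2 pairs, so capping each group at `maxPerGroup` values bounds the quadratic pair-generation cost per group.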

Re: What do you want out of Apache Drill?

Posted by David Alves <da...@gmail.com>.

Hi All
	
	Sorry to revive an old thread…
	I was going through the list looking for the current stance on joins and I found Ted's answer.
	What is the main point behind not doing large joins on Drill? 
	Is it just simplicity (as in optimizer, etc.) or is there something else?
	I mention this because I'm particularly interested in large self joins (I can volunteer to work on them myself, of course).
	I'm not against leaving them out of any optimizer goals, if one can explicitly select an identity optimizer that will just follow the logical plan, but they are a big requirement for me.
	Thoughts?

Best
David	

On Dec 6, 2012, at 7:33 PM, Ted Dunning <te...@gmail.com> wrote:

> Drill is explicitly designed (at this time) with the option of not doing
> large joins.  Triple stores pretty much assume lots of large joins.
> 
> That said, if you could write some suggested typical queries, it would help
> the discussion along.  If you could go so far as to translate to a logical
> plan, that would be even cooler.
> 
> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <mk...@gmail.com> wrote:
> 
>> I would very much be interested in having a SPARQL interface, though I am
>> not sure how well Drill will handle many joins.
>> 
>> 
>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:
>>> 
>>>> ...
>>>> 1 A SQL interface (in addition to DrQL interface)
>>>> 
>>> 
>>> With your help, this may arrive before DrQL is integrated.
>>> 
>>> 
>>>> 2 JDBC driver
>>>> 
>>> 
>>> Should be pretty straightforward.  Not on anybody's task list just yet, I
>>> don't think.
>>> 
>>> 
>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>> high-performance scan operators without writing a query)
>>>> 
>>> 
>>> Definitely going to happen.
>>> 
>>> 
>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>> primitives or nio buffers)
>>>> 
>>> 
>>> I wonder if this is just a matter of writing a special scanner or a special
>>> flavor of join at the execution point.  The scanner for the case where the
>>> in-memory compact form is only readable in sequential form. The
>>> join-operator if the memory can be accessed at random.
>>> 
>>> ...
>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>> disregard. But if you don't ask, you don't get. :)
>>>> 
>>> 
>>> They all look pretty reasonable to me.
>>> 
>>

Re: What do you want out of Apache Drill?

Posted by Ted Dunning <te...@gmail.com>.

Drill is explicitly designed (at this time) with the option of not doing
large joins.  Triple stores pretty much assume lots of large joins.

That said, if you could write some suggested typical queries, it would help
the discussion along.  If you could go so far as to translate to a logical
plan, that would be even cooler.

On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <mk...@gmail.com> wrote:

> I would very much be interested in having a SPARQL interface, though I am
> not sure how well Drill will handle many joins.
>
>
> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:
> >
> > > ...
> > > 1 A SQL interface (in addition to DrQL interface)
> > >
> >
> > With your help, this may arrive before DrQL is integrated.
> >
> >
> > > 2 JDBC driver
> > >
> >
> > Should be pretty straightforward.  Not on anybody's task list just yet, I
> > don't think.
> >
> >
> > > 3 Access to the stack at a lower level (i.e. a way to use the
> > > high-performance scan operators without writing a query)
> > >
> >
> > Definitely going to happen.
> >
> >
> > > 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
> > > primitives or nio buffers)
> > >
> >
> > I wonder if this is just a matter of writing a special scanner or a special
> > flavor of join at the execution point.  The scanner for the case where the
> > in-memory compact form is only readable in sequential form. The
> > join-operator if the memory can be accessed at random.
> >
> > ...
> > > I know some of these are outside of Drill's scope. If so, feel free to
> > > disregard. But if you don't ask, you don't get. :)
> > >
> >
> > They all look pretty reasonable to me.
> >
>

Re: What do you want out of Apache Drill?

Posted by Mike Kogan <mk...@gmail.com>.

I would very much be interested in having a SPARQL interface, though I am
not sure how well Drill will handle many joins.


On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <te...@gmail.com> wrote:

> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:
>
> > ...
> > 1 A SQL interface (in addition to DrQL interface)
> >
>
> With your help, this may arrive before DrQL is integrated.
>
>
> > 2 JDBC driver
> >
>
> Should be pretty straightforward.  Not on anybody's task list just yet, I
> don't think.
>
>
> > 3 Access to the stack at a lower level (i.e. a way to use the
> > high-performance scan operators without writing a query)
> >
>
> Definitely going to happen.
>
>
> > 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
> > primitives or nio buffers)
> >
>
> I wonder if this is just a matter of writing a special scanner or a special
> flavor of join at the execution point.  The scanner for the case where the
> in-memory compact form is only readable in sequential form. The
> join-operator if the memory can be accessed at random.
>
> ...
> > I know some of these are outside of Drill's scope. If so, feel free to
> > disregard. But if you don't ask, you don't get. :)
> >
>
> They all look pretty reasonable to me.
>

Re: What do you want out of Apache Drill?

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <ju...@gmail.com> wrote:

> ...
> 1 A SQL interface (in addition to DrQL interface)
>

With your help, this may arrive before DrQL is integrated.

> 2 JDBC driver
>

Should be pretty straightforward.  Not on anybody's task list just yet, I
don't think.

> 3 Access to the stack at a lower level (i.e. a way to use the
> high-performance scan operators without writing a query)
>

Definitely going to happen.

> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
> primitives or nio buffers)
>

I wonder if this is just a matter of writing a special scanner or a special
flavor of join at the execution point.  The scanner for the case where the
in-memory compact form is only readable in sequential form. The
join-operator if the memory can be accessed at random.

...
> I know some of these are outside of Drill's scope. If so, feel free to
> disregard. But if you don't ask, you don't get. :)
>

They all look pretty reasonable to me.
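
Ted's distinction between a special scanner (for compact data that can only be read sequentially) and a special join operator (for compact data that allows random access) can be illustrated with `java.nio.IntBuffer`, one of the compact forms Julian mentions. A minimal sketch, assuming a hypothetical layout in which a dimension value's key is simply its position in the buffer; `CompactMemoryAccess` and its methods are illustrative names, not a Drill API.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;
import java.util.function.IntPredicate;

class CompactMemoryAccess {
    // Scanner style: read front to back with relative get(), as when the compact
    // form only supports sequential reads. Sums the values that pass the filter.
    static long scanAndSum(IntBuffer column, IntPredicate filter) {
        IntBuffer view = column.duplicate(); // own position, so the caller's buffer is untouched
        view.rewind();
        long sum = 0;
        while (view.hasRemaining()) {
            int v = view.get();
            if (filter.test(v)) sum += v;
        }
        return sum;
    }

    // Join style: each probe key is used as a position in the buffer, which only
    // works when the compact form supports random access (absolute get).
    static int[] lookupJoin(int[] probeKeys, IntBuffer dimension) {
        int[] joined = new int[probeKeys.length];
        for (int i = 0; i < probeKeys.length; i++) {
            joined[i] = dimension.get(probeKeys[i]); // absolute read; position unchanged
        }
        return joined;
    }

    public static void main(String[] args) {
        // Off-heap storage for five ints, viewed as an IntBuffer.
        IntBuffer column = ByteBuffer.allocateDirect(5 * Integer.BYTES).asIntBuffer();
        column.put(new int[]{10, 20, 30, 40, 50}).flip();
        System.out.println(scanAndSum(column, v -> v >= 30));                    // prints 120
        System.out.println(Arrays.toString(lookupJoin(new int[]{4, 0}, column))); // prints [50, 10]
    }
}
```

The scanner uses only relative reads, so it would also fit a stream-like source; the lookup join depends on absolute `get(int)` and would not.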

Re: What do you want out of Apache Drill?

Posted by Julian Hyde <ju...@gmail.com>.

On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com> wrote:

> So, What do you want Apache Drill to help you with?

Dear Santa,

Here's what I'd like:

1 A SQL interface (in addition to DrQL interface)
2 JDBC driver
3 Access to the stack at a lower level (i.e. a way to use the high-performance scan operators without writing a query)
4 Ability to query in-memory Java data in a compact form (e.g. arrays of primitives or nio buffers)

1+2 so that I can run Mondrian on Drill.
3 so that I can use Optiq to combine Drill data with data from other systems.
4 so that I can change Mondrian's cache implementation from "java objects" to "in-memory database", whose blocks are managed by a cache such as jboss infinispan

I know some of these are outside of Drill's scope. If so, feel free to disregard. But if you don't ask, you don't get. :)

Julian

Re: What do you want out of Apache Drill?

Posted by Shawn O'Connor <so...@falconknight.com>.

Sure, here you go:

https://dl.dropbox.com/u/49715152/sample.sql

	-Shawn

On Dec 6, 2012, at 11:17 AM, Tomer Shiran <ts...@maprtech.com> wrote:

> Thanks Shawn! It looks like the attachment is being removed by the Apache
> mailing list. Can you paste the query in the body or put it in a publicly
> accessible place (Dropbox, Google Doc, etc.)?
> 
> Thanks!
> 
> 
> On Thu, Dec 6, 2012 at 10:57 AM, Shawn O'Connor
> <so...@falconknight.com> wrote:
> 
>> Once it is ready, I'd like to replace our traditional SQL servers with
>> Drill. Our use case is realtime statistical analytics (Average, standard
>> deviation, sum, etc.) on user specified groups of data.
>> 
>> Our traditional SQL solution works okay up to about 300k rows but starts
>> to become slow (> 3 seconds) beyond that. We've worked around it by having
>> multiple SQL servers and sending the queries to them asynchronously,
>> collecting the varied responses and then returning the results. We need to
>> be able to handle 12 million rows, still hopefully in a real-time manner.
>> In addition, the Drill approach would shift the complexity away from
>> sharding the data.
>> 
>> I've attached a sample query.
>> 
>>        -Shawn
>> 
>> 
>> 
>> 
>> On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com>
>> wrote:
>>> 
>>> While comments like "Faster Hive" or "more compliant SQL" are useful, it
>>> would be more helpful if you spent more time describing the particular
>> data
>>> flows that you run today (including data sources), what your pain points
>>> are and things that you can't do today but would like to do.
>> 
>> 
>> 
> 
> 
> -- 
> Tomer Shiran
> Director of Product Management | MapR Technologies | 650-804-8657

Re: What do you want out of Apache Drill?

Posted by Tomer Shiran <ts...@maprtech.com>.

Thanks Shawn! It looks like the attachment is being removed by the Apache
mailing list. Can you paste the query in the body or put it in a publicly
accessible place (Dropbox, Google Doc, etc.)?

Thanks!


On Thu, Dec 6, 2012 at 10:57 AM, Shawn O'Connor
<so...@falconknight.com> wrote:

> Once it is ready, I'd like to replace our traditional SQL servers with
> Drill. Our use case is realtime statistical analytics (Average, standard
> deviation, sum, etc.) on user specified groups of data.
>
> Our traditional SQL solution works okay up to about 300k rows but starts
> to become slow (> 3 seconds) beyond that. We've worked around it by having
> multiple SQL servers and sending the queries to them asynchronously,
> collecting the varied responses and then returning the results. We need to
> be able to handle 12 million rows, still hopefully in a real-time manner.
> In addition, the Drill approach would shift the complexity away from
> sharding the data.
>
> I've attached a sample query.
>
>         -Shawn
>
>
>
>
> On Dec 6, 2012, at 10:36 AM, Jacques Nadeau <ja...@gmail.com>
> wrote:
> >
> > While comments like "Faster Hive" or "more compliant SQL" are useful, it
> > would be more helpful if you spent more time describing the particular
> data
> > flows that you run today (including data sources), what your pain points
> > are and things that you can't do today but would like to do.
>
>
>


-- 
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657

Re: What do you want out of Apache Drill?

Posted by Shawn O'Connor <so...@falconknight.com>.

Once it is ready, I'd like to replace our traditional SQL servers with Drill. Our use case is realtime statistical analytics (Average, standard deviation, sum, etc.) on user specified groups of data.

Our traditional SQL solution works okay up to about 300k rows but starts to become slow (> 3 seconds) beyond that. We've worked around it by having multiple SQL servers and sending the queries to them asynchronously, collecting the varied responses and then returning the results. We need to be able to handle 12 million rows, still hopefully in a real-time manner. In addition, the Drill approach would shift the complexity away from sharding the data.

I've attached a sample query.

	-Shawn
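
Shawn's workaround sends each query to several SQL servers and combines the answers. Averages and standard deviations cannot in general be combined by averaging the per-server results, so each server has to return mergeable partial aggregates. A minimal Java sketch, assuming each shard returns a count, sum, mean and M2 (sum of squared deviations from the mean) per group, merged with the pairwise formula of Chan et al.; the `Partial` record and `ShardedStats` class are hypothetical. It needs Java 16 or later.

```java
import java.util.List;
import java.util.Locale;

// Partial aggregate one shard could return for a group (hypothetical shape).
record Partial(long count, double sum, double mean, double m2) {
    static final Partial EMPTY = new Partial(0, 0.0, 0.0, 0.0);

    // Build a partial from raw values on one shard; merging one value at a time
    // is equivalent to Welford's online update.
    static Partial of(double... values) {
        Partial p = EMPTY;
        for (double v : values) p = p.merge(new Partial(1, v, v, 0.0));
        return p;
    }

    // Combine two partials with the pairwise formula of Chan et al.
    Partial merge(Partial o) {
        if (count == 0) return o;
        if (o.count == 0) return this;
        long n = count + o.count;
        double delta = o.mean - mean;
        double newMean = mean + delta * o.count / n;
        double newM2 = m2 + o.m2 + delta * delta * ((double) count * o.count / n);
        return new Partial(n, sum + o.sum, newMean, newM2);
    }

    double average() { return mean; }
    double sampleStdDev() { return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0; }
}

class ShardedStats {
    public static void main(String[] args) {
        // Each element stands in for one SQL server's answer for the same group.
        List<Partial> shardResults = List.of(Partial.of(1, 2, 3), Partial.of(4, 5));
        Partial total = Partial.EMPTY;
        for (Partial p : shardResults) {
            total = total.merge(p);
        }
        System.out.printf(Locale.ROOT, "count=%d sum=%.1f avg=%.1f stddev=%.4f%n",
            total.count(), total.sum(), total.average(), total.sampleStdDev());
        // prints count=5 sum=15.0 avg=3.0 stddev=1.5811
    }
}
```

Because `merge` is commutative and, up to floating-point rounding, associative, partials can be combined in whatever order the servers respond, which fits the asynchronous fan-out described above.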