Posted to user@drill.apache.org by AnilKumar B <ak...@gmail.com> on 2013/06/10 10:35:44 UTC

Question regarding to Drill

Hi,

I went through the Drill documentation and am going through the source code,
and I have a few questions regarding Drill. Can anyone help me understand it
better?

1) How are Drill aggregations real time? It is going to scan all the records
anyway, right? What exactly does it optimize compared to MapReduce-based
Hive (considering the index feature)?

2) For aggregations, wouldn't cube materialization be a better solution?
For example, an HBase-Lattice kind of solution.

3) What exactly are the real use cases for Drill? Whenever we say
interactive, the queries mostly include aggregations, and aggregations
cannot be real time when we scan the whole raw data.

Thanks,
B Anil Kumar.

Re: Question regarding to Drill

Posted by AnilKumar B <ak...@gmail.com>.

Thanks Ted.

What I was thinking is that pre-computing the aggregations, as with cubes,
might be better. But as you mentioned, that is only true if I know what I
need ahead of time.


On Mon, Jun 10, 2013 at 2:20 PM, Ted Dunning <te...@gmail.com> wrote:

> On Mon, Jun 10, 2013 at 10:35 AM, AnilKumar B <ak...@gmail.com> wrote:
>
> > Hi,
> >
> > I went through the Drill documentation and going through the source code, I
> > have few questions regarding to drill. Can any one help me in understanding
> > it much better?
> >
> > 1) How the Drill aggregations are real time? Anyway it is going to scan all
> > the records right? What exactly it optimizes when compare to Map Reduce
> > based Hive(Considering index feature)?
> >
>
> Real-time is often used in a bit of a sloppy fashion.  The meaning with
> respect to Drill is "ad hoc, interactive queries".
>
>
> > 2) For aggregations, Is in't Cube materialization will be better solution?
> >  For example like HBase-Lattice kind of solution.
> >
>
> Cubes are fine if you know what you are doing ahead of time.  They still
> require a pass over the data.  Nothing prevents Drill from creating and/or
> using cubes.
>
> > 3) What exactly the real use cases for Drill? Whenever we say interactive,
> > mostly they include aggregations, and when we say aggregations definitely
> > they cannot be real time, when we scan whole raw data.
> >
>
> Aggregation is a fine use case.  There are many others as well.  For
> instance, incremental co-occurrence counting.  Or, with special UDFs, the
> inner loop of many machine learning applications.
>
> Drill has an especially flexible scanner API which will allow cross data
> source scanning.
>
> Not sure what you are getting at, though, so I may have misinterpreted
> something you said.
>

Re: Question regarding to Drill

Posted by Ted Dunning <te...@gmail.com>.

On Mon, Jun 10, 2013 at 10:35 AM, AnilKumar B <ak...@gmail.com> wrote:

> Hi,
>
> I went through the Drill documentation and going through the source code, I
> have few questions regarding to drill. Can any one help me in understanding
> it much better?
>
> 1) How the Drill aggregations are real time? Anyway it is going to scan all
> the records right? What exactly it optimizes when compare to Map Reduce
> based Hive(Considering index feature)?
>

Real-time is often used in a bit of a sloppy fashion.  The meaning with
respect to Drill is "ad hoc, interactive queries".

> 2) For aggregations, Is in't Cube materialization will be better solution?
>  For example like HBase-Lattice kind of solution.
>

Cubes are fine if you know what you are doing ahead of time.  They still
require a pass over the data.  Nothing prevents Drill from creating and/or
using cubes.
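
The trade-off described above can be sketched in a few lines of Python.
This is only an illustration, not Drill code: the sample rows, the field
names and the helpers ad_hoc_sum and build_cube are all hypothetical. It
shows that building a cube still takes one pass over the data, and that
the cube only answers queries over the dimensions chosen ahead of time,
while an ad hoc query can group by anything at the cost of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw records; field names are illustrative only.
ROWS = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "IN", "device": "mobile", "clicks": 7},
    {"country": "IN", "device": "mobile", "clicks": 1},
]


def ad_hoc_sum(rows, group_by, measure):
    """Scan every row at query time and sum `measure` per group."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)


def build_cube(rows, dimensions, measure):
    """One pass over the data, pre-aggregating every subset of `dimensions`."""
    subsets = [
        dims
        for n in range(len(dimensions) + 1)
        for dims in combinations(dimensions, n)
    ]
    cube = {dims: defaultdict(int) for dims in subsets}
    for row in rows:
        for dims in subsets:
            cube[dims][tuple(row[d] for d in dims)] += row[measure]
    return {dims: dict(cells) for dims, cells in cube.items()}


# The cube is built once, for dimensions picked in advance.
cube = build_cube(ROWS, ("country", "device"), "clicks")

# Answered by lookup, with no further scan of ROWS.
by_country_from_cube = cube[("country",)]

# Answered by scanning ROWS at query time; any field can be used.
by_country_ad_hoc = ad_hoc_sum(ROWS, ("country",), "clicks")
```

Both results agree for the pre-planned grouping. A grouping on a field
that was left out of the cube has no entry in it and would need either a
rebuild or an ad hoc scan.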

> 3) What exactly the real use cases for Drill? Whenever we say interactive,
> mostly they include aggregations, and when we say aggregations definitely
> they cannot be real time, when we scan whole raw data.
>

Aggregation is a fine use case.  There are many others as well.  For
instance, incremental co-occurrence counting.  Or, with special UDFs, the
inner loop of many machine learning applications.
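
For readers unfamiliar with the term, incremental co-occurrence counting
can be pictured roughly as follows. The message does not say how this
would be expressed in Drill, so this is a plain Python sketch under
assumed inputs: each "basket" is a hypothetical list of items seen
together, and add_batch is an illustrative helper, not a Drill API.

```python
from collections import Counter
from itertools import combinations


def add_batch(counts, baskets):
    """Fold a new batch of item sets into the running pair counts, in place."""
    for basket in baskets:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = Counter()
add_batch(counts, [["a", "b", "c"], ["a", "b"]])  # first batch of data
add_batch(counts, [["b", "c"]])                   # later batch, no rescan
```

Only the new batch is read on each update; earlier data is never scanned
again, which is what makes the counting incremental.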

Drill has an especially flexible scanner API which will allow cross data
source scanning.

Not sure what you are getting at, though, so I may have misinterpreted
something you said.
