Posted to user@drill.apache.org by Uli Bethke <ul...@sonra.io> on 2015/04/23 13:42:46 UTC

Drill & approximate query

Is approximate query on the roadmap/radar for Apache Drill (similar to 
what we have in BlinkDB)?

I see the following benefit of this feature:
When performing data discovery, the analyst can often trade off raw 
speed against accuracy.
Data discovery tools such as Datameer work on a statistically 
significant sample of data. Approximate query could potentially put 
Drill on a par with these.

I would also be interested to find out what other people's point of view 
is on this.
Cheers
Uli

Re: Drill & approximate query

Posted by Jacques Nadeau <ja...@apache.org>.

Uli,

To add to what Ted said, we'll be adding some items in this vein in the
medium term.  That being said, we would be more than happy to accept
contributions of ideas, design or code if you were so inclined on any of
these things.  Sometimes, the quickest way to get these things is to
contribute them :)

On Thu, Apr 23, 2015 at 11:17 AM, Ted Dunning <te...@gmail.com> wrote:

> Uli,
>
> I think that the current plans include approximate operators for some
> aggregations, but nothing on the level of, say, BlinkDB.
>
> That said, Drill's optimizer could easily have rules that allow you to
> explicitly down-sample data to different degrees and then have queries
> choose between versions just as readily.  This is somewhat analogous to
> how Apache Kylin uses the same optimizer to query OLAP-cubed versions of
> tables.
>
> That isn't on the roadmap.
>
>
>
> On Thu, Apr 23, 2015 at 7:42 AM, Uli Bethke <ul...@sonra.io> wrote:
>
> > Is approximate query on the roadmap/radar for Apache Drill (similar to
> > what we have in BlinkDB)?
> >
> > I see the following benefit of this feature:
> > When performing data discovery, the analyst can often trade off raw speed
> > against accuracy.
> > Data discovery tools such as Datameer work on a statistically
> > significant sample of data. Approximate query could potentially put Drill
> > on a par with these.
> >
> > I would also be interested to find out what other people's point of view
> > is on this.
> > Cheers
> > Uli
> >
> >
>

Re: Drill & approximate query

Posted by Ted Dunning <te...@gmail.com>.

Uli,

I think that the current plans include approximate operators for some
aggregations, but nothing on the level of, say, BlinkDB.

That said, Drill's optimizer could easily have rules that allow you to
explicitly down-sample data to different degrees and then have queries
choose between versions just as readily.  This is somewhat analogous to
how Apache Kylin uses the same optimizer to query OLAP-cubed versions of
tables.

That isn't on the roadmap.
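
To make the down-sampling idea concrete, here is a minimal sketch in plain
Python. It is not Drill code and uses no Drill or optimizer API; the names
build_samples, choose_version and approx_sum are hypothetical, and uniform
(Bernoulli) row sampling is an assumption. It materializes a few sampled
copies of a table, picks the smallest copy that satisfies a requested
minimum sampling fraction, and scales the aggregate back up.

```python
import random


def build_samples(rows, fractions, seed=0):
    """Materialize one uniformly down-sampled copy of `rows` per fraction.

    Each row is kept independently with probability `fraction`
    (Bernoulli sampling). The full table is stored under fraction 1.0.
    """
    rng = random.Random(seed)
    samples = {1.0: list(rows)}
    for fraction in fractions:
        samples[fraction] = [r for r in rows if rng.random() < fraction]
    return samples


def choose_version(samples, min_fraction):
    """Return (fraction, rows) for the smallest copy that is still
    sampled at a rate of at least `min_fraction`."""
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    fraction = min(f for f in samples if f >= min_fraction)
    return fraction, samples[fraction]


def approx_sum(samples, column, min_fraction):
    """Estimate SUM(column) from the chosen copy.

    Dividing the sampled sum by the sampling fraction gives an unbiased
    estimate under Bernoulli sampling; it is exact when the full table
    (fraction 1.0) is chosen.
    """
    fraction, rows = choose_version(samples, min_fraction)
    return sum(r[column] for r in rows) / fraction


if __name__ == "__main__":
    # Hypothetical table: 100,000 rows with a numeric "amount" column.
    table = [{"amount": i % 100} for i in range(100_000)]
    versions = build_samples(table, fractions=[0.01, 0.1], seed=42)
    print("exact      :", approx_sum(versions, "amount", 1.0))
    print("10% sample :", approx_sum(versions, "amount", 0.1))
    print("1% sample  :", approx_sum(versions, "amount", 0.01))
```

In a real engine the choice between versions would be made by a planner
rule rather than by the caller, much as a cube is substituted for a base
table; the min_fraction knob here merely stands in for whatever accuracy
hint such a rule would consult.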

On Thu, Apr 23, 2015 at 7:42 AM, Uli Bethke <ul...@sonra.io> wrote:

> Is approximate query on the roadmap/radar for Apache Drill (similar to
> what we have in BlinkDB)?
>
> I see the following benefit of this feature:
> When performing data discovery, the analyst can often trade off raw speed
> against accuracy.
> Data discovery tools such as Datameer work on a statistically
> significant sample of data. Approximate query could potentially put Drill
> on a par with these.
>
> I would also be interested to find out what other people's point of view
> is on this.
> Cheers
> Uli
>
>