Posted to dev@impala.apache.org by 吴朱华 <ik...@gmail.com> on 2017/06/20 11:10:02 UTC

Is there a plan to implement histogram support for computing column stats?

Hi guys,

Is there a plan to implement histogram support when computing column
stats? My assumption is that, with histograms, it would be easier to
predict the number of rows involved in a join more accurately, which in
turn would lead to a better choice between a shuffle and a broadcast join.
These are just my amateur thoughts; I would love to hear your feedback ^_^

Re: Is there a plan to implement histogram support for computing column stats?

Posted by Alexander Behm <al...@cloudera.com>.
Histograms can definitely be useful in getting more accurate cardinality
estimates to improve plan choices.
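
As a toy illustration of that point (not Impala code), the sketch below
compares a row-count estimate from an equi-depth histogram with one that
assumes values are spread uniformly between the column min and max. The
function names, the sample data, and the row-limit rule for choosing
broadcast versus shuffle are all assumptions made for this example; a real
planner uses a cost model rather than a fixed threshold.

```python
from bisect import bisect_right


def equi_depth_bounds(values, num_buckets):
    """Upper bound of each bucket; each bucket holds about len(values)/num_buckets rows.

    Assumes len(values) >= num_buckets.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i + 1) * n // num_buckets - 1] for i in range(num_buckets)]


def rows_le_histogram(bounds, total_rows, x):
    """Estimate rows with value <= x by counting buckets whose upper bound is <= x.

    Partially covered buckets are ignored, so this can underestimate.
    """
    full_buckets = bisect_right(bounds, x)
    return full_buckets * total_rows / len(bounds)


def rows_le_uniform(lo, hi, total_rows, x):
    """Estimate rows with value <= x assuming values are spread evenly over [lo, hi]."""
    fraction = (min(max(x, lo), hi) - lo) / (hi - lo)
    return total_rows * fraction


def pick_join(est_rows, broadcast_row_limit):
    """Hypothetical rule for illustration only: broadcast when the input looks small."""
    return "broadcast" if est_rows <= broadcast_row_limit else "shuffle"


# Skewed sample column: 90 rows in 0..9, plus 10 outliers up to 1000.
values = [i % 10 for i in range(90)] + [100 * (j + 1) for j in range(10)]
bounds = equi_depth_bounds(values, 10)

actual = sum(1 for v in values if v <= 9)
hist_est = rows_le_histogram(bounds, len(values), 9)
unif_est = rows_le_uniform(min(values), max(values), len(values), 9)

print("actual rows:", actual)
print("histogram estimate:", hist_est, "->", pick_join(hist_est, 10))
print("uniform estimate:", unif_est, "->", pick_join(unif_est, 10))
```

On this skewed data the uniform assumption underestimates the filtered
input by roughly two orders of magnitude, which is enough to flip the toy
broadcast/shuffle decision.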

However, adding support for histograms has several challenges, which is why
we have no concrete plans to support them yet:
- computing stats is already a huge pain for most users due to its cost;
adding histograms might make this problem worse
- perhaps we should support stats on a subset of columns to make it
cheaper; but most users would have difficulties deciding the subset of
columns to pick, so we'd need to provide an automated solution for
suggesting the subset
- there is no support in the Hive Metastore so we'd need to store them
inside the generic TBLPROPERTIES map or similar
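
To make the last point concrete, here is a minimal sketch of what storing
a histogram in a generic string-to-string property map could look like.
It is only an illustration: the key prefix `example.histogram.`, the JSON
layout, and both helper functions are hypothetical and do not reflect any
existing Impala or Hive Metastore convention.

```python
import json

# Hypothetical key prefix, chosen for this example only.
KEY_PREFIX = "example.histogram."


def histogram_to_property(column, bounds, rows_per_bucket):
    """Encode one column's histogram as a (key, value) pair of strings."""
    value = json.dumps(
        {"bounds": bounds, "rows_per_bucket": rows_per_bucket},
        separators=(",", ":"),
    )
    return KEY_PREFIX + column, value


def histogram_from_properties(props, column):
    """Decode a column's histogram from the property map, or None if absent."""
    raw = props.get(KEY_PREFIX + column)
    if raw is None:
        return None
    decoded = json.loads(raw)
    return decoded["bounds"], decoded["rows_per_bucket"]


# A plain dict stands in for a table's string-to-string property map.
table_properties = {"numRows": "100"}
key, value = histogram_to_property("price", [1, 2, 3, 1000], 25)
table_properties[key] = value

restored = histogram_from_properties(table_properties, "price")
print(key, "=", value)
print(restored)
```

Because every value must be a string, the histogram has to be serialized
and parsed on each read, and nothing in the map itself enforces a schema
or versioning for it.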

Just trying to explain that the user experience needs to be considered when
adding such a new feature, and that dealing with all caveats could be a
substantial amount of design and implementation work.


On Tue, Jun 20, 2017 at 8:53 PM, 吴朱华 <ik...@gmail.com> wrote:

> let me check it out^_^

Re: Is there a plan to implement histogram support for computing column stats?

Posted by 吴朱华 <ik...@gmail.com>.
let me check it out^_^

2017-06-21 0:25 GMT+08:00 Jim Apple <jb...@cloudera.com>:

> This looks like the closest ticket to the question:
>
> https://issues.apache.org/jira/browse/IMPALA-2416
>
> Feel free to file another, more ambitious, ticket if you'd like.

Re: Is there a plan to implement histogram support for computing column stats?

Posted by Jim Apple <jb...@cloudera.com>.
This looks like the closest ticket to the question:

https://issues.apache.org/jira/browse/IMPALA-2416

Feel free to file another, more ambitious, ticket if you'd like.

On Tue, Jun 20, 2017 at 4:10 AM, 吴朱华 <ik...@gmail.com> wrote:
> Hi guys,
>
> Is there a plan to implement histogram support when computing column
> stats? My assumption is that, with histograms, it would be easier to
> predict the number of rows involved in a join more accurately, which in
> turn would lead to a better choice between a shuffle and a broadcast join.
> These are just my amateur thoughts; I would love to hear your feedback ^_^