Posted to user@hive.apache.org by Ruoxi Sun <za...@gmail.com> on 2015/05/08 20:57:06 UTC

Question about bushy join in hive CBO

Hi all,

I'm studying the CBO code in Hive and have a question about bushy join
optimization.

Bushy join was introduced in Hive via HIVE-7577
<https://issues.apache.org/jira/browse/HIVE-7577> and played an important
role in optimizing several queries in the TPC-DS benchmark. However, I saw
that the bushy join rule was removed in HIVE-7687
<https://issues.apache.org/jira/browse/HIVE-7687>, and I didn't find much
comment about the removal.

Is bushy join totally gone from Hive trunk? If so, why? Or did I miss
something?

Thanks in advance.

*Rossi*

Re: Question about bushy join in hive CBO

Posted by Ruoxi Sun <za...@gmail.com>.
Thank you, Ashutosh. That's very informative.

I appreciate that!


*Rossi*

2015-05-12 9:08 GMT+08:00 Ashutosh Chauhan <ha...@apache.org>:

> Hi Rossi,
>
> Historically, we used Calcite's LoptOptimizeJoinRule to do join
> reordering. It does a greedy search over the join-order search space to
> find a join order that is at least as good as the original join order of
> the query. "Good" here means estimated cost; the result is not guaranteed
> to be globally optimal because of the greedy nature of the algorithm.
> Our initial experimentation showed this rule generating only left-leaning
> join trees and not considering bushy join orders. That made a huge
> difference in query runtime, because for star schema setups, which are
> common in analytical workloads, bushy joins are usually much better than
> left- (or right-) leaning trees. At that point we added
> OptimizeBushyJoinRule.
>
> However, further experimentation and debugging showed us that
> LoptOptimizeJoinRule can actually generate bushy join trees. The problem
> was that we had bugs in the statistics/cost model we were feeding to the
> rule. Once that was established, we switched back to LoptOptimizeJoinRule.
>
> So, in a nutshell, Hive CBO can and does generate bushy joins. If you have
> a test case where we are not generating a bushy join where we could,
> please post back. We will be happy to take a look.
>
> Thanks,
> Ashutosh

Re: Question about bushy join in hive CBO

Posted by Ashutosh Chauhan <ha...@apache.org>.
Hi Rossi,

Historically, we used Calcite's LoptOptimizeJoinRule to do join
reordering. It does a greedy search over the join-order search space to
find a join order that is at least as good as the original join order of
the query. "Good" here means estimated cost; the result is not guaranteed
to be globally optimal because of the greedy nature of the algorithm.
Our initial experimentation showed this rule generating only left-leaning
join trees and not considering bushy join orders. That made a huge
difference in query runtime, because for star schema setups, which are
common in analytical workloads, bushy joins are usually much better than
left- (or right-) leaning trees. At that point we added
OptimizeBushyJoinRule.

However, further experimentation and debugging showed us that
LoptOptimizeJoinRule can actually generate bushy join trees. The problem
was that we had bugs in the statistics/cost model we were feeding to the
rule. Once that was established, we switched back to LoptOptimizeJoinRule.

So, in a nutshell, Hive CBO can and does generate bushy joins. If you have
a test case where we are not generating a bushy join where we could,
please post back. We will be happy to take a look.

Thanks,
Ashutosh
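
To make the left-leaning vs. bushy distinction concrete, here is a minimal
Python sketch of how the same greedy search can produce either tree shape
depending only on the statistics it is given. It is a toy, not Calcite's
LoptOptimizeJoinRule heuristic and not Hive's cost model. The table names
(fact_a, dim_a, ...), row counts, selectivities, and the rule "join the
connected pair with the smallest estimated output" are all assumptions made
up for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of base-table names under a join tree of nested tuples."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def is_left_deep(tree):
    """True if the right input of every join is a base table."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(right, str) and is_left_deep(left)


def greedy_join_order(rows, selectivity):
    """Toy greedy search (an assumption, not Calcite's heuristic): repeatedly
    join the connected pair of subtrees with the smallest estimated output."""
    # Each plan is (join tree, estimated row count).
    plans = [(name, float(count)) for name, count in rows.items()]
    while len(plans) > 1:
        best = None
        for (i, (lt, lrows)), (j, (rt, rrows)) in combinations(enumerate(plans), 2):
            lnames, rnames = leaves(lt), leaves(rt)
            # Multiply the selectivities of all predicates linking the subtrees.
            sel, connected = 1.0, False
            for (a, b), s in selectivity.items():
                if (a in lnames and b in rnames) or (a in rnames and b in lnames):
                    sel, connected = sel * s, True
            if not connected:
                continue  # never consider cross products
            out = lrows * rrows * sel
            if best is None or out < best[0]:
                best = (out, i, j)
        if best is None:
            raise ValueError("join graph is not connected")
        out, i, j = best
        left, right = plans[i][0], plans[j][0]
        if isinstance(left, str) and not isinstance(right, str):
            left, right = right, left  # keep the composite input on the left
        plans = [p for k, p in enumerate(plans) if k not in (i, j)]
        plans.append(((left, right), out))
    return plans[0][0]


# Hypothetical statistics: two fact tables, each with its own dimension.
rows = {"fact_a": 1000, "dim_a": 10, "fact_b": 1000, "dim_b": 10}
good_stats = {
    ("fact_a", "dim_a"): 0.01,
    ("fact_b", "dim_b"): 0.005,
    ("fact_a", "fact_b"): 0.01,
}
# Same query, but the fact-to-fact join is badly underestimated.
skewed_stats = dict(good_stats)
skewed_stats[("fact_a", "fact_b")] = 0.00001

good_plan = greedy_join_order(rows, good_stats)
skewed_plan = greedy_join_order(rows, skewed_stats)
print(good_plan, "left-deep:", is_left_deep(good_plan))
print(skewed_plan, "left-deep:", is_left_deep(skewed_plan))
```

With the first set of numbers the sketch joins each fact table to its
dimension first and then joins the two results, which is a bushy tree. With
the underestimated fact-to-fact selectivity it joins the two fact tables
first and hangs the dimensions off that, which is a left-leaning tree. The
search itself is unchanged between the two runs; only the statistics differ.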

