Posted to user@hive.apache.org by Jeff Zhang <zj...@gmail.com> on 2015/08/31 09:56:06 UTC

Is YSmart integrated into Hive on tez ?

I ask because when I execute the following SQL, Hive generates a query
plan with 4 vertices. My understanding is that if YSmart were integrated
into Hive, the plan should need only 3 vertices, since the join key and
the group-by key are the same. Does anybody know why? Thanks.


insert overwrite directory '/tmp/jzhang/1'
select o.o_orderkey as orderkey, count(1)
from lineitem l
join orders o on l.l_orderkey = o.o_orderkey
group by o.o_orderkey;

*YSmart Hive Jira*

https://issues.apache.org/jira/browse/HIVE-2206




-- 
Best Regards

Jeff Zhang

Re: Is YSmart integrated into Hive on tez ?

Posted by Jeff Zhang <zj...@gmail.com>.
+ dev mail list

The original correlation optimization may have been designed for the MR
engine, but a similar optimization could be applied to Tez too. Is there
an existing JIRA to track that?



On Tue, Sep 1, 2015 at 1:58 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi Pengcheng,
>
> Is there a reason why the correlation optimization is disabled on Tez?
>
> Even when I change the code to enable the correlation optimization on
> Tez, I still get the same query plan:
>
> >>> Vertex dependency in root stage
> >>> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> >>> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>
> On Tue, Sep 1, 2015 at 1:14 AM, Pengcheng Xiong <px...@apache.org> wrote:
>
>> Hi Jeff,
>>
>> From a code base point of view, YSmart is integrated into Hive on Tez
>> because it is one of the optimizations in current Hive. However, from
>> an execution point of view, it is currently disabled when Hive runs on
>> Tez. You can take a look at the Hive source code:
>>
>> Optimizer.java, L175-180:
>> {code}
>> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
>>     !isTezExecEngine) {
>>   transformations.add(new CorrelationOptimizer());
>> }
>> {code}
>>
>> Hope it helps.
>>
>> Best
>> Pengcheng Xiong
>>
>>
>> On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang <zj...@gmail.com> wrote:
>>
>>> I ask because when I execute the following SQL, Hive generates a query
>>> plan with 4 vertices. My understanding is that if YSmart were integrated
>>> into Hive, the plan should need only 3 vertices, since the join key and
>>> the group-by key are the same. Does anybody know why? Thanks.
>>>
>>>
>>> insert overwrite directory '/tmp/jzhang/1'
>>> select o.o_orderkey as orderkey, count(1)
>>> from lineitem l
>>> join orders o on l.l_orderkey = o.o_orderkey
>>> group by o.o_orderkey;
>>>
>>> *YSmart Hive Jira*
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2206
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang

Re: Is YSmart integrated into Hive on tez ?

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Pengcheng,

Is there a reason why the correlation optimization is disabled on Tez?

Even when I change the code to enable the correlation optimization on
Tez, I still get the same query plan:

>>> Vertex dependency in root stage
>>> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>>> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

On Tue, Sep 1, 2015 at 1:14 AM, Pengcheng Xiong <px...@apache.org> wrote:

> Hi Jeff,
>
> From a code base point of view, YSmart is integrated into Hive on Tez
> because it is one of the optimizations in current Hive. However, from
> an execution point of view, it is currently disabled when Hive runs on
> Tez. You can take a look at the Hive source code:
>
> Optimizer.java, L175-180:
> {code}
> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
>     !isTezExecEngine) {
>   transformations.add(new CorrelationOptimizer());
> }
> {code}
>
> Hope it helps.
>
> Best
> Pengcheng Xiong
>
>
> On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang <zj...@gmail.com> wrote:
>
>> I ask because when I execute the following SQL, Hive generates a query
>> plan with 4 vertices. My understanding is that if YSmart were integrated
>> into Hive, the plan should need only 3 vertices, since the join key and
>> the group-by key are the same. Does anybody know why? Thanks.
>>
>>
>> insert overwrite directory '/tmp/jzhang/1'
>> select o.o_orderkey as orderkey, count(1)
>> from lineitem l
>> join orders o on l.l_orderkey = o.o_orderkey
>> group by o.o_orderkey;
>>
>> *YSmart Hive Jira*
>>
>> https://issues.apache.org/jira/browse/HIVE-2206
>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>


-- 
Best Regards

Jeff Zhang

Re: Is YSmart integrated into Hive on tez ?

Posted by Pengcheng Xiong <px...@apache.org>.
Hi Jeff,

From a code base point of view, YSmart is integrated into Hive on Tez
because it is one of the optimizations in current Hive. However, from
an execution point of view, it is currently disabled when Hive runs on
Tez. You can take a look at the Hive source code:

Optimizer.java, L175-180:
{code}
if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
    !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
    !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
    !isTezExecEngine) {
  transformations.add(new CorrelationOptimizer());
}
{code}
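
For anyone reproducing this, here is a minimal sketch of a Hive session
that satisfies every clause of the check above, so that
CorrelationOptimizer would be added to the transformations. It assumes
the ConfVars map to the property names shown in the comments and reuses
the lineitem/orders query from the original question. The plan that
EXPLAIN prints will vary with the Hive version, so no expected output is
given.

```sql
-- Assumption: these property names back the ConfVars in the check above.
set hive.execution.engine=mr;                  -- !isTezExecEngine
set hive.optimize.correlation=true;            -- HIVEOPTCORRELATION
set hive.groupby.skewindata=false;             -- !HIVEGROUPBYSKEW
set hive.optimize.skewjoin.compiletime=false;  -- !HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME

-- Inspect the plan for the query whose join key and group-by key match.
explain
select o.o_orderkey as orderkey, count(1)
from lineitem l
join orders o on l.l_orderkey = o.o_orderkey
group by o.o_orderkey;
```

Running the same EXPLAIN with hive.execution.engine=tez should skip the
optimizer, because the last clause of the condition is then false.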

Hope it helps.

Best
Pengcheng Xiong


On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang <zj...@gmail.com> wrote:

> I ask because when I execute the following SQL, Hive generates a query
> plan with 4 vertices. My understanding is that if YSmart were integrated
> into Hive, the plan should need only 3 vertices, since the join key and
> the group-by key are the same. Does anybody know why? Thanks.
>
>
> insert overwrite directory '/tmp/jzhang/1'
> select o.o_orderkey as orderkey, count(1)
> from lineitem l
> join orders o on l.l_orderkey = o.o_orderkey
> group by o.o_orderkey;
>
> *YSmart Hive Jira*
>
> https://issues.apache.org/jira/browse/HIVE-2206
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>