Posted to issues@hive.apache.org by "liyunzhang (JIRA)" <ji...@apache.org> on 2017/12/11 06:42:00 UTC

[jira] [Comment Edited] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

    [ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285558#comment-16285558 ] 

liyunzhang edited comment on HIVE-17486 at 12/11/17 6:41 AM:
-------------------------------------------------------------

[~xuefuz]
{quote}  It seems that we can do so whenever an TS is connected to multiple RSs. The split point should happen at the fork. {quote}  I don't quite understand this. Currently the split happens at the TS, for example:
{code}
TS[0]-FIL[52]-SEL[2]-GBY[3]-RS[4]-GBY[5]-RS[42]-JOIN[48]-SEL[49]-LIM[50]-FS[51]
        -FIL[53]-SEL[9]-GBY[10]-RS[11]-GBY[12]-RS[43]-JOIN[48]
{code}

is split into:
{code}
Map1: TS[0]
Map2: FIL[52]-SEL[2]-GBY[3]-RS[4]
Map3: FIL[53]-SEL[9]-GBY[10]-RS[11]
Reducer1: GBY[5]-RS[42]-JOIN[48]-SEL[49]-LIM[50]-FS[51]
Reducer2: GBY[12]-RS[43]
{code}
{code}
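
To make the map-side part of the split above concrete, here is a rough Python sketch. It is not Hive code: the function name split_at_scan_fork, the edge-list representation, and the rule "cut each branch at its first RS" are assumptions made only for illustration. It covers just the Map1/Map2/Map3 part of the example and says nothing about how the reducers are formed.

```python
from collections import defaultdict

# Hypothetical edge list for the map side of the operator tree above.
CHAINS = [
    "TS[0]-FIL[52]-SEL[2]-GBY[3]-RS[4]-GBY[5]",
    "TS[0]-FIL[53]-SEL[9]-GBY[10]-RS[11]-GBY[12]",
]
EDGES = []
for chain in CHAINS:
    ops = chain.split("-")
    for parent, child in zip(ops, ops[1:]):
        if (parent, child) not in EDGES:
            EDGES.append((parent, child))


def split_at_scan_fork(edges, root):
    """Split the map side at the table scan when the scan has several children.

    Assumption for this sketch: every branch is a linear chain and ends at
    its first RS operator (or at a leaf if there is no RS).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    def chain_to_first_rs(start):
        # Follow the first child until an RS or a leaf is reached.
        chain, node = [], start
        while True:
            chain.append(node)
            if node.startswith("RS") or not children[node]:
                return chain
            node = children[node][0]

    branches = children[root]
    if not branches:
        return [[root]]
    if len(branches) == 1:
        # No fork: the scan stays in the same work as its only branch.
        return [[root] + chain_to_first_rs(branches[0])]
    # Fork: the scan becomes its own work, each branch a separate work.
    return [[root]] + [chain_to_first_rs(b) for b in branches]


for i, work in enumerate(split_at_scan_fork(EDGES, "TS[0]"), start=1):
    print(f"Map{i}: " + "-".join(work))
```

With the edges above, the scan ends up alone in the first work and each FIL...RS branch becomes its own work, which is the Map1/Map2/Map3 layout shown in the example.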


was (Author: kellyzly):
[~xuefuz]
{quote}  It seems that we can do so whenever an TS is connected to multiple RSs. The split point should happen at the fork. {quote}  not very understand about this. Please explain more, thanks!

> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>         Attachments: HIVE-17486.1.patch, explain.28.share.false, explain.28.share.true, scanshare.after.svg, scanshare.before.svg
>
>
> See HIVE-16602, "Implement shared scans with Tez".
> Given a query plan, the goal is to identify scans on input tables that can be merged so that the data is read only once. The optimization is carried out at the physical level. Hive on Spark caches the result of a Spark work if that work is used by more than one child Spark work. After SharedWorkOptimizer is enabled in the physical plan in HoS, identical table scans are merged into one table scan, and the result of that scan is used by more than one child Spark work. Thanks to the cache mechanism, the same computation does not need to be repeated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)