You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "liyunzhang (JIRA)" <ji...@apache.org> on 2017/10/27 09:12:00 UTC
[jira] [Commented] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

    [ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221989#comment-16221989 ] 

liyunzhang commented on HIVE-17486:
-----------------------------------

mapjoin.q
{code}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.auto.convert.join=true;
explain
select src1.key, src1.cnt1, src2.cnt1 from
(
  select key, count(*) as cnt1 from 
  (
    select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 b on a.key = b.key
  ) subq1 group by key
) src1
join
(
  select key, count(*) as cnt1 from 
  (
    select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 b on a.key = b.key
  ) subq2 group by key
) src2
on src1.key = src2.key
{code}

before shared work optimization, the physical plan in Tez
{code}
TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
TS[14]-FIL[43]-SEL[16]-MAPJOIN[46]-GBY[24]-RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[17]-FIL[44]-SEL[19]-RS[21]-MAPJOIN[46]
{code}

after the optimization
{code}

TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
                                        -RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
{code}

so the tez explain 
before
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: root_20171027044107_4977290a-6856-41c5-b0e3-24b4ec32c59e:1
      Edges:
        Map 1 <- Map 3 (BROADCAST_EDGE)
        Map 4 <- Map 6 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 5 (BROADCAST_EDGE)
        Reducer 5 <- Map 4 (SIMPLE_EDGE)
      DagName: root_20171027044107_4977290a-6856-41c5-b0e3-24b4ec32c59e:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 3
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        HybridGraceHashJoin: true
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 6
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        HybridGraceHashJoin: true
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)
                  outputColumnNames: _col0, _col1, _col3
                  input vertices:
                    1 Reducer 5
                  Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                  HybridGraceHashJoin: true
                  Select Operator
                    expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 5 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}

After
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: root_20171027043345_8893c266-bde2-4f23-9914-0dfccc8be5ea:1
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE)
        Reducer 3 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20171027043345_8893c266-bde2-4f23-9914-0dfccc8be5ea:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 4
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        HybridGraceHashJoin: true
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)
                  outputColumnNames: _col0, _col1, _col3
                  input vertices:
                    1 Reducer 3
                  Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                  HybridGraceHashJoin: true
                  Select Operator
                    expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 3 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}

We can see that there are only 2 Maps(Map1 and Map4) in the after explain (in before explain, there are 4 Maps).
When i tried to change the physical plan for HOS like what HOT does. I found change of the optimization  
before
{code}
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-2 depends on stages: Stage-3
  Stage-4 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-3
    Spark
      DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:4
      Vertices:
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-2
    Spark
      Edges:
        Reducer 5 <- Map 4 (GROUP, 1)
      DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:2
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 6
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
            Local Work:
              Map Reduce Local Work
        Reducer 5 
            Local Work:
              Map Reduce Local Work
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Spark HashTable Sink Operator
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)

  Stage: Stage-4
    Spark
      DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:3
      Vertices:
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 1)
      DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 3
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
            Local Work:
              Map Reduce Local Work
        Reducer 2 
            Local Work:
              Map Reduce Local Work
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)
                  outputColumnNames: _col0, _col1, _col3
                  input vertices:
                    1 Reducer 5
                  Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


{code}

After
{code}
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-2 depends on stages: Stage-3
  Stage-4 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-3
    Spark
      DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:3
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-2
    Spark
      Edges:
        Reducer 3 <- Map 6 (GROUP, 1)
      DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:2
      Vertices:
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 4
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
            Local Work:
              Map Reduce Local Work
        Reducer 3 
            Local Work:
              Map Reduce Local Work
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Spark HashTable Sink Operator
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)

  Stage: Stage-4
    Spark
      DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:4
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 5 (GROUP, 1)
      DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:1
      Vertices:
        Map 5 
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 4
                        Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
            Local Work:
              Map Reduce Local Work
        Reducer 2 
            Local Work:
              Map Reduce Local Work
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)
                  outputColumnNames: _col0, _col1, _col3
                  input vertices:
                    1 Reducer 3
                  Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}
only the Map about table {{b}} are merged to 1 Map, there are still two Maps about table {{a}}(Map1,Map4).

The reason causes this is because [GenSparkWork|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java#L421] will split following physical operator tree once encounting RS
{code}
TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
                                        -RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
{code}  to 
{code}
Map1:TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]
Reduce2:GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
Map3:TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[25]
Reduce4:GBY[26]-RS[29]
Map5:TS[3]-FIL[42]-SEL[5]-RS[7]
{code}



> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>
> in HIVE-16602, Implement shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be merged so the data is read only once. Optimization will be carried out at the physical level.  In Hive on Spark, it caches the result of spark work if the spark work is used by more than 1 child spark work. After sharedWorkOptimizer is enabled in physical plan in HoS, the identical table scans are merged to 1 table scan. This result of table scan will be used by more 1 child spark work. Thus we need not do the same computation because of cache mechanism.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)