Posted to issues@hive.apache.org by "liyunzhang (JIRA)" <ji...@apache.org> on 2017/12/04 09:41:00 UTC
[jira] [Commented] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

    [ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276526#comment-16276526 ] 

liyunzhang commented on HIVE-17486:
-----------------------------------

explain.28.scan.share.true

We can see that Map 1 contains only one operator (TS), and the operators from the child of TS down to the RS belong to other Map vertices (Map 12, Map 15, Map 18, Map 2, Map 6, Map 9). So the current {{M-R}} structure within one SparkTask becomes {{M-M-R}}.
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Map 12 <- Map 1 (NONE, 1000)
        Map 15 <- Map 1 (NONE, 1000)
        Map 18 <- Map 1 (NONE, 1000)
        Map 2 <- Map 1 (NONE, 1000)
        Map 6 <- Map 1 (NONE, 1000)
        Map 9 <- Map 1 (NONE, 1000)
        Reducer 10 <- Map 9 (GROUP PARTITION-LEVEL SORT, 1)
        Reducer 13 <- Map 12 (GROUP PARTITION-LEVEL SORT, 1)
        Reducer 16 <- Map 15 (GROUP PARTITION-LEVEL SORT, 1)
        Reducer 19 <- Map 18 (GROUP PARTITION-LEVEL SORT, 1)
        Reducer 3 <- Map 2 (GROUP PARTITION-LEVEL SORT, 1)
        Reducer 4 <- Reducer 10 (PARTITION-LEVEL SORT, 1), Reducer 13 (PARTITION-LEVEL SORT, 1), Reducer 16 (PARTITION-LEVEL SORT, 1), Reducer 19 (PARTITION-LEVEL SORT, 1), Reducer 3 (PARTITION-LEVEL SORT, 1), Reducer 7 (PARTITION-LEVEL SORT, 1)
        Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1)
      DagName: root_20171204042631_0435ff7e-3f10-4c84-a5fc-dc5b607497ba:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ((ss_quantity BETWEEN 0 AND 5 and (ss_list_price BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost BETWEEN 14 AND 34)) or (ss_quantity BETWEEN 6 AND 10 and (ss_list_price BETWEEN 91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost BETWEEN 32 AND 52)) or (ss_quantity BETWEEN 11 AND 15 and (ss_list_price BETWEEN 66 AND 76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost BETWEEN 4 AND 24)) or (ss_quantity BETWEEN 16 AND 20 and (ss_list_price BETWEEN 142 AND 152 or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost BETWEEN 80 AND 100)) or (ss_quantity BETWEEN 21 AND 25 and (ss_list_price BETWEEN 135 AND 145 or ss_coupon_amt BETWEEN 14180 AND 15180 or ss_wholesale_cost BETWEEN 38 AND 58)) or (ss_quantity BETWEEN 26 AND 30 and (ss_list_price BETWEEN 28 AND 38 or ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost BETWEEN 42 AND 62))) (type: boolean)
                  Statistics: Num rows: 28800991 Data size: 4751513940 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized
        Map 12 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 16 AND 20 and (ss_list_price BETWEEN 142 AND 152 or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost BETWEEN 80 AND 100)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Map 15 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 21 AND 25 and (ss_list_price BETWEEN 135 AND 145 or ss_coupon_amt BETWEEN 14180 AND 15180 or ss_wholesale_cost BETWEEN 38 AND 58)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Map 18 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 26 AND 30 and (ss_list_price BETWEEN 28 AND 38 or ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost BETWEEN 42 AND 62)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Map 2 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 0 AND 5 and (ss_list_price BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost BETWEEN 14 AND 34)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Map 6 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 6 AND 10 and (ss_list_price BETWEEN 91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost BETWEEN 32 AND 52)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Map 9 
            Map Operator Tree:
                Filter Operator
                  predicate: (ss_quantity BETWEEN 11 AND 15 and (ss_list_price BETWEEN 66 AND 76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost BETWEEN 4 AND 24)) (type: boolean)
                  Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ss_list_price (type: double)
                    outputColumnNames: ss_list_price
                    Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: avg(ss_list_price), count(ss_list_price), count(DISTINCT ss_list_price)
                      keys: ss_list_price (type: double)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: double)
                        sort order: +
                        Statistics: Num rows: 1066701 Data size: 175981606 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
        Reducer 10 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)
        Reducer 13 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)
        Reducer 16 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)
        Reducer 19 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)
        Reducer 3 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)
        Reducer 4 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                     Inner Join 0 to 2
                     Inner Join 0 to 3
                     Inner Join 0 to 4
                     Inner Join 0 to 5
                keys:
                  0 
                  1 
                  2 
                  3 
                  4 
                  5 
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17
                Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 7 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), count(VALUE._col1), count(DISTINCT KEY._col0:0._col0)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: bigint), _col2 (type: bigint)

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink


{code}
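
To double-check the {{M-M-R}} shape from the plan text, here is a small sketch that reads the Edges section above and lists every vertex that feeds more than one child. It is only an illustration: the regular expression and the helper names ({{children_by_parent}}, {{shared_vertices}}) are my own assumptions and are not part of Hive.

```python
import re
from collections import defaultdict

# Edges copied from the explain output above.
EDGES = """\
Map 12 <- Map 1 (NONE, 1000)
Map 15 <- Map 1 (NONE, 1000)
Map 18 <- Map 1 (NONE, 1000)
Map 2 <- Map 1 (NONE, 1000)
Map 6 <- Map 1 (NONE, 1000)
Map 9 <- Map 1 (NONE, 1000)
Reducer 10 <- Map 9 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 13 <- Map 12 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 16 <- Map 15 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 19 <- Map 18 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 3 <- Map 2 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 4 <- Reducer 10 (PARTITION-LEVEL SORT, 1), Reducer 13 (PARTITION-LEVEL SORT, 1), Reducer 16 (PARTITION-LEVEL SORT, 1), Reducer 19 (PARTITION-LEVEL SORT, 1), Reducer 3 (PARTITION-LEVEL SORT, 1), Reducer 7 (PARTITION-LEVEL SORT, 1)
Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1)
"""

# Hypothetical pattern: a vertex name followed by its edge description in parentheses.
PARENT_RE = re.compile(r"((?:Map|Reducer) \d+) \(([^)]*)\)")


def children_by_parent(edges_text):
    """Map each parent vertex to the list of vertices that read from it."""
    children = defaultdict(list)
    for line in edges_text.splitlines():
        if "<-" not in line:
            continue
        child, parents = line.split("<-", 1)
        for parent, _edge_type in PARENT_RE.findall(parents):
            children[parent].append(child.strip())
    return dict(children)


def shared_vertices(edges_text):
    """Keep only the vertices consumed by more than one child."""
    return {p: c for p, c in children_by_parent(edges_text).items() if len(c) > 1}


if __name__ == "__main__":
    for parent, kids in shared_vertices(EDGES).items():
        print(parent, "->", ", ".join(kids))
```

For this plan the only vertex with more than one child is Map 1, which feeds Map 12, Map 15, Map 18, Map 2, Map 6 and Map 9.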

> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>         Attachments: scanshare.after.svg, scanshare.before.svg
>
>
> In HIVE-16602, shared scans were implemented for Tez.
> Given a query plan, the goal is to identify scans on input tables that can be merged so that the data is read only once. The optimization is carried out at the physical level. Hive on Spark caches the result of a Spark work if that work is used by more than one child Spark work. After SharedWorkOptimizer is enabled in the physical plan in HoS, identical table scans are merged into one table scan. The result of this table scan is then used by more than one child Spark work, so, thanks to the cache mechanism, the same computation does not need to be repeated.
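
The effect of the cache described above can be pictured with a toy sketch. This is only an analogy under my own assumptions, not the Hive on Spark code: {{SharedScan}}, {{branch_count}} and the sample rows are hypothetical, and only the six ss_quantity ranges come from the explain output.

```python
# Hypothetical ss_quantity values standing in for rows of store_sales.
ROWS = [3, 7, 12, 18, 22, 29, 4]

# The six ss_quantity ranges used by the branches in the explain output.
RANGES = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]


class SharedScan:
    """Reads the input once and hands the cached rows to every consumer."""

    def __init__(self, read_rows):
        self._read_rows = read_rows
        self._cache = None
        self.reads = 0  # how many times the underlying input was actually read

    def rows(self):
        if self._cache is None:
            self.reads += 1
            self._cache = list(self._read_rows())
        return self._cache


def branch_count(scan, low, high):
    """One downstream branch: count rows whose quantity lies in [low, high]."""
    return sum(1 for quantity in scan.rows() if low <= quantity <= high)


if __name__ == "__main__":
    scan = SharedScan(lambda: iter(ROWS))
    counts = [branch_count(scan, low, high) for low, high in RANGES]
    print("per-branch counts:", counts)
    print("input reads:", scan.reads)
```

All six branches are served from the same cached rows, so the input is read once instead of six times, which is the saving that merging identical table scans is meant to bring.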



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)