You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/12/10 23:36:12 UTC

[jira] [Created] (HIVE-9069) Simplify filter predicates for CBO

Mostafa Mokhtar created HIVE-9069:
-------------------------------------

             Summary: Simplify filter predicates for CBO
                 Key: HIVE-9069
                 URL: https://issues.apache.org/jira/browse/HIVE-9069
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 0.14.0
            Reporter: Mostafa Mokhtar
            Assignee: Laljo John Pullokkaran
             Fix For: 0.14.1


Simplify predicates for disjunctive predicates so that can get pushed down to the scan.

Looks like this is still an issue, some of the filters can be pushed down to the scan.
{code}
set hive.cbo.enable=true
set hive.stats.fetch.column.stats=true
set hive.exec.dynamic.partition.mode=nonstrict
set hive.tez.auto.reducer.parallelism=true
set hive.auto.convert.join.noconditionaltask.size=320000000
set hive.exec.reducers.bytes.per.reducer=100000000
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
set hive.support.concurrency=false
set hive.tez.exec.print.summary=true
explain  

select  substr(r_reason_desc,1,20) as r
       ,avg(ws_quantity) wq
       ,avg(wr_refunded_cash) ref
       ,avg(wr_fee) fee
 from web_sales, web_returns, web_page, customer_demographics cd1,
      customer_demographics cd2, customer_address, date_dim, reason 
 where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
   and web_sales.ws_item_sk = web_returns.wr_item_sk
   and web_sales.ws_order_number = web_returns.wr_order_number
   and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
   and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
   and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
   and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
   and reason.r_reason_sk = web_returns.wr_reason_sk
   and
   (
    (
     cd1.cd_marital_status = 'M'
     and
     cd1.cd_marital_status = cd2.cd_marital_status
     and
     cd1.cd_education_status = '4 yr Degree'
     and 
     cd1.cd_education_status = cd2.cd_education_status
     and
     ws_sales_price between 100.00 and 150.00
    )
   or
    (
     cd1.cd_marital_status = 'D'
     and
     cd1.cd_marital_status = cd2.cd_marital_status
     and
     cd1.cd_education_status = 'Primary' 
     and
     cd1.cd_education_status = cd2.cd_education_status
     and
     ws_sales_price between 50.00 and 100.00
    )
   or
    (
     cd1.cd_marital_status = 'U'
     and
     cd1.cd_marital_status = cd2.cd_marital_status
     and
     cd1.cd_education_status = 'Advanced Degree'
     and
     cd1.cd_education_status = cd2.cd_education_status
     and
     ws_sales_price between 150.00 and 200.00
    )
   )
   and
   (
    (
     ca_country = 'United States'
     and
     ca_state in ('KY', 'GA', 'NM')
     and ws_net_profit between 100 and 200  
    )
    or
    (
     ca_country = 'United States'
     and
     ca_state in ('MT', 'OR', 'IN')
     and ws_net_profit between 150 and 300  
    )
    or
    (
     ca_country = 'United States'
     and
     ca_state in ('WI', 'MO', 'WV')
     and ws_net_profit between 50 and 250  
    )
   )
group by r_reason_desc
order by r, wq, ref, fee
limit 100
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 9 <- Map 1 (BROADCAST_EDGE)
        Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
        Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
        Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
        Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
        Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
        Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
      DagName: mmokhtar_20141111161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: web_page
                  filterExpr: wp_web_page_sk is not null (type: boolean)
                  Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: wp_web_page_sk is not null (type: boolean)
                    Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: wp_web_page_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 10 
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean)
                  Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ca_country = 'United States') and ca_address_sk is not null) (type: boolean)
                    Statistics: Num rows: 20000000 Data size: 3740000000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ca_address_sk (type: int), ca_state (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20000000 Data size: 1800000000 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized
        Map 11 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 1998) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((d_year = 1998) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: web_sales
                            Partition key expr: ws_sold_date_sk
                            Statistics: Num rows: 326 Data size: 1304 Basic stats: COMPLETE Column stats: COMPLETE
                            Target column: ws_sold_date_sk
                            Target Vertex: Map 9
            Execution mode: vectorized
        Map 12 
            Map Operator Tree:
                TableScan
                  alias: reason
                  filterExpr: r_reason_sk is not null (type: boolean)
                  Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: r_reason_sk is not null (type: boolean)
                    Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: r_reason_sk (type: int), r_reason_desc (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized
        Map 13 
            Map Operator Tree:
                TableScan
                  alias: web_returns
                  filterExpr: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean)
                  Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((((wr_refunded_cdemo_sk is not null and wr_item_sk is not null) and wr_order_number is not null) and wr_returning_cdemo_sk is not null) and wr_refunded_addr_sk is not null) and wr_reason_sk is not null) (type: boolean)
                    Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: wr_item_sk (type: int), wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_order_number (type: int), wr_fee (type: float), wr_refunded_cash (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                      Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col1 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col1 (type: int)
                        Statistics: Num rows: 1875154722 Data size: 58944640412 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float)
            Execution mode: vectorized
        Map 14 
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 9 
            Map Operator Tree:
                TableScan
                  alias: web_sales
                  filterExpr: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean)
                  Statistics: Num rows: 21594638446 Data size: 2850189889652 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ws_web_page_sk is not null and ws_item_sk is not null) and ws_order_number is not null) (type: boolean)
                    Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ws_item_sk (type: int), ws_web_page_sk (type: int), ws_order_number (type: int), ws_quantity (type: int), ws_sales_price (type: float), ws_net_profit (type: float), ws_sold_date_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
                      Statistics: Num rows: 21591939929 Data size: 604541956128 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col0} {_col2} {_col3} {_col4} {_col5} {_col6}
                          1 
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
                        input vertices:
                          1 Map 1
                        Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int), _col2 (type: int)
                          sort order: ++
                          Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
                          Statistics: Num rows: 21591939072 Data size: 518206537728 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col3 (type: int), _col4 (type: float), _col5 (type: float), _col6 (type: int)
            Execution mode: vectorized
        Reducer 3 
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6}
                  1 {VALUE._col0} {VALUE._col1}
                outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col9, _col10
                Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: int), _col10 (type: string), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string)
                  outputColumnNames: _col0, _col10, _col2, _col3, _col4, _col5, _col6, _col7, _col9
                  Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: int), _col5 (type: int)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: int), _col5 (type: int)
                    Statistics: Num rows: 1875154688 Data size: 373155782912 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: _col2 (type: int), _col3 (type: int), _col4 (type: int), _col6 (type: float), _col7 (type: float), _col9 (type: string), _col10 (type: string)
        Reducer 4 
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4}
                  1 {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col7} {VALUE._col8}
                outputColumnNames: _col3, _col4, _col5, _col6, _col10, _col11, _col12, _col14, _col15, _col17, _col18
                Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (((_col17 = 'M') and ((_col18 = '4 yr Degree') and _col4 BETWEEN 100.0 AND 150.0)) or (((_col17 = 'D') and ((_col18 = 'Primary') and _col4 BETWEEN 50.0 AND 100.0)) or ((_col17 = 'U') and ((_col18 = 'Advanced Degree') and _col4 BETWEEN 150.0 AND 200.0)))) (type: boolean)
                  Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: _col11 (type: int), _col12 (type: int), _col14 (type: float), _col15 (type: float), _col17 (type: string), _col18 (type: string), _col3 (type: int), _col5 (type: float), _col6 (type: int), _col10 (type: int)
                    outputColumnNames: _col10, _col11, _col13, _col14, _col17, _col18, _col3, _col5, _col6, _col9
                    Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col10 (type: int), _col17 (type: string), _col18 (type: string)
                      sort order: +++
                      Map-reduce partition columns: _col10 (type: int), _col17 (type: string), _col18 (type: string)
                      Statistics: Num rows: 57653145 Data size: 11472975855 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col3 (type: int), _col5 (type: float), _col6 (type: int), _col9 (type: int), _col11 (type: int), _col13 (type: float), _col14 (type: float)
        Reducer 5 
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 
                  1 {VALUE._col3} {VALUE._col5} {VALUE._col6} {VALUE._col9} {VALUE._col10} {VALUE._col12} {VALUE._col13}
                outputColumnNames: _col6, _col8, _col9, _col12, _col14, _col16, _col17
                Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col12 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float), _col6 (type: int), _col8 (type: float), _col9 (type: int)
                  outputColumnNames: _col12, _col14, _col16, _col17, _col6, _col8, _col9
                  Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col12 (type: int)
                    sort order: +
                    Map-reduce partition columns: _col12 (type: int)
                    Statistics: Num rows: 3187317548 Data size: 50997080768 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: _col6 (type: int), _col8 (type: float), _col9 (type: int), _col14 (type: int), _col16 (type: float), _col17 (type: float)
        Reducer 6 
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0}
                  1 {VALUE._col6} {VALUE._col8} {VALUE._col9} {VALUE._col13} {VALUE._col15} {VALUE._col16}
                outputColumnNames: _col1, _col9, _col11, _col12, _col17, _col19, _col20
                Statistics: Num rows: 1593658752 Data size: 156178557696 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (((_col1) IN ('KY', 'GA', 'NM') and _col11 BETWEEN 100 AND 200) or (((_col1) IN ('MT', 'OR', 'IN') and _col11 BETWEEN 150 AND 300) or ((_col1) IN ('WI', 'MO', 'WV') and _col11 BETWEEN 50 AND 250))) (type: boolean)
                  Statistics: Num rows: 1195244064 Data size: 117133918272 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: _col17 (type: int), _col19 (type: float), _col20 (type: float), _col9 (type: int), _col12 (type: int)
                    outputColumnNames: _col11, _col13, _col14, _col3, _col6
                    Statistics: Num rows: 1195244064 Data size: 14342928768 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 
                        1 {_col3} {_col11} {_col13} {_col14}
                      keys:
                        0 _col0 (type: int)
                        1 _col6 (type: int)
                      outputColumnNames: _col5, _col13, _col15, _col16
                      input vertices:
                        0 Map 11
                      Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col13 (type: int), _col15 (type: float), _col16 (type: float), _col5 (type: int)
                        outputColumnNames: _col13, _col15, _col16, _col5
                        Statistics: Num rows: 1334416318 Data size: 16012995816 Basic stats: COMPLETE Column stats: COMPLETE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          condition expressions:
                            0 {_col1}
                            1 {_col5} {_col15} {_col16}
                          keys:
                            0 _col0 (type: int)
                            1 _col13 (type: int)
                          outputColumnNames: _col1, _col7, _col17, _col18
                          input vertices:
                            0 Map 12
                          Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: _col1 (type: string), _col7 (type: int), _col18 (type: float), _col17 (type: float)
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Statistics: Num rows: 1334416256 Data size: 140113706880 Basic stats: COMPLETE Column stats: COMPLETE
                            Group By Operator
                              aggregations: avg(_col1), avg(_col2), avg(_col3)
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0, _col1, _col2, _col3
                              Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col0 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: string)
                                Statistics: Num rows: 157024 Data size: 15231328 Basic stats: COMPLETE Column stats: COMPLETE
                                value expressions: _col1 (type: struct<count:bigint,sum:double,input:int>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
        Reducer 7 
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 112 Data size: 13552 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                    sort order: ++++
                    Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
                    TopN Hash Memory Usage: 0.04
        Reducer 8 
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 112 Data size: 23296 Basic stats: COMPLETE Column stats: COMPLETE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 100 Data size: 20800 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized

  Stage: Stage-0
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)