Posted to dev@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/12/27 10:18:58 UTC
[jira] [Created] (HIVE-15516) Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr

Rajesh Balamohan created HIVE-15516:
---------------------------------------

             Summary: Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr
                 Key: HIVE-15516
                 URL: https://issues.apache.org/jira/browse/HIVE-15516
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan


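The first query listed below does not get vectorized: Map 1 runs with "Execution mode: llap" and the Vectorizer log reports GenericUDFOPGreaterThan as an unsupported expression. Without the "case-when" expression, the same aggregation gets vectorized, as the second query shows.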

{noformat}
hive> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales;
explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: CASE WHEN ((ss_quantity > 1)) THEN ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
                    outputColumnNames: _col0
                    Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: sum(_col0)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: double)
            Execution mode: llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


....
....
2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)
2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Cannot vectorize select expression: GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1), GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]), Column[ss_wholesale_cost]), Const int 0)
2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
....
....


hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: (UDFToDouble(ss_quantity) * ss_wholesale_cost) (type: double)
                    outputColumnNames: _col0
                    Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: sum(_col0)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: double)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{noformat}
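For anyone comparing plans like the two above, here is a minimal Python sketch that lists the "Execution mode" of each vertex in saved EXPLAIN text and pulls the unsupported expression out of the physical.Vectorizer log lines. It is not part of Hive; the helper names (vertex_execution_modes, non_vectorized_vertices, unsupported_exprs) are hypothetical, and the patterns assume the output is formatted exactly as in this report.

```python
import re

# Assumption: vertex headers look like "Map 1" or "Reducer 2" on their own line,
# as in the EXPLAIN output quoted in this report.
VERTEX_RE = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")
MODE_RE = re.compile(r"^\s*Execution mode:\s*(.+?)\s*$")
# Assumption: wording copied from the physical.Vectorizer log lines in this report.
UNSUPPORTED_RE = re.compile(r"Encountered unsupported expr desc : (.+)$")


def vertex_execution_modes(explain_text):
    """Map each vertex name to its 'Execution mode' value (None if the line is absent)."""
    modes = {}
    current = None
    for line in explain_text.splitlines():
        vertex = VERTEX_RE.match(line)
        if vertex:
            current = vertex.group(1)
            modes[current] = None
            continue
        mode = MODE_RE.match(line)
        if mode and current is not None:
            modes[current] = mode.group(1)
    return modes


def non_vectorized_vertices(explain_text):
    """Return the vertices whose execution mode does not include 'vectorized'."""
    result = []
    for vertex, mode in vertex_execution_modes(explain_text).items():
        parts = [p.strip() for p in mode.split(",")] if mode else []
        if "vectorized" not in parts:
            result.append(vertex)
    return result


def unsupported_exprs(log_text):
    """Collect the expression descriptions the Vectorizer log flags as unsupported."""
    found = []
    for line in log_text.splitlines():
        match = UNSUPPORTED_RE.search(line)
        if match:
            found.append(match.group(1).strip())
    return found
```

Run against the first plan in this report, non_vectorized_vertices should flag only Map 1; against the second plan it should return an empty list.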


