Posted to dev@hive.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/12/27 10:18:58 UTC
[jira] [Created] (HIVE-15516) Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr
Rajesh Balamohan created HIVE-15516:
---------------------------------------
Summary: Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr
Key: HIVE-15516
URL: https://issues.apache.org/jira/browse/HIVE-15516
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan
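The first query listed below does not get vectorized on the map side (Map 1 runs with "Execution mode: llap"). Without the "case-when" expression, the same aggregation gets vectorized (Map 1 runs with "Execution mode: vectorized, llap"), as the second query shows.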
{noformat}
hive> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales;
explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: CASE WHEN ((ss_quantity > 1)) THEN ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
outputColumnNames: _col0
Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: sum(_col0)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: double)
Execution mode: llap
LLAP IO: all inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
....
....
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Cannot vectorize select expression: GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1), GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]), Column[ss_wholesale_cost]), Const int 0)
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
....
....
hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: (UDFToDouble(ss_quantity) * ss_wholesale_cost) (type: double)
outputColumnNames: _col0
Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: sum(_col0)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: double)
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
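
When comparing plans like the two above, it can help to list which vertices lack "vectorized" in their "Execution mode" line. The following is only a rough sketch, not part of Hive: the helper names (execution_modes, non_vectorized) are hypothetical, and it assumes the EXPLAIN text has the same layout as the output pasted above, with each vertex name ("Map 1", "Reducer 2") on its own line. A vertex with no "Execution mode" line at all is reported as not vectorized.

```python
import re

# Assumption: a vertex header is a line such as "Map 1" or "Reducer 2" by itself.
# Edge lines like "Reducer 2 <- Map 1 (SIMPLE_EDGE)" do not match because of "$".
VERTEX_RE = re.compile(r"^(?:Map|Reducer) \d+$")
MODE_PREFIX = "Execution mode:"


def execution_modes(explain_text):
    """Return {vertex name: [execution mode tokens]} parsed from EXPLAIN text."""
    modes = {}
    current = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if VERTEX_RE.match(line):
            current = line
            modes.setdefault(current, [])
        elif line.startswith(MODE_PREFIX) and current is not None:
            tokens = line[len(MODE_PREFIX):].split(",")
            modes[current] = [t.strip() for t in tokens]
    return modes


def non_vectorized(explain_text):
    """Return the vertices whose execution mode does not include "vectorized"."""
    return [v for v, m in execution_modes(explain_text).items()
            if "vectorized" not in m]


# Abbreviated excerpt of the first plan in this report.
SAMPLE = """\
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Vertices:
Map 1
Map Operator Tree:
TableScan
Execution mode: llap
LLAP IO: all inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
"""

if __name__ == "__main__":
    print(non_vectorized(SAMPLE))
```

Run against the first plan, this flags Map 1; run against the second plan, it should flag nothing, since both vertices show "vectorized, llap".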