Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/11/29 03:10:00 UTC
[jira] [Created] (HIVE-18174) Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
Gopal V created HIVE-18174:
------------------------------
Summary: Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
Key: HIVE-18174
URL: https://issues.apache.org/jira/browse/HIVE-18174
Project: Hive
Issue Type: Bug
Reporter: Gopal V
{code}
set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.execution.reduce.groupby.enabled=true;
create temporary table foo (x int) stored as orc;
insert into foo values(1),(2),(3);
insert into foo values(1),(2),(3);
set hive.cbo.enable=false;
select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
{code}
{code}
Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
        at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
        at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
{code}
The reducer-side Group By Operator's key list contains duplicate references: {{keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)}}
{code}
STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: foo
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: x (type: int)
                    outputColumnNames: x
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col1 (type: string), 'Foo' (type: string)
                        sort order: ++
                        Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
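To illustrate the de-duplication the summary asks for, here is a minimal sketch that collapses a Group By key list to its distinct entries and records, for each original position, which distinct key it maps to. It is not Hive code: the class {{GroupByKeyDedup}}, its {{Result}} type and the {{dedup}} method are hypothetical names, and comparing keys by their printed text is an assumption made only to keep the example self-contained. A real fix would compare expression nodes and would have to restrict itself to deterministic expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Hive: de-duplicates group-by key expressions. */
final class GroupByKeyDedup {

    /** Distinct keys plus, for each original key position, the index of its distinct key. */
    static final class Result {
        final List<String> uniqueKeys;
        final int[] outputToUnique;

        Result(List<String> uniqueKeys, int[] outputToUnique) {
            this.uniqueKeys = uniqueKeys;
            this.outputToUnique = outputToUnique;
        }
    }

    /**
     * Assumption: two keys are identical when their text is identical.
     * The first occurrence of each key wins; later duplicates only get a mapping entry.
     */
    static Result dedup(List<String> keyExprs) {
        // LinkedHashMap preserves first-seen order, so distinct keys keep a stable position.
        Map<String, Integer> firstIndex = new LinkedHashMap<>();
        int[] outputToUnique = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstIndex.get(expr);
            if (idx == null) {
                idx = firstIndex.size();
                firstIndex.put(expr, idx);
            }
            outputToUnique[i] = idx;
        }
        return new Result(new ArrayList<>(firstIndex.keySet()), outputToUnique);
    }

    public static void main(String[] args) {
        // Key list taken from the Reducer 2 Group By Operator in the plan above.
        Result r = dedup(Arrays.asList(
                "KEY._col0 (type: string)", "KEY._col0 (type: string)",
                "'Foo' (type: string)", "'Foo' (type: string)"));
        System.out.println(r.uniqueKeys);
        System.out.println(Arrays.toString(r.outputToUnique));
    }
}
```

With a mapping like this, the operator would only need to hash and compare the two distinct keys, and the four output columns could be filled by copying from the distinct key each position points at.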