Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/11/29 03:10:00 UTC
[jira] [Updated] (HIVE-18174) Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
[ https://issues.apache.org/jira/browse/HIVE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-18174:
---------------------------
Affects Version/s: 3.0.0
> Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
> ------------------------------------------------------------------------------
>
> Key: HIVE-18174
> URL: https://issues.apache.org/jira/browse/HIVE-18174
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 3.0.0
> Reporter: Gopal V
>
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.execution.reduce.groupby.enabled=true;
> create temporary table foo (x int) stored as orc;
> insert into foo values(1),(2),(3);
> insert into foo values(1),(2),(3);
> set hive.cbo.enable=false;
> select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
> {code}
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
> at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
> at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
> {code}
> The reduce-side Group By key list contains duplicate references: {{keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)}}
> {code}
> STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName:
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: foo
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: x (type: int)
> outputColumnNames: x
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: string), 'Foo' (type: string)
> sort order: ++
> Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
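
To illustrate the de-duplication the issue title asks for, here is a minimal sketch in plain Java. It is not Hive code and does not reflect how any fix was actually implemented: the class GroupByKeyDedup, its Result type, and the choice to compare keys by their printed expression text are all assumptions made for illustration. The idea is to keep each distinct key expression once and record, for every original key position, which distinct key it should read from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hive. */
final class GroupByKeyDedup {

    /** Distinct keys plus, for each original key position, the index of its distinct key. */
    static final class Result {
        final List<String> uniqueKeys;
        final int[] outputToUnique;

        Result(List<String> uniqueKeys, int[] outputToUnique) {
            this.uniqueKeys = uniqueKeys;
            this.outputToUnique = outputToUnique;
        }
    }

    /**
     * Assumption: two keys are "identical" when their expression text matches.
     * A real planner would compare expression trees instead of strings.
     */
    static Result dedup(List<String> keyExprs) {
        // LinkedHashMap preserves first-seen order of the distinct keys.
        Map<String, Integer> firstIndex = new LinkedHashMap<>();
        int[] mapping = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstIndex.get(expr);
            if (idx == null) {
                idx = firstIndex.size();
                firstIndex.put(expr, idx);
            }
            mapping[i] = idx;
        }
        return new Result(new ArrayList<>(firstIndex.keySet()), mapping);
    }

    public static void main(String[] args) {
        // Key list taken from the Reducer 2 Group By Operator in the plan above.
        Result r = dedup(Arrays.asList("KEY._col0", "KEY._col0", "'Foo'", "'Foo'"));
        System.out.println(r.uniqueKeys);                      // [KEY._col0, 'Foo']
        System.out.println(Arrays.toString(r.outputToUnique)); // [0, 0, 1, 1]
    }
}
```

With such a mapping, the Group By would only need to carry two keys, and the four output columns could be rebuilt by projecting from them.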