Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/11/29 03:10:00 UTC

[jira] [Updated] (HIVE-18174) Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)

     [ https://issues.apache.org/jira/browse/HIVE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18174:
---------------------------
    Affects Version/s: 3.0.0

> Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-18174
>                 URL: https://issues.apache.org/jira/browse/HIVE-18174
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.execution.reduce.groupby.enabled=true;
> create temporary table foo (x int) stored as orc;
> insert into foo values(1),(2),(3);
> insert into foo values(1),(2),(3);
> set hive.cbo.enable=false;
> select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
> {code}
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
> {code}
> The key list of the reducer-side Group By Operator contains duplicate references: {{keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)}}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: 
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: foo
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: x (type: int)
>                     outputColumnNames: x
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: string), 'Foo' (type: string)
>                         sort order: ++
>                         Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
>                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2 
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
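
The de-duplication named in the issue title can be illustrated with a minimal sketch. This is not Hive's code and not the eventual patch: the class `KeyDedup`, its `Result` type, and the method `dedup` are hypothetical names, and plain string equality stands in for whatever expression-equality check the planner would actually use. The idea is to keep each distinct key expression once and record, for every original key position, which distinct key it maps to, so that duplicated output columns can be projected from a single key column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Hive.
public class KeyDedup {

    /** Distinct keys plus, for each original key position, the index of its distinct key. */
    public static final class Result {
        public final List<String> uniqueKeys;
        public final int[] outputToKey;

        Result(List<String> uniqueKeys, int[] outputToKey) {
            this.uniqueKeys = uniqueKeys;
            this.outputToKey = outputToKey;
        }
    }

    /**
     * De-duplicates key expressions while preserving first-seen order.
     * Assumption: two keys are identical when their textual form is equal.
     */
    public static Result dedup(List<String> keyExprs) {
        Map<String, Integer> firstIndex = new LinkedHashMap<>();
        int[] outputToKey = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstIndex.get(expr);
            if (idx == null) {
                // First time this expression is seen: assign the next distinct slot.
                idx = firstIndex.size();
                firstIndex.put(expr, idx);
            }
            outputToKey[i] = idx;
        }
        return new Result(new ArrayList<>(firstIndex.keySet()), outputToKey);
    }

    public static void main(String[] args) {
        // Key list taken from the reducer-side Group By Operator in the plan above.
        Result r = dedup(Arrays.asList("KEY._col0", "KEY._col0", "'Foo'", "'Foo'"));
        System.out.println("distinct keys: " + r.uniqueKeys);
        System.out.println("position map:  " + Arrays.toString(r.outputToKey));
    }
}
```

With the four keys from the plan, the sketch keeps two distinct keys and maps the four output positions onto them, which mirrors how the Select Operator after the Group By could re-expand the duplicated columns.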


