Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2016/07/25 22:12:20 UTC
[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization
[ https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-13872:
------------------------------------
Fix Version/s: 2.2.0
> Vectorization: Fix cross-product reduce sink serialization
> ----------------------------------------------------------
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.0
> Reporter: Gopal V
> Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, HIVE-13872.03.patch, HIVE-13872.04.patch, HIVE-13872.05.patch, HIVE-13872.WIP.patch, customer_demographics.txt, vector_include_no_sel.q, vector_include_no_sel.q.out
>
>
> TPC-DS Q13 produces a cross-product when CBO does not simplify the query, and the vectorized reduce sink then fails with:
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 projection column num 1
>     at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
>     ... 18 more
> {code}
> Simplified query:
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)
> from store_sales
>     ,customer_demographics
> where (
>     (
>         customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>         and customer_demographics.cd_marital_status = 'M'
>     ) or
>     (
>         customer_demographics.cd_demo_sk = ss_cdemo_sk
>         and customer_demographics.cd_marital_status = 'U'
>     ))
> ;
> {code}
> Explain output for the customer_demographics map vertex:
> {code}
> Map 3
>     Map Operator Tree:
>         TableScan
>           alias: customer_demographics
>           Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>           Reduce Output Operator
>             sort order:
>             Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>             value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
>     Execution mode: vectorized, llap
>     LLAP IO: all inputs
> {code}
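
Illustrative note (not part of the original report): both OR branches of the simplified query repeat the same join condition on cd_demo_sk, so the predicate has the shape (A and B) or (A and C), which is logically A and (B or C). The issue does not say exactly what CBO rewrites; assuming the simplification is this kind of factoring, the sketch below checks that the reported query and a hand-factored equi-join return the same count. It uses Python's sqlite3 module as a stand-in engine with made-up toy rows; only the table and column names come from the report, and nothing here reproduces Hive's plan or the vectorization failure.

```python
import sqlite3

# Toy in-memory tables. The rows are hypothetical; only the table and
# column names are taken from the simplified query in the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table store_sales (ss_cdemo_sk integer);
    create table customer_demographics (cd_demo_sk integer, cd_marital_status text);
    insert into store_sales values (1), (2), (3), (3), (null);
    insert into customer_demographics values (1, 'M'), (2, 'U'), (3, 'S'), (4, 'M');
""")

# Query shape from the report: the join key is repeated inside each OR branch,
# so there is no top-level equality predicate between the two tables.
original = """
    select count(1)
    from store_sales, customer_demographics
    where (customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
           and customer_demographics.cd_marital_status = 'M')
       or (customer_demographics.cd_demo_sk = ss_cdemo_sk
           and customer_demographics.cd_marital_status = 'U')
"""

# Hand-factored form (an assumption about what a simplifier could produce):
# the shared join condition is pulled out, leaving an ordinary equi-join.
factored = """
    select count(1)
    from store_sales
    join customer_demographics
      on customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
    where customer_demographics.cd_marital_status in ('M', 'U')
"""

original_count = conn.execute(original).fetchone()[0]
factored_count = conn.execute(factored).fetchone()[0]
print(original_count, factored_count)
```

If the two counts agree on representative data, running the factored form in Hive may be a way to sidestep the cross-product while the fix is pending; that is a suggestion to verify, not something the issue states.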
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)