Posted to issues@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2017/10/23 00:52:00 UTC
[jira] [Commented] (HIVE-17876) row.serde.deserialize broken for non-vectorized file inputformats
[ https://issues.apache.org/jira/browse/HIVE-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214500#comment-16214500 ]
Vihang Karajgaonkar commented on HIVE-17876:
--------------------------------------------
CC: [~mmccline]
> row.serde.deserialize broken for non-vectorized file inputformats
> -----------------------------------------------------------------
>
> Key: HIVE-17876
> URL: https://issues.apache.org/jira/browse/HIVE-17876
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.4.0
> Reporter: Vihang Karajgaonkar
>
> Vectorization using {{hive.vectorized.use.row.serde.deserialize}} errors out for both the ORC and Parquet input formats.
> Steps to reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.execution.enabled=true;
> explain vectorization select * from alltypesorc where cint = 528534767 limit 10;
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | PLAN VECTORIZATION: |
> | enabled: true |
> | enabledConditionsMet: [hive.vectorized.execution.enabled IS true] |
> | |
> | STAGE DEPENDENCIES: |
> | Stage-1 is a root stage |
> | Stage-0 depends on stages: Stage-1 |
> | |
> | STAGE PLANS: |
> | Stage: Stage-1 |
> | Map Reduce |
> | Map Operator Tree: |
> | TableScan |
> | alias: alltypesorc |
> | Statistics: Num rows: 12288 Data size: 2641964 Basic stats: COMPLETE Column stats: NONE |
> | Filter Operator |
> | predicate: (cint = 528534767) (type: boolean) |
> | Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE |
> | Select Operator |
> | expressions: ctinyint (type: tinyint), csmallint (type: smallint), 528534767 (type: int), cbigint (type: bigint), cfloat (type: float), cdouble (type: double), cstring1 (type: string), cstring2 (type: string), ctimestamp1 (type: timestamp), ctimestamp2 (type: timestamp), cboolean1 (type: boolean), cboolean2 (type: boolean) |
> | outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11 |
> | Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE |
> | Limit |
> | Number of rows: 10 |
> | Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE |
> | File Output Operator |
> | compressed: false |
> | Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE |
> | table: |
> | input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
> | output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
> | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | Execution mode: vectorized |
> | Map Vectorization: |
> | enabled: true |
> | enabledConditionsMet: hive.vectorized.use.row.serde.deserialize IS true |
> | groupByVectorOutput: true |
> | inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
> | allNative: false |
> | usesVectorUDFAdaptor: false |
> | vectorized: true |
> | |
> | Stage: Stage-0 |
> | Fetch Operator |
> | limit: 10 |
> | Processor Tree: |
> | ListSink |
> | |
> +----------------------------------------------------+
> 48 rows selected (0.742 seconds)
> 0: jdbc:hive2://localhost:10000/default>
> 0: jdbc:hive2://localhost:10000/default> select * from alltypesorc where cint = 528534767 limit 10;
> Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
> {noformat}
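For anyone who wants to script the reproduction rather than paste it into Beeline, here is a rough JDBC sketch of the same steps. The statements and the default URL are taken from the session above; the class name is made up for illustration, and it assumes the Hive JDBC driver is on the classpath and that the alltypesorc test table exists. It only reports which statement fails and with what SQLState and error code. It does not diagnose the cause.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Hive.
class Hive17876Repro {
    // Statements from the steps to reproduce (the EXPLAIN is omitted).
    // No trailing semicolons: each string is sent as one JDBC statement.
    static final List<String> STATEMENTS = Arrays.asList(
        "set hive.fetch.task.conversion=none",
        "set hive.vectorized.use.row.serde.deserialize=true",
        "set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "set hive.vectorized.execution.enabled=true",
        "select * from alltypesorc where cint = 528534767 limit 10");

    public static void main(String[] args) throws SQLException {
        // Default URL matches the Beeline prompt in the report; override via args[0].
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            for (String sql : STATEMENTS) {
                try {
                    stmt.execute(sql);
                    System.out.println("OK: " + sql);
                } catch (SQLException e) {
                    // Print the failure details and continue with the next statement.
                    System.out.println("FAILED: " + sql);
                    System.out.println("  SQLState=" + e.getSQLState()
                        + " code=" + e.getErrorCode() + " msg=" + e.getMessage());
                }
            }
        }
    }
}
```

Since the client-side error in the session above only names MapRedTask and return code 2, the underlying exception would presumably have to be looked up in the failed task's logs.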
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)