Posted to dev@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2017/10/23 00:51:00 UTC
[jira] [Created] (HIVE-17876) row.serde.deserialize broken for non-vectorized file inputformats
Vihang Karajgaonkar created HIVE-17876:
------------------------------------------
Summary: row.serde.deserialize broken for non-vectorized file inputformats
Key: HIVE-17876
URL: https://issues.apache.org/jira/browse/HIVE-17876
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Vihang Karajgaonkar
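Vectorized execution with {{hive.vectorized.use.row.serde.deserialize}} enabled fails for both the ORC and Parquet input formats when the input format is excluded from native vectorization through {{hive.vectorized.input.format.excludes}}. {{EXPLAIN VECTORIZATION}} reports the map work as vectorized, but the query itself fails at execution time.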
Steps to reproduce:
{noformat}
set hive.fetch.task.conversion=none;
set hive.vectorized.use.row.serde.deserialize=true;
set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
set hive.vectorized.execution.enabled=true;
explain vectorization select * from alltypesorc where cint = 528534767 limit 10;
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| PLAN VECTORIZATION: |
| enabled: true |
| enabledConditionsMet: [hive.vectorized.execution.enabled IS true] |
| |
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: alltypesorc |
| Statistics: Num rows: 12288 Data size: 2641964 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (cint = 528534767) (type: boolean) |
| Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: ctinyint (type: tinyint), csmallint (type: smallint), 528534767 (type: int), cbigint (type: bigint), cfloat (type: float), cdouble (type: double), cstring1 (type: string), cstring2 (type: string), ctimestamp1 (type: timestamp), ctimestamp2 (type: timestamp), cboolean1 (type: boolean), cboolean2 (type: boolean) |
| outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11 |
| Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE |
| Limit |
| Number of rows: 10 |
| Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| Execution mode: vectorized |
| Map Vectorization: |
| enabled: true |
| enabledConditionsMet: hive.vectorized.use.row.serde.deserialize IS true |
| groupByVectorOutput: true |
| inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
| allNative: false |
| usesVectorUDFAdaptor: false |
| vectorized: true |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: 10 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
48 rows selected (0.742 seconds)
0: jdbc:hive2://localhost:10000/default>
0: jdbc:hive2://localhost:10000/default> select * from alltypesorc where cint = 528534767 limit 10;
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
{noformat}
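The steps above show only the ORC case. A Parquet variant would presumably look like the sketch below. The table name {{alltypesparquet}} is an assumption (any Parquet-backed copy of {{alltypesorc}} should do), and no output for this variant is included in this report.
{noformat}
-- Hypothetical Parquet variant of the steps above; alltypesparquet is an assumed table name.
set hive.fetch.task.conversion=none;
set hive.vectorized.use.row.serde.deserialize=true;
-- Exclude the Parquet input format from native vectorized reads so the row SerDe deserialize path is used.
set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
set hive.vectorized.execution.enabled=true;
select * from alltypesparquet where cint = 528534767 limit 10;
{noformat}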