You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2015/06/04 01:53:39 UTC
[jira] [Resolved] (HIVE-10917) ORC fails to read table with a 38Gb
ORC file
[ https://issues.apache.org/jira/browse/HIVE-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V resolved HIVE-10917.
----------------------------
Resolution: Duplicate
> ORC fails to read table with a 38Gb ORC file
> --------------------------------------------
>
> Key: HIVE-10917
> URL: https://issues.apache.org/jira/browse/HIVE-10917
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 1.3.0
> Reporter: Gopal V
>
> {code}
> hive> set mapreduce.input.fileinputformat.split.maxsize=1000000000000;
> hive> set mapreduce.input.fileinputformat.split.maxsize=1000000000000;
> hive> alter table lineitem concatenate;
> ..
> hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
> Found 12 items
> -rwxr-xr-x 3 gopal supergroup 41368976599 2015-06-03 15:49 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000000_0
> -rwxr-xr-x 3 gopal supergroup 36226719673 2015-06-03 15:48 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000001_0
> -rwxr-xr-x 3 gopal supergroup 27544042018 2015-06-03 15:50 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000002_0
> -rwxr-xr-x 3 gopal supergroup 23147063608 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000003_0
> -rwxr-xr-x 3 gopal supergroup 21079035936 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000004_0
> -rwxr-xr-x 3 gopal supergroup 13813961419 2015-06-03 15:43 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000005_0
> -rwxr-xr-x 3 gopal supergroup 8155299977 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000006_0
> -rwxr-xr-x 3 gopal supergroup 6264478613 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000007_0
> -rwxr-xr-x 3 gopal supergroup 4653393054 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000008_0
> -rwxr-xr-x 3 gopal supergroup 3621672928 2015-06-03 15:39 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000009_0
> -rwxr-xr-x 3 gopal supergroup 1460919310 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000010_0
> -rwxr-xr-x 3 gopal supergroup 485129789 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000011_0
> {code}
> Errors without PPD
> Suspicions about ORC stripe padding and stream offsets in the stream information, when concatenating.
> {code}
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 36845
> at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
> at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
> at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
> at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
> at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
> at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
> ... 25 more
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)