Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2016/07/05 05:20:10 UTC

[jira] [Comment Edited] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

    [ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362017#comment-15362017 ] 

Matt McCline edited comment on HIVE-14004 at 7/5/16 5:19 AM:
-------------------------------------------------------------

The patch includes a number of changes originally written for HIVE-13974 that rework schema handling and move to the TypeDescription class instead of arrays/lists of the OrcProto.Type class. It fixes the bug in this JIRA.

Code in OrcRawRecordMerger that manipulated the include and column name arrays has been moved to SchemaEvolution to centralize that logic; this move also contributed to the fix.

HIVE-13974 will now be based on this code, plus additional logic in the ORC split generation path to handle the logical/reader/file schemas for inner STRUCT types and schema evolution.
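For readers unfamiliar with the failure mode: SchemaEvolution.getFileType looks up a file type by reader column id, and the exception in the quoted log below occurs when that id falls outside the file schema. The following is a minimal, hypothetical sketch of that kind of id-indexed lookup with a bounds guard; the class and method below are illustrative only, not the actual ORC implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified sketch (not the real ORC code) of an
// id-indexed file-type lookup like SchemaEvolution.getFileType.
public class FileTypeLookupSketch {

    // ORC assigns each type in a TypeDescription tree a pre-order
    // column id; fileTypes here stands in for the file schema's
    // types indexed by that id.
    static String getFileType(List<String> fileTypes, int readerColumnId) {
        // If the reader schema (e.g. the ACID wrapper struct) has more
        // columns than the file schema, an unchecked
        // fileTypes.get(readerColumnId) throws
        // ArrayIndexOutOfBoundsException -- the failure mode reported
        // in this JIRA. Guarding the index avoids that.
        if (readerColumnId < 0 || readerColumnId >= fileTypes.size()) {
            return null; // column absent from the file; caller must handle
        }
        return fileTypes.get(readerColumnId);
    }

    public static void main(String[] args) {
        // File schema struct<a:int,b:string> flattened: ids 0..2
        List<String> fileTypes = Arrays.asList("struct", "int", "string");
        System.out.println(getFileType(fileTypes, 1)); // prints "int"
        System.out.println(getFileType(fileTypes, 7)); // prints "null" rather than throwing
    }
}
```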



> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Matt McCline
>         Attachments: HIVE-14004.01.patch, HIVE-14004.01.patch
>
>
> The easiest way to repro is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
>     int[][] tableData = {{1,2},{3,4}};
>     runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>     runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
>     Worker t = new Worker();
>     t.setThreadId((int) t.getId());
>     t.setHiveConf(hiveConf);
>     AtomicBoolean stop = new AtomicBoolean();
>     AtomicBoolean looped = new AtomicBoolean();
>     stop.set(true);
>     t.init(stop, looped);
>     t.run();
>     runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>     runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>     runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
>     t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but look in target/tmp/log/hive.log for the following exception (from minor compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running major compaction instead of minor works fine.
> Replacing the DELETE operation with an update makes both major and minor compaction run fine.
> The issue itself should be addressed by HIVE-13974, but we need to make sure to add this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)