Posted to user@hive.apache.org by Aviral Agarwal <av...@gmail.com> on 2018/02/15 10:08:41 UTC
ORC ACID table returning Array Index Out of Bounds
Hi guys,
I am running into the following error when querying an ACID table:
Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
    at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
    ... 25 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
    ... 26 more
Any help would be appreciated.
Regards,
Aviral Agarwal
Re: ORC ACID table returning Array Index Out of Bounds
Posted by Jason Dere <jd...@hortonworks.com>.
I've opened HIVE-18817 (https://issues.apache.org/jira/browse/HIVE-18817) for this.
Re: ORC ACID table returning Array Index Out of Bounds
Posted by Aviral Agarwal <av...@gmail.com>.
Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).
For now I have mitigated the problem by recreating the table, so I don't have the relevant ORC files right now.
Also, I am curious: how would "hive.acid.key.index" help in debugging this problem?
I was going through the source code, and it seems the problem lies in the following code (OrcRawRecordMerger.discoverKeyBounds):
/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for(StripeInformation stripe: stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}
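To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the same index arithmetic. It is not Hive code: the class KeyBoundsSketch, its method names, and the stripe offsets are hypothetical, and it assumes (unconfirmed in this thread) that the key index ended up with fewer entries than the file has stripes. Under that assumption, a split that ends before the last stripe computes a maxKey position that falls outside the key index.

```java
import java.util.OptionalInt;

// Hypothetical stand-in for the index arithmetic in discoverKeyBounds; not Hive code.
final class KeyBoundsSketch {

    // Position in the key index that the maxKey lookup would read,
    // or empty when the split reaches the end of the file (isTail stays true).
    static OptionalInt maxKeyPosition(long[] stripeOffsets, long offset, long maxOffset) {
        int firstStripe = 0;
        int stripeCount = 0;
        boolean isTail = true;
        for (long stripeOffset : stripeOffsets) {
            if (offset > stripeOffset) {
                firstStripe += 1;          // stripe lies before the split
            } else if (maxOffset > stripeOffset) {
                stripeCount += 1;          // stripe lies inside the split
            } else {
                isTail = false;            // at least one stripe lies after the split
                break;
            }
        }
        return isTail ? OptionalInt.empty()
                      : OptionalInt.of(firstStripe + stripeCount - 1);
    }

    // True when reading 'position' is safe for a key index of the given length.
    static boolean lookupIsSafe(OptionalInt position, int keyIndexLength) {
        return !position.isPresent()
            || (position.getAsInt() >= 0 && position.getAsInt() < keyIndexLength);
    }

    public static void main(String[] args) {
        // Ten made-up stripes, 100 bytes apart (assumed values).
        long[] stripeOffsets = new long[10];
        for (int i = 0; i < stripeOffsets.length; i++) {
            stripeOffsets[i] = 3 + 100L * i;
        }
        // A split covering stripes 0..8, so the last stripe falls outside it.
        OptionalInt position = maxKeyPosition(stripeOffsets, 3, 903);
        System.out.println("maxKey position: " + position);
        System.out.println("safe with 10 key index entries: " + lookupIsSafe(position, 10));
        System.out.println("safe with 8 key index entries: " + lookupIsSafe(position, 8));
    }
}
```

With one key index entry per stripe the lookup stays in range; with only eight entries the same split asks for position 8, which would match the index reported in the exception. Whether that is what actually happened here can only be confirmed from the real ORC files.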
If this is still an open issue, I would like to submit a patch for it.
Let me know how I can debug this issue further.
Thanks,
Aviral Agarwal
On Feb 15, 2018 23:10, "Eugene Koifman" <ek...@hortonworks.com> wrote:
> What version of Hive is this?
>
> Can you isolate this to a specific partition?
>
> The table/partition you are reading should have a directory called base_x/
> with several bucket_0000N files. (if you see more than 1 base_x, take one
> with highest x)
>
> Each bucket_0000N should have a “hive.acid.key.index” property in user
> metadata section of ORC footer.
>
> Could you share the value of this property?
>
> You can use orcfiledump
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility)
> for this but it requires https://issues.apache.org/jira/browse/ORC-223.
>
> Thanks,
> Eugene
Re: ORC ACID table returning Array Index Out of Bounds
Posted by Eugene Koifman <ek...@hortonworks.com>.
What version of Hive is this?
Can you isolate this to a specific partition?
The table/partition you are reading should have a directory called base_x/ with several bucket_0000N files. (If you see more than one base_x, take the one with the highest x.)
Each bucket_0000N should have a “hive.acid.key.index” property in the user metadata section of the ORC footer.
Could you share the value of this property?
You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility) for this, but it requires ORC-223 (https://issues.apache.org/jira/browse/ORC-223).
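Once the property value is available, one quick check is whether the key index has an entry for every stripe. The sketch below assumes the value is a semicolon-separated list of transaction,bucket,rowId triples, one per stripe; that format is an assumption about how OrcRecordUpdater writes it and should be verified against your Hive version. The class KeyIndexCheck and its methods are hypothetical helpers, not part of Hive or ORC, and the sample value in main is made up.

```java
// Hypothetical helper, not part of Hive or ORC.
final class KeyIndexCheck {

    // Parses an assumed "txn,bucket,rowId;..." value into one long[3] per stripe.
    static long[][] parse(String value) {
        if (value == null || value.isEmpty()) {
            return new long[0][];
        }
        // String.split drops trailing empty strings, so a final ';' adds no entry.
        String[] entries = value.split(";");
        long[][] keys = new long[entries.length][];
        for (int i = 0; i < entries.length; i++) {
            String[] parts = entries[i].split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed key index entry: " + entries[i]);
            }
            keys[i] = new long[] {
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Long.parseLong(parts[2].trim())
            };
        }
        return keys;
    }

    // True when the key index has exactly one entry per stripe.
    static boolean coversAllStripes(String value, int stripeCount) {
        return parse(value).length == stripeCount;
    }

    public static void main(String[] args) {
        String sample = "5,0,99;5,0,199;7,0,42;";  // made-up value
        System.out.println("entries: " + parse(sample).length);
        System.out.println("covers 3 stripes: " + coversAllStripes(sample, 3));
        System.out.println("covers 4 stripes: " + coversAllStripes(sample, 4));
    }
}
```

The stripe count to compare against can be read from the same orcfiledump output, which lists the file's stripes. A mismatch between the two numbers would be consistent with the ArrayIndexOutOfBoundsException in discoverKeyBounds, though it would not by itself explain how the file got into that state.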
Thanks,
Eugene