Posted to user@hive.apache.org by Aviral Agarwal <av...@gmail.com> on 2018/02/15 10:08:41 UTC

ORC ACID table returning Array Index Out of Bounds

Hi guys,

I am running into the following error when querying an ACID table:

Caused by: java.lang.RuntimeException: java.io.IOException:
java.lang.ArrayIndexOutOfBoundsException: 8
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
	at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
	at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
	at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
	at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
	... 14 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
	... 25 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
	at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
	... 26 more



Any help would be appreciated.


Regards,

Aviral Agarwal

Re: ORC ACID table returning Array Index Out of Bounds

Posted by Jason Dere <jd...@hortonworks.com>.
I've opened HIVE-18817 (https://issues.apache.org/jira/browse/HIVE-18817) for this.



Re: ORC ACID table returning Array Index Out of Bounds

Posted by Aviral Agarwal <av...@gmail.com>.
Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).

For now I have mitigated the problem by recreating the table, so I don't
have the relevant ORC files right now.

Also, I am curious: how would "hive.acid.key.index" help in debugging
this problem?

I was going through the source code, and the problem seems to be in the
following method:

/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for(StripeInformation stripe: stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}
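
To make the failure mode in the method above concrete, here is a minimal standalone sketch. It is not Hive code: it replays the same stripe-counting loop over plain long offsets and reports the highest keyIndex position the method would read. The class name KeyBoundsSketch, its method names and the sample offsets are all hypothetical, and the scenario assumes that the key index holds fewer entries than the file has stripes.

```java
// Standalone model of the stripe-counting loop in discoverKeyBounds.
// Class, method names and sample data are hypothetical; this is not Hive code.
class KeyBoundsSketch {

  /**
   * Returns the highest keyIndex position the loop would read for a split
   * covering [offset, maxOffset), or -1 if it reads no valid position.
   */
  static int highestIndexRead(long[] stripeOffsets, long offset, long maxOffset) {
    int firstStripe = 0;
    int stripeCount = 0;
    boolean isTail = true;
    for (long stripeOffset : stripeOffsets) {
      if (offset > stripeOffset) {
        firstStripe += 1;
      } else if (maxOffset > stripeOffset) {
        stripeCount += 1;
      } else {
        isTail = false;
        break;
      }
    }
    int highest = -1;
    if (firstStripe != 0) {
      highest = firstStripe - 1;  // position read for minKey
    }
    if (!isTail) {
      // position read for maxKey
      highest = Math.max(highest, firstStripe + stripeCount - 1);
    }
    return highest;
  }

  /** True if the highest position read fits in a key index of the given length. */
  static boolean lookupsInBounds(long[] stripeOffsets, long offset, long maxOffset,
                                 int keyIndexLength) {
    return highestIndexRead(stripeOffsets, offset, maxOffset) < keyIndexLength;
  }

  public static void main(String[] args) {
    // Ten made-up stripes, 100 bytes apart, the first one at offset 3.
    long[] stripes = new long[10];
    for (int i = 0; i < stripes.length; i++) {
      stripes[i] = 3 + 100L * i;
    }
    // A split starting at byte 850: nine stripe offsets (3..803) lie before it,
    // so the minKey lookup would read position 8.
    System.out.println(highestIndexRead(stripes, 850, 2000));   // prints 8
    // With only 8 key-index entries, position 8 is out of bounds.
    System.out.println(lookupsInBounds(stripes, 850, 2000, 8)); // prints false
  }
}
```

In this model the lookup only overruns when the key index is shorter than the stripe list. Whether that is what happened in the failing table cannot be confirmed without the original ORC files.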

If this is still an open issue, I would like to submit a patch for it.
Let me know how I can debug this further.

Thanks,
Aviral Agarwal

On Feb 15, 2018 23:10, "Eugene Koifman" <ek...@hortonworks.com> wrote:

> What version of Hive is this?
>
> Can you isolate this to a specific partition?
>
> The table/partition you are reading should have a directory called base_x/
> with several bucket_0000N files. (If you see more than one base_x, take the
> one with the highest x.)
>
> Each bucket_0000N should have a “hive.acid.key.index” property in the user
> metadata section of the ORC footer.
>
> Could you share the value of this property?
>
> You can use orcfiledump
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility)
> for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
>
> Thanks,
> Eugene

Re: ORC ACID table returning Array Index Out of Bounds

Posted by Eugene Koifman <ek...@hortonworks.com>.
What version of Hive is this?

Can you isolate this to a specific partition?

The table/partition you are reading should have a directory called base_x/ with several bucket_0000N files. (If you see more than one base_x, take the one with the highest x.)
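
As a small illustration of the "highest x" rule, the sketch below picks the base directory with the largest numeric suffix from a list of directory names. The class name BaseDirSketch, the method latestBase and the sample names are hypothetical; the listing itself would come from whatever file system tool you use, which is not shown here.

```java
// Hypothetical helper: picks the base_x directory name with the highest x.
class BaseDirSketch {

  /**
   * Returns the name with the largest numeric suffix after "base_",
   * or null if no name has that form.
   */
  static String latestBase(String[] dirNames) {
    String best = null;
    long bestId = -1;
    for (String name : dirNames) {
      if (!name.startsWith("base_")) {
        continue;  // skip delta directories and anything else
      }
      long id;
      try {
        id = Long.parseLong(name.substring("base_".length()));
      } catch (NumberFormatException e) {
        continue;  // suffix is not a plain number
      }
      if (id > bestId) {
        bestId = id;
        best = name;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up directory names for illustration only.
    String[] names = {"base_0000005", "delta_0000006_0000006", "base_0000022"};
    System.out.println(latestBase(names));  // prints base_0000022
  }
}
```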

Each bucket_0000N should have a “hive.acid.key.index” property in the user metadata section of the ORC footer.
Could you share the value of this property?

You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility) for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
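
Once the property value is in hand, one possible check is to count its entries and compare that with the number of stripes in the same file. The sketch below does this under a loudly stated assumption: that the value is a list of entries each terminated by ';', with three comma-separated numbers per entry and one entry per stripe. That layout is an assumption and should be verified against OrcRecordUpdater.parseKeyIndex in your Hive version. The class name KeyIndexSketch, its methods and the sample value are hypothetical.

```java
// Hypothetical parser for a hive.acid.key.index value.
// ASSUMED format: "n,n,n;n,n,n;" with one entry per stripe. Verify before relying on it.
class KeyIndexSketch {

  /** Parses the value into rows of three longs; an empty value gives zero rows. */
  static long[][] parse(String value) {
    if (value.isEmpty()) {
      return new long[0][];
    }
    String[] entries = value.split(";");  // trailing empty piece is dropped
    long[][] rows = new long[entries.length][];
    for (int i = 0; i < entries.length; i++) {
      String[] parts = entries[i].split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Unexpected entry: " + entries[i]);
      }
      rows[i] = new long[] {
          Long.parseLong(parts[0]),
          Long.parseLong(parts[1]),
          Long.parseLong(parts[2])
      };
    }
    return rows;
  }

  /** True if the index has at least one entry per stripe. */
  static boolean coversAllStripes(String value, int stripeCount) {
    return parse(value).length >= stripeCount;
  }

  public static void main(String[] args) {
    String sample = "5,0,99;5,0,199;";  // made-up value with two entries
    System.out.println(parse(sample).length);         // prints 2
    System.out.println(coversAllStripes(sample, 3));  // prints false
  }
}
```

A result of false for a real file would mean the index has fewer entries than stripes, which is the condition under which an array lookup by stripe position can overrun.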

Thanks,
Eugene

