Posted to user@hive.apache.org by Juan Martin Pampliega <jp...@gmail.com> on 2013/11/27 14:02:27 UTC

Error inserting data to ORC table

Hi,

I am using Hive 0.12 with Hadoop 2.2 and trying to insert data into a new
ORC table with an INSERT ... SELECT statement from a text-file-based table,
and I am running into the following error (I have trimmed some of the data
shown in the error):

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row
{"id":"1932685422","ad_id":"7325801318", .... , "account_id":"6875965212"}
        at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
        at
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
        ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 26
        at
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
        at
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
        at
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:797)
        at
org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:744)
...

The error is produced when the ad_id column in the destination table has
the type BIGINT. When I change the column type to STRING, the insert works
fine.
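
In case it helps, the statements are roughly of this form (the table names
and all columns other than ad_id are placeholders; the real tables have
many more columns):

-- Placeholder schemas, trimmed down for illustration.
CREATE TABLE ads_text (id STRING, ad_id BIGINT, account_id STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

CREATE TABLE ads_orc (id STRING, ad_id BIGINT, account_id STRING)
  STORED AS ORC;

-- Fails with the ArrayIndexOutOfBoundsException above while ad_id is BIGINT;
-- the same insert works if ads_orc is created with ad_id STRING.
INSERT OVERWRITE TABLE ads_orc
SELECT id, ad_id, account_id FROM ads_text;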

From what I can see, that value is not nearly big enough to cause any
overflow issues in a BIGINT.

Is this a known bug or do I have to do anything in particular for this to
work?

Thanks,
Juan.

Re: Error inserting data to ORC table

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
Hi Juan

I was able to reproduce this issue with a different dataset. I have posted a patch for this bug at https://issues.apache.org/jira/browse/HIVE-5991. Can you apply the patch and see if it resolves the issue?

Thanks
Prasanth Jayachandran


Re: Error inserting data to ORC table

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
Hi Juan

This seems like a bug in version 2 of RLE (Run Length Encoding), which was introduced in Hive 0.12. The new RLE version can be disabled by setting hive.exec.orc.write.format="0.11", which falls back to the old RLE format.
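
For example, something along these lines (reusing placeholder table and column names) should let the insert go through with the old writer format for the rest of the session:

-- Force the ORC writer to the 0.11 file format for this session.
SET hive.exec.orc.write.format=0.11;

-- Integer columns are now written with the original RLE instead of
-- RunLengthIntegerWriterV2, so the insert should no longer fail.
INSERT OVERWRITE TABLE ads_orc
SELECT id, ad_id, account_id FROM ads_text;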

The reason changing the column to a string type works is that string columns use adaptive dictionary encoding, whereas integer columns use run length encoding.
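
Another stopgap, if changing the writer format is not an option, is to write the column as a string and cast it back when reading, since that takes the dictionary-encoding path. A sketch with the same placeholder names:

-- Placeholder table with ad_id stored as STRING so the ORC writer takes the
-- dictionary-encoding path instead of RunLengthIntegerWriterV2.
CREATE TABLE ads_orc_str (id STRING, ad_id STRING, account_id STRING)
  STORED AS ORC;

INSERT OVERWRITE TABLE ads_orc_str
SELECT id, CAST(ad_id AS STRING), account_id FROM ads_text;

-- Cast back to BIGINT when querying.
SELECT CAST(ad_id AS BIGINT) AS ad_id FROM ads_orc_str LIMIT 10;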

Can you file a bug for this with steps to reproduce the issue? Also, what dataset are you using? Would it be possible to attach the segment of the dataset that causes the failure to the bug report?

Thanks
Prasanth Jayachandran
