Posted to user@hive.apache.org by Robin Verlangen <ro...@us2.nl> on 2015/06/13 14:42:43 UTC

Hive double issues while moving around RC files between clusters

Hi there,

I was copying RC files from a CDH Hadoop 2.0 cluster to a new HDP
Hadoop 2.6 cluster.

After creating a new table stored as RCFile, with LOCATION pointing at the
right directory, I can query all columns except the ones that are doubles.
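
For context, the new table definition looks roughly like this (table name,
columns, and path are hypothetical):

    -- Sketch only: table name, columns, and LOCATION are placeholders.
    CREATE EXTERNAL TABLE metrics (
      id    BIGINT,
      name  STRING,
      score DOUBLE
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    STORED AS RCFILE
    LOCATION '/data/metrics';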

I tried querying with Hive (via Tez and MR), Beeline, and Presto. None of
these work.
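
For example (using the hypothetical names from the sketch above):

    -- Non-double columns read fine:
    SELECT id, name FROM metrics LIMIT 5;
    -- Selecting the double column is what triggers the error below:
    SELECT score FROM metrics LIMIT 5;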

The error from Hive is:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 20221
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
    at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:111)
    at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
    at org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:140)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:183)

The error from Presto is less verbose, but also offers a lead:

Query 20150613_114049_00297_468ni failed: Double should be 8 bytes

Both point at something in the deserialization of the doubles.

As for the table settings, both SerDes are
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe', which should be
correct. The Hive versions on the old (0.8) and new (0.14) clusters differ
quite a bit, but the RC files themselves are intact (checksums match); only
the doubles are "stuck".

I tried
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-DecimalTypeIncompatibilitiesbetweenHive0.12.0and0.13.0
but that doesn't seem to help either.
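
Concretely, what I tried based on that page was re-declaring the affected
columns on the new table (hypothetical names), which changed nothing:

    -- Re-declare the double column so the metastore schema is explicit.
    ALTER TABLE metrics CHANGE COLUMN score score DOUBLE;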

Any idea on how I can resolve this?

Thanks in advance!

Best regards,

Robin Verlangen
Chief Data Architect

W http://www.robinverlangen.nl
E robin@us2.nl


Re: Hive double issues while moving around RC files between clusters

Posted by Robin Verlangen <ro...@us2.nl>.
One thing I found in the change logs is
https://issues.apache.org/jira/browse/HIVE-7041, which sounds like it might
be related. I don't use any byte datatypes in the structure, though, so it
is hard to verify whether that is the cause.
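
If it really is a serialization incompatibility, the only fallback I can
think of is re-exporting the data as text on the old cluster and rewriting
it on the new one, roughly like this (hypothetical names again):

    -- On the old CDH cluster, where the RC files still read correctly:
    CREATE TABLE metrics_text (id BIGINT, name STRING, score DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;
    INSERT OVERWRITE TABLE metrics_text SELECT * FROM metrics;
    -- Then distcp the text files over and rewrite them as RCFile
    -- with the new Hive version.

But I'd prefer to keep the existing files if possible.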

Best regards,

Robin Verlangen
Chief Data Architect

W http://www.robinverlangen.nl
E robin@us2.nl

