Posted to user@hive.apache.org by Jose Rozanec <jo...@mercadolibre.com> on 2016/04/01 21:09:21 UTC

Hive parquet on EMR

Hello,

We have a Hive (v1.0.0) cluster on EMR, with data stored in Parquet files.
When we query the data, Hive fails to return results and throws a
NullPointerException (NPE). We think the error may be related to Hive
deserialization, since we can query the same data without problems using
other technologies (e.g., Presto).

The stack trace of the exception is available here:
<http://pastebin.com/6SfNCrzj>. We checked the Hive Parquet support page
<https://cwiki.apache.org/confluence/display/Hive/Parquet> and the release notes
<https://github.com/apache/hive/blob/master/RELEASE_NOTES.txt>, and it seems
that map and struct types are supported in this version (1.0.0). Our Parquet
schema involves the following types: string, boolean, map, struct.

Has anyone had a similar experience? Is there a workaround for this?

Thank you in advance,

Re: Hive parquet on EMR

Posted by Jose Rozanec <jo...@mercadolibre.com>.
Hello Nicholas,

Thank you for the quick response. All our fields are defined in lowercase
and queried as such; at most, words are separated by underscores. We tried,
for example:
- querying all fields of the table (*), which results in the error
- querying a single field of type string, which still fails with the same
error.

If this is a Hive Parquet support issue, could it be solved by implementing a
custom SerDe that references a newer version of Parquet? Has anyone
worked in this direction, or could anyone offer advice on this?

Thanks,



2016-04-01 16:34 GMT-03:00 Nicholas Hakobian <
nicholas.hakobian@rallyhealth.com>:

> Make sure the column names in the struct exactly match the case used in the
> table create statement. We decided to make everything lowercase, but
> occasionally someone forgets and makes one of the characters uppercase, and
> Hive fails.
>
> There was a fix for this in Hive, but it only fixed querying with mixed
> case in top-level column names, not in columns nested in structs.
>
> Hope this helps,
>
> Nick
>
> Nicholas Szandor Hakobian
> Data Scientist
> Rally Health
> nicholas.hakobian@rallyhealth.com

Re: Hive parquet on EMR

Posted by Nicholas Hakobian <ni...@rallyhealth.com>.
Make sure the column names in the struct exactly match the case used in the
table create statement. We decided to make everything lowercase, but
occasionally someone forgets and makes one of the characters uppercase, and
Hive fails.

There was a fix for this in Hive, but it only fixed querying with mixed
case in top-level column names, not in columns nested in structs.
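
As a rough illustration of the check described above, here is a minimal
Python sketch that walks a nested schema and reports every field name that
is not entirely lowercase. The schema representation (dicts for structs, a
("map", key_type, value_type) tuple for maps, plain strings for primitives)
and the function name find_mixed_case_fields are assumptions made up for
this example; they are not part of Hive or Parquet, and the schema would
have to be extracted from the Parquet files by some other means.

```python
def find_mixed_case_fields(schema, path=""):
    """Yield the dotted path of every field whose name is not all lowercase.

    Hypothetical schema encoding (an assumption for this sketch):
      - struct:    dict mapping field name -> field type
      - map:       tuple ("map", key_type, value_type)
      - primitive: a string such as "string" or "boolean"
    """
    if isinstance(schema, dict):
        for name, field_type in schema.items():
            full_path = f"{path}.{name}" if path else name
            if name != name.lower():
                yield full_path
            # Recurse so that fields nested inside structs are checked too.
            yield from find_mixed_case_fields(field_type, full_path)
    elif isinstance(schema, tuple) and len(schema) == 3 and schema[0] == "map":
        # Only the value type of a map can contain further named fields here.
        yield from find_mixed_case_fields(schema[2], f"{path}.value")


# Example schema using the types mentioned in this thread.
schema = {
    "user_id": "string",
    "active": "boolean",
    "profile": {
        "firstName": "string",
        "address": {"zip_code": "string", "City": "string"},
    },
    "tags": ("map", "string", {"Label": "string"}),
}

print(list(find_mixed_case_fields(schema)))
# ['profile.firstName', 'profile.address.City', 'tags.value.Label']
```

An empty result would suggest that field-name case is not the cause, which
points the investigation elsewhere.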

Hope this helps,

Nick

Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com
