You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Tor Ivry <to...@gmail.com> on 2014/08/17 14:07:11 UTC

Hive queries returning all NULL values.

Hi



I have a hive (0.11) table with the following create syntax:



CREATE EXTERNAL TABLE events(

…

)

PARTITIONED BY(dt string)

  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'

  STORED AS

    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"

    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"

    LOCATION '/data-events/success’;



Query runs fine.


I add hdfs partitions (containing snappy.parquet files).



When I run

hive

> select count(*) from events where dt=“20140815”

I get the correct result



*Problem:*

When I run

hive

> select * from events where dt=“20140815” limit 1;

I get

OK

NULL NULL NULL NULL NULL NULL NULL 20140815



*The same query in Impala returns the correct values.*



Any idea what could be the issue?



Thanks

Tor

Re: Hive queries returning all NULL values.

Posted by Tor Ivry <to...@gmail.com>.

Raymond - you were the closest.
Parquet field names contained '::' ex. bag1::user_name

Hope it will help anyone in the future

Thanks for all your help

Tor



On Sun, Aug 17, 2014 at 7:50 PM, Raymond Lau <ra...@gmail.com>
wrote:

> Do your field names in your parquet files contain upper case letters by
> any chance ex. userName?  Hive will not read the data of external tables if
> they are not completely lower case field names, it doesn't convert them
> properly in the case of external tables.
> On Aug 17, 2014 8:00 AM, "hadoop hive" <ha...@gmail.com> wrote:
>
>> Take a small set of data like 2-5 line and insert it...
>>
>> After that you can try insert first 10 column and then next 10 till you
>> fund your problematic column
>> On Aug 17, 2014 8:37 PM, "Tor Ivry" <to...@gmail.com> wrote:
>>
>>> Is there any way to debug this?
>>>
>>> We are talking about many fields here.
>>> How can I see which field has the mismatch?
>>>
>>>
>>>
>>> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive <ha...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> You check the data type you have provided while creating external
>>>> table, it should match with data in files.
>>>>
>>>> Thanks
>>>> Vikas Srivastava
>>>> On Aug 17, 2014 7:07 PM, "Tor Ivry" <to...@gmail.com> wrote:
>>>>
>>>>>  Hi
>>>>>
>>>>>
>>>>>
>>>>> I have a hive (0.11) table with the following create syntax:
>>>>>
>>>>>
>>>>>
>>>>> CREATE EXTERNAL TABLE events(
>>>>>
>>>>> …
>>>>>
>>>>> )
>>>>>
>>>>> PARTITIONED BY(dt string)
>>>>>
>>>>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>>>>
>>>>>   STORED AS
>>>>>
>>>>>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>>>>
>>>>>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>>>>
>>>>>     LOCATION '/data-events/success’;
>>>>>
>>>>>
>>>>>
>>>>> Query runs fine.
>>>>>
>>>>>
>>>>> I add hdfs partitions (containing snappy.parquet files).
>>>>>
>>>>>
>>>>>
>>>>> When I run
>>>>>
>>>>> hive
>>>>>
>>>>> > select count(*) from events where dt=“20140815”
>>>>>
>>>>> I get the correct result
>>>>>
>>>>>
>>>>>
>>>>> *Problem:*
>>>>>
>>>>> When I run
>>>>>
>>>>> hive
>>>>>
>>>>> > select * from events where dt=“20140815” limit 1;
>>>>>
>>>>> I get
>>>>>
>>>>> OK
>>>>>
>>>>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>>>>
>>>>>
>>>>>
>>>>> *The same query in Impala returns the correct values.*
>>>>>
>>>>>
>>>>>
>>>>> Any idea what could be the issue?
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Tor
>>>>>
>>>>
>>>

Re: Hive queries returning all NULL values.

Posted by Raymond Lau <ra...@gmail.com>.

Do your field names in your parquet files contain upper case letters by any
chance ex. userName?  Hive will not read the data of external tables if
they are not completely lower case field names, it doesn't convert them
properly in the case of external tables.
On Aug 17, 2014 8:00 AM, "hadoop hive" <ha...@gmail.com> wrote:

> Take a small set of data like 2-5 line and insert it...
>
> After that you can try insert first 10 column and then next 10 till you
> fund your problematic column
> On Aug 17, 2014 8:37 PM, "Tor Ivry" <to...@gmail.com> wrote:
>
>> Is there any way to debug this?
>>
>> We are talking about many fields here.
>> How can I see which field has the mismatch?
>>
>>
>>
>> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive <ha...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> You check the data type you have provided while creating external table,
>>> it should match with data in files.
>>>
>>> Thanks
>>> Vikas Srivastava
>>> On Aug 17, 2014 7:07 PM, "Tor Ivry" <to...@gmail.com> wrote:
>>>
>>>>  Hi
>>>>
>>>>
>>>>
>>>> I have a hive (0.11) table with the following create syntax:
>>>>
>>>>
>>>>
>>>> CREATE EXTERNAL TABLE events(
>>>>
>>>> …
>>>>
>>>> )
>>>>
>>>> PARTITIONED BY(dt string)
>>>>
>>>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>>>
>>>>   STORED AS
>>>>
>>>>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>>>
>>>>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>>>
>>>>     LOCATION '/data-events/success’;
>>>>
>>>>
>>>>
>>>> Query runs fine.
>>>>
>>>>
>>>> I add hdfs partitions (containing snappy.parquet files).
>>>>
>>>>
>>>>
>>>> When I run
>>>>
>>>> hive
>>>>
>>>> > select count(*) from events where dt=“20140815”
>>>>
>>>> I get the correct result
>>>>
>>>>
>>>>
>>>> *Problem:*
>>>>
>>>> When I run
>>>>
>>>> hive
>>>>
>>>> > select * from events where dt=“20140815” limit 1;
>>>>
>>>> I get
>>>>
>>>> OK
>>>>
>>>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>>>
>>>>
>>>>
>>>> *The same query in Impala returns the correct values.*
>>>>
>>>>
>>>>
>>>> Any idea what could be the issue?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Tor
>>>>
>>>
>>

Re: Hive queries returning all NULL values.

Posted by hadoop hive <ha...@gmail.com>.

Take a small set of data like 2-5 line and insert it...

After that you can try insert first 10 column and then next 10 till you
fund your problematic column
On Aug 17, 2014 8:37 PM, "Tor Ivry" <to...@gmail.com> wrote:

> Is there any way to debug this?
>
> We are talking about many fields here.
> How can I see which field has the mismatch?
>
>
>
> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive <ha...@gmail.com> wrote:
>
>> Hi,
>>
>> You check the data type you have provided while creating external table,
>> it should match with data in files.
>>
>> Thanks
>> Vikas Srivastava
>> On Aug 17, 2014 7:07 PM, "Tor Ivry" <to...@gmail.com> wrote:
>>
>>>  Hi
>>>
>>>
>>>
>>> I have a hive (0.11) table with the following create syntax:
>>>
>>>
>>>
>>> CREATE EXTERNAL TABLE events(
>>>
>>> …
>>>
>>> )
>>>
>>> PARTITIONED BY(dt string)
>>>
>>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>>
>>>   STORED AS
>>>
>>>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>>
>>>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>>
>>>     LOCATION '/data-events/success’;
>>>
>>>
>>>
>>> Query runs fine.
>>>
>>>
>>> I add hdfs partitions (containing snappy.parquet files).
>>>
>>>
>>>
>>> When I run
>>>
>>> hive
>>>
>>> > select count(*) from events where dt=“20140815”
>>>
>>> I get the correct result
>>>
>>>
>>>
>>> *Problem:*
>>>
>>> When I run
>>>
>>> hive
>>>
>>> > select * from events where dt=“20140815” limit 1;
>>>
>>> I get
>>>
>>> OK
>>>
>>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>>
>>>
>>>
>>> *The same query in Impala returns the correct values.*
>>>
>>>
>>>
>>> Any idea what could be the issue?
>>>
>>>
>>>
>>> Thanks
>>>
>>> Tor
>>>
>>
>

Re: Hive queries returning all NULL values.

Posted by Tor Ivry <to...@gmail.com>.

Is there any way to debug this?

We are talking about many fields here.
How can I see which field has the mismatch?



On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive <ha...@gmail.com> wrote:

> Hi,
>
> You check the data type you have provided while creating external table,
> it should match with data in files.
>
> Thanks
> Vikas Srivastava
> On Aug 17, 2014 7:07 PM, "Tor Ivry" <to...@gmail.com> wrote:
>
>>  Hi
>>
>>
>>
>> I have a hive (0.11) table with the following create syntax:
>>
>>
>>
>> CREATE EXTERNAL TABLE events(
>>
>> …
>>
>> )
>>
>> PARTITIONED BY(dt string)
>>
>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>
>>   STORED AS
>>
>>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>
>>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>
>>     LOCATION '/data-events/success’;
>>
>>
>>
>> Query runs fine.
>>
>>
>> I add hdfs partitions (containing snappy.parquet files).
>>
>>
>>
>> When I run
>>
>> hive
>>
>> > select count(*) from events where dt=“20140815”
>>
>> I get the correct result
>>
>>
>>
>> *Problem:*
>>
>> When I run
>>
>> hive
>>
>> > select * from events where dt=“20140815” limit 1;
>>
>> I get
>>
>> OK
>>
>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>
>>
>>
>> *The same query in Impala returns the correct values.*
>>
>>
>>
>> Any idea what could be the issue?
>>
>>
>>
>> Thanks
>>
>> Tor
>>
>

Re: Hive queries returning all NULL values.

Posted by hadoop hive <ha...@gmail.com>.

Hi,

You check the data type you have provided while creating external table, it
should match with data in files.

Thanks
Vikas Srivastava
On Aug 17, 2014 7:07 PM, "Tor Ivry" <to...@gmail.com> wrote:

> Hi
>
>
>
> I have a hive (0.11) table with the following create syntax:
>
>
>
> CREATE EXTERNAL TABLE events(
>
> …
>
> )
>
> PARTITIONED BY(dt string)
>
>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>
>   STORED AS
>
>     INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>
>     OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>
>     LOCATION '/data-events/success’;
>
>
>
> Query runs fine.
>
>
> I add hdfs partitions (containing snappy.parquet files).
>
>
>
> When I run
>
> hive
>
> > select count(*) from events where dt=“20140815”
>
> I get the correct result
>
>
>
> *Problem:*
>
> When I run
>
> hive
>
> > select * from events where dt=“20140815” limit 1;
>
> I get
>
> OK
>
> NULL NULL NULL NULL NULL NULL NULL 20140815
>
>
>
> *The same query in Impala returns the correct values.*
>
>
>
> Any idea what could be the issue?
>
>
>
> Thanks
>
> Tor
>