Posted to user@hive.apache.org by Pradeep Kamath <pr...@yahoo-inc.com> on 2010/08/09 18:42:26 UTC

How are nulls represented in data?

Hi,
   What value does Hive expect in the data for a column to be treated as null? I tried several variations on a text-based table but couldn't work out the correct representation. For a string column I tried the empty string, the string NULL, and the string null, and in all three cases the "is null" operator returned false.

A couple of related questions:
 - Does the representation of null depend on the type of the column, i.e. is it different for string vs. non-string columns?
 - Is the representation of null different for different storage formats (text vs. RCFile vs. SequenceFile)? I am particularly interested in text and RCFile.

Thanks in advance,

Pradeep

Re: How are nulls represented in data?

Posted by yongqiang he <he...@gmail.com>.
Yes. In LazySimpleSerDe/SequenceFile/TextFile, "\N" is used as NULL.
(It is a table property: serialization.null.format)

In ColumnarSerDe/RCFile, no NULL marker is stored (zero bytes; the
column's byte length is zero).
But RCFile/ColumnarSerDe also uses this property when serializing, to
determine whether a column is null. (This is unavoidable because the
client can only pass a string to the serde and let the serde serialize
it, so some special character sequence is needed to represent NULL.)
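
To make the text-file case concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustrative model, not Hive's LazySimpleSerDe code: the function name parse_text_row and the tab delimiter are assumptions made up for this example, and only the "\N" marker and the serialization.null.format property come from this thread.

```python
# Illustrative model only; this is NOT Hive's actual serde implementation.
# Per the thread, the default null marker (serialization.null.format) is \N.
NULL_FORMAT = r"\N"


def parse_text_row(line, delimiter="\t", null_format=NULL_FORMAT):
    """Split one text row into fields, mapping the null marker to None.

    `delimiter` is a hypothetical choice for this sketch.
    """
    return [None if field == null_format else field
            for field in line.split(delimiter)]


# The three values tried in the original question stay ordinary strings;
# only the configured marker is read back as null.
row = parse_text_row("\t".join(["", "NULL", "null", r"\N"]))
print(row)
```

Under this model, the empty string, NULL and null are all plain string values, which matches the "is null" results reported in the question. Passing a different null_format (for example the empty string) changes which field value is treated as null.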

On Mon, Aug 9, 2010 at 11:46 AM, Ning Zhang <nz...@facebook.com> wrote:
> How it is serialized/deserialized is determined by specific serde. NULL is
> serialized as \N by LazySimpleSerDe (default serde for text). RCFile
> (ColumnarSerDe) uses the same default parameters as LazySimpleSerDe.
> Unless I missed something, NULL serialization/deserialization is type
> independent (at least in LazySimpleSerDe).

Re: How are nulls represented in data?

Posted by Ning Zhang <nz...@facebook.com>.
How NULL is serialized/deserialized is determined by the specific serde. NULL is serialized as \N by LazySimpleSerDe (the default serde for text). RCFile (ColumnarSerDe) uses the same default parameters as LazySimpleSerDe.

Unless I missed something, NULL serialization/deserialization is type-independent (at least in LazySimpleSerDe).
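
One way to picture "type-independent" is that the null-marker check happens before any per-type conversion. The Python sketch below is a hypothetical model of that idea under this assumption; read_field and its convert parameter are invented for illustration and are not part of Hive.

```python
# Hypothetical model of a type-independent null check; not Hive code.
NULL_FORMAT = r"\N"  # default null marker mentioned in this thread


def read_field(raw, convert, null_format=NULL_FORMAT):
    """Return None for the null marker, otherwise the converted value.

    The marker comparison is identical for every column type and runs
    before the type-specific `convert` function is applied.
    """
    if raw == null_format:
        return None
    return convert(raw)


print(read_field(r"\N", str))  # string column holding the marker
print(read_field(r"\N", int))  # int column holding the marker
print(read_field("42", int))   # ordinary int value
```

In this sketch the same marker yields a null for both the string and the int column, and only non-marker values reach the type conversion.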
