Posted to user@pig.apache.org by Arun Chandy Thomas <ar...@apple.com> on 2011/05/26 00:22:39 UTC

Null values while loading

Hi,

I am trying to use Pig to aggregate data from an application's log lines.

Most of the data in the input file have the following format:
	A	B	C	D	E	F

I am aggregating the data as follows:

A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
D = group A by (A, B, C, D, E, F);
E = FOREACH D GENERATE FLATTEN(group) as (A, B, C, D, E, F), COUNT(A) as hit;
STORE E INTO '$in_dir._1' using PigStorage('\t');

In some cases the input lines contain only: A	B	C	D  (the E and F columns are missing).
Would the Pig script ignore such lines?

Thanks & Regards,
Arun

Re: Null values while loading

Posted by Sven Krasser <kr...@gmail.com>.
Are the tabs for these columns still there? In that case, there should
be an empty string in there. Something like this should work then:

Y = foreach X generate
    (A == '' ? null : A),
    (B == '' ? null : B),
...

Otherwise, you could load the full line using TextLoader and then use
STRSPLIT on it to extract your columns. That allows you to check if E
and F are present.

Best,
-Sven

On Wed, May 25, 2011 at 3:43 PM, Arun Chandy Thomas
<ar...@apple.com> wrote:
> Thanks for the quick reply, but my question is a little different.
> I am sorry if i am not clear in my initial post.
>
> I want the Pig script to consider E and F as null if the values are not present in the input line.
>
> So basically all the lines should be loaded while firing :
>>> A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);
>
> irrespective of whether any of the fields are null or not.
>
> How can we achieve this?
>
> Thanks & Regards,
> Arun
> On May 25, 2011, at 3:35 PM, Alan Gates wrote:
>
>> No, but you can make it by adding:
>>
>> B = filter A by E is not null;
>>
>> Alan.



-- 
http://sites.google.com/site/krasser/

RE: Null values while loading

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Pig 0.9 will load the missing fields as nulls. You can get the same behavior with Pig 0.8 if you provide type information in the schema of the load statement.

Olga
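
A minimal sketch of what a load statement with type information might look like. The chararray types are an assumption (the thread does not say what the columns hold), and whether short lines actually get padded with nulls should be checked against the Pig version in use.

```pig
-- Same load as in the original post, but with a declared type per field.
-- chararray is an assumed type; substitute int, long, etc. as appropriate.
A = LOAD '$in_dir' USING PigStorage('\t')
    AS (A:chararray, B:chararray, C:chararray,
        D:chararray, E:chararray, F:chararray);

-- Print the declared schema to confirm the types were picked up.
DESCRIBE A;
```

The rest of the script (group, count, store) would stay unchanged; only the schema in the load statement differs.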



Re: Null values while loading

Posted by Arun Chandy Thomas <ar...@apple.com>.
Thanks for the quick reply, but my question is a little different.
I am sorry if I was not clear in my initial post.

I want the Pig script to treat E and F as null if the values are not present in the input line.

So basically, all the lines should be loaded when running:
    A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);

irrespective of whether any of the fields are null.

How can we achieve this?

Thanks & Regards,
Arun


Re: Null values while loading

Posted by Alan Gates <ga...@yahoo-inc.com>.
No, but you can make it do so by adding:

B = filter A by E is not null;

Alan.
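
If the short lines should be kept for inspection rather than dropped, a SPLIT on the same null test is one option. This is only a sketch: the alias names complete and incomplete are hypothetical, and it assumes the missing E and F fields arrive as nulls, as discussed elsewhere in this thread.

```pig
-- Route each row by whether both trailing fields are present.
-- 'complete' and 'incomplete' are illustrative alias names.
SPLIT A INTO complete IF (E is not null AND F is not null),
             incomplete IF (E is null OR F is null);
```

The complete relation can then feed the group and count, while incomplete can be stored separately to see how many short lines the logs contain.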



Re: Null values while loading

Posted by Jonathan Coveney <jc...@gmail.com>.
I believe it should null them out.
