Posted to user@pig.apache.org by Alexander Schätzle <al...@yahoo.com> on 2010/06/07 09:27:08 UTC

Unable to store alias

Hi all,

My script looks like this:

A = LOAD 'left_rel.txt' AS (var1, var2);
B = LOAD 'right_rel.txt' AS (var1, var3);
C = JOIN A BY var1 LEFT OUTER, B BY var1;
D = FILTER C BY $2 is null;
DUMP D;

But when I dump D, I get the error "Unable to store alias D".
I suspect something is going wrong with the FILTER on null values ("is not null" doesn't work either).
What I want to do is filter for the tuples in A that do not find a join partner in B.
The input files are attached.
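To make the goal concrete, here is a minimal Python sketch (not Pig code) of the intended semantics. It assumes that a LEFT OUTER join pads each unmatched row of A with nulls for every field of B, so the joined tuple is laid out as (A::var1, A::var2, B::var1, B::var3). That layout is why `$2` is B's join key and is null exactly when a row of A found no partner. The helpers `left_outer_join` and `unmatched` and the tiny inputs are hypothetical, for illustration only.

```python
# Plain-Python model of the intended semantics (not Pig code):
# LEFT OUTER JOIN A and B on var1, then keep rows whose B-side fields are null.
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def left_outer_join(left: List[Row], right: List[Row], right_width: int = 2) -> List[Row]:
    """Join on field 0; pad left rows that have no partner with right_width nulls."""
    joined: List[Row] = []
    for l_row in left:
        # In this model a null key never matches anything.
        matches = [r for r in right if l_row[0] is not None and r[0] == l_row[0]]
        if matches:
            joined.extend(l_row + r for r in matches)
        else:
            joined.append(l_row + (None,) * right_width)
    return joined


def unmatched(joined: List[Row]) -> List[Row]:
    """Equivalent of FILTER C BY $2 is null: $2 is B::var1, null only for padded rows."""
    return [row for row in joined if row[2] is None]


A = [("a", "x"), ("c", "x")]  # illustrative rows with schema (var1, var2)
B = [("a", "5")]              # illustrative rows with schema (var1, var3)
C = left_outer_join(A, B)
print(C)             # [('a', 'x', 'a', '5'), ('c', 'x', None, None)]
print(unmatched(C))  # [('c', 'x', None, None)]
```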

Does anybody know what's going on and how to fix this?
By the way, I'm using Cloudera Distribution for Hadoop 3 Beta with Pig 0.5.0.

Thx in advance,
Alex


Re: Unable to store alias

Posted by Sandip Bhattacharya <sa...@foss-community.com>.
I am using Pig 0.7 with stock Apache Hadoop 0.20.2. It works in both local
and MapReduce mode.

$ pig -d WARN test.pig
...
(c,x,,)

$ cat left_rel.txt
a	x
a	y
b	x
b	y
c	x

$ cat right_rel.txt
a	5
a	10
b	5
b	10

$ cat test.pig
A = LOAD 'left_rel.txt' AS (var1, var2);
B = LOAD 'right_rel.txt' AS (var1, var3);
C = JOIN A BY var1 LEFT OUTER, B BY var1;
D = FILTER C BY $2 is null;
DUMP D;
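As a cross-check of the result above, the following Python sketch models the same pipeline on the two files (assumed to be tab-separated, which is what Pig's default loader, PigStorage, expects). It prints the surviving tuple in the style of the DUMP output, where null fields appear as empty positions. `parse` and `dump_format` are hypothetical helpers. This models the semantics only and does not run Pig.

```python
# Reproduce the expected "(c,x,,)" from the two input files in plain Python.
# Assumption: fields are tab-separated strings, as in the files shown above.
from collections import defaultdict

LEFT_REL = "a\tx\na\ty\nb\tx\nb\ty\nc\tx\n"
RIGHT_REL = "a\t5\na\t10\nb\t5\nb\t10\n"


def parse(text):
    """Split tab-separated lines into tuples of strings."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def dump_format(row):
    """Render a tuple like the DUMP output above: nulls become empty fields."""
    return "(" + ",".join("" if v is None else v for v in row) + ")"


A = parse(LEFT_REL)
B = parse(RIGHT_REL)

# Index B by var1 so each A row can look up its join partners.
b_by_key = defaultdict(list)
for row in B:
    b_by_key[row[0]].append(row)

# LEFT OUTER JOIN: unmatched A rows are padded with nulls for B::var1 and B::var3.
C = []
for a in A:
    partners = b_by_key.get(a[0], [])
    if partners:
        C.extend(a + b for b in partners)
    else:
        C.append(a + (None, None))

# FILTER C BY $2 is null: keep rows whose B::var1 ($2) is null.
D = [row for row in C if row[2] is None]

for row in D:
    print(dump_format(row))  # prints (c,x,,)
```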

- Sandip





-- 
http://www.pedalogue.com

Re: Unable to store alias

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I can reproduce this in 0.6, and it appears to have nothing to do with your
data or with the DUMP operator -- a simple "explain" on D causes the same
problem. Looks like there is something wrong with how the query plan gets
compiled:

Caused by: java.lang.NullPointerException
        at org.apache.pig.impl.plan.OperatorPlan.add(OperatorPlan.java:152)
        at
org.apache.pig.impl.logicalLayer.parser.QueryParser.generateStorePlan(QueryParser.java:128)
        at org.apache.pig.PigServer.store(PigServer.java:552)
        ... 7 more


I haven't tried it on 0.7.

-D



Re: Unable to store alias

Posted by Alexander Schätzle <al...@yahoo.com>.
I replaced the FILTER statement with a SPLIT:

SPLIT C into D if var3 is null, E if var3 is not null;

Now, this works!
So there is apparently a problem with null values in the FILTER statement.
Does anybody know what the problem is?

Cheers,
Alex
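The SPLIT above can be modelled in Python as routing each tuple of C to every output whose condition holds. Because `var3 is null` and `var3 is not null` are complementary, each tuple ends up in exactly one of D and E. In C, `var3` is unambiguous and resolves to B::var3, the field at position 3. The `split` helper is a hypothetical illustration of the semantics, not Pig's implementation.

```python
# Plain-Python model of: SPLIT C INTO D IF var3 is null, E IF var3 is not null;
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def split(rows: List[Row], conditions: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Route each row to every named output whose predicate is true."""
    outputs: Dict[str, List[Row]] = {name: [] for name in conditions}
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
    return outputs


VAR3 = 3  # position of B::var3 in C = (A::var1, A::var2, B::var1, B::var3)

C = [("a", "x", "a", "5"), ("c", "x", None, None)]  # illustrative joined rows
parts = split(C, {
    "D": lambda row: row[VAR3] is None,
    "E": lambda row: row[VAR3] is not None,
})
print(parts["D"])  # [('c', 'x', None, None)]
print(parts["E"])  # [('a', 'x', 'a', '5')]
```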





Re: Unable to store alias

Posted by Rekha Joshi <re...@yahoo-inc.com>.
Offhand, I think it's faulty DUMP behavior after the JOIN, combined with a datatype misinterpretation; using STORE instead might work. However, I would try adding a FOREACH/GENERATE statement after C and then filtering:

D = foreach C generate $0 as fvar1, $1 as fvar2, (chararray)$2 as fvar3;
E = filter D by fvar3 is null;
Dump E; -- verify the results for null
F = filter D by fvar3 is not null;
Dump F; -- verify the results for not null
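A rough Python model of this workaround: the FOREACH projects C to three named fields and casts `$2` to chararray (modelled here as `str`, with null kept as null, as Pig casts do), and the two filters then partition the result. Names such as `Projected`, `to_chararray`, and `project` are hypothetical. Presumably the intent is that the filter then tests a named, explicitly typed field instead of a positional one.

```python
# Plain-Python model of the FOREACH + cast + FILTER workaround (not Pig code).
from typing import List, NamedTuple, Optional, Tuple


class Projected(NamedTuple):
    fvar1: Optional[str]
    fvar2: Optional[str]
    fvar3: Optional[str]


def to_chararray(value: object) -> Optional[str]:
    """Cast to a string, but keep null as null."""
    return None if value is None else str(value)


def project(rows: List[Tuple[object, ...]]) -> List[Projected]:
    """FOREACH C GENERATE $0 AS fvar1, $1 AS fvar2, (chararray)$2 AS fvar3."""
    return [Projected(row[0], row[1], to_chararray(row[2])) for row in rows]


C = [("a", "x", "a", 5), ("c", "x", None, None)]  # illustrative joined rows
D = project(C)
E = [r for r in D if r.fvar3 is None]      # filter D by fvar3 is null
F = [r for r in D if r.fvar3 is not None]  # filter D by fvar3 is not null
print(E)  # [Projected(fvar1='c', fvar2='x', fvar3=None)]
print(F)  # [Projected(fvar1='a', fvar2='x', fvar3='a')]
```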

Cheers,
/R
