Posted to user@pig.apache.org by Daniel Eklund <do...@gmail.com> on 2011/05/20 21:22:19 UTC

can I not project into the group tuple from FILTER?

If I can access the implicit 'group' column from within FOREACH like this:

GROUPED = GROUP InputRelVar by (firstDim,secondDim);
B = FOREACH GROUPED GENERATE group.firstDim;

... then should I not be able to do something like this?

B1 = FILTER GROUPED by group.firstDim == 'something';

I get messages like this:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POIsNull.getNext(POIsNull.java:72)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)

Interestingly, I can filter on the 'group' alias as a whole, like this:
B2 = FILTER GROUPED by group is not null;


Any explanation of what I am doing incorrectly here?

thanks,
daniel

Re: can I not project into the group tuple from FILTER?

Posted by Daniel Eklund <do...@gmail.com>.
So when I do something like this:
---------
my_data = LOAD 'test.txt' using PigStorage(',')
      as (name:chararray, age:int, eye_color:chararray, height:int);

one = foreach my_data generate TOTUPLE(name,age) as groupz,
TOTUPLE(eye_color, height) as second;

two = filter one by groupz.age is null;
-- two = filter one by groupz.age > 33;  -- this also works
dump two;

---------------

then I CAN project into a tuple, so I would consider this a bug. Even if
'group' is arrived at in a different way than 'groupz' (i.e. via the GROUP
operator rather than explicit tuple creation), the FILTER operator should
treat the two the same. I will file a JIRA ticket for this.
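
Until that is resolved, one possible workaround is sketched below. It is a
sketch only and has not been verified against the Pig version that produced
the stack trace. It relies on the fact that projecting into 'group' from
FOREACH already works (as OUT in the other script in this thread shows), so
the keys are first pulled up to top-level fields and the FILTER is applied to
those. The aliases 'flat' and 'kept' are hypothetical names chosen for
illustration.

```pig
my_data = LOAD 'test.txt' USING PigStorage(',')
      AS (name:chararray, age:int, eye_color:chararray, height:int);

by_age_and_color = GROUP my_data BY (age, eye_color);

-- Pull the group keys up to top-level fields. FOREACH projection into
-- 'group' is the path that already works.
-- 'flat' and 'kept' are hypothetical alias names.
flat = FOREACH by_age_and_color
       GENERATE group.age AS age, group.eye_color AS eye_color, my_data;

-- Filter on the plain field instead of on group.age.
kept = FILTER flat BY age IS NOT NULL;
DUMP kept;
```

Since the condition only looks at a grouping key, filtering my_data on age
before the GROUP statement should select the same groups and is another
option to try.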




Re: can I not project into the group tuple from FILTER?

Posted by Daniel Eklund <do...@gmail.com>.
Here is a more basic script that reproduces what I am talking about. You
will see that dumping OUT works fine, but dumping OUT2 gives me:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)

-----------

my_data = LOAD 'test.txt' using PigStorage(',')
      as (name:chararray, age:int, eye_color:chararray, height:int);

by_age_and_color = GROUP my_data BY (age, eye_color);
-- dump by_age_and_color;
OUT = FOREACH by_age_and_color generate group.age;
dump OUT;

OUT2 = FILTER by_age_and_color by group.age is not null;
dump OUT2;
-----------

I get a similar problem even if I do something like:

OUT2 = FILTER by_age_and_color by group.age > 9;
dump OUT2;

---------  sample test.txt ---------
ravi,33,blue,43
brendan,33,green,53
ravichandra,15,blue,43
leonor,15,brown,46
caeser,18,blue,23
JCVD,,blue,23
anthony,33,blue,46
xavier,23,blue,13
patrick,18,blue,33
sang,33,brown,44




On Fri, May 20, 2011 at 3:28 PM, Daniel Dai <ji...@yahoo-inc.com> wrote:

> It seems the stack trace does not match your statement. Do you have another
> filter that uses "not" and "is null" in your script?
>
> Daniel

Re: can I not project into the group tuple from FILTER?

Posted by Daniel Dai <ji...@yahoo-inc.com>.
It seems the stack trace does not match your statement. Do you have another
filter that uses "not" and "is null" in your script?

Daniel
