Posted to user@pig.apache.org by Daniel Eklund <do...@gmail.com> on 2011/05/20 21:22:19 UTC
can I not project into the group tuple from FILTER?
If I can access the implicit 'group' column from within FOREACH like this:
GROUPED = GROUP InputRelVar by (firstDim,secondDim);
B = FOREACH GROUPED GENERATE group.firstDim;
... then should I not be able to do something like this?
B1 = FILTER GROUPED by group.firstDim == 'something';
I get messages like this:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POIsNull.getNext(POIsNull.java:72)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
Interestingly, I can use the 'group' alias as a whole, like this:
B2 = FILTER GROUPED by group is not null;
Can anyone explain what I am doing wrong here?
thanks,
daniel
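Because the condition only involves grouping keys, one possible workaround is to filter the input before grouping: a record passes the pre-filter exactly when the group it would land in passes the post-filter, so the resulting groups should be the same. A minimal sketch, reusing the hypothetical InputRelVar/firstDim/secondDim names from the question (the alias B1_input is introduced here for illustration):

```pig
-- Hypothetical names taken from the question: InputRelVar, firstDim, secondDim.
-- The condition depends only on a grouping key, so applying it to the raw
-- records before the GROUP keeps the same groups (and the same bag contents)
-- as filtering on group.firstDim afterwards.
B1_input = FILTER InputRelVar BY firstDim == 'something';
B1 = GROUP B1_input BY (firstDim, secondDim);
```

One side effect of this design: the bag inside each group of B1 is named B1_input rather than InputRelVar, so any later FOREACH has to refer to it by that name.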
Re: can I not project into the group tuple from FILTER?
Posted by Daniel Eklund <do...@gmail.com>.
So when I do something like this:
---------
my_data = LOAD 'test.txt' using PigStorage(',')
as (name:chararray, age:int, eye_color:chararray, height:int);
one = foreach my_data generate TOTUPLE(name,age) as groupz,
TOTUPLE(eye_color, height) as second;
two = filter one by groupz.age is null;
--- two = filter one by groupz.age > 33; -- this works also
dump two;
---------------
then I CAN project into a tuple, so I would consider this a bug. Even if
'group' is arrived at in a different way than 'groupz' (i.e. via the GROUP
operator rather than explicit tuple creation), the FILTER operator should
treat the two the same. I will file a JIRA ticket for this.
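To compare the two cases when writing up the ticket, one option is to ask Pig for the schemas and the execution plan. This sketch assumes the relations one, by_age_and_color and OUT2 from the scripts in this thread are all defined in the same Grunt session; EXPLAIN prints the logical, physical and MapReduce plans, which should show where the FILTER ends up relative to the GROUP.

```pig
-- Schema of the explicit-tuple relation, where projecting groupz.age works.
DESCRIBE one;
-- Schema of the GROUP output, where projecting group.age fails in FILTER.
DESCRIBE by_age_and_color;
-- Plans for the failing FILTER, to see how the projection is compiled.
EXPLAIN OUT2;
```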
Re: can I not project into the group tuple from FILTER?
Posted by Daniel Eklund <do...@gmail.com>.
Here is a more basic script that reproduces what I am talking about. You
will see that dumping OUT works fine, but dumping OUT2 gives me:
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
-----------
my_data = LOAD 'test.txt' using PigStorage(',')
as (name:chararray, age:int, eye_color:chararray, height:int);
by_age_and_color = GROUP my_data BY (age, eye_color);
-- dump by_age_and_color;
OUT = FOREACH by_age_and_color generate group.age;
dump OUT;
OUT2 = FILTER by_age_and_color by group.age is not null;
dump OUT2;
-----------
I get a similar problem even if I do something like:
OUT2 = FILTER by_age_and_color by group.age > 9;
dump OUT2;
--------- sample test.txt ---------
ravi,33,blue,43
brendan,33,green,53
ravichandra,15,blue,43
leonor,15,brown,46
caeser,18,blue,23
JCVD,,blue,23
anthony,33,blue,46
xavier,23,blue,13
patrick,18,blue,33
sang,33,brown,44
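Until the bug is fixed, a possible workaround is to flatten the group key into ordinary top-level fields in a FOREACH and filter on those, since the thread shows FILTER working on a top-level field such as group itself. A sketch applied to the reproduction script, not verified against the Pig version used in this thread; the alias flat and the field names in the AS clause are introduced here for illustration:

```pig
my_data = LOAD 'test.txt' USING PigStorage(',')
    AS (name:chararray, age:int, eye_color:chararray, height:int);
by_age_and_color = GROUP my_data BY (age, eye_color);

-- Hypothetical workaround: turn the (age, eye_color) group tuple into two
-- top-level fields, keeping the grouped bag alongside them.
flat = FOREACH by_age_and_color GENERATE FLATTEN(group) AS (age, eye_color), my_data;

-- The FILTER now references a plain column instead of projecting into 'group'.
OUT2 = FILTER flat BY age IS NOT NULL;
DUMP OUT2;
```

With the sample test.txt, this FILTER should drop only the group whose age is null (from the JCVD row, whose age field is empty).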
(In reply to Daniel Dai, Fri, May 20, 2011 at 3:28 PM.)
Re: can I not project into the group tuple from FILTER?
Posted by Daniel Dai <ji...@yahoo-inc.com>.
It seems the stack trace does not match your statement. Do you have another
filter that uses "not" and "is null" in your script?
Daniel
(In reply to Daniel Eklund, 05/20/2011 12:22 PM.)