Posted to user@pig.apache.org by Seth Ladd <se...@gmail.com> on 2009/04/13 05:35:36 UTC

Subtracting Two Numbers (via FLATTEN) in FOREACH

Hello,

I'm trying something that I would imagine would work, but instead I
get an error.  Is this a bug or simply my misunderstanding?

I'm starting with this:

((A,2009-01-01),{},{(A,3L)})
((A,2009-02-01),{},{(A,2L)})
((B,2009-01-01),{(B,1L)},{(B,3L)})
((B,2009-02-01),{(B,1L)},{(B,2L)})
((C,2009-01-01),{(C,2L)},{(C,2L)})
((C,2009-02-01),{(C,1L)},{(C,1L)})

and then via this:
projected = FOREACH joined GENERATE $0, FLATTEN($1.$1), FLATTEN($2.$1);
DESCRIBE projected
projected: {group: (uic: chararray,date: chararray),long,long}

As you can see, the last two elements in the tuple are longs.
HOWEVER, if I try the following:

projected = FOREACH joined GENERATE $0, FLATTEN($1.$1),
FLATTEN($2.$1), FLATTEN($2.$1) - FLATTEN($1.$1);

(notice the attempt to subtract one long from another)

I get this error:

2009-04-12 17:14:36,394 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1000: Error during parsing. Encountered " "-" "- "" at line
16, column 88.
Was expecting one of:
    "as" ...
    "parallel" ...
    ";" ...
    "," ...

So, why is it that the schema says longs but an attempt to subtract
the two fails with the error above?

Thanks very much for your help,
Seth

ps I solved the problem by performing another FOREACH GENERATE right
after the one mentioned here.  But that seems to add more work.

Re: Subtracting Two Numbers (via FLATTEN) in FOREACH

Posted by Alan Gates <ga...@yahoo-inc.com>.
I'm not sure I understand what you're trying to do, so I don't know
whether I can suggest anything helpful.  Can you guarantee that your
data has only one record per key from each input?  If so, then
flattening it and doing the subtraction should be fine.  If you can't
guarantee this, then I don't see how the operation has any meaning:
which tuple out of the bag would you want?  If you want to subtract
them all, you could do something like:

C = cogroup A, B;
D = foreach C {
    E = A.$1 - B.$1;
    generate E;
}

This would then give you a bag of all the subtracted results.  But as
you have no ordering guarantees on A or B, the contents of this bag
will be non-deterministic.

If you really want to always take the first tuple, and that's good  
enough, you could write a UDF that took just the first tuple out of  
the bag.
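
Short of writing a UDF, one option that stays in a single FOREACH is a
built-in aggregate such as MAX, which collapses each bag to a scalar.
This is only a sketch, and it is a swapped-in technique rather than the
UDF approach described above.  It assumes each bag holds at most one
tuple per key (so MAX simply picks that tuple's value); the file names,
the field names k and v, and the long type are made up for illustration.

```pig
-- Hypothetical inputs: a key and a long value per record.
A = LOAD 'a.txt' AS (k:chararray, v:long);
B = LOAD 'b.txt' AS (k:chararray, v:long);

-- COGROUP yields (group, bag of A tuples, bag of B tuples) per key.
C = COGROUP A BY k, B BY k;

-- MAX turns each bag of longs into a single long, so the subtraction
-- is between two scalars rather than between two bags.
D = FOREACH C GENERATE group, MAX(B.v) - MAX(A.v);
```

If a key is missing from one input, its bag is empty; as far as I know
MAX then yields null, so the difference for that key comes out null.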

Alan.

On Apr 23, 2009, at 10:23 AM, Seth Ladd wrote:

> Thanks Alan, that helps a lot.
>
> Can you suggest a way to accomplish my goal without the two-stage
> flatten and then subtract?
>
> Appreciate the explanation,
> Seth

Re: Subtracting Two Numbers (via FLATTEN) in FOREACH

Posted by Seth Ladd <se...@gmail.com>.
Thanks Alan, that helps a lot.

Can you suggest a way to accomplish my goal without the two-stage
flatten and then subtract?

Appreciate the explanation,
Seth

On Thu, Apr 23, 2009 at 7:11 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
> When you apply the '.' dereference operator to a bag, you don't get the
> first tuple, but a bag with tuples that have just that field.  This was
> chosen for a couple of reasons.  One, it gives Pig a way to apply operators
> to individual fields in a bag.  Two, there is no notion of ordering in a
> bag, so trying to reference individual tuples in a bag is non-deterministic.
>
> Alan.
>
> X = FOREACH A GENERATE FLATTEN($

Re: Subtracting Two Numbers (via FLATTEN) in FOREACH

Posted by Alan Gates <ga...@yahoo-inc.com>.
When you apply the '.' dereference operator to a bag, you don't get
the first tuple, but a bag with tuples that have just that field.
This was chosen for a couple of reasons.  One, it gives Pig a way to
apply operators to individual fields in a bag.  Two, there is no
notion of ordering in a bag, so trying to reference individual tuples
in a bag is non-deterministic.
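
To see this on the relation from the question, here is a small sketch.
It assumes A has the shape (key, bag, bag) shown there; the alias P is
made up for this example.

```pig
-- Dereferencing a bag projects a field out of every tuple in it,
-- so each of these two columns is still a bag, not a long.
P = FOREACH A GENERATE $1.$1, $2.$1;

-- Inspect the resulting schema to check the column types.
DESCRIBE P;
```

DESCRIBE should report bag types for both columns, which would explain
why the Subtract operator rejects them with ERROR 1039.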

Alan.

X = FOREACH A GENERATE FLATTEN($
On Apr 13, 2009, at 1:57 PM, Seth Ladd wrote:

> Thanks Alan.  That would explain it.
>
> Which leads me to my next question.
>
> My relation looks like this (which comes from a COGROUP):
>
> 1, { (A,1L) }, { (A,2L) }
>
> and what I'd really like to know is: How can I subtract the 2L from  
> the 1L ?
>
> I've tried this:
>
> X = FOREACH A GENERATE $2.$0.$1 - $1.$0.$1
>
> but I get the error:
>
> 2009-04-13 10:54:02,425 [main] ERROR org.apache.pig.tools.grunt.Grunt
> - ERROR 1039: Incompatible types in Subtract Operator left hand
> side:bag right hand side:bag
>
> How best to "escape" out of the bag to perform a subtraction between
> two values in tuples inside bags?
>
> Thanks very much for your help,
> Seth

Re: Subtracting Two Numbers (via FLATTEN) in FOREACH

Posted by Seth Ladd <se...@gmail.com>.
Thanks Alan.  That would explain it.

Which leads me to my next question.

My relation looks like this (which comes from a COGROUP):

1, { (A,1L) }, { (A,2L) }

and what I'd really like to know is: How can I subtract the 2L from the 1L ?

I've tried this:

X = FOREACH A GENERATE $2.$0.$1 - $1.$0.$1

but I get the error:

2009-04-13 10:54:02,425 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1039: Incompatible types in Subtract Operator left hand
side:bag right hand side:bag

How best to "escape" out of the bag to perform a subtraction between
two values in tuples inside bags?

Thanks very much for your help,
Seth

On Mon, Apr 13, 2009 at 10:34 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
> In Pig Latin's grammar, FLATTEN is not an expression.  So flatten(x) -
> flatten(y) isn't legal.  Adding a second foreach (which, you are correct,
> adds a little more work) is the way to go.
>
> Alan.

Re: Subtracting Two Numbers (via FLATTEN) in FOREACH

Posted by Alan Gates <ga...@yahoo-inc.com>.
In Pig Latin's grammar, FLATTEN is not an expression.  So flatten(x) -
flatten(y) isn't legal.  Adding a second foreach (which, you are
correct, adds a little more work) is the way to go.
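
A minimal sketch of that two-step form, reusing the relation `joined`
and the positional references from the original question.  The aliases
`pairs` and `deltas` are names introduced here, not from the thread.

```pig
-- Step 1: FLATTEN appears only as a top-level GENERATE item,
-- un-nesting the long out of each bag.
pairs = FOREACH joined GENERATE $0, FLATTEN($1.$1), FLATTEN($2.$1);

-- Step 2: $1 and $2 are now plain longs, so ordinary arithmetic applies.
deltas = FOREACH pairs GENERATE $0, $2 - $1;
```

Note that FLATTEN of an empty bag produces no rows, so keys whose bag
is {} (the A rows in the sample data) would drop out of the result.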

Alan.

On Apr 12, 2009, at 8:35 PM, Seth Ladd wrote:

> Hello,
>
> I'm trying something that I would imagine would work, but instead I
> get an error.  Is this a bug or simply my misunderstanding?
>
> I'm starting with this:
>
> ((A,2009-01-01),{},{(A,3L)})
> ((A,2009-02-01),{},{(A,2L)})
> ((B,2009-01-01),{(B,1L)},{(B,3L)})
> ((B,2009-02-01),{(B,1L)},{(B,2L)})
> ((C,2009-01-01),{(C,2L)},{(C,2L)})
> ((C,2009-02-01),{(C,1L)},{(C,1L)})
>
> and then via this:
> projected = FOREACH joined GENERATE $0, FLATTEN($1.$1),  
> FLATTEN($2.$1);
> DESCRIBE projected
> projected: {group: (uic: chararray,date: chararray),long,long}
>
> As you can see, the last two elements in the tuple are longs.
> HOWEVER, if I try the following:
>
> projected = FOREACH joined GENERATE $0, FLATTEN($1.$1),
> FLATTEN($2.$1), FLATTEN($2.$1) - FLATTEN($1.$1);
>
> (notice the attempt to subtract one long from another)
>
> I get this error:
>
> 2009-04-12 17:14:36,394 [main] ERROR org.apache.pig.tools.grunt.Grunt
> - ERROR 1000: Error during parsing. Encountered " "-" "- "" at line
> 16, column 88.
> Was expecting one of:
>    "as" ...
>    "parallel" ...
>    ";" ...
>    "," ...
>
> So, why is it that the schema says longs but an attempt to subtract
> the two fails with the error above?
>
> Thanks very much for your help,
> Seth
>
> ps I solved the problem by performing another FOREACH GENERATE right
> after the one mentioned here.  But that seems to add more work.