Posted to user@pig.apache.org by Eli Finkelshteyn <ie...@gmail.com> on 2012/04/04 20:18:58 UTC

Cross Product of Two Tuples?

Hi Folks,
I'm currently trying to do something I figured would be trivial, but 
actually wound up being a bit of work for me, so I'm wondering if I'm 
missing something. All I want to do is get a cross product of two 
tuples. So for example, given an input of:

('hello', 'howdy', 'hi'), ('hola', 'bonjour')

I'd get:

('hello', 'hola')
('hello', 'bonjour')
('howdy', 'hola')
('howdy', 'bonjour')
('hi', 'hola')
('hi', 'bonjour')

At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's 
no good because the tuples are first themselves put into new tuples. So 
what I'm left with now is writing a dirty and slow Python UDF for this. 
Is there really no better way to do this? I'd think it would be a pretty 
standard task.

Eli
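
For reference, the result being asked for is the ordinary Cartesian
product of the fields of the two tuples. A minimal sketch in plain
Python (run outside Pig, so it only pins down the expected output and
is not a Pig solution; the variable names t1 and t2 are illustrative):

```python
from itertools import product

# Sample tuples from the question above.
t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

# product() pairs every field of t1 with every field of t2,
# varying the second tuple fastest.
pairs = list(product(t1, t2))

for pair in pairs:
    print(pair)
```

The six printed pairs come out in the same order as the list in the
question.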

Re: Cross Product of Two Tuples?

Posted by Herbert Mühlburger <he...@gmail.com>.
Hi Eli,

On 04.04.12 20:18, Eli Finkelshteyn wrote:
> I'm currently trying to do something I figured would be trivial, but
> actually wound up being a bit of work for me, so I'm wondering if I'm
> missing something. All I want to do is get a cross product of two
> tuples. So for example, given an input of:
>
> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>
> I'd get:
>
> ('hello', 'hola')
> ('hello', 'bonjour')
> ('howdy', 'hola')
> ('howdy', 'bonjour')
> ('hi', 'hola')
> ('hi', 'bonjour')
>
> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
> no good cause the tuples are first themselves put into new tuples. So,
> what I'm left with no is writing a dirty and slow python udf for this.
> Is there really no better way to do this? I'd think it would be a pretty
> standard task.

Have you tried CROSS [1] to compute the cross product?

[1] https://pig.apache.org/docs/r0.9.2/basic.html#cross

Regards,
Herbert

-- 
=================================================================
Herbert Muehlburger  Software Development and Business Management
                                     Graz University of Technology
www.muehlburger.at                   www.twitter.com/hmuehlburger
=================================================================

Re: Cross Product of Two Tuples?

Posted by Jonathan Coveney <jc...@gmail.com>.
A totally valid point. You swayed me :)

2012/4/5 Scott Carey <sc...@richrelevance.com>

> The documentation is extremely clear:
>
> /**
>  * This class takes a list of items and puts them into a bag
>  * T = foreach U generate TOBAG($0, $1, $2);
>  * It's like saying this:
>  * T = foreach U generate {($0), ($1), ($2)}
>  */
>
>
> Adding conditionals to that seems complicating the issue and would
> introduce bugs.
>
> What happens with TOBAG(tuple1, tuple2)?
> What happens when TOBAG($0) changes type?  What if its type is different
> across rows?
>
> Each operator should do one simple operation consistently, and not depend
> on the type passed in.
> Its frustrating enough that FLATTEN does two things.  IMO there should be
> one operator that explodes bags, and one that unpacks tuples, not one
> conflated operator that does both -- I have had to debug several issues as
> a result of this or a misunderstanding from new pig users. Making TOBAG do
> one thing for one type of data and something else for others does not make
> pig scripts maintainable or intuitive to follow IMO.
>
> On 4/5/12 4:41 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>
> >Well, perhaps bug is a heavy handed word. A poor user experience might be
> >better. I would posit that TOBAG(tuple) 9 times out of ten means "make
> >each
> >column a row" instead of "give me a bag with a tuple of a tuple." But I'd
> >love opinions on the matter.
> >
> >2012/4/5 Scott Carey <sc...@richrelevance.com>
> >
> >> On 4/5/12 11:25 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
> >>
> >> >Yup, you guys are right...it's alittle annoying, but flatten first,
> >>then
> >> >the two tobags, then the flatten. IMHO the TOBAG of a tuply not giving
> >>you
> >> >a bag is a bug, but this should work in the meanitme.
> >>
> >> I can't see how that could be a bug.  What if you want to create a bag
> >> with one tuple in it?
> >>
> >>
> >> >
> >> >2012/4/5 Scott Carey <sc...@richrelevance.com>
> >> >
> >> >> Isn't it
> >> >>
> >> >> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
> >> >> or
> >> >> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0,
> >>t2::$1))
> >> >> ?
> >> >>
> >> >> The inner tuple needs to be unpacked into a list of fields.  TOBAG
> >> >>simply
> >> >> puts each element passed in into a bag, and if you pass t1 in there,
> >>it
> >> >> will be a bag with only one item.
> >> >>
> >> >> On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
> >> >>
> >> >> >FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> >> >> >
> >> >> >2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
> >> >> >
> >> >> >> That's for a relation only. Unless I'm missing something, it does
> >>not
> >> >> >>work
> >> >> >> for tuples. What I'm doing what require a FOREACH, I'm thinking.
> >> >> >>
> >> >> >> Eli
> >> >> >>
> >> >> >>
> >> >> >> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >> >> >>
> >> >> >>>
> >> >> >>>http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
> >> >> http://pig.apache.o
> >> >> >>>rg/docs/r0.9.1/basic.html#cross>
> >> >> >>>
> >> >> >>> -Prashant
> >> >> >>>
> >> >> >>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
> >> >> >>>Finkelshteyn<ie...@gmail.com>
> >> >> >>> >wrote:
> >> >> >>>
> >> >> >>>  Hi Folks,
> >> >> >>>> I'm currently trying to do something I figured would be trivial,
> >> >>but
> >> >> >>>> actually wound up being a bit of work for me, so I'm wondering
> >>if
> >> >>I'm
> >> >> >>>> missing something. All I want to do is get a cross product of
> >>two
> >> >> >>>>tuples.
> >> >> >>>> So for example, given an input of:
> >> >> >>>>
> >> >> >>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >> >> >>>>
> >> >> >>>> I'd get:
> >> >> >>>>
> >> >> >>>> ('hello', 'hola')
> >> >> >>>> ('hello', 'bonjour')
> >> >> >>>> ('howdy', 'hola')
> >> >> >>>> ('howdy', 'bonjour')
> >> >> >>>> ('hi', 'hola')
> >> >> >>>> ('hi', 'bonjour')
> >> >> >>>>
> >> >> >>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
> >> >> >>>>that's no
> >> >> >>>> good cause the tuples are first themselves put into new tuples.
> >>So,
> >> >> >>>>what
> >> >> >>>> I'm left with no is writing a dirty and slow python udf for
> >>this.
> >> >>Is
> >> >> >>>> there
> >> >> >>>> really no better way to do this? I'd think it would be a pretty
> >> >> >>>>standard
> >> >> >>>> task.
> >> >> >>>>
> >> >> >>>> Eli
> >> >> >>>>
> >> >> >>>>
> >> >> >>
> >> >>
> >> >>
> >>
> >>
>
>

Re: Cross Product of Two Tuples?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Very much agree. Had that been the case, this would have been a far less 
confusing exercise. At least I feel like I now have a better grasp on 
what FLATTEN does and when, anyway.

On 4/5/12 8:23 PM, Scott Carey wrote:
> The documentation is extremely clear:
>
> /**
>   * This class takes a list of items and puts them into a bag
>   * T = foreach U generate TOBAG($0, $1, $2);
>   * It's like saying this:
>   * T = foreach U generate {($0), ($1), ($2)}
>   */
>
>
> Adding conditionals to that seems complicating the issue and would
> introduce bugs.
>
> What happens with TOBAG(tuple1, tuple2)?
> What happens when TOBAG($0) changes type?  What if its type is different
> across rows?
>
> Each operator should do one simple operation consistently, and not depend
> on the type passed in.
> Its frustrating enough that FLATTEN does two things.  IMO there should be
> one operator that explodes bags, and one that unpacks tuples, not one
> conflated operator that does both -- I have had to debug several issues as
> a result of this or a misunderstanding from new pig users. Making TOBAG do
> one thing for one type of data and something else for others does not make
> pig scripts maintainable or intuitive to follow IMO.
>
> On 4/5/12 4:41 PM, "Jonathan Coveney"<jc...@gmail.com>  wrote:
>
>> Well, perhaps bug is a heavy handed word. A poor user experience might be
>> better. I would posit that TOBAG(tuple) 9 times out of ten means "make
>> each
>> column a row" instead of "give me a bag with a tuple of a tuple." But I'd
>> love opinions on the matter.
>>
>> 2012/4/5 Scott Carey<sc...@richrelevance.com>
>>
>>> On 4/5/12 11:25 AM, "Jonathan Coveney"<jc...@gmail.com>  wrote:
>>>
>>>> Yup, you guys are right...it's alittle annoying, but flatten first,
>>> then
>>>> the two tobags, then the flatten. IMHO the TOBAG of a tuply not giving
>>> you
>>>> a bag is a bug, but this should work in the meanitme.
>>> I can't see how that could be a bug.  What if you want to create a bag
>>> with one tuple in it?
>>>
>>>
>>>> 2012/4/5 Scott Carey<sc...@richrelevance.com>
>>>>
>>>>> Isn't it
>>>>>
>>>>> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
>>>>> or
>>>>> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0,
>>> t2::$1))
>>>>> ?
>>>>>
>>>>> The inner tuple needs to be unpacked into a list of fields.  TOBAG
>>>>> simply
>>>>> puts each element passed in into a bag, and if you pass t1 in there,
>>> it
>>>>> will be a bag with only one item.
>>>>>
>>>>> On 4/4/12 11:43 AM, "Jonathan Coveney"<jc...@gmail.com>  wrote:
>>>>>
>>>>>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>>>>>>
>>>>>> 2012/4/4 Eli Finkelshteyn<ie...@gmail.com>
>>>>>>
>>>>>>> That's for a relation only. Unless I'm missing something, it does
>>> not
>>>>>>> work
>>>>>>> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>>>>>
>>>>>>>> http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
>>>>> http://pig.apache.o
>>>>>>>> rg/docs/r0.9.1/basic.html#cross>
>>>>>>>>
>>>>>>>> -Prashant
>>>>>>>>
>>>>>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
>>>>>>>> Finkelshteyn<ie...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>   Hi Folks,
>>>>>>>>> I'm currently trying to do something I figured would be trivial,
>>>>> but
>>>>>>>>> actually wound up being a bit of work for me, so I'm wondering
>>> if
>>>>> I'm
>>>>>>>>> missing something. All I want to do is get a cross product of
>>> two
>>>>>>>>> tuples.
>>>>>>>>> So for example, given an input of:
>>>>>>>>>
>>>>>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>>>>>
>>>>>>>>> I'd get:
>>>>>>>>>
>>>>>>>>> ('hello', 'hola')
>>>>>>>>> ('hello', 'bonjour')
>>>>>>>>> ('howdy', 'hola')
>>>>>>>>> ('howdy', 'bonjour')
>>>>>>>>> ('hi', 'hola')
>>>>>>>>> ('hi', 'bonjour')
>>>>>>>>>
>>>>>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
>>>>>>>>> that's no
>>>>>>>>> good cause the tuples are first themselves put into new tuples.
>>> So,
>>>>>>>>> what
>>>>>>>>> I'm left with no is writing a dirty and slow python udf for
>>> this.
>>>>> Is
>>>>>>>>> there
>>>>>>>>> really no better way to do this? I'd think it would be a pretty
>>>>>>>>> standard
>>>>>>>>> task.
>>>>>>>>>
>>>>>>>>> Eli
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>


Re: Cross Product of Two Tuples?

Posted by Scott Carey <sc...@richrelevance.com>.
The documentation is extremely clear:

/**
 * This class takes a list of items and puts them into a bag
 * T = foreach U generate TOBAG($0, $1, $2);
 * It's like saying this:
 * T = foreach U generate {($0), ($1), ($2)}
 */


Adding conditionals to that seems to complicate the issue and would
introduce bugs.

What happens with TOBAG(tuple1, tuple2)?
What happens when TOBAG($0) changes type?  What if its type is different
across rows?

Each operator should do one simple operation consistently, and not depend
on the type passed in.
It's frustrating enough that FLATTEN does two things.  IMO there should be
one operator that explodes bags, and one that unpacks tuples, not one
conflated operator that does both -- I have had to debug several issues as
a result of this or of a misunderstanding by new Pig users. Making TOBAG do
one thing for one type of data and something else for others does not make
Pig scripts maintainable or intuitive to follow IMO.
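
To make the distinction between the two jobs concrete, here is a toy
model in plain Python, not Pig code. It assumes a Pig tuple can be
stood in for by a Python tuple and a bag by a list of tuples; the
function names unnest_tuple and explode_bag are hypothetical and do
not exist in Pig.

```python
def unnest_tuple(row, index):
    """Splice the fields of the tuple at row[index] into the row (one row out)."""
    return row[:index] + row[index] + row[index + 1:]


def explode_bag(row, index):
    """Emit one row per tuple in the bag at row[index] (many rows out)."""
    return [row[:index] + item + row[index + 1:] for item in row[index]]


# A nested tuple is unpacked in place; the row count stays at one.
unnested = unnest_tuple(('k', ('a', 'b')), 1)

# A bag produces one output row for each tuple it contains.
exploded = explode_bag(('k', [('a',), ('b',)]), 1)

print(unnested)
print(exploded)
```

In this model the first call always returns a single row, while the
second returns as many rows as the bag has tuples, which is the
difference in behavior described above.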

On 4/5/12 4:41 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:

>Well, perhaps bug is a heavy handed word. A poor user experience might be
>better. I would posit that TOBAG(tuple) 9 times out of ten means "make
>each
>column a row" instead of "give me a bag with a tuple of a tuple." But I'd
>love opinions on the matter.
>
>2012/4/5 Scott Carey <sc...@richrelevance.com>
>
>> On 4/5/12 11:25 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>>
>> >Yup, you guys are right...it's alittle annoying, but flatten first,
>>then
>> >the two tobags, then the flatten. IMHO the TOBAG of a tuply not giving
>>you
>> >a bag is a bug, but this should work in the meanitme.
>>
>> I can't see how that could be a bug.  What if you want to create a bag
>> with one tuple in it?
>>
>>
>> >
>> >2012/4/5 Scott Carey <sc...@richrelevance.com>
>> >
>> >> Isn't it
>> >>
>> >> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
>> >> or
>> >> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0,
>>t2::$1))
>> >> ?
>> >>
>> >> The inner tuple needs to be unpacked into a list of fields.  TOBAG
>> >>simply
>> >> puts each element passed in into a bag, and if you pass t1 in there,
>>it
>> >> will be a bag with only one item.
>> >>
>> >> On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>> >>
>> >> >FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>> >> >
>> >> >2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>> >> >
>> >> >> That's for a relation only. Unless I'm missing something, it does
>>not
>> >> >>work
>> >> >> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>> >> >>
>> >> >> Eli
>> >> >>
>> >> >>
>> >> >> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>> >> >>
>> >> >>>
>> >> >>>http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
>> >> http://pig.apache.o
>> >> >>>rg/docs/r0.9.1/basic.html#cross>
>> >> >>>
>> >> >>> -Prashant
>> >> >>>
>> >> >>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
>> >> >>>Finkelshteyn<ie...@gmail.com>
>> >> >>> >wrote:
>> >> >>>
>> >> >>>  Hi Folks,
>> >> >>>> I'm currently trying to do something I figured would be trivial,
>> >>but
>> >> >>>> actually wound up being a bit of work for me, so I'm wondering
>>if
>> >>I'm
>> >> >>>> missing something. All I want to do is get a cross product of
>>two
>> >> >>>>tuples.
>> >> >>>> So for example, given an input of:
>> >> >>>>
>> >> >>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>> >> >>>>
>> >> >>>> I'd get:
>> >> >>>>
>> >> >>>> ('hello', 'hola')
>> >> >>>> ('hello', 'bonjour')
>> >> >>>> ('howdy', 'hola')
>> >> >>>> ('howdy', 'bonjour')
>> >> >>>> ('hi', 'hola')
>> >> >>>> ('hi', 'bonjour')
>> >> >>>>
>> >> >>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
>> >> >>>>that's no
>> >> >>>> good cause the tuples are first themselves put into new tuples.
>>So,
>> >> >>>>what
>> >> >>>> I'm left with no is writing a dirty and slow python udf for
>>this.
>> >>Is
>> >> >>>> there
>> >> >>>> really no better way to do this? I'd think it would be a pretty
>> >> >>>>standard
>> >> >>>> task.
>> >> >>>>
>> >> >>>> Eli
>> >> >>>>
>> >> >>>>
>> >> >>
>> >>
>> >>
>>
>>


Re: Cross Product of Two Tuples?

Posted by Jonathan Coveney <jc...@gmail.com>.
Well, perhaps "bug" is too heavy-handed a word. "A poor user experience"
might be better. I would posit that TOBAG(tuple) 9 times out of 10 means
"make each column a row" instead of "give me a bag with a tuple of a
tuple." But I'd love opinions on the matter.

2012/4/5 Scott Carey <sc...@richrelevance.com>

> On 4/5/12 11:25 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>
> >Yup, you guys are right...it's alittle annoying, but flatten first, then
> >the two tobags, then the flatten. IMHO the TOBAG of a tuply not giving you
> >a bag is a bug, but this should work in the meanitme.
>
> I can't see how that could be a bug.  What if you want to create a bag
> with one tuple in it?
>
>
> >
> >2012/4/5 Scott Carey <sc...@richrelevance.com>
> >
> >> Isn't it
> >>
> >> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
> >> or
> >> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
> >> ?
> >>
> >> The inner tuple needs to be unpacked into a list of fields.  TOBAG
> >>simply
> >> puts each element passed in into a bag, and if you pass t1 in there, it
> >> will be a bag with only one item.
> >>
> >> On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
> >>
> >> >FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> >> >
> >> >2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
> >> >
> >> >> That's for a relation only. Unless I'm missing something, it does not
> >> >>work
> >> >> for tuples. What I'm doing what require a FOREACH, I'm thinking.
> >> >>
> >> >> Eli
> >> >>
> >> >>
> >> >> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >> >>
> >> >>>
> >> >>>http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
> >> http://pig.apache.o
> >> >>>rg/docs/r0.9.1/basic.html#cross>
> >> >>>
> >> >>> -Prashant
> >> >>>
> >> >>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
> >> >>>Finkelshteyn<ie...@gmail.com>
> >> >>> >wrote:
> >> >>>
> >> >>>  Hi Folks,
> >> >>>> I'm currently trying to do something I figured would be trivial,
> >>but
> >> >>>> actually wound up being a bit of work for me, so I'm wondering if
> >>I'm
> >> >>>> missing something. All I want to do is get a cross product of two
> >> >>>>tuples.
> >> >>>> So for example, given an input of:
> >> >>>>
> >> >>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >> >>>>
> >> >>>> I'd get:
> >> >>>>
> >> >>>> ('hello', 'hola')
> >> >>>> ('hello', 'bonjour')
> >> >>>> ('howdy', 'hola')
> >> >>>> ('howdy', 'bonjour')
> >> >>>> ('hi', 'hola')
> >> >>>> ('hi', 'bonjour')
> >> >>>>
> >> >>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
> >> >>>>that's no
> >> >>>> good cause the tuples are first themselves put into new tuples. So,
> >> >>>>what
> >> >>>> I'm left with no is writing a dirty and slow python udf for this.
> >>Is
> >> >>>> there
> >> >>>> really no better way to do this? I'd think it would be a pretty
> >> >>>>standard
> >> >>>> task.
> >> >>>>
> >> >>>> Eli
> >> >>>>
> >> >>>>
> >> >>
> >>
> >>
>
>

Re: Cross Product of Two Tuples?

Posted by Scott Carey <sc...@richrelevance.com>.
On 4/5/12 11:25 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:

>Yup, you guys are right...it's alittle annoying, but flatten first, then
>the two tobags, then the flatten. IMHO the TOBAG of a tuply not giving you
>a bag is a bug, but this should work in the meanitme.

I can't see how that could be a bug.  What if you want to create a bag
with one tuple in it?


>
>2012/4/5 Scott Carey <sc...@richrelevance.com>
>
>> Isn't it
>>
>> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
>> or
>> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
>> ?
>>
>> The inner tuple needs to be unpacked into a list of fields.  TOBAG
>>simply
>> puts each element passed in into a bag, and if you pass t1 in there, it
>> will be a bag with only one item.
>>
>> On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>>
>> >FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>> >
>> >2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>> >
>> >> That's for a relation only. Unless I'm missing something, it does not
>> >>work
>> >> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>> >>
>> >> Eli
>> >>
>> >>
>> >> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>> >>
>> >>>
>> >>>http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
>> http://pig.apache.o
>> >>>rg/docs/r0.9.1/basic.html#cross>
>> >>>
>> >>> -Prashant
>> >>>
>> >>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
>> >>>Finkelshteyn<ie...@gmail.com>
>> >>> >wrote:
>> >>>
>> >>>  Hi Folks,
>> >>>> I'm currently trying to do something I figured would be trivial,
>>but
>> >>>> actually wound up being a bit of work for me, so I'm wondering if
>>I'm
>> >>>> missing something. All I want to do is get a cross product of two
>> >>>>tuples.
>> >>>> So for example, given an input of:
>> >>>>
>> >>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>> >>>>
>> >>>> I'd get:
>> >>>>
>> >>>> ('hello', 'hola')
>> >>>> ('hello', 'bonjour')
>> >>>> ('howdy', 'hola')
>> >>>> ('howdy', 'bonjour')
>> >>>> ('hi', 'hola')
>> >>>> ('hi', 'bonjour')
>> >>>>
>> >>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
>> >>>>that's no
>> >>>> good cause the tuples are first themselves put into new tuples. So,
>> >>>>what
>> >>>> I'm left with no is writing a dirty and slow python udf for this.
>>Is
>> >>>> there
>> >>>> really no better way to do this? I'd think it would be a pretty
>> >>>>standard
>> >>>> task.
>> >>>>
>> >>>> Eli
>> >>>>
>> >>>>
>> >>
>>
>>


Re: Cross Product of Two Tuples?

Posted by Jonathan Coveney <jc...@gmail.com>.
Yup, you guys are right... it's a little annoying, but flatten first, then
the two TOBAGs, then the flatten. IMHO the TOBAG of a tuple not giving you
a bag is a bug, but this should work in the meantime.

2012/4/5 Scott Carey <sc...@richrelevance.com>

> Isn't it
>
> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
> or
> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
> ?
>
> The inner tuple needs to be unpacked into a list of fields.  TOBAG simply
> puts each element passed in into a bag, and if you pass t1 in there, it
> will be a bag with only one item.
>
> On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>
> >FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> >
> >2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
> >
> >> That's for a relation only. Unless I'm missing something, it does not
> >>work
> >> for tuples. What I'm doing what require a FOREACH, I'm thinking.
> >>
> >> Eli
> >>
> >>
> >> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >>
> >>>
> >>>http://pig.apache.org/docs/r0.**9.1/basic.html#cross<
> http://pig.apache.o
> >>>rg/docs/r0.9.1/basic.html#cross>
> >>>
> >>> -Prashant
> >>>
> >>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
> >>>Finkelshteyn<ie...@gmail.com>
> >>> >wrote:
> >>>
> >>>  Hi Folks,
> >>>> I'm currently trying to do something I figured would be trivial, but
> >>>> actually wound up being a bit of work for me, so I'm wondering if I'm
> >>>> missing something. All I want to do is get a cross product of two
> >>>>tuples.
> >>>> So for example, given an input of:
> >>>>
> >>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >>>>
> >>>> I'd get:
> >>>>
> >>>> ('hello', 'hola')
> >>>> ('hello', 'bonjour')
> >>>> ('howdy', 'hola')
> >>>> ('howdy', 'bonjour')
> >>>> ('hi', 'hola')
> >>>> ('hi', 'bonjour')
> >>>>
> >>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
> >>>>that's no
> >>>> good cause the tuples are first themselves put into new tuples. So,
> >>>>what
> >>>> I'm left with no is writing a dirty and slow python udf for this. Is
> >>>> there
> >>>> really no better way to do this? I'd think it would be a pretty
> >>>>standard
> >>>> task.
> >>>>
> >>>> Eli
> >>>>
> >>>>
> >>
>
>

Re: Cross Product of Two Tuples?

Posted by Scott Carey <sc...@richrelevance.com>.
"FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross"


Why?  TOBAG(t1) should give you a bag with one tuple in it.  FLATTEN on
that gives you one tuple.
If you want a bag with each tuple field, FLATTEN it first:

TOBAG(FLATTEN(t1))
or reference the fields:
TOBAG(t1::$0, t1::$1, t1::$2)

I have not tested the above, but that is logically what you want, and the
two forms are the same thing if t1 has 3 fields.  You may need an extra
line of FOREACH ... GENERATE to do this.

FLATTEN on a tuple 'unnests' it.  FLATTEN on a bag 'explodes' it.
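
As a rough illustration of why the whole-tuple form returns the input
unchanged, here is a toy model in plain Python, not Pig itself. It
assumes a bag can be modeled as a list of tuples, that TOBAG wraps
each argument in its own one-field tuple (as the TOBAG doc comment
says), and that two FLATTENed bags in one GENERATE produce every
combination of their tuples. The helper names to_bag and flatten_bags
are hypothetical.

```python
def to_bag(*items):
    """Model of TOBAG: wrap each argument in its own one-field tuple."""
    return [(item,) for item in items]


def flatten_bags(bag_a, bag_b):
    """Model of GENERATE FLATTEN(bag_a), FLATTEN(bag_b): one output row
    per combination of a tuple from each bag, with fields concatenated."""
    return [a + b for a in bag_a for b in bag_b]


t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

# Passing the whole tuple: each bag holds one tuple, so only one row comes out.
nested = flatten_bags(to_bag(t1), to_bag(t2))

# Passing the fields individually, which is what unpacking the tuple first achieves.
crossed = flatten_bags(to_bag(*t1), to_bag(*t2))

print(nested)
print(crossed)
```

Under these assumptions, nested holds a single row containing the two
original tuples, while crossed holds the six pairs.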

The first example is
('a', 'b', 'c'), ('x', 'y')
to
('a', 'x')
('a', 'y')
('b', 'x')
('b', 'y')
('c', 'x')
('c', 'y')



Which seems sane, but is not in the general case.  Consider:
('a', 3L, 2.0d), (0, {('x')})
to:
('a', 0)
('a', {('x')})
(3L, 0)
(3L, {('x')})
(2.0d, 0)
(2.0d, {('x')})


The schema on output is undefined and nearly unusable.
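
A small Python sketch of the same point, using stand-ins that are
assumptions rather than Pig types: Pig's 3L and 2.0d become a Python
int and float, and the bag {('x')} becomes a list holding one tuple.
It collects the value types that end up in each output column.

```python
from itertools import product

# Python stand-ins for the mixed-type example above.
t1 = ('a', 3, 2.0)
t2 = (0, [('x',)])

rows = list(product(t1, t2))

# zip(*rows) regroups the rows by column; record the set of value
# types seen in each column, sorted by type name.
column_types = [sorted({type(v).__name__ for v in col}) for col in zip(*rows)]

print(column_types)
```

Each column ends up holding more than one type, so no single field
type describes it.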




On 4/5/12 2:27 AM, "Gianmarco De Francisci Morales" <gd...@apache.org>
wrote:

>I would say the additional nesting level is a bug.
>But we should check if we break stuff with this change.
>
>Cheers,
>--
>Gianmarco
>
>
>
>On Thu, Apr 5, 2012 at 01:36, Jonathan Coveney <jc...@gmail.com> wrote:
>
>> Pig folks: it seems like it defies the expectation if TOBAG is run on a
>> single TUPLE and you don't get a bag. I can patch it, but seem like a
>>fair
>> change?
>>
>> 2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>>
>> > Nah, doesn't work because it doubles up the tuple, so that:
>> >
>> > TOBAG(('hello', 'howdy', 'hi'))
>> > returns
>> > {(('hello', 'howdy', 'hi'))}
>> >
>> > And so,
>> >
>> > FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
>> > gets me
>> >
>> > ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>> >
>> > which is just what I started with.
>> >
>> > Anyway, to solve this problem, what I did was make a quick python udf
>>to
>> > make a bag from a tuple without doubling up the tuple, and then ran
>> FLATTEN
>> > on that, which looks like:
>> >
>> > bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
>> > FLATTEN(py_udfs.tupleToBag(t2));
>> >
>> > Where the Python udf I'm using is:
>> >
>> > @outputSchema("b:bag{}")
>> > def tupleToBag(tup):
>> >    b = [tupify(i) for i in tupify(tup)]
>> >    return b
>> >
>> > def tupify(tup):
>> >    if isinstance(tup, tuple):
>> >        return tup
>> >    return (tup,)
>> >
>> > I'll add that into Python PiggyBank as soon as I get a chance to
>>finish
>> > that stuff up.
>> >
>> > Eli
>> >
>> >
>> >
>> > On 4/4/12 2:43 PM, Jonathan Coveney wrote:
>> >
>> >> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>> >>
>> >> 2012/4/4 Eli Finkelshteyn<iefinkel@gmail.**com <ie...@gmail.com>>
>> >>
>> >>  That's for a relation only. Unless I'm missing something, it does
>>not
>> >>> work
>> >>> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>> >>>
>> >>> Eli
>> >>>
>> >>>
>> >>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>> >>>
>> >>>  http://pig.apache.org/docs/r0.****9.1/basic.html#cross<
>> http://pig.apache.org/docs/r0.**9.1/basic.html#cross>
>> >>>> <http://**pig.apache.org/docs/r0.9.1/**basic.html#cross<
>> http://pig.apache.org/docs/r0.9.1/basic.html#cross>
>> >>>> >
>> >>>>
>> >>>> -Prashant
>> >>>>
>> >>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli
>>Finkelshteyn<iefinkel@gmail.****
>> >>>> com<ie...@gmail.com>
>> >>>>
>> >>>>  wrote:
>> >>>>>
>> >>>>  Hi Folks,
>> >>>>
>> >>>>> I'm currently trying to do something I figured would be trivial,
>>but
>> >>>>> actually wound up being a bit of work for me, so I'm wondering if
>>I'm
>> >>>>> missing something. All I want to do is get a cross product of two
>> >>>>> tuples.
>> >>>>> So for example, given an input of:
>> >>>>>
>> >>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>> >>>>>
>> >>>>> I'd get:
>> >>>>>
>> >>>>> ('hello', 'hola')
>> >>>>> ('hello', 'bonjour')
>> >>>>> ('howdy', 'hola')
>> >>>>> ('howdy', 'bonjour')
>> >>>>> ('hi', 'hola')
>> >>>>> ('hi', 'bonjour')
>> >>>>>
>> >>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
>> that's
>> >>>>> no
>> >>>>> good cause the tuples are first themselves put into new tuples.
>>So,
>> >>>>> what
>> >>>>> I'm left with no is writing a dirty and slow python udf for this.
>>Is
>> >>>>> there
>> >>>>> really no better way to do this? I'd think it would be a pretty
>> >>>>> standard
>> >>>>> task.
>> >>>>>
>> >>>>> Eli
>> >>>>>
>> >>>>>
>> >>>>>
>> >
>>


Re: Cross Product of Two Tuples?

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
I would say the additional nesting level is a bug.
But we should check if we break stuff with this change.

Cheers,
--
Gianmarco



On Thu, Apr 5, 2012 at 01:36, Jonathan Coveney <jc...@gmail.com> wrote:

> Pig folks: it seems like it defies the expectation if TOBAG is run on a
> single TUPLE and you don't get a bag. I can patch it, but seem like a fair
> change?
>
> 2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>
> > Nah, doesn't work because it doubles up the tuple, so that:
> >
> > TOBAG(('hello', 'howdy', 'hi'))
> > returns
> > {(('hello', 'howdy', 'hi'))}
> >
> > And so,
> >
> > FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
> > gets me
> >
> > ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >
> > which is just what I started with.
> >
> > Anyway, to solve this problem, what I did was make a quick python udf to
> > make a bag from a tuple without doubling up the tuple, and then ran
> FLATTEN
> > on that, which looks like:
> >
> > bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
> > FLATTEN(py_udfs.tupleToBag(t2));
> >
> > Where the Python udf I'm using is:
> >
> > @outputSchema("b:bag{}")
> > def tupleToBag(tup):
> >    b = [tupify(i) for i in tupify(tup)]
> >    return b
> >
> > def tupify(tup):
> >    if isinstance(tup, tuple):
> >        return tup
> >    return (tup,)
> >
> > I'll add that into Python PiggyBank as soon as I get a chance to finish
> > that stuff up.
> >
> > Eli
> >
> >
> >
> > On 4/4/12 2:43 PM, Jonathan Coveney wrote:
> >
> >> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> >>
> >> 2012/4/4 Eli Finkelshteyn<iefinkel@gmail.**com <ie...@gmail.com>>
> >>
> >>  That's for a relation only. Unless I'm missing something, it does not
> >>> work
> >>> for tuples. What I'm doing what require a FOREACH, I'm thinking.
> >>>
> >>> Eli
> >>>
> >>>
> >>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >>>
> >>>  http://pig.apache.org/docs/r0.****9.1/basic.html#cross<
> http://pig.apache.org/docs/r0.**9.1/basic.html#cross>
> >>>> <http://**pig.apache.org/docs/r0.9.1/**basic.html#cross<
> http://pig.apache.org/docs/r0.9.1/basic.html#cross>
> >>>> >
> >>>>
> >>>> -Prashant
> >>>>
> >>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn<iefinkel@gmail.****
> >>>> com<ie...@gmail.com>
> >>>>
> >>>>  wrote:
> >>>>>
> >>>>  Hi Folks,
> >>>>
> >>>>> I'm currently trying to do something I figured would be trivial, but
> >>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
> >>>>> missing something. All I want to do is get a cross product of two
> >>>>> tuples.
> >>>>> So for example, given an input of:
> >>>>>
> >>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >>>>>
> >>>>> I'd get:
> >>>>>
> >>>>> ('hello', 'hola')
> >>>>> ('hello', 'bonjour')
> >>>>> ('howdy', 'hola')
> >>>>> ('howdy', 'bonjour')
> >>>>> ('hi', 'hola')
> >>>>> ('hi', 'bonjour')
> >>>>>
> >>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
> that's
> >>>>> no
> >>>>> good cause the tuples are first themselves put into new tuples. So,
> >>>>> what
> >>>>> I'm left with no is writing a dirty and slow python udf for this. Is
> >>>>> there
> >>>>> really no better way to do this? I'd think it would be a pretty
> >>>>> standard
> >>>>> task.
> >>>>>
> >>>>> Eli
> >>>>>
> >>>>>
> >>>>>
> >
>

Re: Cross Product of Two Tuples?

Posted by Jonathan Coveney <jc...@gmail.com>.
Pig folks: it seems like it defies expectations if TOBAG is run on a
single TUPLE and you don't get a bag. I can patch it, but does that seem
like a fair change?

2012/4/4 Eli Finkelshteyn <ie...@gmail.com>

> Nah, doesn't work because it doubles up the tuple, so that:
>
> TOBAG(('hello', 'howdy', 'hi'))
> returns
> {(('hello', 'howdy', 'hi'))}
>
> And so,
>
> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
> gets me
>
> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>
> which is just what I started with.
>
> Anyway, to solve this problem, what I did was make a quick python udf to
> make a bag from a tuple without doubling up the tuple, and then ran FLATTEN
> on that, which looks like:
>
> bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
> FLATTEN(py_udfs.tupleToBag(t2));
>
> Where the Python udf I'm using is:
>
> @outputSchema("b:bag{}")
> def tupleToBag(tup):
>    b = [tupify(i) for i in tupify(tup)]
>    return b
>
> def tupify(tup):
>    if isinstance(tup, tuple):
>        return tup
>    return (tup,)
>
> I'll add that into Python PiggyBank as soon as I get a chance to finish
> that stuff up.
>
> Eli
>
>
>
> On 4/4/12 2:43 PM, Jonathan Coveney wrote:
>
>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>>
>> 2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>>
>>> That's for a relation only. Unless I'm missing something, it does not work
>>> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>>>
>>> Eli
>>>
>>>
>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>
>>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>>
>>>> -Prashant
>>>>
>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>> I'm currently trying to do something I figured would be trivial, but
>>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>>> missing something. All I want to do is get a cross product of two
>>>>> tuples.
>>>>> So for example, given an input of:
>>>>>
>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>
>>>>> I'd get:
>>>>>
>>>>> ('hello', 'hola')
>>>>> ('hello', 'bonjour')
>>>>> ('howdy', 'hola')
>>>>> ('howdy', 'bonjour')
>>>>> ('hi', 'hola')
>>>>> ('hi', 'bonjour')
>>>>>
>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
>>>>> no good cause the tuples are first themselves put into new tuples. So,
>>>>> what I'm left with no is writing a dirty and slow python udf for this.
>>>>> Is there really no better way to do this? I'd think it would be a
>>>>> pretty standard task.
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>>
>

Re: Cross Product of Two Tuples?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Nah, doesn't work because it doubles up the tuple, so that:

TOBAG(('hello', 'howdy', 'hi'))
returns
{(('hello', 'howdy', 'hi'))}

And so,

FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
gets me
('hello', 'howdy', 'hi'), ('hola', 'bonjour')

which is just what I started with.

Anyway, to solve this problem, I wrote a quick Python UDF that makes a
bag from a tuple without doubling up the tuple, and then ran FLATTEN on
that, which looks like:

bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)), 
FLATTEN(py_udfs.tupleToBag(t2));

Where the Python UDF I'm using is:

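@outputSchema("b:bag{}")
def tupleToBag(tup):
    # Turn each field of the tuple into its own one-field tuple, so the
    # returned list holds one entry per field of the input.
    b = [tupify(i) for i in tupify(tup)]
    return b

def tupify(tup):
    # Leave tuples unchanged; wrap any other value in a one-element tuple.
    if isinstance(tup, tuple):
        return tup
    return (tup,)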
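
For anyone who wants to check the logic outside Pig, here is a minimal
plain-Python sketch of the same idea. It assumes, as the Pig documentation
describes for FLATTEN, that flattening two bags in one GENERATE produces
their cross product. The names tuple_to_bag and flatten_two_bags are
illustrative only and are not part of Pig or PiggyBank.

```python
from itertools import product

def tupify(value):
    """Leave tuples unchanged; wrap any other value in a one-element tuple."""
    if isinstance(value, tuple):
        return value
    return (value,)

def tuple_to_bag(tup):
    """Model a bag as a list of tuples, one entry per field of `tup`."""
    return [tupify(field) for field in tupify(tup)]

def flatten_two_bags(bag1, bag2):
    """Model GENERATE FLATTEN(bag1), FLATTEN(bag2): one row per pairing."""
    return [left + right for left, right in product(bag1, bag2)]

t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

rows = flatten_two_bags(tuple_to_bag(t1), tuple_to_bag(t2))
for row in rows:
    print(row)
```

This only models the data shapes; it says nothing about how Pig executes
the statement.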

I'll add that into Python PiggyBank as soon as I get a chance to finish 
that stuff up.

Eli


On 4/4/12 2:43 PM, Jonathan Coveney wrote:
> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>
> 2012/4/4 Eli Finkelshteyn<ie...@gmail.com>
>
>> That's for a relation only. Unless I'm missing something, it does not work
>> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>>
>> Eli
>>
>>
>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>
>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>
>>> -Prashant
>>>
>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com>
>>> wrote:
>>>
>>>> Hi Folks,
>>>> I'm currently trying to do something I figured would be trivial, but
>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>> missing something. All I want to do is get a cross product of two tuples.
>>>> So for example, given an input of:
>>>>
>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>
>>>> I'd get:
>>>>
>>>> ('hello', 'hola')
>>>> ('hello', 'bonjour')
>>>> ('howdy', 'hola')
>>>> ('howdy', 'bonjour')
>>>> ('hi', 'hola')
>>>> ('hi', 'bonjour')
>>>>
>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
>>>> good cause the tuples are first themselves put into new tuples. So, what
>>>> I'm left with no is writing a dirty and slow python udf for this. Is
>>>> there really no better way to do this? I'd think it would be a pretty
>>>> standard task.
>>>>
>>>> Eli
>>>>
>>>>


Re: Cross Product of Two Tuples?

Posted by Scott Carey <sc...@richrelevance.com>.
Isn't it 

FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
or
FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
?

The inner tuple needs to be unpacked into a list of fields.  TOBAG simply
puts each element passed in into a bag, and if you pass t1 in there, it
will be a bag with only one item.
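
A toy Python model of that behaviour, based only on what is reported in
this thread (it is not Pig's implementation, and to_bag is a made-up name):

```python
def to_bag(*fields):
    """Toy model of TOBAG: each argument becomes its own one-field tuple."""
    return [(field,) for field in fields]

t1 = ('hello', 'howdy', 'hi')

# Passing the tuple as a single argument gives a bag with one item.
whole = to_bag(t1)

# Passing the fields individually gives one item per field.
unpacked = to_bag(*t1)

print(whole)
print(unpacked)
```

The first call corresponds to TOBAG(t1) and the second to
TOBAG(t1::$0, t1::$1, t1::$2).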

On 4/4/12 11:43 AM, "Jonathan Coveney" <jc...@gmail.com> wrote:

>FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>
>2012/4/4 Eli Finkelshteyn <ie...@gmail.com>
>
>> That's for a relation only. Unless I'm missing something, it does not
>> work for tuples. What I'm doing what require a FOREACH, I'm thinking.
>>
>> Eli
>>
>>
>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>
>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>
>>> -Prashant
>>>
>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com>
>>> wrote:
>>>
>>>> Hi Folks,
>>>> I'm currently trying to do something I figured would be trivial, but
>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>> missing something. All I want to do is get a cross product of two
>>>> tuples. So for example, given an input of:
>>>>
>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>
>>>> I'd get:
>>>>
>>>> ('hello', 'hola')
>>>> ('hello', 'bonjour')
>>>> ('howdy', 'hola')
>>>> ('howdy', 'bonjour')
>>>> ('hi', 'hola')
>>>> ('hi', 'bonjour')
>>>>
>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
>>>> no good cause the tuples are first themselves put into new tuples. So,
>>>> what I'm left with no is writing a dirty and slow python udf for this.
>>>> Is there really no better way to do this? I'd think it would be a
>>>> pretty standard task.
>>>>
>>>> Eli
>>>>
>>>>
>>


Re: Cross Product of Two Tuples?

Posted by Jonathan Coveney <jc...@gmail.com>.
FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross

2012/4/4 Eli Finkelshteyn <ie...@gmail.com>

> That's for a relation only. Unless I'm missing something, it does not work
> for tuples. What I'm doing what require a FOREACH, I'm thinking.
>
> Eli
>
>
> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>
>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>
>> -Prashant
>>
>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com>
>> wrote:
>>
>>> Hi Folks,
>>> I'm currently trying to do something I figured would be trivial, but
>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>> missing something. All I want to do is get a cross product of two tuples.
>>> So for example, given an input of:
>>>
>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>
>>> I'd get:
>>>
>>> ('hello', 'hola')
>>> ('hello', 'bonjour')
>>> ('howdy', 'hola')
>>> ('howdy', 'bonjour')
>>> ('hi', 'hola')
>>> ('hi', 'bonjour')
>>>
>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
>>> good cause the tuples are first themselves put into new tuples. So, what
>>> I'm left with no is writing a dirty and slow python udf for this. Is
>>> there really no better way to do this? I'd think it would be a pretty
>>> standard task.
>>>
>>> Eli
>>>
>>>
>

Re: Cross Product of Two Tuples?

Posted by Eli Finkelshteyn <ie...@gmail.com>.
That's for a relation only. Unless I'm missing something, it does not 
work for tuples. What I'm doing would require a FOREACH, I'm thinking.
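
To illustrate the distinction, here is a small Python sketch (not Pig
code). It models a relation as a list of rows; the function names and the
sample relation are hypothetical and only stand in for the two cases.

```python
from itertools import product

def cross_relations(rel_a, rel_b):
    """Toy model of CROSS: pair every row of rel_a with every row of rel_b."""
    return [row_a + row_b for row_a, row_b in product(rel_a, rel_b)]

def cross_fields_per_row(relation):
    """Toy model of the per-row case: each row holds two tuples, t1 and t2."""
    out = []
    for t1, t2 in relation:
        out.extend(product(t1, t2))
    return out

# One row whose two fields are themselves tuples, as in this thread.
split_set = [(('hello', 'howdy', 'hi'), ('hola', 'bonjour'))]

# Row-level pairing works on whole rows, so the inner tuples stay intact.
row_pairs = cross_relations(split_set, split_set)

# Per-row pairing reaches inside the fields and yields the six pairs.
field_pairs = cross_fields_per_row(split_set)
for pair in field_pairs:
    print(pair)
```

The row-level version never looks inside a field, which is why something
that runs once per row is needed here.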

Eli

On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>
> -Prashant
>
> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com> wrote:
>
>> Hi Folks,
>> I'm currently trying to do something I figured would be trivial, but
>> actually wound up being a bit of work for me, so I'm wondering if I'm
>> missing something. All I want to do is get a cross product of two tuples.
>> So for example, given an input of:
>>
>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>
>> I'd get:
>>
>> ('hello', 'hola')
>> ('hello', 'bonjour')
>> ('howdy', 'hola')
>> ('howdy', 'bonjour')
>> ('hi', 'hola')
>> ('hi', 'bonjour')
>>
>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
>> good cause the tuples are first themselves put into new tuples. So, what
>> I'm left with no is writing a dirty and slow python udf for this. Is there
>> really no better way to do this? I'd think it would be a pretty standard
>> task.
>>
>> Eli
>>


Re: Cross Product of Two Tuples?

Posted by Prashant Kommireddi <pr...@gmail.com>.
http://pig.apache.org/docs/r0.9.1/basic.html#cross

-Prashant

On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <ie...@gmail.com> wrote:

> Hi Folks,
> I'm currently trying to do something I figured would be trivial, but
> actually wound up being a bit of work for me, so I'm wondering if I'm
> missing something. All I want to do is get a cross product of two tuples.
> So for example, given an input of:
>
> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>
> I'd get:
>
> ('hello', 'hola')
> ('hello', 'bonjour')
> ('howdy', 'hola')
> ('howdy', 'bonjour')
> ('hi', 'hola')
> ('hi', 'bonjour')
>
> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
> good cause the tuples are first themselves put into new tuples. So, what
> I'm left with no is writing a dirty and slow python udf for this. Is there
> really no better way to do this? I'd think it would be a pretty standard
> task.
>
> Eli
>