Posted to user@pig.apache.org by Christopher Surage <cs...@gmail.com> on 2014/03/25 20:01:45 UTC

Any way to join two aliases without using CROSS

I am trying to perform the following action. The only solution I have been
able to come up with uses CROSS, but I don't want to use that statement
because it is a very expensive process.

(1,2,3,4,5)          (10,11)
(1,2,4,5,7)          (10,11)
(1,5,7,8,9)          (10,11)


I want to make it
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)

any help would be much appreciated,

Chris

Re: Any way to join two aliases without using CROSS

Posted by Andrew Musselman <an...@gmail.com>.
In that situation you could write a script that tacks on the same kind of row number that RANK adds, and stream the ordered relations through it.

I'm assuming you have a sense of order on both these relations.

After that join like you would after rank.

I'm not at a computer so can't type up an example.
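
A minimal sketch of the kind of script described above, written in Python. The thread does not name a script or a language, so the function name `add_row_numbers`, the tab delimiter, and the sample rows' formatting are all assumptions made for illustration. It prepends a 1-based row number to each record, which is roughly what RANK would add.

```python
def add_row_numbers(lines, start=1, delimiter="\t"):
    """Prepend a running row number to each record, similar to what RANK adds."""
    numbered = []
    for number, line in enumerate(lines, start):
        record = line.rstrip("\n")  # drop the trailing newline, keep the fields
        numbered.append(f"{number}{delimiter}{record}")
    return numbered


# Demo on the first relation from this thread, written as tab-delimited lines
# (the delimiter is an assumption, not something stated in the thread).
sample = ["1\t2\t3\t4\t5\n", "1\t2\t4\t5\t7\n", "1\t5\t7\t8\t9\n"]
for row in add_row_numbers(sample):
    print(row)
```

Used as a streaming script, it would iterate over standard input instead of `sample` and print each numbered row. The numbers only line up across the two relations if each relation passes through a single streaming task in a known order; with several parallel tasks, each one would restart counting at 1.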

> On Mar 25, 2014, at 1:57 PM, Christopher Surage <cs...@gmail.com> wrote:
> 
> I don't think my version of Pig supports the RANK function; I keep getting
> an Internal Error. I would update it, but I am not in control of the cluster.

Re: Reply: Re: Any way to join two aliases without using CROSS

Posted by Pradeep Gollakota <pr...@gmail.com>.
Unfortunately, the Enumerate UDF from DataFu would not work in this case.
The UDF works on Bags and in this case, we want to enumerate a relation.
Implementing RANK is a very tricky thing to do correctly. I'm not even sure
if it's doable just by using Pig operators, UDFs or macros. Best option is
probably to request a Pig upgrade.


On Tue, Mar 25, 2014 at 6:21 PM, James <al...@gmail.com> wrote:

> Hello,
>
> There is a similar UDF in DataFu named Enumerate.
>
> http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html
>
> I hope it helps.
>
> James

Reply: Re: Any way to join two aliases without using CROSS

Posted by James <al...@gmail.com>.
Hello,

There is a similar UDF in DataFu named Enumerate. 
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html

I hope it helps.

James

Re: Any way to join two aliases without using CROSS

Posted by Christopher Surage <cs...@gmail.com>.
I don't think my version of Pig supports the RANK function; I keep getting
an Internal Error. I would update it, but I am not in control of the cluster.


On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> John's answer about RANK sounds like it should solve your problem

Re: Any way to join two aliases without using CROSS

Posted by Andrew Musselman <an...@gmail.com>.
John's answer about RANK sounds like it should solve your problem

> On Mar 25, 2014, at 1:13 PM, Christopher Surage <cs...@gmail.com> wrote:
> 
> @ pradeep, I know what the cross product will do, but I have many lines in
> many files. So the cross will take far too long to complete.

Re: Any way to join two aliases without using CROSS

Posted by Christopher Surage <cs...@gmail.com>.
@ pradeep, I know what the cross product will do, but I have many lines in
many files. So the cross will take far too long to complete.


On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <pr...@gmail.com> wrote:

> I don't understand what you're trying to do from your example.
>
> If you perform a cross on the data you have, the output will be the
> following:
>
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
>
> On this, you'll have to do a distinct to get what you're looking for.
>
> Let's change the example a little bit so we get a more clear understanding
> of your problem. What would be the output if your two relations looked as
> follows:
>
> (1,2,3,4,5)          (10,11)
> (1,2,4,5,7)          (10,12)
> (1,5,7,8,9)          (10,13)

Re: Any way to join two aliases without using CROSS

Posted by Christopher Surage <cs...@gmail.com>.
yes


On Tue, Mar 25, 2014 at 4:07 PM, Shahab Yunus <sh...@gmail.com> wrote:

> Oh, sorry. This new example is something different from what I understood
> before. I thought you were only trying to append one relation (with one
> tuple) to another (which has more than one tuple).
>
> So essentially you want to loop over two collections and combine their
> tuples. Are they always going to be the same size (number of tuples)?

Re: Any way to join two aliases without using CROSS

Posted by Shahab Yunus <sh...@gmail.com>.
Oh, sorry. This new example is something different from what I understood
before. I thought you were only trying to append one relation (with one
tuple) to another (which has more than one tuple).

So essentially you want to loop over two collections and combine their tuples.
Are they always going to be the same size (number of tuples)?


On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage <cs...@gmail.com> wrote:

> The output I would like to see is
>
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)

Re: Any way to join two aliases without using CROSS

Posted by John Meagher <jo...@gmail.com>.
Try this:  http://pig.apache.org/docs/r0.11.0/basic.html#rank
Rank each data set then join on the rank.
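
As a rough illustration of what "rank, then join on the rank" does to the data, here is a minimal Python sketch. It is not Pig code: the helper names `rank` and `join_on_rank` are made up for this example, and it assumes both relations are already in the order you want them paired.

```python
def rank(rows, start=1):
    """Prepend a row number to every tuple, like RANK with no BY clause."""
    return [(number,) + tuple(row) for number, row in enumerate(rows, start)]


def join_on_rank(left, right):
    """Inner-join two ranked relations on the row number, then drop it."""
    right_by_rank = {row[0]: row[1:] for row in right}
    return [
        row[1:] + right_by_rank[row[0]]
        for row in left
        if row[0] in right_by_rank
    ]


a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]
joined = join_on_rank(rank(a), rank(b))
for row in joined:
    print(row)
```

Each row is touched once on each side, so the work grows with the number of rows rather than with the product of the two sizes. Because this is an inner join, rows without a partner are dropped when the two relations differ in length.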

On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage <cs...@gmail.com> wrote:
> The output I would like to see is
>
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)

Re: Any way to join two aliases without using CROSS

Posted by Pradeep Gollakota <pr...@gmail.com>.
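CROSS is by definition a very expensive operation: it pairs every row of one
relation with every row of the other, so relations of n and m rows produce
n x m output rows. Regardless, CROSS is the wrong operator for what you're
trying to do.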

As was suggested by others, you want to RANK the relations then do a JOIN
by the rank.


On Tue, Mar 25, 2014 at 1:27 PM, <wi...@thomsonreuters.com> wrote:

> Here is how to use rank and join for this problem:
>
> sh cat xxx
> 1,2,3,4,5
> 1,2,4,5,7
> 1,5,7,8,9
>
> sh cat yyy
> 10,11
> 10,12
> 10,13
>
>
> a= load 'xxx' using PigStorage(',');
> b= load 'yyy' using PigStorage(',');
>
> a2 = rank a;
> b2 = rank b;
>
> c = join a2 by $0, b2 by $0;
> c2 = order c by $6;
> c3 = foreach c2 generate $1 .. $5, $7 ..;
>
> dump c3
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> William F Dowling
> Senior Technologist
> Thomson Reuters

RE: Any way to join two aliases without using CROSS

Posted by wi...@thomsonreuters.com.
Here is how to use rank and join for this problem:

sh cat xxx
1,2,3,4,5
1,2,4,5,7
1,5,7,8,9

sh cat yyy
10,11
10,12
10,13


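a = load 'xxx' using PigStorage(',');
b = load 'yyy' using PigStorage(',');

-- rank prepends a row number as field $0 of each relation
a2 = rank a;
b2 = rank b;

-- join on the row numbers; in c, $0 and $6 are the two rank columns
c = join a2 by $0, b2 by $0;
c2 = order c by $6;
-- keep only the data fields from both sides
c3 = foreach c2 generate $1 .. $5, $7 ..;

dump c3;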
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)


William F Dowling
Senior Technologist
Thomson Reuters


-----Original Message-----
From: Christopher Surage [mailto:csurage@gmail.com] 
Sent: Tuesday, March 25, 2014 4:03 PM
To: user@pig.apache.org
Subject: Re: Any way to join two aliases without using CROSS

The output I would like to see is

(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)



Re: Any way to join two aliases without using CROSS

Posted by Christopher Surage <cs...@gmail.com>.
The output I would like to see is

(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)


On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <pr...@gmail.com> wrote:

> I don't understand what you're trying to do from your example.
>
> If you perform a cross on the data you have, the output will be the
> following:
>
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
>
> On this, you'll have to do a distinct to get what you're looking for.
>
> Let's change the example a little bit so we get a more clear understanding
> of your problem. What would be the output if your two relations looked as
> follows:
>
> (1,2,3,4,5)          (10,11)
> (1,2,4,5,7)          (10,12)
> (1,5,7,8,9)          (10,13)

Re: Any way to join two aliases without using CROSS

Posted by Pradeep Gollakota <pr...@gmail.com>.
I don't understand what you're trying to do from your example.

If you perform a cross on the data you have, the output will be the
following:

(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)

On this, you'll have to do a distinct to get what you're looking for.

Let's change the example a little bit so we get a more clear understanding
of your problem. What would be the output if your two relations looked as
follows:

(1,2,3,4,5)          (10,11)
(1,2,4,5,7)          (10,12)
(1,5,7,8,9)          (10,13)
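
To make the difference between the two examples concrete, here is a small Python sketch of what a cross product does with each pair of relations. It only models the row counts; `cross` is a helper written for this illustration, not a Pig function.

```python
from itertools import product


def cross(left, right):
    """Pair every row of `left` with every row of `right`."""
    return [x + y for x, y in product(left, right)]


left = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
right_same = [(10, 11), (10, 11), (10, 11)]  # original example
right_diff = [(10, 11), (10, 12), (10, 13)]  # changed example

crossed_same = cross(left, right_same)
crossed_diff = cross(left, right_diff)

# 9 rows, only 3 of them distinct: DISTINCT recovers the wanted output.
print(len(crossed_same), len(set(crossed_same)))
# 9 rows, all 9 distinct: DISTINCT no longer helps.
print(len(crossed_diff), len(set(crossed_diff)))
```

With identical rows on the right, a DISTINCT after the cross happens to give the desired three rows. Once the right-hand rows differ, all nine combinations are distinct, which is why the answer to the question above determines whether CROSS can work at all.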


On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <sh...@gmail.com> wrote:

> Have you tried iterating over the first relation and, in the nested
> *generate* clause, always appending the second relation? The top-level
> loop runs over the first relation, while the nested block effectively
> hardcodes appending the second relation.
>
> I am referring to examples like those in the "Example: Nested Blocks" section:
> http://pig.apache.org/docs/r0.10.0/basic.html#foreach

Re: Any way to join two aliases without using CROSS

Posted by Shahab Yunus <sh...@gmail.com>.
Have you tried iterating over the first relation and, in the nested
*generate* clause, always appending the second relation? The top-level
loop runs over the first relation, while the nested block effectively
hardcodes appending the second relation.

I am referring to examples like those in the "Example: Nested Blocks" section:
http://pig.apache.org/docs/r0.10.0/basic.html#foreach


On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <cs...@gmail.com> wrote:

> I am trying to perform the following action. The only solution I have been
> able to come up with uses CROSS, but I don't want to use that statement
> because it is a very expensive process.
>
> (1,2,3,4,5)          (10,11)
> (1,2,4,5,7)          (10,11)
> (1,5,7,8,9)          (10,11)
>
>
> I want to make it
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,11)
> (1,5,7,8,9,10,11)
>
> any help would be much appreciated,
>
> Chris
>