Posted to user@pig.apache.org by John <jo...@gmail.com> on 2013/09/13 18:37:38 UTC

Problem while using merge join

Hi,

I am trying to use a merge join on two bags. Here is my Pig code:
http://pastebin.com/Y9b2UtNk .

But I get this error:

Caused by:
org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
Sort, or Load as its predecessors. Found

I think the reason is that there is no sort step or something like
that. But the bags are definitely sorted. How can I do the merge join?

thanks

Re: Problem while using merge join

Posted by John <jo...@gmail.com>.
As far as I know that is not possible, because I have to extend the
LoadFunc class, and this class requires this method:

 public abstract Tuple getNext() throws IOException;

Or do you have another idea?

And yes, the columns are sorted. My modified HBaseStorage load function
loads only one row (in every case), so there are no conflicts with other
rows, because there are no other rows :)

By the way, the setBatch(1) workaround works fine so far. It's not faster,
but it's also not slower, so it's okay for me. The merge join works now too.
At first I had exactly the same error as described here:
https://issues.apache.org/jira/browse/PIG-2495 ... but after adding the
lines from the patch, the merge join worked.

There is one issue left. If I try to join the joined bag with another
bag, I get this exception:

Caused by:
org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
Sort, or Load as its predecessors. Found :

Here is the Pig program: http://pastebin.com/BeziRrdD . After the first
merge join the bag is sorted, or am I wrong? Or do I have to execute a
sort after it? Normally I would join the 3 bags with a multi-way join, but
multi-way joins don't work with the merge feature.

regards,
john


2013/9/13 Pradeep Gollakota <pr...@gmail.com>

> I think a better option is to completely bypass the HBaseStorage mechanism.
> Since you've already modified it, just put your 2nd UDF in there and have
> it return the data that you need right away.
>
> Another question I have is, are you absolutely positive that your data will
> continue to be sorted if you've projected away the row key? The columns are
> only sorted intra-row.

Re: Problem while using merge join

Posted by Pradeep Gollakota <pr...@gmail.com>.
I think a better option is to completely bypass the HBaseStorage mechanism.
Since you've already modified it, just put your 2nd UDF in there and have
it return the data that you need right away.

Another question I have is, are you absolutely positive that your data will
continue to be sorted if you've projected away the row key? The columns are
only sorted intra-row.
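
The intra-row point can be illustrated with a small plain-Java sketch. It does not use Pig or HBase; the class name, method names, and sample column names are all made up for illustration. It only shows that columns which are sorted inside each row need not be sorted once the row key is dropped and the rows are read one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Pig or HBase.
public class IntraRowSortSketch {

    /** Returns true if no key is smaller than the key before it. */
    public static boolean isSortedAscending(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /** Concatenates the columns of all rows in row order, dropping the row key. */
    public static List<String> projectColumns(List<List<String>> rows) {
        List<String> out = new ArrayList<>();
        for (List<String> row : rows) {
            out.addAll(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data: each list holds one row's columns, sorted within the row.
        List<String> row1 = Arrays.asList("a", "c", "e");
        List<String> row2 = Arrays.asList("b", "d");
        System.out.println(isSortedAscending(row1));
        System.out.println(isSortedAscending(row2));
        // The concatenation a, c, e, b, d is checked as a whole here.
        System.out.println(isSortedAscending(projectColumns(Arrays.asList(row1, row2))));
    }
}
```

With a single row, as in the setup described elsewhere in this thread, the concatenation is just that row, so the order is preserved.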


On Fri, Sep 13, 2013 at 12:06 PM, John <jo...@gmail.com> wrote:

> Sure, loading is not as fast, but on the other hand I can save the
> foreach operation after the load function. The best way would be to get all
> columns and return a bag, but I see no way to do that because the LoadFunc
> returns a Tuple, not a Bag. I will try this way and see how fast it is. If
> there are other ideas to make it faster, I will try them.
>
> regards,
> john

Re: Problem while using merge join

Posted by John <jo...@gmail.com>.
Sure, loading is not as fast, but on the other hand I can save the
foreach operation after the load function. The best way would be to get all
columns and return a bag, but I see no way to do that because the LoadFunc
returns a Tuple, not a Bag. I will try this way and see how fast it is. If
there are other ideas to make it faster, I will try them.

regards,
john


2013/9/13 Shahab Yunus <sh...@gmail.com>

> Wouldn't this slow down your data retrieval? One column in each call
> instead of a batch?
>
> Regards,
> Shahab

Re: Problem while using merge join

Posted by Shahab Yunus <sh...@gmail.com>.
Wouldn't this slow down your data retrieval? One column in each call
instead of a batch?

Regards,
Shahab


On Fri, Sep 13, 2013 at 2:34 PM, John <jo...@gmail.com> wrote:

> I think I might have found a way to transform it directly into a bag.
> Inside the HBaseStorage() load function I have set the HBase scan batch to
> 1, so every scan.next() returns one column instead of all columns. See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
>
> setBatch(int batch)
> Set the maximum number of values to return for each call to next()
>
> I think this will work. Any idea whether this approach has disadvantages?
>
> regards

Re: Problem while using merge join

Posted by John <jo...@gmail.com>.
I think I might have found a way to transform it directly into a bag.
Inside the HBaseStorage() load function I have set the HBase scan batch to
1, so every scan.next() returns one column instead of all columns. See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html

setBatch(int batch)
Set the maximum number of values to return for each call to next()

I think this will work. Any idea whether this approach has disadvantages?
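
A minimal sketch of what the batch setting changes, written as a plain-Java simulation rather than real HBase client code. The class and method names are hypothetical; only the documented meaning of setBatch (a cap on the number of values returned per next() call) is taken from the text above. It shows how many simulated next() calls one row needs for a given batch size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation class; it does not call the HBase API.
public class ScanBatchSketch {

    /**
     * Splits one row's columns into chunks of at most `batch` columns.
     * Each chunk stands for what a single simulated next() call would return.
     */
    public static List<List<String>> nextCalls(List<String> rowColumns, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be >= 1");
        }
        List<List<String>> calls = new ArrayList<>();
        for (int start = 0; start < rowColumns.size(); start += batch) {
            int end = Math.min(start + batch, rowColumns.size());
            calls.add(new ArrayList<>(rowColumns.subList(start, end)));
        }
        return calls;
    }

    public static void main(String[] args) {
        // Assumed sample row with three columns.
        List<String> row = Arrays.asList("col1", "col2", "col3");
        // Batch of 1: one column per simulated call, so three calls for this row.
        System.out.println(nextCalls(row, 1));
        // Batch of 2: two calls, the second one holding the leftover column.
        System.out.println(nextCalls(row, 2));
    }
}
```

The trade-off is visible in the call count: a batch of 1 gives the loader one column per call, at the price of one call per column for every row.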

regards


2013/9/13 John <jo...@gmail.com>

> hi,
>
> the join key is in the bag, that's the problem. The load function returns
> only one element, $0, and that is the map. This map is transformed in the
> next step into a bag by the UDF "MapToBagUDF". For example, the load
> function returns ([col1,col2,col3]), and then this map inside the tuple is
> transformed to:
>
> (col1)
> (col2)
> (col3)
>
> Maybe there is a way to transform the map directly into a bag in the load
> function? The problem I see is that the getNext() method in LoadFunc has
> to return a Tuple, not a Bag. :/

Re: Problem while using merge join

Posted by John <jo...@gmail.com>.
hi,

the join key is in the bag, that's the problem. The load function returns
only one element, $0, and that is the map. This map is transformed in the
next step into a bag by the UDF "MapToBagUDF". For example, the load
function returns ([col1,col2,col3]), and then this map inside the tuple is
transformed to:

(col1)
(col2)
(col3)

Maybe there is a way to transform the map directly into a bag in the load
function? The problem I see is that the getNext() method in LoadFunc has
to return a Tuple, not a Bag. :/
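
One way around the "one Tuple per call" constraint is to let the loader hand out one column per getNext() call instead of one map per row, so no bag and no UDF are needed afterwards. The following is a plain-Java sketch of that idea only. It does not extend Pig's LoadFunc; the class name is hypothetical, a List<Object> stands in for a Pig Tuple, and the rows are assumed to be available as in-memory maps.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a loader; it mimics the getNext() contract without using Pig.
public class ColumnPerTupleSketch {
    private final Iterator<Map<String, String>> rows;
    private Iterator<String> columns = Collections.emptyIterator();

    public ColumnPerTupleSketch(List<Map<String, String>> rows) {
        this.rows = rows.iterator();
    }

    /** Returns a one-field record holding the next column name, or null when the input is exhausted. */
    public List<Object> getNext() {
        // Move on to the next row whenever the current row has no columns left.
        while (!columns.hasNext()) {
            if (!rows.hasNext()) {
                return null;
            }
            columns = rows.next().keySet().iterator();
        }
        return Collections.<Object>singletonList(columns.next());
    }

    public static void main(String[] args) {
        // Assumed sample row: column name -> value, in insertion order.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("col1", "v1");
        row.put("col2", "v2");
        row.put("col3", "v3");
        ColumnPerTupleSketch loader = new ColumnPerTupleSketch(Collections.singletonList(row));
        List<Object> tuple;
        while ((tuple = loader.getNext()) != null) {
            System.out.println(tuple);
        }
    }
}
```

In a real loader the same bookkeeping would sit between the record reader and the returned Tuple; how that fits into a modified HBaseStorage is not shown here.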


2013/9/13 Pradeep Gollakota <pr...@gmail.com>

> Since your join key is not in the Bag, can you do your join first and then
> execute your UDF?

Re: Problem while using merge join

Posted by Pradeep Gollakota <pr...@gmail.com>.
Since your join key is not in the Bag, can you do your join first and then
execute your UDF?


On Fri, Sep 13, 2013 at 10:04 AM, John <jo...@gmail.com> wrote:

> Okay, I think I have found the problem here:
> http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... where it is
> written:
>
> There may be filter statements and foreach statements between the sorted
> data source and the join statement. The foreach statement should meet the
> following conditions:
>
>    - There should be no UDFs in the foreach statement.
>    - The foreach statement should not change the position of the join keys.
>    - There should be no transformation on the join keys which will change
>    the sort order.
>
>
> I have to use a UDF to transform the Map into a Bag ... any idea for a
> workaround?
>
> thanks

Re: Problem while using merge join

Posted by John <jo...@gmail.com>.
Okay, I think I have found the problem here:
http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... where it is
written:

There may be filter statements and foreach statements between the sorted
data source and the join statement. The foreach statement should meet the
following conditions:

   - There should be no UDFs in the foreach statement.
   - The foreach statement should not change the position of the join keys.
   - There should be no transformation on the join keys which will change
   the sort order.
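
These conditions make more sense with the general sort-merge join idea in mind: both inputs are walked once, in step, which only finds every match if the join keys arrive in ascending order. Below is a minimal plain-Java sketch of that general technique. It is not Pig's implementation, and the class name, method name, and sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a sort-merge join on key lists; not Pig code.
public class MergeJoinSketch {

    /**
     * Joins two key lists that are assumed to be sorted ascending.
     * Returns the matching key once per matching (left, right) pair.
     */
    public static List<String> mergeJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++; // left key is smaller, it cannot match anything further right
            } else if (cmp > 0) {
                j++; // right key is smaller, skip it
            } else {
                // Equal keys: find the run of duplicates on both sides and pair them up.
                String key = left.get(i);
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).equals(key)) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).equals(key)) {
                    jEnd++;
                }
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> right = Arrays.asList("b", "c", "d");
        // Sorted left input: both common keys are found.
        System.out.println(mergeJoin(Arrays.asList("a", "b", "d"), right));
        // Unsorted left input: the single pass skips past "b" and misses that match.
        System.out.println(mergeJoin(Arrays.asList("d", "b", "a"), right));
    }
}
```

A foreach step that reorders or rewrites the join key could break the ascending order this single pass relies on, which is in line with the restrictions quoted above.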


I have to use a UDF to transform the Map into a Bag ... any idea for a workaround?

thanks


2013/9/13 John <jo...@gmail.com>

> Hi,
>
> I am trying to use a merge join on two bags. Here is my Pig code:
> http://pastebin.com/Y9b2UtNk .
>
> But I get this error:
>
> Caused by:
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
> ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
> Sort, or Load as its predecessors. Found
>
> I think the reason is that there is no sort step or something like
> that. But the bags are definitely sorted. How can I do the merge join?
>
> thanks
>