Posted to dev@pig.apache.org by Vasco Visser <va...@gmail.com> on 2012/09/07 15:32:09 UTC

POCollectedGroup and LoadFunc indicator interface

Hi,

I am new to the list. I've been working on the Pig code base,
adding my own blocking map-side POs (e.g., map-side join, map-side
grouping) for cases where assertions can be made about the
fragmentation of input relations. This is partly inspired by the new
block placement policy possibilities in hadoop-2.

Anyway, my question to the list is the following. While looking at
the code for POCollectedGroup I noticed that this PO expects split
content to be sorted. On the other hand, the Collectable loader
interface only seems to indicate that keys are unique across splits.
Why is there this discrepancy? Is there a good reason not to have an
indicator interface that captures all input requirements, e.g.,
something like OrderedCollectableLoadFunc?
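
To make the proposal concrete, here is a minimal sketch of what such an indicator interface might look like. It is an illustration only, not code from Pig: the CollectableLoadFunc below is a local stand-in modeled on Pig's interface (the real declaration may differ), OrderedCollectableLoadFunc takes its name from this thread but does not exist in Pig, and the two loader classes and the helper method are hypothetical.

```java
import java.io.IOException;

// Local stand-in for Pig's collectable loader interface (assumption: the
// real declaration may differ). Its contract: all instances of a key end
// up in the same split.
interface CollectableLoadFunc {
    void ensureAllKeyInstancesInSameSplit() throws IOException;
}

// Hypothetical marker interface, named after the suggestion in this thread.
// Its contract adds a second guarantee: records within a split are sorted
// by key. It declares no methods; implementing it is the assertion.
interface OrderedCollectableLoadFunc extends CollectableLoadFunc {
}

// Hypothetical loader that only guarantees key co-location.
class KeyColocatedOnlyLoader implements CollectableLoadFunc {
    @Override
    public void ensureAllKeyInstancesInSameSplit() {
        // nothing to do in this sketch
    }
}

// Hypothetical loader that guarantees co-location and per-split ordering.
class SortedPartFileLoader implements OrderedCollectableLoadFunc {
    @Override
    public void ensureAllKeyInstancesInSameSplit() {
        // nothing to do in this sketch
    }
}

final class IndicatorSketch {
    // A planner-side check could then test for the stronger interface
    // before choosing a map-side collected group.
    static boolean satisfiesCollectedGroupInput(Object loader) {
        return loader instanceof OrderedCollectableLoadFunc;
    }
}
```

With this shape, a loader that only co-locates keys would not be accepted for a collected group, while one that also asserts ordering would.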


regards,
Vasco

Re: POCollectedGroup and LoadFunc indicator interface

Posted by Alan Gates <ga...@hortonworks.com>.
You are correct, this would be better named OrderedCollectableLoadFunc. I suspect this happened because the interface is usually used on the output of MapReduce jobs. In that case (at least in MR1) the keys are sorted as well as guaranteed to be in a particular part file.
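
To illustrate why the ordering guarantee matters in addition to key placement, here is a minimal sketch of a single-pass, streaming group over one split. It is not Pig's POCollectedGroup implementation; the class and method names are hypothetical, and keys and values are simplified to strings and integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class CollectedGroupSketch {
    // One output group: a key plus the values from one consecutive run of it.
    static final class Group {
        final String key;
        final List<Integer> values = new ArrayList<>();

        Group(String key) {
            this.key = key;
        }
    }

    // Streams over the records of a single split and starts a new group
    // whenever the key changes. Only the current group is held open, so the
    // result is correct only if equal keys are adjacent (e.g. sorted input).
    static List<Group> groupConsecutive(List<String> keys, List<Integer> values) {
        List<Group> out = new ArrayList<>();
        Group current = null;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            if (current == null || !Objects.equals(current.key, key)) {
                current = new Group(key);
                out.add(current);
            }
            current.values.add(values.get(i));
        }
        return out;
    }
}
```

On keys a, a, b this produces two groups. On a, b, a it produces three, with key a split across two groups, even though every instance of a is in the same split.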

Alan.
