Posted to dev@pig.apache.org by "William Watson (JIRA)" <ji...@apache.org> on 2015/03/12 19:28:38 UTC

[jira] [Commented] (PIG-4458) Support UDFs in a FOREACH Before a Merge Join

    [ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359126#comment-14359126 ] 

William Watson commented on PIG-4458:
-------------------------------------

I should explain a bit more. Right now, you can run a FOREACH that changes the position of the join key. That also produces wrong join results, but this validation doesn't check for it.

IMO, we should just document that users shouldn't apply a UDF to a JOIN key or change the position of a JOIN key, and then let UDFs work here by removing the !containsUDFs requirement.
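To make the two unsafe cases concrete (a UDF on the join key, and a moved join key), here is a minimal sketch of what a check for the documented rule could look like. It assumes a hypothetical, simplified model where each GENERATE column is either a plain projection of one input position or a UDF over one input position, and where the join key must keep the same position it has in the sorted input. `JoinKeyPlacement`, `Column`, and `keyPreserved` are illustrative names only, not Pig's API.

```java
import java.util.List;

// Hypothetical, simplified model of a FOREACH ... GENERATE list (NOT Pig's LOForEach API).
final class JoinKeyPlacement {

    /** One GENERATE column: the input position it reads, and whether a UDF computes it. */
    record Column(int sourcePosition, boolean isUdf) {
        static Column pass(int source) { return new Column(source, false); }
        static Column udf(int source) { return new Column(source, true); }
    }

    /**
     * True only if every join-key position in the output is a plain projection of
     * the same position in the (sorted) input: no UDF on the key and no relocation.
     * Non-key columns are unrestricted, so UDFs on them are allowed.
     */
    static boolean keyPreserved(List<Column> generate, List<Integer> keyPositions) {
        for (int pos : keyPositions) {
            if (pos >= generate.size()) {
                return false; // key column was dropped
            }
            Column c = generate.get(pos);
            if (c.isUdf() || c.sourcePosition() != pos) {
                return false; // key transformed by a UDF, or moved to another position
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> key = List.of(0);
        // GENERATE $0, SOME_UDF($1): key untouched and in place
        System.out.println(keyPreserved(List.of(Column.pass(0), Column.udf(1)), key));  // true
        // GENERATE $1, $0: key moved from position 0 to position 1
        System.out.println(keyPreserved(List.of(Column.pass(1), Column.pass(0)), key)); // false
        // GENERATE SOME_UDF($0), $1: UDF applied to the key
        System.out.println(keyPreserved(List.of(Column.udf(0), Column.pass(1)), key));  // false
    }
}
```

Unlike the current all-or-nothing `!containsUDFs` check, this style of check would also catch the relocation case, which the validator does not look at today.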

> Support UDFs in a FOREACH Before a Merge Join
> ---------------------------------------------
>
>                 Key: PIG-4458
>                 URL: https://issues.apache.org/jira/browse/PIG-4458
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: William Watson
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo) throws LogicalToPhysicalTranslatorException {
>     if (lo instanceof LOForEach) {
>         OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>         validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>         return !containsUDFs((LOForEach) lo);
>     } else {
>         return false;
>     }
> }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should either implement the TODO and relax this requirement, or remove it altogether.
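The relaxed rule described in the TODO (trace which fields are passed into each UDF and refuse only when a UDF works on the join key) can be sketched roughly as follows. This is a minimal, hypothetical model, not Pig's `LOForEach`/expression-plan classes: each GENERATE expression is reduced to the input columns it reads plus a UDF flag, and "working on the join key" is interpreted conservatively as "takes a join-key column as an argument". `RelaxedUdfRule`, `Expr`, and `isAcceptable` are illustrative names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a FOREACH ... GENERATE list. These types
// are NOT Pig's LOForEach / LogicalExpressionPlan classes.
final class RelaxedUdfRule {

    /** One GENERATE expression: whether it is a UDF call, and the input columns it reads. */
    record Expr(boolean isUdf, List<Integer> inputColumns) {
        static Expr project(int column) {
            return new Expr(false, List.of(column));
        }

        static Expr udf(Integer... columns) {
            return new Expr(true, List.of(columns));
        }
    }

    /**
     * Relaxed rule from the TODO: refuse the FOREACH only if some UDF reads one of
     * the join-key input columns; UDFs over other fields are accepted.
     */
    static boolean isAcceptable(List<Expr> generate, Set<Integer> joinKeyColumns) {
        for (Expr expr : generate) {
            if (!expr.isUdf()) {
                continue; // plain projections never transform a value
            }
            for (int column : expr.inputColumns()) {
                if (joinKeyColumns.contains(column)) {
                    return false; // this UDF works on the join key
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> key = Set.of(0);
        // GENERATE $0, UPPER($1): UDF touches only a non-key field
        System.out.println(isAcceptable(List.of(Expr.project(0), Expr.udf(1)), key));    // true
        // GENERATE $0, CONCAT($1, $2): still no key column passed to a UDF
        System.out.println(isAcceptable(List.of(Expr.project(0), Expr.udf(1, 2)), key)); // true
        // GENERATE TRIM($0), $1: UDF rewrites the join key
        System.out.println(isAcceptable(List.of(Expr.udf(0), Expr.project(1)), key));    // false
    }
}
```

Note that this rule alone does not detect a FOREACH that merely moves the join key to another position, so it would still need to be paired with a placement check or with documentation of that restriction.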



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)