Posted to dev@pig.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2009/12/02 12:04:20 UTC

[jira] Commented: (PIG-1116) Remove redundant map-reduce job for merge join

    [ https://issues.apache.org/jira/browse/PIG-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784763#action_12784763 ] 

Hadoop QA commented on PIG-1116:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426637/PIG-1116.patch
  against trunk revision 886015.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/74/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/74/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/74/console

This message is automatically generated.

> Remove redundant map-reduce job for merge join
> ----------------------------------------------
>
>                 Key: PIG-1116
>                 URL: https://issues.apache.org/jira/browse/PIG-1116
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Pradeep Kamath
>             Fix For: 0.6.0
>
>         Attachments: PIG-1116.patch
>
>
> In merge join, when we convert the right-hand-side file into a side file, we do not remove it from the map-reduce plan; we only disconnect it from the plan. When we run the query, the redundant load still loads the data but does nothing with it. This operation should be removed entirely.
> For example:
> a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, gpa);
> b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, registration, contributions);
> c = join a by name, b by name using "merge";
> explain c;
> {code}
> #--------------------------------------------------
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-21
> Map Plan
> Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/votersortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-13--------
> Global sort: false
> ----------------
> MapReduce node 1-20
> Map Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19
> |
> |---MergeJoin[tuple] - 1-16
>     |
>     |---Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/studentsortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-12--------
> Global sort: false
> ----------------
> {code}
> MapReduce node 1-21 should be removed.
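
The difference between disconnecting a node and removing it can be illustrated with a toy plan graph. This is only a minimal sketch: ToyPlan and build_plan are hypothetical names, not Pig's plan classes, and the edge from 1-21 to 1-20 is assumed for illustration. Only the node ids come from the explain output quoted above.

```python
# Toy model of a map-reduce plan. ToyPlan is hypothetical, NOT Pig's real API.
class ToyPlan:
    def __init__(self):
        self.nodes = []     # job ids, in insertion order
        self.edges = set()  # (upstream, downstream) pairs

    def add(self, node):
        self.nodes.append(node)

    def connect(self, src, dst):
        self.edges.add((src, dst))

    def disconnect(self, src, dst):
        # Drops only the edge; both nodes stay in the plan.
        self.edges.discard((src, dst))

    def remove(self, node):
        # Drops the node and every edge touching it.
        self.nodes.remove(node)
        self.edges = {e for e in self.edges if node not in e}


def build_plan():
    plan = ToyPlan()
    plan.add("1-21")  # job holding the right-hand-side load
    plan.add("1-20")  # job holding the merge join
    plan.connect("1-21", "1-20")  # assumed edge, for illustration only
    return plan


# Disconnect only: 1-21 is orphaned but still listed, so it would still run.
buggy = build_plan()
buggy.disconnect("1-21", "1-20")
print(buggy.nodes)  # ['1-21', '1-20']

# Remove: 1-21 is gone from the plan entirely.
fixed = build_plan()
fixed.remove("1-21")
print(fixed.nodes)  # ['1-20']
```

In this sketch the disconnected node remains in the node list, which mirrors the redundant "MapReduce node 1-21" that still appears in the explain output.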
