Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2011/04/08 00:52:05 UTC

[jira] [Commented] (PIG-1848) Confusing statement for Merge Join -> Both Conditions in Pig reference manual1

    [ https://issues.apache.org/jira/browse/PIG-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017193#comment-13017193 ] 

Olga Natkovich commented on PIG-1848:
-------------------------------------

Indeed the section makes very little sense. 

Corinne, let's eliminate the section completely and instead add a section called Performance Considerations containing two observations:

(1) If one of the data sets is small enough to fit into memory, Replicated Join is very likely to provide better performance.
(2) You will also see better performance if the data in the left table is partitioned evenly across part files (no significant skew, and each part file contains at least one full block of data).
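
To make observation (2) concrete, here is a rough sketch of how one might check a list of part file sizes against both conditions. The function name check_part_files and the max_skew threshold of 2.0 are assumptions made purely for illustration; nothing in Pig defines what counts as "significant" skew. The 128 MB default is only the example block size from the quoted manual text.

```python
MB = 1024 * 1024


def check_part_files(sizes_bytes, block_size=128 * MB, max_skew=2.0):
    """Return a list of problems found; an empty list means the layout looks fine.

    sizes_bytes: sizes of the part files of the left (sorted) input, in bytes.
    block_size:  HDFS block size in bytes (128 MB is just the example figure).
    max_skew:    hypothetical threshold for largest/smallest part file ratio.
    """
    if not sizes_bytes:
        return ["no part files given"]

    problems = []

    # Condition A: each part file contains at least one full block of data.
    for index, size in enumerate(sizes_bytes):
        if size < block_size:
            problems.append(
                f"part file {index} is smaller than one block ({size} < {block_size})"
            )

    # Condition B: no significant skew between the part files.
    smallest, largest = min(sizes_bytes), max(sizes_bytes)
    if smallest == 0 or largest / smallest > max_skew:
        problems.append("part file sizes are significantly skewed")

    return problems


if __name__ == "__main__":
    # One undersized part file next to a much larger one.
    for problem in check_part_files([64 * MB, 512 * MB]):
        print(problem)
```

The sizes would have to be collected separately, for example from a directory listing of the left input. The ratio test is only one possible way to express skew.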

> Confusing statement for Merge Join -> Both Conditions in Pig reference manual1
> ------------------------------------------------------------------------------
>
>                 Key: PIG-1848
>                 URL: https://issues.apache.org/jira/browse/PIG-1848
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Vivek Padmanabhan
>            Assignee: Olga Natkovich
>             Fix For: 0.9.0
>
>
> In the Pig reference manual, http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins,
> for merge join under Both Conditions, the example statement is confusing.
> {quote}
> Both Conditions
> For optimal performance, each part file of the left (sorted) input of the join should have a size of at least 1 hdfs block size (for example if the hdfs block size is 128 MB, each part file should be less than 128 MB). 
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira