Posted to dev@pig.apache.org by "Corinne Chandel (JIRA)" <ji...@apache.org> on 2009/12/04 00:26:21 UTC

[jira] Commented: (PIG-1081) PigCookBook use of PARALLEL keyword

    [ https://issues.apache.org/jira/browse/PIG-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785626#action_12785626 ] 

Corinne Chandel commented on PIG-1081:
--------------------------------------

Discussed with Viraj.

Documentation changes made.

Changes included in pig-6.patch attached to PIG-1084.

https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12440363

> PigCookBook use of PARALLEL keyword
> -----------------------------------
>
>                 Key: PIG-1081
>                 URL: https://issues.apache.org/jira/browse/PIG-1081
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.5.0
>            Reporter: Viraj Bhat
>             Fix For: 0.5.0
>
>
> Hi all,
>  I am looking at some tips for optimizing Pig programs (Pig Cookbook) using the PARALLEL keyword.
> http://hadoop.apache.org/pig/docs/r0.5.0/cookbook.html#Use+PARALLEL+Keyword 
> We know that Pig 0.5 currently uses Hadoop 0.20 as its default, which launches a single reducer for all jobs unless told otherwise.
> In this documentation we state that the number of reducers should be set to <num machines> * <num reduce slots per machine> * 0.9. That guidance was valid for HoD (Hadoop on Demand), where you create your own Hadoop cluster, but if you are using the Capacity Scheduler http://hadoop.apache.org/common/docs/current/capacity_scheduler.html or the Fair Share Scheduler http://hadoop.apache.org/common/docs/current/fair_scheduler.html , this formula could mean that you end up using around 90% of the reducer slots on the cluster.
> We should change this to something like: 
> The number of reducers you may need for a particular construct in Pig that forms a Map-Reduce boundary depends entirely on your data and the number of intermediate keys your mappers generate. In the best cases we have seen, a reducer processing about 500 MB of data behaves efficiently. It is also hard to define the optimum number of reducers, since it depends entirely on the partitioner and the distribution of map (combiner) output keys.
> Viraj
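
As a point of reference, here is a minimal Pig Latin sketch of how the PARALLEL clause is attached to an operator that forms a Map-Reduce boundary (a GROUP in this case). The input path, schema, and reducer count are illustrative assumptions; in practice the count is usually sized from the intermediate data (roughly one reducer per ~500 MB of map output, per the comment above).

logs = LOAD 'logs.txt' AS (user:chararray, bytes:long);   -- hypothetical input and schema
-- PARALLEL sets the number of reduce tasks for this GROUP;
-- on Hadoop 0.20, omitting it falls back to a single reducer.
grpd = GROUP logs BY user PARALLEL 10;                     -- 10 is an illustrative value
sums = FOREACH grpd GENERATE group AS user, SUM(logs.bytes) AS total;
STORE sums INTO 'bytes_per_user';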

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.