Posted to dev@pig.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2011/01/19 10:00:47 UTC

[jira] Created: (PIG-1810) Prioritize hadoop parameter "mapred.reduce.task" above estimation of reducer number

Prioritize hadoop parameter "mapred.reduce.task" above estimation of reducer number
-----------------------------------------------------------------------------------

                 Key: PIG-1810
                 URL: https://issues.apache.org/jira/browse/PIG-1810
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Jeff Zhang
            Assignee: Jeff Zhang
             Fix For: 0.9.0


Anup pointed out this problem in PIG-1249:
{quote}
Anup added a comment - 18/Jan/11 07:46 PM
one thing that we didn't take care of is the use of the hadoop parameter "mapred.reduce.tasks".
If I specify the hadoop parameter -Dmapred.reduce.tasks=450 for all the MR jobs, it is overwritten by estimateNumberOfReducers(conf, mro), which in my case is 15.
I am not specifying any default_parallel or PARALLEL statements.
Ideally, the number of reducers should be 450.

I think we should prioritize this parameter above the estimate reducers calculations.
The priority list should be

1. PARALLEL statement
2. default_parallel statement
3. mapred.reduce.tasks hadoop parameter
4. estimateNumberOfReducers()


{quote}
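
The proposed priority order can be sketched as a simple fall-through resolution. This is an illustrative sketch only, not Pig's actual code: the method and parameter names are hypothetical, and each source is assumed to report -1 (or 0) when unset, as Pig does for unset parallelism.

```java
// Hypothetical sketch of the proposed priority order for picking the
// number of reducers. Names are illustrative, not Pig's real API.
public class ReducerParallelismSketch {

    static int chooseParallelism(int operatorParallel,   // 1. PARALLEL clause on the operator, -1 if unset
                                 int defaultParallel,    // 2. default_parallel, -1 if unset
                                 int mapredReduceTasks,  // 3. mapred.reduce.tasks from the job conf, -1 if unset
                                 int estimated) {        // 4. result of reducer estimation
        if (operatorParallel > 0) return operatorParallel;
        if (defaultParallel > 0) return defaultParallel;
        if (mapredReduceTasks > 0) return mapredReduceTasks;
        return estimated;
    }

    public static void main(String[] args) {
        // Anup's case: only -Dmapred.reduce.tasks=450 is set, the estimate is 15.
        // Under the proposed order the job should get 450 reducers.
        System.out.println(chooseParallelism(-1, -1, 450, 15));
    }
}
```

With this order, the estimate is only a fallback: an explicit -Dmapred.reduce.tasks wins over estimateNumberOfReducers, while PARALLEL and default_parallel still take precedence over both.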

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (PIG-1810) Prioritize hadoop parameter "mapred.reduce.task" above estimation of reducer number

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1810:
--------------------------------

    Fix Version/s:     (was: 0.9.0)

Unlinking from the release since there is no activity and it is too late for new functionality to be added to 0.9
