Posted to mapreduce-issues@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/12/09 21:08:03 UTC

[jira] Created: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data

speculation should normalize progress rates based on amount of input data
-------------------------------------------------------------------------

                 Key: MAPREDUCE-2216
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Joydeep Sen Sarma


We frequently see skew in data distribution on both the mappers and the reducers. Tasks with small inputs finish quickly, and the longer-running ones immediately get speculated because their progress rates look slow by comparison. We should normalize the progress rates used by speculation with some metric correlated to the amount of data processed by the task (like bytes read or rows processed). That would prevent these unnecessary speculations.
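
To sketch the idea (this is not the actual JobTracker speculation code; TaskStats and its fields are hypothetical placeholders for whatever per-task counters the framework already tracks, e.g. map input bytes or reduce shuffle bytes):

{code:java}
// Minimal sketch of normalizing progress rates by data processed.
// Hypothetical types and counters, not the real speculation implementation.
import java.util.List;

public final class NormalizedSpeculation {

  static final class TaskStats {
    final double progress;       // fraction complete, 0.0 .. 1.0
    final long runningTimeMs;    // wall-clock time the attempt has been running
    final long bytesProcessed;   // bytes read (or rows processed) so far

    TaskStats(double progress, long runningTimeMs, long bytesProcessed) {
      this.progress = progress;
      this.runningTimeMs = runningTimeMs;
      this.bytesProcessed = bytesProcessed;
    }

    // Rate used today: fraction of work completed per millisecond. A task
    // handed 10x the input of its siblings looks 10x "slower" by this
    // measure even if it is processing data at exactly the same speed.
    double rawProgressRate() {
      return runningTimeMs > 0 ? progress / runningTimeMs : 0.0;
    }

    // Proposed rate: data actually processed per millisecond, so tasks with
    // skewed input sizes become directly comparable.
    double normalizedRate() {
      return runningTimeMs > 0 ? (double) bytesProcessed / runningTimeMs : 0.0;
    }
  }

  // Speculate only when a task's normalized rate falls well below the mean
  // normalized rate of its sibling tasks. slowTaskFraction plays the role of
  // the usual "how far below average is too slow" knob.
  static boolean shouldSpeculate(TaskStats candidate, List<TaskStats> siblings,
                                 double slowTaskFraction) {
    if (siblings.isEmpty()) {
      return false;
    }
    double sum = 0.0;
    for (TaskStats t : siblings) {
      sum += t.normalizedRate();
    }
    double mean = sum / siblings.size();
    return candidate.normalizedRate() < mean * slowTaskFraction;
  }
}
{code}

With the raw rate, a mapper handed a 10 GB split would be speculated against mappers handed 100 MB splits; with the normalized rate both report roughly the same bytes per millisecond and neither triggers speculation.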

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated MAPREDUCE-2216:
-----------------------------------------

    Component/s: jobtracker

Hard to believe there's not a JIRA open for this already - please close/redirect this one if there is.

> speculation should normalize progress rates based on amount of input data
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2216
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Joydeep Sen Sarma
>
> We frequently see skew in data distribution on both the mappers and the reducers. Tasks with small inputs finish quickly, and the longer-running ones immediately get speculated because their progress rates look slow by comparison. We should normalize the progress rates used by speculation with some metric correlated to the amount of data processed by the task (like bytes read or rows processed). That would prevent these unnecessary speculations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970055#action_12970055 ] 

Devaraj Das commented on MAPREDUCE-2216:
----------------------------------------

MAPREDUCE-718?

> speculation should normalize progress rates based on amount of input data
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2216
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Joydeep Sen Sarma
>
> We frequently see skew in data distribution on both the mappers and the reducers. Tasks with small inputs finish quickly, and the longer-running ones immediately get speculated because their progress rates look slow by comparison. We should normalize the progress rates used by speculation with some metric correlated to the amount of data processed by the task (like bytes read or rows processed). That would prevent these unnecessary speculations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.