Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2009/11/05 05:02:32 UTC

[jira] Created: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

mapred.reduce.slowstart.completed.maps is too high by default
-------------------------------------------------------------

                 Key: MAPREDUCE-1184
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1184
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Allen Wittenauer


By default, this value is set to 5%.  I believe for most real world situations the code isn't efficient enough to be set this low.  This should be higher, probably around the 50% mark, especially given the predominance of non-FIFO schedulers.
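
To make the percentages concrete, here is a minimal sketch of what the setting means in terms of completed map tasks. It assumes the threshold is simply the configured fraction times the number of maps, rounded up; the function name is hypothetical and this is an illustration, not the JobTracker's actual code.

```python
import math


def maps_required_before_reduces(total_maps: int, slowstart_fraction: float) -> int:
    """Return how many maps must complete before reduces may be scheduled.

    Assumption (not taken from the Hadoop source): the threshold is
    ceil(slowstart_fraction * total_maps).
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0.0 and 1.0")
    return math.ceil(slowstart_fraction * total_maps)


# Compare the current default (5%) with the proposed value (50%)
# for a hypothetical job with 1000 map tasks.
for fraction in (0.05, 0.5):
    print(fraction, maps_required_before_reduces(1000, fraction))
```

Under that assumption, a reduce started at the 5% mark occupies its slot while nearly all of the job's maps are still running, which is the utilization concern raised here.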

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773969#action_12773969 ] 

Allen Wittenauer commented on MAPREDUCE-1184:
---------------------------------------------

>Why not let it be and change site-specific, job-specific configuration?

In my experience, users don't set this until they've been around the Hadoop block for a while, and even then, this one is easy to miss. 

The other reality is that few users run only "one" job.  It is much more typical to run a series of jobs as part of a workflow.  Doing specific, low-level tuning of every knob for every job is asking too much.  Users who do want to do that will eventually hit this and tune appropriately.  But that doesn't mean we shouldn't ship a 'reasonable' default until they get around to setting it themselves.

>I think Allen's point is that the default 5% may be too low from the utilization perspective. 

... and that's exactly my point.  Inexperienced users wonder why all their reduce slots are not being utilized to get the max throughput of the grid.  They have one big job that holds all the reduce slots, sometimes for hours at a time, while a smaller job has all of its maps finished and just needs a handful of reduces to go.  By setting this to a reasonable default, chances are this very common case will disappear out-of-the-box.

While I think it would be great to see this tunable go away, that's not where we are at today.  So let's just set this to something reasonable and then look at the bigger problem at some later date.  There are bigger fish to fry. :)



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773768#action_12773768 ] 

Matei Zaharia commented on MAPREDUCE-1184:
------------------------------------------

This is a good idea. Ideally though, we might actually want slow start to depend on the amount of map output data and the rate at which data can be copied. If you have a job with only a few MB of map output per reducer, setting slow start as high as 95% isn't going to impact your response time too much. On the other hand, if you have a job where the maps "explode" the output and you know that the bulk of your time will be spent in the shuffle phase, you might want to set it lower.
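
As a rough sketch of the idea above, slow start could be derived from the expected shuffle time rather than fixed. The function and parameter names below are hypothetical, and the rule (start reduces early enough that copying one reducer's share of map output takes about as long as the remaining map phase) is only one possible heuristic, not anything Hadoop implements.

```python
def suggested_slowstart(map_output_bytes_per_reducer: float,
                        copy_bytes_per_sec: float,
                        map_phase_secs: float) -> float:
    """Suggest a slow-start fraction in [0.0, 1.0].

    Assumptions: the per-reducer map output, the copy rate and the length
    of the map phase are all known or estimated up front.
    """
    if copy_bytes_per_sec <= 0:
        raise ValueError("copy_bytes_per_sec must be positive")
    if map_phase_secs <= 0:
        # No measurable map phase: nothing to overlap, start right away.
        return 0.0
    # Time one reducer needs to copy its share of the map output.
    shuffle_secs = map_output_bytes_per_reducer / copy_bytes_per_sec
    # Start reduces when the remaining map time roughly equals the shuffle time.
    fraction = 1.0 - shuffle_secs / map_phase_secs
    return min(1.0, max(0.0, fraction))


# A few MB per reducer: reduces can wait until the maps are nearly done.
print(suggested_slowstart(4e6, 10e6, 600))
# Maps that "explode" their output: start the shuffle immediately.
print(suggested_slowstart(60e9, 10e6, 600))
```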



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773792#action_12773792 ] 

Vinod K V commented on MAPREDUCE-1184:
--------------------------------------

This is a per-job configuration. And all the issues quoted above seem to be characteristic of the job in question. As such, no default value will ever cater to all the job characteristics. Why not let it be and change site-specific, job-specific configuration?



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too low by default

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880943#action_12880943 ] 

Allen Wittenauer commented on MAPREDUCE-1184:
---------------------------------------------

For the *edge case* where jobs benefit from having 5% (0.05), they can continue to set this.  The *average case*, from my experience, is that this is way way way too low.  To reiterate: I'd like to change the *default* that Hadoop ships with to something more reasonable for the *average case*.



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773798#action_12773798 ] 

Hong Tang commented on MAPREDUCE-1184:
--------------------------------------

I think Allen's point is that the default 5% may be too low from the utilization perspective.

My point (which may be shared with Matei) is that this really could be adaptively tuned by the MR framework (thus eliminating the need for a configuration knob).  Finally, back to my comment on turnaround time, I think users should specify high-level optimization objectives, such as whether they care more about response time or throughput, and the MR framework should adjust related parameters automatically. Granted, this is probably beyond the scope of this jira.



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773788#action_12773788 ] 

Hong Tang commented on MAPREDUCE-1184:
--------------------------------------

Another case, for tiny jobs that require fast turn around time, it would be better if we set the percentage to be 0.



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773791#action_12773791 ] 

Matei Zaharia commented on MAPREDUCE-1184:
------------------------------------------

Yeah, actually the 5% setting can be a source of latency for small jobs in my experience, because the maps will finish at roughly the same time, and you then need to wait a few seconds for a reducer to start up and to get the map completion events from the JobTracker. For these jobs, it might make sense to look at the rate at which maps are reporting progress and launch the reducers when it looks like the maps will finish in the next 5 seconds. There are many other things that could be done to decrease the latency for small jobs, however.
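
A minimal sketch of that progress-based trigger, assuming map progress is reported as a fraction between 0 and 1 and advances at a roughly constant rate. The function name and the linear extrapolation are illustrative assumptions, not existing Hadoop behavior.

```python
def should_launch_reducers(map_progress: float,
                           elapsed_secs: float,
                           lead_secs: float = 5.0) -> bool:
    """Return True once the maps look likely to finish within lead_secs.

    Assumption: progress is linear, so the remaining time can be
    extrapolated from the elapsed time and the fraction completed.
    """
    if map_progress >= 1.0:
        return True   # maps are already done
    if map_progress <= 0.0:
        return False  # no progress reported yet, nothing to extrapolate
    remaining_secs = elapsed_secs * (1.0 - map_progress) / map_progress
    return remaining_secs <= lead_secs


# Halfway through after 4 seconds: about 4 seconds left, so launch now.
print(should_launch_reducers(0.5, 4.0))
# Halfway through after 60 seconds: about a minute left, so keep waiting.
print(should_launch_reducers(0.5, 60.0))
```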



[jira] Updated: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too low by default

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-1184:
----------------------------------------

    Affects Version/s: 0.20.1
                       0.20.2



[jira] Commented: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too high by default

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774263#action_12774263 ] 

Vinod K V commented on MAPREDUCE-1184:
--------------------------------------


As such, I am neutral on this change. But as I said before, this is site-specific and job-specific. I've made this comment only after seeing the different opinions on this issue itself.

Allen says in the description "This should be higher, probably around the 50% mark".
Hong says [here|https://issues.apache.org/jira/browse/MAPREDUCE-1184?focusedCommentId=12773788&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12773788] "Another case, for tiny jobs that require fast turn around time, it would be better if we set the percentage to be 0."

Either there is cross-talk, or I am missing something, or these are already different use cases. If we can agree on a default that is acceptable to everyone here, I am fine.




[jira] Updated: (MAPREDUCE-1184) mapred.reduce.slowstart.completed.maps is too low by default

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-1184:
----------------------------------------

    Summary: mapred.reduce.slowstart.completed.maps is too low by default  (was: mapred.reduce.slowstart.completed.maps is too high by default)
