Posted to mapreduce-issues@hadoop.apache.org by "Jothi Padmanabhan (JIRA)" <ji...@apache.org> on 2009/09/05 06:42:57 UTC

[jira] Created: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)
--------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-956
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-956
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Jothi Padmanabhan


For progress calculation and display in the UI, shuffle, in its current form, is decomposed into three phases (copy/sort/reduce). The sort phase is no longer applicable. I think we should reduce the number of phases to two and assign 50% weightage to each of the copy and reduce phases. Thoughts?
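
To make the weighting concrete, here is a minimal sketch (in Python, not Hadoop's actual code) of how overall progress could be derived from per-phase progress under the current three-phase split and the proposed two-phase split. The function and constant names are hypothetical, and equal weights per phase are assumed from the description above.

```python
# Illustrative sketch only; these names are hypothetical and not Hadoop APIs.

# Relative weight of each phase. Equal weights are an assumption.
THREE_PHASE_WEIGHTS = {"copy": 1, "sort": 1, "reduce": 1}  # about 33% each
TWO_PHASE_WEIGHTS = {"copy": 1, "reduce": 1}               # 50% each


def overall_progress(phase_progress, weights):
    """Return the weighted average of per-phase progress values in [0, 1]."""
    if set(phase_progress) != set(weights):
        raise ValueError("phase_progress and weights must name the same phases")
    for phase, value in phase_progress.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"progress for {phase!r} must be in [0, 1]")
    total_weight = sum(weights.values())
    return sum(weights[p] * phase_progress[p] for p in weights) / total_weight


if __name__ == "__main__":
    # Copy has finished; nothing else has started yet.
    print(overall_progress({"copy": 1.0, "sort": 0.0, "reduce": 0.0},
                           THREE_PHASE_WEIGHTS))
    print(overall_progress({"copy": 1.0, "reduce": 0.0}, TWO_PHASE_WEIGHTS))
```

With the same amount of work done (copy complete), the three-phase split reports roughly a third and the two-phase split reports a half, which is the visible effect of the proposed change.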

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753647#action_12753647 ] 

Arun C Murthy commented on MAPREDUCE-956:
-----------------------------------------

I can see the appeal of this, but we should remember that there are applications where merge is a significant part of the reduce runtime, e.g. petasort's merge was _huge_.



[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752048#action_12752048 ] 

Ravi Gummadi commented on MAPREDUCE-956:
----------------------------------------

We could call the phases the Shuffle phase and the Reduce phase. But we need to investigate how we want to update progress in the shuffle phase: updating it based only on the copying of map outputs would not be correct, because some merges could still take time after all map outputs have been copied to this reduce node (even though some merges happen while map outputs are still being copied).



[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753614#action_12753614 ] 

Tom White commented on MAPREDUCE-956:
-------------------------------------

It's true that the merge occurs on the map side too. So this change sounds reasonable to me.



[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753149#action_12753149 ] 

Jothi Padmanabhan commented on MAPREDUCE-956:
---------------------------------------------

True, we do have a final merge before feeding the reducer. However, assigning 33% of progress to this one final merge does not seem correct. In cases where the number of files at that time is < io.sort.factor, this final merge does not even occur; we start feeding the reducer straight away. Also, since merges happen during the shuffle phase as well, I was just proposing that we delineate the phases as:
Shuffle (50%)
Final Merge + Reduce (50%)
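
A minimal sketch of this proposal, in Python rather than Hadoop's own code. All names are hypothetical. The merge check follows the statement above that no final merge occurs when the file count is below io.sort.factor; treating a count equal to io.sort.factor as requiring a merge is an assumption.

```python
# Illustrative sketch only; these names are hypothetical and not Hadoop APIs.

def final_merge_needed(num_files, io_sort_factor):
    """No final merge when fewer than io.sort.factor files remain.

    The behaviour when num_files == io_sort_factor is an assumption.
    """
    return num_files >= io_sort_factor


def reduce_task_progress(shuffle_fraction, merge_and_reduce_fraction):
    """Combine the two proposed halves into one progress value in [0, 1].

    shuffle_fraction: progress of the shuffle (copy plus in-flight merges).
    merge_and_reduce_fraction: progress of the final merge plus reduce.
    """
    for value in (shuffle_fraction, merge_and_reduce_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("progress fractions must be in [0, 1]")
    return 0.5 * shuffle_fraction + 0.5 * merge_and_reduce_fraction


if __name__ == "__main__":
    # 5 files left with io.sort.factor = 10: the reducer is fed directly.
    print(final_merge_needed(5, 10))
    # Shuffle complete, second half halfway through.
    print(reduce_task_progress(1.0, 0.5))
```

Folding the final merge into the second half means a skipped merge does not leave a third of the progress bar unaccounted for.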





[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "YangLai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786303#action_12786303 ] 

YangLai commented on MAPREDUCE-956:
-----------------------------------

I have a scenario where the output of the shuffle phase is exactly what I want, so the sort and reduce phases are unnecessary for me and add a lot of overhead. I don't know how to get the output of the shuffle phase in Hadoop 0.19.1 or 0.20.1. Maybe the sort phase should be optional for developers.



[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753123#action_12753123 ] 

Tom White commented on MAPREDUCE-956:
-------------------------------------

The sort phase is actually when the map outputs are being merged prior to being fed to the reducer. Could you give a bit more detail about what has changed? Presumably the merging still takes place, so perhaps "sort phase" should just be renamed to "merge phase".



[jira] Updated: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated MAPREDUCE-956:
----------------------------------------

          Component/s: task
    Affects Version/s: 0.21.0
