You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-issues@hadoop.apache.org by "Binglin Chang (Created) (JIRA)" <ji...@apache.org> on 2011/11/14 11:10:52 UTC

[jira] [Created] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Support no sort dataflow in map output and reduce merge phrase
--------------------------------------------------------------

Key: MAPREDUCE-3397
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: task
Affects Versions: 0.20.205.0
Reporter: Binglin Chang
Assignee: Binglin Chang

In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
** reduce0: 0-100000
** reduce1: 100000-200000
** ...
** reduceM: xxx-max adv-id
Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
By supporting no sort, we can gain a lot of performance improvements:
# Eliminate map side sort & merge.
KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
Just merge spill segments one by one, doesn't need to use heap merge.
# Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.

For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Binglin Chang (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150208#comment-13150208 ] 

Binglin Chang commented on MAPREDUCE-3397:
------------------------------------------

I think why no sort make sense is that, in many cases application has a more efficient way to process data(such as do aggregation on the fly), they don't want the framework to do some sort of heavy weighted data preprocessing, cause they have better prior knowledge/understanding about the data and the goal.

                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Binglin Chang (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150201#comment-13150201 ] 

Binglin Chang commented on MAPREDUCE-3397:
------------------------------------------

No, grouping is not the same as no sort:
# Grouping still needs shuffle phrase barrier;
# In grouping kv pairs of the same key are grouped together, but in no sort kv pairs of the same key may not grouped together, framework only promise they are in the same partition(reduce).


                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185110#comment-13185110 ] 

Robert Joseph Evans commented on MAPREDUCE-3397:
------------------------------------------------

I have a couple of comments about the patch.  NOTE I have just done a quick pass and have not looked in depth at the code.  There are several places that I want to pick apart before giving my approval.

# Combiners are not compatible with mapred.map.output.sort.  Is there a reason why we could not make combiners work with this, so long as they must follow the same assumption that they will not get sorted input?  If the algorithms you are thinking about would never get any benefit from a combiner, could you also add the check in the client.  I would much rather have the client blow up with an error instead of waiting for my map tasks to launch and then blow up 4+ times before I get the error.
# In your test you never validate that the output is what you expected it to be.  That may be hard as it may not be deterministic because there is no sorting, but it would be nice to have something verify that the code did work as expected.  Not just that it did not crash.
# mapred-default.xml Please add mapred.map.output.sort to mapred-default.xml.  Include with it a brief explanation of what it does.
# There is no documentation or examples.  This is a new feature that could be very useful to lots of people, but if they never know it is there it will not be used.  Could you include in your patch updates to the documentation about how to use this, and some useful examples, preferably simple. Perhaps an example computing CTR would be nice.
# Performance.  The entire reason for this change is to improve performance, but I have not seen any numbers showing a performance improvement.  No numbers at all in fact.  It would be great if you could include here some numbers along with the code you used for your benchmark and a description of your setup.  I have spent time on different performance teams, and performance improvement efforts from a huge search engine to an OS on a cell phone and the one thing I have learned is that you have to go off of the numbers because well at least for me my intuition is often wrong and what I thought would make it faster slowed it down instead.
# Trunk.  This patch is specific to 0.20/1.0 line.  Before this can get merged into the 0.20/1.0 lines we really need an equivalent patch for trunk, and possibly 0.21, 0.22, and 0.23.  This is so there are no regressions.  It may be a while off after you get the 1.0 patch cleaned up though. 
                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Binglin Chang (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated MAPREDUCE-3397:
-------------------------------------

    Attachment: MAPREDUCE-3397-nosort.v1.patch

A preview patch supporting no sort.
                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Binglin Chang (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184719#comment-13184719 ] 

Binglin Chang commented on MAPREDUCE-3397:
------------------------------------------

This is already fully functional patch. I have done some simple tests both using localRunner and on a real 17 node cluster. To disable sort, set jobconf mapred.map.output.sort=false. Then the reduce phrase will start before all Map Task have done, and reducer can not expect input k/v pairs to be grouped by key & sorted.


                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "anty.rao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184107#comment-13184107 ] 

anty.rao commented on MAPREDUCE-3397:
-------------------------------------

The submitted path do not yet eliminated the shuffle phase barrier?
                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3397) Support no sort dataflow in map output and reduce merge phrase

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150093#comment-13150093 ] 

Aaron T. Myers commented on MAPREDUCE-3397:
-------------------------------------------

Is this not a duplicate of MAPREDUCE-1639?
                
> Support no sort dataflow in map output and reduce merge phrase
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-3397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3397
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 0.20.205.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-3397-nosort.v1.patch
>
>
> In our experience, many data aggregation style queries/jobs don't need to sort the intermediate data. In fact reducer side can use hashmap or even array to do application level aggregations. For example, consider computing CTR using display log & click log in sponsored search. Map side just emit (adv_id, clk_cnt, dis_cnt), reduce side aggregate clk_cnt and dis_cnt for every adv_id, cause adv_id is integer, we can partition adv_id by range:
> ** reduce0: 0-100000
> ** reduce1: 100000-200000
> ** ...
> ** reduceM: xxx-max adv-id
> Then the reducer can use an array(for example: int [1000000][2]) to store the aggregated clk_cnt & dis_cnt, and we don't need the framework to sort intermediate data anymore.
> By supporting no sort, we can gain a lot of performance improvements:
> # Eliminate map side sort & merge. 
>   KV paris need to sort by partition first, but this can be done using a liner time counting sort, which is much faster than quick sort.
>   Just merge spill segments one by one, doesn't need to use heap merge.
> # Eliminate shuffle phrase barrier, reducer can start to processing data before all map output data are copied & merged.
> For most cases, memory won't be a problem, cause keys are divided to many partitions, each reducers only process a small subset of the global key set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira