Posted to mapreduce-dev@hadoop.apache.org by "Lijie Xu (JIRA)" <ji...@apache.org> on 2013/08/15 17:35:47 UTC
[jira] [Created] (MAPREDUCE-5461) Let users be able to get latest Key in reduce()
Lijie Xu created MAPREDUCE-5461:
-----------------------------------
Summary: Let users be able to get latest Key in reduce()
Key: MAPREDUCE-5461
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5461
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 1.2.1
Environment: Any environment
Reporter: Lijie Xu
The reducer presents its input to reduce() as <K, List(V)>. In some cases, such as SecondarySort, the current V and the next V are delivered under the same K even though the Keys they were actually emitted with are different. For example, in SecondarySort, map() outputs
Key Value
<1, 3> 3
<1, 1> 1
<2, 5> 5
<1, 8> 8
After partitioning by Key.getFirst(), sorting, and grouping by Key.getFirst(), the reducer gets:
Key Value
------Group 1------
<1, 1> 1
<1, 3> 3
<1, 8> 8
------Group 2------
<2, 5> 5
reduce() receives:
Key List<Value>
<1, 1> List<1, 3, 8>
<2, 5> List<5>
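
The grouping above can be reproduced with a small plain-Java simulation. This is only a sketch and does not use Hadoop: the names PairKey, KV, ReduceInput and shuffle are hypothetical, and the sort is assumed to order by the first field and then the second, which matches the order shown in the groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    /** Composite key <first, second>; a hypothetical stand-in for the Key in the example. */
    static final class PairKey {
        final int first;
        final int second;
        PairKey(int first, int second) { this.first = first; this.second = second; }
        @Override public String toString() { return "<" + first + ", " + second + ">"; }
    }

    /** One map() output record. */
    static final class KV {
        final PairKey key;
        final int value;
        KV(PairKey key, int value) { this.key = key; this.value = value; }
    }

    /** What one reduce() call sees: the first key of the group and all grouped values. */
    static final class ReduceInput {
        final PairKey firstKey;
        final List<Integer> values = new ArrayList<>();
        ReduceInput(PairKey firstKey) { this.firstKey = firstKey; }
        @Override public String toString() { return firstKey + " " + values; }
    }

    /** The four map() outputs from the example. */
    static List<KV> exampleMapOutput() {
        return Arrays.asList(
                new KV(new PairKey(1, 3), 3),
                new KV(new PairKey(1, 1), 1),
                new KV(new PairKey(2, 5), 5),
                new KV(new PairKey(1, 8), 8));
    }

    /** Sorts by (first, second), then groups by first only. */
    static List<ReduceInput> shuffle(List<KV> mapOutput) {
        List<KV> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.<KV>comparingInt(kv -> kv.key.first)
                .thenComparingInt(kv -> kv.key.second));
        // LinkedHashMap keeps the groups in sorted order of the first field.
        Map<Integer, ReduceInput> groups = new LinkedHashMap<>();
        for (KV kv : sorted) {
            // Only the first key of each group is retained; later keys are lost.
            groups.computeIfAbsent(kv.key.first, f -> new ReduceInput(kv.key))
                    .values.add(kv.value);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        for (ReduceInput in : shuffle(exampleMapOutput())) {
            System.out.println(in);
        }
    }
}
```

In this sketch, the keys <1, 3> and <1, 8> never reach the ReduceInput object, which mirrors the limitation described in this issue.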
When invoking V.next(), we can get the next V (e.g., 3), but there is no API to get its corresponding Key (e.g., <1, 3>). We can only get the first Key (e.g., <1, 1>).
If users could get the latest Key, SecondarySort would not need to emit the value in map(), which would reduce network traffic.
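
As a rough illustration of the idea, the plain-Java sketch below shows a reduce() that rebuilds the values from the keys alone. Everything here is hypothetical: GroupCursor and latestKey() are invented names for the requested capability, not an existing Hadoop API, and the sketch assumes (as in the example) that each value equals the second field of its Key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LatestKeySketch {

    /** Composite key <first, second>; a hypothetical stand-in for the Key in the example. */
    static final class PairKey {
        final int first;
        final int second;
        PairKey(int first, int second) { this.first = first; this.second = second; }
    }

    /**
     * Hypothetical cursor over one reduce group. It models the requested
     * feature: after each next(), the key of the current record is available.
     */
    static final class GroupCursor {
        private final Iterator<PairKey> keys;
        private PairKey current;
        GroupCursor(List<PairKey> sortedGroupKeys) { this.keys = sortedGroupKeys.iterator(); }

        /** Advances to the next record in the group; returns false at the end. */
        boolean next() {
            if (!keys.hasNext()) {
                return false;
            }
            current = keys.next();
            return true;
        }

        /** Key of the record most recently returned by next(). */
        PairKey latestKey() { return current; }
    }

    /** reduce() body that needs no map-side value: it reads the second field of each key. */
    static List<Integer> reduce(GroupCursor group) {
        List<Integer> recovered = new ArrayList<>();
        while (group.next()) {
            recovered.add(group.latestKey().second);
        }
        return recovered;
    }

    public static void main(String[] args) {
        // Group 1 from the example, already sorted.
        GroupCursor group1 = new GroupCursor(Arrays.asList(
                new PairKey(1, 1), new PairKey(1, 3), new PairKey(1, 8)));
        System.out.println(reduce(group1));
    }
}
```

Under these assumptions, map() would only have to ship the composite key, since the second field already carries the information held by the value.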
Another example is Join. If we could get the latest Key, we would not need to put the table label in both the key and the value.