Posted to mapreduce-issues@hadoop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/12/30 12:10:58 UTC
[jira] [Commented] (MAPREDUCE-6827) Failed to traverse Iterable values the second time in reduce() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787560#comment-15787560 ]
ASF GitHub Bot commented on MAPREDUCE-6827:
-------------------------------------------
GitHub user javeme opened a pull request:
https://github.com/apache/hadoop/pull/177
MAPREDUCE-6827. Failed to traverse Iterable values the second time in reduce() method
The following code is a reduce() method (of WordCount):
    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // print some logs
            List<String> vals = new LinkedList<>();
            for (IntWritable i : values) {
                vals.add(i.toString());
            }
            System.out.println(String.format(">>>> reduce(%s, [%s])",
                    key, String.join(", ", vals)));
            // sum of values
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            System.out.println(String.format(">>>> reduced(%s, %s)",
                    key, sum));
            context.write(key, new IntWritable(sum));
        }
    }
After running it, every sum was zero.
Debugging showed that the body of the second for-each loop never executed. The root cause is the value returned by Iterable.iterator(): it returned the same Iterator instance on both calls made by the for-each loops, so that iterator was already exhausted when the second loop started. In general, Iterable.iterator() is expected to return a new instance on each call, as ArrayList.iterator() does. This patch fixes the bug.
Signed-off-by: Javeme <ja...@gmail.com>
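The behaviour described above can be reproduced without Hadoop. The following is a minimal sketch, not the Hadoop implementation: SingleUseIterable is a hypothetical stand-in whose iterator() hands back the same Iterator on every call, and plain Integer values are used in place of IntWritable.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SingleUseIterableDemo {

    // Hypothetical stand-in for the reducer's values: iterator() returns
    // the SAME Iterator on every call, so the data can be consumed only once.
    static final class SingleUseIterable<T> implements Iterable<T> {
        private final Iterator<T> shared;

        SingleUseIterable(List<T> data) {
            this.shared = data.iterator();
        }

        @Override
        public Iterator<T> iterator() {
            return shared; // same instance every time
        }
    }

    // Sums the elements with a for-each loop, which calls iterator() once.
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 1);

        Iterable<Integer> singleUse = new SingleUseIterable<>(data);
        System.out.println(sum(singleUse)); // prints 3: first pass sees all values
        System.out.println(sum(singleUse)); // prints 0: the shared iterator is exhausted

        // A List returns a fresh Iterator per call, so both passes see the data.
        System.out.println(sum(data)); // prints 3
        System.out.println(sum(data)); // prints 3
    }
}
```

The second call to sum(singleUse) mirrors the second for-each loop in the reducer: hasNext() is already false, so the loop body is skipped and the sum stays at zero.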
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/javeme/hadoop foreach-bug-of-ValueIterable
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hadoop/pull/177.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #177
----
commit 6c323fdc1a0013938d09b09b2e16061910a92c97
Author: Javeme <ja...@gmail.com>
Date: 2016-12-30T11:39:20Z
MAPREDUCE-6827. Failed to traverse Iterable values the second time in reduce() method
----
> Failed to traverse Iterable values the second time in reduce() method
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6827
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 3.0.0-alpha1
> Environment: hadoop2.7.3
> Reporter: javaloveme
>
> Failed to traverse Iterable values the second time in reduce() method
> The following code is a reduce() method (of WordCount):
> {code:title=WordCount.java|borderStyle=solid}
>     public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
>         @Override
>         protected void reduce(Text key, Iterable<IntWritable> values, Context context)
>                 throws IOException, InterruptedException {
>             // print some logs
>             List<String> vals = new LinkedList<>();
>             for (IntWritable i : values) {
>                 vals.add(i.toString());
>             }
>             System.out.println(String.format(">>>> reduce(%s, [%s])",
>                     key, String.join(", ", vals)));
>             // sum of values
>             int sum = 0;
>             for (IntWritable i : values) {
>                 sum += i.get();
>             }
>             System.out.println(String.format(">>>> reduced(%s, %s)",
>                     key, sum));
>
>             context.write(key, new IntWritable(sum));
>         }
>     }
> {code}
> After running it, every sum was zero.
> Debugging showed that the body of the second for-each loop never executed. The root cause is the value returned by Iterable.iterator(): it returned the same Iterator instance on both calls made by the for-each loops, so that iterator was already exhausted when the second loop started. In general, Iterable.iterator() is expected to return a new instance on each call, as ArrayList.iterator() does.
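
Independently of any change to the framework, user code can avoid the problem by doing all of its work in a single traversal. The sketch below is an illustration under stated assumptions, not the patch from the pull request: reduceOnce and Result are hypothetical names, Integer stands in for IntWritable, and the single-use Iterable is simulated with a lambda that always returns the same Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassReduceSketch {

    // Hypothetical holder for what the reducer produces for one key.
    static final class Result {
        final String logLine;
        final int sum;

        Result(String logLine, int sum) {
            this.logLine = logLine;
            this.sum = sum;
        }
    }

    // Collects the values for logging AND sums them in one traversal,
    // so it works even if iterator() can only be consumed once.
    static Result reduceOnce(String key, Iterable<Integer> values) {
        List<String> vals = new ArrayList<>();
        int sum = 0;
        for (int v : values) {
            vals.add(String.valueOf(v));
            sum += v;
        }
        String logLine = String.format(">>>> reduce(%s, [%s])",
                key, String.join(", ", vals));
        return new Result(logLine, sum);
    }

    public static void main(String[] args) {
        // Simulated single-use Iterable: every call returns the same Iterator.
        Iterator<Integer> it = Arrays.asList(2, 3, 4).iterator();
        Iterable<Integer> singleUse = () -> it;

        Result r = reduceOnce("word", singleUse);
        System.out.println(r.logLine); // prints >>>> reduce(word, [2, 3, 4])
        System.out.println(r.sum);     // prints 9
    }
}
```

In a real reducer, values that must survive for a later pass would have to be copied into a collection during this single traversal (for example with new IntWritable(i.get())), on the assumption that the framework may reuse the value object between iterations.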
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org