Posted to common-user@hadoop.apache.org by Shi Jin <ji...@gmail.com> on 2011/11/05 06:32:03 UTC

When can I use null for Reducer value?

Hi there,

I am learning Hadoop and looking at two of the example Java
files, SecondarySort.java and WordCount.java, using the latest stable
version, 0.20.203.0.

One interesting feature I found in the SecondarySort.java code is the use
of null as the value emitted by the reducer.
The code is copied below:

  public static class Reduce
          extends Reducer<IntPair, IntWritable, Text, IntWritable> {
    private static final Text SEPARATOR =
      new Text("------------------------------------------------");
    private final Text first = new Text();

    @Override
    public void reduce(IntPair key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      context.write(SEPARATOR, null);
      first.set(Integer.toString(key.getFirst()));
      for (IntWritable value : values) {
        context.write(first, value);
      }
    }
  }


What I am interested in is
> private static final Text SEPARATOR =
>     new Text("------------------------------------------------");
and
> context.write(SEPARATOR, null);

I think this is a nice way to control the format of the output file (e.g.,
adding comments, separators, etc.).
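
For reference, the final output in these examples goes through the default
TextOutputFormat, whose record writer skips the value (and the tab
separator) when the value is null, so each reduce group ends up preceded by
a bare separator line. Illustrative output from the SecondarySort reducer
above (made-up values, not from an actual run):

------------------------------------------------
1	5
1	17
------------------------------------------------
2	3
2	42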

So I added the same code to the WordCount.java example (I made a
copy of it and called it WordCount2).
I have almost identical code:

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
    private static final Text SEPARATOR =
      new Text("------------------------------------------------");

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      context.write(SEPARATOR, null);
      ...
I had no problem building the code, but when I ran it, I got the following
error:

ubuntu@ubuntu-gui:~/hadoop$ hadoop jar WordCount2.jar WordCount2 /test.txt /wc2result7
11/11/05 04:54:09 INFO input.FileInputFormat: Total input paths to process : 1
11/11/05 04:54:09 INFO mapred.JobClient: Running job: job_201111021955_0052
11/11/05 04:54:10 INFO mapred.JobClient:  map 0% reduce 0%
11/11/05 04:54:24 INFO mapred.JobClient: Task Id : attempt_201111021955_0052_m_000000_0, Status : FAILED
java.lang.NullPointerException
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:166)
	at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1078)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1399)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at WordCount2$IntSumReducer.reduce(WordCount2.java:46)
	at WordCount2$IntSumReducer.reduce(WordCount2.java:35)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)



This error message points me right back to context.write(SEPARATOR, null);

So now I am very confused. Why does the same code work for one but not
the other?
Could anyone please help me here?
Thanks.

Shi


Re: When can I use null for Reducer value?

Posted by Shi Jin <ji...@gmail.com>.
OK, I solved my problem myself.

The difference between the two examples is the use of a Combiner. If I
simply disable the combiner in the WordCount code, the null value works
perfectly fine.
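
For anyone who hits the same trace: the combiner runs on the map side
during the sort-and-spill phase and re-serializes its output into the
intermediate IFile format, and (as the stack trace shows) IFile$Writer.append
throws a NullPointerException on a null value. The final output path is
different: TextOutputFormat's record writer treats a null value as "write
the key only", which is why SecondarySort, which sets no combiner, gets
away with it. A minimal sketch of the driver change, based on the standard
0.20-era WordCount driver (class and mapper names assumed to match that
example):

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count 2");
    job.setJarByClass(WordCount2.class);
    job.setMapperClass(TokenizerMapper.class);
    // job.setCombinerClass(IntSumReducer.class);  // disabled: the combiner
    // would feed (SEPARATOR, null) back through IFile, which NPEs on null
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }

Note the trade-off: dropping the combiner means more data is shuffled to
the reducers. Also, reusing IntSumReducer as the combiner would have been
wrong here even without the NPE, since a combiner must be safe to run zero
or more times, and this reduce() would emit an extra SEPARATOR record into
the intermediate data for every map-side spill.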

Cheers,

Shi



-- 
Shi Jin, Ph.D.