Posted to common-user@hadoop.apache.org by David Coe <da...@chalklabs.net> on 2008/12/16 17:28:31 UTC

Output.collect uses toString for custom key class. Is it possible to change this?

I've defined a custom key class that implements Writable.  I've noticed
that between the mapper and the reducer, write() and readFields() are
actually used.  However, when I use an identity reducer, toString() is
called when I do something like output.collect(myClass, null).

Is there a way to output the write() instead?

Thank you.
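For readers following the thread: the behaviour described above comes from the two ways a key can be turned into bytes. A text output format records the string returned by toString(), while a binary format calls the key's own write(DataOutput). The sketch below uses plain java.io, not Hadoop; `Point` and `KeyEncodings` are hypothetical classes that only mirror the shape of Writable.write(), so the two encodings can be compared side by side.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical key class; mirrors the shape of Hadoop's Writable.write()
// but does not depend on Hadoop.
class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    void write(DataOutput out) throws IOException {
        out.writeInt(x); // 4 bytes, big-endian
        out.writeInt(y); // 4 bytes, big-endian
    }

    @Override
    public String toString() { return x + "," + y; }
}

class KeyEncodings {
    // What a text-oriented format records: the UTF-8 bytes of toString().
    static byte[] asText(Point p) {
        return p.toString().getBytes(StandardCharsets.UTF_8);
    }

    // What a binary format records: whatever write() emits.
    static byte[] asBinary(Point p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            p.write(out);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1000000, 7);
        System.out.println(new String(asText(p), StandardCharsets.UTF_8)); // 1000000,7
        System.out.println(asBinary(p).length);                             // 8
    }
}
```

The text form grows with the number of digits, while the binary form is a fixed 8 bytes that only the matching read method can decode.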

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 9:14 AM, David Coe wrote:

> Does the SequenceFileOutputFormat work with NullWritable as the value?

Yes.

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 8:28 AM, David Coe wrote:
>
>> Is there a way to output the write() instead?
>
>
> Use SequenceFileOutputFormat. It writes binary files using write().
> The reverse is SequenceFileInputFormat, which reads the sequence files
> using readFields().
>
> -- Owen
Does the SequenceFileOutputFormat work with NullWritable as the value?

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by Aaron Kimball <aa...@cloudera.com>.
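NullWritable has a static get() method that returns the singleton instance
of NullWritable; its constructor is private, so use
output.collect(key, NullWritable.get()) instead of new NullWritable().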
- Aaron

On Tue, Dec 16, 2008 at 9:30 AM, David Coe <da...@chalklabs.net> wrote:

> Owen O'Malley wrote:
> >
> > On Dec 16, 2008, at 9:14 AM, David Coe wrote:
> >
> >> Does the SequenceFileOutputFormat work with NullWritable as the value?
> >
> > Yes.
>
> Owen O'Malley wrote:
> > It means you are trying to write a null value. Your reduce is doing
> > something like:
> >
> > output.collect(key, null);
> >
> > In TextOutputFormat, that is ok and just skips it.
> > SequenceFileOutputFormat doesn't like nulls.
> >
> > -- Owen
> Since the SequenceFileOutputFormat doesn't like nulls, how would I use
> NullWritable?  Obviously output.collect(key, null) isn't working.  If I
> change it to output.collect(key, new IntWritable()) I get the result I
> want (plus an int that I don't), but output.collect(key, new
> NullWritable()) does not work.
>
> Thanks again.
>
> David
>
>
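Why `new NullWritable()` fails while `NullWritable.get()` works: the class is a singleton with a private constructor, and its write() and readFields() transfer no bytes at all. The sketch below reproduces that pattern in plain java.io with a hypothetical stand-in, `NullValue`, so the zero-byte encoding can be checked without a Hadoop classpath; it is not the Hadoop source.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in mirroring the NullWritable pattern (not Hadoop code).
final class NullValue {
    private static final NullValue THIS = new NullValue();

    private NullValue() {}                 // no public constructor: callers use get()

    static NullValue get() { return THIS; } // always the same shared instance

    void write(DataOutput out) throws IOException {}     // writes nothing
    void readFields(DataInput in) throws IOException {}  // reads nothing

    // Serialize the singleton and report how many bytes were produced.
    static int serializedSize() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            get().write(new DataOutputStream(buf));
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get() == get());   // true: one shared instance
        System.out.println(serializedSize()); // 0
    }
}
```

With the real class, a job would typically pair `conf.setOutputValueClass(NullWritable.class)` with `output.collect(key, NullWritable.get())`, so each SequenceFile record carries the key and an empty value.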

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 9:30 AM, David Coe wrote:

> Since the SequenceFileOutputFormat doesn't like nulls, how would I use
> NullWritable?  Obviously output.collect(key, null) isn't working.  If I
> change it to output.collect(key, new IntWritable()) I get the result I
> want (plus an int that I don't), but output.collect(key, new
> NullWritable()) does not work.

Sorry, I answered you literally. You can write a SequenceFile with
NullWritables as the values, but what you really want is an optional
(nullable) value. I'd probably define a wrapper class, like
GenericWritable. It would look something like:

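import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Wraps a Writable and records whether the value is logically null.
// Hadoop creates value objects reflectively via a no-arg constructor, so for
// real use subclass this with a concrete T (GenericWritable is used the same way).
class NullableWritable<T extends Writable> implements Writable {
   private final T instance;  // reused by readFields(); must not be null
   private boolean isNull;
   public NullableWritable(T instance) {
     this.instance = instance;
   }
   public void setNull(boolean isNull) {
     this.isNull = isNull;
   }
   public boolean isNull() { return isNull; }
   public T get() { return instance; }
   public void readFields(DataInput in) throws IOException {
     isNull = in.readBoolean();  // the null flag is written first
     if (!isNull) {
        instance.readFields(in);
     }
   }
   public void write(DataOutput out) throws IOException {
     out.writeBoolean(isNull);
     if (!isNull) {
        instance.write(out);
     }
   }
}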

-- Owen
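The wrapper sketched above can be round-tripped through a byte buffer to confirm that the null flag survives serialization. To run with only java.io, this sketch swaps Hadoop's Writable for a local interface, `SimpleWritable`, and wraps a hypothetical `IntBox`; the flag-then-payload layout is the same as in the wrapper above.

```java
import java.io.*;

// Local stand-in for org.apache.hadoop.io.Writable so the sketch runs without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapped type holding a single int.
class IntBox implements SimpleWritable {
    int value;
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Boolean null flag first, then the payload only when the value is present.
class Nullable<T extends SimpleWritable> implements SimpleWritable {
    private final T instance;
    private boolean isNull;

    Nullable(T instance) { this.instance = instance; }
    void setNull(boolean isNull) { this.isNull = isNull; }
    boolean isNull() { return isNull; }
    T get() { return instance; }

    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isNull);
        if (!isNull) instance.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        isNull = in.readBoolean();
        if (!isNull) instance.readFields(in);
    }
}

class NullableDemo {
    // Serialize src, deserialize the bytes into dst, and return the byte count.
    static int roundTrip(SimpleWritable src, SimpleWritable dst) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            dst.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntBox box = new IntBox();
        box.value = 42;
        Nullable<IntBox> present = new Nullable<>(box);
        Nullable<IntBox> copy = new Nullable<>(new IntBox());
        System.out.println(roundTrip(present, copy) + " " + copy.get().value); // 5 42

        present.setNull(true);
        System.out.println(roundTrip(present, copy) + " " + copy.isNull());    // 1 true
    }
}
```

A present value costs one extra byte for the flag; a null value is stored as that single byte.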

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 9:14 AM, David Coe wrote:
>
>> Does the SequenceFileOutputFormat work with NullWritable as the value?
>
> Yes. 

Owen O'Malley wrote:
> It means you are trying to write a null value. Your reduce is doing
> something like:
>
> output.collect(key, null);
>
> In TextOutputFormat, that is ok and just skips it.
> SequenceFileOutputFormat doesn't like nulls.
>
> -- Owen
Since the SequenceFileOutputFormat doesn't like nulls, how would I use
NullWritable?  Obviously output.collect(key, null) isn't working.  If I
change it to output.collect(key, new IntWritable()) I get the result I
want (plus an int that I don't), but output.collect(key, new
NullWritable()) does not work.

Thanks again.

David


Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 8:58 AM, David Coe wrote:
> Thank you for your swift response.  I am getting this error when I try
> your suggestion:
>
> java.lang.NullPointerException
>    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)

It means you are trying to write a null value. Your reduce is doing  
something like:

output.collect(key, null);

In TextOutputFormat, that is ok and just skips it.  
SequenceFileOutputFormat doesn't like nulls.

-- Owen
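The NullPointerException follows from how the two kinds of writer treat the value: a binary writer has to dereference the value (to check its class and have it serialize itself), whereas a text writer can test for null and emit only the key. The sketch below is a simplified, hypothetical model of that difference in plain java.io; `Value` and `RecordWriters` are illustrative names, not the actual SequenceFile or TextOutputFormat source.

```java
import java.io.*;

// Simplified, hypothetical model of the two record writers; not Hadoop source.
interface Value {
    void write(DataOutput out) throws IOException;
}

class RecordWriters {
    // Binary style: the value must serialize itself, so null cannot be handled.
    static byte[] appendBinary(String key, Value value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(key);
            value.write(out);            // NullPointerException when value == null
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Text style: a null value is skipped and only the key is written.
    static String appendText(String key, Object value) {
        return value == null ? key + "\n" : key + "\t" + value + "\n";
    }

    public static void main(String[] args) {
        System.out.print(appendText("k1", null)); // k1
        try {
            appendBinary("k1", null);
        } catch (NullPointerException e) {
            System.out.println("binary writer rejected null");
        }
    }
}
```

This is why a real value object (such as the NullWritable singleton) is needed on the binary path, while the text path tolerates a bare null.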

Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 8:28 AM, David Coe wrote:
>
>> Is there a way to output the write() instead?
>
>
> Use SequenceFileOutputFormat. It writes binary files using write().
> The reverse is SequenceFileInputFormat, which reads the sequence files
> using readFields().
>
> -- Owen
Thank you for your swift response.  I am getting this error when I try
your suggestion:

java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:385)


*My configuration:*
conf.setMapperClass(MyMap.class);
conf.setReducerClass(IdentityReducer.class);
       
conf.setOutputFormat(SequenceFileOutputFormat.class);
conf.setOutputKeyClass(MyClass.class);

*My mapper:*
public static class MyMap extends MapReduceBase implements
            Mapper<LongWritable, Text, MyClass,NullWritable> {

        public void map(LongWritable key, Text value,
                OutputCollector<MyClass,NullWritable> output,
                Reporter reporter) throws IOException {
*My Class:*
public class MyClass implements Writable,
        Comparable<MyClass> {

Which setting am I missing that results in the null pointer?

Thank you!!


Re: Output.collect uses toString for custom key class. Is it possible to change this?

Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 8:28 AM, David Coe wrote:

> Is there a way to output the write() instead?


Use SequenceFileOutputFormat. It writes binary files using write().
The reverse is SequenceFileInputFormat, which reads the sequence files
using readFields().

-- Owen
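The symmetry described here (write() on the way out, readFields() on the way back in) is the contract a custom key has to honour: fields must be read in exactly the order they were written. The sketch below checks that contract in plain java.io with a hypothetical `TermCount` key; in a real job the key would implement org.apache.hadoop.io.Writable (WritableComparable if it is sorted as a map output key), and the framework, not this helper, would make these calls.

```java
import java.io.*;

// Hypothetical key with a variable-length field; mirrors Writable's method shapes.
class TermCount {
    String term = "";
    long count;

    void write(DataOutput out) throws IOException {
        out.writeUTF(term);    // length-prefixed modified UTF-8
        out.writeLong(count);
    }

    void readFields(DataInput in) throws IOException {
        term = in.readUTF();   // same order as write()
        count = in.readLong();
    }
}

class RoundTrip {
    // Serialize with write(), then rebuild a fresh object with readFields().
    static TermCount copy(TermCount src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            TermCount dst = new TermCount(); // start from a no-arg instance, as a reader would
            dst.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TermCount t = new TermCount();
        t.term = "hadoop";
        t.count = 42L;
        TermCount back = copy(t);
        System.out.println(back.term + " " + back.count); // hadoop 42
    }
}
```

If write() and readFields() ever disagree on field order or types, the reader silently decodes garbage or fails, which is the main thing to test for a custom key.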