Posted to common-user@hadoop.apache.org by David Coe <da...@chalklabs.net> on 2008/12/16 17:28:31 UTC
Output.collect uses toString for custom key class. Is it possible
to change this?
I've defined a custom key class that implements Writable. I've noticed
that between the mapper and reducer, write and readFields are
actually used. However, when I use an identity reducer, toString is
called when I do something like output.collect(myClass, null).
Is there a way to output the write() instead?
Thank you.
Re: Output.collect uses toString for custom key class. Is it possible to change this?
Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 9:14 AM, David Coe wrote:
> Does the SequenceFileOutputFormat work with NullWritable as the value?
Yes.
Re: Output.collect uses toString for custom key class. Is it possible
to change this?
Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 8:28 AM, David Coe wrote:
>
>> Is there a way to output the write() instead?
>
>
> Use SequenceFileOutputFormat. It writes binary files using the write.
> The reverse is SequenceFileInputFormat, which reads the sequence files
> using readFields.
>
> -- Owen
Does the SequenceFileOutputFormat work with NullWritable as the value?
Re: Output.collect uses toString for custom key class. Is it possible to change this?
Posted by Aaron Kimball <aa...@cloudera.com>.
NullWritable has a get() method that returns the singleton instance of the
NullWritable.
- Aaron
On Tue, Dec 16, 2008 at 9:30 AM, David Coe <da...@chalklabs.net> wrote:
> Owen O'Malley wrote:
> >
> > On Dec 16, 2008, at 9:14 AM, David Coe wrote:
> >
> >> Does the SequenceFileOutputFormat work with NullWritable as the value?
> >
> > Yes.
>
> Owen O'Malley wrote:
> > It means you are trying to write a null value. Your reduce is doing
> > something like:
> >
> > output.collect(key, null);
> >
> > In TextOutputFormat, that is ok and just skips it.
> > SequenceFileOutputFormat doesn't like nulls.
> >
> > -- Owen
> Since the SequenceFileOutputFormat doesn't like nulls, how would I use
> NullWritable? Obviously output.collect(key, null) isn't working. If I
> change it to output.collect(key, new IntWritable()) I get the result I
> want (plus an int that I don't), but output.collect(key, new
> NullWritable()) does not work.
>
> Thanks again.
>
> David
>
>
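Editor's sketch of Aaron's point: new NullWritable() cannot work because the class exposes no public constructor, and the only way to obtain an instance is NullWritable.get(). In the mapper from this thread that would mean output.collect(key, NullWritable.get()), probably together with conf.setOutputValueClass(NullWritable.class). The block below does not use Hadoop: Writable and NullWritableSketch are simplified stand-ins written for illustration, and NullWritableDemo and serializedSize are hypothetical helper names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified stand-in for Hadoop's NullWritable: a singleton placeholder value.
final class NullWritableSketch implements Writable {
    private static final NullWritableSketch THIS = new NullWritableSketch();

    // Private constructor: callers cannot write "new NullWritableSketch()".
    private NullWritableSketch() {}

    // The only way to obtain the instance.
    static NullWritableSketch get() {
        return THIS;
    }

    @Override
    public void write(DataOutput out) {
        // Nothing to write: the value serializes to zero bytes.
    }

    @Override
    public void readFields(DataInput in) {
        // Nothing to read back.
    }
}

final class NullWritableDemo {
    // Returns how many bytes a Writable produces when serialized.
    static int serializedSize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Writable value = NullWritableSketch.get();
        // Every call to get() hands back the same object.
        System.out.println(value == NullWritableSketch.get());
        // A non-null placeholder that adds no bytes to the record.
        System.out.println(serializedSize(value));
    }
}
```

The value passed to collect is then a real object rather than null, which is what SequenceFileOutputFormat needs, while the record carries no extra payload.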
Re: Output.collect uses toString for custom key class. Is it possible to change this?
Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 9:30 AM, David Coe wrote:
> Since the SequenceFileOutputFormat doesn't like nulls, how would I use
> NullWritable? Obviously output.collect(key, null) isn't working. If I
> change it to output.collect(key, new IntWritable()) I get the result I
> want (plus an int that I don't), but output.collect(key, new
> NullWritable()) does not work.
Sorry, I answered you literally. You can write a SequenceFile with
NullWritables as the values, but you really want optional nulls. I'd
probably define a Wrapper class like GenericWritable. It would look
something like:
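import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

class NullableWritable<T extends Writable> implements Writable {
  private T instance;      // wrapped value; must be set before readFields runs
  private boolean isNull;

  public void setNull(boolean isNull) {
    this.isNull = isNull;
  }

  public void readFields(DataInput in) throws IOException {
    isNull = in.readBoolean();   // flag written first by write()
    if (!isNull) {
      instance.readFields(in);
    }
  }

  public void write(DataOutput out) throws IOException {
    out.writeBoolean(isNull);    // flag first, then the value if present
    if (!isNull) {
      instance.write(out);
    }
  }
}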
-- Owen
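Editor's sketch of how a wrapper along the lines Owen describes could round-trip: a one-byte flag goes out first, and the wrapped value follows only when it is present. This is an illustration under assumptions, not Owen's implementation. Writable is a local stand-in for the Hadoop interface, IntBox is a toy payload in place of IntWritable, and the constructor, get(), isNull() and the NullableRoundTrip helpers are hypothetical additions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Toy payload type, used here in place of IntWritable.
final class IntBox implements Writable {
    int value;

    IntBox() {}

    IntBox(int value) {
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}

// Wrapper that records whether the wrapped value is logically null.
final class NullableWritable<T extends Writable> implements Writable {
    private final T instance;   // reusable holder object, never null itself
    private boolean isNull;

    NullableWritable(T instance, boolean isNull) {
        this.instance = instance;
        this.isNull = isNull;
    }

    boolean isNull() {
        return isNull;
    }

    // Returns the wrapped value, or null when the flag says it is absent.
    T get() {
        return isNull ? null : instance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isNull);       // one-byte flag
        if (!isNull) {
            instance.write(out);        // payload only when present
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        isNull = in.readBoolean();
        if (!isNull) {
            instance.readFields(in);
        }
    }
}

final class NullableRoundTrip {
    // Serializes a Writable into a byte array.
    static byte[] toBytes(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Fills an existing Writable from a byte array.
    static void fromBytes(Writable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] present = toBytes(new NullableWritable<>(new IntBox(42), false));
        byte[] absent = toBytes(new NullableWritable<>(new IntBox(), true));

        NullableWritable<IntBox> copy = new NullableWritable<>(new IntBox(), true);
        fromBytes(copy, present);
        System.out.println(present.length + " bytes, value " + copy.get().value);
        fromBytes(copy, absent);
        System.out.println(absent.length + " byte, isNull " + copy.isNull());
    }
}
```

The cost of optional nulls in this design is one extra byte per record, and the reader has to supply a holder instance up front because readFields fills an existing object rather than creating one.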
Re: Output.collect uses toString for custom key class. Is it possible
to change this?
Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 9:14 AM, David Coe wrote:
>
>> Does the SequenceFileOutputFormat work with NullWritable as the value?
>
> Yes.
Owen O'Malley wrote:
> It means you are trying to write a null value. Your reduce is doing
> something like:
>
> output.collect(key, null);
>
> In TextOutputFormat, that is ok and just skips it.
> SequenceFileOutputFormat doesn't like nulls.
>
> -- Owen
Since the SequenceFileOutputFormat doesn't like nulls, how would I use
NullWritable? Obviously output.collect(key, null) isn't working. If I
change it to output.collect(key, new IntWritable()) I get the result I
want (plus an int that I don't), but output.collect(key, new
NullWritable()) does not work.
Thanks again.
David
Re: Output.collect uses toString for custom key class. Is it possible to change this?
Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 8:58 AM, David Coe wrote:
> Thank you for your swift response. I am getting this error when I try
> your suggestion:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
It means you are trying to write a null value. Your reduce is doing
something like:
output.collect(key, null);
In TextOutputFormat, that is ok and just skips it.
SequenceFileOutputFormat doesn't like nulls.
-- Owen
Re: Output.collect uses toString for custom key class. Is it possible
to change this?
Posted by David Coe <da...@chalklabs.net>.
Owen O'Malley wrote:
>
> On Dec 16, 2008, at 8:28 AM, David Coe wrote:
>
>> Is there a way to output the write() instead?
>
>
> Use SequenceFileOutputFormat. It writes binary files using the write.
> The reverse is SequenceFileInputFormat, which reads the sequence files
> using readFields.
>
> -- Owen
Thank you for your swift response. I am getting this error when I try
your suggestion:
java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:385)
*My configuration:*
conf.setMapperClass(MyMap.class);
conf.setReducerClass(IdentityReducer.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
conf.setOutputKeyClass(MyClass.class);
*My mapper:*
public static class MyMap extends MapReduceBase implements
Mapper<LongWritable, Text, MyClass,NullWritable> {
public void map(LongWritable key, Text value,
OutputCollector<MyClass,NullWritable> output,
Reporter reporter) throws IOException {
*My Class:*
public class MyClass implements Writable,
Comparable<MyClass> {
Which setting am I missing that results in the null pointer?
Thank you!!
Re: Output.collect uses toString for custom key class. Is it possible to change this?
Posted by Owen O'Malley <om...@apache.org>.
On Dec 16, 2008, at 8:28 AM, David Coe wrote:
> Is there a way to output the write() instead?
Use SequenceFileOutputFormat. It writes binary files using the write.
The reverse is SequenceFileInputFormat, which reads the sequence files
using readFields.
-- Owen
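Editor's sketch of the distinction Owen draws: a text-style output stores whatever toString() returns, while a binary output stores the bytes produced by write(), which readFields() can turn back into the original object. The block does not use Hadoop. Writable is a local stand-in for the Hadoop interface, PointKey is a hypothetical key class, and asText is a deliberate simplification of what a text output format does (it ignores separators and the value).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom key holding two ints.
final class PointKey implements Writable {
    int x;
    int y;

    PointKey() {}

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    @Override
    public String toString() {
        return x + "," + y;
    }
}

final class TextVsBinary {
    // Simplified view of a text output: the key's toString() as UTF-8 bytes.
    static byte[] asText(PointKey key) {
        return key.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Simplified view of a binary output: the bytes produced by write().
    static byte[] asBinary(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a key from its binary form using readFields().
    static PointKey readBinary(byte[] data) {
        PointKey key = new PointKey();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            key.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    public static void main(String[] args) {
        PointKey key = new PointKey(3, 4);
        System.out.println(new String(asText(key), StandardCharsets.UTF_8));
        PointKey back = readBinary(asBinary(key));
        System.out.println(back.x + " " + back.y);
    }
}
```

Reading the text form back would need a hand-written parser for whatever toString() emits, whereas the binary form only needs the class's own readFields().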