Posted to common-user@hadoop.apache.org by Rakhi Khatwani <rk...@gmail.com> on 2010/01/29 14:04:36 UTC
Using custom Input Format
Hi,
I have been trying to implement custom input and output formats. I
was successful in creating a custom output format, but when I run a
MapReduce job that reads a file using the custom input format, I get an
exception:
java.lang.NullPointerException
at Beans.Content.write(Content.java:54)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
at CustomInputFormat.SampleCustomInputMap.map(SampleCustomInputMap.java:32)
at CustomInputFormat.SampleCustomInputMap.map(SampleCustomInputMap.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
0 Hello World0 Hello World0 Hello World0
id:: null media:: null url:: null content::null
id:: null media:: null url:: null content::null
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at CustomInputFormat.SampleCustomInputMapReduce.run(SampleCustomInputMapReduce.java:53)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at CustomInputFormat.SampleCustomInputMapReduce.main(SampleCustomInputMapReduce.java:59)
I have attached the following files:
Content => custom object that implements WritableComparable
ContentInputFormatV2 => input format that extends SequenceFileInputFormat
ContentRecordReader => implementation of RecordReader (not strictly required; I assume it should work without it)
SampleCustomInputMap => mapper class
SampleCustomInputReduce => reducer class
SampleCustomInputMapReduce => class containing the main method and the job configuration
data and index => the input files for the main function
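One way a NullPointerException can surface inside `Content.write` (a guess, since the attached sources are not reproduced here) is a field that is still null when the map output is serialized: `DataOutput.writeUTF(null)`, for instance, throws exactly that exception. Below is a minimal plain-Java sketch of a null-safe `Content` with the same `write`/`readFields` contract as Hadoop's `Writable`, kept free of the Hadoop dependency so it compiles on its own. The field names `id`, `media`, `url` and `content` are assumptions read off the debug output above, and `toBytes`/`fromBytes` are hypothetical test helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Plain-Java stand-in for a Hadoop Writable (same write/readFields contract).
// Field names are assumptions taken from the job's debug output.
class Content {
    String id;
    String media;
    String url;
    String content;

    // DataOutput.writeUTF(null) throws NullPointerException, so each string
    // is preceded by a boolean flag saying whether it is present.
    private static void writeNullable(DataOutput out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    private static String readNullable(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    public void write(DataOutput out) throws IOException {
        writeNullable(out, id);
        writeNullable(out, media);
        writeNullable(out, url);
        writeNullable(out, content);
    }

    // Reads fields in the same order write() emitted them and overwrites
    // every field, since the framework may reuse one instance across records.
    public void readFields(DataInput in) throws IOException {
        id = readNullable(in);
        media = readNullable(in);
        url = readNullable(in);
        content = readNullable(in);
    }

    // Hypothetical test helper: serialize to a byte array.
    static byte[] toBytes(Content c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            c.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical test helper: deserialize into a fresh instance.
    static Content fromBytes(byte[] data) {
        Content c = new Content();
        try {
            c.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c;
    }
}
```

The presence flag keeps `write` and `readFields` symmetric even when some fields are unset. In the real class the same two methods would sit behind `implements WritableComparable<Content>`, which additionally requires `compareTo`.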
The first record works fine, but for the next record and every record after
it, I get null. Where could I have gone wrong?
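A second thing worth checking: with the old `org.apache.hadoop.mapred` API, `MapRunner` calls `createKey()` and `createValue()` once and then passes the same two objects to `next(key, value)` for every record. A custom `RecordReader` therefore has to fill in the object it is handed; assigning a new object to the parameter only changes a local reference, and the mapper keeps seeing the untouched original. Whether this explains the exact symptom depends on the attached code. The sketch below models that loop in plain Java; `Record`, `ListRecordReader`, `nextBuggy` and `MapRunnerSketch` are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical value type standing in for the custom Writable.
class Record {
    String text;
}

// Models the old mapred RecordReader contract: next() receives the object
// created once by createValue() and must fill it in place.
class ListRecordReader {
    private final Iterator<String> source;

    ListRecordReader(String... lines) {
        this.source = Arrays.asList(lines).iterator();
    }

    Record createValue() {
        return new Record();
    }

    // Correct: mutates the caller's object.
    boolean next(Record value) {
        if (!source.hasNext()) {
            return false;
        }
        value.text = source.next();
        return true;
    }

    // Buggy: rebinding the parameter changes only the local reference,
    // so the caller's object keeps its old (here: null) contents.
    boolean nextBuggy(Record value) {
        if (!source.hasNext()) {
            return false;
        }
        value = new Record();
        value.text = source.next();
        return true;
    }
}

// Mirrors the shape of MapRunner.run(): one value object, reused per record.
class MapRunnerSketch {
    static List<String> drain(ListRecordReader reader, boolean buggy) {
        Record value = reader.createValue();
        List<String> seen = new ArrayList<>();
        while (buggy ? reader.nextBuggy(value) : reader.next(value)) {
            seen.add(value.text); // what the mapper would see
        }
        return seen;
    }
}
```

Draining `new ListRecordReader("a", "b")` through `next` yields `[a, b]`, while `nextBuggy` yields two nulls even though the reader consumed both lines.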
Regards,
Raakhi
Re: Using custom Input Format
Posted by Antonio D'Ettole <co...@gmail.com>.
Rakhi,
I recently had to implement a custom InputFormat that's pretty basic
(each split is basically a list of integers).
You can check it out here: http://github.com/codazzo/MultiRow
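For comparison, a split that carries a list of integers mainly needs to serialize itself, because in the old API `org.apache.hadoop.mapred.InputSplit` extends `Writable` and adds `getLength()` and `getLocations()`. The following is a hypothetical plain-Java sketch, not code from the MultiRow repository: `IntListSplit` and its `roundTrip` helper are illustrative names, and using the element count as the length is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical split carrying a list of integers (not from MultiRow).
class IntListSplit {
    private final List<Integer> values = new ArrayList<>();

    // Writable types need a no-argument constructor so they can be
    // instantiated reflectively before readFields() is called.
    IntListSplit() {
    }

    IntListSplit(int... values) {
        for (int v : values) {
            this.values.add(v);
        }
    }

    List<Integer> getValues() {
        return values;
    }

    // Nominally the size of the split's data; here simply the element count.
    long getLength() {
        return values.size();
    }

    // No data-locality preference for in-memory integers.
    String[] getLocations() {
        return new String[0];
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(values.size()); // length prefix tells readFields how many follow
        for (int v : values) {
            out.writeInt(v);
        }
    }

    void readFields(DataInput in) throws IOException {
        values.clear(); // the instance may be reused
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readInt());
        }
    }

    // Hypothetical test helper: serialize and deserialize a copy.
    static IntListSplit roundTrip(IntListSplit split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            IntListSplit copy = new IntListSplit();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The length prefix is what lets `readFields` know where the list ends; the same write-then-read symmetry applies to the `Content` class in the original question.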
Someone else also implemented a custom InputFormat and wrote about it on his
blog: http://codedemigod.com/blog/?p=120
Hope that helps.
Antonio