Posted to common-user@hadoop.apache.org by Rakhi Khatwani <rk...@gmail.com> on 2010/01/29 14:04:36 UTC

Using custom Input Format

Hi,
        I have been trying to implement custom input and output formats. I
was able to create a custom output format, but when I run a
MapReduce job that reads a file through my custom input format, I get this
exception:
java.lang.NullPointerException
 at Beans.Content.write(Content.java:54)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
 at CustomInputFormat.SampleCustomInputMap.map(SampleCustomInputMap.java:32)
 at CustomInputFormat.SampleCustomInputMap.map(SampleCustomInputMap.java:1)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
0 Hello World0 Hello World0 Hello World0

id:: null media:: null url:: null content::null

id:: null media:: null url:: null content::null
java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
 at CustomInputFormat.SampleCustomInputMapReduce.run(SampleCustomInputMapReduce.java:53)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at CustomInputFormat.SampleCustomInputMapReduce.main(SampleCustomInputMapReduce.java:59)


I have attached the following files:
Content => custom object that implements WritableComparable
ContentInputFormatV2 => input format that extends SequenceFileInputFormat
ContentRecordReader => implementation of RecordReader (not strictly required;
I assume it should work without it)
SampleCustomInputMap => mapper class
SampleCustomInputReduce => reducer class
SampleCustomInputMapReduce => class that contains the main method and the
job configuration
data and index => the input files for the main function
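The trace points at Content.write, called while the map output is being serialized, and the debug output shows every field as null. The attached source is not in the archive, so this is an assumption: a common way to get exactly this NullPointerException is a write() that passes a null String to DataOutput.writeUTF. Below is a minimal sketch of a null-tolerant write/readFields pair, assuming the four fields are Strings named id, media, url and content (names taken from the printed output). It writes a presence flag before each field. The real class would declare `implements org.apache.hadoop.io.WritableComparable<Content>`; that is left out here so the sketch compiles with the JDK alone.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Comparator;

// Sketch of a null-safe value class. In the real job this would be declared
// "implements org.apache.hadoop.io.WritableComparable<Content>"; that interface
// is omitted (assumption) so the example compiles with the JDK alone.
class Content implements Comparable<Content> {
    // Orders ids with null first (an arbitrary choice for this sketch).
    private static final Comparator<String> NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Field names assumed from the "id:: media:: url:: content::" debug output.
    String id;
    String media;
    String url;
    String content;

    // Writes a presence flag before each string, so null never reaches writeUTF.
    private static void writeNullable(DataOutput out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    // Mirror of writeNullable: reads the flag, then the string if one was written.
    private static String readNullable(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    // Same signature as Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        writeNullable(out, id);
        writeNullable(out, media);
        writeNullable(out, url);
        writeNullable(out, content);
    }

    // Same signature as Writable.readFields(DataInput). Every field is assigned,
    // so a reused instance never keeps values from the previous record.
    public void readFields(DataInput in) throws IOException {
        id = readNullable(in);
        media = readNullable(in);
        url = readNullable(in);
        content = readNullable(in);
    }

    @Override
    public int compareTo(Content other) {
        return NULLS_FIRST.compare(id, other.id);
    }
}
```

Because readFields() assigns every field on each call, it stays correct when Hadoop reuses the same instance for successive records. writeUTF is limited to 65,535 encoded bytes per string; for long page content, org.apache.hadoop.io.Text.writeString and Text.readString are the usual alternative.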
The first record works fine, but for the next record and every record after
it, I get null values. Where could I have gone wrong?
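A second possibility, again an assumption since ContentRecordReader isn't shown: in the old org.apache.hadoop.mapred API, MapRunner calls createKey() and createValue() once and then passes those same objects to next(key, value) for every record. If next() builds a new object and assigns it to its parameter instead of filling in the one it was given, the mapper keeps seeing the empty value from createValue(), which would print as all nulls. The JDK-only sketch below reproduces that pitfall; Holder, ListReader and ReuseDemo are hypothetical stand-ins for Content, ContentRecordReader and MapRunner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value, standing in for Content.
class Holder {
    String text;
}

// Hypothetical reader over an in-memory list. Like mapred's
// RecordReader.next(key, value), it is expected to fill the object it receives.
class ListReader {
    private final Iterator<String> it;

    ListReader(List<String> lines) {
        this.it = lines.iterator();
    }

    Holder createValue() {
        return new Holder();
    }

    // Buggy: rebinds the local parameter, so the caller's object is never touched.
    boolean nextBroken(Holder value) {
        if (!it.hasNext()) return false;
        value = new Holder();
        value.text = it.next();
        return true;
    }

    // Correct: copies the record into the caller-supplied object.
    boolean next(Holder value) {
        if (!it.hasNext()) return false;
        value.text = it.next();
        return true;
    }
}

class ReuseDemo {
    // Mimics MapRunner: one value object is created up front and reused.
    static List<String> run(List<String> input, boolean broken) {
        ListReader reader = new ListReader(input);
        Holder value = reader.createValue();
        List<String> seen = new ArrayList<>();
        while (broken ? reader.nextBroken(value) : reader.next(value)) {
            seen.add(value.text); // what map() would see for this record
        }
        return seen;
    }
}
```

With input ["a", "b"], the broken reader yields [null, null] while the correct one yields [a, b]. The flip side of this reuse is that map() must copy any value it wants to keep beyond the current call.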
Regards,
Raakhi

Re: Using custom Input Format

Posted by Antonio D'Ettole <co...@gmail.com>.
Rakhi,
I've recently had to implement a custom InputFormat that's pretty basic
(every split is just a list of integers).
You can check it out here http://github.com/codazzo/MultiRow
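A split that carries nothing but a list of integers can be sketched as follows. This is not the MultiRow code; the class name IntListSplit and the choice of returning the element count from getLength() are assumptions. A real split for the old org.apache.hadoop.mapred API would implement InputSplit, which extends Writable and adds getLength() and getLocations(); the interface is omitted so the sketch compiles with the JDK alone.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical split holding a list of integers. In a real job it would declare
// "implements org.apache.hadoop.mapred.InputSplit" (omitted here, assumption).
class IntListSplit {
    private final List<Integer> values = new ArrayList<>();

    // Writable-style classes need a no-arg constructor so the framework
    // can instantiate them before calling readFields().
    IntListSplit() {}

    IntListSplit(List<Integer> values) {
        this.values.addAll(values);
    }

    List<Integer> getValues() {
        return Collections.unmodifiableList(values);
    }

    // Size hint for the framework; here simply the number of integers (assumption).
    public long getLength() {
        return values.size();
    }

    // An in-memory list has no data locality, so no preferred hosts.
    public String[] getLocations() {
        return new String[0];
    }

    // Serialized form: element count followed by the integers.
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (int v : values) {
            out.writeInt(v);
        }
    }

    // Clears first, so a reused instance does not accumulate old values.
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readInt());
        }
    }
}
```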
One guy also implemented a custom InputFormat and wrote about it on his
blog http://codedemigod.com/blog/?p=120
Hope that helps.
Antonio
