Posted to mapreduce-dev@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/12/01 07:11:20 UTC

[jira] Resolved: (MAPREDUCE-1255) How to write a custom input format and record reader to read multiple lines of text from files

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved MAPREDUCE-1255.
------------------------------------

    Resolution: Invalid

Hi Kunal,

JIRA is meant for issue tracking, not questions. Please email the common-user or mapreduce-user mailing list with your question.

Thanks.

> How to write a custom input format and record reader to read multiple lines of text from files
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1255
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1255
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>    Affects Versions: 0.20.1
>         Environment: Ubuntu, 32 bit system. Apache hadoop 0.20.1
>            Reporter: Kunal Gupta
>            Priority: Minor
>
> Can someone explain how to extend "FileInputFormat" and "RecordReader" so that multiple lines of text from the input files can be read as one record in a single map task?
> Here the key will be the offset of the first line of text and the value will be the N lines of text.
> I have extended the class FileInputFormat:
> public class MultiLineFileInputFormat
> 	extends FileInputFormat<LongWritable, Text>{
> ...
> }
> and implemented the abstract method:
> public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
>                 TaskAttemptContext context)
>          throws IOException, InterruptedException {...}
> I have also extended the RecordReader class:
> public class MultiLineFileRecordReader extends RecordReader<LongWritable, Text>
> {...}
> and in the job configuration, specified this new InputFormat class:
> job.setInputFormatClass(MultiLineFileInputFormat.class);
> When I run this new map/reduce program, I get the following Java error:
> Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
> 	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
> 	at CustomRecordReader.main(CustomRecordReader.java:257)
> Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
> 	at java.lang.Class.getConstructor0(Class.java:2706)
> 	at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
> 	... 5 more
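
For readers who land on this thread with the same trace: the "$" in CustomRecordReader$MultiLineFileInputFormat suggests the input format is nested inside CustomRecordReader, and the snippet above does not declare it static. If that is the case (an assumption, since the full source is not shown), the class has no zero-argument constructor for ReflectionUtils.newInstance to find, because a non-static inner class constructor implicitly takes the enclosing instance. A class that declares only constructors with arguments would fail the same way. The sketch below reproduces the lookup in plain Java without Hadoop; NestedInputFormatDemo, InnerFormat, StaticFormat and hasNoArgConstructor are hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;

public class NestedInputFormatDemo {

    // Non-static inner class: its constructor implicitly takes the
    // enclosing NestedInputFormatDemo instance, so there is no <init>().
    class InnerFormat {
    }

    // Static nested class: the compiler-generated default constructor
    // takes no arguments, so a reflective lookup of <init>() succeeds.
    static class StaticFormat {
    }

    // Performs the same kind of lookup that appears in the stack trace:
    // getDeclaredConstructor() with no parameter types, then instantiation.
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // This is the failure reported in the issue.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("InnerFormat:  " + hasNoArgConstructor(InnerFormat.class));
        System.out.println("StaticFormat: " + hasNoArgConstructor(StaticFormat.class));
    }
}
```

Under that assumption, declaring the nested class as "public static class MultiLineFileInputFormat" (or moving it to its own top-level file) would give it the no-argument constructor the framework looks for. The same would likely apply to a nested mapper or reducer class.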
