Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/04/04 21:51:32 UTC

[jira] Assigned: (HADOOP-1204) Re-factor InputFormat/RecordReader related classes

     [ https://issues.apache.org/jira/browse/HADOOP-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi reassigned HADOOP-1204:
----------------------------------

    Assignee: Runping Qi

> Re-factor InputFormat/RecordReader related classes
> --------------------------------------------------
>
>                 Key: HADOOP-1204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1204
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> This Jira is a first small step toward unifying the InputFormat/RecordReader code used by streaming
> with the main Hadoop framework.
> It cleans up the related parts of the main framework as follows:
> 1. Add a constructor 
>        public LineRecordReader(Configuration job, FileSplit split)
> to LineRecordReader, so that the constructors of SequenceFileRecordReader and LineRecordReader
> have the same signature. This makes it easier to write a factory class that creates the various record readers when
> we bring the record reader classes for Hadoop streaming into the main framework.
> 2. Implement the next() method using the following newly added protected method of the LineRecordReader class:
>      protected long readLine() throws IOException {
>          return LineRecordReader.readLine(in, buffer);
>      }
>     This lets a subclass override the readLine logic to use a different line terminator (e.g. to treat '\r' as part of the data rather than as a line break).
> 3. Rename class InputFormatBase to FileInputFormat to better reflect the functionality of the class.
> For backward compatibility, keep the InputFormatBase class, but make it a deprecated shell class that simply extends FileInputFormat.
> 4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.
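
To illustrate the override hook described in item 2, here is a minimal, self-contained sketch. It does not use the Hadoop classes: SimpleLineReader, NewlineOnlyReader, and readAll are hypothetical stand-ins invented for this example, and the default line-breaking rule shown ('\n', '\r', or '\r\n' ends a line) is an assumption about the intended behavior, not a copy of the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLineHookSketch {

    /** Hypothetical, simplified stand-in for LineRecordReader (not the Hadoop class). */
    static class SimpleLineReader {
        protected final InputStream in; // must support mark/reset for the default readLine
        protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        SimpleLineReader(InputStream in) {
            this.in = in;
        }

        /**
         * Assumed default rule: '\n', '\r', or "\r\n" terminates a line.
         * Returns the number of bytes consumed, or 0 at end of stream.
         */
        static long readLine(InputStream in, ByteArrayOutputStream out) throws IOException {
            long bytes = 0;
            int b;
            while ((b = in.read()) != -1) {
                bytes++;
                if (b == '\n') {
                    break;
                }
                if (b == '\r') {
                    in.mark(1);
                    int c = in.read();
                    if (c == '\n') {
                        bytes++;       // consume the '\n' of a "\r\n" pair
                    } else if (c != -1) {
                        in.reset();    // not part of the terminator, push it back
                    }
                    break;
                }
                out.write(b);
            }
            return bytes;
        }

        /** The protected hook: subclasses override this to change line breaking. */
        protected long readLine() throws IOException {
            return readLine(in, buffer);
        }

        /** Returns the next line, or null once the stream is exhausted. */
        String next() throws IOException {
            buffer.reset();
            if (readLine() == 0) {
                return null;
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical subclass: only '\n' ends a line, so '\r' stays in the data. */
    static class NewlineOnlyReader extends SimpleLineReader {
        NewlineOnlyReader(InputStream in) {
            super(in);
        }

        @Override
        protected long readLine() throws IOException {
            long bytes = 0;
            int b;
            while ((b = in.read()) != -1) {
                bytes++;
                if (b == '\n') {
                    break;
                }
                buffer.write(b);
            }
            return bytes;
        }
    }

    /** Drains a reader into a list of lines. */
    static List<String> readAll(SimpleLineReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.next()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\rb\nc\n".getBytes(StandardCharsets.UTF_8);
        // Print '\r' visibly as the two characters \r.
        System.out.println(readAll(new SimpleLineReader(new ByteArrayInputStream(data)))
                .toString().replace("\r", "\\r"));   // [a, b, c]
        System.out.println(readAll(new NewlineOnlyReader(new ByteArrayInputStream(data)))
                .toString().replace("\r", "\\r"));   // [a\rb, c]
    }
}
```

Because next() only calls the protected readLine(), the subclass changes the line-breaking rule without touching the rest of the reader, which is the point of the hook in item 2.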

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.