Posted to common-issues@hadoop.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/01/02 01:25:14 UTC
[jira] [Updated] (HADOOP-9168) The Naming and Inheritance for RecordReader, LineRecordReader, LineReader

     [ https://issues.apache.org/jira/browse/HADOOP-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated HADOOP-9168:
---------------------------------
      Description: 
I feel LineReader is not the correct name, since it reads up to a given delimiter, not necessarily a line break.

How about Text Record Reader?
That sounds correct, but LineReader is not a RecordReader by inheritance;
by functionality, though, it is the record reader.

Now, looking at it from a different angle:

In general,
an InputFormat mostly has two responsibilities:
1) To read a split.
2) To generate key & value pairs from what was read over the split.

Now, TextInputFormat
has a RecordReader, which is inherited by LineRecordReader,
which in turn uses another class, LineReader.

So what we actually have is:
LineReader, which does the reading of the file.
LineRecordReader, which generates the key & value.
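
To make the current split of responsibilities concrete, here is a minimal sketch. It does not use the real Hadoop classes: SketchLineReader, SketchRecordReader and SketchLineRecordReader are hypothetical stand-ins, and reading from an in-memory byte array (instead of a split on a file system) is an assumption made to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for LineReader: it only reads bytes up to a delimiter.
class SketchLineReader {
    private final byte[] buffer;
    private final byte delimiter;
    private int pos = 0;

    SketchLineReader(byte[] buffer, byte delimiter) {
        this.buffer = buffer;
        this.delimiter = delimiter;
    }

    // Returns the next record without its delimiter, or null at end of input.
    String readRecord() {
        if (pos >= buffer.length) {
            return null;
        }
        int start = pos;
        while (pos < buffer.length && buffer[pos] != delimiter) {
            pos++;
        }
        String record = new String(buffer, start, pos - start, StandardCharsets.UTF_8);
        if (pos < buffer.length) {
            pos++; // step over the delimiter
        }
        return record;
    }

    int getPos() {
        return pos;
    }
}

// Hypothetical stand-in for RecordReader: the key/value contract.
abstract class SketchRecordReader<K, V> {
    abstract boolean nextKeyValue();
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

// Hypothetical stand-in for LineRecordReader: it inherits the key/value
// contract and delegates the actual reading to SketchLineReader.
class SketchLineRecordReader extends SketchRecordReader<Long, String> {
    private final SketchLineReader reader;
    private Long key;
    private String value;

    SketchLineRecordReader(byte[] data) {
        this.reader = new SketchLineReader(data, (byte) '\n');
    }

    @Override
    boolean nextKeyValue() {
        long recordStart = reader.getPos();
        String record = reader.readRecord();
        if (record == null) {
            return false;
        }
        key = recordStart; // key: byte offset at which the record starts
        value = record;    // value: the record text without the delimiter
        return true;
    }

    @Override
    Long getCurrentKey() {
        return key;
    }

    @Override
    String getCurrentValue() {
        return value;
    }
}

class CurrentLayoutDemo {
    public static void main(String[] args) {
        byte[] data = "ab\ncd\n".getBytes(StandardCharsets.UTF_8);
        SketchLineRecordReader rr = new SketchLineRecordReader(data);
        while (rr.nextKeyValue()) {
            System.out.println(rr.getCurrentKey() + " -> " + rr.getCurrentValue());
        }
    }
}
```

Note that the class doing the reading is not part of the RecordReader hierarchy at all; only the key/value generator is.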

I would suggest,

RecordReader      to be renamed as     KeyValueGenerator,
LineRecordReader  to be renamed as     TextInputKeyValueGenerator,
LineReader        to be renamed as     delimitedTextReader,

Generic attributes of LineReader (such as start, pos, end, buffer, bufferBytes, etc.) to be abstracted into a class called RecordReader,
since they are all specific to reading the given input.

delimitedTextReader class could extend RecordReader.
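
As a rough illustration of the suggested layout, the sketch below uses the proposed names. It is only one possible reading of the proposal, not existing Hadoop code: the method names (nextKeyValue, readRecord, getPos), the in-memory byte buffer, and the capitalised spelling DelimitedTextReader (to follow Java class naming) are all assumptions.

```java
import java.nio.charset.StandardCharsets;

// Proposed name for today's RecordReader: the key/value contract.
abstract class KeyValueGenerator<K, V> {
    abstract boolean nextKeyValue();
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

// Proposed RecordReader: holds the generic reading state (start, pos, end,
// buffer) lifted out of LineReader.
abstract class RecordReader {
    protected final byte[] buffer;
    protected final int start;
    protected final int end;
    protected int pos;

    protected RecordReader(byte[] buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    int getPos() {
        return pos;
    }

    // Returns the next record, or null once pos has reached end.
    abstract String readRecord();
}

// Proposed name for today's LineReader: reads up to a given delimiter.
class DelimitedTextReader extends RecordReader {
    private final byte delimiter;

    DelimitedTextReader(byte[] buffer, int start, int end, byte delimiter) {
        super(buffer, start, end);
        this.delimiter = delimiter;
    }

    @Override
    String readRecord() {
        if (pos >= end) {
            return null;
        }
        int recordStart = pos;
        while (pos < end && buffer[pos] != delimiter) {
            pos++;
        }
        String record = new String(buffer, recordStart, pos - recordStart, StandardCharsets.UTF_8);
        if (pos < end) {
            pos++; // step over the delimiter
        }
        return record;
    }
}

// Proposed name for today's LineRecordReader: turns records into key/value pairs.
class TextInputKeyValueGenerator extends KeyValueGenerator<Long, String> {
    private final RecordReader reader;
    private Long key;
    private String value;

    TextInputKeyValueGenerator(RecordReader reader) {
        this.reader = reader;
    }

    @Override
    boolean nextKeyValue() {
        long recordStart = reader.getPos();
        String record = reader.readRecord();
        if (record == null) {
            return false;
        }
        key = recordStart; // key: byte offset at which the record starts
        value = record;
        return true;
    }

    @Override
    Long getCurrentKey() {
        return key;
    }

    @Override
    String getCurrentValue() {
        return value;
    }
}

class ProposedLayoutDemo {
    public static void main(String[] args) {
        byte[] data = "a|b|c".getBytes(StandardCharsets.UTF_8);
        RecordReader reader = new DelimitedTextReader(data, 0, data.length, (byte) '|');
        KeyValueGenerator<Long, String> gen = new TextInputKeyValueGenerator(reader);
        while (gen.nextKeyValue()) {
            System.out.println(gen.getCurrentKey() + " -> " + gen.getCurrentValue());
        }
    }
}
```

In this arrangement the key/value generator depends only on the abstract RecordReader, so a reader for a different record format could be swapped in without touching it.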

Now the names would make better sense. We must also look into compatibility. This change might be unfit to deploy unless a new API is introduced.

    Fix Version/s:     (was: hudson)
                       (was: site)

> The Naming and Inheritance for RecordReader, LineRecordReader, LineReader 
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-9168
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9168
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.21.0, 2.0.2-alpha, 0.23.5
>            Reporter: Gelesh
>            Priority: Minor
>              Labels: Hadoop, InputFormat
>             Fix For: 0.23.2
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)