Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/11/21 00:27:35 UTC

[jira] [Commented] (MAPREDUCE-5635) FileInputFormat does not specify how the file is split

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828281#comment-13828281 ] 

Jason Lowe commented on MAPREDUCE-5635:
---------------------------------------

FileInputFormat does not require that the file is a plain text file broken into lines with carriage-return or linefeed used as line delimiters.  That's what TextInputFormat is for.

FileInputFormat is an abstract class that makes no assumptions about how the data in the file is formatted.  Concrete implementations that derive from FileInputFormat must implement the record-reader factory method (createRecordReader in the org.apache.hadoop.mapreduce API, getRecordReader in the older org.apache.hadoop.mapred API), which dictates how records are read from the file and therefore what format the file must have for that particular input format.
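A minimal, stdlib-only Java model of that division of labor is sketched below. SimpleFileInputFormat, SimpleRecordReader, Rec and FixedLengthFormat are hypothetical names, not Hadoop classes, and the sketch leaves out everything the real FileInputFormat does with input paths and splits. It only illustrates that the abstract base knows nothing about record layout; the subclass alone decides that each record is a fixed number of bytes, with no line delimiters involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormatAgnosticInput {

    // A (key, value) pair produced by a record reader.
    static final class Rec<K, V> {
        final K key;
        final V value;
        Rec(K key, V value) { this.key = key; this.value = value; }
    }

    // Hypothetical reader interface: decides how raw bytes become records.
    interface SimpleRecordReader<K, V> {
        List<Rec<K, V>> readAll(byte[] data);
    }

    // Hypothetical base class, loosely modeled on FileInputFormat: it deals with
    // "the file" but leaves the record format entirely to subclasses.
    static abstract class SimpleFileInputFormat<K, V> {
        protected abstract SimpleRecordReader<K, V> createRecordReader();

        public final List<Rec<K, V>> read(byte[] fileContents) {
            return createRecordReader().readAll(fileContents);
        }
    }

    // A non-text format: every record is exactly recordLength bytes,
    // keyed by its byte offset. No line delimiters are involved.
    static final class FixedLengthFormat extends SimpleFileInputFormat<Long, byte[]> {
        private final int recordLength;

        FixedLengthFormat(int recordLength) {
            if (recordLength <= 0) throw new IllegalArgumentException("recordLength must be positive");
            this.recordLength = recordLength;
        }

        @Override
        protected SimpleRecordReader<Long, byte[]> createRecordReader() {
            return data -> {
                List<Rec<Long, byte[]>> out = new ArrayList<>();
                // Trailing bytes shorter than one full record are ignored in this sketch.
                for (int pos = 0; pos + recordLength <= data.length; pos += recordLength) {
                    out.add(new Rec<>((long) pos, Arrays.copyOfRange(data, pos, pos + recordLength)));
                }
                return out;
            };
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7};
        List<Rec<Long, byte[]>> recs = new FixedLengthFormat(3).read(data);
        for (Rec<Long, byte[]> r : recs) {
            System.out.println(r.key + " -> " + Arrays.toString(r.value));
        }
    }
}
```

Running main prints "0 -> [1, 2, 3]" and "3 -> [4, 5, 6]"; the seventh byte does not fill a record and is dropped. A line-oriented subclass could plug into the same base class without changing it, which is the role TextInputFormat plays relative to FileInputFormat.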

> FileInputFormat does not specify how the file is split
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: Does not matter.
>            Reporter: Pranay Varma
>
> Here is what the TextInputFormat javadoc says:
> [TextInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html]
> An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.
> The FileInputFormat javadoc should say the same:
> [FileInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html]
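The TextInputFormat behavior quoted above (lines ended by linefeed or carriage-return, keys are positions in the file, values are the line text) can be approximated with the stdlib-only Java sketch below. LineRecords, LineRecord and readLines are hypothetical names, not Hadoop's LineRecordReader API. Treating CR followed by LF as a single terminator and reporting keys as byte offsets of each line's start are assumptions of this sketch. The real record reader also works on one split at a time and has to handle lines that straddle split boundaries; this sketch reads a whole byte array and ignores that.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineRecords {

    // One record: the byte offset where the line starts, and the line text.
    static final class LineRecord {
        final long offset;
        final String line;
        LineRecord(long offset, String line) { this.offset = offset; this.line = line; }
        @Override public String toString() { return offset + "\t" + line; }
    }

    // Splits raw bytes into lines terminated by LF, CR, or CR followed by LF.
    // Empty lines are emitted, and a final line without a terminator is still emitted.
    static List<LineRecord> readLines(byte[] data) {
        List<LineRecord> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new LineRecord(start, new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Assumption of this sketch: CR LF counts as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            out.add(new LineRecord(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\r\ngamma\rdelta".getBytes(StandardCharsets.UTF_8);
        for (LineRecord r : readLines(data)) {
            System.out.println(r);
        }
    }
}
```

For the sample input, main prints the offsets 0, 6, 12 and 18 next to alpha, beta, gamma and delta. The keys are byte offsets rather than line numbers, which is why beta gets key 6.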



--
This message was sent by Atlassian JIRA
(v6.1#6144)