Posted to mapreduce-issues@hadoop.apache.org by "Sameer Gupta (JIRA)" <ji...@apache.org> on 2015/09/30 16:30:05 UTC
[jira] [Commented] (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936894#comment-14936894 ]
Sameer Gupta commented on MAPREDUCE-2254:
-----------------------------------------
This is such a critical feature; I wonder why it has been stuck for so many years.
> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Ahmed Radwan
> Assignee: Ahmed Radwan
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2245.patch, MAPREDUCE-2254_r2.patch, MAPREDUCE-2254_r3.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to specify any custom end-of-record delimiter through a newly added configuration property. For backward compatibility, if this property is absent, exactly the same delimiters as before are used (i.e., '\n', '\r' or '\r\n').
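The sketch below illustrates the difference between the legacy and custom-delimiter behaviour described above. It is a minimal, JDK-only illustration, not Hadoop's LineRecordReader. The class RecordSplitter and its split method are hypothetical names.

The sketch assumes two rules:
- A null or empty delimiter means "use the old '\n' / '\r' / "\r\n" rules".
- A trailing record without a terminator is still emitted.

In released Hadoop versions (0.23 and later), the configuration property added by this change is textinputformat.record.delimiter, set on the job's Configuration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper mimicking the delimiter semantics from MAPREDUCE-2254.
 * This is NOT Hadoop code; it only shows where records get cut.
 */
public class RecordSplitter {

    /**
     * Splits data into records. If customDelimiter is null or empty, falls back
     * to the legacy rules: a record ends at '\n', '\r', or "\r\n". Otherwise a
     * record ends only at an exact occurrence of customDelimiter.
     */
    public static List<String> split(String data, String customDelimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        if (customDelimiter == null || customDelimiter.isEmpty()) {
            int i = 0;
            while (i < data.length()) {
                char c = data.charAt(i);
                if (c == '\n' || c == '\r') {
                    records.add(data.substring(start, i));
                    // Treat "\r\n" as one terminator rather than two.
                    if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                        i++;
                    }
                    start = i + 1;
                }
                i++;
            }
        } else {
            int idx;
            while ((idx = data.indexOf(customDelimiter, start)) != -1) {
                records.add(data.substring(start, idx));
                start = idx + customDelimiter.length();
            }
        }
        // Emit a final record that has no terminator after it.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Three records separated by ';', the first with an embedded newline.
        String data = "1,a\nb;2,c;3,d;";
        // Legacy rules: the embedded '\n' cuts record 1 in half.
        System.out.println(split(data, null).size()); // prints 2
        // Custom ';' delimiter: all three records stay intact.
        System.out.println(split(data, ";").size());  // prints 3
    }
}
```

With the legacy rules, a field containing a newline (as in PIG-836 or SQOOP-136) spills into the next record. A delimiter that cannot appear inside fields keeps each record whole.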
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)