You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-issues@hadoop.apache.org by "Ahmed Radwan (JIRA)" <ji...@apache.org> on 2011/01/11 02:10:46 UTC

[jira] Created: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Allow setting of end-of-record delimiter for TextInputFormat
------------------------------------------------------------

                 Key: HADOOP-7096
                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Ahmed Radwan
         Attachments: 2.patch

The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have impeded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981118#action_12981118 ] 

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12468214/HADOOP-7096.patch
  against trunk revision 1058343.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/179//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/179//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/179//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7096:
--------------------------------

    Attachment: hadoop-7096_r4.patch

Ahmed's patch accidentally reformatted other parts of the file. Here's the same code but formatted according to our guidelines.

I also changed the new methods to private instead of public and modified the javadoc a little bit. Can you check that these changes look good to you, Ahmed?

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991712#comment-12991712 ] 

Todd Lipcon commented on HADOOP-7096:
-------------------------------------

+1 then. I plan on committing this in the next day or two unless there are any objections.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Attachment: HADOOP-7096_r3.patch

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Status: Patch Available  (was: Open)

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have impeded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992437#comment-12992437 ] 

Hudson commented on HADOOP-7096:
--------------------------------

Integrated in Hadoop-Common-trunk #601 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/601/])
    HADOOP-7096. Allow setting of end-of-record delimiter for TextInputFormat. Contributed by Ahmed Radwan.


> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Attachment: HADOOP-7096.patch

Used --no-prefix when generating the git diffs to solve the apply patch command problem.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981104#action_12981104 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

Updated the patch to remove the extra prefixes (generated by git diff) causing the problem with patch command.

Uploaded the patch to review board for easier review. 
https://reviews.apache.org/r/312/

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7096:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
         Assignee: Ahmed Radwan
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to trunk, thanks Ahmed!

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989411#comment-12989411 ] 

Todd Lipcon commented on HADOOP-7096:
-------------------------------------

A few style nits:
- after recordDelimiterBytes, need a blank line before the javadoc starts
- the comment on this line:
{code}
    } else { //recordDelimiterBytes != null, use default delimite
{code}
seems like it's backwards - don't you mean "== null" in the else clause? Also typo "delimite"
- Worth considering splitting readLine into two methods, one for the true case, one for the false case
- Can you make recordDelimiterBytes final?
- In the MAX_VALUE case, for custom delimiter, the IOE should say "Too many bytes before delimiter", rather than "before newline"


Although there aren't currently any unit tests to test LineReader in Common, it would be really great if you had time to add a couple. Otherwise we can probably consider this as being tested by the new unit inputformat tests in MAPREDUCE-2254.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981116#action_12981116 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

The review board has a problem with uploading git diff files. Attached an updated file.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Description: 
The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').



  was:
The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have impeded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').




> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991654#comment-12991654 ] 

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470509/HADOOP-7096_r3.patch
  against trunk revision 1066284.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Attachment: HADOOP-7096_r2.patch

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992356#comment-12992356 ] 

Hudson commented on HADOOP-7096:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #497 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/497/])
    

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982535#action_12982535 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

The INFRA-3357 reviewboard issue I mentioned above was resolved (the three new git repositories were added). I Uploaded the diff files to the reviewboard for easier review.
https://reviews.apache.org/r/329/
https://reviews.apache.org/r/328/

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991705#comment-12991705 ] 

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470517/hadoop-7096_r4.patch
  against trunk revision 1066284.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/223//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/223//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/223//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991706#comment-12991706 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

I have revised it, it looks good for me. Thanks Todd.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Attachment:     (was: 2.patch)

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981121#action_12981121 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

The test case is included with the main patch (MAPREDUCE-2254). No test required here, since it is just changing few private fields to protected to allow extension in the main patch.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991644#comment-12991644 ] 

Ahmed Radwan commented on HADOOP-7096:
--------------------------------------

Thanks Todd for your comments. I have addressed them, please review the attached revised patch. For the test case, I previously added a LineRecordReader testcase that covers the cases when using user-specified and default delimiters. 

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Ahmed Radwan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Attachment: 2.patch

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have impeded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988351#action_12988351 ] 

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12469722/HADOOP-7096_r2.patch
  against trunk revision 1064919.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/208//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/208//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/208//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979942#action_12979942 ] 

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12467949/2.patch
  against trunk revision 1057455.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/168//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have impeded newlines in their data fields (which is pretty common). This is also a problem for other tools using this TextInputFormat (See for example: https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have wrote a patch to address this issue. This patch allows users to specify any custom end-of-record delimiter using a new added configuration property. For backward compatibility, if this new configuration property is absent, then the same exact previous delimiters are used (i.e., '\n', '\r' or '\r\n').

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.