Posted to mapreduce-issues@hadoop.apache.org by "Mark Fuhs (JIRA)" <ji...@apache.org> on 2012/11/08 20:56:13 UTC

[jira] [Created] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Mark Fuhs created MAPREDUCE-4782:
------------------------------------

             Summary: NLineInputFormat skips first line of last InputSplit
                 Key: MAPREDUCE-4782
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 0.22.0, trunk
         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 100);
NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
            Reporter: Mark Fuhs


NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.

After looping through all lines of a file, the final FileSplit is created, but its creation does not respect the difference between how the first and the remaining FileSplits are created.

This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.
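
The failure mode described above can be reproduced without Hadoop. What follows is a minimal sketch under stated assumptions, not the attached patch. It assumes a simplified model of LineRecordReader in which a split that does not start at offset 0 discards everything through the first newline, and lines are then read while the position is at or before the split end. It also assumes the first-vs-rest adjustment is "drop the last byte of the first split, start later splits one byte early". The class and method names (NLineSplitSketch, adjusted, computeSplits, readSplit, readAll) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free model of the split arithmetic described above.
class NLineSplitSketch {

    // Assumed adjustment for full N-line splits: the first split gives up its
    // last byte; every later split starts one byte early.
    static long[] adjusted(long begin, long length) {
        return begin == 0 ? new long[] {0, length - 1}
                          : new long[] {begin - 1, length};
    }

    // Returns {start, length} pairs. With fixLastSplit == false the trailing
    // partial split is left unadjusted, which models the reported bug.
    static List<long[]> computeSplits(String data, int linesPerSplit, boolean fixLastSplit) {
        List<long[]> splits = new ArrayList<>();
        long begin = 0;
        long length = 0;
        int numLines = 0;
        int pos = 0;
        while (pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int next = (nl < 0) ? data.length() : nl + 1;
            length += next - pos;   // bytes consumed by this line, newline included
            pos = next;
            numLines++;
            if (numLines == linesPerSplit) {
                splits.add(adjusted(begin, length));
                begin += length;
                length = 0;
                numLines = 0;
            }
        }
        if (numLines != 0) {
            splits.add(fixLastSplit ? adjusted(begin, length)
                                    : new long[] {begin, length});
        }
        return splits;
    }

    // Simplified reader: a split with start != 0 skips through the first
    // newline, then lines are emitted while the position is <= the split end.
    static List<String> readSplit(String data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        List<String> lines = new ArrayList<>();
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int stop = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, stop));
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        return lines;
    }

    // Concatenates what every split's reader would emit, in split order.
    static List<String> readAll(String data, int linesPerSplit, boolean fixLastSplit) {
        List<String> out = new ArrayList<>();
        for (long[] s : computeSplits(data, linesPerSplit, fixLastSplit)) {
            out.addAll(readSplit(data, s[0], s[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "a\nb\nc\nd\ne\n";
        System.out.println(readAll(data, 2, false)); // last split left unadjusted
        System.out.println(readAll(data, 2, true));  // last split adjusted like the rest
    }
}
```

With the five-line input in main and two lines per split, the unadjusted variant returns [a, b, c, d] and the adjusted variant returns [a, b, c, d, e]. In this model the final split starts exactly at the beginning of line "e", so the reader's skip-first-line step discards it unless the split start is moved back by one byte.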

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494668#comment-13494668 ] 

Hudson commented on MAPREDUCE-4782:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1222 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1222/])
    MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java

                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>             Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Mark Fuhs (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Fuhs updated MAPREDUCE-4782:
---------------------------------

    Attachment: MAPREDUCE-4782.patch

I confess I'm not terribly familiar with git, so this is just a "git diff".
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493571#comment-13493571 ] 

Hadoop QA commented on MAPREDUCE-4782:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12552709/MR-4782.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

                  org.apache.hadoop.mapred.TestClusterMRNotification

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//console

This message is automatically generated.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

     Target Version/s: 0.23.5
    Affects Version/s: 0.23.0
                       1.0.0
                       2.0.0-alpha
               Status: Patch Available  (was: Open)
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.0.0-alpha, 1.0.0, 0.23.0, 0.22.0, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.5
                   2.0.3-alpha
                   3.0.0
                   1.2.0
                   1.1.1
           Status: Resolved  (was: Patch Available)

Thanks Mark,

This is a great catch; I just wish we had found it sooner.  I put this into trunk, branch-2, branch-0.23, branch-1, and branch-1.1.

If I missed any branches that people want it in, please let me know and I will see what I can do.
                

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

    Priority: Blocker  (was: Critical)
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Mark Fuhs (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494115#comment-13494115 ] 

Mark Fuhs commented on MAPREDUCE-4782:
--------------------------------------

I'm glad I could contribute!
                

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494059#comment-13494059 ] 

Jason Lowe commented on MAPREDUCE-4782:
---------------------------------------

+1, thanks Mark and Bobby.  Bobby or Matt, feel free to commit.
                

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494163#comment-13494163 ] 

Hudson commented on MAPREDUCE-4782:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1252 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/])
    MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java

                

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493465#comment-13493465 ] 

Robert Joseph Evans commented on MAPREDUCE-4782:
------------------------------------------------

Marked this as critical, as data loss is serious.  Mark, can you post your patch?
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>
> NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
> After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
> This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem.


[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494080#comment-13494080 ] 

Hudson commented on MAPREDUCE-4782:
-----------------------------------

Integrated in Hadoop-trunk-Commit #2988 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2988/])
    MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java

                

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494618#comment-13494618 ] 

Hudson commented on MAPREDUCE-4782:
-----------------------------------

Integrated in Hadoop-Yarn-trunk #32 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/32/])
    MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java

                

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

    Priority: Critical  (was: Major)
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

    Attachment: MR-4782.txt

I was able to reproduce the issue, and I have updated the test case to reproduce it as well.  The original test case did not check the last split; I don't know why.  I also found that the bug exists in branch-1.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494052#comment-13494052 ] 

Hadoop QA commented on MAPREDUCE-4782:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12552844/MR-4782-branch-1.txt
  against trunk revision .

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3003//console

This message is automatically generated.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-4782:
----------------------------------

    Fix Version/s:     (was: 1.2.0)
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>             Fix For: 1.1.1, 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494650#comment-13494650 ] 

Hudson commented on MAPREDUCE-4782:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #431 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/431/])
    svn merge -c 1407505 FIXES: MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407507)

     Result = UNSTABLE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407507
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java

                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>             Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494050#comment-13494050 ] 

Robert Joseph Evans commented on MAPREDUCE-4782:
------------------------------------------------

Also, now that I think about it more, this really is a Blocker, not a Critical.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Blocker
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>

[jira] [Assigned] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves reassigned MAPREDUCE-4782:
----------------------------------------

    Assignee: Mark Fuhs
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493669#comment-13493669 ] 

Matt Foley commented on MAPREDUCE-4782:
---------------------------------------

Nasty.  Could you please port to branch-1 and I'll include it in the next release?
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:
-------------------------------------------

    Attachment: MR-4782-branch-1.txt

Patch for branch-1.  The patch is identical to the one for trunk except for line numbers and the location of the files.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Assignee: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt
>
>

[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493584#comment-13493584 ] 

Robert Joseph Evans commented on MAPREDUCE-4782:
------------------------------------------------

The patch looks good to me and I am +1 on it, but I added the test, so I would appreciate it if someone else could take a look.
                
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>

[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-4782:
----------------------------------

    Target Version/s: 1.1.1, 0.23.5  (was: 0.23.5)
    
> NLineInputFormat skips first line of last InputSplit
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-4782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
>         Environment: job.setMapperClass(Mapper.class);  // just pass text lines through to output
> job.setInputFormatClass(NLineInputFormat.class);
> NLineInputFormat.setNumLinesPerSplit(job, 100);
> NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
>            Reporter: Mark Fuhs
>            Priority: Critical
>         Attachments: MAPREDUCE-4782.patch, MR-4782.txt
>
>
