Posted to mapreduce-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2011/08/05 03:34:27 UTC

[jira] [Created] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

JobSplitWriter.java can't handle large job.split file
-----------------------------------------------------

                 Key: MAPREDUCE-2779
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: job submission
            Reporter: Ming Ma


We use Cascading's MultiInputFormat, which sometimes generates a large job.split file (used internally by Hadoop) that can exceed 2 GB.

In JobSplitWriter.java, the functions that generate this file use 32-bit signed integers to compute offsets into job.split, so offsets beyond 2 GB cannot be represented:


writeNewSplits
...
        int prevCount = out.size();
...
        int currCount = out.size();

writeOldSplits
...
      long offset = out.size();
...
      int currLen = out.size();
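
To illustrate the overflow described above, here is a minimal, self-contained Java sketch. It is not the Hadoop source and not the attached patch: the class SplitOffsetSketch and its methods narrow and recordOffsets are hypothetical names, and the 3 GB starting offset is simulated rather than produced by writing a real file. It assumes only that the stream is a java.io.DataOutputStream, whose size() method returns an int.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the Hadoop source or the committed patch.
final class SplitOffsetSketch {

    // Narrowing a byte position to a 32-bit signed int loses information
    // once the position exceeds Integer.MAX_VALUE (2^31 - 1).
    static int narrow(long position) {
        return (int) position;
    }

    // Records the starting offset of each record in a long, so the running
    // total stays exact even when it is larger than Integer.MAX_VALUE.
    // startOffset simulates the bytes already written to a large file.
    static long[] recordOffsets(long startOffset, byte[][] records) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        long[] offsets = new long[records.length];
        long offset = startOffset;
        try {
            for (int i = 0; i < records.length; i++) {
                long prevCount = out.size();      // int result, widened to long
                out.writeInt(records[i].length);  // 4-byte length prefix
                out.write(records[i]);            // record payload
                long currCount = out.size();
                offsets[i] = offset;
                offset += currCount - prevCount;  // accumulate in 64 bits
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;   // 3221225472
        System.out.println(narrow(threeGb));      // prints -1073741824
        long[] offsets = recordOffsets(threeGb, new byte[][] {new byte[10], new byte[20]});
        System.out.println(offsets[0] + " " + offsets[1]); // prints 3221225472 3221225486
    }
}
```

In this sketch the per-record deltas are small, so only the running offset needs 64 bits. In a real file larger than 2 GB the position counter itself would also have to be a long-valued one, which an int-returning size() cannot provide.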


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Konstantin Shvachko (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------

    Attachment: MAPREDUCE-2779-trunk.patch
    
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118371#comment-13118371 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1013 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1013/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118355#comment-13118355 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #32 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/32/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118807#comment-13118807 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #26 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/26/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117844#comment-13117844 ] 

Hadoop QA commented on MAPREDUCE-2779:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12497108/MAPREDUCE-2779-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//console

This message is automatically generated.
                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118338#comment-13118338 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #993 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/993/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118343#comment-13118343 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1071 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1071/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101649#comment-13101649 ] 

Ming Ma commented on MAPREDUCE-2779:
------------------------------------

Arun, the bug is still in the trunk. Thanks.

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117797#comment-13117797 ] 

Hadoop QA commented on MAPREDUCE-2779:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12497098/MAPREDUCE-2779-0.22.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/903//console

This message is automatically generated.
                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Konstantin Shvachko (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.24.0
                   0.23.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I just committed this to 0.22, 0.23, and trunk.
Thank you, Ming.
                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated MAPREDUCE-2779:
-------------------------------

    Attachment: MAPREDUCE-2779-0.22.patch

Here is the patch for 0.22. It passes all unit tests except for one known buggy test:

[junit] Test org.apache.hadoop.raid.TestRaidNode FAILED

Note that the previous patch for trunk no longer applies, because trunk has undergone a major restructuring since it was written.
                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095521#comment-13095521 ] 

Arun C Murthy commented on MAPREDUCE-2779:
------------------------------------------

Is this tested against 0.20.205 and trunk?

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch

[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118361#comment-13118361 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #846 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/846/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------

    Fix Version/s: 0.22.0

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-trunk.patch

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated MAPREDUCE-2779:
-------------------------------

    Affects Version/s: 0.23.0
                       0.22.0

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118786#comment-13118786 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #817 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/817/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated MAPREDUCE-2779:
-------------------------------

    Affects Version/s: 0.20.205.0
               Status: Patch Available  (was: Open)

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118612#comment-13118612 ] 

Hudson commented on MAPREDUCE-2779:
-----------------------------------

Integrated in Hadoop-Mapreduce-22-branch #79 (See [https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/79/])
    MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177787
Files : 
* /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
* /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0, 0.23.0, 0.24.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Assigned] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned MAPREDUCE-2779:
----------------------------------------

    Assignee: Arun C Murthy

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117838#comment-13117838 ] 

Konstantin Shvachko commented on MAPREDUCE-2779:
------------------------------------------------

Adjusted the patch for the new trunk.
                
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Joep Rottinghuis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080105#comment-13080105 ] 

Joep Rottinghuis commented on MAPREDUCE-2779:
---------------------------------------------

Patch looks good.
Affects 0.20-security-* branches as well.

FSDataOutputStream.getPos is not thread-safe, but then again DataOutputStream.size does not seem to be thread-safe either.
Even though the DataOutputStream.write method is synchronized, FSDataOutputStream.write is not synchronized.
This does not seem to be an issue in the current code path because createSplitFiles does not expose out.
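
For illustration only, here is a minimal sketch of why an int-based size() cannot serve as an offset beyond 2GB, while a long position (the kind of value FSDataOutputStream.getPos returns) can. It uses only java.io, not Hadoop. SplitOffsetDemo, PositionTrackingSink and writePastTwoGiB are hypothetical names made up for this example, and the code is not taken from the attached patch.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SplitOffsetDemo {

  /** Hypothetical sink: discards all bytes but tracks the position as a long. */
  static final class PositionTrackingSink extends OutputStream {
    private long pos;

    @Override
    public void write(int b) {
      pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      pos += len;
    }

    long getPos() {
      return pos;
    }
  }

  /**
   * Writes 2049 chunks of 1 MiB (just over 2 GiB) through a DataOutputStream.
   * Returns { DataOutputStream.size(), long position of the sink }.
   */
  static long[] writePastTwoGiB() {
    PositionTrackingSink sink = new PositionTrackingSink();
    DataOutputStream out = new DataOutputStream(sink);
    byte[] chunk = new byte[1 << 20];
    try {
      for (int i = 0; i < 2049; i++) {
        out.write(chunk, 0, chunk.length);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // size() is an int counter; the long position keeps counting.
    return new long[] { out.size(), sink.getPos() };
  }

  public static void main(String[] args) {
    long[] r = writePastTwoGiB();
    System.out.println("DataOutputStream.size() = " + r[0]);
    System.out.println("long position           = " + r[1]);
  }
}
```

Once the stream passes 2GB, size() stays pinned at Integer.MAX_VALUE, so any offset or length derived from it is wrong, whereas the long position still reflects the bytes actually written. Like the code path discussed above, the sketch assumes the stream is used by a single thread.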


> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096548#comment-13096548 ] 

Ming Ma commented on MAPREDUCE-2779:
------------------------------------

The patch has been tested on the 0.20-security-* branches. Testing on 0.22 will follow.

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Assigned] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned MAPREDUCE-2779:
----------------------------------------

    Assignee: Ming Ma  (was: Arun C Murthy)

Sorry, hit the wrong button - assigning to Ming.

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101483#comment-13101483 ] 

Arun C Murthy commented on MAPREDUCE-2779:
------------------------------------------

Ming, I can put this into 0.20.205 only after commit to trunk... unless this issue doesn't exist in trunk. Help, pls?

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();


[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated MAPREDUCE-2779:
-------------------------------

    Attachment: MAPREDUCE-2779-trunk.patch

> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2779
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>            Reporter: Ming Ma
>         Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit signed integer to compute offset into job.split.
> writeNewSplits
> ...
>         int prevCount = out.size();
> ...
>         int currCount = out.size();
> writeOldSplits
> ...
>       long offset = out.size();
> ...
>       int currLen = out.size();
