Posted to mapreduce-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2011/08/05 03:34:27 UTC
[jira] [Created] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
JobSplitWriter.java can't handle large job.split file
-----------------------------------------------------
Key: MAPREDUCE-2779
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission
Reporter: Ming Ma
We use Cascading's MultiInputFormat, which sometimes generates a large job.split file (used internally by Hadoop); the file can exceed 2 GB.
In JobSplitWriter.java, the functions that generate this file use 32-bit signed integers to compute offsets into job.split:
writeNewSplits
...
int prevCount = out.size();
...
int currCount = out.size();
writeOldSplits
...
long offset = out.size();
...
int currLen = out.size();
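To illustrate why these counters break past 2 GB, here is a minimal sketch (not the Hadoop code). It assumes `out` behaves like a `java.io.DataOutputStream`, whose `size()` returns an `int` that is pinned to `Integer.MAX_VALUE` once the byte counter overflows. The names `SplitOffsetDemo`, `NullOutputStream`, and `writeAndMeasure` are hypothetical and exist only for this illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class SplitOffsetDemo {
    /** Discards all bytes; a hypothetical stand-in for the real job.split stream. */
    static final class NullOutputStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    /**
     * Writes totalBytes zero bytes and returns two counts:
     * [0] what DataOutputStream.size() reports (an int),
     * [1] what a separately maintained long counter reports.
     */
    static long[] writeAndMeasure(long totalBytes) throws IOException {
        byte[] chunk = new byte[1 << 20]; // 1 MiB per write
        long written = 0L;
        try (DataOutputStream out = new DataOutputStream(new NullOutputStream())) {
            while (written < totalBytes) {
                int n = (int) Math.min(chunk.length, totalBytes - written);
                out.write(chunk, 0, n);
                written += n; // 64-bit counter, exact beyond 2 GiB
            }
            return new long[] { out.size(), written };
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = writeAndMeasure(3L << 30); // 3 GiB, past the int limit
        System.out.println("DataOutputStream.size(): " + r[0]);
        System.out.println("bytes actually written:  " + r[1]);
    }
}
```

Under that assumption, widening the result as in `long offset = out.size();` would not help either, because the value is already capped before it is assigned. A position that is tracked as a `long` throughout avoids the problem.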
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Konstantin Shvachko (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------
Attachment: MAPREDUCE-2779-trunk.patch
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118371#comment-13118371 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #1013 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1013/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0, 0.23.0, 0.24.0
>
> Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118355#comment-13118355 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Mapreduce-0.23-Build #32 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/32/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118807#comment-13118807 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Hdfs-0.23-Build #26 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/26/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117844#comment-13117844 ]
Hadoop QA commented on MAPREDUCE-2779:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12497108/MAPREDUCE-2779-trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//console
This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118338#comment-13118338 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Common-trunk-Commit #993 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/993/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118343#comment-13118343 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #1071 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1071/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101649#comment-13101649 ]
Ming Ma commented on MAPREDUCE-2779:
------------------------------------
Arun, the bug is still in the trunk. Thanks.
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117797#comment-13117797 ]
Hadoop QA commented on MAPREDUCE-2779:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12497098/MAPREDUCE-2779-0.22.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/903//console
This message is automatically generated.
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Konstantin Shvachko (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------
Resolution: Fixed
Fix Version/s: 0.24.0, 0.23.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
I just committed this to 0.22, 0.23, and trunk.
Thank you Ming.
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Ming Ma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated MAPREDUCE-2779:
-------------------------------
Attachment: MAPREDUCE-2779-0.22.patch
Here is the patch for 0.22. It passes all unit tests except for one known buggy test:
[junit] Test org.apache.hadoop.raid.TestRaidNode FAILED
Note that the previous trunk patch no longer applies, because trunk has undergone a major restructuring since it was created.
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095521#comment-13095521 ]
Arun C Murthy commented on MAPREDUCE-2779:
------------------------------------------
Is this tested against 0.20.205 and trunk?
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118361#comment-13118361 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk #846 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/846/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated MAPREDUCE-2779:
-------------------------------------------
Fix Version/s: 0.22.0
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated MAPREDUCE-2779:
-------------------------------
Affects Version/s: 0.23.0, 0.22.0
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118786#comment-13118786 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk #817 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/817/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated MAPREDUCE-2779:
-------------------------------
Affects Version/s: 0.20.205.0
Status: Patch Available (was: Open)
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118612#comment-13118612 ]
Hudson commented on MAPREDUCE-2779:
-----------------------------------
Integrated in Hadoop-Mapreduce-22-branch #79 (See [https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/79/])
MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177787
Files :
* /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
* /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0, 0.23.0, 0.24.0
>
> Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
[jira] [Assigned] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy reassigned MAPREDUCE-2779:
----------------------------------------
Assignee: Arun C Murthy
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Arun C Murthy
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117838#comment-13117838 ]
Konstantin Shvachko commented on MAPREDUCE-2779:
------------------------------------------------
Adjusted the patch for the new trunk.
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-0.22.patch, MAPREDUCE-2779-trunk.patch, MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Joep Rottinghuis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080105#comment-13080105 ]
Joep Rottinghuis commented on MAPREDUCE-2779:
---------------------------------------------
Patch looks good.
Affects 0.20-security-* branches as well.
FSDataOutputStream.getPos is not thread safe, but then again DataOutputStream.size does not seem to be thread safe either.
Even though the DataOutputStream.write method is synchronized, FSDataOutputStream.write is not synchronized.
This does not seem to be an issue in the current code path because createSplitFiles does not expose out.
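
For readers following the getPos versus size discussion, here is a minimal sketch of why the width of the offset matters. It is not the committed patch. It assumes only what the thread states (offsets into job.split were held in 32-bit ints taken from out.size()) plus the documented JDK behaviour that DataOutputStream.size() returns an int capped at Integer.MAX_VALUE. The names SplitOffsetSketch, LongCountingStream, positionAfterWritingLong and narrowed are hypothetical and exist only for this illustration; the long counter merely stands in for a position API that returns a long.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SplitOffsetSketch {

    /** Hypothetical stand-in for a stream that reports its position as a long. */
    static final class LongCountingStream extends FilterOutputStream {
        private long pos;

        LongCountingStream(OutputStream out, long startPos) {
            super(out);
            this.pos = startPos; // pretend this many bytes were already written
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            pos++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // bypass the byte-by-byte default, count once
            pos += len;
        }

        long getPos() {
            return pos;
        }
    }

    /** What a 32-bit int variable would hold if the long offset were narrowed. */
    static int narrowed(long offset) {
        return (int) offset;
    }

    /** Writes one 8-byte long starting at the given position; returns the new position. */
    static long positionAfterWritingLong(long startPos) throws IOException {
        LongCountingStream counter =
            new LongCountingStream(new ByteArrayOutputStream(), startPos);
        DataOutputStream out = new DataOutputStream(counter);
        out.writeLong(42L);
        out.flush();
        return counter.getPos();
    }

    public static void main(String[] args) throws IOException {
        // Start just below the 2 GB boundary without actually writing 2 GB.
        long pos = positionAfterWritingLong(Integer.MAX_VALUE - 2L);
        System.out.println("long position: " + pos);
        System.out.println("narrowed to int: " + narrowed(pos));
    }
}
```

The long counter keeps increasing past 2^31 - 1, while the same value narrowed to an int turns negative, so any offset recorded through an int is unusable once job.split grows beyond 2 GB.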
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096548#comment-13096548 ]
Ming Ma commented on MAPREDUCE-2779:
------------------------------------
It has been tested on the 0.20-security-* branches. Testing on 0.22 will be conducted later.
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Assigned] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy reassigned MAPREDUCE-2779:
----------------------------------------
Assignee: Ming Ma (was: Arun C Murthy)
Sorry, hit the wrong button - assigning to Ming.
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't
handle large job.split file
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101483#comment-13101483 ]
Arun C Murthy commented on MAPREDUCE-2779:
------------------------------------------
Ming, I can put this into 0.20.205 only after commit to trunk... unless this issue doesn't exist in trunk. Help, pls?
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2779-trunk.patch
[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle
large job.split file
Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated MAPREDUCE-2779:
-------------------------------
Attachment: MAPREDUCE-2779-trunk.patch
> JobSplitWriter.java can't handle large job.split file
> -----------------------------------------------------
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch