Posted to mapreduce-issues@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2009/09/18 00:37:57 UTC
[jira] Commented: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body
[ https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756826#action_12756826 ]
Hong Tang commented on MAPREDUCE-1000:
--------------------------------------
MAPREDUCE-157 changed JobHistory.initDone() and removed the try...catch clause that wrapped its body. The try...catch is necessary: without it, an IOException thrown during initialization propagates out of the JobTracker constructor and aborts the JT. I observed this while testing MAPREDUCE-728.
Symptom:
{noformat}
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/Users/htang/Documents/Work/workspace/hadoop-mapreduce/build/hadoop-mapred-0.21.0-dev/logs/history/job_200904211745_0010_geek5 at 523264
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:221)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:72)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:45)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:97)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:220)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:143)
    at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:55)
    at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1203)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveToDoneNow(JobHistory.java:338)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveOldFiles(JobHistory.java:372)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistory.initDone(JobHistory.java:145)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:3900)
    at org.apache.hadoop.mapred.SimulatorJobTracker.<init>(SimulatorJobTracker.java:80)
{noformat}
The previous run of the JT was killed, which left the job history file out of sync with its CRC checksum.
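To illustrate how a killed writer can produce this kind of mismatch, here is a small self-contained sketch. It is not Hadoop's ChecksumFileSystem code; the class StaleChecksumSketch and its helpers are hypothetical, and it assumes a single CRC-32 over the whole buffer rather than Hadoop's per-chunk checksum file. The checksum is recorded for the bytes written so far, more bytes reach the file, and the process dies before the checksum is refreshed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical illustration only; not part of Hadoop. */
public class StaleChecksumSketch {

  /** Compute a CRC-32 over the whole buffer. */
  static long crcOf(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** A read succeeds only if the data still matches the stored checksum. */
  static boolean verify(byte[] data, long storedCrc) {
    return crcOf(data) == storedCrc;
  }

  /** Concatenate two byte arrays. */
  static byte[] concat(byte[] a, byte[] b) {
    byte[] out = new byte[a.length + b.length];
    System.arraycopy(a, 0, out, 0, a.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }

  public static void main(String[] args) {
    // Bytes written at the time the checksum was last recorded.
    byte[] checksummed = "Job JOBID=\"job_1\"\n".getBytes(StandardCharsets.UTF_8);
    long storedCrc = crcOf(checksummed);

    // More bytes reach the file, then the writer is killed before the
    // stored checksum is refreshed.
    byte[] extra = "Task TASKID=\"task_1\"\n".getBytes(StandardCharsets.UTF_8);
    byte[] onDisk = concat(checksummed, extra);

    System.out.println("matches at checksum time: " + verify(checksummed, storedCrc));
    System.out.println("matches after kill:       " + verify(onDisk, storedCrc));
  }
}
```

On the next start, any reader that verifies checksums fails on such a file, which is consistent with the ChecksumException raised from moveOldFiles() in the trace above.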
The following patch segments show the removal of the try...catch clause:
Before MAPREDUCE-157
{noformat}
-  static boolean initDone(JobConf conf, FileSystem fs){
-    try {
-      //if completed job history location is set, use that
-      String doneLocation = conf.
-          get("mapred.job.tracker.history.completed.location");
-      if (doneLocation != null) {
-        DONE = fs.makeQualified(new Path(doneLocation));
-        DONEDIR_FS = fs;
-      } else {
-        DONE = new Path(LOG_DIR, "done");
-        DONEDIR_FS = LOGDIR_FS;
-      }
-
-      //If not already present create the done folder with appropriate
-      //permission
-      if (!DONEDIR_FS.exists(DONE)) {
-        LOG.info("Creating DONE folder at "+ DONE);
-        if (! DONEDIR_FS.mkdirs(DONE,
-            new FsPermission(HISTORY_DIR_PERMISSION))) {
-          throw new IOException("Mkdirs failed to create " + DONE.toString());
-        }
-      }
-
-      fileManager.start();
-      //move the log files remaining from last run to the DONE folder
-      //suffix the file name based on Jobtracker identifier so that history
-      //files with same job id don't get over written in case of recovery.
-      FileStatus[] files = LOGDIR_FS.listStatus(new Path(LOG_DIR));
-      String jtIdentifier = fileManager.jobTracker.getTrackerIdentifier();
-      String fileSuffix = "." + jtIdentifier + OLD_SUFFIX;
-      for (FileStatus fileStatus : files) {
-        Path fromPath = fileStatus.getPath();
-        if (fromPath.equals(DONE)) { //DONE can be a subfolder of log dir
-          continue;
-        }
-        LOG.info("Moving log file from last run: " + fromPath);
-        Path toPath = new Path(DONE, fromPath.getName() + fileSuffix);
-        fileManager.moveToDoneNow(fromPath, toPath);
-      }
-    } catch(IOException e) {
-      LOG.error("Failed to initialize JobHistory log file", e);
-      disableHistory = true;
-    }
-    return !(disableHistory);
-  }
{noformat}
After MAPREDUCE-157
{noformat}
+  /** Initialize the done directory and start the history cleaner thread */
+  public void initDone(JobConf conf, FileSystem fs) throws IOException {
+    //if completed job history location is set, use that
+    String doneLocation =
+        conf.get("mapred.job.tracker.history.completed.location");
+    if (doneLocation != null) {
+      done = fs.makeQualified(new Path(doneLocation));
+      doneDirFs = fs;
+    } else {
+      done = logDirFs.makeQualified(new Path(logDir, "done"));
+      doneDirFs = logDirFs;
+    }
+
+    //If not already present create the done folder with appropriate
+    //permission
+    if (!doneDirFs.exists(done)) {
+      LOG.info("Creating DONE folder at "+ done);
+      if (! doneDirFs.mkdirs(done,
+          new FsPermission(HISTORY_DIR_PERMISSION))) {
+        throw new IOException("Mkdirs failed to create " + done.toString());
+      }
+    }
+    LOG.info("Inited the done directory to " + done.toString());
+
+    moveOldFiles();
+    startFileMoverThreads();
+
+    // Start the History Cleaner Thread
+    long maxAgeOfHistoryFiles = conf.getLong(
+        "mapreduce.cluster.jobhistory.maxage", DEFAULT_HISTORY_MAX_AGE);
+    historyCleanerThread = new HistoryCleaner(maxAgeOfHistoryFiles);
+    historyCleanerThread.start();
+  }
{noformat}
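As a minimal sketch of the behavior this issue asks for (not the actual patch), the self-contained class below shows initDone() catching the IOException, logging it, and disabling history instead of letting the exception reach the caller. HistoryInitSketch and its failOnMove flag are hypothetical stand-ins; only the disableHistory flag, the moveOldFiles() name, and the log message are taken from the code above.

```java
import java.io.IOException;

/** Hypothetical stand-in for JobHistory; not the actual Hadoop class. */
public class HistoryInitSketch {
  private boolean disableHistory = false;
  private final boolean failOnMove;

  public HistoryInitSketch(boolean failOnMove) {
    this.failOnMove = failOnMove;
  }

  /** Stand-in for moveOldFiles(); throws to simulate the checksum failure. */
  private void moveOldFiles() throws IOException {
    if (failOnMove) {
      throw new IOException("Checksum error (simulated)");
    }
  }

  /**
   * Initialization with the try...catch retained: an IOException disables
   * history and is reported through the return value, so the caller
   * (the JT constructor in the real code) is not aborted.
   */
  public boolean initDone() {
    try {
      moveOldFiles();
    } catch (IOException e) {
      System.err.println("Failed to initialize JobHistory log file: "
          + e.getMessage());
      disableHistory = true;
    }
    return !disableHistory;
  }

  public static void main(String[] args) {
    System.out.println("clean start:   " + new HistoryInitSketch(false).initDone());
    System.out.println("corrupt file:  " + new HistoryInitSketch(true).initDone());
  }
}
```

Whether the real method should return a boolean again or keep its void signature and handle the failure internally is a design choice left to the patch; the sketch only shows that the failure is contained.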
> JobHistory.initDone() should retain the try ... catch in the body
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-1000
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Hong Tang
>