Posted to mapreduce-issues@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2009/09/18 00:37:57 UTC

[jira] Commented: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756826#action_12756826 ] 

Hong Tang commented on MAPREDUCE-1000:
--------------------------------------

MAPREDUCE-157 changed JobHistory.initDone() and removed the try...catch clause from its body. The try...catch is necessary: without it, an IOException thrown during initialization propagates out of initDone() and aborts the JobTracker (JT). I observed this while testing MAPREDUCE-728.
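To make the failure mode concrete, here is a minimal sketch using hypothetical stand-ins (`HypotheticalHistory` and `HypotheticalTracker` are not Hadoop classes). Because the tracker's constructor calls initDone(), a checked IOException thrown there escapes the constructor and the tracker object is never created.

```java
import java.io.IOException;

public class AbortDemo {
    // Hypothetical stand-in for JobHistory after MAPREDUCE-157:
    // initDone() declares IOException and has no try...catch.
    static class HypotheticalHistory {
        void initDone() throws IOException {
            // Simulates the ChecksumException (an IOException subclass)
            // raised while moving old history files.
            throw new IOException("Checksum error (simulated)");
        }
    }

    // Hypothetical stand-in for JobTracker: the constructor calls initDone(),
    // so any IOException propagates out of the constructor.
    static class HypotheticalTracker {
        final HypotheticalHistory history = new HypotheticalHistory();

        HypotheticalTracker() throws IOException {
            history.initDone();
        }
    }

    // Returns true only if the tracker could be constructed.
    static boolean tryStartTracker() {
        try {
            new HypotheticalTracker();
            return true;
        } catch (IOException e) {
            System.out.println("Tracker startup aborted: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("started = " + tryStartTracker());
    }
}
```

A corrupt history file left over from a previous run is therefore enough to keep the whole tracker from starting.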

Symptom:
{noformat}
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/Users/htang/Documents/Work/workspace/hadoop-mapreduce/build/hadoop-mapred-0.21.0-dev/logs/history/job_200904211745_0010_geek5 at 523264
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:221)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:72)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:45)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:97)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:220)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:143)
        at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:55)
        at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1203)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveToDoneNow(JobHistory.java:338)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveOldFiles(JobHistory.java:372)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistory.initDone(JobHistory.java:145)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:3900)
        at org.apache.hadoop.mapred.SimulatorJobTracker.<init>(SimulatorJobTracker.java:80)
{noformat}

The previous run of the JT had been killed, leaving a job history file whose contents no longer matched its CRC checksum.
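ChecksumFileSystem keeps a CRC for each fixed-size chunk of a file in a hidden side file, and FSInputChecker verifies each chunk as it is read. The sketch below is a simplified model of that idea, not Hadoop's implementation: it assumes 512-byte chunks (the usual `io.bytes.per.checksum` default) and uses `java.util.zip.CRC32`. Under that assumption, the offset in the trace above, 523264, is exactly 1022 × 512, i.e. the start of a chunk.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkChecksumDemo {
    // Assumed chunk size, modeled on the common io.bytes.per.checksum default.
    static final int BYTES_PER_CHECKSUM = 512;

    // Computes one CRC32 value per chunk (the last chunk may be short).
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int start = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - start);
            CRC32 crc = new CRC32();
            crc.update(data, start, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the byte offset of the first chunk whose CRC does not match
    // the stored value, or -1 if every chunk verifies.
    static long firstBadOffset(byte[] data, long[] stored) {
        long[] actual = checksums(data);
        for (int i = 0; i < actual.length; i++) {
            if (i >= stored.length || actual[i] != stored[i]) {
                return (long) i * BYTES_PER_CHECKSUM;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2048];
        Arrays.fill(data, (byte) 'a');
        long[] stored = checksums(data); // checksums recorded alongside the data

        // Simulate the data and its checksums getting out of sync,
        // as can happen when the writer is killed mid-write.
        data[1500] = 'b';

        System.out.println("first bad chunk at offset " + firstBadOffset(data, stored));
    }
}
```

Byte 1500 falls in the third 512-byte chunk, so verification fails at that chunk's starting offset (1024), which matches the shape of the "Checksum error ... at 523264" message.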

The following patch excerpts show the removal of the try...catch clause:

Before MAPREDUCE-157
{noformat}
-  static boolean initDone(JobConf conf, FileSystem fs){
-    try {
-      //if completed job history location is set, use that
-      String doneLocation = conf.
-                       get("mapred.job.tracker.history.completed.location");
-      if (doneLocation != null) {
-        DONE = fs.makeQualified(new Path(doneLocation));
-        DONEDIR_FS = fs;
-      } else {
-        DONE = new Path(LOG_DIR, "done");
-        DONEDIR_FS = LOGDIR_FS;
-      }
-
-      //If not already present create the done folder with appropriate 
-      //permission
-      if (!DONEDIR_FS.exists(DONE)) {
-        LOG.info("Creating DONE folder at "+ DONE);
-        if (! DONEDIR_FS.mkdirs(DONE, 
-            new FsPermission(HISTORY_DIR_PERMISSION))) {
-          throw new IOException("Mkdirs failed to create " + DONE.toString());
-        }
-      }
-
-      fileManager.start();
-      //move the log files remaining from last run to the DONE folder
-      //suffix the file name based on Jobtracker identifier so that history
-      //files with same job id don't get over written in case of recovery.
-      FileStatus[] files = LOGDIR_FS.listStatus(new Path(LOG_DIR));
-      String jtIdentifier = fileManager.jobTracker.getTrackerIdentifier();
-      String fileSuffix = "." + jtIdentifier + OLD_SUFFIX;
-      for (FileStatus fileStatus : files) {
-        Path fromPath = fileStatus.getPath();
-        if (fromPath.equals(DONE)) { //DONE can be a subfolder of log dir
-          continue;
-        }
-        LOG.info("Moving log file from last run: " + fromPath);
-        Path toPath = new Path(DONE, fromPath.getName() + fileSuffix);
-        fileManager.moveToDoneNow(fromPath, toPath);
-      }
-    } catch(IOException e) {
-        LOG.error("Failed to initialize JobHistory log file", e); 
-        disableHistory = true;
-    }
-    return !(disableHistory);
-  }
{noformat}

After MAPREDUCE-157
{noformat}
+  /** Initialize the done directory and start the history cleaner thread */
+  public void initDone(JobConf conf, FileSystem fs) throws IOException {
+    //if completed job history location is set, use that
+    String doneLocation =
+      conf.get("mapred.job.tracker.history.completed.location");
+    if (doneLocation != null) {
+      done = fs.makeQualified(new Path(doneLocation));
+      doneDirFs = fs;
+    } else {
+      done = logDirFs.makeQualified(new Path(logDir, "done"));
+      doneDirFs = logDirFs;
+    }
+
+    //If not already present create the done folder with appropriate 
+    //permission
+    if (!doneDirFs.exists(done)) {
+      LOG.info("Creating DONE folder at "+ done);
+      if (! doneDirFs.mkdirs(done, 
+          new FsPermission(HISTORY_DIR_PERMISSION))) {
+        throw new IOException("Mkdirs failed to create " + done.toString());
+      }
+    }
+    LOG.info("Inited the done directory to " + done.toString());
+
+    moveOldFiles();
+    startFileMoverThreads();
+
+    // Start the History Cleaner Thread
+    long maxAgeOfHistoryFiles = conf.getLong(
+        "mapreduce.cluster.jobhistory.maxage", DEFAULT_HISTORY_MAX_AGE);
+    historyCleanerThread = new HistoryCleaner(maxAgeOfHistoryFiles);
+    historyCleanerThread.start();
+  }
{noformat}
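A minimal sketch of the behavior this issue asks for, under stated assumptions: `DoneDirStep` is a hypothetical abstraction of the post-MAPREDUCE-157 body (creating the done directory, moveOldFiles(), starting the mover and cleaner threads), `disableHistory` mirrors the flag in the pre-MAPREDUCE-157 code, and `java.util.logging` stands in for the logger Hadoop actually uses. Wrapping the body in try...catch logs the failure and disables history instead of letting the IOException abort the JT.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeInitDone {
    // java.util.logging is an assumption here, used to keep the sketch self-contained.
    private static final Logger LOG = Logger.getLogger(SafeInitDone.class.getName());

    // Hypothetical abstraction of the initDone() body after MAPREDUCE-157.
    interface DoneDirStep {
        void run() throws IOException;
    }

    // Mirrors the disableHistory flag from the pre-MAPREDUCE-157 code.
    private boolean disableHistory = false;

    // Runs the initialization body; on IOException, logs and disables
    // history instead of propagating. Returns true if history is enabled.
    boolean initDone(DoneDirStep body) {
        try {
            body.run();
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Failed to initialize JobHistory log file", e);
            disableHistory = true;
        }
        return !disableHistory;
    }

    public static void main(String[] args) {
        SafeInitDone healthy = new SafeInitDone();
        System.out.println("healthy run: " + healthy.initDone(() -> { }));

        SafeInitDone corrupt = new SafeInitDone();
        boolean enabled = corrupt.initDone(() -> {
            throw new IOException("Checksum error (simulated)");
        });
        // The caller keeps running; only job history is disabled.
        System.out.println("corrupt history file: " + enabled);
    }
}
```

Catching only IOException keeps the pre-MAPREDUCE-157 contract: I/O problems with history files degrade to "history disabled", while unexpected runtime errors still surface. Whether the JT should instead skip just the unreadable file and keep history enabled is a separate design choice.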

> JobHistory.initDone() should retain the try ... catch in the body
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.