Posted to mapreduce-issues@hadoop.apache.org by "Xi Fang (JIRA)" <ji...@apache.org> on 2013/06/19 01:03:21 UTC

[jira] [Commented] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687346#comment-13687346 ] 

Xi Fang commented on MAPREDUCE-5330:
------------------------------------

When Signal.TERM is sent to a process, we wait for a delay before killing it. On Windows, however, the signal kind is ignored and the process group is killed immediately (see Shell#getSignalKillProcessGroupCommand()):
{code}
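  public static String[] getSignalKillProcessGroupCommand(int code,
                                                          String groupId) {
    if (WINDOWS) {
      // The signal code is ignored: winutils kills the whole task immediately.
      return new String[] { Shell.WINUTILS, "task", "kill", groupId };
    } else {
      // Send signal 'code' to every process in the group (negative id).
      return new String[] { "kill", "-" + code, "-" + groupId };
    }
  }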
{code}

Here is a fix: if the OS is Windows and the signal is TERM, return immediately and let the delayed process killer kill the process group. This gives the process group a grace period in which to clean up after itself.
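The decision described in the fix can be sketched as follows. This is a minimal illustration, not the attached patch: the class `KillCommandSketch`, the `Signal` enum, the `killCommand` method and the placeholder `"winutils"` path are all hypothetical. It assumes, as described above, that a delayed killer always follows a TERM with a KILL after the grace period, so on Windows a TERM request simply produces no command.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed MAPREDUCE-5330 behaviour. Class, enum and
// method names are illustrative and are not Hadoop's actual API.
final class KillCommandSketch {

    // Hypothetical stand-in for Hadoop's signal codes (POSIX numbering).
    enum Signal {
        TERM(15), KILL(9);

        final int code;

        Signal(int code) {
            this.code = code;
        }
    }

    // Assumption: placeholder for the winutils executable path (Shell.WINUTILS).
    static final String WINUTILS = "winutils";

    private KillCommandSketch() {}

    /**
     * Returns the command to run right now, or Optional.empty() when nothing
     * should run yet. On Windows, "winutils task kill" ignores the signal code
     * and kills at once, so a TERM request is skipped. Assumption: a delayed
     * killer sends KILL after the grace period.
     */
    static Optional<String[]> killCommand(boolean windows, Signal signal, String groupId) {
        if (windows) {
            if (signal == Signal.TERM) {
                // Do nothing now: the process group keeps running and can clean up.
                return Optional.empty();
            }
            // KILL (or any other signal) on Windows: terminate the task immediately.
            return Optional.of(new String[] { WINUTILS, "task", "kill", groupId });
        }
        // POSIX: send the signal to the whole process group (negative id).
        return Optional.of(new String[] { "kill", "-" + signal.code, "-" + groupId });
    }
}
```

On POSIX the behaviour is unchanged (`kill -15 -<groupId>` followed later by `kill -9 -<groupId>`); on Windows only the delayed KILL produces a `winutils task kill` command, which leaves the task's shutdown hooks time to run.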
                
> Killing M/R JVM's leads to metrics not being uploaded
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5330
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>         Environment: Windows
>            Reporter: Xi Fang
>            Assignee: Xi Fang
>         Attachments: MAPREDUCE-5330.patch
>
>
> In MapReduce, we sometimes kill a task's JVM before it shuts down naturally if we want to launch other tasks (see JvmManager$JvmManagerForType.reapJvm). As a result, if the map task process is still doing cleanup or finalization after the task is done, it may be killed before it has a chance to finish.
> In Microsoft's Hadoop Service, after a Map/Reduce task is done, we typically upload storage usage metrics (ASV in our context) to Microsoft Azure Tables while closing file systems in a special shutdown hook. If the kill happens during this step, the metrics are lost. As a result, for many MR jobs we do not see accurate metrics most of the time.
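Why the immediate kill loses the metrics can be seen from how JVM shutdown hooks behave. The sketch below is hypothetical (the class `MetricsHookSketch`, the method `registerUploadHook` and the thread name are illustrative, not Hadoop or Azure APIs); it only uses the standard `Runtime.addShutdownHook`. Per the JDK documentation, hooks run on an orderly shutdown such as SIGTERM on Unix, but not when the JVM is terminated externally, for example with SIGKILL on Unix or TerminateProcess on Windows.

```java
// Hypothetical sketch: flush storage metrics from a JVM shutdown hook.
// The class, method and thread names are illustrative, not Hadoop's API.
final class MetricsHookSketch {

    private MetricsHookSketch() {}

    /**
     * Registers uploadMetrics to run during JVM shutdown and returns the hook
     * thread. Hooks run on normal exit, System.exit and orderly-shutdown signals
     * (for example SIGTERM on Unix). They do not run if the JVM is terminated
     * externally, e.g. with SIGKILL on Unix or TerminateProcess on Windows.
     */
    static Thread registerUploadHook(Runnable uploadMetrics) {
        Thread hook = new Thread(uploadMetrics, "metrics-upload-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A forced termination that skips the TERM phase gives such a hook no chance to run, which matches the missing-metrics symptom described in the issue.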

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira