Posted to mapreduce-issues@hadoop.apache.org by "Chelsey Chang (JIRA)" <ji...@apache.org> on 2013/07/19 23:20:48 UTC

[jira] [Updated] (MAPREDUCE-5406) Task Tracker exiting with JVM manager inconsistent state

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chelsey Chang updated MAPREDUCE-5406:
-------------------------------------

    Description: 
It looks like the JVM manager is reaching an inconsistent state, which causes the TaskTracker (TT) to crash:
{code}
2013-06-09 06:41:11,250 FATAL org.apache.hadoop.mapred.JvmManager: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_201306080400_104812_m_000001_0 Number of active JVMs:8
  JVMId jvm_201306080400_104517_m_1331138312 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104517_m_000001_0
  JVMId jvm_201306080400_104641_m_-1631395161 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104641_m_000000_0
  JVMId jvm_201306080400_104494_m_-1702464703 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104494_m_000000_0
  JVMId jvm_201306080400_104784_m_1407576088 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104784_m_000000_0
  JVMId jvm_201306080400_104530_m_186665365 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104530_m_000000_0
  JVMId jvm_201306080400_104589_m_-1080246077 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104589_m_000000_0
  JVMId jvm_201306080400_104674_m_830017814 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104674_m_000000_0
  JVMId jvm_201306080400_104719_m_-226910128 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104719_m_000000_0. Aborting. 
2013-06-09 06:41:11,250 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: 
{code}

Although this causes the TT to crash, the error is rare and recoverable, so the priority of the issue is not high.

However, this does look like a bug in the JVM manager state machine. I'm guessing there is some race condition that we're hitting.

(Logs attached)
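
For anyone triaging the attached log, a small script can pull the per-JVM entries out of a dump like the one above and check them against the reported "Number of active JVMs". This is only a sketch: the function name summarize_dump and the regular expressions are illustrative assumptions derived from the lines quoted in this issue, not part of Hadoop, and other log variants (for example idle JVMs) may be formatted differently.

```python
import re

# Assumed line shapes, taken only from the dump quoted in this issue.
ACTIVE = re.compile(r"Number of active JVMs:(\d+)")
JVM_LINE = re.compile(
    r"JVMId (?P<jvm>jvm_\d+_\d+_[mr]_-?\d+) "
    r"#Tasks ran: (?P<ran>\d+) "
    r"Currently busy\? (?P<busy>true|false) "
    r"Currently running: (?P<task>attempt_\w+)"
)


def summarize_dump(text):
    """Return (reported_active, entries) for one 'Inconsistent state' dump.

    reported_active is the integer after 'Number of active JVMs:' (or None
    if that marker is missing); entries is a list of dicts, one per JVMId
    line that matches the assumed format.
    """
    match = ACTIVE.search(text)
    reported = int(match.group(1)) if match else None
    entries = [
        {
            "jvm": m.group("jvm"),
            "tasks_ran": int(m.group("ran")),
            "busy": m.group("busy") == "true",
            "task": m.group("task"),
        }
        for m in JVM_LINE.finditer(text)
    ]
    return reported, entries


if __name__ == "__main__":
    import sys

    # Usage: python summarize_dump.py < tasktracker.log
    reported, entries = summarize_dump(sys.stdin.read())
    busy = sum(1 for e in entries if e["busy"])
    fresh = sum(1 for e in entries if e["tasks_ran"] == 0)
    print("reported active JVMs:", reported)
    print("parsed JVM entries:  ", len(entries))
    print("busy JVMs:           ", busy)
    print("JVMs with 0 tasks ran:", fresh)
```

In the dump quoted above, all eight listed JVMs are busy and show "#Tasks ran: 0", so a summary like this makes it quick to see whether other occurrences in the log share that pattern.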

  was:
It looks like the JVM manager is reaching an inconsistent state, which causes the TaskTracker (TT) to crash:
{code}
2013-06-09 06:41:11,250 FATAL org.apache.hadoop.mapred.JvmManager: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_201306080400_104812_m_000001_0 Number of active JVMs:8
  JVMId jvm_201306080400_104517_m_1331138312 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104517_m_000001_0
  JVMId jvm_201306080400_104641_m_-1631395161 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104641_m_000000_0
  JVMId jvm_201306080400_104494_m_-1702464703 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104494_m_000000_0
  JVMId jvm_201306080400_104784_m_1407576088 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104784_m_000000_0
  JVMId jvm_201306080400_104530_m_186665365 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104530_m_000000_0
  JVMId jvm_201306080400_104589_m_-1080246077 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104589_m_000000_0
  JVMId jvm_201306080400_104674_m_830017814 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104674_m_000000_0
  JVMId jvm_201306080400_104719_m_-226910128 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104719_m_000000_0. Aborting. 
2013-06-09 06:41:11,250 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: 
{code}

Although this causes the TT to crash, the error is rare and recoverable, so the priority of the issue is not high.

However, this does look like a bug in the JVM manager state machine. I'm guessing there is some race condition that we're hitting.

    
> Task Tracker exiting with JVM manager inconsistent state
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-5406
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5406
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Chelsey Chang
>            Assignee: Chelsey Chang
>         Attachments: hadoop-tasktracker-RD00155D61582F-short.log
