You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2008/11/05 20:23:44 UTC

[jira] Created: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

JVM Reuse triggers RuntimeException("Invalid state")
----------------------------------------------------

                 Key: HADOOP-4595
                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0
            Reporter: Aaron Kimball


A Reducer triggers the following exception:

08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
 JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
 JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
   at java.lang.Throwable.<init>(Throwable.java:67)
   at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
   at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
   at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 

Other clues:

In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4595:
--------------------------------

    Fix Version/s: 0.19.0

Marking this for 0.19

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4595:
--------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this.

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4595:
--------------------------------

    Status: Patch Available  (was: Open)

I anyway did run the tests locally. Everything passed (including test-patch).

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4595:
--------------------------------

    Attachment: 4595.patch

This patch fixes a race condition in updating free slot count when the load is high (leading to lost TTs). When a TT reinits, the TaskLauncher object is created again. A task that is currently running might end up incrementing the free slots of the new TaskLauncher object if it takes time to exit. This would lead to the behavior described by Aaron in the bug report. The patch fixes this by moving all code to do with incrementing free slots to one method and is done inline in TaskInProgress.kill

In addition, the patch fixes a race condition to do with starting MapEventsFetcher thread. The thread starts the loop after looking at TaskTracker.running flag. However, when a TT reinits, the running field is set to true only after the thread is spawned. If the thread is immediately scheduled, it will find running false and exit. This would lead to hung reduces.

I also cleaned up some code to do with TIP.cleanup during a task launch.


> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646226#action_12646226 ] 

Sharad Agarwal commented on HADOOP-4595:
----------------------------------------

+1

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reassigned HADOOP-4595:
-----------------------------------

    Assignee: Devaraj Das

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646222#action_12646222 ] 

Devaraj Das commented on HADOOP-4595:
-------------------------------------

The javadoc warning is due to HADOOP-4621

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645299#action_12645299 ] 

Aaron Kimball commented on HADOOP-4595:
---------------------------------------

Note that both existing JVMs have their 'busy' status as true. But numFreeSlots was > 0; the preceeding log entry was:

2008-11-05 08:57:55,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_200811040110_0230_r_000011_1


At the top of TaskTracker.run() , it waits for numFreeSlots to be > 0, then calls startNewTask(tip) which eventually calls TaskRunner.run() -> JvmManager.launchJvm() -> reapJvm(). numFreeSlots seems to be properly synchronized, as does JvmManager.JvmRunner.busy. So I think the error  may be that JvmRunner.busy isn't being set to false (by JvmRunner.taskRan()) before addFreeReduceSlot() is being called.  I can't figure out where these two methods (or other ones that work on the same state) are being called in relation to one another though.


> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645379#action_12645379 ] 

devaraj edited comment on HADOOP-4595 at 11/5/08 5:46 PM:
--------------------------------------------------------------

bq. Note that both existing JVMs have their 'busy' status as true. But numFreeSlots was > 0; 
I assume you mean that the number of reduce slots in the TaskTracker is 2.

bq. the preceeding log entry was: 2008-11-05 08:57:55,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_200811040110_0230_r_000011_1
Did you notice whether this attempt got launched at all? Did it get killed immediately or something?

bq. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.
I didn't understand this part. Could you please explain?

      was (Author: devaraj):
    bq. Note that both existing JVMs have their 'busy' status as true. But numFreeSlots was > 0; 
I assume you mean that the number of slots is 2.

bq. the preceeding log entry was: 2008-11-05 08:57:55,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_200811040110_0230_r_000011_1
Did you notice whether this attempt got launched at all? Did it get killed immediately or something?

bq. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.
I didn't understand this part. Could you please explain?
  
> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646219#action_12646219 ] 

Hadoop QA commented on HADOOP-4595:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393608/4595.patch
  against trunk revision 712615.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3567/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3567/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3567/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3567/console

This message is automatically generated.

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 4595.patch
>
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645379#action_12645379 ] 

Devaraj Das commented on HADOOP-4595:
-------------------------------------

bq. Note that both existing JVMs have their 'busy' status as true. But numFreeSlots was > 0; 
I assume you mean that the number of slots is 2.

bq. the preceeding log entry was: 2008-11-05 08:57:55,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_200811040110_0230_r_000011_1
Did you notice whether this attempt got launched at all? Did it get killed immediately or something?

bq. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.
I didn't understand this part. Could you please explain?

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4595) JVM Reuse triggers RuntimeException("Invalid state")

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645389#action_12645389 ] 

Aaron Kimball commented on HADOOP-4595:
---------------------------------------

There are two reducers allowed per node; maxFreeSlots is 2. But it was in the middle of running tasks. numFreeSlots hit 1, even though two JVMs were already out there and marked as busy.

The attempt did not get launched. The launch process was what triggered the exception. 

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think this is happening only on speculatively-executed reduce task attempts. The reduce output (part-XXXXX) gets lost when this attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.