You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/09/20 09:01:44 UTC

[jira] Created: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Race condition in JVM reuse when more than one slot becomes free
----------------------------------------------------------------

                 Key: HADOOP-4232
                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0
            Reporter: Devaraj Das
            Assignee: Devaraj Das
            Priority: Blocker
             Fix For: 0.19.0


A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635678#action_12635678 ] 

Arun C Murthy commented on HADOOP-4232:
---------------------------------------

+1 with minor caveats:

1. We could add more information to the RuntimeException in JvmManagerType.reapJvm: no. of current jvms, their jobids etc. This will help in debugging, maybe a comment explaining the need for that exception too might help?
2. TaskTracker.TaskInProgress.launchTask should log information such as the Task's state etc. if we did not launch it (i.e. state != UNASSIGNED) just to aid debugging if need be.


> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4232:
--------------------------------

    Status: Patch Available  (was: Open)

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4232:
--------------------------------

    Status: Open  (was: Patch Available)

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4232:
--------------------------------

    Attachment: 4232-after-review.patch

Attaching patch with logging for two cases..

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4232:
--------------------------------

    Attachment: 4232.patch

Attaching a patch that addresses the issue and cleans up the JVM management a fair bit.
1) The target JVM is reserved in JvmManager.reapJvm and JvmManager.spawnNewJvm. This makes it very predictable (as to which JVM will run which task) 
2) The above makes the TaskTracker.getTask simple. We just need to look up the mapping created earlier as opposed to iterating over the set of INITIALIZED tasks.
3) Due to the above, the task state, INITIALIZED, is now redundant and has been removed.
4) Tweaked the logic for the JVM sleep when there is nothing to execute a bit. Now the JVM goes to a longer sleep (1500 millisec) if it doesn't get a task to run even after 5 consecutive calls to getTask (at 500 millisec interval each). This strategy helps a lot when there are small tasks and the JVM just misses a task because it isn't initialized yet, and goes to a long sleep..


> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4232:
----------------------------------

    Status: Open  (was: Patch Available)

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4232:
----------------------------------

    Status: Patch Available  (was: Open)

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4232:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Devaraj!

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635888#action_12635888 ] 

Hadoop QA commented on HADOOP-4232:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12391205/4232-after-review.patch
  against trunk revision 700548.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3405/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3405/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3405/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3405/console

This message is automatically generated.

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4232:
--------------------------------

    Status: Patch Available  (was: Open)

Passing through hudson

> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4232) Race condition in JVM reuse when more than one slot becomes free

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636030#action_12636030 ] 

Hudson commented on HADOOP-4232:
--------------------------------

Integrated in Hadoop-trunk #620 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/620/])
    . Fix race condition in JVM reuse when multiple slots become free. Contributed by Devaraj Das.


> Race condition in JVM reuse when more than one slot becomes free
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4232-after-review.patch, 4232.patch
>
>
> A race condition exists where there are two or more slots free and there are two or more tasks waiting to run. As an example, consider a case where there are two free slots and there are two tasks waiting to run. JVM_job1 and JVM_job2 are the two idle jvms in memory. A waiting task, task job1_t1, kills the JVM_job2 and spawns a new one, JVM_1_job1. While JVM_1_job1 is initializing (it is marked busy during initialization), JVM_job1 picks this task up and hence this becomes busy as well. Another waiting task, job3_t1 finds both the JVMs busy and doesn't spawn a new JVM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.