
Posted to commits@harmony.apache.org by "Gregory Shimansky (JIRA)" <ji...@apache.org> on 2007/01/13 23:28:27 UTC

[jira] Created: (HARMONY-3002) [drlvm] Race condition in leads to VM crash on SMP systems on HARMONY-2386

[drlvm] Race condition in leads to VM crash on SMP systems on HARMONY-2386
--------------------------------------------------------------------------

                 Key: HARMONY-3002
                 URL: https://issues.apache.org/jira/browse/HARMONY-3002
             Project: Harmony
          Issue Type: Bug
          Components: DRLVM
         Environment: Linux ia32, windows ia32
            Reporter: Gregory Shimansky


I am not sure this is a class loader bug, but the crash happens in class loader code, in class_initialize. When the test from HARMONY-2386 is run many times in a loop, the VM crashes after some time. The fat monitor used for synchronization in Class::initialize appears to be uninitialized or corrupted. It could be a thread manager bug or an enumeration problem as well.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HARMONY-3002) [drlvm] Race condition in leads to VM crash on SMP systems on HARMONY-2386

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467007 ] 

Gregory Shimansky commented on HARMONY-3002:
--------------------------------------------

I've reproduced the crash in 2 different places; in both cases the problem seems to be a missing write barrier somewhere in the lock reservation/unreservation code.

[jira] Updated: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Shimansky updated HARMONY-3002:
---------------------------------------

    Summary: [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386  (was: [drlvm] Race condition in leads to VM crash on SMP systems on HARMONY-2386)

[jira] Updated: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Shimansky updated HARMONY-3002:
---------------------------------------

    Patch Info: [Patch Available]

Cool! I'll try to commit your patch and see if the problem goes away.

[jira] Assigned: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Shimansky reassigned HARMONY-3002:
------------------------------------------

    Assignee: Gregory Shimansky

[jira] Commented: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Salikh Zakirov (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470925 ] 

Salikh Zakirov commented on HARMONY-3002:
-----------------------------------------

I've been trying to catch the root cause of this race condition by adding more asserts, and finally got an assertion failure in the following state:

 Thread D:

	hythread_thin_monitor_exit() {

		lockword = *lockword_ptr;           // 000D0800 = owned by D, reserved, recursion=1
		...
		assert(*lockword_ptr == lockword);  // added by me; this assertion failed:
		                                    // *lockword_ptr = 000D0400
		                                    // = owned by D, unreserved, recursion=1

		RECURSION_DEC(lockword_ptr, lockword);

By the time hythread_thin_monitor_exit() was about to decrement the recursion count by rewriting the lockword,
the lockword value in memory had changed from 000D0800 to 000D0400, violating the assumption
that the only thread modifying the lockword is the thread that installed its thread ID into it.

The change 000D0800 -> 000D0400 corresponds to the unreservation procedure
(owned by D, reserved, recursion = 1 -> owned by D, unreserved, recursion = 1).
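
The lost update described above can be replayed deterministically. This is a minimal sketch, not DRLVM code: replay_interleaving() and RECURSION_STEP are hypothetical names, and the assumption that one recursion step of a reserved lock equals 0x800 is inferred only from the two lockword values in the assertion, not from the DRLVM sources. It shows that a plain read-modify-write by the owner silently discards any change another thread makes to the word in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant inferred only from 000D0800 / 000D0400 above;
   NOT a real DRLVM macro. Models one recursion step of a reserved lock. */
#define RECURSION_STEP 0x0800u

/* Replays the observed interleaving as three plain steps on one shared
   word, so the outcome is deterministic. */
static uint32_t replay_interleaving(void) {
    uint32_t lockword = 0x000D0800u;   /* owned by D, reserved, recursion=1 */

    /* Thread D, in hythread_thin_monitor_exit(): snapshot the word. */
    uint32_t seen = lockword;

    /* Unreserving thread: rewrites the word while D is still running. */
    lockword = 0x000D0400u;            /* owned by D, unreserved, recursion=1 */

    /* Thread D: writes back a value computed from its stale snapshot
       (a modeled RECURSION_DEC), overwriting the unreservation. */
    assert(seen >= RECURSION_STEP);
    lockword = seen - RECURSION_STEP;
    return lockword;
}
```

Under these assumptions the final word is 000D0000: the 0x400 bit that distinguishes the two observed values is gone, so the unreservation is lost. This is why the unreserving thread has to make sure D is really stopped before it touches the lockword.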

The unreservation procedure is supposed to suspend the owner of the reserved lock using the safe suspension
model, which must guarantee that no unsafe code is running on the target thread during unreservation.
hythread_thin_monitor_exit() is an example of unsafe code that must not run during unreservation.
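
A safe suspension handshake of this kind can be sketched with POSIX threads: the requester posts a request and then blocks until the target reports that it is parked at a safe point, so the target cannot be inside unsafe code (such as a thin monitor exit) while the requester works on its state. All names here (model_thread, safe_point(), suspend_other(), resume_other(), run_demo()) are hypothetical, and the sketch is far simpler than DRLVM's thread manager; it assumes a POSIX system (link with -pthread on older glibc).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one target thread's suspension state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  suspend_request;   /* outstanding suspend requests */
    bool in_safe_region;    /* target is parked at a safe point */
    bool stop;              /* lets the demo terminate the target */
} model_thread;

/* Target side: between unsafe regions, park while any request is pending. */
static void safe_point(model_thread *t) {
    pthread_mutex_lock(&t->lock);
    if (t->suspend_request > 0) {
        t->in_safe_region = true;
        pthread_cond_broadcast(&t->changed);   /* wake waiting requesters */
        while (t->suspend_request > 0)
            pthread_cond_wait(&t->changed, &t->lock);
        t->in_safe_region = false;
    }
    pthread_mutex_unlock(&t->lock);
}

static void *target_main(void *arg) {
    model_thread *t = arg;
    for (;;) {
        /* ... unsafe work, e.g. a thin monitor exit, would run here ... */
        safe_point(t);
        pthread_mutex_lock(&t->lock);
        bool done = t->stop;
        pthread_mutex_unlock(&t->lock);
        if (done)
            return NULL;
    }
}

/* Requester side: post a request, then wait until the target is really
   parked; never return early just because other requests are pending. */
static void suspend_other(model_thread *t) {
    pthread_mutex_lock(&t->lock);
    t->suspend_request++;
    while (!t->in_safe_region)
        pthread_cond_wait(&t->changed, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

static void resume_other(model_thread *t) {
    pthread_mutex_lock(&t->lock);
    t->suspend_request--;
    pthread_cond_broadcast(&t->changed);
    pthread_mutex_unlock(&t->lock);
}

/* Returns true if the requester saw the target parked while suspended. */
static bool run_demo(void) {
    model_thread t;
    pthread_mutex_init(&t.lock, NULL);
    pthread_cond_init(&t.changed, NULL);
    t.suspend_request = 0;
    t.in_safe_region = false;
    t.stop = false;

    pthread_t th;
    int rc = pthread_create(&th, NULL, target_main, &t);
    assert(rc == 0);
    (void)rc;

    suspend_other(&t);
    pthread_mutex_lock(&t.lock);
    bool parked = t.in_safe_region;   /* e.g. unreservation would happen here */
    t.stop = true;
    pthread_mutex_unlock(&t.lock);
    resume_other(&t);

    pthread_join(th, NULL);
    pthread_cond_destroy(&t.changed);
    pthread_mutex_destroy(&t.lock);
    return parked;
}
```

The property that matters is in suspend_other(): it returns only after observing in_safe_region, regardless of how many other requests are already pending.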

Looking at unreserve_lock(), it does suspend the lock owner thread first:

	status=hythread_suspend_other(owner);

Obviously the lock owner thread was not actually suspended in this case, because it was running hythread_thin_monitor_exit() at the same time.

Looking at hythread_suspend_other(), it can return immediately, without waiting for the thread to actually be suspended, if suspension has already been requested for that thread:

hythread_suspend_other():
        send_suspend_request(thread);
        while(wait_safe_region_event(thread)!=TM_ERROR_NONE) {
...
    return TM_ERROR_NONE;
}

and 

wait_safe_region_event():
static IDATA wait_safe_region_event(hythread_t thread) {
...
    if(thread->suspend_request > 1 || thread == tm_self_tls) {
...
        return TM_ERROR_NONE;
    }

The problem appears to be an incorrect assumption in hythread_suspend_other():
 "if suspension has been requested more than once, we do not need to wait, because the thread is already suspended".
In reality there is no guarantee that a thread with thread->suspend_request == 1 has already been suspended.

In this particular test, GC happens fairly often, so the first suspension request was probably posted by a thread trying to start garbage collection.
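
The removed early return can be reduced to a small deterministic model. This is a hypothetical sketch: suspend_state, may_proceed_buggy() and may_proceed_fixed() are illustrative names, the model tracks only a request counter and an "actually suspended" flag, and it ignores the thread == tm_self_tls case that shared the same if statement. It reproduces the scenario above: a first request (for example from a thread starting GC) is pending, unreserve_lock() posts a second one, and the target has not yet reached a safe point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target thread's suspension state;
   not DRLVM's hythread_t. */
typedef struct {
    int  suspend_request;   /* number of outstanding suspend requests */
    bool suspended;         /* has the target really reached a safe point? */
} suspend_state;

static void send_suspend_request(suspend_state *t) {
    t->suspend_request++;
}

/* Models the removed early return: a second pending request is taken as
   proof that the target is already suspended. Returns true when the caller
   would go on to touch the target's state. */
static bool may_proceed_buggy(const suspend_state *t) {
    assert(t->suspend_request > 0);   /* caller has posted its own request */
    if (t->suspend_request > 1)
        return true;
    return t->suspended;
}

/* Without the early return, only the target's real state counts. */
static bool may_proceed_fixed(const suspend_state *t) {
    assert(t->suspend_request > 0);
    return t->suspended;
}
```

With two pending requests and the target still running, the buggy check lets the caller proceed, while the fixed check keeps it waiting until the target is really suspended.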


[jira] Updated: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Salikh Zakirov (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Salikh Zakirov updated HARMONY-3002:
------------------------------------

    Attachment: wait_safe_region_event-removed-incorrect-early-return-condition.patch

wait_safe_region_event-removed-incorrect-early-return-condition.patch removes an incorrect condition for early return
from wait_safe_region_event().

[jira] Closed: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386

Posted by "Gregory Shimansky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Shimansky closed HARMONY-3002.
--------------------------------------

    Resolution: Fixed

Patch applied at 505011.
