You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Evgueni Brevnov <ev...@gmail.com> on 2006/10/13 06:53:20 UTC

[drlvm][threading] Possible race condition in implementation of conditional variables?

Hi,

I do the following:

hythread_suspend_disable();
<do unsafe actions>
hysem_wait(semaphore);
<do unsafe actions>
hythread_suspend_enable();

By saying hythread_suspend_disable(); I expect the thread can't be
suspended until hythread_suspend_enable() is called. But hysem_wait()
resets disabled mode and enables thread suspension. As a result GC can
happen when it must not. hysem_wait is based on conditional variables
so the same can happen when conditional variables is used.

Do you see the problem here? Or I miss something?

Thanks
Evgueni

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Weldon Washburn <we...@gmail.com>.
Its a design rule kind of thing.  Breaking the design rule leads to
undefined behavior.  My experience is that the kind of problem described
here leads to very hard to diagnose system instability.  If anything, a
"known" bug of general, intermittant instability could be associated with
this issue.  If it turns out to be too hard for Artem to build what I
suggest, then a hard assert(0); should be installed.  This is a 20 minute
hack.  The downside is that it might snag design rule violations
unexpectedly thus causing lots of angst.

On 10/16/06, Rana Dasgupta <rd...@gmail.com> wrote:
>
> Hi,
> Is there any known bug related to this issue?
>
> Rana
>
>
>
> > On 10/15/06, Weldon Washburn <we...@gmail.com> wrote:
> > >
> > > After thinking about it a while, how about the following course of
> > > action:
> > >
> > > 1)
> > > First phase is to modify hysem_wait() and any other hy.... blocking
> > > functions to test if, in fact, the thread is in suspend enabled
> > > mode.  If
> > > the thread is not,  do something like a printf("WARNING: root set
> > > enumeration is unstable, hytsem.cpp line #285\n");  Then do a
> > > non-destructive stack unwind and printf a stack trace
> > >
> > > An even better idea would be to log the printf's out to a file that
> can
> > > later be retrieved.
> > >
> > > 2)
> > > Second phase.  Analyze the code paths that lead to the enable/disable
> > > problems.  Are there fundamental design flaws?  Implementation flaws?
> > >
> > > 3)
> > > Third phase.  Assume the above turns up easy to fix bugs and minor
> > > architectural issues.  And that these issues are settled.  Then commit
> a
> > > mod
> > > to svn that will cause the system to do an assert(0); in debug mode
> and
> > > exit
> > > w/ stack trace in release mode.
> > >
> > > Artem,
> > > Does it make sense for you to create a patch that does the above??
> > >
> > >
> > >
>
>


-- 
Weldon Washburn
Intel Middleware Products Division

Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Rana Dasgupta <rd...@gmail.com>.
Hi,
  Is there any known bug related to this issue?

Rana



> On 10/15/06, Weldon Washburn <we...@gmail.com> wrote:
> >
> > After thinking about it a while, how about the following course of
> > action:
> >
> > 1)
> > First phase is to modify hysem_wait() and any other hy.... blocking
> > functions to test if, in fact, the thread is in suspend enabled
> > mode.  If
> > the thread is not,  do something like a printf("WARNING: root set
> > enumeration is unstable, hytsem.cpp line #285\n");  Then do a
> > non-destructive stack unwind and printf a stack trace
> >
> > An even better idea would be to log the printf's out to a file that can
> > later be retrieved.
> >
> > 2)
> > Second phase.  Analyze the code paths that lead to the enable/disable
> > problems.  Are there fundamental design flaws?  Implementation flaws?
> >
> > 3)
> > Third phase.  Assume the above turns up easy to fix bugs and minor
> > architectural issues.  And that these issues are settled.  Then commit a
> > mod
> > to svn that will cause the system to do an assert(0); in debug mode and
> > exit
> > w/ stack trace in release mode.
> >
> > Artem,
> > Does it make sense for you to create a patch that does the above??
> >
> >
> >

Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Artem Aliev <ar...@gmail.com>.
Weldon,

Ok, I'll try to do something like you describe.
But it is not very easy to do, I think.

Thanks
Artem

On 10/15/06, Weldon Washburn <we...@gmail.com> wrote:
> After thinking about it a while, how about the following course of action:
>
> 1)
> First phase is to modify hysem_wait() and any other hy.... blocking
> functions to test if, in fact, the thread is in suspend enabled mode.  If
> the thread is not,  do something like a printf("WARNING: root set
> enumeration is unstable, hytsem.cpp line #285\n");  Then do a
> non-destructive stack unwind and printf a stack trace
>
> An even better idea would be to log the printf's out to a file that can
> later be retrieved.
>
> 2)
> Second phase.  Analyze the code paths that lead to the enable/disable
> problems.  Are there fundamental design flaws?  Implementation flaws?
>
> 3)
> Third phase.  Assume the above turns up easy to fix bugs and minor
> architectural issues.  And that these issues are settled.  Then commit a mod
> to svn that will cause the system to do an assert(0); in debug mode and exit
> w/ stack trace in release mode.
>
> Artem,
> Does it make sense for you to create a patch that does the above??
>
>
> On 10/13/06, Artem Aliev <ar...@gmail.com> wrote:
> >
> > A classloader and some other components assume that semaphores and
> > latches does not reset suspend disable count and some time ago latches
> > and semaphores work this way.
> > A java monitor ,in reverse, resets the suspend disable status in
> > monitorEnter and monitor_wait command, and DRLVM takes care about it.
> > It looks like someone changed the behavior of semaphores to be
> > compatible with the monitors.
> >
> > I agree with Xiao-Feng: the sleeping thread should be in suspend_enable
> > mode.
> > So I vote for leaving the current threading code as is but checking
> > DRLVM for a code that not ready for the new behavior.
> >
> > For example by putting assert(gc enabled ==  true() and checking failed
> > places.
> >
> > Thanks
> > Artem
> >
> >
> > On 10/13/06, Xiao-Feng Li <xi...@gmail.com> wrote:
> > > GC should be enabled in waiting state. In case that a GC can happen,
> > > the code should do bookkeeping to guarantee the correctness.
> > >
> > > Thanks,
> > > xiaofeng
> > >
> > > On 10/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > I do the following:
> > > >
> > > > hythread_suspend_disable();
> > > > <do unsafe actions>
> > > > hysem_wait(semaphore);
> > > > <do unsafe actions>
> > > > hythread_suspend_enable();
> > > >
> > > > By saying hythread_suspend_disable(); I expect the thread can't be
> > > > suspended until hythread_suspend_enable() is called. But hysem_wait()
> > > > resets disabled mode and enables thread suspension. As a result GC can
> > > > happen when it must not. hysem_wait is based on conditional variables
> > > > so the same can happen when conditional variables is used.
> > > >
> > > > Do you see the problem here? Or I miss something?
> > > >
> > > > Thanks
> > > > Evgueni
> > > >
> > > > ---------------------------------------------------------------------
> > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >
> >
>
>
> --
> Weldon Washburn
> Intel Middleware Products Division
>
>

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Weldon Washburn <we...@gmail.com>.
After thinking about it a while, how about the following course of action:

1)
First phase is to modify hysem_wait() and any other hy.... blocking
functions to test if, in fact, the thread is in suspend enabled mode.  If
the thread is not,  do something like a printf("WARNING: root set
enumeration is unstable, hytsem.cpp line #285\n");  Then do a
non-destructive stack unwind and printf a stack trace

An even better idea would be to log the printf's out to a file that can
later be retrieved.

2)
Second phase.  Analyze the code paths that lead to the enable/disable
problems.  Are there fundamental design flaws?  Implementation flaws?

3)
Third phase.  Assume the above turns up easy to fix bugs and minor
architectural issues.  And that these issues are settled.  Then commit a mod
to svn that will cause the system to do an assert(0); in debug mode and exit
w/ stack trace in release mode.

Artem,
Does it make sense for you to create a patch that does the above??


On 10/13/06, Artem Aliev <ar...@gmail.com> wrote:
>
> A classloader and some other components assume that semaphores and
> latches does not reset suspend disable count and some time ago latches
> and semaphores work this way.
> A java monitor ,in reverse, resets the suspend disable status in
> monitorEnter and monitor_wait command, and DRLVM takes care about it.
> It looks like someone changed the behavior of semaphores to be
> compatible with the monitors.
>
> I agree with Xiao-Feng: the sleeping thread should be in suspend_enable
> mode.
> So I vote for leaving the current threading code as is but checking
> DRLVM for a code that not ready for the new behavior.
>
> For example by putting assert(gc enabled ==  true() and checking failed
> places.
>
> Thanks
> Artem
>
>
> On 10/13/06, Xiao-Feng Li <xi...@gmail.com> wrote:
> > GC should be enabled in waiting state. In case that a GC can happen,
> > the code should do bookkeeping to guarantee the correctness.
> >
> > Thanks,
> > xiaofeng
> >
> > On 10/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > > Hi,
> > >
> > > I do the following:
> > >
> > > hythread_suspend_disable();
> > > <do unsafe actions>
> > > hysem_wait(semaphore);
> > > <do unsafe actions>
> > > hythread_suspend_enable();
> > >
> > > By saying hythread_suspend_disable(); I expect the thread can't be
> > > suspended until hythread_suspend_enable() is called. But hysem_wait()
> > > resets disabled mode and enables thread suspension. As a result GC can
> > > happen when it must not. hysem_wait is based on conditional variables
> > > so the same can happen when conditional variables is used.
> > >
> > > Do you see the problem here? Or I miss something?
> > >
> > > Thanks
> > > Evgueni
> > >
> > > ---------------------------------------------------------------------
> > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>


-- 
Weldon Washburn
Intel Middleware Products Division

Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Artem Aliev <ar...@gmail.com>.
A classloader and some other components assume that semaphores and
latches does not reset suspend disable count and some time ago latches
and semaphores work this way.
A java monitor ,in reverse, resets the suspend disable status in
monitorEnter and monitor_wait command, and DRLVM takes care about it.
It looks like someone changed the behavior of semaphores to be
compatible with the monitors.

I agree with Xiao-Feng: the sleeping thread should be in suspend_enable mode.
So I vote for leaving the current threading code as is but checking
DRLVM for a code that not ready for the new behavior.

For example by putting assert(gc enabled ==  true() and checking failed places.

Thanks
Artem


On 10/13/06, Xiao-Feng Li <xi...@gmail.com> wrote:
> GC should be enabled in waiting state. In case that a GC can happen,
> the code should do bookkeeping to guarantee the correctness.
>
> Thanks,
> xiaofeng
>
> On 10/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > Hi,
> >
> > I do the following:
> >
> > hythread_suspend_disable();
> > <do unsafe actions>
> > hysem_wait(semaphore);
> > <do unsafe actions>
> > hythread_suspend_enable();
> >
> > By saying hythread_suspend_disable(); I expect the thread can't be
> > suspended until hythread_suspend_enable() is called. But hysem_wait()
> > resets disabled mode and enables thread suspension. As a result GC can
> > happen when it must not. hysem_wait is based on conditional variables
> > so the same can happen when conditional variables is used.
> >
> > Do you see the problem here? Or I miss something?
> >
> > Thanks
> > Evgueni
> >
> > ---------------------------------------------------------------------
> > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Xiao-Feng Li <xi...@gmail.com>.
GC should be enabled in waiting state. In case that a GC can happen,
the code should do bookkeeping to guarantee the correctness.

Thanks,
xiaofeng

On 10/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> Hi,
>
> I do the following:
>
> hythread_suspend_disable();
> <do unsafe actions>
> hysem_wait(semaphore);
> <do unsafe actions>
> hythread_suspend_enable();
>
> By saying hythread_suspend_disable(); I expect the thread can't be
> suspended until hythread_suspend_enable() is called. But hysem_wait()
> resets disabled mode and enables thread suspension. As a result GC can
> happen when it must not. hysem_wait is based on conditional variables
> so the same can happen when conditional variables is used.
>
> Do you see the problem here? Or I miss something?
>
> Thanks
> Evgueni
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Weldon Washburn <we...@gmail.com>.
I did some digging.  It looks like what's in today's svn HEAD does the
following:


thread_native_semaphore.c::hysem_wait() calls

 thread_native_semaphore.c::sem_wait_impl() calls

  thread_native_condvar.c::condvar_wait_impl()  which does the following:

  {

      disable_count = reset_suspend_disable();

      apr_thread_cond_wait();

      set_suspend_disable(disable_count);

   }



The potential problem is that
thread_native_suspend.c::reset_suspend_disable() does the following:

 {

     int dis = self->suspend_disable_count;

     self->suspend_disable_count = 0;

}



This needs more investigation.  The big worry at this point is why
reset_suspend_disable() exists in the first place?  There may be a
misunderstanding how GC suspend enable/disable should work.  It also looks
like there is confusion regarding the functionality required for GC suspend
enable/disable and the functionality required for Java Thread
synchronization.   While they have some overlapping similarities, the also
have unique distinct semantics.



Evgueni, Artem, does any of the above make sense?



For completeness, I tracked down the GC suspend enable/disable code.  It is
below:



thread_native_suspend.c:: hythread_is_suspend_enabled(){

        return tm_self_tls->suspend_disable_count == 0;

}



On 10/12/06, Weldon Washburn <we...@gmail.com> wrote:
>
>
>
> On 10/12/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> >
> > Hi,
> >
> > I do the following:
> >
> > hythread_suspend_disable();
> > <do unsafe actions>
> > hysem_wait(semaphore);
> > <do unsafe actions>
> > hythread_suspend_enable();
> >
> > By saying hythread_suspend_disable(); I expect the thread can't be
> > suspended until hythread_suspend_enable() is called.
>
>
> Some observations:
>
> 1)
> The above code sequence is a really good diagnostic test.  It definitely
> breaks the suspend enable/disable model.  It is an illegal code sequence
> that can cause the entire system to hang if there is a GC while stuck in a
> semaphore wait.  If anything, I expect the system to hang, not reset the
> enable/disable state and continue executing.
>
> 2)
> I don't understand why hysem_wait() *enables* the GC.  My only guess is
> that someone hit a problem where the system deadlocked in hysem_wait() and
> hacked in the enable.  Any clues who did this and why they did this??
>
> 3)
> How about putting an assert(gc enabled ==  true) ; in hysem_wait() {...}
> and debugging the cases where gc is not enabled and the next line of code
> executes a wait.
>
>
> But hysem_wait()
> > resets disabled mode and enables thread suspension. As a result GC can
> > happen when it must not. hysem_wait is based on conditional variables
> > so the same can happen when conditional variables is used.
> >
> > Do you see the problem here? Or I miss something?
> >
> > Thanks
> > Evgueni
> >
> > ---------------------------------------------------------------------
> > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >
> >
>
>
> --
> Weldon Washburn
> Intel Middleware Products Division




-- 
Weldon Washburn
Intel Enterprise Solutions Software Division

Re: [drlvm][threading] Possible race condition in implementation of conditional variables?

Posted by Weldon Washburn <we...@gmail.com>.
On 10/12/06, Evgueni Brevnov <ev...@gmail.com> wrote:
>
> Hi,
>
> I do the following:
>
> hythread_suspend_disable();
> <do unsafe actions>
> hysem_wait(semaphore);
> <do unsafe actions>
> hythread_suspend_enable();
>
> By saying hythread_suspend_disable(); I expect the thread can't be
> suspended until hythread_suspend_enable() is called.


Some observations:

1)
The above code sequence is a really good diagnostic test.  It definitely
breaks the suspend enable/disable model.  It is an illegal code sequence
that can cause the entire system to hang if there is a GC while stuck in a
semaphore wait.  If anything, I expect the system to hang, not reset the
enable/disable state and continue executing.

2)
I don't understand why hysem_wait() *enables* the GC.  My only guess is that
someone hit a problem where the system deadlocked in hysem_wait() and hacked
in the enable.  Any clues who did this and why they did this??

3)
How about putting an assert(gc enabled ==  true) ; in hysem_wait() {...} and
debugging the cases where gc is not enabled and the next line of code
executes a wait.


But hysem_wait()
> resets disabled mode and enables thread suspension. As a result GC can
> happen when it must not. hysem_wait is based on conditional variables
> so the same can happen when conditional variables is used.
>
> Do you see the problem here? Or I miss something?
>
> Thanks
> Evgueni
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>


-- 
Weldon Washburn
Intel Middleware Products Division