Posted to dev@harmony.apache.org by Evgueni Brevnov <ev...@gmail.com> on 2006/11/10 15:45:07 UTC

[drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Hi,

While investigating the deadlock scenario described in
HARMONY-2006, I found one interesting thing. It turns out that the DRL
implementation of hythread_monitor_init /
hythread_monitor_init_with_name both initializes and acquires (locks) a
monitor. The original spec reads: "Acquire and initialize a new monitor
from the threading library...." AFAIU that doesn't mean to lock the
monitor but to get it from the threading library. So
hythread_monitor_init should not lock the monitor.

Could somebody comment on that?

Thanks
Evgueni

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Artem Aliev <ar...@gmail.com>.
Oops,


You were right. I took a look at the classlib hythread code.
It looks like I misunderstood the documentation.
This is a bug.

Thanks
Artem


On 11/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> Could someone familiar with classlib's implementation comment on that ....?
>
> Thanks in advance.
> Evgueni
>
> On 11/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > Hello Artem,
> >
> > Are you 100% sure? I've looked at the classlib's implementation and
> > can't find where the monitor is acquired. Moreover if you look at the
> > initializeSignalTools() located in
> > modules\portlib\src\main\native\port\linux\hysignal.c you will find
> > that it initializes new monitors with hyhtread_monitor_init_with_name
> > and never frees these monitors. That turned out to be the reason of a
> > deadlock in HARMONY-2006.
> >
> > Thanks
> > Evgueni
> >
> > On 11/13/06, Artem Aliev <ar...@gmail.com> wrote:
> > > > It turned out that DRL
> > > > implementation of hythread_monitor_init /
> > > > hythread_monitor_init_with_name initializes and acquires a monitor.
> > >
> > > Eugeni,
> > >
> > > Both drlvm and classlib hythread work this way.
> > > This original hythread design that for compatibility reason  was
> > > implemented in drlvm.
> > >
> > > Thanks
> > > Artem
> > >
> > >
> > >
> > > On 11/10/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > While investigating deadlock scenario which is described in
> > > > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > > > implementation of hythread_monitor_init /
> > > > hythread_monitor_init_with_name initializes and acquires a monitor.
> > > > Original spec reads: "Acquire and initialize a new monitor from the
> > > > threading library...." AFAIU that doesn't mean to lock the monitor but
> > > > get it from the threading library. So the hythread_monitor_init should
> > > > not lock the monitor.
> > > >
> > > > Could somebody comment on that?
> > > >
> > > > Thanks
> > > > Evgueni
> > > >
> > >
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Could someone familiar with classlib's implementation comment on that ....?

Thanks in advance.
Evgueni

On 11/13/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> Hello Artem,
>
> Are you 100% sure? I've looked at the classlib's implementation and
> can't find where the monitor is acquired. Moreover if you look at the
> initializeSignalTools() located in
> modules\portlib\src\main\native\port\linux\hysignal.c you will find
> that it initializes new monitors with hyhtread_monitor_init_with_name
> and never frees these monitors. That turned out to be the reason of a
> deadlock in HARMONY-2006.
>
> Thanks
> Evgueni
>
> On 11/13/06, Artem Aliev <ar...@gmail.com> wrote:
> > > It turned out that DRL
> > > implementation of hythread_monitor_init /
> > > hythread_monitor_init_with_name initializes and acquires a monitor.
> >
> > Eugeni,
> >
> > Both drlvm and classlib hythread work this way.
> > This original hythread design that for compatibility reason  was
> > implemented in drlvm.
> >
> > Thanks
> > Artem
> >
> >
> >
> > On 11/10/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > > Hi,
> > >
> > > While investigating deadlock scenario which is described in
> > > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > > implementation of hythread_monitor_init /
> > > hythread_monitor_init_with_name initializes and acquires a monitor.
> > > Original spec reads: "Acquire and initialize a new monitor from the
> > > threading library...." AFAIU that doesn't mean to lock the monitor but
> > > get it from the threading library. So the hythread_monitor_init should
> > > not lock the monitor.
> > >
> > > Could somebody comment on that?
> > >
> > > Thanks
> > > Evgueni
> > >
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Hello Artem,

Are you 100% sure? I've looked at the classlib's implementation and
can't find where the monitor is acquired. Moreover, if you look at
initializeSignalTools(), located in
modules\portlib\src\main\native\port\linux\hysignal.c, you will find
that it initializes new monitors with hythread_monitor_init_with_name
and never frees these monitors. That turned out to be the cause of the
deadlock in HARMONY-2006.

Thanks
Evgueni

On 11/13/06, Artem Aliev <ar...@gmail.com> wrote:
> > It turned out that DRL
> > implementation of hythread_monitor_init /
> > hythread_monitor_init_with_name initializes and acquires a monitor.
>
> Eugeni,
>
> Both drlvm and classlib hythread work this way.
> This original hythread design that for compatibility reason  was
> implemented in drlvm.
>
> Thanks
> Artem
>
>
>
> On 11/10/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > Hi,
> >
> > While investigating deadlock scenario which is described in
> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > implementation of hythread_monitor_init /
> > hythread_monitor_init_with_name initializes and acquires a monitor.
> > Original spec reads: "Acquire and initialize a new monitor from the
> > threading library...." AFAIU that doesn't mean to lock the monitor but
> > get it from the threading library. So the hythread_monitor_init should
> > not lock the monitor.
> >
> > Could somebody comment on that?
> >
> > Thanks
> > Evgueni
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Artem Aliev <ar...@gmail.com>.
> It turned out that DRL
> implementation of hythread_monitor_init /
> hythread_monitor_init_with_name initializes and acquires a monitor.

Evgueni,

Both drlvm and classlib hythread work this way.
This is the original hythread design, which was implemented in drlvm
for compatibility reasons.

Thanks
Artem



On 11/10/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> Hi,
>
> While investigating deadlock scenario which is described in
> HARMONY-2006 I found out one interesting thing. It turned out that DRL
> implementation of hythread_monitor_init /
> hythread_monitor_init_with_name initializes and acquires a monitor.
> Original spec reads: "Acquire and initialize a new monitor from the
> threading library...." AFAIU that doesn't mean to lock the monitor but
> get it from the threading library. So the hythread_monitor_init should
> not lock the monitor.
>
> Could somebody comment on that?
>
> Thanks
> Evgueni
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Gregory,

I can't reproduce the problem you described on my local Ubuntu
machine, so I can only guess. My guess is that
mapPortLibSignalToUnix can't find the corresponding signal in the map.
That's why you have an undefined sig (-1215196204) in jsig_handler. I can
think of two reasons why everything works fine on my machine:
1) Another signal is generated on my build.
2) It is just a matter of luck that eax contains some proper value
upon returning from mapPortLibSignalToUnix.
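
To illustrate the guess above, here is a minimal sketch of a mapping function that returns a defined value for signals it does not know about, so the caller never reads whatever happens to be left in a register. The names demo_map_port_to_unix, DEMO_PORT_SIG_* and DEMO_UNIX_SIG_UNKNOWN are hypothetical and the flag values are made up; this is not the actual portlib code.

```c
#include <signal.h>

/* Hypothetical port-library signal flags; the values are illustrative only. */
enum {
    DEMO_PORT_SIG_SEGV = 1,
    DEMO_PORT_SIG_FPE  = 2
};

/* Hypothetical sentinel returned when no mapping exists. */
#define DEMO_UNIX_SIG_UNKNOWN (-1)

/* Map a port-library signal flag to a Unix signal number.
 * Every path returns a value, including the "not found" case. */
int demo_map_port_to_unix(int port_sig)
{
    switch (port_sig) {
    case DEMO_PORT_SIG_SEGV:
        return SIGSEGV;
    case DEMO_PORT_SIG_FPE:
        return SIGFPE;
    default:
        /* Unexpected signal: report it explicitly instead of falling
         * off the end of the function with an undefined result. */
        return DEMO_UNIX_SIG_UNKNOWN;
    }
}
```

A caller can then check for DEMO_UNIX_SIG_UNKNOWN before passing the result on to a handler.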

That's it for now....

Thanks
Evgueni

On 11/14/06, Alexei Fedotov <al...@gmail.com> wrote:
> Evgueni,
> That was great.
>
> Artem,
> It's nice to see you online. Could you please check the last comments
> to http://issues.apache.org/jira/browse/HARMONY-1904 and decide what
> should we do about this issue?
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Alexei Fedotov <al...@gmail.com>.
Evgueni,
That was great.

Artem,
It's nice to see you online. Could you please check the last comments
to http://issues.apache.org/jira/browse/HARMONY-1904 and decide what
we should do about this issue?

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

Evgueni Brevnov wrote:
> Geir,
> 
> Besides rethinking how VM uses SIGUSR2 we need to look closer into how
> VM deals with signal handlers. As Jeff described above this is a very
> delicate place. I'm almost sure DRLVM needs changes in this area.
> Could you suggest the best way to keep this on track? (TODO list,
> JIRA, or just a head)

(I have weird problems with mail - messages seem to come late... hence 
the late response)

A JIRA isn't a bad place to start.  Wanna create it? :)

I think the first thing is getting an understanding of what we already do...

> 
> Thanks
> Evgueni
> 
> On 11/22/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> Yes.  I think I now understand why you don't have the same problems w/
>> J9 that we do w/ DRLVM.  I think :)
>>
>> It think it's worth us revisiting how we're using SIGUSR2 in DRLVM and
>> see if we can refactor along the lines of how you are using them in J9.
>>
>> Thanks so much for the info.  This one has been bugging me for a while.
>>
>> geir
>>
>>
>> Jeff Disher wrote:
>> > No, this is a different mechanism.  Signal handler chaining and native to
>> > port signal number mappings are both done on the thread which receives the
>> > signal.  If that thread is the one which was in a select, then it has
>> > already been interrupted to run the master handler.
>> >
>> > My guess as to why you don't see these interruptions in the Harmony port
>> > calls when running with J9 is that, for the most part, our internal signals
>> > are synchronous so they are handled on the thread which causes them.  Even
>> > when we use asynchronous signals, we are usually using pthread_kill to send
>> > them to specific threads which we know to be blocked in an operation which
>> > is allowed to be interrupted (FYI:  these are not in port, hence why it
>> > doesn't already have logic for this).
>> >
>> > If you really don't want to be interrupted by an asynchronous signal during
>> > some operation, the only way may be to change the per-thread signal mask for
>> > the duration of the uninterruptable code path.  At least then you would
>> > receive the signal at a deterministic point (the resetting of the signal
>> > mask) or it would be sent to another thread.
>> >
>> > Even that seems like a heavier operation than just allowing the call to be
>> > interrupted and deciding if you needed to handle the interruption or just
>> > resume (complications involving remaining timeout aside - they may make it
>> > easier to mask).
>> >
>> > Does that answer your question?
>> > Jeff.
>> >
>> > On 11/21/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> >>
>> >> Yes.  Thanks - does this mean that we can intercept and prevent "slow"
>> >> system calls like select() from interrupting?
>> >>
>> >> This is the key problem I'm trying to solve.  I suspect the answer is
>> >> "yes" somehow, since we see no behavioral problems with J9 and the
>> >> various socket calls in the Harmony classlib.
>> >>
>> >> geir
>> >>
>> >> Jeff Disher wrote:
>> >> > On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> >> >>
>> >> >>
>> >> >> Can you illustrate what you are talking about w/ a pointer to
>> >> code?  We
>> >> >> have some very concerning issues re signals, and I never could grok
>> >> how
>> >> >> J9 + classlib didn't have problems....
>> >> >>
>> >> >
>> >> >
>> >> > Without getting stuck in the specific line-by-line details of 
>> this, the
>> >> > idea
>> >> > of chaining signal handlers essentially comes down to using an 
>> internal
>> >> > handler registration interface so that components can register a
>> >> handler
>> >> > with an internal master signal handler and it (ie:  the port
>> >> library, or
>> >> > wherever you want it managed) would make sure that the user 
>> component
>> >> > handler gets called when the signal is triggered.  This means 
>> that the
>> >> > top-level signal handler - which is actually registered with the OS
>> >> - is
>> >> > created lazily within the internal master handler component.  
>> When the
>> >> > signal occurs, the master handler is invoked by the OS and it simply
>> >> calls
>> >> > all the handlers registered with it according to a defined
>> >> ordering.  There
>> >> > is also the issue of whether or not you want the chaining from one
>> >> handler
>> >> > to the next to be implicit or whether the signal handler has to 
>> decide
>> >> > (this
>> >> > may be valuable in that you could allow a signal handler to be
>> >> temporarily
>> >> > installed to wrap a dangerous operation and the handler could opt
>> >> not to
>> >> > chain to the other handlers, thus shielding the rest of the VM 
>> from the
>> >> > dangerous operation without it even needing to be aware of what
>> >> happened).
>> >> >
>> >> > There is also the possibility to apply some symbol over-riding
>> >> tricks so
>> >> > that foreign natives will be forced to use the mechanism in the 
>> VM even
>> >> > though they thought that they were calling the normal system 
>> signal or
>> >> > sigaction (note that things like this are pretty dangerous, 
>> though, and
>> >> > will
>> >> > limit the scope of responsibility which you can push onto the user
>> >> handlers
>> >> > - ie:  chaining requirements or unregistration, etc - so it probably
>> >> > isn't a
>> >> > good idea).
>> >> >
>> >> > I am not sure if the mechanism that we use in the VM was released as
>> >> > part of
>> >> > our Harmony contribution but it does specify a way that multiple
>> >> handlers
>> >> > can co-exist so we don't constantly have handlers from one component
>> >> being
>> >> > over-written by another.  As Angela mentioned, there is also a
>> >> provision
>> >> to
>> >> > ensure that we don't over-write pre-existing external handlers.
>> >> >
>> >> >
>> >> > Is that the kind of information you were looking for?
>> >> > Jeff.
>> >> >
>> >>
>> >
>>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Geir,

Besides rethinking how the VM uses SIGUSR2, we need to look more closely
at how the VM deals with signal handlers. As Jeff described above, this
is a very delicate area. I'm almost sure DRLVM needs changes here.
Could you suggest the best way to keep this on track? (TODO list,
JIRA, or just keeping it in mind)

Thanks
Evgueni

On 11/22/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> Yes.  I think I now understand why you don't have the same problems w/
> J9 that we do w/ DRLVM.  I think :)
>
> It think it's worth us revisiting how we're using SIGUSR2 in DRLVM and
> see if we can refactor along the lines of how you are using them in J9.
>
> Thanks so much for the info.  This one has been bugging me for a while.
>
> geir
>
>
> Jeff Disher wrote:
> > No, this is a different mechanism.  Signal handler chaining and native to
> > port signal number mappings are both done on the thread which receives the
> > signal.  If that thread is the one which was in a select, then it has
> > already been interrupted to run the master handler.
> >
> > My guess as to why you don't see these interruptions in the Harmony port
> > calls when running with J9 is that, for the most part, our internal signals
> > are synchronous so they are handled on the thread which causes them.  Even
> > when we use asynchronous signals, we are usually using pthread_kill to send
> > them to specific threads which we know to be blocked in an operation which
> > is allowed to be interrupted (FYI:  these are not in port, hence why it
> > doesn't already have logic for this).
> >
> > If you really don't want to be interrupted by an asynchronous signal during
> > some operation, the only way may be to change the per-thread signal mask
> > for
> > the duration of the uninterruptable code path.  At least then you would
> > receive the signal at a deterministic point (the resetting of the signal
> > mask) or it would be sent to another thread.
> >
> > Even that seems like a heavier operation than just allowing the call to be
> > interrupted and deciding if you needed to handle the interruption or just
> > resume (complications involving remaining timeout aside - they may make it
> > easier to mask).
> >
> > Does that answer your question?
> > Jeff.
> >
> > On 11/21/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >>
> >> Yes.  Thanks - does this mean that we can intercept and prevent "slow"
> >> system calls like select() from interrupting?
> >>
> >> This is the key problem I'm trying to solve.  I suspect the answer is
> >> "yes" somehow, since we see no behavioral problems with J9 and the
> >> various socket calls in the Harmony classlib.
> >>
> >> geir
> >>
> >> Jeff Disher wrote:
> >> > On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >> >>
> >> >>
> >> >> Can you illustrate what you are talking about w/ a pointer to
> >> code?  We
> >> >> have some very concerning issues re signals, and I never could grok
> >> how
> >> >> J9 + classlib didn't have problems....
> >> >>
> >> >
> >> >
> >> > Without getting stuck in the specific line-by-line details of this, the
> >> > idea
> >> > of chaining signal handlers essentially comes down to using an internal
> >> > handler registration interface so that components can register a
> >> handler
> >> > with an internal master signal handler and it (ie:  the port
> >> library, or
> >> > wherever you want it managed) would make sure that the user component
> >> > handler gets called when the signal is triggered.  This means that the
> >> > top-level signal handler - which is actually registered with the OS
> >> - is
> >> > created lazily within the internal master handler component.  When the
> >> > signal occurs, the master handler is invoked by the OS and it simply
> >> calls
> >> > all the handlers registered with it according to a defined
> >> ordering.  There
> >> > is also the issue of whether or not you want the chaining from one
> >> handler
> >> > to the next to be implicit or whether the signal handler has to decide
> >> > (this
> >> > may be valuable in that you could allow a signal handler to be
> >> temporarily
> >> > installed to wrap a dangerous operation and the handler could opt
> >> not to
> >> > chain to the other handlers, thus shielding the rest of the VM from the
> >> > dangerous operation without it even needing to be aware of what
> >> happened).
> >> >
> >> > There is also the possibility to apply some symbol over-riding
> >> tricks so
> >> > that foreign natives will be forced to use the mechanism in the VM even
> >> > though they thought that they were calling the normal system signal or
> >> > sigaction (note that things like this are pretty dangerous, though, and
> >> > will
> >> > limit the scope of responsibility which you can push onto the user
> >> handlers
> >> > - ie:  chaining requirements or unregistration, etc - so it probably
> >> > isn't a
> >> > good idea).
> >> >
> >> > I am not sure if the mechanism that we use in the VM was released as
> >> > part of
> >> > our Harmony contribution but it does specify a way that multiple
> >> handlers
> >> > can co-exist so we don't constantly have handlers from one component
> >> being
> >> > over-written by another.  As Angela mentioned, there is also a
> >> provision
> >> to
> >> > ensure that we don't over-write pre-existing external handlers.
> >> >
> >> >
> >> > Is that the kind of information you were looking for?
> >> > Jeff.
> >> >
> >>
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Angela Lin <al...@gmail.com>.
re: the original hythread_monitor_init() question
When the docs say, "acquire and initialize a monitor," we indeed meant
"acquire" as in, "acquire a monitor from the monitor pool," and not,
"lock the monitor."

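The distinction above can be sketched in C. This is only an illustration built on a plain pthread mutex; demo_monitor and the demo_monitor_* functions are hypothetical names and are not the hythread API. The point is that "init" obtains and initializes the monitor but returns it unlocked, and locking is a separate step.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical monitor type for illustration; not hythread_monitor_t. */
typedef struct demo_monitor {
    pthread_mutex_t mutex;
    int lock_count;          /* 0 means the monitor is not held */
} demo_monitor;

/* "Acquire" in the pool sense: obtain storage and initialize it.
 * The monitor is returned UNLOCKED. Returns 0 on success, -1 on failure. */
int demo_monitor_init(demo_monitor **out)
{
    demo_monitor *m = malloc(sizeof *m);
    if (m == NULL) {
        return -1;
    }
    if (pthread_mutex_init(&m->mutex, NULL) != 0) {
        free(m);
        return -1;
    }
    m->lock_count = 0;
    *out = m;
    return 0;
}

/* Locking is a separate, explicit operation. */
void demo_monitor_enter(demo_monitor *m)
{
    pthread_mutex_lock(&m->mutex);
    m->lock_count++;
}

void demo_monitor_exit(demo_monitor *m)
{
    m->lock_count--;
    pthread_mutex_unlock(&m->mutex);
}

void demo_monitor_destroy(demo_monitor *m)
{
    pthread_mutex_destroy(&m->mutex);
    free(m);
}
```

With this shape, code that only initializes monitors and never enters them (as initializeSignalTools() was described earlier in the thread) leaves nothing locked behind.
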
On 11/18/06, Angela Lin <al...@gmail.com> wrote:
> Some belated comments:
> 1. I agree that replacing the simple sem_wait() with the check for -1
> and EINTR is a good patch. The original writer of the code was working
> on a version of Linux that claimed sem_wait() would never return an
> error.
>
> 2. I would additionally suggest that it might be a good idea to mask
> all async signals in the asynchSignalReporter thread.
>
> 3. Yes, we had bugs in mapUnix...() and mapPortLib...() functions.
> They had no defined return value for unexpected signals.
>
> 4. To the best of my knowledge, the classlib doesn't intentionally use
> SIGUSR2 for anything. I don't have the code in front of me now, but if
> you grep for where masterASynchSignalHandler() is registered, you
> should see which signals get caught.
>
> 5. If DRLVM uses SIGUSR2 heavily, you might consider extending the
> port library to handle that for you as well. The port library chains
> signal handlers. We needed to do this to defend ourselves against apps
> that would (natively) register their own signal handlers, then create
> the JVM and still expect their signal handlers to get called. I seem
> to remember that this was mentioned in another thread?
>
> Regards,
> Angela
>
> On 11/17/06, Gregory Shimansky <gs...@gmail.com> wrote:
> > Evgueni Brevnov wrote:
> > > In other words we will observe the crash as we do now if sem_wait
> > > completes unsuccessfully for whatever reason...
> >
> > Well, it shouldn't return an error except on a signal, should it? The two
> > other possible errors are EINVAL and EDEADLK, which should never happen.
> >
> > Maybe we should add an assertion after it that sem_wait was successful
> > to catch this situation quickly, and it will be a good starting point
> > for investigation.
> >
> > > On 11/17/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > >> Gregory,
> > >>
> > >> The code which goes after sem_wait doesn't work properly if sem_wait
> > >> returns with an error code. So we need to either loop until sem_wait
> > >> returns successfully or adjust the code after sem_wait to handle
> > >> irregular cases.
> > >>
> > >> Thanks
> > >> Evgueni
> > >>
> > >> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > >> > Yes - that's why I was poking him to see the patch.  I was going to
> > >> > suggest something very similar.
> > >> >
> > >> > geir
> > >> >
> > >> >
> > >> > Gregory Shimansky wrote:
> > >> > > Evgueni Brevnov wrote:
> > >> > >> You can look at the change here
> > >> > >> http://issues.apache.org/jira/browse/HARMONY-2203
> > >> > >
> > >> > > Could someone who knows the classlib native code internals better
> > >> > > than me comment on this JIRA? I've added my comment from the
> > >> > > general POV.
> > >> > >
> > >> > > I would change the loop to detect only signal interruption like
> > >> > >
> > >> > > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> > >> > >
> > >> > > Other than that I agree with the patch. If someone does not know,
> > >> > > every step in gdb also interrupts sem_wait calls, so such loops are
> > >> > > a common practice when using semaphores.
> > >> > >
> > >> > > If someone knows the classlib internal logic behind these
> > >> > > asynchronous handlers, please write your opinion.
> > >> > >
> > >> >
> > >>
> > >
> >
> >
> > --
> > Gregory
> >
> >
>
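
The retry loop discussed in the quoted messages above can be written as a small helper. This is a sketch, not the HARMONY-2203 patch itself; wait_retrying_on_eintr is a hypothetical name. It retries only when sem_wait() was interrupted by a signal and reports any other error to the caller, which is where an assertion could be placed.

```c
#include <errno.h>
#include <semaphore.h>

/* Wait on a semaphore, retrying when a signal interrupts the call.
 * Returns 0 on success, or -1 with errno set for any error other
 * than EINTR. */
int wait_retrying_on_eintr(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

The code following the wait then only runs after a successful wait or an error the caller has explicitly decided to handle.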

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
Yes.  I think I now understand why you don't have the same problems w/ 
J9 that we do w/ DRLVM.  I think :)

I think it's worth revisiting how we're using SIGUSR2 in DRLVM to
see if we can refactor along the lines of how you are using it in J9.

Thanks so much for the info.  This one has been bugging me for a while.

geir


Jeff Disher wrote:
> No, this is a different mechanism.  Signal handler chaining and native to
> port signal number mappings are both done on the thread which receives the
> signal.  If that thread is the one which was in a select, then it has
> already been interrupted to run the master handler.
> 
> My guess as to why you don't see these interruptions in the Harmony port
> calls when running with J9 is that, for the most part, our internal signals
> are synchronous so they are handled on the thread which causes them.  Even
> when we use asynchronous signals, we are usually using pthread_kill to send
> them to specific threads which we know to be blocked in an operation which
> is allowed to be interrupted (FYI:  these are not in port, hence why it
> doesn't already have logic for this).
> 
> If you really don't want to be interrupted by an asynchronous signal during
> some operation, the only way may be to change the per-thread signal mask 
> for
> the duration of the uninterruptable code path.  At least then you would
> receive the signal at a deterministic point (the resetting of the signal
> mask) or it would be sent to another thread.
> 
> Even that seems like a heavier operation than just allowing the call to be
> interrupted and deciding if you needed to handle the interruption or just
> resume (complications involving remaining timeout aside - they may make it
> easier to mask).
> 
> Does that answer your question?
> Jeff.
> 
> On 11/21/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>>
>> Yes.  Thanks - does this mean that we can intercept and prevent "slow"
>> system calls like select() from interrupting?
>>
>> This is the key problem I'm trying to solve.  I suspect the answer is
>> "yes" somehow, since we see no behavioral problems with J9 and the
>> various socket calls in the Harmony classlib.
>>
>> geir
>>
>> Jeff Disher wrote:
>> > On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> >>
>> >>
>> >> Can you illustrate what you are talking about w/ a pointer to 
>> code?  We
>> >> have some very concerning issues re signals, and I never could grok 
>> how
>> >> J9 + classlib didn't have problems....
>> >>
>> >
>> >
>> > Without getting stuck in the specific line-by-line details of this, the
>> > idea
>> > of chaining signal handlers essentially comes down to using an internal
>> > handler registration interface so that components can register a 
>> handler
>> > with an internal master signal handler and it (ie:  the port 
>> library, or
>> > wherever you want it managed) would make sure that the user component
>> > handler gets called when the signal is triggered.  This means that the
>> > top-level signal handler - which is actually registered with the OS 
>> - is
>> > created lazily within the internal master handler component.  When the
>> > signal occurs, the master handler is invoked by the OS and it simply
>> calls
>> > all the handlers registered with it according to a defined
>> ordering.  There
>> > is also the issue of whether or not you want the chaining from one
>> handler
>> > to the next to be implicit or whether the signal handler has to decide
>> > (this
>> > may be valuable in that you could allow a signal handler to be
>> temporarily
>> > installed to wrap a dangerous operation and the handler could opt 
>> not to
>> > chain to the other handlers, thus shielding the rest of the VM from the
>> > dangerous operation without it even needing to be aware of what
>> happened).
>> >
>> > There is also the possibility to apply some symbol over-riding 
>> tricks so
>> > that foreign natives will be forced to use the mechanism in the VM even
>> > though they thought that they were calling the normal system signal or
>> > sigaction (note that things like this are pretty dangerous, though, and
>> > will
>> > limit the scope of responsibility which you can push onto the user
>> handlers
>> > - ie:  chaining requirements or unregistration, etc - so it probably
>> > isn't a
>> > good idea).
>> >
>> > I am not sure if the mechanism that we use in the VM was released as
>> > part of
>> > our Harmony contribution but it does specify a way that multiple
>> handlers
>> > can co-exist so we don't constantly have handlers from one component
>> being
>> > over-written by another.  As Angela mentioned, there is also a 
>> provision
>> to
>> > ensure that we don't over-write pre-existing external handlers.
>> >
>> >
>> > Is that the kind of information you were looking for?
>> > Jeff.
>> >
>>
> 

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Jeff Disher <jm...@gmail.com>.
No, this is a different mechanism.  Signal handler chaining and native to
port signal number mappings are both done on the thread which receives the
signal.  If that thread is the one which was in a select, then it has
already been interrupted to run the master handler.

My guess as to why you don't see these interruptions in the Harmony port
calls when running with J9 is that, for the most part, our internal signals
are synchronous so they are handled on the thread which causes them.  Even
when we use asynchronous signals, we are usually using pthread_kill to send
them to specific threads which we know to be blocked in an operation which
is allowed to be interrupted (FYI:  these are not in port, hence why it
doesn't already have logic for this).

If you really don't want to be interrupted by an asynchronous signal during
some operation, the only way may be to change the per-thread signal mask for
the duration of the uninterruptable code path.  At least then you would
receive the signal at a deterministic point (the resetting of the signal
mask) or it would be sent to another thread.

Even that seems like a heavier operation than just allowing the call to be
interrupted and deciding if you needed to handle the interruption or just
resume (complications involving remaining timeout aside - they may make it
easier to mask).

Does that answer your question?
Jeff.

On 11/21/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>
> Yes.  Thanks - does this mean that we can intercept and prevent "slow"
> system calls like select() from interrupting?
>
> This is the key problem I'm trying to solve.  I suspect the answer is
> "yes" somehow, since we see no behavioral problems with J9 and the
> various socket calls in the Harmony classlib.
>
> geir
>
> Jeff Disher wrote:
> > On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >>
> >>
> >> Can you illustrate what you are talking about w/ a pointer to code?  We
> >> have some very concerning issues re signals, and I never could grok how
> >> J9 + classlib didn't have problems....
> >>
> >
> >
> > Without getting stuck in the specific line-by-line details of this, the
> > idea of chaining signal handlers essentially comes down to using an
> > internal handler registration interface so that components can register a
> > handler with an internal master signal handler and it (ie:  the port
> > library, or wherever you want it managed) would make sure that the user
> > component handler gets called when the signal is triggered.  This means
> > that the top-level signal handler - which is actually registered with the
> > OS - is created lazily within the internal master handler component.  When
> > the signal occurs, the master handler is invoked by the OS and it simply
> > calls all the handlers registered with it according to a defined
> > ordering.  There is also the issue of whether or not you want the chaining
> > from one handler to the next to be implicit or whether the signal handler
> > has to decide (this may be valuable in that you could allow a signal
> > handler to be temporarily installed to wrap a dangerous operation and the
> > handler could opt not to chain to the other handlers, thus shielding the
> > rest of the VM from the dangerous operation without it even needing to be
> > aware of what happened).
> >
> > There is also the possibility to apply some symbol over-riding tricks so
> > that foreign natives will be forced to use the mechanism in the VM even
> > though they thought that they were calling the normal system signal or
> > sigaction (note that things like this are pretty dangerous, though, and
> > will limit the scope of responsibility which you can push onto the user
> > handlers - ie:  chaining requirements or unregistration, etc - so it
> > probably isn't a good idea).
> >
> > I am not sure if the mechanism that we use in the VM was released as
> > part of our Harmony contribution but it does specify a way that multiple
> > handlers can co-exist so we don't constantly have handlers from one
> > component being over-written by another.  As Angela mentioned, there is
> > also a provision to ensure that we don't over-write pre-existing external
> > handlers.
> >
> >
> > Is that the kind of information you were looking for?
> > Jeff.
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
Yes.  Thanks - does this mean that we can intercept "slow" system calls
like select() and prevent them from being interrupted?

This is the key problem I'm trying to solve.  I suspect the answer is 
"yes" somehow, since we see no behavioral problems with J9 and the 
various socket calls in the Harmony classlib.

geir

Jeff Disher wrote:
> On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>>
>>
>> Can you illustrate what you are talking about w/ a pointer to code?  We
>> have some very concerning issues re signals, and I never could grok how
>> J9 + classlib didn't have problems....
>>
> 
> 
> Without getting stuck in the specific line-by-line details of this, the 
> idea
> of chaining signal handlers essentially comes down to using an internal
> handler registration interface so that components can register a handler
> with an internal master signal handler and it (ie:  the port library, or
> wherever you want it managed) would make sure that the user component
> handler gets called when the signal is triggered.  This means that the
> top-level signal handler - which is actually registered with the OS - is
> created lazily within the internal master handler component.  When the
> signal occurs, the master handler is invoked by the OS and it simply calls
> all the handlers registered with it according to a defined ordering.  There
> is also the issue of whether or not you want the chaining from one handler
> to the next to be implicit or whether the signal handler has to decide 
> (this
> may be valuable in that you could allow a signal handler to be temporarily
> installed to wrap a dangerous operation and the handler could opt not to
> chain to the other handlers, thus shielding the rest of the VM from the
> dangerous operation without it even needing to be aware of what happened).
> 
> There is also the possibility to apply some symbol over-riding tricks so
> that foreign natives will be forced to use the mechanism in the VM even
> though they thought that they were calling the normal system signal or
> sigaction (note that things like this are pretty dangerous, though, and 
> will
> limit the scope of responsibility which you can push onto the user handlers
> - ie:  chaining requirements or unregistration, etc - so it probably 
> isn't a
> good idea).
> 
> I am not sure if the mechanism that we use in the VM was released as 
> part of
> our Harmony contribution but it does specify a way that multiple handlers
> can co-exist so we don't constantly have handlers from one component being
> over-written by another.  As Angela mentioned, there is also a provision to
> ensure that we don't over-write pre-existing external handlers.
> 
> 
> Is that the kind of information you were looking for?
> Jeff.
> 

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Jeff Disher <jm...@gmail.com>.
On 11/20/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>
>
> Can you illustrate what you are talking about w/ a pointer to code?  We
> have some very concerning issues re signals, and I never could grok how
> J9 + classlib didn't have problems....
>


Without getting stuck in the specific line-by-line details of this, the idea
of chaining signal handlers essentially comes down to using an internal
handler registration interface so that components can register a handler
with an internal master signal handler and it (ie:  the port library, or
wherever you want it managed) would make sure that the user component
handler gets called when the signal is triggered.  This means that the
top-level signal handler - which is actually registered with the OS - is
created lazily within the internal master handler component.  When the
signal occurs, the master handler is invoked by the OS and it simply calls
all the handlers registered with it according to a defined ordering.  There
is also the issue of whether or not you want the chaining from one handler
to the next to be implicit or whether the signal handler has to decide (this
may be valuable in that you could allow a signal handler to be temporarily
installed to wrap a dangerous operation and the handler could opt not to
chain to the other handlers, thus shielding the rest of the VM from the
dangerous operation without it even needing to be aware of what happened).

There is also the possibility to apply some symbol over-riding tricks so
that foreign natives will be forced to use the mechanism in the VM even
though they thought that they were calling the normal system signal or
sigaction (note that things like this are pretty dangerous, though, and will
limit the scope of responsibility which you can push onto the user handlers
- ie:  chaining requirements or unregistration, etc - so it probably isn't a
good idea).

I am not sure if the mechanism that we use in the VM was released as part of
our Harmony contribution but it does specify a way that multiple handlers
can co-exist so we don't constantly have handlers from one component being
over-written by another.  As Angela mentioned, there is also a provision to
ensure that we don't over-write pre-existing external handlers.


Is that the kind of information you were looking for?
Jeff.

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

Angela Lin wrote:

> 5. If DRLVM uses SIGUSR2 heavily, you might consider extending the
> port library to handle that for you as well. The port library chains
> signal handlers. We needed to do this to defend ourselves against apps
> that would (natively) register their own signal handlers, then create
> the JVM and still expect their signal handlers to get called. I seem
> to remember that this was mentioned in another thread?

A ha!

Please forgive a rusty old man for asking stupid questions, but let's
talk about this.

Can you illustrate what you are talking about w/ a pointer to code?  We 
have some very concerning issues re signals, and I never could grok how 
J9 + classlib didn't have problems....

geir


Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Jeff Disher <jm...@gmail.com>.
On 11/20/06, Evgueni Brevnov <ev...@gmail.com> wrote:
>
> > 3. Yes, we had bugs in mapUnix...() and mapPortLib...() functions.
> > They had no defined return value for unexpected signals.
>
> Will you take care about it?
>


We fixed mapPortLibSignalToUnix in our code, but it required changing the
interface to return a signed integer so that we could return an "unmatched"
value that is immediately recognizable as being outside the range of valid
return values.

The key point here is that it was a bug and there was no special logic to
handle any of those other signal numbers outside of this mechanism.  We
hadn't historically used them so there was no mapping defined.

This bug should be fixed in Harmony, as well,
Jeff.
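
For illustration only, the shape of such a fix might look like the sketch
below. This is not the actual Harmony or J9 code: the function names, the
SKETCH_FLAG_* values and the mapping table are all hypothetical. The one point
it is meant to show is the one described above, namely a signed return type
with a sentinel that cannot be mistaken for a valid signal number or flag.

```c
#include <signal.h>
#include <stddef.h>

/* Sentinel that lies outside the range of valid return values. */
#define SKETCH_SIG_UNMATCHED (-1)

/* Hypothetical port-library flag values, for illustration only. */
#define SKETCH_FLAG_SEGV 0x04
#define SKETCH_FLAG_BUS  0x08
#define SKETCH_FLAG_ILL  0x10
#define SKETCH_FLAG_FPE  0x20

struct sketch_sig_map {
    int port_flag;
    int unix_signal;
};

static const struct sketch_sig_map sketch_map[] = {
    { SKETCH_FLAG_SEGV, SIGSEGV },
    { SKETCH_FLAG_BUS,  SIGBUS  },
    { SKETCH_FLAG_ILL,  SIGILL  },
    { SKETCH_FLAG_FPE,  SIGFPE  },
};

#define SKETCH_MAP_LEN (sizeof sketch_map / sizeof sketch_map[0])

/* Map a port-library flag to a Unix signal number.
 * Unknown flags yield SKETCH_SIG_UNMATCHED instead of an undefined value. */
int sketch_map_portlib_to_unix(int port_flag)
{
    size_t i;

    for (i = 0; i < SKETCH_MAP_LEN; i++) {
        if (sketch_map[i].port_flag == port_flag)
            return sketch_map[i].unix_signal;
    }
    return SKETCH_SIG_UNMATCHED;
}

/* Map a Unix signal number back to a port-library flag.
 * Signals without a mapping (SIGUSR2 here) yield SKETCH_SIG_UNMATCHED. */
int sketch_map_unix_to_portlib(int unix_signal)
{
    size_t i;

    for (i = 0; i < SKETCH_MAP_LEN; i++) {
        if (sketch_map[i].unix_signal == unix_signal)
            return sketch_map[i].port_flag;
    }
    return SKETCH_SIG_UNMATCHED;
}
```

A caller can then test the result against SKETCH_SIG_UNMATCHED before using it
as an array index or passing it on to sigaction.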

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Hi All,

Angela, I appreciate your comments....thanks

On 11/18/06, Angela Lin <al...@gmail.com> wrote:
> Some belated comments:
> 1. I agree that replacing the simple sem_wait() with the check for -1
> and EINTR is a good patch. The original writer of the code was working
> on a version of Linux that claimed sem_wait() would never return an
> error.

Agreed. It may also be a good idea to assert that sem_wait doesn't fail
with EINVAL or EDEADLK.

>
> 2. I would additionally suggest that it might be a good idea to mask
> all async signals in the asynchSignalReporter thread.
>
> 3. Yes, we had bugs in mapUnix...() and mapPortLib...() functions.
> They had no defined return value for unexpected signals.

Will you take care of it?

>
> 4. To the best of my knowledge, the classlib doesn't intentionally use
> SIGUSR2 for anything. I don't have the code in front of me now, but if
> you grep for where masterASynchSignalHandler() is registered, you
> should see which signals get caught.

Ok, thanks.

>
> 5. If DRLVM uses SIGUSR2 heavily, you might consider extending the
> port library to handle that for you as well. The port library chains
> signal handlers. We needed to do this to defend ourselves against apps
> that would (natively) register their own signal handlers, then create
> the JVM and still expect their signal handlers to get called. I seem
> to remember that this was mentioned in another thread?

That's interesting. It seems DRLVM should be more friendly to users'
handlers :-). Does it make sense to put this task on the TODO list... or
somewhere else?

Thanks
Evgueni

>
> Regards,
> Angela
>
> On 11/17/06, Gregory Shimansky <gs...@gmail.com> wrote:
> > Evgueni Brevnov wrote:
> > > In other words we will observe the crash as we do now if sem_wait
> > > completes unsuccessfully for whatever reason...
> >
> > Well it shouldn't return an error except for signal, shouldn't it? Two
> > possible other errors are EINVAL and EDEADLK which should never happen.
> >
> > Maybe we should add an assertion after it that sem_wait was successful
> > to catch this situation quickly, and it will be a good starting point
> > for investigation.
> >
> > > On 11/17/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> > >> Gregory,
> > >>
> > >> The code which goes after sem_wait doesn't work properly if sem_wait
> > >> returns with an error code. So we need to either loop until sem_wait
> > >> returns successfully or adjust the code after sem_wait to handle
> > >> irregular cases.
> > >>
> > >> Thanks
> > >> Evgueni
> > >>
> > >> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > >> > Yes - that's why I was poking him to see the patch.  I was going to
> > >> > suggest something very similar.
> > >> >
> > >> > geir
> > >> >
> > >> >
> > >> > Gregory Shimansky wrote:
> > >> > > Evgueni Brevnov wrote:
> > >> > >> You can look at the change here
> > >> > >> http://issues.apache.org/jira/browse/HARMONY-2203
> > >> > >
> > >> > > Could someone who knowns classlib native code internals better
> > >> than me
> > >> > > comment on this JIRA? I've added my comment from the general POV.
> > >> > >
> > >> > > I would change the loop to detect only signal interruption like
> > >> > >
> > >> > > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> > >> > >
> > >> > > Other than that I agree with the patch. I someone does not know,
> > >> every
> > >> > > step in gdb also interrupts sem_wait calls, so such loops are a
> > >> common
> > >> > > practice when using semaphores.
> > >> > >
> > >> > > If someone knows classlib internal logic with this asynchronous
> > >> handlers
> > >> > > stuff please write your opinion.
> > >> > >
> > >> >
> > >>
> > >
> >
> >
> > --
> > Gregory
> >
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Angela Lin <al...@gmail.com>.
Some belated comments:
1. I agree that replacing the simple sem_wait() with the check for -1
and EINTR is a good patch. The original writer of the code was working
on a version of Linux that claimed sem_wait() would never return an
error.

2. I would additionally suggest that it might be a good idea to mask
all async signals in the asynchSignalReporter thread.

3. Yes, we had bugs in mapUnix...() and mapPortLib...() functions.
They had no defined return value for unexpected signals.

4. To the best of my knowledge, the classlib doesn't intentionally use
SIGUSR2 for anything. I don't have the code in front of me now, but if
you grep for where masterASynchSignalHandler() is registered, you
should see which signals get caught.

5. If DRLVM uses SIGUSR2 heavily, you might consider extending the
port library to handle that for you as well. The port library chains
signal handlers. We needed to do this to defend ourselves against apps
that would (natively) register their own signal handlers, then create
the JVM and still expect their signal handlers to get called. I seem
to remember that this was mentioned in another thread?

Regards,
Angela

On 11/17/06, Gregory Shimansky <gs...@gmail.com> wrote:
> Evgueni Brevnov wrote:
> > In other words we will observe the crash as we do now if sem_wait
> > completes unsuccessfully for whatever reason...
>
> Well it shouldn't return an error except for signal, shouldn't it? Two
> possible other errors are EINVAL and EDEADLK which should never happen.
>
> Maybe we should add an assertion after it that sem_wait was successful
> to catch this situation quickly, and it will be a good starting point
> for investigation.
>
> > On 11/17/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> >> Gregory,
> >>
> >> The code which goes after sem_wait doesn't work properly if sem_wait
> >> returns with an error code. So we need to either loop until sem_wait
> >> returns successfully or adjust the code after sem_wait to handle
> >> irregular cases.
> >>
> >> Thanks
> >> Evgueni
> >>
> >> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >> > Yes - that's why I was poking him to see the patch.  I was going to
> >> > suggest something very similar.
> >> >
> >> > geir
> >> >
> >> >
> >> > Gregory Shimansky wrote:
> >> > > Evgueni Brevnov wrote:
> >> > >> You can look at the change here
> >> > >> http://issues.apache.org/jira/browse/HARMONY-2203
> >> > >
> >> > > Could someone who knowns classlib native code internals better
> >> than me
> >> > > comment on this JIRA? I've added my comment from the general POV.
> >> > >
> >> > > I would change the loop to detect only signal interruption like
> >> > >
> >> > > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> >> > >
> >> > > Other than that I agree with the patch. I someone does not know,
> >> every
> >> > > step in gdb also interrupts sem_wait calls, so such loops are a
> >> common
> >> > > practice when using semaphores.
> >> > >
> >> > > If someone knows classlib internal logic with this asynchronous
> >> handlers
> >> > > stuff please write your opinion.
> >> > >
> >> >
> >>
> >
>
>
> --
> Gregory
>
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Gregory Shimansky <gs...@gmail.com>.
Evgueni Brevnov wrote:
> In other words we will observe the crash as we do now if sem_wait
> completes unsuccessfully for whatever reason...

Well, it shouldn't return an error except when interrupted by a signal,
should it? The two other possible errors are EINVAL and EDEADLK, which
should never happen.

Maybe we should add an assertion after it that sem_wait was successful 
to catch this situation quickly, and it will be a good starting point 
for investigation.

> On 11/17/06, Evgueni Brevnov <ev...@gmail.com> wrote:
>> Gregory,
>>
>> The code which goes after sem_wait doesn't work properly if sem_wait
>> returns with an error code. So we need to either loop until sem_wait
>> returns successfully or adjust the code after sem_wait to handle
>> irregular cases.
>>
>> Thanks
>> Evgueni
>>
>> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> > Yes - that's why I was poking him to see the patch.  I was going to
>> > suggest something very similar.
>> >
>> > geir
>> >
>> >
>> > Gregory Shimansky wrote:
>> > > Evgueni Brevnov wrote:
>> > >> You can look at the change here
>> > >> http://issues.apache.org/jira/browse/HARMONY-2203
>> > >
>> > > Could someone who knowns classlib native code internals better 
>> than me
>> > > comment on this JIRA? I've added my comment from the general POV.
>> > >
>> > > I would change the loop to detect only signal interruption like
>> > >
>> > > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
>> > >
>> > > Other than that I agree with the patch. I someone does not know, 
>> every
>> > > step in gdb also interrupts sem_wait calls, so such loops are a 
>> common
>> > > practice when using semaphores.
>> > >
>> > > If someone knows classlib internal logic with this asynchronous 
>> handlers
>> > > stuff please write your opinion.
>> > >
>> >
>>
> 


-- 
Gregory


Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
In other words we will observe the crash as we do now if sem_wait
completes unsuccessfully for whatever reason...

On 11/17/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> Gregory,
>
> The code which goes after sem_wait doesn't work properly if sem_wait
> returns with an error code. So we need to either loop until sem_wait
> returns successfully or adjust the code after sem_wait to handle
> irregular cases.
>
> Thanks
> Evgueni
>
> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > Yes - that's why I was poking him to see the patch.  I was going to
> > suggest something very similar.
> >
> > geir
> >
> >
> > Gregory Shimansky wrote:
> > > Evgueni Brevnov wrote:
> > >> You can look at the change here
> > >> http://issues.apache.org/jira/browse/HARMONY-2203
> > >
> > > Could someone who knowns classlib native code internals better than me
> > > comment on this JIRA? I've added my comment from the general POV.
> > >
> > > I would change the loop to detect only signal interruption like
> > >
> > > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> > >
> > > Other than that I agree with the patch. I someone does not know, every
> > > step in gdb also interrupts sem_wait calls, so such loops are a common
> > > practice when using semaphores.
> > >
> > > If someone knows classlib internal logic with this asynchronous handlers
> > > stuff please write your opinion.
> > >
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Gregory,

The code which goes after sem_wait doesn't work properly if sem_wait
returns with an error code. So we need to either loop until sem_wait
returns successfully or adjust the code after sem_wait to handle
irregular cases.

Thanks
Evgueni

On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> Yes - that's why I was poking him to see the patch.  I was going to
> suggest something very similar.
>
> geir
>
>
> Gregory Shimansky wrote:
> > Evgueni Brevnov wrote:
> >> You can look at the change here
> >> http://issues.apache.org/jira/browse/HARMONY-2203
> >
> > Could someone who knowns classlib native code internals better than me
> > comment on this JIRA? I've added my comment from the general POV.
> >
> > I would change the loop to detect only signal interruption like
> >
> > while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> >
> > Other than that I agree with the patch. I someone does not know, every
> > step in gdb also interrupts sem_wait calls, so such loops are a common
> > practice when using semaphores.
> >
> > If someone knows classlib internal logic with this asynchronous handlers
> > stuff please write your opinion.
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
Yes - that's why I was poking him to see the patch.  I was going to 
suggest something very similar.

geir


Gregory Shimansky wrote:
> Evgueni Brevnov wrote:
>> You can look at the change here
>> http://issues.apache.org/jira/browse/HARMONY-2203
> 
> Could someone who knowns classlib native code internals better than me 
> comment on this JIRA? I've added my comment from the general POV.
> 
> I would change the loop to detect only signal interruption like
> 
> while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);
> 
> Other than that I agree with the patch. I someone does not know, every 
> step in gdb also interrupts sem_wait calls, so such loops are a common 
> practice when using semaphores.
> 
> If someone knows classlib internal logic with this asynchronous handlers 
> stuff please write your opinion.
> 

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Gregory Shimansky <gs...@gmail.com>.
Evgueni Brevnov wrote:
> You can look at the change here
> http://issues.apache.org/jira/browse/HARMONY-2203

Could someone who knows classlib native code internals better than me
comment on this JIRA? I've added my comment from the general POV.

I would change the loop to detect only signal interruption like

while (sem_wait(&wakeUpASynchReporter) == -1 && errno == EINTR);

Other than that I agree with the patch. In case someone does not know, every
step in gdb also interrupts sem_wait calls, so such loops are a common
practice when using semaphores.

If someone knows the classlib internal logic behind this asynchronous handler
stuff, please write your opinion.

-- 
Gregory
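
Spelled out as a small helper, the loop suggested above could look like the
following sketch. It uses plain POSIX sem_wait rather than the port library's
hysem_wait, and the helper name sem_wait_uninterrupted is hypothetical. The
assertion reflects the expectation that any failure other than EINTR points to
a bug worth investigating.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Wait on a semaphore, retrying only when the wait was interrupted by a
 * signal. Returns 0 on success; any other failure trips the assertion in
 * debug builds and is returned to the caller when NDEBUG is defined. */
int sem_wait_uninterrupted(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);

    /* Anything left over (for example EINVAL) indicates a programming error. */
    assert(rc == 0);
    return rc;
}
```

Unlike a loop that retries on any non-zero result, this one cannot spin
forever on a persistent error such as an invalid semaphore.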


Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
You can look at the change here
http://issues.apache.org/jira/browse/HARMONY-2203

On 11/16/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> I haven't published it yet...will file a JIRA soon...
>
> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > ah. whew.
> >
> > can you point me to that change you made?
> >
> > geir
> >
> > Evgueni Brevnov wrote:
> > > I'm not aware if classlib uses SIGUSR2. In this particular case
> > > classlib (to be more precise it is the portlib module) does sem_wait
> > > which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> > > with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> > >
> > > Evgueni
> > >
> > > On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > >> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
> > >>
> > >> Evgueni Brevnov wrote:
> > >> > Hey,
> > >> >
> > >> > Seems like the pretty old problem shows itself again. I'm talking
> > >> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> > >> > uses system semaphores for synchronization purposes...and hysem_wait
> > >> > is interrupted by the signal:
> > >> >
> > >> > (gdb) p perror("sym_wait error:")
> > >> > sym_wait error:: Interrupted system call
> > >> >
> > >> > Do we have good (universal) solution for such cases?
> > >> >
> > >> > Thanks
> > >> > Evgueni
> > >> >
> > >> > On 11/15/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> > >> >>
> > >> >>
> > >> >> Gregory Shimansky wrote:
> > >> >> > Evgueni Brevnov wrote:
> > >> >> >> hmmm.... strange. The patch was tested on multi-processor system
> > >> >> >> running SUSE9. I will check if the patch misses something.
> > >> Anyway, we
> > >> >> >> need to wait with the patch submission until we 100% sure how
> > >> >> >> hythread_monitor_init should behave.
> > >> >> >>
> > >> >> >> Thanks
> > >> >> >> Evgueni
> > >> >> >>
> > >> >> >> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
> > >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> > >> >> >>> > Hi,
> > >> >> >>> >
> > >> >> >>> > While investigating deadlock scenario which is described in
> > >> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out
> > >> >> that DRL
> > >> >> >>> > implementation of hythread_monitor_init /
> > >> >> >>> > hythread_monitor_init_with_name initializes and acquires a
> > >> monitor.
> > >> >> >>> > Original spec reads: "Acquire and initialize a new monitor
> > >> from the
> > >> >> >>> > threading library...." AFAIU that doesn't mean to lock the
> > >> >> monitor but
> > >> >> >>> > get it from the threading library. So the hythread_monitor_init
> > >> >> should
> > >> >> >>> > not lock the monitor.
> > >> >> >>> >
> > >> >> >>> > Could somebody comment on that?
> > >> >> >>>
> > >> >> >>> It might be that semantic is different on different platforms
> > >> >> which is
> > >> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly
> > >> all of
> > >> >> >>> acceptance tests on Linux while everything on Windows works (ok I
> > >> >> >>> tested on
> > >> >> >>> laptop with 1 processor while Linux was a HT server, sometimes
> > >> it is
> > >> >> >>> important for threading).
> > >> >> >
> > >> >> > I've tried to investigate the problem but didn't find the end of it
> > >> >> yet.
> > >> >> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
> > >> >> > distribution buggy and move on?</joke>).
> > >> >>
> > >> >> There is something odd about it, I'll admit...  Remember the EOMEM
> > >> bugs
> > >> >> I found in forking?
> > >> >>
> > >> >>
> > >> >> I didn't reproduce it on
> > >> >> > gentoo, all tests work just fine.
> > >> >> >
> > >> >> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> > >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest,
> > >> >> stress.WeakHashMapTest VM
> > >> >> > segfaults. The stack looks like an infinite recursion of 4 stack
> > >> >> frames:
> > >> >> >
> > >> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
> > >> >> > info=0xb71a503c, context=0xb71a50bc) at
> > >> >> >
> > >> >>
> > >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > >> >> > re/src/util/linux/signals_ia32.cpp:443
> > >> >> > #1  <signal handler called>
> > >> >> > #2  0xb6dcc20a in get_stack_addr () at
> > >> >> >
> > >> >>
> > >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > >> >> > re/src/util/linux/signals_ia32.cpp:293
> > >> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c,
> > >> uc=0xb71a54ec)
> > >> >> >     at
> > >> >> >
> > >> >>
> > >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > >> >> > re/src/util/linux/signals_ia32.cpp:399
> > >> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
> > >> >> > info=0xb71a546c, context=0xb71a54ec) at
> > >> >> >
> > >> >>
> > >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > >> >> > re/src/util/linux/signals_ia32.cpp:451
> > >> >> >
> > >> >> > and so on. The stack is very long. When I run VM with
> > >> -Xtrace:signals I
> > >> >> > get a very long log of messages that "NPE or SOE detected at
> > >> ...". The
> > >> >> > first time address always varies, but it appears to be memcpy.
> > >> The next
> > >> >> > addresses are always the same, they point to get_stack_addr
> > >> function.
> > >> >> >
> > >> >> > So I tried to find out why memcpy crashes in the first place. It
> > >> >> appears
> > >> >> > to be a struct copy called from jsig_handler hysig. The stack looks
> > >> >> like
> > >> >> > this (if I can trust gdb on ubuntu):
> > >> >> >
> > >> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> > >> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0,
> > >> uc=0x0)
> > >> >> >  at hysigunix.c:169
> > >> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at
> > >> hysignal.c:971
> > >> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8,
> > >> p_args=0x807a8d8)
> > >> >> >     at
> > >> >> >
> > >> >>
> > >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> > >>
> > >> >>
> > >> >> >
> > >> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at
> > >> >> threadproc/unix/thread.c:138
> > >> >> > #5  0xb7b65341 in start_thread () from
> > >> >> lib/tls/i686/cmov/libpthread.so.0
> > >> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> > >> >> >
> > >> >> > In jsig_handler a struct of type sigaction is copied
> > >> >> >
> > >> >> > act = saved_sigaction[sig];
> > >> >> >
> > >> >> > and gcc replaces this statement with a call to memcpy it seems.
> > >> But the
> > >> >> > parameter sig is quite weird if you look at it. It is
> > >> >> sig=-1215196204...
> > >> >> > Now if I could only find where and this sig happened there... I
> > >> cannot
> > >> >> > find it in the depth of classlib native code this late at night.
> > >> >> >
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
I haven't published it yet...will file a JIRA soon...

On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> ah. whew.
>
> can you point me to that change you made?
>
> geir
>
> Evgueni Brevnov wrote:
> > I'm not aware if classlib uses SIGUSR2. In this particular case
> > classlib (to be more precise it is the portlib module) does sem_wait
> > which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> > with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> >
> > Evgueni
> >
> > On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
> >>
> >> Evgueni Brevnov wrote:
> >> > Hey,
> >> >
> >> > Seems like the pretty old problem shows itself again. I'm talking
> >> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> >> > uses system semaphores for synchronization purposes...and hysem_wait
> >> > is interrupted by the signal:
> >> >
> >> > (gdb) p perror("sym_wait error:")
> >> > sym_wait error:: Interrupted system call
> >> >
> >> > Do we have good (universal) solution for such cases?
> >> >
> >> > Thanks
> >> > Evgueni
> >> >
> >> > On 11/15/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
> >> >>
> >> >>
> >> >> Gregory Shimansky wrote:
> >> >> > Evgueni Brevnov wrote:
> >> >> >> hmmm.... strange. The patch was tested on multi-processor system
> >> >> >> running SUSE9. I will check if the patch misses something.
> >> Anyway, we
> >> >> >> need to wait with the patch submission until we 100% sure how
> >> >> >> hythread_monitor_init should behave.
> >> >> >>
> >> >> >> Thanks
> >> >> >> Evgueni
> >> >> >>
> >> >> >> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
> >> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> >> >> >>> > Hi,
> >> >> >>> >
> >> >> >>> > While investigating deadlock scenario which is described in
> >> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out
> >> >> that DRL
> >> >> >>> > implementation of hythread_monitor_init /
> >> >> >>> > hythread_monitor_init_with_name initializes and acquires a
> >> monitor.
> >> >> >>> > Original spec reads: "Acquire and initialize a new monitor
> >> from the
> >> >> >>> > threading library...." AFAIU that doesn't mean to lock the
> >> >> monitor but
> >> >> >>> > get it from the threading library. So the hythread_monitor_init
> >> >> should
> >> >> >>> > not lock the monitor.
> >> >> >>> >
> >> >> >>> > Could somebody comment on that?
> >> >> >>>
> >> >> >>> It might be that semantic is different on different platforms
> >> >> which is
> >> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly
> >> all of
> >> >> >>> acceptance tests on Linux while everything on Windows works (ok I
> >> >> >>> tested on
> >> >> >>> laptop with 1 processor while Linux was a HT server, sometimes
> >> it is
> >> >> >>> important for threading).
> >> >> >
> >> >> > I've tried to investigate the problem but didn't find the end of it
> >> >> yet.
> >> >> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
> >> >> > distribution buggy and move on?</joke>).
> >> >>
> >> >> There is something odd about it, I'll admit...  Remember the EOMEM
> >> bugs
> >> >> I found in forking?
> >> >>
> >> >>
> >> >> I didn't reproduce it on
> >> >> > gentoo, all tests work just fine.
> >> >> >
> >> >> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> >> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest,
> >> >> stress.WeakHashMapTest VM
> >> >> > segfaults. The stack looks like an infinite recursion of 4 stack
> >> >> frames:
> >> >> >
> >> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
> >> >> > info=0xb71a503c, context=0xb71a50bc) at
> >> >> >
> >> >>
> >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> >> >> > re/src/util/linux/signals_ia32.cpp:443
> >> >> > #1  <signal handler called>
> >> >> > #2  0xb6dcc20a in get_stack_addr () at
> >> >> >
> >> >>
> >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> >> >> > re/src/util/linux/signals_ia32.cpp:293
> >> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c,
> >> uc=0xb71a54ec)
> >> >> >     at
> >> >> >
> >> >>
> >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> >> >> > re/src/util/linux/signals_ia32.cpp:399
> >> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
> >> >> > info=0xb71a546c, context=0xb71a54ec) at
> >> >> >
> >> >>
> >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> >> >> > re/src/util/linux/signals_ia32.cpp:451
> >> >> >
> >> >> > and so on. The stack is very long. When I run VM with
> >> -Xtrace:signals I
> >> >> > get a very long log of messages that "NPE or SOE detected at
> >> ...". The
> >> >> > first time address always varies, but it appears to be memcpy.
> >> The next
> >> >> > addresses are always the same, they point to get_stack_addr
> >> function.
> >> >> >
> >> >> > So I tried to find out why memcpy crashes in the first place. It
> >> >> appears
> >> >> > to be a struct copy called from jsig_handler hysig. The stack looks
> >> >> like
> >> >> > this (if I can trust gdb on ubuntu):
> >> >> >
> >> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> >> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0,
> >> uc=0x0)
> >> >> >  at hysigunix.c:169
> >> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at
> >> hysignal.c:971
> >> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8,
> >> p_args=0x807a8d8)
> >> >> >     at
> >> >> >
> >> >>
> >> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> >>
> >> >>
> >> >> >
> >> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at
> >> >> threadproc/unix/thread.c:138
> >> >> > #5  0xb7b65341 in start_thread () from
> >> >> lib/tls/i686/cmov/libpthread.so.0
> >> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> >> >> >
> >> >> > In jsig_handler a struct of type sigaction is copied
> >> >> >
> >> >> > act = saved_sigaction[sig];
> >> >> >
> >> >> > and gcc replaces this statement with a call to memcpy it seems.
> >> But the
> >> >> > parameter sig is quite weird if you look at it. It is
> >> >> sig=-1215196204...
> >> >> > Now if I could only find where and this sig happened there... I
> >> cannot
> >> >> > find it in the depth of classlib native code this late at night.
> >> >> >
> >> >>
> >> >>
> >> >
> >>
> >
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
ah. whew.

can you point me to that change you made?

geir

Evgueni Brevnov wrote:
> I'm not aware if classlib uses SIGUSR2. In this particular case
> classlib (to be more precise it is the portlib module) does sem_wait
> which is interrupted by TM's SIGUSR2 signal. I replaced "hysem_wait"
> with "while (hysem_wait() != 0) {}". It helped to pass all tests.
> 
> Evgueni
> 
> On 11/16/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?
>>
>> Evgueni Brevnov wrote:
>> > Hey,
>> >
>> > Seems like the pretty old problem shows itself again. I'm talking
>> > about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
>> > uses system semaphores for synchronization purposes...and hysem_wait
>> > is interrupted by the signal:
>> >
>> > (gdb) p perror("sym_wait error:")
>> > sym_wait error:: Interrupted system call
>> >
>> > Do we have good (universal) solution for such cases?
>> >
>> > Thanks
>> > Evgueni
>> >
>> > On 11/15/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>> >>
>> >>
>> >> Gregory Shimansky wrote:
>> >> > Evgueni Brevnov wrote:
>> >> >> hmmm.... strange. The patch was tested on multi-processor system
>> >> >> running SUSE9. I will check if the patch misses something. 
>> Anyway, we
>> >> >> need to wait with the patch submission until we 100% sure how
>> >> >> hythread_monitor_init should behave.
>> >> >>
>> >> >> Thanks
>> >> >> Evgueni
>> >> >>
>> >> >> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
>> >> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>> >> >>> > Hi,
>> >> >>> >
>> >> >>> > While investigating deadlock scenario which is described in
>> >> >>> > HARMONY-2006 I found out one interesting thing. It turned out
>> >> that DRL
>> >> >>> > implementation of hythread_monitor_init /
>> >> >>> > hythread_monitor_init_with_name initializes and acquires a 
>> monitor.
>> >> >>> > Original spec reads: "Acquire and initialize a new monitor 
>> from the
>> >> >>> > threading library...." AFAIU that doesn't mean to lock the
>> >> monitor but
>> >> >>> > get it from the threading library. So the hythread_monitor_init
>> >> should
>> >> >>> > not lock the monitor.
>> >> >>> >
>> >> >>> > Could somebody comment on that?
>> >> >>>
>> >> >>> It might be that semantic is different on different platforms
>> >> which is
>> >> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly 
>> all of
>> >> >>> acceptance tests on Linux while everything on Windows works (ok I
>> >> >>> tested on
>> >> >>> laptop with 1 processor while Linux was a HT server, sometimes 
>> it is
>> >> >>> important for threading).
>> >> >
>> >> > I've tried to investigate the problem but didn't find the end of it
>> >> yet.
>> >> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
>> >> > distribution buggy and move on?</joke>).
>> >>
>> >> There is something odd about it, I'll admit...  Remember the EOMEM 
>> bugs
>> >> I found in forking?
>> >>
>> >>
>> >> I didn't reproduce it on
>> >> > gentoo, all tests work just fine.
>> >> >
>> >> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
>> >> > gc.PhantomReferenceTest, gc.WeakReferenceTest,
>> >> stress.WeakHashMapTest VM
>> >> > segfaults. The stack looks like an infinite recursion of 4 stack
>> >> frames:
>> >> >
>> >> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
>> >> > info=0xb71a503c, context=0xb71a50bc) at
>> >> >
>> >> 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> >> > re/src/util/linux/signals_ia32.cpp:443
>> >> > #1  <signal handler called>
>> >> > #2  0xb6dcc20a in get_stack_addr () at
>> >> >
>> >> 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> >> > re/src/util/linux/signals_ia32.cpp:293
>> >> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, 
>> uc=0xb71a54ec)
>> >> >     at
>> >> >
>> >> 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> >> > re/src/util/linux/signals_ia32.cpp:399
>> >> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
>> >> > info=0xb71a546c, context=0xb71a54ec) at
>> >> >
>> >> 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> >> > re/src/util/linux/signals_ia32.cpp:451
>> >> >
>> >> > and so on. The stack is very long. When I run VM with 
>> -Xtrace:signals I
>> >> > get a very long log of messages that "NPE or SOE detected at 
>> ...". The
>> >> > first time address always varies, but it appears to be memcpy. 
>> The next
>> >> > addresses are always the same, they point to get_stack_addr 
>> function.
>> >> >
>> >> > So I tried to find out why memcpy crashes in the first place. It
>> >> appears
>> >> > to be a struct copy called from jsig_handler hysig. The stack looks
>> >> like
>> >> > this (if I can trust gdb on ubuntu):
>> >> >
>> >> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
>> >> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, 
>> uc=0x0)
>> >> >  at hysigunix.c:169
>> >> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at 
>> hysignal.c:971
>> >> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, 
>> p_args=0x807a8d8)
>> >> >     at
>> >> >
>> >> 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712 
>>
>> >>
>> >> >
>> >> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at
>> >> threadproc/unix/thread.c:138
>> >> > #5  0xb7b65341 in start_thread () from
>> >> lib/tls/i686/cmov/libpthread.so.0
>> >> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
>> >> >
>> >> > In jsig_handler a struct of type sigaction is copied
>> >> >
>> >> > act = saved_sigaction[sig];
>> >> >
>> >> > and gcc replaces this statement with a call to memcpy it seems. 
>> But the
>> >> > parameter sig is quite weird if you look at it. It is
>> >> sig=-1215196204...
>> >> > Now if I could only find where and this sig happened there... I 
>> cannot
>> >> > find it in the depth of classlib native code this late at night.
>> >> >
>> >>
>> >>
>> >
>>
> 

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
um... classlib uses SIGUSR2 as well?  Doesn't our thread manager use it?

Evgueni Brevnov wrote:
> Hey,
> 
> Seems like the pretty old problem shows itself again. I'm talking
> about SIGUSR2 signal :-(...Classlib's asynchronous signal reporter
> uses system semaphores for synchronization purposes...and hysem_wait
> is interrupted by the signal:
> 
> (gdb) p perror("sym_wait error:")
> sym_wait error:: Interrupted system call
> 
> Do we have good (universal) solution for such cases?
> 
> Thanks
> Evgueni
> 
> On 11/15/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>>
>>
>> Gregory Shimansky wrote:
>> > Evgueni Brevnov wrote:
>> >> hmmm.... strange. The patch was tested on multi-processor system
>> >> running SUSE9. I will check if the patch misses something. Anyway, we
>> >> need to wait with the patch submission until we 100% sure how
>> >> hythread_monitor_init should behave.
>> >>
>> >> Thanks
>> >> Evgueni
>> >>
>> >> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
>> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>> >>> > Hi,
>> >>> >
>> >>> > While investigating deadlock scenario which is described in
>> >>> > HARMONY-2006 I found out one interesting thing. It turned out 
>> that DRL
>> >>> > implementation of hythread_monitor_init /
>> >>> > hythread_monitor_init_with_name initializes and acquires a monitor.
>> >>> > Original spec reads: "Acquire and initialize a new monitor from the
>> >>> > threading library...." AFAIU that doesn't mean to lock the 
>> monitor but
>> >>> > get it from the threading library. So the hythread_monitor_init 
>> should
>> >>> > not lock the monitor.
>> >>> >
>> >>> > Could somebody comment on that?
>> >>>
>> >>> It might be that semantic is different on different platforms 
>> which is
>> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
>> >>> acceptance tests on Linux while everything on Windows works (ok I
>> >>> tested on
>> >>> laptop with 1 processor while Linux was a HT server, sometimes it is
>> >>> important for threading).
>> >
>> > I've tried to investigate the problem but didn't find the end of it 
>> yet.
>> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
>> > distribution buggy and move on?</joke>).
>>
>> There is something odd about it, I'll admit...  Remember the EOMEM bugs
>> I found in forking?
>>
>>
>> I didn't reproduce it on
>> > gentoo, all tests work just fine.
>> >
>> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
>> > gc.PhantomReferenceTest, gc.WeakReferenceTest, 
>> stress.WeakHashMapTest VM
>> > segfaults. The stack looks like an infinite recursion of 4 stack 
>> frames:
>> >
>> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
>> > info=0xb71a503c, context=0xb71a50bc) at
>> > 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> > re/src/util/linux/signals_ia32.cpp:443
>> > #1  <signal handler called>
>> > #2  0xb6dcc20a in get_stack_addr () at
>> > 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> > re/src/util/linux/signals_ia32.cpp:293
>> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
>> >     at
>> > 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> > re/src/util/linux/signals_ia32.cpp:399
>> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
>> > info=0xb71a546c, context=0xb71a54ec) at
>> > 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
>> > re/src/util/linux/signals_ia32.cpp:451
>> >
>> > and so on. The stack is very long. When I run VM with -Xtrace:signals I
>> > get a very long log of messages that "NPE or SOE detected at ...". The
>> > first time address always varies, but it appears to be memcpy. The next
>> > addresses are always the same, they point to get_stack_addr function.
>> >
>> > So I tried to find out why memcpy crashes in the first place. It 
>> appears
>> > to be a struct copy called from jsig_handler hysig. The stack looks 
>> like
>> > this (if I can trust gdb on ubuntu):
>> >
>> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
>> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
>> >  at hysigunix.c:169
>> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
>> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
>> >     at
>> > 
>> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712 
>>
>> >
>> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at 
>> threadproc/unix/thread.c:138
>> > #5  0xb7b65341 in start_thread () from 
>> lib/tls/i686/cmov/libpthread.so.0
>> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
>> >
>> > In jsig_handler a struct of type sigaction is copied
>> >
>> > act = saved_sigaction[sig];
>> >
>> > and gcc replaces this statement with a call to memcpy it seems. But the
>> > parameter sig is quite weird if you look at it. It is 
>> sig=-1215196204...
>> > Now if I could only find where and this sig happened there... I cannot
>> > find it in the depth of classlib native code this late at night.
>> >
>>
>>
> 

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Hey,

It seems that a pretty old problem is showing itself again. I'm talking
about the SIGUSR2 signal :-(... Classlib's asynchronous signal reporter
uses system semaphores for synchronization, and hysem_wait
is interrupted by the signal:

(gdb) p perror("sym_wait error:")
sym_wait error:: Interrupted system call

Do we have a good (universal) solution for such cases?
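
One common way to handle an "Interrupted system call" result from a blocking
wait is to retry the call only when errno is EINTR. The following is a minimal
sketch on a plain POSIX semaphore, not the portlib code: sem_wait_retry is a
hypothetical helper name, and whether hysem_wait exposes errno in the same way
is an assumption that would need checking.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not part of portlib): wait on a POSIX semaphore,
 * retrying only when the wait was interrupted by a signal handler. */
static int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);

    /* 0 on success; -1 with errno set for any error other than EINTR. */
    return rc;
}
```

Retrying only on EINTR, rather than on every non-zero result, keeps a genuine
failure (for example an invalid semaphore) from turning into a busy loop.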

Thanks
Evgueni

On 11/15/06, Geir Magnusson Jr. <ge...@pobox.com> wrote:
>
>
> Gregory Shimansky wrote:
> > Evgueni Brevnov wrote:
> >> hmmm.... strange. The patch was tested on multi-processor system
> >> running SUSE9. I will check if the patch misses something. Anyway, we
> >> need to wait with the patch submission until we 100% sure how
> >> hythread_monitor_init should behave.
> >>
> >> Thanks
> >> Evgueni
> >>
> >> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
> >>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> >>> > Hi,
> >>> >
> >>> > While investigating deadlock scenario which is described in
> >>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> >>> > implementation of hythread_monitor_init /
> >>> > hythread_monitor_init_with_name initializes and acquires a monitor.
> >>> > Original spec reads: "Acquire and initialize a new monitor from the
> >>> > threading library...." AFAIU that doesn't mean to lock the monitor but
> >>> > get it from the threading library. So the hythread_monitor_init should
> >>> > not lock the monitor.
> >>> >
> >>> > Could somebody comment on that?
> >>>
> >>> It might be that semantic is different on different platforms which is
> >>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> >>> acceptance tests on Linux while everything on Windows works (ok I
> >>> tested on
> >>> laptop with 1 processor while Linux was a HT server, sometimes it is
> >>> important for threading).
> >
> > I've tried to investigate the problem but didn't find the end of it yet.
> > The bug seems to be ubuntu specific (<joke>shall we maybe call this
> > distribution buggy and move on?</joke>).
>
> There is something odd about it, I'll admit...  Remember the EOMEM bugs
> I found in forking?
>
>
> I didn't reproduce it on
> > gentoo, all tests work just fine.
> >
> > The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE,
> > gc.PhantomReferenceTest, gc.WeakReferenceTest, stress.WeakHashMapTest VM
> > segfaults. The stack looks like an infinite recursion of 4 stack frames:
> >
> > #0  0xb6dcb814 in null_java_reference_handler (signum=11,
> > info=0xb71a503c, context=0xb71a50bc) at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > re/src/util/linux/signals_ia32.cpp:443
> > #1  <signal handler called>
> > #2  0xb6dcc20a in get_stack_addr () at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > re/src/util/linux/signals_ia32.cpp:293
> > #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
> >     at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > re/src/util/linux/signals_ia32.cpp:399
> > #4  0xb6dcb900 in null_java_reference_handler (signum=11,
> > info=0xb71a546c, context=0xb71a54ec) at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> > re/src/util/linux/signals_ia32.cpp:451
> >
> > and so on. The stack is very long. When I run VM with -Xtrace:signals I
> > get a very long log of messages that "NPE or SOE detected at ...". The
> > first time address always varies, but it appears to be memcpy. The next
> > addresses are always the same, they point to get_stack_addr function.
> >
> > So I tried to find out why memcpy crashes in the first place. It appears
> > to be a struct copy called from jsig_handler hysig. The stack looks like
> > this (if I can trust gdb on ubuntu):
> >
> > #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> > #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
> >  at hysigunix.c:169
> > #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> > #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
> >     at
> > /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
> >
> > #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> > #5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> > #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> >
> > In jsig_handler a struct of type sigaction is copied
> >
> > act = saved_sigaction[sig];
> >
> > and gcc replaces this statement with a call to memcpy it seems. But the
> > parameter sig is quite weird if you look at it. It is sig=-1215196204...
> > Now if I could only find where and this sig happened there... I cannot
> > find it in the depth of classlib native code this late at night.
> >
>
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

Gregory Shimansky wrote:
> Evgueni Brevnov wrote:
>> hmmm.... strange. The patch was tested on multi-processor system
>> running SUSE9. I will check if the patch misses something. Anyway, we
>> need to wait with the patch submission until we 100% sure how
>> hythread_monitor_init should behave.
>>
>> Thanks
>> Evgueni
>>
>> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
>>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>>> > Hi,
>>> >
>>> > While investigating deadlock scenario which is described in
>>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
>>> > implementation of hythread_monitor_init /
>>> > hythread_monitor_init_with_name initializes and acquires a monitor.
>>> > Original spec reads: "Acquire and initialize a new monitor from the
>>> > threading library...." AFAIU that doesn't mean to lock the monitor but
>>> > get it from the threading library. So the hythread_monitor_init should
>>> > not lock the monitor.
>>> >
>>> > Could somebody comment on that?
>>>
>>> It might be that semantic is different on different platforms which is
>>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
>>> acceptance tests on Linux while everything on Windows works (ok I 
>>> tested on
>>> laptop with 1 processor while Linux was a HT server, sometimes it is
>>> important for threading).
> 
> I've tried to investigate the problem but didn't find the end of it yet. 
> The bug seems to be ubuntu specific (<joke>shall we maybe call this 
> distribution buggy and move on?</joke>). 

There is something odd about it, I'll admit...  Remember the EOMEM bugs 
I found in forking?


> I didn't reproduce it on
> gentoo, all tests work just fine.
> 
> The bug look likes this, on tests gc.Force, gc.LOS, gc.List, gc.NPE, 
> gc.PhantomReferenceTest, gc.WeakReferenceTest, stress.WeakHashMapTest VM 
> segfaults. The stack looks like an infinite recursion of 4 stack frames:
> 
> #0  0xb6dcb814 in null_java_reference_handler (signum=11, 
> info=0xb71a503c, context=0xb71a50bc) at 
> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> re/src/util/linux/signals_ia32.cpp:443
> #1  <signal handler called>
> #2  0xb6dcc20a in get_stack_addr () at 
> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> re/src/util/linux/signals_ia32.cpp:293
> #3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
>     at 
> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> re/src/util/linux/signals_ia32.cpp:399
> #4  0xb6dcb900 in null_java_reference_handler (signum=11, 
> info=0xb71a546c, context=0xb71a54ec) at 
> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmco
> re/src/util/linux/signals_ia32.cpp:451
> 
> and so on. The stack is very long. When I run VM with -Xtrace:signals I 
> get a very long log of messages that "NPE or SOE detected at ...". The 
> first time address always varies, but it appears to be memcpy. The next 
> addresses are always the same, they point to get_stack_addr function.
> 
> So I tried to find out why memcpy crashes in the first place. It appears 
> to be a struct copy called from jsig_handler hysig. The stack looks like 
> this (if I can trust gdb on ubuntu):
> 
> #0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
> #1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0) 
>  at hysigunix.c:169
> #2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
> #3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
>     at 
> /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712 
> 
> #4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
> #5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
> #6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6
> 
> In jsig_handler a struct of type sigaction is copied
> 
> act = saved_sigaction[sig];
> 
> and gcc replaces this statement with a call to memcpy it seems. But the 
> parameter sig is quite weird if you look at it. It is sig=-1215196204... 
> Now if I could only find where and this sig happened there... I cannot 
> find it in the depth of classlib native code this late at night.
> 


Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Gregory Shimansky <gs...@gmail.com>.
Evgueni Brevnov wrote:
> hmmm.... strange. The patch was tested on multi-processor system
> running SUSE9. I will check if the patch misses something. Anyway, we
> need to wait with the patch submission until we 100% sure how
> hythread_monitor_init should behave.
> 
> Thanks
> Evgueni
> 
> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
>> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
>> > Hi,
>> >
>> > While investigating deadlock scenario which is described in
>> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
>> > implementation of hythread_monitor_init /
>> > hythread_monitor_init_with_name initializes and acquires a monitor.
>> > Original spec reads: "Acquire and initialize a new monitor from the
>> > threading library...." AFAIU that doesn't mean to lock the monitor but
>> > get it from the threading library. So the hythread_monitor_init should
>> > not lock the monitor.
>> >
>> > Could somebody comment on that?
>>
>> It might be that semantic is different on different platforms which is
>> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
>> acceptance tests on Linux while everything on Windows works (ok I 
>> tested on
>> laptop with 1 processor while Linux was a HT server, sometimes it is
>> important for threading).

I've tried to investigate the problem but haven't gotten to the bottom of it
yet. The bug seems to be Ubuntu-specific (<joke>shall we maybe call this
distribution buggy and move on?</joke>). I couldn't reproduce it on
Gentoo; all tests work just fine there.

The bug looks like this: on the tests gc.Force, gc.LOS, gc.List, gc.NPE,
gc.PhantomReferenceTest, gc.WeakReferenceTest and stress.WeakHashMapTest the
VM segfaults. The stack looks like an infinite recursion of 4 stack frames:

#0  0xb6dcb814 in null_java_reference_handler (signum=11, info=0xb71a503c, context=0xb71a50bc)
    at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:443
#1  <signal handler called>
#2  0xb6dcc20a in get_stack_addr ()
    at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:293
#3  0xb6dcb6cd in check_stack_overflow (info=0xb71a546c, uc=0xb71a54ec)
    at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:399
#4  0xb6dcb900 in null_java_reference_handler (signum=11, info=0xb71a546c, context=0xb71a54ec)
    at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/vmcore/src/util/linux/signals_ia32.cpp:451

and so on. The stack is very long. When I run the VM with -Xtrace:signals I
get a very long log of messages saying "NPE or SOE detected at ...". The
first address always varies, but it appears to be in memcpy. The subsequent
addresses are always the same; they point to the get_stack_addr function.

So I tried to find out why memcpy crashes in the first place. It appears
to be a struct copy called from jsig_handler (hysigunix.c). The stack looks
like this (if I can trust gdb on Ubuntu):

#0  0xb7a9b9dc in memcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0xb7ba0fa0 in jsig_handler (sig=-1215196204, siginfo=0x0, uc=0x0)
    at hysigunix.c:169
#2  0xb7f9ec8b in asynchSignalReporter (userData=0x0) at hysignal.c:971
#3  0xb7baa8ef in thread_start_proc (thd=0x807a8e8, p_args=0x807a8d8)
    at /nfs/ims/proj/drl/mrt1/users/gregory/Harmony/enhanced/drlvm/trunk/vm/thread/src/thread_native_basic.c:712
#4  0xb7bb0ed4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
#5  0xb7b65341 in start_thread () from lib/tls/i686/cmov/libpthread.so.0
#6  0xb7af94ee in clone () from /lib/tls/i686/cmov/libc.so.6

In jsig_handler a struct of type sigaction is copied

act = saved_sigaction[sig];

and it seems that gcc replaces this statement with a call to memcpy. But the
parameter sig looks quite weird: it is sig=-1215196204...
Now if I could only find where this sig value comes from... I cannot
find it in the depths of the classlib native code this late at night.

-- 
Gregory


Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Alexei Fedotov <al...@gmail.com>.
All,

Evgueni's patch is a step in the right direction. Taking
pthread_mutex_init as a conventional example, a monitor shouldn't be
locked in the _init function. The test errors on Linux may simply tell us
that there are more places that rely on the incorrect contract of the
function.
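
The pthread convention mentioned above can be checked with a small sketch:
after pthread_mutex_init the mutex is unlocked, so an immediate
pthread_mutex_trylock succeeds. The helper name fresh_mutex_is_unlocked is
hypothetical and only illustrates the contract; it says nothing about how
hythread monitors are implemented.

```c
#include <pthread.h>

/* Returns 1 if a freshly initialized mutex can be acquired right away,
 * i.e. pthread_mutex_init() left it unlocked; returns 0 otherwise. */
static int fresh_mutex_is_unlocked(void)
{
    pthread_mutex_t m;
    int acquired;

    if (pthread_mutex_init(&m, NULL) != 0)
        return 0;

    /* trylock returns 0 only if the mutex was free and is now held. */
    acquired = (pthread_mutex_trylock(&m) == 0);
    if (acquired)
        pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return acquired;
}
```

An init function that also locked the object would fail this kind of check, or
self-deadlock if the caller went on to take a blocking, non-recursive lock.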

-- Alexei

On 11/11/06, Evgueni Brevnov <ev...@gmail.com> wrote:
> hmmm.... strange. The patch was tested on multi-processor system
> running SUSE9. I will check if the patch misses something. Anyway, we
> need to wait with the patch submission until we 100% sure how
> hythread_monitor_init should behave.
>
> Thanks
> Evgueni
>
> On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
> > On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> > > Hi,
> > >
> > > While investigating deadlock scenario which is described in
> > > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > > implementation of hythread_monitor_init /
> > > hythread_monitor_init_with_name initializes and acquires a monitor.
> > > Original spec reads: "Acquire and initialize a new monitor from the
> > > threading library...." AFAIU that doesn't mean to lock the monitor but
> > > get it from the threading library. So the hythread_monitor_init should
> > > not lock the monitor.
> > >
> > > Could somebody comment on that?
> >
> > It might be that semantic is different on different platforms which is
> > probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> > acceptance tests on Linux while everything on Windows works (ok I tested on
> > laptop with 1 processor while Linux was a HT server, sometimes it is
> > important for threading).
> >
> > I think we need more investigation on whether or not the monitor has to be
> > locked in init.
> >
> > --
> > Gregory Shimansky, Intel Middleware Products Division
> >
>


-- 
Thank you,
Alexei

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Evgueni Brevnov <ev...@gmail.com>.
Hmm, strange. The patch was tested on a multi-processor system
running SUSE9. I will check whether the patch misses something. Anyway, we
need to hold off on submitting the patch until we are 100% sure how
hythread_monitor_init should behave.

Thanks
Evgueni

On 11/11/06, Gregory Shimansky <gs...@gmail.com> wrote:
> On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> > Hi,
> >
> > While investigating deadlock scenario which is described in
> > HARMONY-2006 I found out one interesting thing. It turned out that DRL
> > implementation of hythread_monitor_init /
> > hythread_monitor_init_with_name initializes and acquires a monitor.
> > Original spec reads: "Acquire and initialize a new monitor from the
> > threading library...." AFAIU that doesn't mean to lock the monitor but
> > get it from the threading library. So the hythread_monitor_init should
> > not lock the monitor.
> >
> > Could somebody comment on that?
>
> It might be that semantic is different on different platforms which is
> probably even worse. Your patch in HARMONY-2149 breaks nearly all of
> acceptance tests on Linux while everything on Windows works (ok I tested on
> laptop with 1 processor while Linux was a HT server, sometimes it is
> important for threading).
>
> I think we need more investigation on whether or not the monitor has to be
> locked in init.
>
> --
> Gregory Shimansky, Intel Middleware Products Division
>

Re: [drlvm][threading] Should hythread_monitor_init() aquire the monitor?

Posted by Gregory Shimansky <gs...@gmail.com>.
On Friday 10 November 2006 17:45 Evgueni Brevnov wrote:
> Hi,
>
> While investigating deadlock scenario which is described in
> HARMONY-2006 I found out one interesting thing. It turned out that DRL
> implementation of hythread_monitor_init /
> hythread_monitor_init_with_name initializes and acquires a monitor.
> Original spec reads: "Acquire and initialize a new monitor from the
> threading library...." AFAIU that doesn't mean to lock the monitor but
> get it from the threading library. So the hythread_monitor_init should
> not lock the monitor.
>
> Could somebody comment on that?

It might be that the semantics differ across platforms, which is
probably even worse. Your patch in HARMONY-2149 breaks nearly all of the
acceptance tests on Linux, while everything works on Windows (OK, I tested on
a laptop with 1 processor while the Linux machine was an HT server; sometimes
that is important for threading).

I think we need more investigation into whether or not the monitor has to be
locked in init.

-- 
Gregory Shimansky, Intel Middleware Products Division