Posted to dev@tomcat.apache.org by Vasile GABURICI <ga...@ss.pub.ro> on 2000/05/17 21:02:56 UTC

Two tough questions on JNI

        Dear Jakarta developers,

	To make the AOLServer plug-in work with JNI I need help from people
in three-letter companies, that is, IBM and/or Sun. If you know somebody
who can answer the following two questions, please ask him/her. It is very
likely that I can solve the problem for good if I know the answers to these two.

	The two questions refer to the interaction between a C application
using Linux kernel threads (on 2.2.14) and the native threads JDK (1.1,
1.2, it does not matter) via JNI:

        1) What *exactly* is permitted and what is not with respect to
threads and signal masks in an application? If a handler is installed for
a signal like SIGHUP in the main thread, why do threads that block this
signal hang at random points inside the JVM?

        2) How does the -Xrs flag "reduce the use of OS signals"? Can you
give a list of signals that are used by your JVM with and without this
flag?


	TIA,
	Vasile


Re: Alternate solution (was: Two tough questions on JNI)

Posted by Costin Manolache <co...@eng.sun.com>.
Vasile GABURICI wrote:

>         Okay, we have agreed to disagree. Could you make the hack active
> only when it's compiled for the Apache 2.0? I see no reason why other
> modules that use JNI on Linux (like mine) should be blessed with it.

Ok.
(I don't disagree with the solution; I just don't have time to implement it
for Apache, and it's not required with the current 2.0.
I have a bit too much on the to-do list, so please give me some time...)

Costin



Re: Alternate solution (was: Two tough questions on JNI)

Posted by Vasile GABURICI <ga...@ss.pub.ro>.
	Okay, we have agreed to disagree. Could you make the hack active
only when it's compiled for the Apache 2.0? I see no reason why other
modules that use JNI on Linux (like mine) should be blessed with it. 



Re: Alternate solution (was: Two tough questions on JNI)

Posted by Costin Manolache <co...@eng.sun.com>.
Thanks,

I think I fixed the apache handler, it works fine now. I'll add code
to make sure  RT 3,4,5, 6,7 are also enabled, but right now I don't think
we need to call child in a separate thread, at least not with the
current Apache 2.0.


>         I don't know the Apache 2.0 threading model, but I presume that
> jk_child_init() is called when a child is forked. I also presume that this
> child is multithreaded. (Otherwise where are the threads?) Therefore, I
> guess that the main thread of this process is going to call jk_child_init.
> If besides initialization this thread is going to deal with signals, you
> are set for a fall.

It will deal with signals later, after the init phase. We start the VM
during Apache configuration, and this is done in single-threaded mode
(because the configuration will determine whether we'll have threads and
which "thread manager", the MPM, to use).


We are lucky :-)


> advantages:
>         - you don't mess with signals in jk_jni_worker, which isn't supposed
> to fiddle with signals anyway (it's web-server independent).

I have to: the Apache worker thread will filter out (block) the signals we use.



>         If you don't agree with me, can you give a rationale for your
> solution? Why do you need to fiddle with the real-time signals 0, 1 and 2
> that are actually used by glibc 2.1 for the Linux threads implementation?

Just paranoia: the signal needed by JDK 1.1 is SIGUSR1, and JDK 1.2 needs
SIGUNUSED. I just make sure nobody blocks those signals
(all I do is make sure they are not filtered out).



>         According to Juergen, the Blackdown JVM actually uses real-time
> signals 3 and 4 to implement <quote> a signal based suspend/resume
> scheme (based on a bug-fixed version of Dave Butenhof's example in his
> 'Programming with POSIX Threads' book) </quote>. Using strace I have found
> that it actually uses real-time signals 6 and 7, though.

OK, it seems I need to let them pass too.
I don't think Apache will block those, but after all the pain of debugging
this, it's probably better to play it safe.


Costin


Re: Alternate solution (was: Two tough questions on JNI)

Posted by Vasile GABURICI <ga...@ss.pub.ro>.
	Costin,

	I would like to suggest that you try to apply the solution that
Juergen has indicated (for nstomcat) to mod_jk as well. I have noticed that
in jk_child_init() you call wc_open(). I was making the same mistake.

	I don't know the Apache 2.0 threading model, but I presume that
jk_child_init() is called when a child is forked. I also presume that this
child is multithreaded. (Otherwise, where are the threads?) Therefore, I
guess that the main thread of this process is going to call jk_child_init().
If besides initialization this thread is going to deal with signals, you
are set for a fall.

	Proposed solution: launch a thread that calls wc_open(), then join
it. If this works for mod_jk as well as it does for nstomcat, you have two
advantages:
	- you don't mess with signals in jk_jni_worker, which isn't supposed
to fiddle with signals anyway (it's web-server independent).
	- the solution is going to work on platforms other than Linux that
may experience similar problems in the future. Actually, nobody is going to
have this problem if Apache 2.0 has a threading abstraction layer (like the
one AOLServer has).
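	The proposal above can be sketched roughly as follows. This is only an
illustration under stated assumptions: fake_wc_open() is a hypothetical
stand-in for the connector's wc_open() (which would create the JVM), and
run_in_helper_thread() is a hypothetical helper, not part of mod_jk or
nstomcat. The initialization runs in a short-lived thread of its own while
the calling thread just waits in pthread_join().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical stand-in for wc_open(): real code would create the JVM here.
 * It records whether it ran in the thread whose ID is passed in `arg`. */
static int init_ran_in_caller = -1;

static void *fake_wc_open(void *arg)
{
    const pthread_t *caller = arg;
    init_ran_in_caller = pthread_equal(pthread_self(), *caller) != 0;
    return NULL;
}

/* Run fn(arg) in a short-lived helper thread and wait for it to finish.
 * Returns 0 on success, or the error number from pthread_create/pthread_join. */
static int run_in_helper_thread(void *(*fn)(void *), void *arg, void **result)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, fn, arg);
    if (rc != 0)
        return rc;
    return pthread_join(t, result);
}
```

	Note that the helper thread still starts with the caller's signal mask
(POSIX threads inherit it from their creator), so if that mask matters the
caller has to adjust it before creating the thread.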

	Please note that Gal also does this trick in his NSAPI plug-in.

	If you would like to compare the signal handling in Apache 2.0
with the one in AOLServer 3.0, please find attached the 3 KB of code (2
functions) that deal with signals in AOLServer. I have annotated them.

	If you don't agree with me, can you give a rationale for your
solution? Why do you need to fiddle with the real-time signals 0, 1 and 2
that are actually used by glibc 2.1 for the Linux threads implementation?

On 18 May 2000 costin@locus.apache.org wrote:
>   +
>   +static void linux_signal_hack() {
>   +    sigset_t newM;
>   +    sigset_t old;
>   +    
>   +    sigemptyset(&newM);
>   +    pthread_sigmask( SIG_SETMASK, &newM, &old );
>   +    
>   +    sigdelset(&old, SIGUSR1 );
>   +    sigdelset(&old, SIGUSR2 );
>   +    sigdelset(&old, SIGUNUSED );
>   +    sigdelset(&old, SIGRTMIN );
>   +    sigdelset(&old, SIGRTMIN + 1 );
>   +    sigdelset(&old, SIGRTMIN + 2 );
>   +    pthread_sigmask( SIG_SETMASK, &old, NULL );
>   +}

	According to Juergen, the Blackdown JVM actually uses real-time
signals 3 and 4 to implement <quote> a signal based suspend/resume
scheme (based on a bug-fixed version of Dave Butenhof's example in his
'Programming with POSIX Threads' book) </quote>. Using strace I have found
that it actually uses real-time signals 6 and 7, though.

	I don't have access to the book Juergen mentions; if anyone knows
what Juergen means, please let me know too. 
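	Since the book is not quoted here, the following is only a generic
sketch of the signal-based suspend/resume idea, assuming POSIX threads and
semaphores. It is not Blackdown's code: SIG_SUSPEND, SIG_RESUME, the choice
of SIGUSR1/SIGUSR2, and every function name are hypothetical. The suspender
sends a signal with pthread_kill() and waits on a semaphore that the
target's handler posts. If the target thread has that signal blocked, the
signal just stays pending and the suspender waits forever, which is
consistent with the "GC hangs" symptom in this thread. The timeout exists
only so the hang can be observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical signal choice for this sketch only. */
#define SIG_SUSPEND SIGUSR1
#define SIG_RESUME  SIGUSR2

static sem_t ack;                       /* target -> suspender: "stopped" / "running" */
static volatile sig_atomic_t resumed;   /* set by the resume handler (one target at a time) */
static atomic_int stop, ready;          /* demo bookkeeping only */
static const struct timespec tick = {0, 1000000};   /* 1 ms */

static void resume_handler(int sig) { (void)sig; resumed = 1; }

/* Runs in the target thread: acknowledge, then sleep until SIG_RESUME arrives.
 * Only async-signal-safe calls are used; errno is preserved for the thread. */
static void suspend_handler(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;
    (void)sig;
    resumed = 0;
    sem_post(&ack);                     /* "I am stopped" */
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIG_RESUME);
    while (!resumed)
        sigsuspend(&wait_mask);         /* only SIG_RESUME can wake the thread */
    sem_post(&ack);                     /* "I am running again" */
    errno = saved_errno;
}

static int init_suspend_resume(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = resume_handler;
    if (sigaction(SIG_RESUME, &sa, NULL) != 0) return -1;
    sigaddset(&sa.sa_mask, SIG_RESUME); /* keep SIG_RESUME pending until sigsuspend() */
    sa.sa_handler = suspend_handler;
    if (sigaction(SIG_SUSPEND, &sa, NULL) != 0) return -1;
    return sem_init(&ack, 0, 0);
}

/* Suspender side: send `sig` and wait for the acknowledgement. A VM would wait
 * with no timeout; the deadline here only makes the hang observable. */
static int signal_and_wait(pthread_t t, int sig, int secs)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += secs;
    if (pthread_kill(t, sig) != 0) return -1;
    while (sem_timedwait(&ack, &deadline) != 0)
        if (errno != EINTR) return -1;  /* ETIMEDOUT: the target never answered */
    return 0;
}

/* Demo worker; if *arg is nonzero it blocks SIG_SUSPEND, as a server thread might. */
static void *worker(void *arg)
{
    sigset_t s;
    sigemptyset(&s);
    sigaddset(&s, SIG_SUSPEND);
    pthread_sigmask(*(const int *)arg ? SIG_BLOCK : SIG_UNBLOCK, &s, NULL);
    atomic_store(&ready, 1);
    while (!atomic_load(&stop))
        nanosleep(&tick, NULL);
    return NULL;
}

/* Returns 0 if a suspend/resume round trip succeeds, -1 if it does not. */
static int demo(int block_in_worker)
{
    pthread_t t;
    int rc;
    atomic_store(&stop, 0);
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, worker, &block_in_worker) != 0) return -1;
    while (!atomic_load(&ready))
        nanosleep(&tick, NULL);
    rc = signal_and_wait(t, SIG_SUSPEND, 1);
    if (rc == 0)
        rc = signal_and_wait(t, SIG_RESUME, 1);
    atomic_store(&stop, 1);
    pthread_join(t, NULL);
    return rc;
}
```

	With init_suspend_resume() called once, demo(0) completes a
suspend/resume round trip, while demo(1) returns -1 after about a second,
because the worker has SIG_SUSPEND blocked and never acknowledges.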


	Cheers,
	Vasile



Re: Two tough questions on JNI

Posted by co...@costin.dnt.ro.
OK, I spent the last 3 days in GDB, and I think I have a fix.
The problem (GC hanging) happens in Apache too.

At first Apache 2.0 worked fine; then I updated the workspace and it
still worked, but after a certain number of requests it just hung.
Same problem.

I traced everything to the way Linux handles signals. In Linux
you can send a signal to a _thread_, and you can control which signals are
blocked at thread level. Most programs don't do that, but Apache (and, it
seems, AOLServer) will block all signals (to prevent the server from
crashing; they want to restart at least).
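The per-thread nature of signal masks is easy to see with plain POSIX
threads. A minimal sketch, assuming a current Linux/glibc system rather
than the 2.2.14 kernel and LinuxThreads setup discussed here (the helper
names are hypothetical): a signal blocked in the main thread stays blocked
in every thread created afterwards, because pthread_create() gives the new
thread its creator's signal mask. That is how a server that blocks
everything early ends up running JNI code in threads where the JVM's
signals are blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Returns 1 if `sig` is blocked in the calling thread, 0 if not. */
static int is_blocked(int sig)
{
    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);  /* set == NULL: query only */
    return sigismember(&cur, sig) == 1;
}

/* Thread body: record whether SIGUSR1 is blocked in the new thread. */
static void *report_sigusr1(void *arg)
{
    *(int *)arg = is_blocked(SIGUSR1);
    return NULL;
}

/* Mimic a server that blocks a signal in its main thread before starting
 * workers, then report whether a freshly created thread inherited the block. */
static int new_thread_inherits_block(void)
{
    sigset_t set;
    pthread_t t;
    int child_blocked = -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);   /* changes this thread's mask only */

    if (pthread_create(&t, NULL, report_sigusr1, &child_blocked) != 0)
        return -1;
    pthread_join(t, NULL);
    return child_blocked;                     /* 1: inherited, 0: not inherited */
}
```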

I'll commit the change to jni_worker. The trick is to do the following in
every worker thread (I do it before AttachThread):
    {
	sigset_t newM;
	sigset_t old;

	sigemptyset(&newM); /* I'm not sure it's needed */
	pthread_sigmask( SIG_SETMASK, &newM, &old );
	sigdelset(&old, SIGUSR1 ); /* jdk11 - not tested */
	sigdelset(&old, SIGUNUSED );/*jdk 12, 13 - sun, ibm */
	pthread_sigmask( SIG_SETMASK, &old, NULL );
    }
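A variation on the snippet above, offered as a sketch rather than the
connector's actual code: pthread_sigmask() also accepts SIG_UNBLOCK, which
removes only the listed signals from the calling thread's mask. That avoids
both the temporary empty set and the short window in which every signal is
unblocked. The helper name unblock_jvm_signals() is hypothetical; the signal
list mirrors the one above, and SIGUNUSED is guarded with #ifdef because not
every set of system headers defines it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Unblock only the signals the JDKs above are reported to need, leaving the
 * rest of the calling thread's mask untouched. Returns 0 on success. */
static int unblock_jvm_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);      /* reported for JDK 1.1 */
#ifdef SIGUNUSED
    sigaddset(&set, SIGUNUSED);    /* reported for JDK 1.2/1.3 (Sun, IBM) */
#endif
    return pthread_sigmask(SIG_UNBLOCK, &set, NULL);
}
```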
I tested with Blackdown/Sun/Inprise JDK 1.2.2 and with IBM JDK 1.3; both
work now.

I think we are back on track with the JNI connector.

I also found a very interesting trick to avoid attaching/detaching: it
is possible to use the tpool (the memory pool attached to the thread,
the grandparent of req->pool) and associate the endpoint with a key and
a cleanup method (to do the detach) that is called when the thread ends.
This is similar to per-thread data, maybe better.
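The tpool cleanup API is not shown here, so this is a sketch of the
per-thread-data variant it is compared with, using pthread_key_create()
with a destructor. endpoint_t, get_endpoint(), handle_requests() and the
attach/detach counters are hypothetical; the comments mark where the JNI
AttachCurrentThread()/DetachCurrentThread() calls would go.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical endpoint; real code would hold the JNIEnv obtained from
 * AttachCurrentThread() and release it with DetachCurrentThread(). */
typedef struct { int id; } endpoint_t;

static atomic_int attach_count, detach_count;   /* instrumentation for the sketch */
static pthread_key_t endpoint_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Key destructor: called automatically at thread exit for a non-NULL value. */
static void detach_endpoint(void *p)
{
    atomic_fetch_add(&detach_count, 1);    /* real code: DetachCurrentThread() */
    free(p);
}

static void make_key(void)
{
    pthread_key_create(&endpoint_key, detach_endpoint);
}

/* Return this thread's endpoint, "attaching" only on the first call. */
static endpoint_t *get_endpoint(void)
{
    endpoint_t *ep;

    pthread_once(&key_once, make_key);
    ep = pthread_getspecific(endpoint_key);
    if (ep == NULL) {
        ep = malloc(sizeof *ep);
        if (ep == NULL)
            return NULL;
        ep->id = atomic_fetch_add(&attach_count, 1) + 1;  /* real code: AttachCurrentThread() */
        pthread_setspecific(endpoint_key, ep);
    }
    return ep;
}

/* Simulated worker thread serving *(int *)arg requests. */
static void *handle_requests(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        (void)get_endpoint();
    return NULL;
}
```

Each worker thread then pays for one attach no matter how many requests it
serves, and the detach happens automatically when the thread exits.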

Costin



Re: Two tough questions on JNI

Posted by Gal Shachor <sh...@il.ibm.com>.


Vasile, I cannot tell you what is *exactly* permitted since I am not part of
the JVM team, but I can tell you something about the use of signals (and why
sigmask tampering will result in the JVM getting stuck).

The old (1.1) AIX JVM, as well as probably the Linux JVM, uses signals to
coordinate the garbage collection. Since the GC causes all the threads to
block (unless you are using an improved GC algorithm such as the one
available for IBM's mainframes), playing with the signals will result in
all the JVM threads getting stuck forever.

I happened to discover this when my AIX JVM froze on me two or three
years ago because I was using SIGUSR1, which was used by the JVM. Since the
GC code is similar on all Unixes, this is probably what is happening to you.

	Gal Shachor