Posted to dev@tomcat.apache.org by Jean-Francois Nadeau <jf...@irg.ca> on 2002/06/05 22:57:09 UTC

mod_jk 4.03 deadlock

Hi.

I started load/stress testing our web application. It runs under
Apache 1.3.22, Tomcat 4.03 and the mod_jk binary that came with it.
The OS is Linux 2.4.7 (Red Hat 7.2, without any updates).

I discovered that httpd processes deadlock after a certain number of
very large requests.

I decided to investigate the issue by reading the source code. The
jk_handler function never returns: the call to end->done(&end, l) (just
before jk_close_pool) deadlocks, though not every time. That function
calls pthread mutex lock/unlock for connection reuse.
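A done()-style hook that hands a connection back to a shared pool usually follows the pattern sketched below. The names (endpoint_cache_t, cache_init, cache_get, cache_put, CACHE_SIZE) are hypothetical and this is not mod_jk's code. It only illustrates the property worth checking in the real code: every pthread_mutex_lock() is matched by exactly one pthread_mutex_unlock() on every return path. If some path leaves the mutex held, the next caller blocks forever inside pthread_mutex_lock(), which would produce a hang of the kind described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHE_SIZE 8   /* hypothetical pool size */

/* Hypothetical pool of idle, reusable endpoints (not mod_jk's structures).
 * A NULL slot is empty. */
typedef struct {
    pthread_mutex_t lock;
    void *slots[CACHE_SIZE];
} endpoint_cache_t;

int cache_init(endpoint_cache_t *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < CACHE_SIZE; i++)
        c->slots[i] = NULL;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Take an idle endpoint out of the pool, or return NULL if there is none. */
void *cache_get(endpoint_cache_t *c)
{
    void *ep = NULL;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);           /* lock before touching slots */
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (c->slots[i] != NULL) {
            ep = c->slots[i];
            c->slots[i] = NULL;
            break;                          /* break, not return: still unlock */
        }
    }
    pthread_mutex_unlock(&c->lock);         /* single exit path releases lock */
    return ep;
}

/* done()-style hook: park ep for reuse. Returns 1 if stored, 0 if the pool
 * is full (the caller should then close the connection itself). */
int cache_put(endpoint_cache_t *c, void *ep)
{
    int stored = 0;

    assert(c != NULL && ep != NULL);
    pthread_mutex_lock(&c->lock);
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (c->slots[i] == NULL) {
            c->slots[i] = ep;
            stored = 1;
            break;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return stored;
}
```

In this sketch the mutex covers only the scan of the slot array and never any socket I/O, so a slow backend cannot keep other callers waiting on the lock.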

I tried commenting out all the connection reuse code (in jk_ajp_done,
jk_ajp_service and jk_ajp_getendpoint). The deadlock doesn't go away; it
just appears later.

Have you ever encountered this problem before? I'd like to fix it. Could
it be a kernel or glibc bug? (The problem seems to come from the pthread
mutexes...)

Thanks a lot,

jeff


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: mod_jk 4.03 deadlock

Posted by co...@covalent.net.
Thanks for the patch.

However, there are still a few big problems, and we really need your
help (even if you have solved your own problem). First, I can't
reproduce it, so this is blind debugging.

I don't think select() is available on all platforms (for jk2 we
could use the APR select), so I doubt we can just check in your fix
as is. Second, it adds some overhead (we double the number of system
calls).
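For comparison, and as an alternative that was not proposed in this thread: on platforms that support the SO_RCVTIMEO socket option, the read timeout can be set once per connection with setsockopt(), so individual reads cost no extra system call. The helper names (set_recv_timeout, recv_timed_out) are hypothetical. This is not fully portable either (Winsock, for example, expects a DWORD of milliseconds rather than a struct timeval), and it does nothing about the fixed-timeout concern raised below.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make every blocking recv() on sd give up after
 * `millis` milliseconds. One setsockopt() call per connection, instead of
 * one select() call per read. Returns 0 on success, -1 on error. */
int set_recv_timeout(int sd, long millis)
{
    struct timeval tv;

    assert(millis > 0);              /* a zero timeval means "no timeout" */
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: after recv() returned `ret`, report whether it failed
 * because the SO_RCVTIMEO timeout expired (reported as EAGAIN or
 * EWOULDBLOCK) rather than because of a real socket error. */
int recv_timed_out(long ret)
{
    return ret == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

One caveat: a read loop that simply retries recv() whenever errno is EAGAIN would keep waiting forever on a timed-out socket, so such a loop would have to treat EAGAIN as a timeout once this option is set.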

The real issue is why Tomcat doesn't send the data. Could you try with
Tomcat 4.1 (or the new Coyote-based AJP connector)? Is it really a
deadlock (Tomcat and mod_jk both waiting for input, i.e. blocked in
read)? Or is it that Tomcat, for some reason, doesn't send the 'END'
message?


Of course, there is the issue of detecting timeouts, but that's
extremely tricky: some requests may take a long time to process, so
waiting 3 seconds (or any other fixed timeout) is not a good solution.
It is the Java side that should send the END message when the request
ends.
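One way to answer the question of whether Tomcat sends the END message is to log the type of every packet mod_jk reads from it. Per the AJP13 protocol documentation, each container-to-server packet starts with the bytes 'A' 'B', followed by a two-byte big-endian payload length; the first payload byte is the message type, and END_RESPONSE (type 5) marks the end of a request. The helper ajp13_packet_type below is a hypothetical debugging aid, not mod_jk's parser. If the read blocks after the last SEND_BODY_CHUNK and no END_RESPONSE was ever logged, the Java side never sent it; if it blocks with fewer bytes than a packet's length field announced, that packet itself was cut short.

```c
#include <assert.h>
#include <stddef.h>

/* AJP13 message types sent by the container (Tomcat) to the web server. */
#define AJP13_SEND_BODY_CHUNK 3
#define AJP13_SEND_HEADERS    4
#define AJP13_END_RESPONSE    5
#define AJP13_GET_BODY_CHUNK  6

/* Hypothetical helper: classify one packet held in buf[0..n).
 * Layout: 'A' 'B', 2-byte big-endian payload length, payload; the first
 * payload byte is the message type. Returns that type, or -1 if the
 * header is wrong or buf holds fewer bytes than the header announces. */
int ajp13_packet_type(const unsigned char *buf, size_t n)
{
    size_t payload_len;

    assert(buf != NULL);
    if (n < 5 || buf[0] != 'A' || buf[1] != 'B')
        return -1;
    payload_len = ((size_t)buf[2] << 8) | buf[3];
    if (payload_len < 1 || n < 4 + payload_len)
        return -1;                  /* truncated: this is a short packet */
    return buf[4];
}
```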

Can you do more debugging, on the Java side as well? Maybe the Ethereal
AJP plugin can help :-)

BTW, even with the deadlock gone, you may run into other problems: any
request during which Tomcat sends nothing for more than 3 seconds (a
slow servlet, for example) will fail.

Costin


On 6 Jun 2002, Jean-Francois Nadeau wrote:

> Hi.
> 
> The lock/unlock fix may help, but it doesn't solve the problem. I
> patched my tree with the jk_mt.h change and dug deeper into the bug.
> 
> The problem was the recv() call in jk_tcp_socket_recvfull (jk_connect.c).
> It seems that Tomcat 4.03 (I haven't tried the CVS head version...)
> sometimes doesn't send all the data that mod_jk expects, so mod_jk
> blocks in recv() forever, causing the deadlock.
> 
> I patched my tree with the following:
> 
> In jk_connect.c, jk_tcp_socket_recvfull(), right after
> "while(rdlen < len) {":
> 
>     int this_time, select_ret;
>     fd_set set;
>     struct timeval timeout;
> 
>     /* Wait at most 3 seconds for data on the socket. */
>     FD_ZERO(&set);
>     FD_SET(sd, &set);
>     timeout.tv_sec = 3;
>     timeout.tv_usec = 0;
> 
>     select_ret = select(sd + 1, &set, NULL, NULL, &timeout);
> 
>     if (-1 == select_ret) {
>         return -1;    /* select() failed */
>     }
> 
>     if (0 == select_ret) {
>         return -1;    /* timed out: nothing arrived from Tomcat */
>     }
> 
> and just before the "this_time = recv(sd, ..." line.
> 
> The deadlock is gone and I'm very happy! :)
> 
> Thanks,
> 
> jeff
> 


Re: mod_jk 4.03 deadlock

Posted by Jean-Francois Nadeau <jf...@irg.ca>.
Hi.

The lock/unlock fix may help, but it doesn't solve the problem. I
patched my tree with the jk_mt.h change and dug deeper into the bug.

The problem was the recv() call in jk_tcp_socket_recvfull (jk_connect.c).
It seems that Tomcat 4.03 (I haven't tried the CVS head version...)
sometimes doesn't send all the data that mod_jk expects, so mod_jk
blocks in recv() forever, causing the deadlock.

I patched my tree with the following:

In jk_connect.c, jk_tcp_socket_recvfull(), right after
"while(rdlen < len) {":

    int this_time, select_ret;
    fd_set set;
    struct timeval timeout;

    /* Wait at most 3 seconds for data on the socket. */
    FD_ZERO(&set);
    FD_SET(sd, &set);
    timeout.tv_sec = 3;
    timeout.tv_usec = 0;

    select_ret = select(sd + 1, &set, NULL, NULL, &timeout);

    if (-1 == select_ret) {
        return -1;    /* select() failed */
    }

    if (0 == select_ret) {
        return -1;    /* timed out: nothing arrived from Tomcat */
    }

and just before the "this_time = recv(sd, ..." line.

The deadlock is gone and I'm very happy! :)

Thanks,

jeff

On Wed, 2002-06-05 at 21:25, costinm@covalent.net wrote:
> Hi, 
> 
> I found the problem: it seems the lock/unlock calls were in the wrong order.
> 
> Please check out the CVS head, try again, and let me know if it still
> fails.
> 
> (Thanks for reporting it.)
> 
> Costin


Re: mod_jk 4.03 deadlock

Posted by co...@covalent.net.
Hi, 

I found the problem: it seems the lock/unlock calls were in the wrong order.

Please check out the CVS head, try again, and let me know if it still
fails.

(Thanks for reporting it.)

Costin
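A misordered lock/unlock pair is hard to spot with default mutexes: on Linux a second lock from the same thread typically just blocks forever, and POSIX leaves the behavior of the default type undefined. A debugging build could instead create the mutex with the PTHREAD_MUTEX_ERRORCHECK type, so that misuse comes back as an error code: EDEADLK for relocking a mutex the thread already holds, EPERM for unlocking one it does not hold. The helper name init_checked_mutex is hypothetical, and how it would be wired into jk_mt.h is not shown here.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK in strict modes */
#include <assert.h>
#include <errno.h>          /* EDEADLK, EPERM: returned on misuse */
#include <pthread.h>

/* Hypothetical debug helper: initialize m as an error-checking mutex.
 * Returns 0 on success or an error number from the pthread calls. */
int init_checked_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* attr is no longer needed after init */
    return rc;
}
```

Wrapping the lock and unlock calls so that any nonzero return is logged would then report a swapped pair at the first bad call, rather than leaving an httpd child hung.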

On 5 Jun 2002, Jean-Francois Nadeau wrote:

> Hi.
> 
> I started load/stress testing our web application. It runs under
> Apache 1.3.22, Tomcat 4.03 and the mod_jk binary that came with it.
> The OS is Linux 2.4.7 (Red Hat 7.2, without any updates).
> 
> I discovered that httpd processes deadlock after a certain number of
> very large requests.
> 
> I decided to investigate the issue by reading the source code. The
> jk_handler function never returns: the call to end->done(&end, l) (just
> before jk_close_pool) deadlocks, though not every time. That function
> calls pthread mutex lock/unlock for connection reuse.
> 
> I tried commenting out all the connection reuse code (in jk_ajp_done,
> jk_ajp_service and jk_ajp_getendpoint). The deadlock doesn't go away; it
> just appears later.
> 
> Have you ever encountered this problem before? I'd like to fix it. Could
> it be a kernel or glibc bug? (The problem seems to come from the pthread
> mutexes...)
> 
> Thanks a lot,
> 
> jeff
> 