Posted to dev@tomcat.apache.org by Roy Wilson <de...@bellatlantic.net> on 2000/11/05 19:00:40 UTC

TC4-m4 CPU per request increase and thread queue locking?

1) Ab with Apache_1.3.9 results

When I ran ab with apache, I ran with concurrency C = 1 or C = 100. I
initially thought that C had something to do with the number of instances
of httpd, but it seems to do so only indirectly: as load goes up (due to
larger C), more children get spawned. What I noted is that ab+httpd CPU
usage per request goes down slightly as C increases. This appears to be
because accept(2) processing is "amortized" (as Dean Gaudet put it) over C
connections. I ran with N = 100,000 because of variability in results
observed when N was smaller. No errors were observed.

2) ab with tomcat4-4.0-m4

With Tomcat4, there seemed to be some threading problems (thread queue
lockouts, unknown threads waiting to be notified). I don't know enough
about TC to be able to fix or even diagnose what's going on, but
something does seem fishy to me. Once N and C become "large enough": ab
quits producing output; the log files show an unknown thread waiting for
notification (possibly an artifact of my shell program); thread queue
locking occurs (which seems like it might be serious); and CPU usage
per request seems to INCREASE with C (in contrast with apache_1.3.9).

I have a dual Celeron 400MHz machine that I ran under Red Hat 6.1 with both
an SMP and a UP (uniprocessor) kernel, using the JDK1.2 ClassicVM (green
threads). The SMP kernel does suck (as noted by Dean Gaudet): thruput tends
to DECREASE with increasing C, thread queue locking is noted in one of the
log files, and ab produces nothing or garbage. I haven't bothered to record
data from the SMP runs. I have 256MB SD100 memory and a WD Caviar disk
(there's very little IO, as expected).

Here are the summarized data, running the following command:

/usr/sbin/ab -n N -c C http://localhost:8080/examples/servlet/HelloWorldExample

N (ab)    C (ab)    thruput/sec    %CPU busy    avg proc. time (ab) ms
1000      1         79.11          99.67        11
          20        61.53          100          255
          40        65.46          100          541
          60        63.56          100          784
          80        68.63          100          671
          100       67.79          100          632

10000     1         83.95          100          11
          20        78.21          100          283
          40        75.41          100          680

When N = 10,000 and C > 40, catalina.log shows an error message about an
unknown thread waiting to be notified. Since the average number of active
connections/threads is a function of the memory available and processor
speed, I expect other people will see similar logfile entries with
different N and C values. I'll send an updated, more flexible, version of
the bash scripts today or tomorrow as I'm feelin' poorly. :-)

Roy
-- 
Roy Wilson
E-mail: designrw@bellatlantic.net						

 

Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Roy Wilson <de...@bellatlantic.net>.
<snip>


> The initial goal was correct behavior, which seems pretty well in hand.
> Now it's time to make it faster, without giving up the correctness :-)

When I see a thread dump in catalina.out and ab doesn't produce 
reasonable output, that makes me wonder if "correctness" ought to include 
graceful degradation when configuration settings are less than optimal 
for the OS, VM, etc. being used. I realize this is a rather simplistic 
conception, but ...

Roy

-- 
Roy Wilson
E-mail: designrw@bellatlantic.net

Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by "Craig R. McClanahan" <Cr...@eng.sun.com>.
Remy Maucherat wrote:

> crimson.jar should be both in "lib" and "server" directories.

To be more complete, crimson.jar (along with jaxp.jar) is the XML parser
from the JAXP 1.1-ea release.  We needed the parser-independent SAX2
support that JAXP 1.1 provides in order to implement the new features of
Jasper.

The build scripts for Tomcat 4.0 now let you specify independent XML
parsers for the three places that they are used:
- The parser Ant uses to read your build.xml file
- The parser Catalina uses to read web.xml files
- The parser Jasper uses to read TLD files and JSP
  pages that are in XML syntax.

Only the last one *must* be JAXP 1.1 compliant.  At the moment, the early
access release of  JAXP-1.1 is the only one that requires 1.1, and
currently the RI is the only available release.  I understand that
updates to Xerces to make it 1.1-compliant as well are in progress.

>
> Remy
>

Craig




Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Remy Maucherat <re...@apache.org>.
> Excuse me for being dense, what is crimson.jar for in M4?

The HttpProcessor is the main part of the HTTP connector, and it's in
package org.apache.catalina.connector.http.

crimson.jar should be both in "lib" and "server" directories.

Remy


Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Nick Bauman <ni...@cortexity.com>.
Excuse me for being dense, what is crimson.jar for in M4?

On Sun, 5 Nov 2000, Craig R. McClanahan wrote:

> Nick Bauman wrote:
> 
> > On Sun, 5 Nov 2000, Craig R. McClanahan wrote:
> >
> > > A very large chunk of Catalina's processing time is consumed by parsing the
> > > request headers, and converting them into a Request object that is passed on for
> > > processing.  Volunteers who want one convenient place to start suggesting
> > > improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> > > instance of this class running for each processor thread that is created.
> > >
> >
> > So does HttpProcessor use a pooler? If not, I would imagine this would be
> > a huge performance benefit. Is anyone slated to do this?
> >
> 
> The instances of HttpProcessor -- each with an associated thread -- are indeed pooled
> and reused by the HttpConnector (the object that accepts incoming socket connections
> and hands them off for processing).  Initially, the connector creates the number of
> instances that you specify in the minProcessors property (default=5), and the number
> grows up to the specified maximum if you get more simultaneous requests than you
> currently have processors.
> 
> Once created, an HttpProcessor instance and its corresponding thread are never
> destroyed until the server is shut down.  Idle threads sit on a wait() call (no "do
> you have work for me to do" type polling), so they don't consume any CPU time -- well,
> I'm assuming that threading was implemented competently in your JVM :-) -- and the
> memory footprint is pretty insignificant, so there did not seem to be any gain in
> harvesting them if the current level of server activity is low.
> 
> >
> > Nicolaus Bauman
> > Software Engineer
> > Simplexity Systems
> >
> 
> Craig
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
> 

-- 
Nicolaus Bauman
Software Engineer
Simplexity Systems



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by "Craig R. McClanahan" <Cr...@eng.sun.com>.
Nick Bauman wrote:

> On Sun, 5 Nov 2000, Craig R. McClanahan wrote:
>
> > A very large chunk of Catalina's processing time is consumed by parsing the
> > request headers, and converting them into a Request object that is passed on for
> > processing.  Volunteers who want one convenient place to start suggesting
> > improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> > instance of this class running for each processor thread that is created.
> >
>
> So does HttpProcessor use a pooler? If not, I would imagine this would be
> a huge performance benefit. Is anyone slated to do this?
>

The instances of HttpProcessor -- each with an associated thread -- are indeed pooled
and reused by the HttpConnector (the object that accepts incoming socket connections
and hands them off for processing).  Initially, the connector creates the number of
instances that you specify in the minProcessors property (default=5), and the number
grows up to the specified maximum if you get more simultaneous requests than you
currently have processors.

Once created, an HttpProcessor instance and its corresponding thread are never
destroyed until the server is shut down.  Idle threads sit on a wait() call (no "do
you have work for me to do" type polling), so they don't consume any CPU time -- well,
I'm assuming that threading was implemented competently in your JVM :-) -- and the
memory footprint is pretty insignificant, so there did not seem to be any gain in
harvesting them if the current level of server activity is low.
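
The idle-thread behavior described here can be sketched as follows. This is not the Catalina source: the class ProcessorSketch and its assign/await/stop methods are hypothetical names, and the sketch relies only on the standard Object.wait() and notifyAll() monitor methods. An idle processor blocks in wait() and is woken only when work is handed to it, so it does no polling.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProcessorSketch implements Runnable {
    private Runnable task;       // work handed over by the accepting thread
    private boolean available;   // true while a task is waiting to be picked up
    private boolean stopped;     // set once at shutdown

    /** Hands work to this processor, blocking until the previous task is taken. */
    public synchronized void assign(Runnable work) {
        while (available && !stopped) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        task = work;
        available = true;
        notifyAll();             // wake the idle processor thread
    }

    /** Blocks in wait() until work arrives; returns null after shutdown. */
    private synchronized Runnable await() {
        while (!available && !stopped) {
            try {
                wait();          // idle: no polling, no CPU use
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        if (!available) {
            return null;         // stopped with nothing pending
        }
        Runnable work = task;
        task = null;
        available = false;
        notifyAll();             // let a blocked assign() proceed
        return work;
    }

    public synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    @Override
    public void run() {
        Runnable work;
        while ((work = await()) != null) {
            work.run();
        }
    }

    /** Runs n tasks through one processor thread; returns how many completed. */
    static int runDemo(int n) {
        AtomicInteger handled = new AtomicInteger();
        ProcessorSketch p = new ProcessorSketch();
        Thread t = new Thread(p, "processor-1");
        t.start();
        for (int i = 0; i < n; i++) {
            p.assign(() -> handled.incrementAndGet());
        }
        p.stop();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled " + runDemo(3) + " tasks");
    }
}
```

A real connector would keep a pool of such objects and hand each accepted socket to an idle one. The sketch uses a single processor to keep the hand-off visible.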

>
> Nicolaus Bauman
> Software Engineer
> Simplexity Systems
>

Craig



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by "Craig R. McClanahan" <Cr...@eng.sun.com>.
Nick Bauman wrote:

> I just double checked your original email; this is what caused the
> confusion:
>
> >> processing.  Volunteers who want one convenient place to start suggesting
> >> improvements can look at org.apache.connector.http.HttpProcessor -- there
>
> I assume you forgot the ".catalina." in that statement.
>

I did ... too many patches before responding to that message, sorry 'bout that :-)

>
> >
> > Craig
> >
>
> --
> Nicolaus Bauman
> Software Engineer
> Simplexity Systems
>

Craig



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Nick Bauman <ni...@cortexity.com>.
On Sun, 5 Nov 2000, Craig R. McClanahan wrote:

> > Remy,
> >
> > Where is the source, it's not in the M4 sources, I just looked. Is it only
> > in CVS?
> >
> 
> I just double checked, and the relevant files are indeed in the m4 source
> bundle.  Look in directory
> "catalina/src/share/org/apache/catalina/connector/http" for the HttpConnector and
> HttpProcessor classes -- the source distribution (jakarta-tomcat-4.0-m4-src.xxx)
> is just a snapshot of the directory structure of the CVS repository.

I just double checked your original email; this is what caused the
confusion:

>> processing.  Volunteers who want one convenient place to start suggesting
>> improvements can look at org.apache.connector.http.HttpProcessor -- there

I assume you forgot the ".catalina." in that statement.

> 
> Craig
> 

-- 
Nicolaus Bauman
Software Engineer
Simplexity Systems



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by "Craig R. McClanahan" <Cr...@eng.sun.com>.
Nick Bauman wrote:

> Remy,
>
> Where is the source, it's not in the M4 sources, I just looked. Is it only
> in CVS?
>

I just double checked, and the relevant files are indeed in the m4 source
bundle.  Look in directory
"catalina/src/share/org/apache/catalina/connector/http" for the HttpConnector and
HttpProcessor classes -- the source distribution (jakarta-tomcat-4.0-m4-src.xxx)
is just a snapshot of the directory structure of the CVS repository.

If you've got the binary distribution (jakarta-tomcat-4.0-m4.xxx), the directory
is "src/catalina/share/org/apache/catalina/connector/http" instead.

Craig


>
> On Sun, 5 Nov 2000, Remy Maucherat wrote:
>
> > ----- Original Message -----
> > From: "Nick Bauman" <ni...@cortexity.com>
> > To: <to...@jakarta.apache.org>
> > Sent: Sunday, November 05, 2000 5:24 PM
> > Subject: Re: TC4-m4 CPU per request increase and thread queue locking?
> >
> >
> > > On Sun, 5 Nov 2000, Craig R. McClanahan wrote:
> > >
> > > > A very large chunk of Catalina's processing time is consumed by parsing the
> > > > request headers, and converting them into a Request object that is passed on for
> > > > processing.  Volunteers who want one convenient place to start suggesting
> > > > improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> > > > instance of this class running for each processor thread that is created.
> > > >
> > >
> > > So does HttpProcessor use a pooler? If not, I would imagine this would be
> > > a huge performance benefit. Is anyone slated to do this?
> >
> > The request parsing has been cut & paste from a very old version of Tomcat,
> > and needs work, esp the HttpProcessor.read() function.
> > I'll probably take care of the optimisation of the connector, but
> > contributions in the "performance" area are more than welcome.
> >
> > Remy
> >
> >
>
> --
> Nicolaus Bauman
> Software Engineer
> Simplexity Systems


Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Nick Bauman <ni...@cortexity.com>.
Remy,

Where is the source, it's not in the M4 sources, I just looked. Is it only
in CVS? 

On Sun, 5 Nov 2000, Remy Maucherat wrote:

> ----- Original Message -----
> From: "Nick Bauman" <ni...@cortexity.com>
> To: <to...@jakarta.apache.org>
> Sent: Sunday, November 05, 2000 5:24 PM
> Subject: Re: TC4-m4 CPU per request increase and thread queue locking?
> 
> 
> > On Sun, 5 Nov 2000, Craig R. McClanahan wrote:
> >
> > > A very large chunk of Catalina's processing time is consumed by parsing the
> > > request headers, and converting them into a Request object that is passed on for
> > > processing.  Volunteers who want one convenient place to start suggesting
> > > improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> > > instance of this class running for each processor thread that is created.
> > >
> >
> > So does HttpProcessor use a pooler? If not, I would imagine this would be
> > a huge performance benefit. Is anyone slated to do this?
> 
> The request parsing has been cut & paste from a very old version of Tomcat,
> and needs work, esp the HttpProcessor.read() function.
> I'll probably take care of the optimisation of the connector, but
> contributions in the "performance" area are more than welcome.
> 
> Remy
> 
> 
> 

-- 
Nicolaus Bauman
Software Engineer
Simplexity Systems



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Remy Maucherat <re...@apache.org>.
----- Original Message -----
From: "Nick Bauman" <ni...@cortexity.com>
To: <to...@jakarta.apache.org>
Sent: Sunday, November 05, 2000 5:24 PM
Subject: Re: TC4-m4 CPU per request increase and thread queue locking?


> On Sun, 5 Nov 2000, Craig R. McClanahan wrote:
>
> > A very large chunk of Catalina's processing time is consumed by parsing the
> > request headers, and converting them into a Request object that is passed on for
> > processing.  Volunteers who want one convenient place to start suggesting
> > improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> > instance of this class running for each processor thread that is created.
> >
>
> So does HttpProcessor use a pooler? If not, I would imagine this would be
> a huge performance benefit. Is anyone slated to do this?

The request parsing has been cut & paste from a very old version of Tomcat,
and needs work, esp the HttpProcessor.read() function.
I'll probably take care of the optimisation of the connector, but
contributions in the "performance" area are more than welcome.

Remy


Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by Nick Bauman <ni...@cortexity.com>.
On Sun, 5 Nov 2000, Craig R. McClanahan wrote:

> A very large chunk of Catalina's processing time is consumed by parsing the
> request headers, and converting them into a Request object that is passed on for
> processing.  Volunteers who want one convenient place to start suggesting
> improvements can look at org.apache.connector.http.HttpProcessor -- there is an
> instance of this class running for each processor thread that is created.
> 

So does HttpProcessor use a pooler? If not, I would imagine this would be
a huge performance benefit. Is anyone slated to do this?

Nicolaus Bauman
Software Engineer
Simplexity Systems



Re: TC4-m4 CPU per request increase and thread queue locking?

Posted by "Craig R. McClanahan" <Cr...@eng.sun.com>.
Roy,

Thanks for the starting point on performance analysis for Tomcat 4.0!  As we will
all find, benchmarking is an intriguing art.  For example, the versions of
Catalina, the JVM, the OS, multiple processors, and memory size will all have
significant impacts.

One Tomcat 4.0 configuration setting that matters a great deal in scenarios like
this is the connection pool sizes (on the <Connector> element in
$CATALINA_HOME/conf/server.xml).  The number of "processors" configured here
determines the maximum number of simultaneous request threads that Catalina will
set up -- default maximum is 75.  A second setting that matters (but somewhat less
so) is the "acceptCount" property, that determines how many backlogged requests
the server socket will allow -- this becomes the "backlog" property when the
server socket is opened.  The default is 10.
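
As a rough illustration of how an acceptCount value could reach the operating system, the sketch below passes it as the backlog argument of the java.net.ServerSocket(int port, int backlog) constructor. The class name BacklogSketch and the boundPort helper are hypothetical, and this is an assumption about the general mechanism rather than a copy of the connector code. The JDK documents the backlog as a requested maximum whose exact semantics are implementation specific, so the value is best read as a hint.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class BacklogSketch {

    /**
     * Opens a listening socket on any free port with the given backlog
     * (the acceptCount in the text), then closes it and returns the port
     * that was bound.
     */
    static int boundPort(int acceptCount) {
        // Port 0 asks the OS for any free port; acceptCount is the backlog hint.
        try (ServerSocket server = new ServerSocket(0, acceptCount)) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 10 is the default acceptCount mentioned above.
        System.out.println("bound port " + boundPort(10) + " with backlog 10");
    }
}
```

With ab running at C = 100 against a maximum of 75 processors and a backlog of 10, some connection attempts may be refused or delayed by the OS, depending on how the platform treats the backlog value.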

A very large chunk of Catalina's processing time is consumed by parsing the
request headers, and converting them into a Request object that is passed on for
processing.  Volunteers who want one convenient place to start suggesting
improvements can look at org.apache.connector.http.HttpProcessor -- there is an
instance of this class running for each processor thread that is created.
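
To give a sense of the kind of work involved, here is a minimal header-parsing sketch. It is not the HttpProcessor code: the class HeaderParseSketch and the parseHeaders method are hypothetical, the request line is assumed to have been consumed already, and details such as continuation lines and repeated headers are ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderParseSketch {

    /** Parses "Name: value" lines up to the first empty line into a map. */
    static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new StringReader(raw))) {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    continue;    // skip malformed lines in this sketch
                }
                // Header names are case-insensitive, so normalize to lower case.
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                headers.put(name, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "Host: localhost:8080\r\nUser-Agent: ApacheBench\r\n\r\n";
        System.out.println(parseHeaders(raw));
    }
}
```

Each request in this sketch allocates several short-lived strings per header, which is one plausible reason why header parsing can dominate CPU time for a trivial servlet.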

The initial goal was correct behavior, which seems pretty well in hand.  Now it's
time to make it faster, without giving up the correctness :-)

Craig


Roy Wilson wrote:

> 1) Ab with Apache_1.3.9 results
>
> When I ran ab with apache, I ran with concurrency C = 1 or C = 100. I
> initially thought that C had something to do with the number of instances
> of httpd, but it seems to do so only indirectly: as load goes up (due to
> larger C), more children get spawned. What I noted is that ab+httpd CPU
> usage per request goes down slightly as C increases. This appears to be
> because accept(2) processing is "amortized" (as Dean Gaudet put it) over C
> connections. I ran with N = 100,000 because of variability in results
> observed when N was smaller. No errors were observed.
>
> 2) ab with tomcat4-4.0-m4
>
> With Tomcat4, there seemed to be some threading problems (thread queue
> lockouts, unknown threads waiting to be notified). I don't know enough
> about TC to be able to fix or even diagnose what's going on, but
> something does seem fishy to me. Once N and C become "large enough": ab
> quits producing output; the log files show an unknown thread waiting for
> notification (possibly an artifact of my shell program); thread queue
> locking occurs (which seems like it might be serious); and CPU usage
> per request seems to INCREASE with C (in contrast with apache_1.3.9).
>
> I have a dual Celeron 400MHz machine that I ran under Red Hat 6.1 with both
> an SMP and a UP (uniprocessor) kernel, using the JDK1.2 ClassicVM (green
> threads). The SMP kernel does suck (as noted by Dean Gaudet): thruput tends
> to DECREASE with increasing C, thread queue locking is noted in one of the
> log files, and ab produces nothing or garbage. I haven't bothered to record
> data from the SMP runs. I have 256MB SD100 memory and a WD Caviar disk
> (there's very little IO, as expected).
>
> Here are the summarized data, running the following command:
>
> /usr/sbin/ab -n N -c C http://localhost:8080/examples/servlet/HelloWorldExample
>
>         N (ab)  C (ab)   thruput/sec    %CPU busy       avg proc. time (ab) ms
>         1000            1               79.11           99.67           11
>                         20              61.53           100             255
>                         40              65.46           100             541
>                         60              63.56           100             784
>                         80              68.63           100             671
>                         100             67.79           100             632
>
>         10000           1               83.95           100             11
>                         20              78.21           100             283
>                         40              75.41           100             680
>
> When N = 10,000 and C > 40, catalina.log shows an error message about an
> unknown thread waiting to be notified. Since the average number of active
> connections/threads is a function of the memory available and processor
> speed, I expect other people will see similar logfile entries with
> different N and C values. I'll send an updated, more flexible, version of
> the bash scripts today or tomorrow as I'm feelin' poorly. :-)
>
> Roy
> --
> Roy Wilson
> E-mail: designrw@bellatlantic.net
>