Posted to dev@tomcat.apache.org by Jean-Luc Rochat <jn...@cybercable.fr> on 2000/02/01 23:04:45 UTC

Re: Discussion: AJP next

costin@eng.sun.com wrote:
> 
> >  The getRealPath callback is a good example of something that may be a
> > problem, not because of the callback that we will have to do but because
> > of the servlet APIs. We have getRealPath in two places - the request and
> > the context objects - and we can use a callback only during a request, so
> > how are we going to resolve the difference? The same thing applies when we
> > want to map a suffix to a mime type using the web server.
> 
> It's a problem only if Tomcat calls getRealPath outside of request
> processing. That's a very special case - most of the time it's a servlet
> that calls the method, and the servlet is executed as a result of a
> request ( i.e. you have an open connection). It doesn't matter whether
> Context or Request is asked for the real path - the implementation of the
> method will just send a message to Apache and get the response.
> 
> In the special case, when you have a non-servlet thread calling back, you
> can do some tricks ( like an HTTP connection back to Apache ?), but again,
> it's not the normal case and probably will not happen often ( so we can
> afford something more complex for this case ).
> 
> > > 1. Apache starts and opens a connection with Tomcat ( that will remain open
> > > for the life of the httpd process - avoiding the TCP open delay per request).
> > > A Tomcat thread will be blocked on read, waiting for the Apache message.
> > >
> >
> >  Actually, since (sadly) Apache is multiprocess and can open up to 256
> > processes, one socket for the duration of the process is too much.
> >
> > Also, such a policy opens the door to denial-of-service attacks.
> >
> > I thought that we should keep the connection open long enough, but not for
> > the duration of the process. Not if we are going to allocate a Tomcat
> > thread per connection.
> 
> Opening a TCP connection is slow; we can probably open it at the first
> request to a servlet and close it after a while.
> 
> A DoS can happen anyway - I don't think 1 connection per request is a solution.
> 
> ( also, TIME_WAIT is a big problem sometimes!)
> 
> We can discuss that more, but it's not so important - it's easy to make it
> optional.
> 
> > > I don't know how you plan to deal with callbacks ( from a thread point
> > > of view ). Everything else is fine in your proposal, but I think
> > > callbacks need to be addressed and explained better.
> > >
> >
> > I am going to deal with them similarly to your description... I was also
> > thinking about writing some native code on the Tomcat side so I will be
> > able to use select (no select in Java !!!). This way I can reduce the
> > number of Java threads and leave more connections open for a longer time.
> 
> I don't know if it's a good idea - it will be beneficial for highly loaded
> sites ( where you have a large number of concurrent connections, which means
> millions of hits/day ) - and in that case load balancing ( and using more
> boxes ) is much better. Also, JNI and Invocation are faster and probably a
> better way to spend the time.
> 
> Costin
> 

At this point I'll try to give my opinion and possibly offer some ideas.
This topic is one of the problems that was never addressed in JServ,
although we tried to imagine solutions. I am happy to see that a lot of
my earlier ideas are being expressed by others; we are only missing some
glue here, which I hope I can help bring.

1 - integrating a JVM in an Apache module is a _bad thing_ (tm) IMO.
    . from the architecture point of view (n-tier is better for
scalability than monolithic servers)
    . from the security point of view (small pieces of code mean fewer
bugs and fewer user-permission problems)
    . from the portability point of view (for example, Apache runs very
well on the BSDs; JVMs don't).

2 - Opening a TCP connection for each request brings extra overhead.
    . connections should be reused whenever possible (keepalive feature).

3 - one socket per httpd process could be too much, especially if more
than one Apache server sends requests to one servlet engine.
    . sockets can be closed at any time by the servlet engine, so we can
still use one socket per httpd process.

4 - sockets cannot easily be shared between httpd processes (on Unix at
least, and with Apache 1.x).
    . this matters if the servlet engine stops and restarts while the
Apache server is still up (load-balanced configuration). So don't
multiplex data on one socket, or only offer it as an optional feature.
    . asking the Apache parent process to re-create the socket and
re-fork all the children would be a bad idea.

5 - Using any protocol other than ajpv11 is faster.
    . yes, but ajpv11 carries everything needed. So keep the "verbose"
protocol available as a choice (for ben-ssl/mod_ssl, for example).
 
6 - Callbacks to Apache are not mandatory. Or maybe they are, or will
be.
    . let's keep the protocol pluggable, so we can change it.

7 - All of this has to pass through firewalls (callbacks ?)

8 - This protocol implementation should support the basic load-balancing
mechanisms used in JServ, keepalive-enabled sockets, and authentication,
without problems.

The solution seems to be to split the initial problem into 2 parts :
1 : the keepalive feature.
   . every httpd opens a socket -> one servlet engine (the default server
if load-balanced config)
   . the socket is not closed by the httpd; it is reused again and again.
   . if the socket is closed by the servlet engine, httpd opens a new
one when needed.
   . the servlet engine accepts incoming connections from httpd
processes (belonging to 1..n Apache servers)
   . once a connection is accepted, a thread blocks on the socket to read
new requests (keepalive).
     authentication is performed once, and the load-balancing parameter
(the server name to add to the cookie) is sent once.
   . if the servlet engine wishes, some of the opened sockets can be
closed.
   This should give a boost to performance, as socket creation has a
huge cost.
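
To make the servlet-engine side of this concrete, here is a minimal Java
sketch of the accept-and-block loop I have in mind. The class and method
names are invented for illustration (this is not JServ or Tomcat code),
and the authentication / load-balancing handshake is only stubbed out:

import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the servlet-engine side of the keepalive scheme: accept
// connections from httpd processes, authenticate each one once, then keep
// the socket open and block on it, reading request after request.
public class KeepAliveAcceptor implements Runnable {
    private final ServerSocket listener;

    public KeepAliveAcceptor(int port) throws IOException {
        listener = new ServerSocket(port);
    }

    public void run() {
        while (true) {
            try {
                Socket httpdSocket = listener.accept();
                // one worker thread per httpd connection, reused for its lifetime
                new Thread(new ConnectionWorker(httpdSocket)).start();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    static class ConnectionWorker implements Runnable {
        private final Socket socket;

        ConnectionWorker(Socket socket) { this.socket = socket; }

        public void run() {
            try {
                DataInputStream in = new DataInputStream(socket.getInputStream());
                authenticateOnce(in);           // performed once per connection
                readLoadBalancingParameter(in); // server name for the cookie, sent once
                while (true) {
                    handleRequest(in, socket);  // blocks until httpd sends the next request
                }
            } catch (IOException closedByEitherSide) {
                // the engine or httpd dropped the socket; httpd will simply
                // open a new one the next time it needs us
            } finally {
                try { socket.close(); } catch (IOException ignored) {}
            }
        }

        private void authenticateOnce(DataInputStream in) throws IOException { }
        private void readLoadBalancingParameter(DataInputStream in) throws IOException { }
        private void handleRequest(DataInputStream in, Socket s) throws IOException { }
    }
}

The engine stays free to close any of these sockets whenever it wants to
shed connections; the worst that happens is that the httpd process pays
the connection setup cost once more.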


2 : the new (possibly bi-directional) protocol, built on top of the
above keepalive-enabled connections.
   . ajpv11-like (not very clever, but everything is there).
   . ajpv21-like (with multiple packet exchanges within one single
request, including requests from the servlet engine to Apache).
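
For the ajpv21-like option, even something as simple as typed,
length-prefixed packets would allow several exchanges in both directions
within one request over the same kept-alive socket. The packet types below
are invented for the sake of discussion; they are not the actual
ajpv11/ajpv21 format:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative framing only. Each packet is a type byte plus a
// length-prefixed payload, so both sides can interleave messages
// (including callbacks from the servlet engine back to Apache) within
// the processing of a single request.
public class Packet {
    // hypothetical packet types
    public static final byte REQUEST_HEADERS   = 1; // Apache -> engine
    public static final byte REQUEST_BODY      = 2; // Apache -> engine
    public static final byte CALLBACK          = 3; // engine -> Apache
    public static final byte CALLBACK_REPLY    = 4; // Apache -> engine
    public static final byte RESPONSE_FRAGMENT = 5; // engine -> Apache
    public static final byte END_OF_RESPONSE   = 6; // engine -> Apache

    public final byte type;
    public final byte[] payload;

    public Packet(byte type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    // Write this packet: one type byte, a 4-byte length, then the payload.
    public void write(DataOutputStream out) throws IOException {
        out.writeByte(type);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Read the next packet from the kept-alive connection, blocking.
    public static Packet read(DataInputStream in) throws IOException {
        byte type = in.readByte();
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new Packet(type, payload);
    }
}

With framing like this, a callback such as getRealPath is just another
packet type on the existing connection rather than a second connection
back to Apache.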

Proposal :

Step 1 : I am going to start working on the keepalive feature, and will
bring C & Java code ASAP. This will be tested with JServ, as I'm far
more familiar with that code.
This will be for Apache only (I don't care about IIS or NS for the moment)
and especially U**x (I don't care about NT/W95 for the moment).

Step 2 : see the code & talk.

Step 3 : protocol enhancements if people like it.

Feedback welcome.

Jean-Luc
http://ma-planete.net

Re: Discussion: AJP next

Posted by co...@costin.dnt.ro.
> Of course this problem is more related to the general policies of Tomcat
> development. IMHO, folding Tomcat into a non-expandable environment, one
> that is restricted strictly to the current specification, is bad. It's
> safer to assume that the current specification is not the ultimate
> solution, and that it will have to evolve.

There is a price you have to pay for standards.

You should be very careful about using proprietary APIs - your code will
no longer be portable, and then I see no reason for using Servlets
anyway, since your code will run only on a particular engine.

There are many things I don't like in the Servlet API, and it will be very
hard to find someone ( anyone ) who thinks it is "perfect".

On the other hand, what you want can be implemented in Tomcat ( low level )
and will be _very_ useful for JSP ( or other template engines ). There is
nothing wrong with having a JSP adapter that is specific to Tomcat and takes
advantage of the Apache-Tomcat cache and optimizations.

It is useful to have the "caching" API implemented in Tomcat and to use it
for JSP ( where at least you have a very clear separation between static
and dynamic content ).

Costin


Re: Discussion: AJP next

Posted by Michal Mosiewicz <mi...@interdata.com.pl>.
costin@eng.sun.com wrote:
> [...]
> Wait, I don't get it - how can you guess what portion doesn't change ???
> The servlet API doesn't have any support for that - in a templating system
> like JSP you might detect if a portion changed, but for a general Servlet
> I don't have any idea...

It's true that the servlet API doesn't solve it. But I'm looking toward
future specifications. I was experimenting with JServ to test the potential
gains. There I was able to add methods to JServConnection to mark the
static body, cast the request to JServConnection, and use those methods.
That's how I tested my ideas. That is hardly possible in the reference
implementation.

For now, you can only use some custom content encoding to accomplish
these features. That means some custom protocol has to be layered over
the HTTP response.

Another simple solution would be to use dispatchers as the means of
marking the cacheability of different content. That is, I think,
implemented in Resin. However, in Resin's case the cache is stored and
managed by the Java engine, not by Apache.

Of course this problem is more related to the general policies of Tomcat
development. IMHO, folding Tomcat into a non-expandable environment, one
that is restricted strictly to the current specification, is bad. It's
safer to assume that the current specification is not the ultimate
solution, and that it will have to evolve.

So my point is that smart information caching may be useful right now,
even if for the moment it is limited to full resource content. However,
it is better to give it more power that could be used in the future.

-- Mike

Re: Discussion: AJP next

Posted by co...@eng.sun.com.
> costin@eng.sun.com wrote:
> > [...]
> > 1. If a response _does_ set expire headers, it should be easy to cache it
> > on the Apache side, either in memory or in the file system ( similar to
> > using a squid accelerator). We don't need any API, just an Apache module
> > ( that can be used for CGIs, for example ). It would be really _cool_.
> 
> But you are talking about the whole response, while I'm talking about
> fragments. To get the idea, just take a look at some larger portal site
> - you've got several common elements, like w3 catalogs, stock indexes,
> latest news, and finally some advertising. Usually you can cache those
> elements with different expiration times - for example, hours are usually
> fine for a w3 catalog, latest news may be delayed by minutes, stock
> indexes by seconds or minutes. Ads cannot be cached at all, because you
> will usually need some personal profiling methods to serve them. So the
> common denominator for the whole page is not to cache it at all,
> because it is partially served on a per-user basis.

Wait, I don't get it - how can you guess what portion doesn't change ???
The servlet API doesn't have any support for that - in a templating system
like JSP you might detect if a portion changed, but for a general Servlet
I don't have any idea...

Costin



Re: Discussion: AJP next

Posted by Michal Mosiewicz <mi...@interdata.com.pl>.
costin@eng.sun.com wrote:
> [...]
> 1. If a response _does_ set expire headers, it should be easy to cache it
> on the Apache side, either in memory or in the file system ( similar to
> using a squid accelerator). We don't need any API, just an Apache module
> ( that can be used for CGIs, for example ). It would be really _cool_.

But you are talking about the whole response, while I'm talking about
fragments. To get the idea, just take a look at some larger portal site
- you've got several common elements, like w3 catalogs, stock indexes,
latest news, and finally some advertising. Usually you can cache those
elements with different expiration times - for example, hours are usually
fine for a w3 catalog, latest news may be delayed by minutes, stock
indexes by seconds or minutes. Ads cannot be cached at all, because you
will usually need some personal profiling methods to serve them. So the
common denominator for the whole page is not to cache it at all,
because it is partially served on a per-user basis.

> 2. For JSPs we have a great advantage - we know that a portion of the page
> is constant. ( as a matter of fact, it's one of the main reasons I like
> JSPs ! ). It is very easy to mark the constant body.
> 
> In API terms, the response from Tomcat will consist of fragments. Because
> of the 2.2 buffering we already have that implemented, and that might
> help a lot in another area ( HTTP/1.1 and HTTP connection reuse ).

That could also help lower communication costs. Sometimes it turns out
to be quite easy to push a 10 kB response in one TCP packet that is a
fraction of that size.

Mike

Re: Discussion: AJP next

Posted by co...@eng.sun.com.
> How it works (or is intended to work): basically, when I send the
> response to Apache, I can mark some part of the response data with a
> unique reference id and an expiration time. Then, once the response is
> received in Apache, this marked data is stored in a local heap and
> indexed by the reference id.


1. If a response _does_ set expire headers, it should be easy to cache it
on the Apache side, either in memory or in the file system ( similar to
using a squid accelerator). We don't need any API, just an Apache module
( that can be used for CGIs, for example ). It would be really _cool_.

2. For JSPs we have a great advantage - we know that a portion of the page
is constant. ( as a matter of fact, it's one of the main reasons I like
JSPs ! ). It is very easy to mark the constant body.

In API terms, the response from Tomcat will consist of fragments. Because
of the 2.2 buffering we already have that implemented, and that might
help a lot in another area ( HTTP/1.1 and HTTP connection reuse ).

It would be very easy to send a fragment type that can be either
"constant" ( with Apache caching it using (1) - either in memory or on
disk ), or a special "page fragment" - with a file name and 2
offsets - where Apache will read the fragment from the file directly
and use the OS caches, mmap, or any other optimizations it has for files.
( Apache needs to be able to send file fragments anyway - I think it's in
HTTP/1.1.)

That's also great for the JSP "large-file" case - the Tomcat overhead
will be minimal for pages with a lot of text and little dynamic content.
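
To sketch what that could look like on the wire, here is a rough Java
helper for emitting such typed fragments. The type values and method
names are invented for this discussion and don't correspond to any
existing protocol:

import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only - fragment types and layout invented for this discussion.
// A response is written as a sequence of typed fragments: literal bytes
// that change per request, "constant" bytes Apache may cache in memory or
// on disk, or a reference to a byte range of a file that Apache can read
// and send itself.
public class ResponseFragment {
    public static final byte DYNAMIC    = 0; // literal bytes, never cached
    public static final byte CONSTANT   = 1; // Apache may cache these bytes
    public static final byte FILE_RANGE = 2; // file name plus two offsets

    public static void writeDynamic(DataOutputStream out, byte[] bytes)
            throws IOException {
        out.writeByte(DYNAMIC);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public static void writeConstant(DataOutputStream out, byte[] bytes)
            throws IOException {
        out.writeByte(CONSTANT);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public static void writeFileRange(DataOutputStream out, String fileName,
                                      long start, long end) throws IOException {
        out.writeByte(FILE_RANGE);
        out.writeUTF(fileName); // Apache opens and reads this file itself
        out.writeLong(start);   // offset of the first byte of the fragment
        out.writeLong(end);     // offset just past the last byte
    }
}

Only the dynamic bytes would actually travel over the socket; the constant
and file-range fragments let Apache serve the static parts from its own
cache or straight from disk.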

I think we should spend some time in this area; it's really interesting.
( just one more month to 3.1, and we can have fun with optimizations :-) 

Costin


Re: Discussion: AJP next

Posted by Michal Mosiewicz <mi...@interdata.com.pl>.
Jean-Luc Rochat wrote:
> [...]
> Feedback welcome.

Just one more thing...

I've been pretty busy these days and I'm catching up with this
discussion.

In September I was researching something that in my projects would
give me speedups of about 3 to 20 times - a decent caching algorithm.

Something similar already exists in Resin. You can look at its benchmarks
to get the figures. The main idea is that even in the case of fully dynamic
content, you can point out areas that are more and less dynamic. Some data
may be very volatile and need to be read from a database or other storage
each time a new request comes in, but some can stay valid for several
seconds or minutes.

At first I tried to accomplish my data caching by using dispatchers and
routing some dispatch requests through Apache, so that some subrequests
would be cached. But then I noticed that it is not that good an idea.

Finally I tested a technique to 'bracket' some parts of the content, and
identify them through references.

How it works (or is intended to work): basically, when I send the
response to Apache, I can mark some part of the response data with a
unique reference id and an expiration time. Then, once the response is
received in Apache, this marked data is stored in a local heap and
indexed by the reference id.

Of course, the Java backend also locally stores this reference to the
data fragment that has just been sent to Apache, so it knows that it
doesn't need to send the whole content again. Instead it is sufficient
to send only the reference with the next request. Apache would be able
to use this reference to get the data from the heap and include it in
the response.
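
A very small sketch of the bookkeeping this implies on the Java side is
below. The class and method names are mine (this is not the hacked JServ
code), and it assumes the only thing both sides have to agree on is the
reference id and its expiration time:

import java.util.Hashtable;

// Rough sketch of the backend-side bookkeeping; names are invented.
// When a fragment is first produced it is sent in full together with a
// reference id and an expiration time; while the entry is still fresh,
// the backend sends only the reference id and Apache substitutes the
// bytes it cached under that id.
public class FragmentReferenceCache {
    private final Hashtable entries = new Hashtable(); // refId -> expiry time (Long, ms)

    // Remember that Apache now holds this fragment until 'expiresAt' (ms since epoch).
    public void markSent(String refId, long expiresAt) {
        entries.put(refId, new Long(expiresAt));
    }

    // True if we may send just the reference instead of the full bytes.
    public boolean stillCachedByApache(String refId) {
        Long expiresAt = (Long) entries.get(refId);
        if (expiresAt == null) {
            return false;
        }
        if (System.currentTimeMillis() >= expiresAt.longValue()) {
            entries.remove(refId); // expired on the Apache side as well
            return false;
        }
        return true;
    }
}

When writing a fragment, the backend would call stillCachedByApache(refId):
if it returns true, it sends a short by-reference marker carrying only the
id; otherwise it sends the full bytes and calls markSent(refId, now + ttl).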

I didn't finish implementing it because there were (and still are)
architectural issues making it hard to use. But still, the figures were
very promising. For example, at that time I was developing a kind of
portal website. The most frequently viewed pages involved a lot of data
taken from a database. On pretty reasonable hardware, I couldn't serve
those pages faster than in 150-250 ms. However, for most parts of those
pages it would be harmless to cache them for several seconds; sometimes
even minutes wouldn't make much difference. I tested some code using a
hacked JServ. It appeared that the above delay could easily be shortened
to less than 15 ms. Of course, sometimes I could cache the whole page
and serve it even faster - but that's pretty obvious. However, there are
many cases where you just can't cache the whole page, while you are free
to cache some or most of its parts.

The more difficult part of this is how to implement such a mechanism in
the JSDK. It's a real pity that subrequests, i.e. included dispatches,
are not able to pass along information useful for caching (like
expiration). Otherwise it would be easiest to just use them.

But the real gain would come if you could simply mark some cacheable areas
while sending the response. It's easy to imagine how powerful it could
be in XML-based tools like Cocoon, if you could mark cacheable
areas using custom tags.

Anyhow, providing for it in the protocol would be a good start toward
introducing some API hooks for it in the JSDK.
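
Just to show the shape such a hook could take, here is a purely
hypothetical extension of the response interface - nothing like this
exists in the Servlet API today, and the names are invented:

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// Hypothetical API hook only - this interface does not exist in the JSDK.
// Anything written between beginCacheableArea() and endCacheableArea()
// may be stored by the web server under 'refId' and replayed for up to
// 'ttlSeconds' without asking the servlet engine for that fragment again.
public interface CacheableResponse extends HttpServletResponse {

    // Start a fragment the web server is allowed to cache under refId.
    void beginCacheableArea(String refId, int ttlSeconds) throws IOException;

    // End the current cacheable fragment.
    void endCacheableArea() throws IOException;
}

A JSP tag or a Cocoon custom tag could then simply bracket its output with
these two calls.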

-- Mike