Posted to modperl@perl.apache.org by Joe Schaefer <jo...@sunstarsys.com> on 2000/09/15 00:28:50 UTC

mod_perl guide corrections

Stas,

I was looking over the latest version of the performance
section, and I have a few suggestions/comments regarding

http://perl.apache.org/guide/performance.html

1) Your description of keep-alive performance is confusing.
Every browser I've seen that implements keep-alives
will open at least 2 connections per server (HTTP/1.1 recommends
a maximum of 2, but HTTP/1.0 browsers like Netscape open 3 or more).  The
browsers are usually smart enough to round-robin the requests
across the connections, so there's really no sequential delay.

Furthermore, HTTP/1.1 keep-alive connections can be pipelined:
the client can issue multiple requests on each connection without
waiting for each server response.  In any real-world implementation,
keep-alives are a clear winner over closed connections, even if the
connections are only left idle for a second or two.
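
To see the difference concretely, here's a minimal (untested) benchmark
sketch -- the URL is made up, and it assumes an LWP recent enough to
support the keep_alive option to LWP::UserAgent->new:

    use strict;
    use HTTP::Request ();
    use LWP::UserAgent ();
    use Time::HiRes qw(gettimeofday tv_interval);

    my $url = shift || 'http://localhost/test.html';   # hypothetical test URL

    for my $keep_alive (0, 1) {
        # keep_alive => 1 tells LWP to cache and reuse connections
        my $ua = LWP::UserAgent->new(keep_alive => $keep_alive);
        my $t0 = [gettimeofday];
        $ua->request(HTTP::Request->new(GET => $url)) for 1 .. 100;
        printf "keep_alive=%d: %.3f seconds for 100 requests\n",
               $keep_alive, tv_interval($t0);
    }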

Since most of us put a reverse proxy between the mod_perl server
and the browser, enabling keep-alives on the browser<->proxy connection
is desirable.

I think your section on keep-alives and their usefulness should include
some of the above comments.  Recently I posted a patch for mod_proxy
to the mod_perl mailing list that enables keep-alives on the browser side;
I've also written a small Apache Perl module that does the same thing.
Both also do store-and-forward on the request body (POST data), which
addresses an issue you raised in

http://perl.apache.org/guide/scenario.html#Buffering_Feature 
...
There is no buffering of data uploaded from the client browser to the proxy, 
thus you cannot use this technique to prevent the heavy mod_perl server from 
being tied up during a large POST such as a file upload. Falling back to mod_cgi 
seems to be the best solution for these specific scripts whose major function is 
receiving large amounts of upstream data. 
...


2) Apache::Request is better than your performance numbers indicate.

The problem I have with your comparison of Apache::args vs Apache::Request vs CGI
is that your benchmark code isn't fair.  You're comparing method calls against
hash-table lookups, which is apples and oranges.  To get more representative
numbers, try the following code instead:

             processing_with_apache_request.pl
             ---------------------------------
             use strict;
             use Apache::Request ();
             my $r = shift;                    # Apache request object
             my $q = Apache::Request->new($r);
             $r->send_http_header('text/plain');
             my $args = $q->param; # hash ref
             print join "\n", map { "$_ => $$args{$_}" } keys %$args;

and similarly for CGI; one possible version is sketched below.  The numbers
you get should more accurately reflect the performance of each.
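
For instance, using CGI.pm's documented Vars method (a sketch only, not
benchmarked here; multi-valued params come back "\0"-joined):

             processing_with_cgi_pm.pl
             -------------------------
             use strict;
             use CGI ();
             my $q = CGI->new;
             print $q->header('text/plain');
             my $args = $q->Vars;  # hash ref; multiple values "\0"-joined
             print join "\n", map { "$_ => $$args{$_}" } keys %$args;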

HTH
-- 
Joe Schaefer
joe@sunstarsys.com

SunStar Systems, Inc.



Re: mod_perl guide corrections

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Doug MacEachern <do...@covalent.net> writes:

> >              my $args = $q->param; # hash ref
> 
> You mean parms()?  The Apache::Request::parms hash ref is tied, so
> there are still method calls, but fewer than with params(), which does
> extra work to emulate CGI::params.

I just looked at line 21 of Request.pm; it looks like $q->param() returns
the same thing as $q->parms does, but surely calling $q->parms directly is even better!
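
Assuming parms() really does hand back the underlying Apache::Table,
something like this (untested) sketch would skip the CGI-emulation
layer entirely:

    use strict;
    use Apache::Request ();
    my $r = shift;
    my $q = Apache::Request->new($r);
    $r->send_http_header('text/plain');
    my $tab = $q->parms;                             # Apache::Table object
    $tab->do(sub { print "$_[0] => $_[1]\n"; 1 });   # iterate all pairs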

> parms() is going to be renamed (to something less like params()) and 
> documented as faster than using the params() wrapper, in the next release.

A new libapreq release? Great news! Here's YAP (yet another patch) for libapreq:
I added Dave Mitchell's memmem in multipart_buffer.c for better portability, and
made some minor changes to apache_request.c to eliminate some unnecessary copying.
I'd be glad to send you a URL to a production server if you'd like to see it in action.

HTH, and thanks again.
-- 
Joe Schaefer
joe@sunstarsys.com

SunStar Systems, Inc.


Re: mod_perl guide corrections

Posted by Doug MacEachern <do...@covalent.net>.
On 14 Sep 2000, Joe Schaefer wrote:
 
> 2) Apache::Request is better than your performance numbers indicate.
> 
> The problem I have with your comparison of Apache::args vs Apache::Request vs CGI
> is that your benchmark code isn't fair.  You're comparing method calls against
> hash-table lookups, which is apples and oranges.  To get more representative
> numbers, try the following code instead:

>              my $args = $q->param; # hash ref

You mean parms()?  The Apache::Request::parms hash ref is tied, so
there are still method calls, but fewer than with params(), which does
extra work to emulate CGI::params.  parms() is going to be
renamed (to something less like params()) and documented as faster than
using the params() wrapper, in the next release.


Re: patches to mod_proxy (was: Re: mod_perl guide corrections)

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Roger Espel Llima <es...@iagora.net> writes:

> On Tue, Sep 19, 2000 at 03:24:50PM -0400, Joe Schaefer wrote:
> > On Linux, the ext2 filesystem is VERY efficient at buffering filesystem
> > writes (see http://www.tux.org/lkml/#s9-12).  If the POST data is small
> > (I don't know what the default size is, but the FILE buffer for the tmpfile
> > is adjustable with setvbuf) it's never written to disk.  AFAIK, the only
> > problem with this arrangement for small POSTs is the extra file descriptor
> > consumed by the Apache process.
> 
> Yeah, I know it's fairly negligible, but I'm not sure the FILE buffer is
> the one that matters here.  If I fwrite(), rewind() and then fread()
> again, AFAIK libc's stdio still translates this into real kernel
> write(), lseek(), read() [strace would be the final judge here].  From
> there, the kernel can be smart enough not to actually touch the
> disk, but that doesn't work with e.g. journaling filesystems, which impose
> stronger sequential conditions on disk writes, or systems like BSD that
> do synchronous metadata updates.  And in any case, you're still doing
> extra memory copies to and from kernel space.
> 
> If it were hard to do otherwise I'd agree with you, but it sounds so
> simple to keep it in a memory buffer when it's under 16k or some similar
> limit that I just think it's much more "obviously right" to do it that
> way.

Sounds good -- thanks for the details.  How about making the threshold
(for switching from a memory buffer to a tmpfile) configurable?

Any thoughts on what the directive should be called?

-- 
Joe Schaefer
joe@sunstarsys.com

SunStar Systems, Inc.

Re: patches to mod_proxy (was: Re: mod_perl guide corrections)

Posted by Roger Espel Llima <es...@iagora.net>.
On Tue, Sep 19, 2000 at 03:24:50PM -0400, Joe Schaefer wrote:
> On Linux, the ext2 filesystem is VERY efficient at buffering filesystem
> writes (see http://www.tux.org/lkml/#s9-12).  If the POST data is small
> (I don't know what the default size is, but the FILE buffer for the tmpfile
> is adjustable with setvbuf) it's never written to disk.  AFAIK, the only
> problem with this arrangement for small POSTs is the extra file descriptor
> consumed by the Apache process.

Yeah, I know it's fairly negligible, but I'm not sure the FILE buffer is
the one that matters here.  If I fwrite(), rewind() and then fread()
again, AFAIK libc's stdio still translates this into real kernel
write(), lseek(), read() [strace would be the final judge here].  From
there, the kernel can be smart enough not to actually touch the
disk, but that doesn't work with e.g. journaling filesystems, which impose
stronger sequential conditions on disk writes, or systems like BSD that
do synchronous metadata updates.  And in any case, you're still doing
extra memory copies to and from kernel space.

If it were hard to do otherwise I'd agree with you, but it sounds so
simple to keep it in a memory buffer when it's under 16k or some similar
limit that I just think it's much more "obviously right" to do it that
way.
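
Something along these lines would do it in Perl terms (a sketch only --
the 16k threshold and the helper name are made up here, and in-memory
filehandles need a sufficiently recent perl):

    use strict;
    use File::Temp qw(tempfile);

    my $limit = 16 * 1024;    # hypothetical spill threshold

    # Return a seekable filehandle for spooling a request body:
    # small bodies stay in a scalar, big ones go to an unlinked temp file.
    sub body_fh {
        my ($content_length) = @_;
        if (defined $content_length && $content_length <= $limit) {
            open my $fh, '+>', \my $buf or die "in-memory open: $!";
            return $fh;
        }
        my ($fh) = tempfile(UNLINK => 1);
        return $fh;
    }

    # usage: spool the body, rewind, then forward from the handle
    my $fh = body_fh(12_000);
    print {$fh} "a=1&b=2";
    seek $fh, 0, 0 or die "seek: $!";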

-- 
Roger Espel Llima, espel@iagora.net
http://www.iagora.com/~espel/index.html

Re: patches to mod_proxy (was: Re: mod_perl guide corrections)

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Roger Espel Llima <es...@iagora.net> writes:

> > The patch makes mod_proxy buffer the POST data in a temp file
> > by setting the (new) ProxyPostMax directive to a positive number.
> > If the Content-Length header supplied by Z is greater than this
> > number, mod_proxy rejects the POST request.
> 
> Why a temp file?  Maybe I'm picky about this, but I don't like
> programs writing to temp files and re-reading them for no particular
> reason.  Since you're limiting the size anyway, why not just make it a
> memory buffer?  Or you could write to a temp file only when it's greater
> than some constant (say, 16k), which would let most of your POSTs go
> without touching the filesystem.

On Linux, the ext2 filesystem is VERY efficient at buffering filesystem
writes (see http://www.tux.org/lkml/#s9-12).  If the POST data is small
(I don't know what the default size is, but the FILE buffer for the tmpfile
is adjustable with setvbuf) it's never written to disk.  AFAIK, the only
problem with this arrangement for small POSTs is the extra file descriptor
consumed by the Apache process.

It might also be a good idea to add a config setting for the FILE buffer
size, similar to ProxyReceiveBufferSize -- roughly like the sketch below.
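
For instance (hypothetical httpd.conf excerpt: of these, only
ProxyReceiveBufferSize exists in stock mod_proxy, ProxyPostMax comes from
the patch discussed in this thread, and ProxyPostBufferSize is just an
invented name for the proposed knob):

    # from the patch: reject POSTs bigger than this, buffer smaller ones
    ProxyPostMax           1048576
    # invented here: stdio FILE buffer size for the tmpfile
    ProxyPostBufferSize    16384
    # existing directive: network receive buffer
    ProxyReceiveBufferSize 8192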

-- 
Joe Schaefer
joe@sunstarsys.com

SunStar Systems, Inc.

Re: patches to mod_proxy (was: Re: mod_perl guide corrections)

Posted by Roger Espel Llima <es...@iagora.net>.
Joe Schaefer wrote:
> 1) Z requests a dynamic page from A.
> 
> Z -GET 1.1-> A -PROXY-> B -PROXY-> A -CLOSE-> Z
> 
> The current mod_proxy CLOSES the connection from A to Z,
> even if Z requests keepalives and A implements them.  This
> is bad since subsequent requests for static content (images, stylesheets, etc.)
> will require a new connection.
> 
> The patch should prevent mod_proxy from forcibly closing the
> A-Z connection.

Sounds good to me.  Does anyone know just why mod_proxy forcibly closes
it by default?  It sounds to me like it would have to have
explicit code to forcibly close it; otherwise it would be using Apache's
generic mechanisms, which handle keep-alives...

> 2) Z posts form data that will ultimately be handled by B.
> 
> Z -POST-> A -PROXY-> B
> 
> Currently, mod_proxy opens the connection to B as soon as it
> determines B is the ultimate destination.  As the POST data
> is read from Z to A, it is passed along directly to B.  This
> will tie up both A and B if the A-Z connection is slow and/or
> the POST data is huge.
> 
> The patch makes mod_proxy buffer the POST data in a temp file
> by setting the (new) ProxyPostMax directive to a positive number.
> If the Content-Length header supplied by Z is greater than this
> number, mod_proxy rejects the POST request.

Why a temp file?  Maybe I'm picky about this, but I don't like
programs writing to temp files and re-reading them for no particular
reason.  Since you're limiting the size anyway, why not just make it a
memory buffer?  Or you could write to a temp file only when it's greater
than some constant (say, 16k), which would let most of your POSTs go
without touching the filesystem.

-- 
Roger Espel Llima, espel@iagora.net
http://www.iagora.com/~espel/index.html

Re: mod_perl guide corrections

Posted by Joe Schaefer <jo...@sunstarsys.com>.
<te...@blackhole.acon.nl> writes:

> What if you wanted the functionality of the phase handlers before and after
> the upload of the file?
>
> Could this also be accomplished by proper use of configuration statements
> in httpd.conf?  Right now I do not think so, so I take it for granted
> that the child is tied up for the duration of the upload.
> 

I'm not quite sure what you're driving at.  Let me see if I can
describe how things work now, and what I'm trying to accomplish with the
patch.

Setup: 
       A = mod_proxy enabled front-end server; 
               keepalives enabled 
               delivers static content (images, stylesheets, etc.)
               proxies dynamic content 

       B = mod_perl server; responsible for dynamic content; 
               keepalives disabled
        
       Z = browser 
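
In httpd.conf terms the setup might look roughly like this (a sketch;
ports and paths are invented for illustration):

    # A: front-end server (httpd.conf excerpt)
    Port 80
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15
    ProxyPass        /perl/ http://127.0.0.1:8080/perl/
    ProxyPassReverse /perl/ http://127.0.0.1:8080/perl/

    # B: back-end mod_perl server (httpd.conf excerpt)
    Port 8080
    KeepAlive Off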

Event:
1) Z requests a dynamic page from A.

Z -GET 1.1-> A -PROXY-> B -PROXY-> A -CLOSE-> Z

The current mod_proxy CLOSES the connection from A to Z,
even if Z requests keepalives and A implements them.  This
is bad since subsequent requests for static content (images, stylesheets, etc.)
will require a new connection.

The patch should prevent mod_proxy from forcibly closing the
A-Z connection.


2) Z posts form data that will ultimately be handled by B.

Z -POST-> A -PROXY-> B

Currently, mod_proxy opens the connection to B as soon as it
determines B is the ultimate destination.  As the POST data
is read from Z to A, it is passed along directly to B.  This
will tie up both A and B if the A-Z connection is slow and/or
the POST data is huge.

The patch makes mod_proxy buffer the POST data in a temp file
by setting the (new) ProxyPostMax directive to a positive number.
If the Content-Length header supplied by Z is greater than this
number, mod_proxy rejects the POST request.

Once the POST data has been uploaded from Z to A, the patched
mod_proxy opens the connection to B and delivers the POST data
directly from the temp file.
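
The same store-and-forward idea can be sketched as a mod_perl handler.
This is NOT the patch or my module, just an illustration: the package
name, backend address, and size limit are invented, and a real version
would stream from the file rather than slurp it.

    package My::BufferedPost;   # hypothetical; PerlHandler My::BufferedPost
    use strict;
    use Apache::Constants qw(OK);
    use File::Temp qw(tempfile);
    use LWP::UserAgent ();
    use HTTP::Request ();

    my $MAX = 1024 * 1024;      # plays the role of ProxyPostMax

    sub handler {
        my $r   = shift;
        my $len = $r->header_in('Content-Length') || 0;
        return 413 if $len > $MAX;   # Request Entity Too Large

        # store: spool the body from the (possibly slow) client first
        my ($fh) = tempfile(UNLINK => 1);
        my $left = $len;
        while ($left > 0) {
            my $n = $r->read(my $buf, $left < 8192 ? $left : 8192);
            last unless $n;
            print {$fh} $buf;
            $left -= $n;
        }
        seek $fh, 0, 0;

        # forward: replay the body to B in one quick burst
        my $body = do { local $/; <$fh> };
        my $res  = LWP::UserAgent->new->request(HTTP::Request->new(
            POST => 'http://127.0.0.1:8080' . $r->uri,   # invented backend
            [ 'Content-Type' => $r->header_in('Content-Type')
                                || 'application/x-www-form-urlencoded' ],
            $body,
        ));
        $r->send_http_header($res->header('Content-Type') || 'text/html');
        $r->print($res->content);
        return OK;
    }
    1;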

That's what I'm trying to accomplish with the mod_proxy patch.
I've done only minimal testing on HTTP requests; HTTPS is NOT
implemented at all.

I need something like this, since I use mod_perl for
authenticating "POSTers": in my case the POST data must
be processed by the mod_perl server.

Any help/suggestions are welcome and appreciated!

-- 
Joe Schaefer
joe@sunstarsys.com

SunStar Systems, Inc.

Re: mod_perl guide corrections

Posted by te...@blackhole.acon.nl.

On 14 Sep 2000, Joe Schaefer wrote:

> Stas,
> 
> 
> http://perl.apache.org/guide/scenario.html#Buffering_Feature 
> ...
> There is no buffering of data uploaded from the client browser to the proxy, 
> thus you cannot use this technique to prevent the heavy mod_perl server from 
> being tied up during a large POST such as a file upload. Falling back to mod_cgi 
> seems to be the best solution for these specific scripts whose major function is 
> receiving large amounts of upstream data. 
> ...


What if you wanted the functionality of the phase handlers before and after
the upload of the file?

Could this also be accomplished by proper use of configuration statements
in httpd.conf?  Right now I do not think so, so I take it for granted
that the child is tied up for the duration of the upload.

Of course I have been mistaken several other times.

Arnold