
Posted to dev@httpd.apache.org by Dirk-Willem van Gulik <di...@elect6.jrc.it> on 1997/11/17 15:27:37 UTC

*sql driver

OK, see the attached mod_auth_sql, which can now cope
with MySQL, mSQL 1, mSQL 2 and Postgres. I have trouble
seeing how I can add the Sybase and Oracle APIs.

I've got them to work, but the text of the licence is not very
reassuring. Does anyone have specific ideas about this? Or
should one in the long run aim for ODBC / DBI?

Any important databases I've missed ?

Also what do people feel about the feature that one can
now return _all_ the fields from a 

	select * from users where uname="someone"

rather than

	select passwd from users where uname="someone"

And have those extra fields (save the passwd) passed on
to CGI/SSI as env vars?
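
A minimal sketch of that idea, not the code of the attached module. It
assumes Python's sqlite3 as a stand-in for the real database APIs; the
helper name auth_row_to_env, the AUTH_ prefix and the dept column are
made up for illustration. It binds the user name as a parameter rather
than pasting it into the query text.

```python
import sqlite3


def auth_row_to_env(conn, uname, secret_column="passwd", prefix="AUTH_"):
    """Fetch the user's row and map every column except the password
    to an environment-style name (hypothetical naming scheme)."""
    cur = conn.execute("SELECT * FROM users WHERE uname = ?", (uname,))
    row = cur.fetchone()
    if row is None:
        return None  # unknown user: nothing to pass on
    columns = [d[0] for d in cur.description]
    return {
        prefix + name.upper(): str(value)
        for name, value in zip(columns, row)
        if name != secret_column  # never export the password
    }


# Tiny in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uname TEXT, passwd TEXT, dept TEXT)")
conn.execute("INSERT INTO users VALUES ('someone', 'x1y2', 'physics')")
env = auth_row_to_env(conn, "someone")
```

The resulting dictionary is what a module would merge into the CGI/SSI
environment after a successful authentication.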

Would it be something to aim for, to have a generic SQL module included
in version 2.0, or should one keep all those *auth* modules
safely at arm's length?

Dw.

Re: *sql driver

Posted by Martin Kraemer <Ma...@mch.sni.de>.

On Mon, Nov 17, 1997 at 03:27:37PM +0100, Dirk-Willem van Gulik wrote:
> 
> OK, see the attached mod_auth_sql, which can now cope
> with MySQL, mSQL 1, mSQL 2 and Postgres. I have trouble
> seeing how I can add the Sybase and Oracle APIs.
...
> Dw.

Content-Description: generic SQL module
> /tmp: no free space left on device
> 

Please re-post.

    Martin
-- 
| S I E M E N S |  <Ma...@mch.sni.de>  |      Siemens Nixdorf
| ------------- |   Voice: +49-89-636-46021     |  Informationssysteme AG
| N I X D O R F |   FAX:   +49-89-636-44994     |   81730 Munich, Germany
~~~~~~~~~~~~~~~~My opinions only, of course; pgp key available on request

Re: Want to add file caching to Apache

Posted by Martin Kraemer <Ma...@mch.sni.de>.

On Wed, Nov 19, 1997 at 03:13:01PM -0600, Igor Tatarinov wrote:
> ... But I would like to get more opinions on it. So
> please let me know what you think.

*   It would be desirable to add HTTP/1.1 cache control mechanisms for
    expiration and verification into such a cache. That would of course
    require LRU use and cache file expiration and reusing "cache holes".

*   The OS would impose a slightly higher "penalty" on cache items at the
    end of the cache file, since the double indirect / triple indirect
    block lookup of traditional *nix systems incurs a higher overhead
    there.

*   In theory, and with a mostly-in-memory hash lookup cache, this would
    of course yield an impressive performance gain for documents (when
    the expiration need not be checked against an upstream proxy),
    and no copying has to be done at all either: the document
    can be served right out of the mmapped cache file.

*   This could eventually replace the (rather slow) mod_proxy cache.

    Martin
-- 
| S I E M E N S |  <Ma...@mch.sni.de>  |      Siemens Nixdorf
| ------------- |   Voice: +49-89-636-46021     |  Informationssysteme AG
| N I X D O R F |   FAX:   +49-89-636-44994     |   81730 Munich, Germany
~~~~~~~~~~~~~~~~My opinions only, of course; pgp key available on request

Re: Want to add file caching to Apache

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.

Ben Laurie wrote:
> 
> Igor Tatarinov wrote:
> > It is relatively easy to get a high hit ratio (>80%) in a Web server
> > cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> > you don't need to read it entirely, just look at the graphs)
> 
> This does not agree with a study Digital did (admittedly that was of
> proxy caches). If I remember correctly (and it's entirely possible I
> don't), they got < 40% hits.

Right! But that was about Web proxies!

igor
 
> Cheers,
> 
> Ben.
> 
> --
> Ben Laurie            |Phone: +44 (181) 735 0686|Apache Group member
> Freelance Consultant  |Fax:   +44 (181) 735 0689|http://www.apache.org
> and Technical Director|Email: ben@algroup.co.uk |Apache-SSL author
> A.L. Digital Ltd,     |http://www.algroup.co.uk/Apache-SSL
> London, England.      |"Apache: TDG" http://www.ora.com/catalog/apache

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Igor Tatarinov, graduate student, Computer Science Dept, NDSU
e-mail: tatarino@prairie.nodak.edu   or   itat@acm.org
http://www.cs.ndsu.nodak.edu/~tatarino
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: Want to add file caching to Apache

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.

Marc Slemko wrote:
> 
> On Thu, 20 Nov 1997, Ben Laurie wrote:
> 
> > Igor Tatarinov wrote:
> > > It is relatively easy to get a high hit ratio (>80%) in a Web server
> > > cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> > > you don't need to read it entirely, just look at the graphs)
> >
> > This does not agree with a study Digital did (admittedly that was of
> > proxy caches). If I remember correctly (and it's entirely possible I
> > don't), they got < 40% hits.
> 
> I think that is a critical difference.
> 
> Consider your 2 meg porn site that gets a million hits a day.  A cache
> would certainly help that.
> 
> Consider your site that is a frontend to a multigigabyte database, where
> queries are spread reasonably evenly across the whole thing.  A cache
> would help that considerably less.
> 
> I would suggest it would be worthwhile modelling the hit rates using a
> simulation based on logfiles.  While you will have to fudge a few numbers
> (unless you log in a nonstandard form), it should be reasonable.

That's what I've been doing for almost a year:
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/cache-policies.ps
http://www.cs.ndsu.nodak.edu/~tatarino/pubs/perf-analysis.ps

Log-based simulation has certain problems though. One small problem is
that it is impossible to check whether the requested file is allowed
to be cached. A related question: is there any way to
check that in the Apache request handler? I know that there is the
no_cache field in request_rec but is it really used?
I would guess that files with SSI (and some CGI outputs) shouldn't be
cached at all but is there a simple way to check for that?

The main problem with simulation is that you never know how difficult
it will be to implement what you are suggesting. This wouldn't be a
problem on a single-process, async I/O-based server (which does not
scale across multiple CPUs) but Apache is not of that ilk.
Because of the above, I will first try to implement a simple policy
like lru+threshold instead of something really smart. As a result,
files larger than the threshold will never be cached. Ideally, the
threshold should be tuned automatically.

igor

Re: Want to add file caching to Apache

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.

Rob Hartill wrote:
> 
> 
> serving static content is pretty trivial and has almost no overhead
> on reasonable hardware.
> 
> I've got a 586 Pentium 166mhz doing >20 static file requests per second
> with 90% idle cpu and a load of ~0.2 . It also acts as a mailing list
> server and nameserver when it gets bored.
> 
> I'm not knocking the idea of caching but do wonder if it's a solution
> to a relatively unimportant problem. Are there better bottlenecks to
> take a shot at widening ?

A related question: is it possible to cache CGI script output?
I've heard that the Squid guys check for ? in the file name to find out
if the file was produced by a CGI script. Of course, they also check
the response header but it's not always set up correctly. Anyway,
sometimes they may cache CGI scripts.

It is probably impossible to cache POST responses but with GETs there
should be no problem.
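
The rule of thumb above (GET only, and no ? in the name) can be
sketched as follows. This is only the heuristic as described in this
message, not Squid's actual logic; the function name looks_cacheable is
made up, and a real cache would also have to look at the response
headers.

```python
def looks_cacheable(method, url):
    """Rough guess: cache only GET requests whose URL has no '?'."""
    if method.upper() != "GET":
        return False  # POST (and other) responses are not cached
    # A '?' suggests a query string, i.e. probably script output.
    return "?" not in url
```

For example, looks_cacheable("GET", "/index.html") is true, while a GET
for "/cgi-bin/search?q=apache" or any POST is rejected.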

igor

Re: Want to add file caching to Apache

Posted by Rob Hartill <ro...@imdb.com>.

On Thu, 20 Nov 1997, Marc Slemko wrote:

> On Thu, 20 Nov 1997, Ben Laurie wrote:
> 
> > Igor Tatarinov wrote:
> > > It is relatively easy to get a high hit ratio (>80%) in a Web server
> > > cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> > > you don't need to read it entirely, just look at the graphs)
> > 
> > This does not agree with a study Digital did (admittedly that was of
> > proxy caches). If I remember correctly (and it's entirely possible I
> > don't), they got < 40% hits.
> 
> I think that is a critical difference.
> 
> Consider your 2 meg porn site that gets a million hits a day.  A cache
> would certainly help that.

does it need help ?  :-)

serving static content is pretty trivial and has almost no overhead
on reasonable hardware.

I've got a 586 Pentium 166mhz doing >20 static file requests per second
with 90% idle cpu and a load of ~0.2 . It also acts as a mailing list
server and nameserver when it gets bored.

I'm not knocking the idea of caching but do wonder if it's a solution
to a relatively unimportant problem. Are there better bottlenecks to
take a shot at widening ?

--
Rob Hartill                              Internet Movie Database (Ltd)
http://www.moviedatabase.com/   .. a site for sore eyes.

Re: Want to add file caching to Apache

Posted by Marc Slemko <ma...@worldgate.com>.

On Thu, 20 Nov 1997, Ben Laurie wrote:

> Igor Tatarinov wrote:
> > It is relatively easy to get a high hit ratio (>80%) in a Web server
> > cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> > you don't need to read it entirely, just look at the graphs)
> 
> This does not agree with a study Digital did (admittedly that was of
> proxy caches). If I remember correctly (and it's entirely possible I
> don't), they got < 40% hits.

I think that is a critical difference.

Consider your 2 meg porn site that gets a million hits a day.  A cache
would certainly help that.

Consider your site that is a frontend to a multigigabyte database, where
queries are spread reasonably evenly across the whole thing.  A cache
would help that considerably less.

I would suggest it would be worthwhile modelling the hit rates using a
simulation based on logfiles.  While you will have to fudge a few numbers
(unless you log in a nonstandard form), it should be reasonable.
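
A minimal sketch of such a log-driven simulation, under loud
assumptions: the access log has already been parsed into (path, size)
pairs, every response is cacheable, a file's size never changes, and
the cache is a plain byte-limited LRU. Those are exactly the kind of
fudged numbers mentioned above. The function name simulate_hit_ratio is
made up for illustration.

```python
from collections import OrderedDict


def simulate_hit_ratio(requests, capacity_bytes):
    """Replay (path, size) pairs through a byte-limited LRU cache
    and return hits / total requests."""
    cache = OrderedDict()  # path -> size, least recently used first
    used = 0
    hits = 0
    total = 0
    for path, size in requests:
        total += 1
        if path in cache:
            hits += 1
            cache.move_to_end(path)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # can never fit, so it is never cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # drop the LRU entry
            used -= evicted_size
        cache[path] = size
        used += size
    return hits / total if total else 0.0


# Hypothetical trace: six requests for three one-byte documents.
trace = [("/a", 1), ("/b", 1), ("/a", 1), ("/c", 1), ("/b", 1), ("/a", 1)]
small = simulate_hit_ratio(trace, 2)
large = simulate_hit_ratio(trace, 3)
```

Running the same trace against several cache sizes shows how quickly
the hit ratio levels off for a given site.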

Re: Want to add file caching to Apache

Posted by Ben Laurie <be...@algroup.co.uk>.

Igor Tatarinov wrote:
> It is relatively easy to get a high hit ratio (>80%) in a Web server
> cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> you don't need to read it entirely, just look at the graphs)

This does not agree with a study Digital did (admittedly that was of
proxy caches). If I remember correctly (and it's entirely possible I
don't), they got < 40% hits.

Cheers,

Ben.

-- 
Ben Laurie            |Phone: +44 (181) 735 0686|Apache Group member
Freelance Consultant  |Fax:   +44 (181) 735 0689|http://www.apache.org
and Technical Director|Email: ben@algroup.co.uk |Apache-SSL author
A.L. Digital Ltd,     |http://www.algroup.co.uk/Apache-SSL
London, England.      |"Apache: TDG" http://www.ora.com/catalog/apache

Re: Want to add file caching to Apache

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 19 Nov 1997, Igor Tatarinov wrote:

> The main benefits are that
> + in 80% (hit rate) of cases we don't need to open, mmap (or copy), and close
> the file.
> + file system cache policy is not good (a large requested file may flush the
> entire cache)

I disagree with the latter claim.  It's true if you serialize requests
against the filesystem, but in reality small documents are being requested
all the time while a large document is being served.  So those small
documents are not going to be knocked out of the fs cache.  If the large
file is really busy then perhaps it needs to be in the cache anyhow, and
the webmaster will have to deal with the #1 hardware requirement of
webservers:  RAM.

Of course my claim is useless unless I back it up with data :) 

Dean

Re: Want to add file caching to Apache

Posted by Dirk-Willem van Gulik <di...@elect6.jrc.it>.

On Wed, 19 Nov 1997, Igor Tatarinov wrote:

We've been playing with something similar. The early conclusion was
to use a _lossy_ pipe after a document was served, i.e. no locking
or anything fancy, just pushing it down and hoping the thing would
arrive in one piece before the next request for the child comes
in. But it is a thorny issue, well worth wasting real PhD cycles on :-)
(Hey, if you are looking at working in sunny Italy :-)

> First, let me explain what I mean by a file cache. It is not reusing
> mmapped files. Instead, I am suggesting allocating a shared memory segment
> (1-64M, maybe even larger) that would store copies of frequently requested
> files. It is relatively easy to get a high hit ratio (>80%) in a Web server
> cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
> you don't need to read it entirely, just look at the graphs)
> 
> The main benefits are that
> + in 80% (hit rate) of cases we don't need to open, mmap (or copy), and close
> the file.
> + file system cache policy is not good (a large requested file may flush the
> entire cache)
> 
> Let me emphasize that we don't have to cache _all_ requested files. A smart
> cache policy would instead admit only smaller, more popular files.
> Unfortunately, that may be hard to implement. But there is a simple
> policy LRU+threshold that doesn't cache large files and performs pretty well.
> 
> The possible problems are:
> - whenever we cache a document, an extra memcopy has to be done
> this is true but it needs to be done only on a cache miss (20%, actually
> less than that since not all misses require caching the requested file)
> - additional synchronization overhead
> that's true, but I think the benefits will outweigh this. The real problem is
> that cheap inter-process synch mechanisms are only available in Solaris (once
> Apache becomes multithreaded, synch will be simpler)
> 
> Finally, let me mention the killer idea (presented in the paper referenced
> above). I know most of you will not like it but I still believe that it's nice
> and useful. The paper talks about static caching, that is filling the cache
> once a day (week, etc.) and not replacing anything till the end of the day. As
> odd as it may look, this policy often performs better than anything else. Its
> main advantage is that there is absolutely no extra overhead during the entire
> day. Refilling a cache should take no more than a minute, so it shouldn't be a
> problem.
> One may argue that in this policy, new documents are not cached until the end
> of the day. That's true but the number of sites that continuously create new
> pages (ala CNN) is really small.
> 
> Well, seems like I've said too much already. 
> 
> Thanks for reading
> Waiting for feedback
> igor
>

Re: Want to add file caching to Apache

Posted by Brian Behlendorf <br...@organic.com>.

At 03:13 PM 11/19/97 -0600, Igor Tatarinov wrote:
>Finally, let me mention the killer idea (presented in the paper referenced
>above). I know most of you will not like it but I still believe that it's nice
>and useful. The paper talks about static caching, that is filling the cache
>once a day (week, etc.) and not replacing anything till the end of the day. As
>odd as it may look, this policy often performs better than anything else. Its
>main advantage is that there is absolutely no extra overhead during the entire
>day. Refilling a cache should take no more than a minute, so it shouldn't be a
>problem.
>One may argue that in this policy, new documents are not cached until the end
>of the day. That's true but the number of sites that continuously create new
>pages (ala CNN) is really small.

I know of at least one company that does this today.  In their system it's
not only the reading from the filesystem which is cached, but also (since
their "pages" are dynamic templates, a la mod_include) they cache the parse
tree.  But their web site is very very focused, and still they make
reasonable use of a gigabyte of RAM on their front-end machines.

There are a couple models which could be provided:

1) an LRU dynamic cache where documents are always checked for freshness
before being served
2) an LRU dynamic cache where documents are only checked once a [time
period] for freshness, through some batch out-of-band process
3) a static cache where all documents being served are loaded into memory.
4) a static cache where a particular set of documents is pre-loaded.  This
set could be determined by a better algorithm than LRU, looking at, say,
the previous days' trend.
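
As a rough sketch of model 4, one way to pick the pre-loaded set is to
count the previous day's requests and take the most requested documents
until the memory budget is spent. This is an assumption for
illustration, not the selection rule of any particular paper or
product; the name choose_static_set and the (path, size) input format
are made up.

```python
from collections import Counter


def choose_static_set(requests, capacity_bytes):
    """Pick paths to pre-load, most requested first, skipping any
    document that no longer fits in the remaining byte budget."""
    counts = Counter(path for path, _ in requests)
    sizes = {path: size for path, size in requests}
    chosen = []
    used = 0
    # Descending request count; ties broken by path so the result is stable.
    for path, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if used + sizes[path] <= capacity_bytes:
            chosen.append(path)
            used += sizes[path]
    return chosen


# Hypothetical "yesterday" trace of (path, size in bytes).
yesterday = [("/a", 3), ("/b", 2), ("/a", 3), ("/c", 1), ("/a", 3), ("/b", 2)]
preload = choose_static_set(yesterday, 4)
```

Ranking by requests per byte instead of raw request count would favour
small popular files, which may suit a tight memory budget better.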

Also, caching the parse tree for any template-based system is going to be a
big win; if you focused on that you'd probably be doing the world the
greatest good.  There's a lot of room for improvement in mod_include.

Good luck!

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"it's a big world, with lots of records to play." - sig   brian@organic.com

Want to add file caching to Apache

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.

Hi there,

I've already discussed my idea with Dean and he eventually agreed that
it makes a certain sense. But I would like to get more opinions on it. So
please let me know what you think.

I am suggesting and willing to implement a document (file) cache in Apache. (I
am a Ph.D. student and I am doing research in Web caching so
this could help my dissertation, if I ever write one :)

First, let me explain what I mean by a file cache. It is not reusing
mmapped files. Instead, I am suggesting allocating a shared memory segment
(1-64M, maybe even larger) that would store copies of frequently requested
files. It is relatively easy to get a high hit ratio (>80%) in a Web server
cache (see for example, http://www.cs.ndsu.nodak.edu/~tatarino/pubs/static.ps
you don't need to read it entirely, just look at the graphs)

The main benefits are that
+ in 80% (hit rate) of cases we don't need to open, mmap (or copy), and close
the file.
+ file system cache policy is not good (a large requested file may flush the
entire cache)

Let me emphasize that we don't have to cache _all_ requested files. A smart
cache policy would instead admit only smaller, more popular files.
Unfortunately, that may be hard to implement. But there is a simple
policy LRU+threshold that doesn't cache large files and performs pretty well.
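
A minimal single-process sketch of the LRU+threshold policy, to make
the admission rule concrete. It deliberately leaves out shared memory
and inter-process synchronization, and the class name, method names and
byte limits are all made up for illustration.

```python
from collections import OrderedDict


class LRUThresholdCache:
    """Byte-limited LRU cache that refuses documents above a size threshold."""

    def __init__(self, capacity_bytes, threshold_bytes):
        self.capacity = capacity_bytes
        self.threshold = threshold_bytes
        self.used = 0
        self.items = OrderedDict()  # path -> bytes, least recently used first

    def get(self, path):
        data = self.items.get(path)
        if data is not None:
            self.items.move_to_end(path)  # mark as most recently used
        return data

    def put(self, path, data):
        size = len(data)
        if size > self.threshold or size > self.capacity:
            return False  # large files are never admitted
        if path in self.items:
            self.used -= len(self.items.pop(path))  # replace the old copy
        while self.used + size > self.capacity:
            _, evicted = self.items.popitem(last=False)  # evict the LRU entry
            self.used -= len(evicted)
        self.items[path] = data
        self.used += size
        return True
```

On a miss the server would read the file as usual and call put(); on a
hit, get() returns the cached bytes without touching the filesystem.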

The possible problems are:
- whenever we cache a document, an extra memcopy has to be done
this is true but it needs to be done only on a cache miss (20%, actually
less than that since not all misses require caching the requested file)
- additional synchronization overhead
that's true, but I think the benefits will outweigh this. The real problem is
that cheap inter-process synch mechanisms are only available in Solaris (once
Apache becomes multithreaded, synch will be simpler)

Finally, let me mention the killer idea (presented in the paper referenced
above). I know most of you will not like it but I still believe that it's nice
and useful. The paper talks about static caching, that is filling the cache
once a day (week, etc.) and not replacing anything till the end of the day. As
odd as it may look, this policy often performs better than anything else. Its
main advantage is that there is absolutely no extra overhead during the entire
day. Refilling a cache should take no more than a minute, so it shouldn't be a
problem.
One may argue that in this policy, new documents are not cached until the end
of the day. That's true but the number of sites that continuously create new
pages (ala CNN) is really small.

Well, seems like I've said too much already. 

Thanks for reading
Waiting for feedback
igor

Re: *sql driver

Posted by Dirk-Willem van Gulik <di...@elect6.jrc.it>.

On Wed, 19 Nov 1997 rasmus@bellglobal.com wrote:

> Ok, but what I meant was that it wouldn't be that hard to make mod_auth_*sql
> keep the connection open.  There isn't much of a layer needed.  Then again,

It does now (and used to do with mod_auth_mysql and _msql when compiled
with the KEEP_OPEN flag). But this is a bit troublesome: for 15 Apache
children, you eat away 32 file descriptors and quite some resources on
both ends of the connection. Plus I am a bit unhappy with the serializing
of mSQL in particular, given that we are talking about reads only.

Dw.

Re: *sql driver

Posted by ra...@bellglobal.com.

> >That doesn't have to be true.  The way it is done in PHP3 is to optionally
> >keep the connection between the httpd and the sql engine open for the
> >life of the httpd process.  Most of the slowness from a keyed query is in
> >establishing the connection between the calling client and the SQL server.
> 
> Right.  I'd call PHP in this case the middleware layer :)

Ok, but what I meant was that it wouldn't be that hard to make mod_auth_*sql
keep the connection open.  There isn't much of a layer needed.  Then again,
the whole world could just start using mod_php3 or mod_perl to do all
authentication.  

-Rasmus

Re: *sql driver

Posted by Brian Behlendorf <br...@organic.com>.

At 08:33 AM 11/19/97 -0500, Rasmus Lerdorf wrote:
>> Would be interesting, but if we don't have some middleware layer caching
>> the responses to those queries it'll be fairly slow.
>
>That doesn't have to be true.  The way it is done in PHP3 is to optionally
>keep the connection between the httpd and the sql engine open for the
>life of the httpd process.  Most of the slowness from a keyed query is in
>establishing the connection between the calling client and the SQL server.

Right.  I'd call PHP in this case the middleware layer :)

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"it's a big world, with lots of records to play." - sig   brian@organic.com

Re: *sql driver

Posted by Dirk-Willem van Gulik <di...@elect6.jrc.it>.

On Wed, 19 Nov 1997, Rasmus Lerdorf wrote:

> > Would be interesting, but if we don't have some middleware layer caching
> > the responses to those queries it'll be fairly slow.
> 
> That doesn't have to be true.  The way it is done in PHP3 is to optionally
> keep the connection between the httpd and the sql engine open for the
> life of the httpd process.  Most of the slowness from a keyed query is in
> establishing the connection between the calling client and the SQL server.

Same for mSQL and MySQL if you compile it with the right flags. Also
do not forget that, for example, mSQL caches itself very effectively
and allows for a local device connection which is _fast_.

Dw.

Re: *sql driver

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.

> Would be interesting, but if we don't have some middleware layer caching
> the responses to those queries it'll be fairly slow.

That doesn't have to be true.  The way it is done in PHP3 is to optionally
keep the connection between the httpd and the sql engine open for the
life of the httpd process.  Most of the slowness from a keyed query is in
establishing the connection between the calling client and the SQL server.

-Rasmus
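
The keep-the-connection-open approach can be sketched like this. It is
not PHP3's or mod_auth_*sql's code: Python's sqlite3 stands in for the
SQL engine, and the names get_connection and user_exists are made up
for illustration.

```python
import sqlite3

_conn = None  # one connection per server process, opened lazily


def get_connection(path=":memory:"):
    """Return this process's single connection, opening it on first use."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(path)  # the connect cost is paid once
    return _conn


def user_exists(uname):
    """Keyed lookup that reuses the already open connection."""
    cur = get_connection().execute(
        "SELECT 1 FROM users WHERE uname = ?", (uname,)
    )
    return cur.fetchone() is not None


# Illustration only: a throwaway table in the in-memory database.
get_connection().execute("CREATE TABLE users (uname TEXT)")
get_connection().execute("INSERT INTO users VALUES ('someone')")
```

Every request served by the same process then reuses one connection, at
the price of one open descriptor per child for the life of the process.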

Re: *sql driver

Posted by Brian Behlendorf <br...@organic.com>.

At 03:27 PM 11/17/97 +0100, Dirk-Willem van Gulik wrote:
>Also what do people feel about the feature that one can
>now return _all_ the fields from a 
>
>	select * from users where uname="someone"
>
>rather than
>
>	select passwd from users where uname="someone"
>
>And have those extra fields (save the passwd) passed on
>to CGI/SSI as env vars?

Would be interesting, but if we don't have some middleware layer caching
the responses to those queries it'll be fairly slow.

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"it's a big world, with lots of records to play." - sig   brian@organic.com