Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1999/11/18 00:03:10 UTC

proxy caching is important

first off i'll ignore the case of using proxy caches for firewalls... that
is well served by squid, and has somewhat different requirements.

but the case of using HTTP proxying to hide other webservers (usually
serving dynamic content) behind a single url-space is really interesting.
it's actually just one case of the general problem of hiding any dynamic
content engine in the url-space.

much "dynamic" content is actually static, and could take advantage of
caching.  caching content is the same whether it gets to the server via
HTTP or some other protocol (such as IPC with another process running a
JVM or perl or php).

my personal opinion is that the caching function should be well integrated
with the core, and be generic enough to support multiple methods of
populating the cache.

in a sense, i envision apache as an "HTTP router".  it communicates with
the client, parses the url, consults its cache and serves from there if
available, otherwise it passes through to a backend.

notice that serving static content and serving from a cache are
essentially identical.  the only difference is the url->filename map.
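To make the "HTTP router" idea concrete, here is a minimal C sketch. It assumes the cache lives on disk under its own root directory, so the same url->filename mapping serves both static and cached content, and a miss falls through to a backend. The function names, the roots, and the list of cached filenames (standing in for a real existence check such as stat()) are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: serving static content and serving from a cache
 * differ only in which root the URL is mapped under. */

typedef enum { SERVE_FILE, PASS_TO_BACKEND } route_t;

/* Maps a URL path onto a filename under `root`. Returns 0 on success,
 * -1 if the result would not fit in `out`. */
static int url_to_filename(const char *root, const char *url,
                           char *out, size_t outlen)
{
    assert(root != NULL && url != NULL && out != NULL);
    int n = snprintf(out, outlen, "%s%s", root, url);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Stand-in for a real existence check on the cache filesystem. */
static int is_cached(const char *filename, const char *const *cached,
                     size_t ncached)
{
    for (size_t i = 0; i < ncached; i++)
        if (strcmp(cached[i], filename) == 0)
            return 1;
    return 0;
}

/* Decides how to answer a request; `filename` receives the mapped path. */
static route_t route(const char *cache_root, const char *url,
                     const char *const *cached, size_t ncached,
                     char *filename, size_t len)
{
    if (url_to_filename(cache_root, url, filename, len) == 0 &&
        is_cached(filename, cached, ncached))
        return SERVE_FILE;      /* same code path as static content */
    return PASS_TO_BACKEND;     /* proxy, CGI, JVM over IPC, ... */
}
```

A hit is served exactly like a static file; only the root passed to url_to_filename() changes.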

Dean



Re: proxy caching is important

Posted by Brian Behlendorf <br...@apache.org>.
On Wed, 17 Nov 1999, Dean Gaudet wrote:
> in a sense, i envision apache as an "HTTP router".  it communicates with
> the client, parses the url, consults its cache and serves from there if
> available, otherwise it passes through to a backend.

I've seen it this way for a long time; I see lots of groups trying to
figure out their own backend protocols as well.

	Brian




Re: proxy caching is important

Posted by Tony Finch <do...@dotat.at>.
Graham Leggett <mi...@sharp.fm> wrote:
>
>The problem of checking whether cached data has changed on disk can
>be handled the same way a normal forward proxy handles it - using data
>aging and shift-reload in the browser to force a cache refresh.

In the situation where we use reverse proxy caches (an almost-entirely
static content server) it's possible to do much better than the usual
"guess and hope we aren't too far wrong" approach to cache freshness.
We know about all the updates to the content on the server and we have
a mechanism for notifying the caches of changes, so when users alter
their pages they see the update immediately without having to fiddle
with shift-reload (which they find quite tricky).

The caching mechanism in Apache should aim for that level of service.
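A minimal C sketch of the notification approach. It assumes a fixed-size in-memory cache and some hook on the origin server that calls cache_notify_changed() whenever a page is edited; the hook and every name here are hypothetical. Entries never expire on their own, so there is no freshness guess. An edit simply evicts the entry, and the next request goes back to the origin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: freshness by explicit invalidation rather than
 * heuristic expiry. The origin notifies the cache when a URL changes. */

#define MAX_ENTRIES 64
#define MAX_URL     256
#define MAX_BODY    1024

struct entry {
    int  used;
    char url[MAX_URL];
    char body[MAX_BODY];
};

static struct entry cache[MAX_ENTRIES];

static struct entry *find(const char *url)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (cache[i].used && strcmp(cache[i].url, url) == 0)
            return &cache[i];
    return NULL;
}

/* Stores a response; returns 0 on success, -1 if full or too large. */
static int cache_store(const char *url, const char *body)
{
    assert(url != NULL && body != NULL);
    if (strlen(url) >= MAX_URL || strlen(body) >= MAX_BODY)
        return -1;
    struct entry *e = find(url);
    for (size_t i = 0; e == NULL && i < MAX_ENTRIES; i++)
        if (!cache[i].used)
            e = &cache[i];
    if (e == NULL)
        return -1;
    e->used = 1;
    strcpy(e->url, url);    /* lengths checked above */
    strcpy(e->body, body);
    return 0;
}

/* Returns the cached body, or NULL on a miss (caller goes to the origin). */
static const char *cache_lookup(const char *url)
{
    struct entry *e = find(url);
    return e ? e->body : NULL;
}

/* Called by the origin's change-notification hook when `url` is edited. */
static void cache_notify_changed(const char *url)
{
    struct entry *e = find(url);
    if (e != NULL)
        e->used = 0;
}
```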

Tony.
-- 
how to dot at

Re: proxy caching is important

Posted by Graham Leggett <mi...@sharp.fm>.
Ryan Bloom wrote:

> Okay, I gave my APR opinion, now for my Apache opinion.  (I love changing
> hats  :-)

:)

> I don't see a good way to do this without I/O layering, or without being
> able to determine which modules are loaded in the server.  I/O layering
> isn't coming until 2.1 at the earliest.  That is an upsetting but true
> fact.  It is just going to take too long to put it into 2.0, and we'll
> never release a beta.

This is why I would build the hashing functions into either APR or, if
that's not appropriate, the Apache core.

The putting-objects-into-cache function would then be built into the
other modules where needed and practical, knowing that without I/O
layering the full solution isn't here yet. For example, the module
responsible for shipping static content should be the first place to
build the cache into: what gets pumped to the browser also gets pumped
to the cache. These modules will use the hash table functions
implemented either in APR or in the core, but not in mod_cache (even
though it would be nice if they were).
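A sketch of the "pumped to the browser also gets pumped to the cache" step. It assumes the handler writes through a FILE * and that the collected copy is handed to the cache only once the response is complete. struct tee and its functions are hypothetical names, not an existing Apache API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: every chunk sent to the browser is also appended to
 * an in-memory copy that could be stored in the cache once the response is
 * complete. */

struct tee {
    FILE  *client;   /* connection to the browser (any FILE * here) */
    char  *copy;     /* growing copy destined for the cache */
    size_t len, cap;
    int    failed;   /* set if the copy could not be kept: don't cache it */
};

static void tee_init(struct tee *t, FILE *client)
{
    assert(t != NULL && client != NULL);
    t->client = client;
    t->copy = NULL;
    t->len = t->cap = 0;
    t->failed = 0;
}

/* Sends a chunk to the client and, if possible, to the cache copy.
 * Returns 0 if the client write succeeded, -1 otherwise. */
static int tee_write(struct tee *t, const char *buf, size_t n)
{
    if (n == 0)
        return 0;
    if (fwrite(buf, 1, n, t->client) != n)
        return -1;
    if (t->failed)
        return 0;
    if (t->len + n > t->cap) {
        size_t cap = t->cap ? t->cap : 256;
        while (cap < t->len + n)
            cap *= 2;
        char *p = realloc(t->copy, cap);
        if (p == NULL) {   /* out of memory: keep serving, skip caching */
            t->failed = 1;
            return 0;
        }
        t->copy = p;
        t->cap = cap;
    }
    memcpy(t->copy + t->len, buf, n);
    t->len += n;
    return 0;
}

static void tee_free(struct tee *t)
{
    free(t->copy);
    t->copy = NULL;
    t->len = t->cap = 0;
}
```

If keeping the copy fails, the client is still served. The response is simply not cached.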

The pulling-cached-objects-out-and-serving-them job will be done by a
piece of code called mod_cache. Its job is to decide if it's
appropriate to serve cached data (Pragma: no-cache stopping me
perhaps?), then to determine if there is cached data (no there isn't,
I'll get over it) and last of all to ship the data to the client if all
is well.
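A C sketch of those three steps. It assumes a toy request struct and uses a small table of cached objects in place of a real store; all names are hypothetical. It also treats Cache-Control: no-cache like Pragma: no-cache, since a browser reload may send either header. Matching uses a plain substring search, which simplifies real header parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of mod_cache's decision: allowed? cached? ship it. */

enum cache_decision { SERVE_FROM_CACHE, CACHE_MISS, CACHE_BYPASSED };

struct request {
    const char *url;
    const char *pragma;          /* Pragma header value, or NULL */
    const char *cache_control;   /* Cache-Control header value, or NULL */
};

struct cached { const char *url; const char *body; };

/* Step 1: may we answer from the cache at all? */
static int cache_allowed(const struct request *r)
{
    if (r->pragma && strstr(r->pragma, "no-cache"))
        return 0;
    if (r->cache_control && strstr(r->cache_control, "no-cache"))
        return 0;
    return 1;
}

/* Steps 2 and 3: look the URL up and, on a hit, return the body to ship. */
static enum cache_decision cache_handler(const struct request *r,
                                         const struct cached *tbl, size_t n,
                                         const char **body_out)
{
    assert(r != NULL && r->url != NULL && body_out != NULL);
    *body_out = NULL;
    if (!cache_allowed(r))
        return CACHE_BYPASSED;           /* fall through to normal handling */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].url, r->url) == 0) {
            *body_out = tbl[i].body;     /* ship this to the client */
            return SERVE_FROM_CACHE;
        }
    }
    return CACHE_MISS;                   /* generate the response normally */
}
```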

I get your point about the hashing function, so the underlying "common
code" will be a general hash table accessible to all of Apache-land
either via APR or via the Apache core, whichever. Putting stuff into the
cache will be a function of those areas of the code that have been
modified to support it: to start with, simple bits like the serving of
static content only, with more complex bits down the line. Finally, the
last third of the job will be handled by mod_cache as described above.
mod_cache would be only 33% of the solution, not 100%.

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight...

Re: proxy caching is important

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
Okay, I gave my APR opinion, now for my Apache opinion.  (I love changing
hats  :-)

> Throughout the other modules, content that is shipped to the browser is
> either given to the cache API via ap_cache_in() (if it's static and
> should be cached), or deleted from the cache via ap_cache_del() (if it's
> dynamic and shouldn't be cached, like CGIs).
> 
> The mod_cache module would then be responsible for deciding whether it
> should deliver the content from the cache using ap_cache_out().
> 
> The problem of checking whether cached data has changed on disk can
> be handled the same way a normal forward proxy handles it - using data
> aging and shift-reload in the browser to force a cache refresh.
> 
> Is this a good way to do this? What do people think?

I don't see a good way to do this without I/O layering, or without being
able to determine which modules are loaded in the server.  I/O layering
isn't coming until 2.1 at the earliest.  That is an upsetting but true
fact.  It is just going to take too long to put it into 2.0, and we'll
never release a beta.

Determining which modules are loaded is easy, and I know how to do it, but
it relies on hash functions, and I haven't had a lot of time recently.
This would allow modules to call ap_is_mod_loaded(char *), where the
argument is the name of the module minus the "mod_" prefix.  It would
return TRUE or FALSE.  This would allow mod_cgi to call mod_cache's APIs
without worrying about whether or not mod_cache is there: mod_cgi could
check for itself.  I would also expect modules to cache the result
themselves, not that this would be an expensive function.
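A sketch of how ap_is_mod_loaded() might work. It assumes the core records each module name (minus "mod_") in a small chained hash set at startup. register_module(), is_mod_loaded() and cache_module_present() are hypothetical names. The last one shows a module remembering the answer, which is only safe once all modules have registered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: a hash set of loaded module names. */

#define NBUCKETS 31

struct name_node {
    char *name;
    struct name_node *next;
};

static struct name_node *buckets[NBUCKETS];

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;                       /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Called once per module while loading; returns 0 on success. */
static int register_module(const char *name)
{
    assert(name != NULL);
    unsigned b = hash_name(name);
    struct name_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->name, name);
    n->next = buckets[b];
    buckets[b] = n;
    return 0;
}

/* Returns 1 (TRUE) if a module such as "cache" was registered, else 0. */
static int is_mod_loaded(const char *name)
{
    assert(name != NULL);
    for (struct name_node *n = buckets[hash_name(name)]; n; n = n->next)
        if (strcmp(n->name, name) == 0)
            return 1;
    return 0;
}

/* How a module like mod_cgi might remember the answer after startup. */
static int cache_module_present(void)
{
    static int checked = 0, present = 0;
    if (!checked) {
        present = is_mod_loaded("cache");
        checked = 1;
    }
    return present;
}
```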

Just some thoughts.

Ryan

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.	


Re: proxy caching is important

Posted by Graham Leggett <mi...@sharp.fm>.
Ryan Bloom wrote:

> Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
> ap_cache_(in|out|etc).  This is because I think apr needs to be more
> generalized than that.  What we really want/need, is a simple hash table.
> The contexts already let us put this into whatever scope we want to put it
> in.

That's pretty much what I was after, which was why I (vaguely) indicated
the "key" in brackets.

The "cache" or "hash table" library's job would be to store binary
objects in a combination of memory and disk, accessed through some kind
of key. That key could be anything: a URL (in our case), a string of
digits (in some-other-app-that-we-haven't-thought-of-yet), basically
any string.

The three functions I listed would be all that's needed - put an object
in the hash table, take one out of the hash table, and delete an object
from the hash table - plus perhaps a way to get info about an object in
the hash table (its size, mime type, whatever).
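A minimal C sketch of such a store. It assumes memory only (the disk tier is left out) and string keys. It offers put, get (which also serves as the "info" call, returning size and type alongside the bytes) and delete. The names and the fixed bucket count are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: binary objects keyed by an arbitrary string. */

#define TABLE_SIZE 101

struct obj {
    char  *key;
    void  *data;
    size_t size;
    char  *type;            /* e.g. a MIME type; opaque to the table */
    struct obj *next;
};

struct store { struct obj *slot[TABLE_SIZE]; };

static size_t slot_of(const char *key)
{
    size_t h = 5381;        /* djb2 string hash */
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h % TABLE_SIZE;
}

static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p) strcpy(p, s);
    return p;
}

/* Points at the link holding `key`, or at the chain's terminating NULL. */
static struct obj **find_link(struct store *s, const char *key)
{
    struct obj **pp = &s->slot[slot_of(key)];
    while (*pp && strcmp((*pp)->key, key) != 0)
        pp = &(*pp)->next;
    return pp;
}

static void free_obj(struct obj *o)
{
    free(o->key); free(o->data); free(o->type); free(o);
}

/* Delete: returns 1 if an object was removed, 0 if none existed. */
static int store_del(struct store *s, const char *key)
{
    struct obj **pp = find_link(s, key);
    if (*pp == NULL) return 0;
    struct obj *o = *pp;
    *pp = o->next;
    free_obj(o);
    return 1;
}

/* Put: copies the bytes and replaces any object already under `key`. */
static int store_put(struct store *s, const char *key,
                     const void *data, size_t size, const char *type)
{
    assert(s && key && (data || size == 0) && type);
    struct obj *o = calloc(1, sizeof *o);
    if (!o) return -1;
    o->key = dup_str(key);
    o->type = dup_str(type);
    o->data = malloc(size ? size : 1);
    if (!o->key || !o->type || !o->data) { free_obj(o); return -1; }
    if (size) memcpy(o->data, data, size);
    o->size = size;
    store_del(s, key);
    struct obj **head = &s->slot[slot_of(key)];
    o->next = *head;
    *head = o;
    return 0;
}

/* Get / info: returns the object (bytes, size, type) or NULL. */
static const struct obj *store_get(struct store *s, const char *key)
{
    return *find_link(s, key);
}
```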

> BTW, if the hash table functions aren't in APR already, they will be there
> soon.  I am having a hard time keeping up with everything right now. :-)

If the hash table you're thinking of is the same as the one I'm thinking
of, then I think half the problem will already be solved.

Regards,
Graham

Re: proxy caching is important

Posted by Dean Gaudet <dg...@arctic.org>.
+1 :)

Dean

On Thu, 18 Nov 1999, Greg Stein wrote:

> If that's your intent, then don't port it. You'll just drag out the
> problem. People get motivated when something doesn't work. :-)
> 
> -g
> 
> On Thu, 18 Nov 1999, Ryan Bloom wrote:
> > I only volunteered to port the proxy module so we would have a working
> > module while somebody else re-designed the thing.  I think re-designing
> > the proxy module is in everybody's best interest, but we need something
> > that works in the mean-time.
> > 
> > Ryan
> > 
> > On Thu, 18 Nov 1999, Greg Stein wrote:
> > > I volunteered to axe the proxy stuff. Not rewrite it :-)
> > > 
> > > It looks like Martin and Ryan have stepped up to at least port the thing
> > > to 2.0. That renders some of my rationale as moot (although I still think
> > > we could reduce our complexity, bug count, maintenance by axing some/all
> > > of the proxy support).
> > > 
> > > Cheers,
> > > -g
> 
> -- 
> Greg Stein, http://www.lyra.org/
> 
> 


Re: proxy caching is important

Posted by Greg Stein <gs...@lyra.org>.
If that's your intent, then don't port it. You'll just drag out the
problem. People get motivated when something doesn't work. :-)

-g

On Thu, 18 Nov 1999, Ryan Bloom wrote:
> I only volunteered to port the proxy module so we would have a working
> module while somebody else re-designed the thing.  I think re-designing
> the proxy module is in everybody's best interest, but we need something
> that works in the mean-time.
> 
> Ryan
> 
> On Thu, 18 Nov 1999, Greg Stein wrote:
> > I volunteered to axe the proxy stuff. Not rewrite it :-)
> > 
> > It looks like Martin and Ryan have stepped up to at least port the thing
> > to 2.0. That renders some of my rationale as moot (although I still think
> > we could reduce our complexity, bug count, maintenance by axing some/all
> > of the proxy support).
> > 
> > Cheers,
> > -g

-- 
Greg Stein, http://www.lyra.org/


Re: proxy caching is important

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
I only volunteered to port the proxy module so we would have a working
module while somebody else re-designed the thing.  I think re-designing
the proxy module is in everybody's best interest, but we need something
that works in the mean-time.

Ryan

On Thu, 18 Nov 1999, Greg Stein wrote:

> I volunteered to axe the proxy stuff. Not rewrite it :-)
> 
> It looks like Martin and Ryan have stepped up to at least port the thing
> to 2.0. That renders some of my rationale as moot (although I still think
> we could reduce our complexity, bug count, maintenance by axing some/all
> of the proxy support).
> 
> Cheers,
> -g
> 
> 
> On Thu, 18 Nov 1999, David Reid wrote:
> 
> > Nice of you to volunteer????
> > 
> > d :-)
> > ----- Original Message -----
> > From: Greg Stein <gs...@lyra.org>
> > To: <ne...@apache.org>
> > Sent: 18 November 1999 12:18
> > Subject: Re: proxy caching is important
> > 
> > 
> > > On Thu, 18 Nov 1999, Ryan Bloom wrote:
> > > > > In terms of the caching itself, how about a generic mod_cache module
> > > > > that handles it?
> > > > >
> > > > > A generic set of caching functions could be added to APR, something
> > > > > like:
> > > >
> > > > Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
> > > > ap_cache_(in|out|etc).  This is because I think apr needs to be more
> > > > generalized than that.  What we really want/need, is a simple hash table.
> > > > The contexts already let us put this into whatever scope we want to put it
> > > > in.
> > > >
> > > > BTW, if the hash table functions aren't in APR already, they will be there
> > > > soon.  I am having a hard time keeping up with everything right now. :-)
> > >
> > > That's because something like the caching doesn't go in APR. It is easily
> > > built on *top* of APR. You can keep up if you keep stuff *out* of APR :-)
> > >
> > > And on a separate note: I also believe the HTTP/FTP fetching stuff does
> > > not go into APR either, but into a utility library on top of it.
> > >
> > > Cheers,
> > > -g
> > >
> > > --
> > > Greg Stein, http://www.lyra.org/
> > >
> > 
> 
> -- 
> Greg Stein, http://www.lyra.org/
> 



Re: proxy caching is important

Posted by Greg Stein <gs...@lyra.org>.
I volunteered to axe the proxy stuff. Not rewrite it :-)

It looks like Martin and Ryan have stepped up to at least port the thing
to 2.0. That renders some of my rationale as moot (although I still think
we could reduce our complexity, bug count, maintenance by axing some/all
of the proxy support).

Cheers,
-g


On Thu, 18 Nov 1999, David Reid wrote:

> Nice of you to volunteer????
> 
> d :-)
> ----- Original Message -----
> From: Greg Stein <gs...@lyra.org>
> To: <ne...@apache.org>
> Sent: 18 November 1999 12:18
> Subject: Re: proxy caching is important
> 
> 
> > On Thu, 18 Nov 1999, Ryan Bloom wrote:
> > > > In terms of the caching itself, how about a generic mod_cache module
> > > > that handles it?
> > > >
> > > > A generic set of caching functions could be added to APR, something
> > > > like:
> > >
> > > Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
> > > ap_cache_(in|out|etc).  This is because I think apr needs to be more
> > > generalized than that.  What we really want/need, is a simple hash table.
> > > The contexts already let us put this into whatever scope we want to put it
> > > in.
> > >
> > > BTW, if the hash table functions aren't in APR already, they will be there
> > > soon.  I am having a hard time keeping up with everything right now. :-)
> >
> > That's because something like the caching doesn't go in APR. It is easily
> > built on *top* of APR. You can keep up if you keep stuff *out* of APR :-)
> >
> > And on a separate note: I also believe the HTTP/FTP fetching stuff does
> > not go into APR either, but into a utility library on top of it.
> >
> > Cheers,
> > -g
> >
> > --
> > Greg Stein, http://www.lyra.org/
> >
> 

-- 
Greg Stein, http://www.lyra.org/


Re: proxy caching is important

Posted by David Reid <ab...@dial.pipex.com>.
Nice of you to volunteer????

d :-)
----- Original Message -----
From: Greg Stein <gs...@lyra.org>
To: <ne...@apache.org>
Sent: 18 November 1999 12:18
Subject: Re: proxy caching is important


> On Thu, 18 Nov 1999, Ryan Bloom wrote:
> > > In terms of the caching itself, how about a generic mod_cache module
> > > that handles it?
> > >
> > > A generic set of caching functions could be added to APR, something
> > > like:
> >
> > Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
> > ap_cache_(in|out|etc).  This is because I think apr needs to be more
> > generalized than that.  What we really want/need, is a simple hash table.
> > The contexts already let us put this into whatever scope we want to put it
> > in.
> >
> > BTW, if the hash table functions aren't in APR already, they will be there
> > soon.  I am having a hard time keeping up with everything right now. :-)
>
> That's because something like the caching doesn't go in APR. It is easily
> built on *top* of APR. You can keep up if you keep stuff *out* of APR :-)
>
> And on a separate note: I also believe the HTTP/FTP fetching stuff does
> not go into APR either, but into a utility library on top of it.
>
> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/
>


Re: proxy caching is important

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
> > BTW, if the hash table functions aren't in APR already, they will be there
> > soon.  I am having a hard time keeping up with everything right now. :-)
> 
> That's because something like the caching doesn't go in APR. It is easily
> built on *top* of APR. You can keep up if you keep stuff *out* of APR :-)
> 
> And on a separate note: I also believe the HTTP/FTP fetching stuff does
> not go into APR either, but into a utility library on top of it.

That's exactly my point though.  The cache was too specific.  What belongs
in APR is the hash functions, nothing more.  What belongs in mod_cache is
the add_cache/remove_cache functions.  Those functions USE the APR hash
functions.
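A sketch of that split, with all names hypothetical. The generic layer is a tiny string-keyed table of opaque pointers that knows nothing about HTTP. It is a linear array standing in for a real hash table, not the actual APR API. add_cache() and remove_cache() then add the web-specific meaning on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Generic layer (stand-in for the generic hash functions) ----
 * String keys, opaque void * values, no HTTP knowledge. A linear array
 * keeps the sketch short; a real one would hash. */

#define GT_MAX 32

struct gtable {
    const char *keys[GT_MAX];   /* keys are borrowed, not copied */
    void       *vals[GT_MAX];
};

/* Sets key -> val; a NULL val removes the key. Returns -1 if full. */
static int gtable_set(struct gtable *t, const char *key, void *val)
{
    size_t free_i = GT_MAX;
    for (size_t i = 0; i < GT_MAX; i++) {
        if (t->keys[i] && strcmp(t->keys[i], key) == 0) {
            t->vals[i] = val;
            if (val == NULL)
                t->keys[i] = NULL;
            return 0;
        }
        if (!t->keys[i] && free_i == GT_MAX)
            free_i = i;
    }
    if (val == NULL) return 0;          /* nothing to remove */
    if (free_i == GT_MAX) return -1;    /* table full */
    t->keys[free_i] = key;
    t->vals[free_i] = val;
    return 0;
}

static void *gtable_get(const struct gtable *t, const char *key)
{
    for (size_t i = 0; i < GT_MAX; i++)
        if (t->keys[i] && strcmp(t->keys[i], key) == 0)
            return t->vals[i];
    return NULL;
}

/* ---- Cache layer (mod_cache side): web-specific entries ---- */

struct cache_entry {
    const char *url;
    const char *content_type;
    const char *body;
};

static int add_cache(struct gtable *t, struct cache_entry *e)
{
    assert(e != NULL && e->url != NULL);
    return gtable_set(t, e->url, e);
}

static void remove_cache(struct gtable *t, const char *url)
{
    gtable_set(t, url, NULL);
}

static const struct cache_entry *find_cache(const struct gtable *t,
                                            const char *url)
{
    return gtable_get(t, url);
}
```

The generic table could be reused by a program that has nothing to do with the web, which is the property being argued for here.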

I have been trying very hard not to put things into APR.  I do not want to
tie APR to Apache in any way.  The more cruft we put into APR that makes
it a web server library, the less likely other people are to use it.  If I
am not doing a good job of keeping things out of APR, let me know.  I will
struggle to fix it.  :-)

Ryan



Re: proxy caching is important

Posted by Greg Stein <gs...@lyra.org>.
On Thu, 18 Nov 1999, Ryan Bloom wrote:
> > In terms of the caching itself, how about a generic mod_cache module
> > that handles it?
> > 
> > A generic set of caching functions could be added to APR, something
> > like:
> 
> Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
> ap_cache_(in|out|etc).  This is because I think apr needs to be more
> generalized than that.  What we really want/need, is a simple hash table.
> The contexts already let us put this into whatever scope we want to put it
> in.
> 
> BTW, if the hash table functions aren't in APR already, they will be there
> soon.  I am having a hard time keeping up with everything right now. :-)

That's because something like the caching doesn't go in APR. It is easily
built on *top* of APR. You can keep up if you keep stuff *out* of APR :-)

And on a separate note: I also believe the HTTP/FTP fetching stuff does
not go into APR either, but into a utility library on top of it.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: proxy caching is important

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
> In terms of the caching itself, how about a generic mod_cache module
> that handles it?
> 
> A generic set of caching functions could be added to APR, something
> like:

Well, APR was mentioned, so I'm jumping in.  I would prefer not to use
ap_cache_(in|out|etc).  This is because I think apr needs to be more
generalized than that.  What we really want/need, is a simple hash table.
The contexts already let us put this into whatever scope we want to put it
in.

BTW, if the hash table functions aren't in APR already, they will be there
soon.  I am having a hard time keeping up with everything right now. :-)

Ryan




Re: proxy caching is important

Posted by Graham Leggett <mi...@sharp.fm>.
Eric Robibaro wrote:

> > my personal opinion is that the caching function should be well integrated
> > with the core, and be generic enough to support multiple methods of
> > populating the cache.

In terms of the caching itself, how about a generic mod_cache module
that handles it?

A generic set of caching functions could be added to APR, something
like:

ap_cache_in(key)
ap_cache_out(key)
ap_cache_del(key)

A combination of memory (MM module?) or disk could be used for the cache
objects.

Throughout the other modules, content that is shipped to the browser is
either given to the cache API via ap_cache_in() (if it's static and
should be cached), or deleted from the cache via ap_cache_del() (if it's
dynamic and shouldn't be cached, like CGIs).

The mod_cache module would then be responsible for deciding whether it
should deliver the content from the cache using ap_cache_out().

The problem of checking whether cached data has changed on disk can
be handled the same way a normal forward proxy handles it - using data
aging and shift-reload in the browser to force a cache refresh.
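A minimal sketch of the proposed in/out/del calls with simple data aging. It assumes a fixed maximum age and a force_reload flag that is set when the browser does a shift-reload. cache_in(), cache_out() and cache_del() are hypothetical stand-ins for the ap_cache_* names above. The 300-second limit is arbitrary, and the current time is passed in so the logic is easy to test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch: in/out/del with age-based freshness. */

#define SLOTS   16
#define MAX_AGE 300   /* seconds; arbitrary, for illustration only */

struct slot {
    const char *key;    /* borrowed; caller keeps it alive */
    const char *body;
    time_t      stored;
};

static struct slot slots[SLOTS];

static struct slot *lookup(const char *key)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (slots[i].key && strcmp(slots[i].key, key) == 0)
            return &slots[i];
    return NULL;
}

/* Static content: remember what was sent. Returns 0, or -1 if full. */
static int cache_in(const char *key, const char *body, time_t now)
{
    assert(key != NULL && body != NULL);
    struct slot *s = lookup(key);
    for (size_t i = 0; !s && i < SLOTS; i++)
        if (!slots[i].key)
            s = &slots[i];
    if (!s)
        return -1;
    s->key = key;
    s->body = body;
    s->stored = now;
    return 0;
}

/* Dynamic content (e.g. CGI output): make sure nothing stale is served. */
static void cache_del(const char *key)
{
    struct slot *s = lookup(key);
    if (s)
        s->key = NULL;
}

/* mod_cache's side: NULL means "not usable, generate the response". */
static const char *cache_out(const char *key, time_t now, int force_reload)
{
    struct slot *s = lookup(key);
    if (!s || force_reload || difftime(now, s->stored) >= MAX_AGE)
        return NULL;
    return s->body;
}
```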

Is this a good way to do this? What do people think?

Regards,
Graham

Re: proxy caching is important

Posted by Eric Robibaro <eh...@point-net.com>.
<snip>
> my personal opinion is that the caching function should be well integrated
> with the core, and be generic enough to support multiple methods of
> populating the cache.
If you'll allow a comment from someone who hasn't yet had time to
contribute to the codebase: "I like the way you think."
> in a sense, i envision apache as an "HTTP router".  it communicates with
> the client, parses the url, consults its cache and serves from there if
> available, otherwise it passes through to a backend.
> 
> notice that serving static content and serving from a cache are
> essentially identical.  the only difference is the url->filename map.
I've always found it odd how closely mod_rewrite and mod_proxy are
integrated with each other. Could the answer be found by generalizing
mod_rewrite to include the proxy "functions", e.g. the proper APR calls
to perform the url->filename mapping?
Perhaps by joining mod_alias, mod_rewrite and mod_proxy into a sort of
supermodule, say mod_uri, which would allow URI rewriting, proxying,
etc., perhaps using a different way of specifying rewrites than the
(to many) mod_rewrite rewrites?
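A rough C sketch of what a combined mod_uri rule table might look like. It assumes rules are matched by URI prefix in order, and that each rule maps either to a local path (alias-style) or to a backend URL (proxy-style). The rule format and all names are hypothetical, and regular-expression rewriting is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: one ordered table for aliasing and proxying. */

enum action { MAP_TO_FILE, PROXY_TO };

struct uri_rule {
    const char *prefix;     /* matched against the start of the URI */
    enum action action;
    const char *target;     /* directory or backend base URL */
};

/* Applies the first matching rule and writes the mapped result to `out`.
 * Returns the rule used, or NULL if none matched or `out` was too small. */
static const struct uri_rule *map_uri(const struct uri_rule *rules, size_t n,
                                      const char *uri, char *out,
                                      size_t outlen)
{
    assert(rules != NULL && uri != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(rules[i].prefix);
        if (strncmp(uri, rules[i].prefix, plen) != 0)
            continue;
        int w = snprintf(out, outlen, "%s%s", rules[i].target, uri + plen);
        if (w < 0 || (size_t)w >= outlen)
            return NULL;
        return &rules[i];
    }
    return NULL;
}
```

Because the first match wins, more specific prefixes go before a catch-all "/" rule.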

> Dean
> 
> 

-- 
Eric Robibaro - Administrateur de Systemes - Systems Administrator - 
Point Net Connect 4535, Park Avenue, Montreal, Quebec h2v 4e4
May the system be with you ;o)