Posted to modproxy-dev@apache.org by rb...@covalent.net on 2000/11/16 02:52:04 UTC

mod_cache and the proxy.

Okay,

I have really spent a lot of time in the proxy today, and a lot of my
opinions have changed.  Here is where we are right now IMHO.

1)  The cache that is in the proxy.  This is messing up the code and
should be removed.  The cache is done for Apache 1.3, and that is just
wrong for 2.0.  I plan to spend some time tomorrow afternoon to rip the
cache out of the code.

2)  BUFF.  This is wrong.  We can remove a lot of duplicated code if we
can use filters for the back-end communication.  I think I see how to do
this VERY cleanly, but I need to actually write the code.  Expect this to
be done tomorrow sometime.

3)  mod_cache.c.  This needs to be done.  Since there are arguments
against putting it into CVS before it is ready, I am considering setting
up a CVS repository on my home machine to allow people to collaborate on
this cleanly.  I would ask people to give me a day to get this all
set up.  If somebody already has a CVS server set up for public use, and
they want to host this module until it gets stable, please speak up.

Ryan


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------





Re: mod_cache and the proxy.

Posted by Chuck Murcko <ch...@topsail.org>.
Graham Leggett wrote:
> 
> Letting the cache handler handle content negotiation inside itself
> prevents some kind of weird mapping between URLs and their multiple
> representations being necessary outside the cache. This can be hidden
> inside the cache engine, which could probably find a fast and efficient
> way of storing the entities so that telling them apart is easy to do.
> 
> One of the broken assumptions of the previous mod_proxy was that there
> was only one object representation per URL. If different content was
> negotiated, the previous cache entry was invalidated when it need not
> have been.
> 

Agreed. There was a lot of thrashing in the cache for this reason.
-- 
Chuck
Chuck Murcko
Topsail Group
chuck@topsail.org

Re: mod_cache and the proxy.

Posted by rb...@covalent.net.
> > Well, the skeleton was posted yesterday, and I am working on getting a
> > pserver CVS repository set up on my machine to let more people hack on this
> > together.  Give me one more day, and I should get this set up.
> > 
> Why not just get it working for the case where there's no cache
> configured, and work on the rest of the code in the meantime. That could
> go into the proxy directory now and we could get rid of the file_cache.

I'm busy hacking something else into the proxy right now.  I'll try to
hack the cache filter to work cleanly tomorrow.

Ryan



Re: mod_cache and the proxy.

Posted by Chuck Murcko <ch...@topsail.org>.
rbb@covalent.net wrote:
> 
> > > The filter is an actual filter, which is what I have at least
> > > started.  The content generator is the handler.  I have a design for it,
> > > but no time to actually write it.  I think we could have a working cache
> > > in under a week if somebody takes what I posted yesterday, and just works
> > > on it for a day or two.
> >
> > I don't know enough about filters to hack at this yet, which is why I'm
> > keen to see some real code to start hacking against that would teach me
> > how filters work. Once a rough skeleton is in place, I'd like to start
> > making it generate conditional requests within Apache.
> 
> Well, the skeleton was posted yesterday, and I am working on getting a
> pserver CVS repository set up on my machine to let more people hack on this
> together.  Give me one more day, and I should get this set up.
> 
Why not just get it working for the case where there's no cache
configured, and work on the rest of the code in the meantime. That could
go into the proxy directory now and we could get rid of the file_cache.
-- 
Chuck
Chuck Murcko
Topsail Group
chuck@topsail.org

Re: mod_cache and the proxy.

Posted by rb...@covalent.net.
> > The filter is an actual filter, which is what I have at least
> > started.  The content generator is the handler.  I have a design for it,
> > but no time to actually write it.  I think we could have a working cache
> > in under a week if somebody takes what I posted yesterday, and just works
> > on it for a day or two.
> 
> I don't know enough about filters to hack at this yet, which is why I'm
> keen to see some real code to start hacking against that would teach me
> how filters work. Once a rough skeleton is in place, I'd like to start
> making it generate conditional requests within Apache.

Well, the skeleton was posted yesterday, and I am working on getting a
pserver CVS repository set up on my machine to let more people hack on this
together.  Give me one more day, and I should get this set up.

Ryan



Re: mod_cache and the proxy.

Posted by Graham Leggett <mi...@sharp.fm>.
rbb@covalent.net wrote:

> > What I want to do is build in content negotiation into that as well. In
> > other words, you provide hostname:port/URI *and* the request headers,
> > and based on both you get a response back. This will allow you to cache
> > both an English and French representation of the same URL, or a
> > compressed and non-compressed representation of an URL at the same time.
> 
> Interesting.  I need to think about that more, because it changes the hash
> structure I had thought about last night, but it should be VERY cool.

Letting the cache handler handle content negotiation inside itself
prevents some kind of weird mapping between URLs and their multiple
representations being necessary outside the cache. This can be hidden
inside the cache engine, which could probably find a fast and efficient
way of storing the entities so that telling them apart is easy to do.

One of the broken assumptions of the previous mod_proxy was that there
was only one object representation per URL. If different content was
negotiated, the previous cache entry was invalidated when it need not
have been.

> The filter is an actual filter, which is what I have at least
> started.  The content generator is the handler.  I have a design for it,
> but no time to actually write it.  I think we could have a working cache
> in under a week if somebody takes what I posted yesterday, and just works
> on it for a day or two.

I don't know enough about filters to hack at this yet, which is why I'm
keen to see some real code to start hacking against that would teach me
how filters work. Once a rough skeleton is in place, I'd like to start
making it generate conditional requests within Apache.
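
A conditional request here means revalidating a cached entry with the
origin instead of fetching it again. What follows is a minimal Python
sketch, not Apache code. The dict layout and the function names
conditional_headers and apply_response are made up for illustration;
If-None-Match, If-Modified-Since and the 304 status are standard
HTTP/1.1.

```python
def conditional_headers(entry):
    """Build validator headers for revalidating a cached entry.

    `entry` is a hypothetical dict with optional "etag" and
    "last_modified" fields copied from the stored response.
    """
    headers = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers


def apply_response(entry, status, headers, body):
    """Return the entry to serve once the origin has answered."""
    if status == 304:
        # Not Modified: the stored body is still good.
        return entry
    # Simplification: any other status replaces the stored entry.
    return {
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
        "body": body,
    }
```

A real implementation would also refresh the stored headers on a 304
and treat error statuses separately; both are left out to keep the
sketch short.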

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."

Re: mod_cache and the proxy.

Posted by rb...@covalent.net.
> > What still needs to be done to mod_cache.c:
> > 
> > The abstraction needs to be put in so that multiple cache back-ends can be
> > used.  This requires abstracting out:
> > 
> >         get_cache_location
> >         cache_read
> >         cache_write
> >         cache_open
> 
> Multiple backends I assume means a shared memory backend, a disk
> backend, etc?

Yep.

> > Cache entries should be stored in a hash table that uses the key values:
> > 
> >         hostname:port/URI
> > 
> > The handler just checks the hash and sends the file if it is there.
> 
> What I want to do is build in content negotiation into that as well. In
> other words, you provide hostname:port/URI *and* the request headers,
> and based on both you get a response back. This will allow you to cache
> both an English and French representation of the same URL, or a
> compressed and non-compressed representation of an URL at the same time.

Interesting.  I need to think about that more, because it changes the hash
structure I had thought about last night, but it should be VERY cool.

> > Once the handler is written and the filter can get the correct location to
> > save the cached data, this module is done.
> 
> This effectively represents the caching filter part of the design I
> proposed. What is needed now is a content-generator entity that
> optionally returns cached data if it exists, or DECLINED if not.

The filter is an actual filter, which is what I have at least
started.  The content generator is the handler.  I have a design for it,
but no time to actually write it.  I think we could have a working cache
in under a week if somebody takes what I posted yesterday, and just works
on it for a day or two.

Ryan


Re: mod_cache and the proxy.

Posted by Graham Leggett <mi...@sharp.fm>.
rbb@covalent.net wrote:

> Well almost, but not quite.  In order to do this, the proxy would need to
> be its own protocol server, and that just isn't going to happen.  I'll
> outline my design in another note.

Ok.

> It isn't close at all.  My prototype module is basically just a filter
> that directs the data to the disk, and even doing that it is broken,
> because it always uses the same file.  :-)  I had written this to allow it
> to be thrown into the existing cache code, but that cache code is very
> difficult to track, so I gave up.  :-)

I had this problem when I did the HTTP/1.1 patch - I rewrote much of the
core stuff within the proxy.

> What still needs to be done to mod_cache.c:
> 
> The abstraction needs to be put in so that multiple cache back-ends can be
> used.  This requires abstracting out:
> 
>         get_cache_location
>         cache_read
>         cache_write
>         cache_open

Multiple backends I assume means a shared memory backend, a disk
backend, etc?

> Cache entries should be stored in a hash table that uses the key values:
> 
>         hostname:port/URI
> 
> The handler just checks the hash and sends the file if it is there.

What I want to do is build in content negotiation into that as well. In
other words, you provide hostname:port/URI *and* the request headers,
and based on both you get a response back. This will allow you to cache
both an English and French representation of the same URL, or a
compressed and non-compressed representation of an URL at the same time.
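
To make the idea concrete, here is a rough Python sketch of a lookup
that takes both the URL and the request headers. It is only an
illustration: the class VariantCache, its methods, and the assumption
that request headers arrive as a dict with lower-case names are all
invented here, and matching is by exact string comparison rather than
real HTTP negotiation. The `vary` argument plays the role of HTTP's
Vary response header, which names the request headers that select a
representation.

```python
class VariantCache:
    """Stores several representations per URL, told apart by request headers."""

    def __init__(self):
        self._entries = {}  # url -> list of (selector dict, body)

    def store(self, url, request_headers, vary, body):
        # Remember the values of the headers that selected this variant.
        selector = {name.lower(): request_headers.get(name.lower(), "")
                    for name in vary}
        variants = self._entries.setdefault(url, [])
        # Replace only the matching variant; other variants stay cached.
        variants[:] = [v for v in variants if v[0] != selector]
        variants.append((selector, body))

    def lookup(self, url, request_headers):
        for selector, body in self._entries.get(url, []):
            if all(request_headers.get(name, "") == value
                   for name, value in selector.items()):
                return body
        return None  # no stored variant fits this request


cache = VariantCache()
url = "example.com:80/doc"
cache.store(url, {"accept-language": "en"}, ["Accept-Language"], b"Hello")
cache.store(url, {"accept-language": "fr"}, ["Accept-Language"], b"Bonjour")
```

Storing the French page does not evict the English one, which is the
behaviour the old single-entry-per-URL cache could not provide.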

> Once the handler is written and the filter can get the correct location to
> save the cached data, this module is done.

This effectively represents the caching filter part of the design I
proposed. What is needed now is a content-generator entity that
optionally returns cached data if it exists, or DECLINED if not.

These two together will make the start of a working cache.
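
As a toy model of how the two pieces could fit together (Python, not
Apache code): a cache handler that either serves a stored body or
returns DECLINED, and a caching filter that saves a copy of whatever
the next handler produces. DECLINED and OK borrow the names of
Apache's handler return codes; fetch_from_backend, serve and the
in-memory dict are placeholders invented for this sketch.

```python
DECLINED = -1  # same names as Apache's handler return codes
OK = 0

cache = {}         # key -> cached body (stand-in for a real cache engine)
backend_hits = []  # records every time the fake backend is contacted


def fetch_from_backend(key):
    """Hypothetical origin fetch; yields the response in chunks."""
    backend_hits.append(key)
    yield b"body of "
    yield key.encode("utf-8")


def cache_handler(key, out):
    """Content generator: serve from the cache, or decline."""
    body = cache.get(key)
    if body is None:
        return DECLINED
    out.append(body)
    return OK


def caching_filter(key, chunks):
    """Pass chunks through unchanged while keeping a copy for the cache."""
    saved = []
    for chunk in chunks:
        saved.append(chunk)
        yield chunk
    cache[key] = b"".join(saved)  # stored once the whole body was seen


def origin_handler(key, out):
    """Fallback generator whose output runs through the caching filter."""
    for chunk in caching_filter(key, fetch_from_backend(key)):
        out.append(chunk)
    return OK


def serve(key):
    out = []
    for handler in (cache_handler, origin_handler):
        if handler(key, out) != DECLINED:
            break
    return b"".join(out)
```

The first serve() call for a key falls through to the origin and fills
the cache; the second is answered by cache_handler alone.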

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."

Re: mod_cache and the proxy.

Posted by rb...@covalent.net.
> > 1)  The cache that is in the proxy.  This is messing up the code and
> > should be removed.  The cache is done for Apache 1.3, and that is just
> > wrong for 2.0.  I plan to spend some time tomorrow afternoon to rip the
> > cache out of the code.
> 
> Effectively mod_proxy should be 100% straight through - with the request
> from Apache being forwarded as-is to the backend, and any reply being
> returned again as is into the filter chain (so that uncompressed backend
> webservers can pass through compression filters, etc etc).

Well almost, but not quite.  In order to do this, the proxy would need to
be its own protocol server, and that just isn't going to happen.  I'll
outline my design in another note.

> > 3)  mod_cache.c.  This needs to be done.  Since there are arguments
> > against putting it into CVS before it is ready, I am considering setting
> > up a CVS repository on my home machine to allow people to collaborate on
> > this cleanly.  I would ask people to give me a day to get this all
> > set up.  If somebody already has a CVS server set up for public use, and
> > they want to host this module until it gets stable, please speak up.
> 
> I've been meaning to put some more detailed effort into the design docs
> that I posted a few weeks ago, but I've been snowed under with a burning
> project and it's been taking up too much of my time. How close to the
> design is the mod_cache you have created?

It isn't close at all.  My prototype module is basically just a filter
that directs the data to the disk, and even doing that it is broken,
because it always uses the same file.  :-)  I had written this to allow it
to be thrown into the existing cache code, but that cache code is very
difficult to track, so I gave up.  :-)

What still needs to be done to mod_cache.c:

The abstraction needs to be put in so that multiple cache back-ends can be
used.  This requires abstracting out:

	get_cache_location
	cache_read
	cache_write
	cache_open
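
One possible shape for that abstraction, sketched in Python rather
than C and not taken from the posted mod_cache.c: only the four
operation names come from the list above. The class names, the method
signatures, the SHA-256 file naming and the in-process dictionary
(standing in for a memory back-end) are assumptions made for
illustration.

```python
import hashlib
import io
import os


class CacheBackend:
    """Hypothetical base class built around the four operations above."""

    def get_cache_location(self, key):
        raise NotImplementedError

    def cache_open(self, key, mode):
        raise NotImplementedError

    def cache_write(self, key, data):
        with self.cache_open(key, "wb") as fh:
            fh.write(data)

    def cache_read(self, key):
        try:
            with self.cache_open(key, "rb") as fh:
                return fh.read()
        except (FileNotFoundError, KeyError):
            return None  # nothing cached under this key


class DiskBackend(CacheBackend):
    def __init__(self, root):
        self.root = root

    def get_cache_location(self, key):
        # Distinct keys map to distinct files (barring hash collisions).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest[:2], digest)

    def cache_open(self, key, mode):
        path = self.get_cache_location(key)
        if "w" in mode:
            os.makedirs(os.path.dirname(path), exist_ok=True)
        return open(path, mode)


class _MemoryWriter(io.BytesIO):
    """Buffer that publishes its contents into a dict when closed."""

    def __init__(self, entries, location):
        super().__init__()
        self._entries = entries
        self._location = location

    def close(self):
        if not self.closed:
            self._entries[self._location] = self.getvalue()
        super().close()


class MemoryBackend(CacheBackend):
    def __init__(self):
        self.entries = {}

    def get_cache_location(self, key):
        return key

    def cache_open(self, key, mode):
        location = self.get_cache_location(key)
        if "w" in mode:
            return _MemoryWriter(self.entries, location)
        return io.BytesIO(self.entries[location])  # KeyError if absent
```

Because cache_read and cache_write are written against cache_open, a
new back-end only has to say where an entry lives and how to open it.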

Cache entries should be stored in a hash table that uses the key values:

	hostname:port/URI

The handler just checks the hash and sends the file if it is there.
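
A small Python sketch of that key and lookup, for illustration only.
The function names, the lower-casing of the hostname and the choice to
map each key to a file path are assumptions, not part of the posted
code.

```python
def make_cache_key(hostname, port, uri):
    """Build a "hostname:port/URI" key.

    Lower-casing the host is an added assumption (hostnames are
    case-insensitive); the URI is kept exactly as given.
    """
    if not uri.startswith("/"):
        uri = "/" + uri
    return "%s:%d%s" % (hostname.lower(), port, uri)


cache_table = {}  # key -> path of the file holding the cached body


def cache_lookup(hostname, port, uri):
    """Return the cached body, or None when the key is not in the table."""
    path = cache_table.get(make_cache_key(hostname, port, uri))
    if path is None:
        return None
    with open(path, "rb") as fh:
        return fh.read()
```

A handler built on this would send the returned bytes when
cache_lookup finds an entry and hand the request on otherwise.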

Once the handler is written and the filter can get the correct location to
save the cached data, this module is done.

Ryan



Re: mod_cache and the proxy.

Posted by Graham Leggett <mi...@sharp.fm>.
rbb@covalent.net wrote:

> I have really spent a lot of time in the proxy today, and a lot of my
> opinions have changed.  Here is where we are right now IMHO.
> 
> 1)  The cache that is in the proxy.  This is messing up the code and
> should be removed.  The cache is done for Apache 1.3, and that is just
> wrong for 2.0.  I plan to spend some time tomorrow afternoon to rip the
> cache out of the code.

Effectively mod_proxy should be 100% straight through - with the request
from Apache being forwarded as-is to the backend, and any reply being
returned again as is into the filter chain (so that uncompressed backend
webservers can pass through compression filters, etc etc).
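
A toy Python model of that flow, purely illustrative and not how
Apache's filter API looks: the backend's reply is passed untouched
into a chain of filters, one of which happens to compress. The names
gzip_filter, run_filter_chain and proxy_response are invented for this
sketch, and filters are modelled as generator functions over byte
chunks.

```python
import zlib


def gzip_filter(chunks):
    """Output filter: compress a stream of byte chunks into gzip format."""
    # wbits=31 (16 + 15) asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(-1, zlib.DEFLATED, 31)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:
            yield data
    yield compressor.flush()


def run_filter_chain(chunks, filters):
    """Feed the stream through each filter in order."""
    stream = chunks
    for output_filter in filters:
        stream = output_filter(stream)
    return stream


def proxy_response(backend_chunks, filters):
    """Straight-through proxy: the proxy itself never alters the body."""
    return b"".join(run_filter_chain(iter(backend_chunks), filters))
```

With an empty filter list the client sees exactly what the backend
sent; adding gzip_filter compresses the reply without the proxy code
knowing anything about compression.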

> 2)  BUFF.  This is wrong.  We can remove a lot of duplicated code if we
> can use filters for the back-end communication.  I think I see how to do
> this VERY cleanly, but I need to actually write the code.  Expect this to
> be done tomorrow sometime.

Ok.

> 3)  mod_cache.c.  This needs to be done.  Since there are arguments
> against putting it into CVS before it is ready, I am considering setting
> up a CVS repository on my home machine to allow people to collaborate on
> this cleanly.  I would ask people to give me a day to get this all
> set up.  If somebody already has a CVS server set up for public use, and
> they want to host this module until it gets stable, please speak up.

I've been meaning to put some more detailed effort into the design docs
that I posted a few weeks ago, but I've been snowed under with a burning
project and it's been taking up too much of my time. How close to the
design is the mod_cache you have created?

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."



Re: mod_cache and the proxy.

Posted by Ask Bjoern Hansen <as...@develooper.com>.
On Wed, 15 Nov 2000 rbb@covalent.net wrote:

[...]
> 3)  mod_cache.c.  This needs to be done.  Since there are arguments
> against putting it into CVS before it is ready, I am considering setting
> up a CVS repository on my home machine to allow people to collaborate on
> this cleanly.  I would ask people to give me a day to get this all
> set up.  If somebody already has a CVS server set up for public use, and
> they want to host this module until it gets stable, please speak up.

There's this thing called sourceforge ... :)

If people want to work on it together, then it should be in our
CVS.

Releases are supposed to have releasable code. CVS is supposed to
have code that is being worked on.


 - ask

-- 
ask bjoern hansen - <http://www.netcetera.dk/~ask/>
more than 70M impressions per day, <http://valueclick.com>