Posted to dev@httpd.apache.org by Neil Gunton <ne...@nilspace.com> on 2004/05/04 01:45:37 UTC

Re: mod_proxy distinguish cookies?

Neil Gunton wrote:
> 
> Igor Sysoev wrote:
> >
> > On Sat, 24 Apr 2004, Neil Gunton wrote:
> >
> > > Neil Gunton wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I apologise in advance if this is obvious or has otherwise been answered
> > > > elsewhere, but I can't seem to find any reference to it.
> > > >
> > > > I am using Apache 1.3.29 with mod_perl, on Linux 2.4. I am running
> > > > mod_proxy as a caching reverse proxy front end, and mod_perl on the
> > > > backend. This works really well, but I have noticed that mod_proxy does
> > > > not seem to be able to distinguish requests as being different if the
> > > > URLs are the same, but they contain different cookies. I would like to
> > > > be able to enable more personalization on my site, which would best be
> > > > done using cookies. The problem is that when a page has an expiration
> > > > greater than 'now', then any request to the same URL will get the cache
> > > > version, even if the requests have different cookies. Currently I have
> > > > to pass options around as part of the URL in order to make the requests
> > > > look different to mod_proxy.
> > > >
> > > > Am I missing something here? Or, will this be included in either future
> > > > versions of mod_proxy or the equivalent module in Apache 2.x? Any
> > > > insights greatly appreciated.
> > >
> > > I should perhaps make clear that I do have cookies working through the
> > > proxy just fine, for pages that are set to be 'no-cache'. So this isn't
> > > an issue with the proxy being able to pass cookies to/from the backend
> > > and browser (which I think I have seen mentioned before as a bugfix),
> > > but rather with mod_proxy simply being able to distinguish otherwise
> > > identical URL requests that have different cookies, and cache those as
> > > different requests.
> > >
> > > So for example, the request "GET /somedir/somepage.html?xxx=yyy" passed
> > > with a cookie that has the value 'pics=small' should be seen as different from
> > > another identical request, but with cookie value 'pics=large'. Currently
> > > my tests indicate that mod_proxy returns the same cached page for each
> > > request.
> > >
> > > I assume that mod_proxy only checks the actual request string, and not
> > > the HTTP header which contains the cookie.
> > >
> > > Obviously, under this scheme, if you were using cookies to track
> > > sessions then all requests would get passed to the backend server - so,
> > > perhaps it would be a nice additional feature to be able to configure,
> > > through httpd.conf, how mod_proxy (or its successor) pays attention to
> > > cookies. For example, you might say something to the effect of "ignore
> > > this cookie" or "differentiate requests using this cookie". Then we
> > > could have sitewide options like e.g. 'pics' (to set what size pictures
> > > are shown), and this could be used to distinguish cached pages, but
> > > other cookies might be ignored on some pages. This would allow for more
> > > flexibility, with some cached pages being "sensitive" to cookies, while
> > > others are not. An obvious way this would be useful is in the use of
> > > login cookies. These will be passed in by the browser for every page on
> > > the site, but this doesn't mean we want to distinguish cached pages
> > > based on it for every page. Some user-specific pages would have
> > > 'no-cache' set, while other pages could be set to ignore this login
> > > cookie, thus gaining the benefits of the proxy caching. This would be
> > > useful for pages that have no user-specific or personalizable aspects -
> > > they could be cached regardless of who is logged in.
> > >
> > > Sorry if this wasn't clear from the original post, just wanted to
> > > clarify and expand... any advice on this would be VERY welcomed, since
> > > my options with personalization are currently rather limited.
> > >
> > > Also, if this is actually addressed to the wrong list for some reason
> > > then a pointer would be much appreciated...
> >
> > mod_accel ( http://sysoev.ru/en/ ) lets you take cookies into account when
> > caching:
> >
> > AccelCacheCookie  some_cookie_name another_cookie_name
> >
> > You can set it on a per-location basis.
> >
> > Besides, my upcoming lightweight HTTP and reverse proxy server, nginx,
> > will allow this too.
> >
> > Igor Sysoev
> > http://sysoev.ru/en/
> 
> Thanks a lot Igor, I will try to find the time over the next couple of
> days to look into your module and build my Apache with it. I'll let you
> know how I get on...
> 
> -Neil

I have been testing Igor's mod_accel and mod_deflate modules, and I have
to say that they work extremely well. mod_accel does indeed fix the
cookie problem (at least on the server side - browsers are a whole other
issue). This enables me to do more personalization on my mod_perl
backend, reverse proxy front-end setup, while still having pages cached
in the proxy. I am also using mod_deflate, replacing mod_gzip. Both
mod_accel and mod_deflate work flawlessly so far.

I would surmise that the main reason for the relative lack of English
language sites using these modules is that the English documentation
isn't totally finished off and "polished". I could see the link to the
incomplete English version of the docs on the WayBack engine putting
some people off, but I would say that it's worth the effort to download
and use this, if you're still using Apache 1.3. I can also report that
Igor is very helpful and responded quickly to my questions.

The problem now is that the browsers (IE and Mozilla at least) don't
seem to differentiate requests based on cookies. I have tested
requesting a page with a certain cookie (where the page has a sufficient
expiration to warrant being cached for the duration of the test), and
then changing the cookie, and re-requesting the same page as before. The
cookie is different, but the browsers still seem to use their local
cached copy of the page. So, I am currently thinking that the solution
to this is to use a combination of cookies and URL parameters to make
the requests look different. The cookie can persist and be the thing
that is actually used by the server code, and the URL param can be
inserted into all the links on the site simply to make the subsequent
requests look different to the browsers.
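
A minimal sketch of the cookie-plus-URL-parameter idea described above,
written in Python rather than the mod_perl code the site actually runs.
The helper name tag_link and the use of 'pics' as the tag parameter are
assumptions for illustration only. The cookie stays the source of truth
on the server; the query parameter exists purely so that browser caches
see a different URL when the option changes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url: str, name: str, value: str) -> str:
    """Return url with the query parameter name=value set exactly once."""
    parts = urlsplit(url)
    # Drop any stale copy of the tag (e.g. inherited from a pasted link),
    # then append the value taken from the current user's cookie.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # The value would come from the user's cookie, not from the incoming URL.
    print(tag_link("/somedir/somepage.html?xxx=yyy", "pics", "small"))
```

Because the server would read the option from the cookie and only use the
parameter as a cache-busting tag, a pasted link carrying someone else's
tag would not overwrite the visitor's own setting.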

Anyway, just thought I'd report on what happened here, I'm very happy
with the mod_accel and mod_deflate modules. Thanks again, Igor!

-Neil Gunton
http://www.neilgunton.com
http://www.crazyguyonabike.com

Re: mod_proxy distinguish cookies?

Posted by Neil Gunton <ne...@nilspace.com>.
Graham Leggett wrote:
> There is already a mechanism for caching different variants of a page -
> simply encode the info into the URL. This is supported on all browsers
> and cannot be switched off through user preference (as cookies can).
> Because a mechanism already exists, there isn't much point in changing
> the standard to accommodate a second method to do the same thing.

As I said previously, storing user options in the URL is "broken"
because following someone else's link to the same website erases your
options. I use this currently on my website, to pass an option for size
of pics (thumbnail, small or large). Every time someone posts a link to
a page on my website on some message board or email, they inevitably
include the whole query string, with whatever option they happen to have
at that moment. So every person who clicks on the link gets their option
overwritten by the pic option of the person who posted the link. I don't
see how anyone could see this as being a good way to do things.
 
> But you're also fighting with existing websites that use cookies to try
> and track individual requests, and there are a lot of them out there. If
> each different cookie was cached separately, then you're effectively
> caching separate copies of every page, which makes caching a waste of time.

I suggested expanding the cookie definition to include a type or
qualifier that could be used to say whether the cookie should be treated
as a param. Using cookies in this way would not put any more load on the
net than at present, if the default cookie behavior was left as it is
now (i.e. with additional qualifier being required in order to have the
cookie taken into account). Using a special cookie or using the URL are
both functionally equivalent as far as information being passed, the
crucial difference being that using a different URL would not erase your
options - they are being passed via cookie.

To emphasize: I am not suggesting that EVERY cookie out there already be
used by caches, but rather that we amend the standard so that certain
cookies CAN be taken into account. This would be very useful, imho.

One could make the argument that more traffic might be generated if
websites started using the cookie qualifier to make ALL cookies be used
by caches (thus ensuring that they would "see" every click by a
particular user, making tracking all that much easier). However I don't
think this would make any difference in reality, since websites that
want this functionality can already get it by setting the pages to be
no-cache. The cookie qualifier would add the benefit of being able to
cache pages that have the same options set as the same cache entry. 

The addition of a cookie cache qualifier would not break any existing
systems, because the default behavior of cookies remains unchanged. It
would also not put any more load on the net than would be caused by
sites passing options in the URL, since each request with a different
option in the URL would have to be cached differently anyway. We gain
something, and lose nothing, as far as I can tell.

All the best,

-Neil

Re: mod_proxy distinguish cookies?

Posted by Graham Leggett <mi...@sharp.fm>.
Neil Gunton wrote:

> Is this really such a special case? I can't believe nobody else has
> wanted to implement a server like this.

It's a special case in the context of all of the servers, proxies, 
transparent proxies and browsers together out there on the net - it's 
useful to take off the load of your server, but at the cost of 
_increasing_ the load on transparent proxies on the net.

That's not to say that making an attempt to reduce the load on your 
server is a bad idea or even a rare occurrence (it's not), it's just that 
changing an RFC to do it is not the right way to achieve this.

> If you want to have a setup
> where there is a heavy backend app server, with a lightweight reverse
> proxy front end, and you also want to have pages be cached, AND have
> personalization of pages based on cookies, then it makes perfect sense
> to store user options in a cookie, and then for the pages to be cached
> taking cookies into account.

There is already a mechanism for caching different variants of a page - 
simply encode the info into the URL. This is supported on all browsers 
and cannot be switched off through user preference (as cookies can). 
Because a mechanism already exists, there isn't much point in changing 
the standard to accommodate a second method to do the same thing.

But you're also fighting with existing websites that use cookies to try 
and track individual requests, and there are a lot of them out there. If 
each different cookie was cached separately, then you're effectively 
caching separate copies of every page, which makes caching a waste of time.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

Posted by Neil Gunton <ne...@nilspace.com>.
Graham Leggett wrote:
> I would disagree - if a proxy on the net cached every variant of every
> page simply based on a cookie header, there would be so many different
> variants of the same page in the cache that from a system resource
> perspective the cache might as well not be there. Cookies only make
> sense in most cases when caching has been switched off, as the cookie is
> usually targeted at that single user only.
> 
> Your application is a unique one, in that you're trying to improve the
> performance of a single server on the net. This should be done within
> the design of that server, not by trying to change the RFC to accommodate
> what is a special case.

Is this really such a special case? I can't believe nobody else has
wanted to implement a server like this. If you want to have a setup
where there is a heavy backend app server, with a lightweight reverse
proxy front end, and you also want to have pages be cached, AND have
personalization of pages based on cookies, then it makes perfect sense
to store user options in a cookie, and then for the pages to be cached
taking cookies into account. That's pretty much what cookies were made
for. In this case, a cookie that sets 'opts=xxx' can be seen as
equivalent to having 'opts=xxx' in the request query string - but
instead of the parameter having to be present in the query string, it's
there in the cookie. This is much more useful, because it means that
this parameter can be set once in the browser, so that this user always
uses this option on this server. All pages which have the same request
and same option cookie would be seen as the same page by browsers and
caches. Any pages with the same request, but different option cookie are
treated differently. To the caches, this is no different from passing
the option in the query string.

I can see that not every cookie should be seen in this way. The solution
to this would perhaps be an additional property for cookies to determine
how they are treated by caches and browsers. In order to not break
existing behavior, the default could be what happens now - i.e. cookies
are ignored as far as differentiating requests. But if there was some
cookie setting that said "user param" or something similar, then it
could be used by browsers and intermediate caches to differentiate.
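
To make the proposal concrete, here is a rough Python sketch of a cache
key that takes only designated cookies into account. It is not how
mod_proxy, mod_accel, or any real cache is implemented; the function
name cache_key and the idea of passing the "significant" cookie names as
a set are assumptions made for illustration.

```python
from http.cookies import SimpleCookie


def cache_key(url: str, cookie_header: str, significant: frozenset) -> tuple:
    """Build a cache key from the URL plus only the designated cookies."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    # Cookies not named in `significant` (session IDs, trackers) are ignored,
    # so they do not split the cache into one entry per user.
    picked = tuple(
        sorted(
            (name, morsel.value)
            for name, morsel in cookies.items()
            if name in significant
        )
    )
    return (url, picked)


if __name__ == "__main__":
    opts = frozenset({"pics"})
    a = cache_key("/somedir/somepage.html", "pics=small; session=abc", opts)
    b = cache_key("/somedir/somepage.html", "pics=small; session=xyz", opts)
    c = cache_key("/somedir/somepage.html", "pics=large; session=abc", opts)
    print(a == b, a == c)
```

Two users with different session cookies but the same 'pics' value share
one cache entry, while a different 'pics' value gets its own entry, which
is the same split a query string parameter would produce.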

If a website used the query string to pass options around, then every
page that had a different option would have to be cached differently
anyway, so this really doesn't add any additional stress to the network.
It's simply moving an option from the query string into the cookie area,
so that links posted around the internet don't contain users' individual
settings. It just doesn't make any sense for website user options to be
stored in the URL, because it makes a nonsense out of the whole concept
of setting options - anytime you happen to click on some other user's
link to the same website, it wipes out any options you set yourself.
Cookies are made for this sort of thing. Some cookies (random numbers,
tracking cookies) don't have to be treated in this way, but I think
having an additional property that makes a cookie be treated in the same
way as a query string param would be very beneficial.

I don't know what hope there is for getting anything like this actually
implemented in the standards... but if anyone has any ideas, I'm all
ears...

Thanks again,

-Neil

Re: mod_proxy distinguish cookies?

Posted by Graham Leggett <mi...@sharp.fm>.
Neil Gunton wrote:

>>Rather just use URL parameters. As I recall RFC2616 does not consider a
>>request with a different cookie a different variant, so even if you
>>patch your server to allow it to differentiate between cookies, neither
>>the browsers nor the transparent proxies in the path of the request will
>>do what you want them to do :(

> Well, that truly sucks. If you pass options around in params then
> whenever someone follows a link posted by someone else, they will
> inherit that person's options. The only alternative might be to make
> pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel
> directive (which I haven't tried, but I assume that's what it does)...
> so even though my server will get hit a lot more, at least it might be
> stopped by the proxy rather than hitting the mod_perl.

This is probably your best bet.

> From what you are saying, it would appear that HTTP is broken with
> regard to cookies and caching. I thought they had all that sorted out a
> while back. Never mind...

I would disagree - if a proxy on the net cached every variant of every 
page simply based on a cookie header, there would be so many different 
variants of the same page in the cache that from a system resource 
perspective the cache might as well not be there. Cookies only make 
sense in most cases when caching has been switched off, as the cookie is 
usually targeted at that single user only.

Your application is a unique one, in that you're trying to improve the 
performance of a single server on the net. This should be done within 
the design of that server, not by trying to change the RFC to accommodate 
what is a special case.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Mon, 3 May 2004, Neil Gunton wrote:

> Well, that truly sucks. If you pass options around in params then
> whenever someone follows a link posted by someone else, they will
> inherit that person's options. The only alternative might be to make
> pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel
> directive (which I haven't tried, but I assume that's what it does)...
> so even though my server will get hit a lot more, at least it might be
> stopped by the proxy rather than hitting the mod_perl.

The "AccelIgnoreNoCache" disables a client's "Pragma: no-cache",
"Cache-Control: no-cache" and "Cache-Control: max-age=<number>" headers.

The "AccelIgnoreExpires" disables a backend's "Expires",
"Cache-Control: no-cache" and "Cache-Control: max-age=<number>" headers.


Igor Sysoev
http://sysoev.ru/en/

Re: mod_proxy distinguish cookies?

Posted by Graham Leggett <mi...@sharp.fm>.
Roy T. Fielding wrote:

> I do wish people would read the specification to refresh their memory
> before summarizing.  RFC 2616 doesn't say anything about cookies -- it
> doesn't have to because there are already several mechanisms for marking
> a request or response as varying.  In this case
> 
>    Vary: Cookie
> 
> added to the response by the server module (the only component capable
> of knowing how the resource varies) is sufficient for caching clients
> that are compliant with HTTP/1.1.

My sentence "RFC2616 does not consider a request with a different cookie 
a different variant" should have read "RFC2616 does not recognise 
cookies specifically at all, as they are just another header". I did not 
think of the Vary case, sorry for the confusion.

Regards,
Graham
--


Re: mod_proxy distinguish cookies?

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
>> Rather just use URL parameters. As I recall RFC2616 does not consider a
>> request with a different cookie a different variant, so even if you
>> patch your server to allow it to differentiate between cookies, neither
>> the browsers nor the transparent proxies in the path of the request will
>> do what you want them to do :(
>
> Well, that truly sucks. If you pass options around in params then
> whenever someone follows a link posted by someone else, they will
> inherit that person's options.

I do wish people would read the specification to refresh their memory
before summarizing.  RFC 2616 doesn't say anything about cookies -- it
doesn't have to because there are already several mechanisms for marking
a request or response as varying.  In this case

    Vary: Cookie

added to the response by the server module (the only component capable
of knowing how the resource varies) is sufficient for caching clients
that are compliant with HTTP/1.1.  Expires and Cache-Control are usually
added as well if HTTP/1.0 caches are a problem.
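
As an illustration of what a Vary-aware cache does with such a response,
here is a simplified Python sketch. It is an assumption-laden model, not
code from Apache or any browser: the function name variant_key is made
up, and a real cache also has to handle freshness and revalidation.

```python
from typing import Optional


def variant_key(url: str, vary_header: str, request_headers: dict) -> Optional[tuple]:
    """Return a key that selects the stored variant, or None for 'Vary: *'."""
    names = sorted(n.strip().lower() for n in vary_header.split(",") if n.strip())
    if "*" in names:
        # 'Vary: *' means the cache cannot tell which requests match,
        # so the stored response cannot be reused without revalidation.
        return None
    # Header field names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((n, lowered.get(n, "")) for n in names))


if __name__ == "__main__":
    small = variant_key("/page.html", "Cookie", {"Cookie": "pics=small"})
    large = variant_key("/page.html", "Cookie", {"Cookie": "pics=large"})
    print(small != large)
```

Note that with Vary: Cookie the whole Cookie header is compared, so any
additional cookie (a session ID, for example) also produces a separate
variant.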

....Roy


Re: mod_proxy distinguish cookies?

Posted by Neil Gunton <ne...@nilspace.com>.
Graham Leggett wrote:
> 
> Neil Gunton wrote:
> 
> > The problem now is that the browsers (IE and Mozilla at least) don't
> > seem to differentiate requests based on cookies. I have tested
> > requesting a page with a certain cookie (where the page has a sufficient
> > expiration to warrant being cached for the duration of the test), and
> > then changing the cookie, and re-requesting the same page as before. The
> > cookie is different, but the browsers still seem to use their local
> > cached copy of the page. So, I am currently thinking that the solution
> > to this is to use a combination of cookies and URL parameters to make
> > the requests look different.
> 
> Rather just use URL parameters. As I recall RFC2616 does not consider a
> request with a different cookie a different variant, so even if you
> patch your server to allow it to differentiate between cookies, neither
> the browsers nor the transparent proxies in the path of the request will
> do what you want them to do :(

Well, that truly sucks. If you pass options around in params then
whenever someone follows a link posted by someone else, they will
inherit that person's options. The only alternative might be to make
pages 'No-Cache' and then set the 'AccelIgnoreNoCache' mod_accel
directive (which I haven't tried, but I assume that's what it does)...
so even though my server will get hit a lot more, at least it might be
stopped by the proxy rather than hitting the mod_perl.

From what you are saying, it would appear that HTTP is broken with
regard to cookies and caching. I thought they had all that sorted out a
while back. Never mind...

Thanks for the insight, I'll have to think about this some more it
seems. Either have extremely volatile options via URL params with page
caching, or no caching (outside of my server, which would mean a LOT
more traffic since every time someone hits 'Back' on their browser it
would think it had to re-get the page) and persistent options. Hmmm...

Any other ideas would be welcomed, but right now that's about all I can
think of...

Thanks again,

-Neil

Re: mod_proxy distinguish cookies?

Posted by Graham Leggett <mi...@sharp.fm>.
Neil Gunton wrote:

> The problem now is that the browsers (IE and Mozilla at least) don't
> seem to differentiate requests based on cookies. I have tested
> requesting a page with a certain cookie (where the page has a sufficient
> expiration to warrant being cached for the duration of the test), and
> then changing the cookie, and re-requesting the same page as before. The
> cookie is different, but the browsers still seem to use their local
> cached copy of the page. So, I am currently thinking that the solution
> to this is to use a combination of cookies and URL parameters to make
> the requests look different.

Rather just use URL parameters. As I recall RFC2616 does not consider a 
request with a different cookie a different variant, so even if you 
patch your server to allow it to differentiate between cookies, neither 
the browsers nor the transparent proxies in the path of the request will 
do what you want them to do :(

Regards,
Graham
--