Posted to users@httpd.apache.org by Sherrard Burton <sb...@allafrica.com> on 2010/07/09 18:41:20 UTC

[users@httpd] cache key generation for reverse proxy

please forgive me if this is the wrong place for this question, or if 
this has been discussed elsewhere. i searched most of the night and 
morning, and then started poring through the source code, and i'm 
pretty sure i've isolated the "issue" but need some advice as to where 
to go.

we have been using apache 1.3 as a caching reverse proxy in front of 
mod_perl backend servers for several years, and it has worked 
flawlessly. we recently took the leap to upgrade to 2.2, and everything 
is working generally well, but we are having some issues with on-disk 
caching using mod_disk_cache.

for each request, we determine which backend server to proxy off to 
based on various complex criteria, using mod_rewrite. but our setup can 
be generalized as follows: we send "human" traffic to one set of 
machines and "bot" traffic to another. the content returned for each is 
a little different, mostly based on things like not including 
image-heavy widgets and links to pages or sections of the site that 
require user login.
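
to make the generalized setup concrete, here is a rough python sketch of the routing decision. it is only an illustration: the real rules live in mod_rewrite and are more complex, the User-Agent substring check and the names BOT_MARKERS, pick_backend and proxy_target are assumptions made up for this example, and only the two backend addresses come from our actual setup.

```python
# hypothetical sketch of the human/bot split; the real rules live in
# mod_rewrite and use more criteria than a User-Agent substring check.
BOT_MARKERS = ("bot", "crawler", "spider")  # assumed markers, illustration only

HUMAN_BACKEND = "http://backend:8081"
BOT_BACKEND = "http://crawler:8082"


def pick_backend(user_agent: str) -> str:
    """Return the backend base url for a request, based on its User-Agent."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return BOT_BACKEND
    return HUMAN_BACKEND


def proxy_target(user_agent: str, path: str) -> str:
    """Build the url that the request would be proxied to."""
    return pick_backend(user_agent) + path
```

the point is that the same request path ends up at two different proxy target urls depending on who is asking.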

with apache 1.3, this worked swimmingly, as the key for the on-disk 
cache seemed to be generated based on the proxy target url, as opposed 
to the canonical request url.

$ sudo grep -rF -m1 '/stories/201007010002.html' .
./R/U/dsvSj2WzpWj5Do8S0Lcw:X-URL: http://crawler:8082/stories/201007010002.html
./x/h/EiBz@6Q5ZMJcpUnUqnAg:X-URL: http://backend:8081/stories/201007010002.html

with the move to apache 2.2, it appears, based on the behavior we are 
seeing, as well as the comments in cache_storage.c, that the default key 
generation method is based on the canonical request url, taken before 
the translate name hook runs. the end result is that, based on our 
setup, we end up with "practical key collisions", although there is not 
technically a collision. so in the above example, humans might see the 
cached version of the page as generated for bot consumption, or vice-versa.
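
a small python model of the difference, as i understand it, is below. this is not apache's actual key algorithm: cache_key is a stand-in hash, fetch is a toy cache lookup, and www.example.com is a hypothetical public hostname. the two backend urls are the ones from the grep output above.

```python
import hashlib

# hypothetical public url; the backend urls are taken from the grep output
REQUEST_URL = "http://www.example.com/stories/201007010002.html"
HUMAN_TARGET = "http://backend:8081/stories/201007010002.html"
BOT_TARGET = "http://crawler:8082/stories/201007010002.html"


def cache_key(url: str) -> str:
    """Stand-in key function: a hash of whichever url is used as the key."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def fetch(cache: dict, key_url: str, backend_body: str) -> str:
    """Toy cache: store the backend response on a miss, serve the stored one on a hit."""
    key = cache_key(key_url)
    if key not in cache:
        cache[key] = backend_body
    return cache[key]


# keyed on the canonical request url: a bot warms the cache, a human hits it
by_request = {}
fetch(by_request, REQUEST_URL, "bot version")
human_sees = fetch(by_request, REQUEST_URL, "human version")

# keyed on the proxy target url: each backend gets its own entry
by_target = {}
fetch(by_target, BOT_TARGET, "bot version")
human_sees_by_target = fetch(by_target, HUMAN_TARGET, "human version")
```

in the first case the human is served the entry cached for the bot, because both requests map to one key. in the second case the two backends never share an entry, which is the behavior we had come to rely on.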

looking at the cache key generation methods, and where they are called 
from, i could not see any way that i could, through the apache 
configuration, change this behavior. so is there a directive or module 
that i'm overlooking, or is there even a "simple" patch that i could 
apply that would get us back to the key generation behavior of 1.3, or 
something similar?

thanks in advance.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org