Posted to dev@httpd.apache.org by r....@t-online.de on 2005/05/24 22:24:07 UTC

404 does not delete cached entries using mod_disk_cache

Felix Enning pointed me again to an interesting question regarding mod_cache / mod_disk_cache:

The following situation was observed with Apache 2.0.54 (same applies to trunk):

1. A resource gets cached.
2. The original resource gets removed from the backend (e.g. on a proxied webserver,
   on the local disk, wherever).
3. The client sends a request that forces the cache to revalidate this entry.
4. The 404 received from the backend is correctly passed back to the client by mod_cache.
5. The client sends a request that does NOT require the cache to revalidate this entry.
6. Cache delivers the old resource that had been cached before, instead of a 404.

Is this behaviour intended and compliant with the RFC?

The reason for this behaviour is that the remove_url function of mod_disk_cache is a dummy function
that does nothing (BTW: mod_mem_cache seems to really remove the cache entry in remove_url).
If this behaviour is not intended, I would look into this and create a patch.
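
The six steps above can be modelled with a small stand-alone sketch. This is
not mod_cache or mod_disk_cache code: toy_cache, toy_handle_request,
remove_url_noop, remove_url_real and toy_scenario are hypothetical names made
up for illustration, and the only thing taken from the report is the idea that
the provider's remove_url hook is either a no-op or really drops the entry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

/* Hypothetical in-memory entry; not the mod_disk_cache on-disk format. */
typedef struct {
    char url[64];
    char body[64];
    int in_use;
} toy_entry;

typedef struct toy_cache {
    toy_entry entries[TOY_MAX_ENTRIES];
    /* Provider hook: returns 1 if an entry was removed, 0 otherwise. */
    int (*remove_url)(struct toy_cache *c, const char *url);
} toy_cache;

toy_entry *toy_find(toy_cache *c, const char *url)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (c->entries[i].in_use && strcmp(c->entries[i].url, url) == 0)
            return &c->entries[i];
    }
    return NULL;
}

/* Store or overwrite an entry; returns 0 if the toy cache is full. */
int toy_store(toy_cache *c, const char *url, const char *body)
{
    toy_entry *e = toy_find(c, url);
    for (int i = 0; e == NULL && i < TOY_MAX_ENTRIES; i++) {
        if (!c->entries[i].in_use)
            e = &c->entries[i];
    }
    if (e == NULL)
        return 0;
    snprintf(e->url, sizeof e->url, "%s", url);
    snprintf(e->body, sizeof e->body, "%s", body);
    e->in_use = 1;
    return 1;
}

/* "Dummy" hook: leaves the entry in place. */
int remove_url_noop(toy_cache *c, const char *url)
{
    (void)c;
    (void)url;
    return 0;
}

/* Hook that really drops the entry. */
int remove_url_real(toy_cache *c, const char *url)
{
    toy_entry *e = toy_find(c, url);
    if (e == NULL)
        return 0;
    e->in_use = 0;
    return 1;
}

/* backend_body == NULL models a 404 from the backend; a non-zero
 * revalidate forces the backend to be contacted. Returns the status. */
int toy_handle_request(toy_cache *c, const char *url,
                       const char *backend_body, int revalidate,
                       const char **out)
{
    toy_entry *e = toy_find(c, url);
    if (e != NULL && !revalidate) {
        *out = e->body;            /* served from cache, no backend contact */
        return 200;
    }
    if (backend_body == NULL) {
        c->remove_url(c, url);     /* the cache asks the provider to forget it */
        *out = NULL;
        return 404;
    }
    toy_store(c, url, backend_body);
    *out = backend_body;
    return 200;
}

/* Runs steps 1-6 and returns the status the client sees in step 6. */
int toy_scenario(int (*remove_fn)(toy_cache *, const char *))
{
    toy_cache c;
    const char *body;
    int status;

    memset(&c, 0, sizeof c);
    c.remove_url = remove_fn;

    toy_handle_request(&c, "/page", "hello", 0, &body);     /* step 1 */
    status = toy_handle_request(&c, "/page", NULL, 1, &body); /* steps 2-4 */
    assert(status == 404);
    return toy_handle_request(&c, "/page", NULL, 0, &body); /* steps 5-6 */
}
```

In this model toy_scenario(remove_url_noop) returns 200 (the stale entry is
served, as in step 6), while toy_scenario(remove_url_real) returns 404.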


Regards

Rüdiger

Re: 404 does not delete cached entries using mod_disk_cache

Posted by r....@t-online.de.
Has anybody found the time, or does anybody have the time, to look at the patch?
That would be great and much appreciated.

Thanks

Rüdiger

r.pluem@t-online.de wrote:
> Sander Striker wrote:
> 
>>r.pluem@t-online.de wrote:
> 
> 
> [..cut..]
> 
> 
>>>Is this behaviour intended and compliant with the RFC?
>>
>>
>>Not to my knowledge.  The fact that mod_mem_cache and mod_disk_cache are doing
>>different things is pretty much indicative that one of the two is wrong ;).
> 
> 
> That was also my thought.
> 
> 
>>>The reason for this behaviour is that the remove_url function of
>>>mod_disk_cache is a dummy function
>>>(BTW: mod_mem_cache seems to really remove the cache entry in
>>>remove_url).
>>>If this behaviour is not intended I would have a look into this to
>>>create a patch.
>>
>>
>>Please do!
>>
> 
> 
> I created a patch, but the problem turned out to be more complex than I originally
> thought. So a close look at the patch is definitely a good thing. Some comments:
> 
> 1. I had to adjust the cache provider API for remove_url as I need the request_rec
>    struct to remove the files correctly in mod_disk_cache.
> 
> 2. It turned out that 404 responses are not passed down the filter chain the way I expected.
>    Adjusting the default handler again proved that the changes to mod_disk_cache worked
>    (files got deleted), but this broke any error page handling in Apache. So I tried to address
>    this problem at other locations of the code. I detected two cases:
> 
>    1. Apache-generated error messages or redirects to an external source.
>    2. Custom local error documents.
> 
>    In the first case I use the insert_error_filter hook to ensure that the CACHE_SAVE filter
>    is reinserted into the filter chain if it has been inserted before during the request.
> 
>    In the second case the filter chain is run, but with the wrong URI. So I check whether there
>    is a previous request (r->prev) and whether it has the same status code (this happens in a section
>    where we only handle uncacheable status codes). If this is the case, I assume that I should delete
>    the URL of the previous request from the cache.
> 
> So any comments / thoughts on this?
> 
> 
> Regards
> 
> Rüdiger

Re: 404 does not delete cached entries using mod_disk_cache

Posted by r....@t-online.de.
Sander Striker wrote:
> r.pluem@t-online.de wrote:

[..cut..]

>> Is this behaviour intended and compliant with the RFC?
> 
> 
> Not to my knowledge.  The fact that mod_mem_cache and mod_disk_cache are doing
> different things is pretty much indicative that one of the two is wrong ;).

That was also my thought.

> 
>> The reason for this behaviour is that the remove_url function of
>> mod_disk_cache is a dummy function
>> (BTW: mod_mem_cache seems to really remove the cache entry in
>> remove_url).
>> If this behaviour is not intended I would have a look into this to
>> create a patch.
> 
> 
> Please do!
> 

I created a patch, but the problem turned out to be more complex than I originally
thought. So a close look at the patch is definitely a good thing. Some comments:

1. I had to adjust the cache provider API for remove_url as I need the request_rec
   struct to remove the files correctly in mod_disk_cache.

2. It turned out that 404 responses are not passed down the filter chain the way I expected.
   Adjusting the default handler again proved that the changes to mod_disk_cache worked
   (files got deleted), but this broke any error page handling in Apache. So I tried to address
   this problem at other locations of the code. I detected two cases:

   1. Apache-generated error messages or redirects to an external source.
   2. Custom local error documents.

   In the first case I use the insert_error_filter hook to ensure that the CACHE_SAVE filter
   is reinserted into the filter chain if it has been inserted before during the request.

   In the second case the filter chain is run, but with the wrong URI. So I check whether there
   is a previous request (r->prev) and whether it has the same status code (this happens in a section
   where we only handle uncacheable status codes). If this is the case, I assume that I should delete
   the URL of the previous request from the cache.
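
The r->prev check for custom local error documents can be sketched roughly as
follows. This is an illustration of the idea only, not the patch itself:
toy_request_rec is a hypothetical, trimmed-down stand-in for request_rec that
models just the uri, status and prev fields, and toy_url_to_invalidate is a
made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for request_rec; only the three fields the
 * description relies on are modelled here. */
typedef struct toy_request_rec {
    const char *uri;                      /* URI of this request */
    int status;                           /* HTTP status of this request */
    const struct toy_request_rec *prev;   /* request we were redirected from */
} toy_request_rec;

/* Pick the URL whose cache entry should be dropped for an uncacheable
 * response. If the request was reached through an internal redirect and
 * the previous request carries the same status code, assume we are
 * rendering an error document and return the previous request's URI. */
const char *toy_url_to_invalidate(const toy_request_rec *r)
{
    assert(r != NULL);
    if (r->prev != NULL && r->prev->status == r->status) {
        return r->prev->uri;
    }
    return r->uri;
}
```

For example, with a request for /gone.html that ends in a 404 and is
internally redirected to a custom error document such as /errors/404.html, the
helper returns /gone.html, so the stale entry for the original URL is the one
that gets removed rather than the error document's.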

So any comments / thoughts on this?


Regards

Rüdiger

Re: 404 does not delete cached entries using mod_disk_cache

Posted by Sander Striker <st...@apache.org>.
r.pluem@t-online.de wrote:
> Felix Enning pointed me again to an interesting question regarding mod_cache / mod_disk_cache:
> 
> The following situation was observed with Apache 2.0.54 (same applies to trunk):
> 
> 1. A resource gets cached.
> 2. The original resource gets removed from the backend (e.g. on a proxied webserver,
>    on the local disk, wherever).
> 3. The client sends a request that forces the cache to revalidate this entry.
> 4. The 404 received from the backend is correctly passed back to the client by mod_cache.
> 5. The client sends a request that does NOT require the cache to revalidate this entry.
> 6. Cache delivers the old resource that had been cached before, instead of a 404.
> 
> Is this behaviour intended and compliant with the RFC?

Not to my knowledge.  The fact that mod_mem_cache and mod_disk_cache are doing
different things is pretty much indicative that one of the two is wrong ;).
 
> The reason for this behaviour is that the remove_url function of mod_disk_cache is a dummy function
> (BTW: mod_mem_cache seems to really remove the cache entry in remove_url).
> If this behaviour is not intended I would have a look into this to create a patch.

Please do!

Sander