Posted to users@httpd.apache.org by Alexander Todorov <al...@gmail.com> on 2014/09/18 15:24:55 UTC

[users@httpd] Can I change how mod_disk_cache stores content on disk?

Hi guys,
is it possible to use a different directory structure/file names for storing 
content from mod_disk_cache? I don't see any way to configure this, so I assume not.

I'm running an experiment which needs to collect http objects (html pages, 
images, CSS, JavaScript, etc) and store them in some easy to access/analyze 
structure. Something like:

.../device-mac-addr/timestamp/url-or-domain-would-be-nice/content/

under content/ goes
  * the actual content
  * the headers
  * any referenced content in a subdir if this is an HTML page

I was using Apache with mod_proxy and mod_disk_cache but it looks like I can't 
get the above structure easily. Please advise of any alternatives.

Thanks,
Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Can I change how mod_disk_cache stores content on disk?

Posted by Mark Montague <ma...@catseye.org>.
On 2014-09-18 9:24, Alexander Todorov wrote:
> I'm running an experiment which needs to collect http objects (html 
> pages, images, CSS, JavaScript, etc) and store them in some easy to 
> access/analyze structure. Something like:
>
> .../device-mac-addr/timestamp/url-or-domain-would-be-nice/content/
>
> under content/ goes
>  * the actual content
>  * the headers
>  * any referenced content in a subdir if this is an HTML page

Hi, Alex,

The purpose of a cache is to serve collected HTTP resources to large 
numbers of clients as quickly as possible, while minimizing duplication 
and keeping the content as fresh as possible; this is then complicated 
by the HTTP Vary mechanism.

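What you're asking for is very different from this, so 
mod_cache_disk (the name mod_disk_cache goes by in httpd 2.4) is not a 
good solution.  For example, a MAC address is irrelevant to a cache, and 
a timestamp in the path is actually harmful, since each request would 
land in a new location instead of reusing the stored copy. 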
mod_cache_disk does, however, use the host header, port, URL path and 
query string to create a hash that it uses for its filenames and 
directory names -- this permits mod_cache_disk to find cached resources 
quickly while avoiding problems with URL length or special characters in 
filenames.
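
To make the idea concrete, here is a rough Python sketch of hashing a 
request's host, port, path and query string into a nested cache path.  
This is only an illustration of the general technique, not the 
algorithm mod_cache_disk actually uses: the key format, the choice of 
SHA-256, and the "levels"/"length" parameters are all my own 
assumptions for the example.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(host, port, path, query="", levels=2, length=2):
    """Map a request to a nested, filesystem-safe relative path.

    Illustrative only: the key layout and hash function are assumptions,
    not what mod_cache_disk does internally.
    """
    # Build one string that identifies the resource.
    key = f"{host}:{port}{path}?{query}"
    # Hashing gives a fixed-length name made of safe characters,
    # no matter how long or odd the URL is.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Use the leading characters as directory names so that no single
    # directory ends up holding every cached file.
    dirs = [digest[i * length:(i + 1) * length] for i in range(levels)]
    filename = digest[levels * length:]
    return PurePosixPath(*dirs, filename)


if __name__ == "__main__":
    print(cache_path("example.com", 80, "/index.html", "a=1"))
```

The same request always maps to the same path, which is what lets a 
cache find a stored resource without scanning the directory tree.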

In case it is helpful, you can see what is in the cache by running the 
command "htcacheclean -a -D -p/path/to/your/disk/cache".  You can also 
get more detailed information by using the "-A" option instead of "-a".  
You could then use the output from this as an index to what is in the 
cache at a particular point in time.  See 
https://httpd.apache.org/docs/2.4/programs/htcacheclean.html
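
If you want to turn that listing into an index, something along these 
lines might work.  It is a minimal sketch that assumes "htcacheclean -a" 
prints one URL per line and that htcacheclean is on your PATH; the 
function names and the cache path are placeholders I made up.

```python
import subprocess
from collections import defaultdict
from urllib.parse import urlsplit


def list_cached_urls(cache_root):
    """Run htcacheclean in list mode and return the URLs it prints.

    Assumes one URL per output line; check this against your version.
    """
    result = subprocess.run(
        ["htcacheclean", "-a", "-D", f"-p{cache_root}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def group_by_host(urls):
    """Group a list of URLs by hostname to form a simple index."""
    index = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or "unknown"
        index[host].append(url)
    return dict(index)


if __name__ == "__main__":
    # Placeholder path; substitute your own CacheRoot.
    for host, urls in group_by_host(list_cached_urls("/path/to/your/disk/cache")).items():
        print(host, len(urls))
```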

If this doesn't meet your need, you might want to look into writing your 
own module to do exactly what you need for your experiment.
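
Whatever does the capturing, the layout you describe is easy to 
generate.  As a rough Python sketch (the function name, the timestamp 
format and the way the MAC address is normalized are all assumptions 
of mine, not anything httpd provides):

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def capture_dir(root, mac, when, url):
    """Build .../device-mac-addr/timestamp/domain/content for one object."""
    # Colons are awkward in directory names on some systems.
    mac_part = mac.lower().replace(":", "-")
    # A compact UTC timestamp sorts correctly as plain text.
    ts_part = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    host = urlsplit(url).hostname or "unknown-host"
    return PurePosixPath(root) / mac_part / ts_part / host / "content"


if __name__ == "__main__":
    when = datetime(2014, 9, 18, 15, 24, 55, tzinfo=timezone.utc)
    print(capture_dir("/data", "AA:BB:CC:00:11:22", when, "http://example.com/index.html"))
```

The body, the headers and any referenced objects would then be written 
under the returned directory.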

-- 
   Mark Montague
   mark@catseye.org

