Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 2010/09/16 02:42:02 UTC

mod_disk_cache: making commit_entity() atomic

Hi all,

Now that the three ad hoc file writes in mod_disk_cache have been  
grouped together into one place (commit_entity()), the last step is to  
ensure that the three writes happen atomically.

Ideally, the locking of URL A should have no effect on URL B, so a  
separate lock file per .header/.data pair would be needed.

Will a simple apr_file_lock() on a per-URL lock file do the trick, or  
will performance be a killer?
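
For illustration, a minimal sketch of the per-URL lock file idea. It uses
POSIX flock() as a stand-in for apr_file_lock(), and the helper names
url_lock()/url_unlock() and the "<base>.lock" naming are assumptions made
up for this example, not anything in mod_disk_cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive lock on "<base>.lock", where
 * base is the path prefix shared by one entry's .header/.data files.
 * Returns the lock fd, or -1 on error. Writers of other URLs use other
 * lock files and are not blocked. */
int url_lock(const char *base)
{
    char lockpath[4096];
    int fd, n;

    assert(base != NULL);
    n = snprintf(lockpath, sizeof(lockpath), "%s.lock", base);
    if (n < 0 || (size_t)n >= sizeof(lockpath)) {
        return -1;                      /* path too long */
    }
    fd = open(lockpath, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX) != 0) {      /* blocks until the lock is free */
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by url_lock() and close the lock file. */
int url_unlock(int fd)
{
    int rv = flock(fd, LOCK_UN);
    close(fd);
    return rv;
}
```

The three writes would happen between url_lock() and url_unlock(). The
cost per commit is an extra open, lock, unlock and close, plus one more
file per entry on disk.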

The alternative is to change the format of the disk cache so that  
the .data file has a temporary filename, for example XYZ.data.12367  
instead of just XYZ.data, and then to key the string "12367" in the  
header file. The new data file can be written alongside the old one if  
necessary at leisure, and only when the header file is renamed into  
place will the new data file come into effect. This avoids locks  
entirely.
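
The scheme above could be sketched roughly as follows, using plain
stdio and rename() rather than APR. The function names, the
".header.tmp.<gen>" temporary name and the one-line header format are
all assumptions for the example; the real header file carries much more
than a generation string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write len bytes to path; returns 0 on success. */
static int write_all(const char *path, const char *buf, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    if (fwrite(buf, 1, len, f) != len) { fclose(f); return -1; }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical commit: write "<base>.data.<gen>", then a temporary
 * header naming that generation, then rename the header into place.
 * Readers see either the old header or the new one, never a mix.
 * Path truncation checks and fsync are omitted for brevity. */
int commit_entry(const char *base, unsigned long gen,
                 const char *body, size_t len)
{
    char data[4096], tmp[4096], hdr[4096], line[32];

    assert(base != NULL);
    snprintf(data, sizeof(data), "%s.data.%lu", base, gen);
    snprintf(tmp, sizeof(tmp), "%s.header.tmp.%lu", base, gen);
    snprintf(hdr, sizeof(hdr), "%s.header", base);
    snprintf(line, sizeof(line), "%lu\n", gen);

    if (write_all(data, body, len) != 0) return -1;
    if (write_all(tmp, line, strlen(line)) != 0) return -1;
    return rename(tmp, hdr) == 0 ? 0 : -1;  /* the only visible step */
}

/* Read the generation recorded in "<base>.header". */
int current_generation(const char *base, unsigned long *gen)
{
    char hdr[4096];
    FILE *f;
    int ok;

    snprintf(hdr, sizeof(hdr), "%s.header", base);
    f = fopen(hdr, "rb");
    if (f == NULL) return -1;
    ok = (fscanf(f, "%lu", gen) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

The old data file is left in place in this sketch, so something else
would have to clean up generations that no header refers to any more.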

Thoughts?

Regards,
Graham
--

Re: mod_disk_cache: making commit_entity() atomic

Posted by Graham Leggett <mi...@sharp.fm>.

On 20 Sep 2010, at 12:52 PM, Niklas Edmundsson wrote:

> As we cache files from an nfs mount, we hash on device:inode as a  
> simple method of reducing duplicates of files (say a dozen URL:s all  
> resolving to the same DVD image). We see a huge benefit of being  
> able to do this as we get a grotesque amount of data duplication  
> otherwise.
>
> So we usually have multiple header files all pointing to the same  
> data file.
>
> For the more generic cache it might also be useful, provided that you  
> have a mechanism to identify duplicated data. The only thing I can  
> think of is hashing on the data block, but that isn't really feasible  
> for large files. I suspect there might be use cases where a backend  
> can provide hints for this, though.

I think this use case is bordering on something that would need to be  
in its own module, rather than trying to stretch mod_disk_cache to be  
aware of FILE buckets. Something like mod_diskfile_cache (or  
something, mod_file_cache already exists and probably should have been  
called mod_fd_cache, but oh well).

Hmmm...

I notice the interface for create_entity() in the cache provider  
doesn't pass the output bucket brigade through to the provider.

This would be useful in this case, because a dedicated file caching  
provider module might want to look inside the brigade to see if it  
contains a single FILE bucket, and if not, to DECLINE the request to  
cache.

Does such a change sound sensible?

     int (*create_entity) (cache_handle_t *h, request_rec *r,
                           const char *urlkey, apr_off_t len,
                           apr_bucket_brigade *bb);
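
To make the idea concrete, here is a rough sketch of the check such a
provider might perform. The types and constants below are simplified
stand-ins invented for this example and are NOT the APR bucket API.
Allowing a trailing EOS bucket after the FILE bucket is also an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for illustration only; not the real APR types. */
enum sketch_bucket_type { SKETCH_FILE, SKETCH_HEAP, SKETCH_EOS };

struct sketch_bucket {
    enum sketch_bucket_type type;
    struct sketch_bucket *next;     /* NULL terminates the brigade */
};

#define SKETCH_OK        0
#define SKETCH_DECLINED  (-1)

/* Accept the request only if the brigade is exactly one FILE bucket,
 * optionally followed by EOS; otherwise decline to cache. */
int sketch_create_entity(const struct sketch_bucket *bb)
{
    if (bb == NULL || bb->type != SKETCH_FILE) {
        return SKETCH_DECLINED;
    }
    bb = bb->next;
    if (bb != NULL && bb->type == SKETCH_EOS) {
        bb = bb->next;
    }
    return bb == NULL ? SKETCH_OK : SKETCH_DECLINED;
}
```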

Regards,
Graham
--

Re: mod_disk_cache: making commit_entity() atomic

Posted by Niklas Edmundsson <ni...@acc.umu.se>.

On Fri, 17 Sep 2010, Graham Leggett wrote:

> On 17 Sep 2010, at 1:41 PM, Niklas Edmundsson wrote:
>
>> I personally favor designs that need at most O_EXCL style write locking.
>> 
>> Having been bitten by various lock-related issues over the years I'm in 
>> favor of an explicit-lock-free design if it can be done cleanly and with 
>> good performance.
>> 
>> If going this route, I'd suggest putting the entire path to the data file in 
>> the header and not just a uniquifying string (to make it easier to split 
>> hashing of header and data in the future).
>
> The problem with this is that if you bake the location of the file into the 
> cache, you would never be able to move the cache around.

Just store the path relative to the cache root.
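
A small sketch of what that could look like at read time, assuming the
header stores something like "x/y/XYZ.data.12367". The helper name and
the rejection rules are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join the configured cache root with the relative
 * path stored in a header file. Returns 0 on success, -1 if the stored
 * path is absolute, contains "..", or does not fit in out. */
int resolve_data_path(const char *root, const char *rel,
                      char *out, size_t outlen)
{
    int n;

    assert(root != NULL && rel != NULL && out != NULL);
    /* Crude safety check: refuse anything that could escape the root. */
    if (rel[0] == '/' || strstr(rel, "..") != NULL) {
        return -1;
    }
    n = snprintf(out, outlen, "%s/%s", root, rel);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because only the root changes, moving the cache directory just means
pointing the configured root somewhere else; the headers stay valid.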

> Is there a benefit to keeping headers and bodies separate?

As we cache files from an nfs mount, we hash on device:inode as a 
simple method of reducing duplicates of files (say a dozen URLs all 
resolving to the same DVD image). We see a huge benefit of being able 
to do this as we get a grotesque amount of data duplication otherwise.

So we usually have multiple header files all pointing to the same data 
file.
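
As a rough illustration of the device:inode idea (not the actual code
we run), a body key can be derived from stat() so that any path leading
to the same file maps to the same data file. The function names and the
hex "dev:inode" key format are assumptions for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: build a key from the device and inode numbers of
 * the file behind path. Returns 0 on success, -1 on error. */
int body_key(const char *path, char *out, size_t outlen)
{
    struct stat st;
    int n;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0) {
        return -1;
    }
    n = snprintf(out, outlen, "%llx:%llx",
                 (unsigned long long)st.st_dev,
                 (unsigned long long)st.st_ino);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Returns 1 if both paths resolve to the same underlying file, else 0. */
int same_body(const char *p1, const char *p2)
{
    char k1[64], k2[64];

    if (body_key(p1, k1, sizeof(k1)) != 0) return 0;
    if (body_key(p2, k2, sizeof(k2)) != 0) return 0;
    return strcmp(k1, k2) == 0;
}
```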

For the more generic cache it might also be useful, provided that you 
have a mechanism to identify duplicated data. The only thing I can 
think of is hashing on the data block, but that isn't really feasible 
for large files. I suspect there might be use cases where a backend 
can provide hints for this, though.

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  "Come on, higher now! A watcher scoffs at gravity!" - Giles
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Re: mod_disk_cache: making commit_entity() atomic

Posted by Graham Leggett <mi...@sharp.fm>.

On 17 Sep 2010, at 1:41 PM, Niklas Edmundsson wrote:

> I personally favor designs that need at most O_EXCL style write locking.
>
> Having been bitten by various lock-related issues over the years I'm  
> in favor of an explicit-lock-free design if it can be done cleanly  
> and with good performance.
>
> If going this route, I'd suggest putting the entire path to the data  
> file in the header and not just a uniquifying string (to make it  
> easier to split hashing of header and data in the future).

The problem with this is that if you bake the location of the file  
into the cache, you would never be able to move the cache around.

Is there a benefit to keeping headers and bodies separate?

Regards,
Graham
--

Re: mod_disk_cache: making commit_entity() atomic

Posted by Niklas Edmundsson <ni...@acc.umu.se>.

On Thu, 16 Sep 2010, Graham Leggett wrote:

> The alternative is to change the format of the disk cache so that the .data 
> file has a temporary filename, for example XYZ.data.12367 instead of just 
> XYZ.data, and then to key the string "12367" in the header file. The new data 
> file can be written alongside the old one if necessary at leisure, and only 
> when the header file is renamed into place will the new data file come into 
> effect. This avoids locks entirely.

I personally favor designs that need at most O_EXCL style write locking.

Having been bitten by various lock-related issues over the years I'm 
in favor of an explicit-lock-free design if it can be done cleanly and 
with good performance.

If going this route, I'd suggest putting the entire path to the data 
file in the header and not just a uniquifying string (to make it easier 
to split hashing of header and data in the future).
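
By "O_EXCL style write locking" I mean something along these lines: the
exclusive create itself decides which writer wins, and no separate lock
call is involved. The helper name is made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to become the single writer of path.
 * Returns a writable fd if this caller created the file, or -1 with
 * errno set to EEXIST if some other writer got there first. */
int claim_for_write(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

A loser simply backs off and serves the request without caching it;
nothing ever blocks waiting on a lock.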

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  Fight War, Not Wars!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=