Posted to dev@httpd.apache.org by Niklas Edmundsson <ni...@acc.umu.se> on 2008/06/19 20:44:28 UTC

mod_disk_cache jumbopatch - 2.2.9 version

Hi all!

I've uploaded our most recent mod_disk_cache jumbopatch for httpd 
2.2.9 to https://issues.apache.org/bugzilla/show_bug.cgi?id=39380 for 
those interested.

Includes the submitted fixes for:
"FIX htcacheclean" - Since we use a script that looks at atime we 
don't use htcacheclean. The fix looks sane though.

"FIX corruption in near-simultaneous requests of uncached files" - 
We're not hitting this, so it probably only affects usage in 
combination with proxies.

Major additional fixes:

- Adapt to recent APR sub-second file timestamps, meaning that we have
   to truncate to whole-second granularity when comparing HTTP
   timestamps with file timestamps.
- Be a tad more clever when trying to detect corrupted files
   (commonly caused by a machine using XFS crashing). It's perfectly
   valid to have the consumed size smaller than the actual size, for
   example on filesystems with compression (Sun ZFS). Also, don't check
   consumed size on new files, since ZFS, for example, only updates it
   when data is committed to disk.
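A minimal Python sketch of the two checks above, for illustration only; it is not the code from the patch. It assumes file timestamps in microseconds (the unit APR's apr_time_t uses) and Unix-style st_blocks counted in 512-byte units. The function names and the 60-second grace period for new files are hypothetical.

```python
from email.utils import parsedate_to_datetime

USEC_PER_SEC = 1_000_000  # file timestamps assumed to be in microseconds


def not_modified_since(file_mtime_usec: int, http_date: str) -> bool:
    """True if the file is no newer than an If-Modified-Since style date.

    HTTP dates have one-second resolution, so the sub-second file
    timestamp is truncated to whole seconds before comparing. Without
    that, a file written at 18:44:28.5 would always look newer than
    "18:44:28 GMT", even though it is the same unchanged file.
    """
    http_secs = int(parsedate_to_datetime(http_date).timestamp())
    return file_mtime_usec // USEC_PER_SEC <= http_secs


def looks_corrupted(size: int, blocks_512: int, age_secs: float,
                    grace_secs: float = 60.0) -> bool:
    """Hypothetical check for cache files left damaged by a crash.

    Consumed size below the apparent size is NOT flagged, since
    compressing filesystems such as ZFS legitimately do that. Files
    younger than grace_secs are skipped, because the consumed size may
    not be updated until data is committed to disk. Only a non-empty
    file that still occupies zero blocks after the grace period is flagged.
    """
    if size == 0 or age_secs < grace_secs:
        return False
    return blocks_512 == 0
```

On a live cache file the inputs would come from os.stat(): st_size, st_blocks and time.time() - st_mtime.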

It works for us, and has survived the recent Mozilla release, so it 
should be fairly stable.

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  "No, no, no. I don't get wild. Wild on me, equals spaz." - Willow
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Re: mod_disk_cache jumbopatch - 2.2.9 version

Posted by "Akins, Brian" <Br...@turner.com>.
On 6/19/08 5:35 PM, "Niklas Edmundsson" <ni...@acc.umu.se> wrote:

> Brian: I remember you talking about some in-house modifications using
> DBM or something to track accesses to cached data and using it to find
> candidates to remove. Care to share?

On every cache store, we write some info out to a pipe (URL, filename,
time, expire time, size, and probably some more stuff). This pipe is read
by a really simple Perl script that sticks this stuff in a MySQL database.
The actual "cache manager" is a FastCGI script for "manual" ejections and
is run "on the command line" via cron to "prune" the cache (it includes a
small bit of Inline C). This was done as a prototype, but works so well
that we kept it. I think this was in one of my ApacheCon presentations.
I'll try to look it up.
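A rough sketch of the pipe-reader half of such a setup, assuming one tab-separated record per cache store. Python's sqlite3 stands in for the Perl script and MySQL database described above, and the record layout, table and column names are all hypothetical.

```python
import sqlite3
from typing import Iterable

# Hypothetical record layout, one line per cache store:
#   url <TAB> filename <TAB> store_time <TAB> expire_time <TAB> size
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache_entries (
    filename   TEXT PRIMARY KEY,
    url        TEXT NOT NULL,
    stored_at  INTEGER NOT NULL,
    expires_at INTEGER NOT NULL,
    size       INTEGER NOT NULL
)
"""


def load_store_log(lines: Iterable[str], db: sqlite3.Connection) -> int:
    """Insert cache-store records into the database; return rows written."""
    db.execute(SCHEMA)
    written = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed lines instead of killing the reader
        url, filename, stored, expires, size = fields
        # Re-storing the same file replaces its old row.
        db.execute(
            "INSERT OR REPLACE INTO cache_entries VALUES (?, ?, ?, ?, ?)",
            (filename, url, int(stored), int(expires), int(size)),
        )
        # Commit per record so a pipe that never closes still persists data.
        db.commit()
        written += 1
    return written
```

In practice the lines would come from the read end of the pipe (for example a file object opened on a named FIFO), and a separate cron job would query cache_entries to pick what to prune.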

Wish I could share source... That's a discussion I have been having here
for 5 years...


BTW, XFS sucked for us.  It would randomly go into la-la land and nothing
could be read or written.  A mildly tuned ext3 seems to be good enough,
although some of our caches live in /dev/shm...

-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies


Re: mod_disk_cache jumbopatch - 2.2.9 version

Posted by Niklas Edmundsson <ni...@acc.umu.se>.
On Thu, 19 Jun 2008, Akins, Brian wrote:

> On 6/19/08 2:44 PM, "Niklas Edmundsson" <ni...@acc.umu.se> wrote:
>
>> Includes the submitted fixes for:
>> "FIX htcacheclean" - Since we use a script that looks at atime we
>> don't use htcacheclean. The fix looks sane though.
>
> FWIW, we saw a nice increase in performance when we turned off atime...  We
> eject things from the cache via an htcacheclean-like script that first
> ejects expired objects then trims based on expire time and size.

Hmm, I vaguely remember having had this discussion before :)

Anyway, we serve mainly big files, and thus don't get that many 
requests/s. The Mozilla mirror is actually our extreme case for serving 
small files (a few MB each), and even then we have no real trouble 
filling a GigE link with 5-ish-year-old hardware (provided the cache 
disks are good enough; a couple of 10k RPM U160 SCSI drives is fine).
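To put rough numbers on why big files mean few requests/s: saturating a link needs only (link bits/s / 8) / (average file size) full responses per second. A quick check in Python, assuming an average file of 3 MB for the "a few MB" case (the 3 MB figure is an assumption, not from the post) and ignoring protocol overhead:

```python
def requests_per_second(link_bits_per_sec: float, avg_file_bytes: float) -> float:
    """Full-file responses per second needed to fill a link, ignoring overhead."""
    return link_bits_per_sec / 8 / avg_file_bytes


# GigE (1e9 bit/s) with ~3 MB files: roughly 42 requests/s.
print(f"{requests_per_second(1e9, 3e6):.1f}")
```

With DVD-sized ISOs the same link is filled by well under one new request per second.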

We use Linux+XFS, and it copes with atime updates rather well (I think 
it buffers atime updates and delays committing them to disk, something 
any sane OS/filesystem combo should do).

One big plus is that by using atime we don't accidentally purge DVD ISOs 
that people are currently downloading over slow connections, a problem 
that is quite obvious with the method of updating the last-access time 
only when a request arrives.
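A minimal sketch of atime-based victim selection, assuming the filesystem records read access times (i.e. it is not mounted noatime). This is not our actual cleaning script; the helper names are hypothetical, and the sketch only picks victims, leaving the actual unlink() to the caller.

```python
import os
from typing import List, Tuple

AtimeEntry = Tuple[float, int, str]  # (atime, size in bytes, path)


def scan_cache(cache_root: str) -> List[AtimeEntry]:
    """Collect (atime, size, path) for every file under cache_root."""
    entries: List[AtimeEntry] = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # stat() itself does not touch atime
            except FileNotFoundError:
                continue  # removed by someone else during the walk
            entries.append((st.st_atime, st.st_size, path))
    return entries


def select_victims(entries: List[AtimeEntry], max_bytes: int) -> List[str]:
    """Least recently accessed files to remove so the rest fits max_bytes.

    Assuming atime updates are enabled, a file that a slow client is
    still downloading keeps getting its atime refreshed by the reads,
    so it sorts as recently used and is not picked.
    """
    total = sum(size for _atime, size, _path in entries)
    victims: List[str] = []
    for _atime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        victims.append(path)
        total -= size
    return victims
```

Note that scan_cache() still has to walk the whole tree, which is exactly the part that gets slow on a loaded, nearly full disk.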

The only real issue is cleaning when under high load and close to 
disk-full: traversing the filesystem then seems to take forever.

Brian: I remember you talking about some in-house modifications using 
DBM or something to track accesses to cached data and using it to find 
candidates to remove. Care to share?

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     nikke@acc.umu.se
---------------------------------------------------------------------------
  The worst thing about censorship is ██████████.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Re: mod_disk_cache jumbopatch - 2.2.9 version

Posted by "Akins, Brian" <Br...@turner.com>.
On 6/19/08 2:44 PM, "Niklas Edmundsson" <ni...@acc.umu.se> wrote:

> Includes the submitted fixes for:
> "FIX htcacheclean" - Since we use a script that looks at atime we
> don't use htcacheclean. The fix looks sane though.

FWIW, we saw a nice increase in performance when we turned off atime...  We
eject things from the cache via an htcacheclean-like script that first
ejects expired objects then trims based on expire time and size.
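One reading of that two-pass policy, as a hedged Python sketch: eject everything already expired, then eject the objects closest to expiry until the cache is under a size limit. The CacheEntry type, max_bytes and the exact ordering are assumptions; the post doesn't say precisely how expire time and size are combined.

```python
from typing import List, NamedTuple


class CacheEntry(NamedTuple):
    path: str
    expires_at: float  # Unix time
    size: int          # bytes


def plan_ejections(entries: List[CacheEntry], now: float, max_bytes: int) -> List[str]:
    """Pass 1: every expired object. Pass 2: objects closest to expiry
    until the remaining live objects fit within max_bytes."""
    victims = [e.path for e in entries if e.expires_at <= now]
    live = sorted((e for e in entries if e.expires_at > now),
                  key=lambda e: e.expires_at)
    total = sum(e.size for e in live)
    for e in live:
        if total <= max_bytes:
            break
        victims.append(e.path)
        total -= e.size
    return victims
```

Since this works from stored expiry metadata rather than atime, it keeps working with atime turned off.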
 

