You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by "William A. Rowe Jr." <wr...@rowe-clan.net> on 2010/06/09 18:30:45 UTC

mod_socache_shmcb bogosity?

Just noticed that our shmcb socache never replaces an identical node
on ->store, leading to multiple entries for the same id (with different
expiries and data, obviously).

Is this deliberate?  What is the distcache/memcached/dbm behavior, are
they all replacing the existing node?  What is the programming contract?

Re: mod_socache_shmcb bogosity?

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.

On 6/11/2010 5:58 AM, Joe Orton wrote:
> On Thu, Jun 10, 2010 at 07:28:55PM -0500, William Rowe wrote:
>> Have not checked specifically, the problem I observed was ejecting other
>> unexpired elts as other records were repeatedly updated.
>>
>> It seems the simple fix is to change the pre-store free logic,
>>
>>  * expire records [do this only if space needed?  right now, it's on all stores]
>>
>>  * if enospace, compress removed/expired records until space is available
>>
>>  * if still enospace, drop the first-to-expire [current behavior]
> 
> Looking at this... currently the code assumes that expiry times of the 
> objects are in time order, which is rather a leap of faith now.  It does 
> allow expiry to be extremely cheap though; you just have to move the 
> index pointers to the first still-valid entry.

Right - the expires shouldn't have to be in sequence, probably should be
presumed not to be in sequence.  However, in order to salvage space, we
need to move some bytes on occassion.  I'm thinking in terms of compressing
the most recent space to gain a hole, until we ultimately compress the
entire block.  If we keep these of 'managable size' (yes, define manageable)
then the impact should not be that great.

> I'd start to get rather worried about the performance impact of doing 
> anything more complicated, esp if by "compress" you mean starting to 
> actually moving data records about.

That's why I suggested a greater number of subcaches.  We need solid holes
if we are going to maintain the usability of the subcache.  But there is
no sense in compressing spaces we don't need 'now'.

>> Seem sensible?
>>
>>> w.r.t. contract: specific behaviour will vary across the providers; 
>>> could nail down the ->retrieve API with a bunch of stuff which is *not* 
>>> guaranteed, I suppose, is that what you're thinking of?  "object 
>>> returned may be out-of-date/stale, etc"
>>
>> If we know it may be stale, we should declare that (in the shmcb case, this
>> never happens, IIRC).  We obviously should declare this is a lossy cache.
> 
> Yeah, I was thinking of memcache w.r.t. staleness.  "lossy cache" is a 
> tautology!

:)

>> In other events, the current logic of max 256 caches keyed on the first byte
>> of ID is not sufficiently resuable.  What do folks suggest for the most efficient
>> hash method?
>>
>> I'm thinking we add to the hints that hashing is required, or assuring that
>> the data is 8-bit hashed already and leading bytes are just fine for choosing
>> subcaches.  In order to achieve the goal above of compressing records, it would
>> seem sensible to favor more subcaches with no more than 64k worth of elts, which
>> ensures any compressing memcpy's are more efficient.
> 
> Yes, this is a great point, and adding a "ids not random" hint seems 
> like a really excellent idea.
> 
> Could we just use apr_hashfunc_default() as the hash and XOR the bytes 
> of the returned int down to a single byte rather than import something 
> else?

Well, two bytes (if ) - but sure, apr_hashfunc_default() is fine.  That's sensible,
I'll work something up now.

Re: mod_socache_shmcb bogosity?

Posted by Joe Orton <jo...@redhat.com>.

On Thu, Jun 10, 2010 at 07:28:55PM -0500, William Rowe wrote:
> Have not checked specifically, the problem I observed was ejecting other
> unexpired elts as other records were repeatedly updated.
> 
> It seems the simple fix is to change the pre-store free logic,
> 
>  * expire records [do this only if space needed?  right now, it's on all stores]
>
>  * if enospace, compress removed/expired records until space is available
>
>  * if still enospace, drop the first-to-expire [current behavior]

Looking at this... currently the code assumes that expiry times of the 
objects are in time order, which is rather a leap of faith now.  It does 
allow expiry to be extremely cheap though; you just have to move the 
index pointers to the first still-valid entry.

I'd start to get rather worried about the performance impact of doing 
anything more complicated, esp if by "compress" you mean starting to 
actually moving data records about.

> Seem sensible?
> 
> > w.r.t. contract: specific behaviour will vary across the providers; 
> > could nail down the ->retrieve API with a bunch of stuff which is *not* 
> > guaranteed, I suppose, is that what you're thinking of?  "object 
> > returned may be out-of-date/stale, etc"
> 
> If we know it may be stale, we should declare that (in the shmcb case, this
> never happens, IIRC).  We obviously should declare this is a lossy cache.

Yeah, I was thinking of memcache w.r.t. staleness.  "lossy cache" is a 
tautology!

> In other events, the current logic of max 256 caches keyed on the first byte
> of ID is not sufficiently resuable.  What do folks suggest for the most efficient
> hash method?
> 
> I'm thinking we add to the hints that hashing is required, or assuring that
> the data is 8-bit hashed already and leading bytes are just fine for choosing
> subcaches.  In order to achieve the goal above of compressing records, it would
> seem sensible to favor more subcaches with no more than 64k worth of elts, which
> ensures any compressing memcpy's are more efficient.

Yes, this is a great point, and adding a "ids not random" hint seems 
like a really excellent idea.

Could we just use apr_hashfunc_default() as the hash and XOR the bytes 
of the returned int down to a single byte rather than import something 
else?

Regards, Joe

Re: mod_socache_shmcb bogosity?

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.

On 6/10/2010 7:15 AM, Joe Orton wrote:
> On Wed, Jun 09, 2010 at 11:30:45AM -0500, William Rowe wrote:
>> Just noticed that our shmcb socache never replaces an identical node
>> on ->store, leading to multiple entries for the same id (with different
>> expiries and data, obviously).
>>
>> Is this deliberate?  What is the distcache/memcached/dbm behavior, are
>> they all replacing the existing node?  What is the programming contract?
> 
> It's a cyclic buffer - the most recently stored entry should be returned 
> by a retrieve operation... the old entry should effectively be invisible 
> to the caller.  Is that not working?

Have not checked specifically, the problem I observed was ejecting other
unexpired elts as other records were repeatedly updated.

It seems the simple fix is to change the pre-store free logic,

 * expire records [do this only if space needed?  right now, it's on all stores]

 * if enospace, compress removed/expired records until space is available

 * if still enospace, drop the first-to-expire [current behavior]

Seem sensible?

> w.r.t. contract: specific behaviour will vary across the providers; 
> could nail down the ->retrieve API with a bunch of stuff which is *not* 
> guaranteed, I suppose, is that what you're thinking of?  "object 
> returned may be out-of-date/stale, etc"

If we know it may be stale, we should declare that (in the shmcb case, this
never happens, IIRC).  We obviously should declare this is a lossy cache.

In other events, the current logic of max 256 caches keyed on the first byte
of ID is not sufficiently resuable.  What do folks suggest for the most efficient
hash method?

I'm thinking we add to the hints that hashing is required, or assuring that
the data is 8-bit hashed already and leading bytes are just fine for choosing
subcaches.  In order to achieve the goal above of compressing records, it would
seem sensible to favor more subcaches with no more than 64k worth of elts, which
ensures any compressing memcpy's are more efficient.

Re: mod_socache_shmcb bogosity?

Posted by Joe Orton <jo...@redhat.com>.

On Wed, Jun 09, 2010 at 11:30:45AM -0500, William Rowe wrote:
> Just noticed that our shmcb socache never replaces an identical node
> on ->store, leading to multiple entries for the same id (with different
> expiries and data, obviously).
>
> Is this deliberate?  What is the distcache/memcached/dbm behavior, are
> they all replacing the existing node?  What is the programming contract?

It's a cyclic buffer - the most recently stored entry should be returned 
by a retrieve operation... the old entry should effectively be invisible 
to the caller.  Is that not working?

w.r.t. contract: specific behaviour will vary across the providers; 
could nail down the ->retrieve API with a bunch of stuff which is *not* 
guaranteed, I suppose, is that what you're thinking of?  "object 
returned may be out-of-date/stale, etc"

Regards, Joe