You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2015/05/19 12:24:26 UTC

reversible fingerprint comment in membuffer code

Is this comment in cache-membuffer.c:combine_key correct?

      /* scramble key DATA.  All of this must be reversible to prevent key
       * collisions.  So, we limit ourselves to xor and permutations. */
      data[1] = (data[1] << 27) | (data[1] >> 37);
      data[1] ^= data[0] & 0xffff;
      data[0] ^= data[1] & APR_UINT64_C(0xffffffffffff0000);

I don't see why this needs to be reversible, and it's not clear it is
reversible.

The comment was added in r1458643 on the cache-server branch
http://svn.apache.org/viewvc?view=revision&revision=r1458643

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: reversible fingerprint comment in membuffer code

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Fuhrmann wrote:
> Julian Foad  wrote:
>> Stefan Sperling wrote:
>>> Is it worth extending the comment in the code with information from
>>> this thread?
>>
>> IMO, Yes it is worth it.
>
> O.k. Done in r1681960 and r1681965.

Thanks!

- Julian

Re: reversible fingerprint comment in membuffer code

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, May 25, 2015 at 11:26 PM, Julian Foad <ju...@btopenworld.com>
wrote:

> Stefan Sperling wrote:
> > Is it worth extending the comment in the code with information from
> > this thread?
>
> IMO, Yes it is worth it.
>

O.k. Done in r1681960 and r1681965.

-- Stefan^2.

Re: reversible fingerprint comment in membuffer code

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Sperling wrote:
> Is it worth extending the comment in the code with information from
> this thread?

IMO, Yes it is worth it.

- Julian

Re: reversible fingerprint comment in membuffer code

Posted by Stefan Sperling <st...@elego.de>.
On Wed, May 20, 2015 at 08:36:23AM +0200, Stefan Fuhrmann wrote:
> On Tue, May 19, 2015 at 8:25 PM, Philip Martin <ph...@wandisco.com>
> wrote:
> > I talked to Stefan and I believe I can explain.
 
> Yes.

> Exactly.

Is it worth extending the comment in the code with information from
this thread?

Re: reversible fingerprint comment in membuffer code

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, May 19, 2015 at 8:25 PM, Philip Martin <ph...@wandisco.com>
wrote:

> Philip Martin <ph...@wandisco.com> writes:
>
> > Is this comment in cache-membuffer.c:combine_key correct?
> >
> >       /* scramble key DATA.  All of this must be reversible to prevent
> key
> >        * collisions.  So, we limit ourselves to xor and permutations. */
> >       data[1] = (data[1] << 27) | (data[1] >> 37);
> >       data[1] ^= data[0] & 0xffff;
> >       data[0] ^= data[1] & APR_UINT64_C(0xffffffffffff0000);
> >
> > I don't see why this needs to be reversible, and it's not clear it is
> > reversible.
>
> I talked to Stefan and I believe I can explain.  Reversibility is not a
> hard requirement on trunk or 1.9 at present, since we store the full
> key, but may be a requirement in future.  This code is for keys that fit
> in the fingerprint and when generating the fingerprint from such keys we
> want to avoid introducing collisions, i.e. distinct keys should generate
> distinct fingerprints.  The cache would cope with collisions but is
> better without.  A reversible scramble is one way to ensure no
> collisions are introduced.
>

Yes. While the restriction implied by the comment does
not apply as strictly to /trunk, it will apply again once the
1.10-cache-improvements branch gets merged.

The other reason for the scramble is that it spreads the variation for
> small keys across more bytes of the fingerprint and thus across the
> cache.
>

Exactly.

-- Stefan^2.

Re: reversible fingerprint comment in membuffer code

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Is this comment in cache-membuffer.c:combine_key correct?
>
>       /* scramble key DATA.  All of this must be reversible to prevent key
>        * collisions.  So, we limit ourselves to xor and permutations. */
>       data[1] = (data[1] << 27) | (data[1] >> 37);
>       data[1] ^= data[0] & 0xffff;
>       data[0] ^= data[1] & APR_UINT64_C(0xffffffffffff0000);
>
> I don't see why this needs to be reversible, and it's not clear it is
> reversible.

I talked to Stefan and I believe I can explain.  Reversibility is not a
hard requirement on trunk or 1.9 at present, since we store the full
key, but may be a requirement in future.  This code is for keys that fit
in the fingerprint and when generating the fingerprint from such keys we
want to avoid introducing collisions, i.e. distinct keys should generate
distinct fingerprints.  The cache would cope with collisions but is
better without.  A reversible scramble is one way to ensure no
collisions are introduced.

The other reason for the scramble is that it spreads the variation for
small keys across more bytes of the fingerprint and thus across the
cache.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*