You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2014/03/07 23:11:24 UTC

svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Author: stefan2
Date: Fri Mar  7 22:11:24 2014
New Revision: 1575428

URL: http://svn.apache.org/r1575428
Log:
In FSFS, remove padding from cache key structs to simplify handling.
Also, document why we want them to be 16 bytes whenever we can help it.

* subversion/libsvn_fs_fs/fs.h
  (pair_cache_key_t): Widen revision type to eliminate padding on
                      platforms with sizeof(long) != 8.  Document the
                      struct size rationale.
  (window_cache_key_t): Document the struct size rationale.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/fs.h

Modified: subversion/trunk/subversion/libsvn_fs_fs/fs.h
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs.h?rev=1575428&r1=1575427&r2=1575428&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/fs.h (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/fs.h Fri Mar  7 22:11:24 2014
@@ -254,20 +254,27 @@ typedef struct fs_fs_dag_cache_t fs_fs_d
 
 /* Key type for all caches that use revision + offset / counter as key.
 
-   NOTE: always initialize this using calloc() or '= {0};'!  This is used
-   as a cache key and the padding bytes on 32 bit archs should be zero for
-   cache effectiveness. */
+   Note: Cache keys should be 16 bytes for best performance and there
+         should be no padding. */
 typedef struct pair_cache_key_t
 {
-  svn_revnum_t revision;
+  /* The object's revision.  Use the 64 data type to prevent padding. */
+  apr_int64_t revision;
 
+  /* Sub-address: item index, revprop generation, packed flag, etc. */
   apr_int64_t second;
 } pair_cache_key_t;
 
-/* Key type that identifies a txdelta window. */
+/* Key type that identifies a txdelta window.
+
+   Note: Cache keys should be 16 bytes for best performance and there
+         should be no padding. */
 typedef struct window_cache_key_t
 {
-  /* Revision that contains the representation */
+  /* Revision that contains the representation.
+     We limit this to 32 bit because it revision numbers are practically
+     limited to 32 bits by the RA layer, ATM, and we want to keep this
+     struct 16 bytes long. */
   apr_uint32_t revision;
 
   /* Window number within that representation */



AW: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Markus Schaber <m....@codesys.com>.
Hi,

Von: Stefan Fuhrmann [mailto:stefan.fuhrmann@wandisco.com] 
>> Are you saying that Valgrind will
>> complain about uninitialized memory because of the padding on 64-bit Windows
>> (since it's going to pad due to the 64-bit word size but long is still 32-bits)?

> If there was valgrind on Windows (which is the next
> problem with verification), it would spot the first place
> where either the padding or any copy of it gets read.
> In the case of cache keys, that would be trivially the
> place when we use the struct to calculate the cache-
> internal bucket index.

I did not try it myself, but http://sourceforge.net/projects/valgrind4win/ 
aims to port valgrind for windows. It is currently "alpha", but also has
a 5 star rating, so maybe it is already useful for some cases.


Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH 

Inspiring Automation Solutions 
________________________________________
3S-Smart Software Solutions GmbH 
Dipl.-Inf. Markus Schaber | Product Development Core Technology 
Memminger Str. 151 | 87439 Kempten | Germany 
Tel. +49-831-54031-979 | Fax +49-831-54031-50 

E-Mail: m.schaber@codesys.com | Web: codesys.com | CODESYS store: store.codesys.com 
CODESYS forum: forum.codesys.com 

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Mar 8, 2014 at 12:25 AM, Ben Reser <be...@reser.org> wrote:

> On 3/7/14, 3:11 PM, Stefan Fuhrmann wrote:
> > Two comments:
> >
> > * The keys are being used one-way, i.e. any type > input type will do
>
> *nod* I realize it's safe.  But the problem is that we know we have a type
> that
> isn't stable for revnum's across platforms.  Say we do Subversion 2.0 and
> we
> want to fix it.  If we're doing stuff like this we then have to go around
> and
> find all the places where we used some other type to represent a revision
> number.
>

That is true and hard to avoid: Ultimately, we need to
map to some on-disk representation that may not
allow for open addressing.

My idea here is to enable Subversion to be built using
a C++ compiler and add some magic template header
(I posted such a thing 4 years ago or so - it's not difficult)
that prevents implicit assignment between different
integer types - even if they are the same fundamental
C type. That header would only be activated using
some configure option and not affect production code.

Can we at least have these private types for this sort of thing:
> typedef apr_int32_t svn__revnum32_t
> typedef apr_int64_t svn__revnum64_t
>

Good idea. I think, we should define those and make
the FS2 API use svn__revnum64_t consistently. It is
still a fixed / limited size value but clearly "big enough"
and "grep-able".


> If we can avoid the svn__revnum32_t that'd be nice since the conversion
> from
> svn_revnum_t to svn__revnum64_t is always safe (at least until someone
> shows up
> with a platform with 128-bit longs), while svn_revnum_t to svn__revnum32_t
> is
> not.  Of course in practice all conversions would always be safe in our
> code.
>
> With the private types it'll be easy to find these cases in the future.  It
> also self documents when things that aren't necessarily obviously revision
> numbers.
>

IIRC, I use 32 bit revnums in a few places in FSX, ATM.
But that code is either very much in the flux or even
destined to be removed at some point this year (FSFS
leftovers).

Don't bother fixing types and conversions over there
right now unless you think there is ordinary bug. I will
go over the code about 2 weeks before 1.9-beta and
replicate the relevant changes from FSFS to FSX.


> > * Padding is horrible because it inserts sections of random data
> >   into a *hash* key. Valgrind might find these instances, if run on
> Windows.
>
> Unless you initialize the keys before filling them, but I'm guessing you
> want
> to avoid that for performance reasons.


Performance is not the problem here as the compiler
*should* remove redundant initialization. The actual
performance issue is <=16 byte structs (can be used
for cache addressing as is) vs. other keys (require a
MD5 calculation just to figure out what cache array
index to use).

Concerning { 0 } initialization, it is very hard to verify
that we actually do it everywhere. Default initializers
had been frowned upon on this list on several occasions
over the last couple of years or so. *I* am happy to use
them but they are easy to omit and a maintenance hazard.


>  Are you saying that Valgrind will
> complain about uninitialized memory because of the padding on 64-bit
> Windows
> (since it's going to pad due to the 64-bit word size but long is still
> 32-bits)?
>

If there was valgrind on Windows (which is the next
problem with verification), it would spot the first place
where either the padding or any copy of it gets read.
In the case of cache keys, that would be trivially the
place when we use the struct to calculate the cache-
internal bucket index.

-- Stefan^2.

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Mar 8, 2014 at 12:25 AM, Ben Reser <be...@reser.org> wrote:

> On 3/7/14, 3:11 PM, Stefan Fuhrmann wrote:
> > Two comments:
> >
> > * The keys are being used one-way, i.e. any type > input type will do
>
> *nod* I realize it's safe.  But the problem is that we know we have a type
> that
> isn't stable for revnum's across platforms.  Say we do Subversion 2.0 and
> we
> want to fix it.  If we're doing stuff like this we then have to go around
> and
> find all the places where we used some other type to represent a revision
> number.
>

That is true and hard to avoid: Ultimately, we need to
map to some on-disk representation that may not
allow for open addressing.

My idea here is to enable Subversion to be built using
a C++ compiler and add some magic template header
(I posted such a thing 4 years ago or so - it's not difficult)
that prevents implicit assignment between different
integer types - even if they are the same fundamental
C type. That header would only be activated using
some configure option and not affect production code.

Can we at least have these private types for this sort of thing:
> typedef apr_int32_t svn__revnum32_t
> typedef apr_int64_t svn__revnum64_t
>

Good idea. I think, we should define those and make
the FS2 API use svn__revnum64_t consistently. It is
still a fixed / limited size value but clearly "big enough"
and "grep-able".


> If we can avoid the svn__revnum32_t that'd be nice since the conversion
> from
> svn_revnum_t to svn__revnum64_t is always safe (at least until someone
> shows up
> with a platform with 128-bit longs), while svn_revnum_t to svn__revnum32_t
> is
> not.  Of course in practice all conversions would always be safe in our
> code.
>
> With the private types it'll be easy to find these cases in the future.  It
> also self documents when things that aren't necessarily obviously revision
> numbers.
>

IIRC, I use 32 bit revnums in a few places in FSX, ATM.
But that code is either very much in the flux or even
destined to be removed at some point this year (FSFS
leftovers).

Don't bother fixing types and conversions over there
right now unless you think there is ordinary bug. I will
go over the code about 2 weeks before 1.9-beta and
replicate the relevant changes from FSFS to FSX.


> > * Padding is horrible because it inserts sections of random data
> >   into a *hash* key. Valgrind might find these instances, if run on
> Windows.
>
> Unless you initialize the keys before filling them, but I'm guessing you
> want
> to avoid that for performance reasons.


Performance is not the problem here as the compiler
*should* remove redundant initialization. The actual
performance issue is <=16 byte structs (can be used
for cache addressing as is) vs. other keys (require a
MD5 calculation just to figure out what cache array
index to use).

Concerning { 0 } initialization, it is very hard to verify
that we actually do it everywhere. Default initializers
had been frowned upon on this list on several occasions
over the last couple of years or so. *I* am happy to use
them but they are easy to omit and a maintenance hazard.


>  Are you saying that Valgrind will
> complain about uninitialized memory because of the padding on 64-bit
> Windows
> (since it's going to pad due to the 64-bit word size but long is still
> 32-bits)?
>

If there was valgrind on Windows (which is the next
problem with verification), it would spot the first place
where either the padding or any copy of it gets read.
In the case of cache keys, that would be trivially the
place when we use the struct to calculate the cache-
internal bucket index.

-- Stefan^2.

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Ben Reser <be...@reser.org>.
On 3/7/14, 3:11 PM, Stefan Fuhrmann wrote:
> Two comments:
> 
> * The keys are being used one-way, i.e. any type > input type will do

*nod* I realize it's safe.  But the problem is that we know we have a type that
isn't stable for revnum's across platforms.  Say we do Subversion 2.0 and we
want to fix it.  If we're doing stuff like this we then have to go around and
find all the places where we used some other type to represent a revision number.

Can we at least have these private types for this sort of thing:
typedef apr_int32_t svn__revnum32_t
typedef apr_int64_t svn__revnum64_t

If we can avoid the svn__revnum32_t that'd be nice since the conversion from
svn_revnum_t to svn__revnum64_t is always safe (at least until someone shows up
with a platform with 128-bit longs), while svn_revnum_t to svn__revnum32_t is
not.  Of course in practice all conversions would always be safe in our code.

With the private types it'll be easy to find these cases in the future.  It
also self documents when things that aren't necessarily obviously revision numbers.

> * Padding is horrible because it inserts sections of random data
>   into a *hash* key. Valgrind might find these instances, if run on Windows.

Unless you initialize the keys before filling them, but I'm guessing you want
to avoid that for performance reasons.  Are you saying that Valgrind will
complain about uninitialized memory because of the padding on 64-bit Windows
(since it's going to pad due to the 64-bit word size but long is still 32-bits)?

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Ben Reser <be...@reser.org>.
On 3/7/14, 3:11 PM, Stefan Fuhrmann wrote:
> Two comments:
> 
> * The keys are being used one-way, i.e. any type > input type will do

*nod* I realize it's safe.  But the problem is that we know we have a type that
isn't stable for revnum's across platforms.  Say we do Subversion 2.0 and we
want to fix it.  If we're doing stuff like this we then have to go around and
find all the places where we used some other type to represent a revision number.

Can we at least have these private types for this sort of thing:
typedef apr_int32_t svn__revnum32_t
typedef apr_int64_t svn__revnum64_t

If we can avoid the svn__revnum32_t that'd be nice since the conversion from
svn_revnum_t to svn__revnum64_t is always safe (at least until someone shows up
with a platform with 128-bit longs), while svn_revnum_t to svn__revnum32_t is
not.  Of course in practice all conversions would always be safe in our code.

With the private types it'll be easy to find these cases in the future.  It
also self documents when things that aren't necessarily obviously revision numbers.

> * Padding is horrible because it inserts sections of random data
>   into a *hash* key. Valgrind might find these instances, if run on Windows.

Unless you initialize the keys before filling them, but I'm guessing you want
to avoid that for performance reasons.  Are you saying that Valgrind will
complain about uninitialized memory because of the padding on 64-bit Windows
(since it's going to pad due to the 64-bit word size but long is still 32-bits)?

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Fri, Mar 7, 2014 at 11:32 PM, Ben Reser <be...@reser.org> wrote:

> This is downright horrible for code maintainability.  We have a type for
> revision numbers.
>
> It's bad enough we don't have a stable type for revision numbers across
> platforms.  But now you're abusing it and using 32-bit integers in some
> places
> for them and 64-bit values in others.
>
> Now that you're doing this I'm now of the opinion that I should veto the
> code
> that limits them to 32-bits at the RA layer.  Because all it's done is
> encourage us to do things like this.


Thanks for the review. r1575447 fix the inconsistency.

Two comments:

* The keys are being used one-way, i.e. any type > input type will do
* Padding is horrible because it inserts sections of random data
  into a *hash* key. Valgrind might find these instances, if run on Windows.

-- Stefan^2.

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Fri, Mar 7, 2014 at 11:32 PM, Ben Reser <be...@reser.org> wrote:

> This is downright horrible for code maintainability.  We have a type for
> revision numbers.
>
> It's bad enough we don't have a stable type for revision numbers across
> platforms.  But now you're abusing it and using 32-bit integers in some
> places
> for them and 64-bit values in others.
>
> Now that you're doing this I'm now of the opinion that I should veto the
> code
> that limits them to 32-bits at the RA layer.  Because all it's done is
> encourage us to do things like this.


Thanks for the review. r1575447 fix the inconsistency.

Two comments:

* The keys are being used one-way, i.e. any type > input type will do
* Padding is horrible because it inserts sections of random data
  into a *hash* key. Valgrind might find these instances, if run on Windows.

-- Stefan^2.

Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Ben Reser <be...@reser.org>.
This is downright horrible for code maintainability.  We have a type for
revision numbers.

It's bad enough we don't have a stable type for revision numbers across
platforms.  But now you're abusing it and using 32-bit integers in some places
for them and 64-bit values in others.

Now that you're doing this I'm now of the opinion that I should veto the code
that limits them to 32-bits at the RA layer.  Because all it's done is
encourage us to do things like this.

On 3/7/14, 2:11 PM, stefan2@apache.org wrote:
> Author: stefan2
> Date: Fri Mar  7 22:11:24 2014
> New Revision: 1575428
> 
> URL: http://svn.apache.org/r1575428
> Log:
> In FSFS, remove padding from cache key structs to simplify handling.
> Also, document why we want them to be 16 bytes whenever we can help it.
> 
> * subversion/libsvn_fs_fs/fs.h
>   (pair_cache_key_t): Widen revision type to eliminate padding on
>                       platforms with sizeof(long) != 8.  Document the
>                       struct size rationale.
>   (window_cache_key_t): Document the struct size rationale.
> 
> Modified:
>     subversion/trunk/subversion/libsvn_fs_fs/fs.h
> 
> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs.h
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs.h?rev=1575428&r1=1575427&r2=1575428&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs.h (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs.h Fri Mar  7 22:11:24 2014
> @@ -254,20 +254,27 @@ typedef struct fs_fs_dag_cache_t fs_fs_d
>  
>  /* Key type for all caches that use revision + offset / counter as key.
>  
> -   NOTE: always initialize this using calloc() or '= {0};'!  This is used
> -   as a cache key and the padding bytes on 32 bit archs should be zero for
> -   cache effectiveness. */
> +   Note: Cache keys should be 16 bytes for best performance and there
> +         should be no padding. */
>  typedef struct pair_cache_key_t
>  {
> -  svn_revnum_t revision;
> +  /* The object's revision.  Use the 64 data type to prevent padding. */
> +  apr_int64_t revision;
>  
> +  /* Sub-address: item index, revprop generation, packed flag, etc. */
>    apr_int64_t second;
>  } pair_cache_key_t;
>  
> -/* Key type that identifies a txdelta window. */
> +/* Key type that identifies a txdelta window.
> +
> +   Note: Cache keys should be 16 bytes for best performance and there
> +         should be no padding. */
>  typedef struct window_cache_key_t
>  {
> -  /* Revision that contains the representation */
> +  /* Revision that contains the representation.
> +     We limit this to 32 bit because it revision numbers are practically
> +     limited to 32 bits by the RA layer, ATM, and we want to keep this
> +     struct 16 bytes long. */
>    apr_uint32_t revision;
>  
>    /* Window number within that representation */
> 
> 


Re: svn commit: r1575428 - /subversion/trunk/subversion/libsvn_fs_fs/fs.h

Posted by Ben Reser <be...@reser.org>.
This is downright horrible for code maintainability.  We have a type for
revision numbers.

It's bad enough we don't have a stable type for revision numbers across
platforms.  But now you're abusing it and using 32-bit integers in some places
for them and 64-bit values in others.

Now that you're doing this I'm now of the opinion that I should veto the code
that limits them to 32-bits at the RA layer.  Because all it's done is
encourage us to do things like this.

On 3/7/14, 2:11 PM, stefan2@apache.org wrote:
> Author: stefan2
> Date: Fri Mar  7 22:11:24 2014
> New Revision: 1575428
> 
> URL: http://svn.apache.org/r1575428
> Log:
> In FSFS, remove padding from cache key structs to simplify handling.
> Also, document why we want them to be 16 bytes whenever we can help it.
> 
> * subversion/libsvn_fs_fs/fs.h
>   (pair_cache_key_t): Widen revision type to eliminate padding on
>                       platforms with sizeof(long) != 8.  Document the
>                       struct size rationale.
>   (window_cache_key_t): Document the struct size rationale.
> 
> Modified:
>     subversion/trunk/subversion/libsvn_fs_fs/fs.h
> 
> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs.h
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs.h?rev=1575428&r1=1575427&r2=1575428&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs.h (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs.h Fri Mar  7 22:11:24 2014
> @@ -254,20 +254,27 @@ typedef struct fs_fs_dag_cache_t fs_fs_d
>  
>  /* Key type for all caches that use revision + offset / counter as key.
>  
> -   NOTE: always initialize this using calloc() or '= {0};'!  This is used
> -   as a cache key and the padding bytes on 32 bit archs should be zero for
> -   cache effectiveness. */
> +   Note: Cache keys should be 16 bytes for best performance and there
> +         should be no padding. */
>  typedef struct pair_cache_key_t
>  {
> -  svn_revnum_t revision;
> +  /* The object's revision.  Use the 64 data type to prevent padding. */
> +  apr_int64_t revision;
>  
> +  /* Sub-address: item index, revprop generation, packed flag, etc. */
>    apr_int64_t second;
>  } pair_cache_key_t;
>  
> -/* Key type that identifies a txdelta window. */
> +/* Key type that identifies a txdelta window.
> +
> +   Note: Cache keys should be 16 bytes for best performance and there
> +         should be no padding. */
>  typedef struct window_cache_key_t
>  {
> -  /* Revision that contains the representation */
> +  /* Revision that contains the representation.
> +     We limit this to 32 bit because it revision numbers are practically
> +     limited to 32 bits by the RA layer, ATM, and we want to keep this
> +     struct 16 bytes long. */
>    apr_uint32_t revision;
>  
>    /* Window number within that representation */
> 
>