You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2009/10/12 22:36:17 UTC

FSFS/svn-rep-sharing-stats questions

Recently I added svn-rep-sharing-stats to the repository [1].  (It aims to 
be a tool for calculating reference counts for representations in an FSFS 
repository.)

Since I'm not an FSFS expert, and wrote it partly based on the APIs and
partly based on "doing it this way seems to work", I'll appreciate
having some review on it.

(What can go wrong?  Anything, I suppose, from FS corruption bugs to
simple "doesn't calculate the refcounts correctly" bugs.)

Most of the logic is in the functions process() and especially
process_one_revision().

Thanks.

Daniel
(who considers copying fsfs-reshard.py's always-on "This isn't ready to be 
used on live data" warning)

[1] http://svn.collab.net/repos/svn/trunk/tools/server-side/svn-rep-sharing-stats.c

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406831

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Tue, 13 Oct 2009 at 16:04 +0200:
> Later I plan switch the hash to (revnum, offset) => (checksum, refcount) 
> in order to do the counting correctly.  As a side effect this will get rid 
> of the counter-inside-a-pointer tricks.

r39997.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407288

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel, thanks for the information.  I have already converted the code to 
use 'char *' in r39973.

Later I plan switch the hash to (revnum, offset) => (checksum, refcount) 
in order to do the counting correctly.  As a side effect this will get rid 
of the counter-inside-a-pointer tricks.

Thanks,

Daniel

Daniel Widenfalk wrote on Tue, 13 Oct 2009 at 08:30 +0200:
> David Glasser wrote:
> > <snip>
> > 
> > Also, I'm not really sure that casting unsigned int<->void* is
> > technically well-defined.
> 
> Hi all,
> 
> Excuse me for barging in like this but as I've got some
> cross compiler development background I feel like I could
> give you a small nugget of information here.
> 
> The cast is well defined but may not work on 64-bit
> architectures that define int as a 32 bit type (or any arch
> where a (default) pointer is larger than an int). You should
> consider using uintptr_t (or intptr_t), which is defined in
> <stddef.h>, whenever you need to do freaky stuff with pointers.
> Or simply cast to a char * (assuming, of course, that a char
> is 8 bits...).
> 
> Regards
> /Daniel Widenfalk
> 
> 
> 
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407103

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Widenfalk <Da...@iar.se>.
David Glasser wrote:
> <snip>
> 
> Also, I'm not really sure that casting unsigned int<->void* is
> technically well-defined.

Hi all,

Excuse me for barging in like this but as I've got some
cross compiler development background I feel like I could
give you a small nugget of information here.

The cast is well defined but may not work on 64-bit
architectures that define int as a 32 bit type (or any arch
where a (default) pointer is larger than an int). You should
consider using uintptr_t (or intptr_t), which is defined in
<stddef.h>, whenever you need to do freaky stuff with pointers.
Or simply cast to a char * (assuming, of course, that a char
is 8 bits...).

Regards
/Daniel Widenfalk

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406952

Re: FSFS/svn-rep-sharing-stats questions

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Oct 13, 2009 at 04:58:29AM +0200, Daniel Shahaf wrote:
> Branko Cibej wrote on Tue, 13 Oct 2009 at 04:39 +0200:
> > Daniel Shahaf wrote:
> > > Was trying to avoid malloc'ing an unsigned int.  But, indeed, I can't
> > > find a reference allowing the cast.  So I have two options:
> > >
> > > * malloc an unsigned int
> > >
> > > * use pointer arithmetic:
> > >      def increment():
> > >        void *p = apr_hash_get();
> > >        apr_hash_set(p+1)
> > >      def extract_final_value():
> > >        void *p = apr_hash_get();
> > >        return (p - NULL)
> > >   
> > 
> > Which is not strictly well-defined, either. :) void* doesn't have a
> > size, so you can't do arithmetic with it (never mind GCC's extension).
> > 
> 
> First, we use sizeof(void *) in our code.

Well, sizeof(void *) asks for the size of a generic pointer which has
constant size for a given architecture.

But in case of pointer arithmetic, e.g. "struct foo *p; ...; p++;",
you want the increment to skip the number of bytes in struct foo
("point at the next foo"). It refers to the size of the data structure
being pointed at, not the size of the pointer itself.

Stefan

-- 
Don't wait! Buy my new book "C for Subversion Developers" today!
Place your orders at http://stsp.name/C-for-Subversion-Developers.html
Exclusive discount for full committers.
Every purchase includes a 10% donation to the Subversion Corporation.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407022

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Tue, 13 Oct 2009 at 04:58 +0200:
> Branko Cibej wrote on Tue, 13 Oct 2009 at 04:39 +0200:
> >     * You could always do your counting with char* ...
> 
> I might go with that, then.

r39973.

Thanks,

Daniel

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406911

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Tue, 13 Oct 2009 at 16:04 +0200:
> Daniel Shahaf wrote:
> > Branko Cibej wrote on Tue, 13 Oct 2009 at 04:39 +0200:
> >> Which is not strictly well-defined, either. :) void* doesn't have a
> >> size, so you can't do arithmetic with it (never mind GCC's extension).
...
> (And yes, my  bad, I inadvertently wrote "void* doesn't have a size"
> when it should've been "void doesn't have a size". Sorry.)

*nod* I know about sizeof(void) being forbidden, but I didn't catch that
the example code was invalid because of this.  Thanks for pointing it
out.

Daniel

P.S.  Stefan, your URL is 404...

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407115

Re: FSFS/svn-rep-sharing-stats questions

Posted by Branko Cibej <br...@xbc.nu>.
Daniel Shahaf wrote:
> Branko Cibej wrote on Tue, 13 Oct 2009 at 04:39 +0200:
>   
>> Daniel Shahaf wrote:
>>     
>>> Was trying to avoid malloc'ing an unsigned int.  But, indeed, I can't
>>> find a reference allowing the cast.  So I have two options:
>>>
>>> * malloc an unsigned int
>>>
>>> * use pointer arithmetic:
>>>      def increment():
>>>        void *p = apr_hash_get();
>>>        apr_hash_set(p+1)
>>>      def extract_final_value():
>>>        void *p = apr_hash_get();
>>>        return (p - NULL)
>>>   
>>>       
>> Which is not strictly well-defined, either. :) void* doesn't have a
>> size, so you can't do arithmetic with it (never mind GCC's extension).
>>
>>     
>
> First, we use sizeof(void *) in our code.
>   

So we do. But as luck would have it, we don't use sizeof(void), which
your example code would require to be well-defined.

(And yes, my  bad, I inadvertently wrote "void* doesn't have a size"
when it should've been "void doesn't have a size". Sorry.)

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407102

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Cibej wrote on Tue, 13 Oct 2009 at 04:39 +0200:
> Daniel Shahaf wrote:
> > Was trying to avoid malloc'ing an unsigned int.  But, indeed, I can't
> > find a reference allowing the cast.  So I have two options:
> >
> > * malloc an unsigned int
> >
> > * use pointer arithmetic:
> >      def increment():
> >        void *p = apr_hash_get();
> >        apr_hash_set(p+1)
> >      def extract_final_value():
> >        void *p = apr_hash_get();
> >        return (p - NULL)
> >   
> 
> Which is not strictly well-defined, either. :) void* doesn't have a
> size, so you can't do arithmetic with it (never mind GCC's extension).
> 

First, we use sizeof(void *) in our code.

[[[
.../trunk/subversion% grep -R void . | grep sizeof | grep -v /.svn/text-base/
./bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:    void **result = apr_palloc(pool, sizeof(void *));
./bindings/swig/perl/libsvn_swig_perl/swigutil_pl.c:    void **result = apr_palloc(pool, sizeof(void *));
./bindings/swig/python/libsvn_swig_py/swigutil_py.c:    temp = apr_array_make(pool, targlen, sizeof(void *));
./bindings/swig/ruby/libsvn_swig_ruby/swigutil_rb.c:  result = apr_palloc(pool, sizeof(void *));
./bindings/swig/ruby/libsvn_swig_ruby/swigutil_rb.c:  *handler_baton = apr_palloc(*handler_pool, sizeof(void *));
./libsvn_client/commit.c:      batons = apr_array_make(pool, new_entries->nelts, sizeof(void *));
./libsvn_client/commit_util.c:        apr_array_make(subpool, commit_items->nelts, sizeof(void*));
./libsvn_delta/path_driver.c:  apr_array_header_t *db_stack = apr_array_make(pool, 4, sizeof(void *));
./libsvn_diff/diff_file.c:  memcpy((void *) (argv + 1), args->elts, sizeof(char*) * args->nelts);
./libsvn_ra_svn/client.c:  memcpy((void *) *argv, cmd_argv, n * sizeof(char *));
./mod_dav_svn/lock.c:      (void)apr_xml_parser_geterror(xml_parser, errbuf, sizeof(errbuf));
]]]

Second, I haven't tried to compile that code, but I'd be surprised if it
doesn't work (unless, e.g., arrays of void* are also not allowed).

> > I'll go with the first option :)
> >   
> 
> However from a *practical* standpoint:
> 
>     * C99 defines the types intptr_t and uintptr_t which are integer
>       types guaranteed to be large enough to store a void*'s value
>       without data loss;
>     * We don't use C99, but if you let configure find an appropriate
>       integer type, or use ptrdiff_t, it'll work for all practical
>       purposes since you control what goes in the hash.

apr_int64_t?

>     * You could always do your counting with char* ...
> 

I might go with that, then.

> -- Brane
> 


Thanks,

Daniel

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406907

Re: FSFS/svn-rep-sharing-stats questions

Posted by Branko Cibej <br...@xbc.nu>.
Daniel Shahaf wrote:
> David Glasser wrote on Mon, 12 Oct 2009 at 16:54 -0700:
>   
>> Also, I'm not really sure that casting unsigned int<->void* is
>> technically well-defined.
>>     
It's not, quite ...

> Was trying to avoid malloc'ing an unsigned int.  But, indeed, I can't
> find a reference allowing the cast.  So I have two options:
>
> * malloc an unsigned int
>
> * use pointer arithmetic:
>      def increment():
>        void *p = apr_hash_get();
>        apr_hash_set(p+1)
>      def extract_final_value():
>        void *p = apr_hash_get();
>        return (p - NULL)
>   

Which is not strictly well-defined, either. :) void* doesn't have a
size, so you can't do arithmetic with it (never mind GCC's extension).

> I'll go with the first option :)
>   

However from a *practical* standpoint:

    * C99 defines the types intptr_t and uintptr_t which are integer
      types guaranteed to be large enough to store a void*'s value
      without data loss;
    * We don't use C99, but if you let configure find an appropriate
      integer type, or use ptrdiff_t, it'll work for all practical
      purposes since you control what goes in the hash.
    * You could always do your counting with char* ...

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406900

Re: FSFS/svn-rep-sharing-stats questions

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
David Glasser wrote on Mon, 12 Oct 2009 at 16:54 -0700:
> I'm not clear on what exactly the data it prints is supposed to
> represent.  As implemented, it appears to be "the amount of
> rep-sharing possible on this repository, if it had been entirely
> written to in a world where rep-sharing exists".  Is that what you
> mean to have?  Or are you trying to calculate how much rep-sharing is
> actually accomplishing in the repository you're looking at?
> 

True, I compute the former but intended the latter.  I neglected to
check whether two reps with matching sha1's are also shared on disk.  In
other words, I should key the hashes on (rev file, offset) tuples rather
than on the sha1.

> Also, I'm not really sure that casting unsigned int<->void* is
> technically well-defined.
> 

Was trying to avoid malloc'ing an unsigned int.  But, indeed, I can't
find a reference allowing the cast.  So I have two options:

* malloc an unsigned int

* use pointer arithmetic:
     def increment():
       void *p = apr_hash_get();
       apr_hash_set(p+1)
     def extract_final_value():
       void *p = apr_hash_get();
       return (p - NULL)

I'll go with the first option :)

> --dave
> 

Thanks for the eyes,

Daniel

> On Mon, Oct 12, 2009 at 3:36 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Recently I added svn-rep-sharing-stats to the repository [1].  (It aims to
> > be a tool for calculating reference counts for representations in an FSFS
> > repository.)
> >
> > Since I'm not an FSFS expert, and wrote it partly based on the APIs and
> > partly based on "doing it this way seems to work", I'll appreciate
> > having some review on it.
> >
> > (What can go wrong?  Anything, I suppose, from FS corruption bugs to
> > simple "doesn't calculate the refcounts correctly" bugs.)
> >
> > Most of the logic is in the functions process() and especially
> > process_one_revision().
> >
> > Thanks.
> >
> > Daniel
> > (who considers copying fsfs-reshard.py's always-on "This isn't ready to be
> > used on live data" warning)
> >
> > [1] http://svn.collab.net/repos/svn/trunk/tools/server-side/svn-rep-sharing-stats.c
> >
> > ------------------------------------------------------
> > http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406831
> >
> 
> 
> 
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406885

Re: FSFS/svn-rep-sharing-stats questions

Posted by David Glasser <gl...@davidglasser.net>.
I'm not clear on what exactly the data it prints is supposed to
represent.  As implemented, it appears to be "the amount of
rep-sharing possible on this repository, if it had been entirely
written to in a world where rep-sharing exists".  Is that what you
mean to have?  Or are you trying to calculate how much rep-sharing is
actually accomplishing in the repository you're looking at?

Also, I'm not really sure that casting unsigned int<->void* is
technically well-defined.

--dave

On Mon, Oct 12, 2009 at 3:36 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Recently I added svn-rep-sharing-stats to the repository [1].  (It aims to
> be a tool for calculating reference counts for representations in an FSFS
> repository.)
>
> Since I'm not an FSFS expert, and wrote it partly based on the APIs and
> partly based on "doing it this way seems to work", I'll appreciate
> having some review on it.
>
> (What can go wrong?  Anything, I suppose, from FS corruption bugs to
> simple "doesn't calculate the refcounts correctly" bugs.)
>
> Most of the logic is in the functions process() and especially
> process_one_revision().
>
> Thanks.
>
> Daniel
> (who considers copying fsfs-reshard.py's always-on "This isn't ready to be
> used on live data" warning)
>
> [1] http://svn.collab.net/repos/svn/trunk/tools/server-side/svn-rep-sharing-stats.c
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406831
>



-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406850