You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ben Reser <be...@reser.org> on 2014/02/20 00:00:01 UTC

fsfs-stats potentially busted on some architectures

Found via the following compiler warning on OS X:
/Users/breser/wandisco/share/wcs/svn-trunk/tools/server-side/fsfs-stats.c:1556:61:
warning: implicit conversion loses integer precision: 'long' to 'int'
[-Wshorten-64-to-32]
                                    ffd->youngest_rev_cache + 1,
                                    ~~~~~~~~~~~~~~~~~~~~~~~~^~~
1 warning generated.


The rest of the statement for that line:
[[[
  /* create data containers and caches */
  (*fs)->revisions = apr_array_make(pool,
                                    ffd->youngest_rev_cache + 1,
                                    sizeof(revision_info_t *));
]]]

apr_array_make is declared as:
apr_array_header_t * 	apr_array_make (apr_pool_t *p, int nelts, int elt_size)

int is only guaranteed to be 16 bits.  We support up to 32-bit revision numbers
(which technically some platforms supported up to 64-bit revision numbers until
we recently started enforcing that it stayed under 32-bits in our revision
string parser).

Thus running fsfs-stats on a machine with a 16-bit int and a repository with
more than 32768 revisions should have a problem.

I'm not sure if we should care about this.  I'm guessing we in general don't
support 16-bit architectures, which is where this is likely to be a problem.
We almost certainly have a lot worse problems elsewhere.

I only see two paths to fixing this:

1) Create a new APR array interface that uses something other than int for the
size of the array.

2) Rework the code so that it doesn't need to maintain information about every
revision.  However, this seems like it'd make things much slower due to needing
to look up rep sharing.

I'm tempted to commit the following for now:
[[[
Index: tools/server-side/fsfs-stats.c
===================================================================
--- tools/server-side/fsfs-stats.c      (revision 1569653)
+++ tools/server-side/fsfs-stats.c      (working copy)
@@ -1551,9 +1551,14 @@
   SVN_ERR(fs_open(fs, path, pool));
   ffd = (*fs)->fs->fsap_data;

-  /* create data containers and caches */
+  /* create data containers and caches
+   * Note: this assumes that int is at least 32-bits and that we only support
+   * 32-bit wide revision numbers (actually 31-bits due to the signedness
+   * of both the nelts field of the array and our revision numbers). This
+   * means this code will fail on platforms where int is less than 32-bits
+   * and the repository has more revisions than int can hold. */
   (*fs)->revisions = apr_array_make(pool,
-                                    ffd->youngest_rev_cache + 1,
+                                    (int) ffd->youngest_rev_cache + 1,
                                     sizeof(revision_info_t *));
   (*fs)->null_base = apr_pcalloc(pool, sizeof(*(*fs)->null_base));
   initialize_largest_changes(*fs, 64, pool);
]]]

Re: fsfs-stats potentially busted on some architectures

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Feb 20, 2014 at 8:00 PM, Ben Reser <be...@reser.org> wrote:

> On 2/20/14, 3:13 AM, Branko Čibej wrote:
> > On 20.02.2014 12:08, Stefan Fuhrmann wrote:
> >> On Thu, Feb 20, 2014 at 3:13 AM, Branko Čibej <br...@wandisco.com>
> wrote:
> >>> FWIW, I'd be fine with replacing all ints with apr_int32_t in places
> where
> >>> we interact with apr_array_header_t and similar. Widening promotions
> to int
> >>> would be automatic, and narrowing conversions on 16-bit platforms would
> >>> (presumably) make compilers yell very loudly.
> >>>
> >> I assume you mean in the SVN code, i.e. at least
> >> replace all "(int)"-style casts and then get most of
> >> the other places by tedious search-for-APR-macro-
> >> and-find-index-variable-declaration.
> >>
> >> That would be fine with me.
> >
> > Yup, that's what I meant. Use an explicitly 32-bit signed type in our
> code
> > instead of casting to int all over the place.
>
> Seems like a reasonable course of action to me.  But we're still going to
> have
> casts because of our use in apr_size_t in some APIs.
>

I tried that approach and realized how many places
use ints or unsigneds even in our API. So, I decided
to simply codify our implicit assumption that int is
32 bits or wider in r1570882 (with an option to override
that check). Another advantage is that we have a better
chance to actually take advantage of ILP64 platforms.

-- Stefan^2.

Re: fsfs-stats potentially busted on some architectures

Posted by Ben Reser <be...@reser.org>.
On 2/20/14, 3:13 AM, Branko Čibej wrote:
> On 20.02.2014 12:08, Stefan Fuhrmann wrote:
>> On Thu, Feb 20, 2014 at 3:13 AM, Branko Čibej <br...@wandisco.com> wrote:
>>> FWIW, I'd be fine with replacing all ints with apr_int32_t in places where
>>> we interact with apr_array_header_t and similar. Widening promotions to int
>>> would be automatic, and narrowing conversions on 16-bit platforms would
>>> (presumably) make compilers yell very loudly.
>>>
>> I assume you mean in the SVN code, i.e. at least
>> replace all "(int)"-style casts and then get most of
>> the other places by tedious search-for-APR-macro-
>> and-find-index-variable-declaration.
>>
>> That would be fine with me.
> 
> Yup, that's what I meant. Use an explicitly 32-bit signed type in our code
> instead of casting to int all over the place.

Seems like a reasonable course of action to me.  But we're still going to have
casts because of our use in apr_size_t in some APIs.

Re: fsfs-stats potentially busted on some architectures

Posted by Branko Čibej <br...@wandisco.com>.
On 20.02.2014 12:08, Stefan Fuhrmann wrote:
> On Thu, Feb 20, 2014 at 3:13 AM, Branko Čibej <br...@wandisco.com> wrote:
>
>>  On 20.02.2014 02:52, Ben Reser wrote:
>>
>> On 2/19/14, 5:30 PM, Branko Čibej wrote:
>>
>>  3) Do nothing. Trying to support 16-bit architectures is futile. No-one will
>> host Subversion servers on microcontrollers.
>>
>> SVN is already "broken" on systems like 16 bit DOS
> since we use those arrays (at least as intermediates)
> for things like the changed paths list and directories.
> Both would be limited to 32k entries.
>
>>  I'm perfectly fine with that decision.  I just wanted some feedback before I
>> hid the limitation with a typecast.
>>
>>
>> FWIW, I'd be fine with replacing all ints with apr_int32_t in places where
>> we interact with apr_array_header_t and similar. Widening promotions to int
>> would be automatic, and narrowing conversions on 16-bit platforms would
>> (presumably) make compilers yell very loudly.
>>
> I assume you mean in the SVN code, i.e. at least
> replace all "(int)"-style casts and then get most of
> the other places by tedious search-for-APR-macro-
> and-find-index-variable-declaration.
>
> That would be fine with me.

Yup, that's what I meant. Use an explicitly 32-bit signed type in our
code instead of casting to int all over the place.

> The only reason *not* to do that is marginally worse performance on some
>> 64-bit platforms.
>>
> If even. In super tight loops we access the elts
> array directly, so there is no further conversion
> needed.

Ack.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: fsfs-stats potentially busted on some architectures

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Feb 20, 2014 at 3:13 AM, Branko Čibej <br...@wandisco.com> wrote:

>  On 20.02.2014 02:52, Ben Reser wrote:
>
> On 2/19/14, 5:30 PM, Branko Čibej wrote:
>
>  3) Do nothing. Trying to support 16-bit architectures is futile. No-one will
> host Subversion servers on microcontrollers.
>
> SVN is already "broken" on systems like 16 bit DOS
since we use those arrays (at least as intermediates)
for things like the changed paths list and directories.
Both would be limited to 32k entries.

>  I'm perfectly fine with that decision.  I just wanted some feedback before I
> hid the limitation with a typecast.
>
>
> FWIW, I'd be fine with replacing all ints with apr_int32_t in places where
> we interact with apr_array_header_t and similar. Widening promotions to int
> would be automatic, and narrowing conversions on 16-bit platforms would
> (presumably) make compilers yell very loudly.
>

I assume you mean in the SVN code, i.e. at least
replace all "(int)"-style casts and then get most of
the other places by tedious search-for-APR-macro-
and-find-index-variable-declaration.

That would be fine with me.

The only reason *not* to do that is marginally worse performance on some
> 64-bit platforms.
>

If even. In super tight loops we access the elts
array directly, so there is no further conversion
needed.

-- Stefan^2.

Re: fsfs-stats potentially busted on some architectures

Posted by Branko Čibej <br...@wandisco.com>.
On 20.02.2014 02:52, Ben Reser wrote:
> On 2/19/14, 5:30 PM, Branko Čibej wrote:
>> 3) Do nothing. Trying to support 16-bit architectures is futile. No-one will
>> host Subversion servers on microcontrollers.
> I'm perfectly fine with that decision.  I just wanted some feedback before I
> hid the limitation with a typecast.

FWIW, I'd be fine with replacing all ints with apr_int32_t in places
where we interact with apr_array_header_t and similar. Widening
promotions to int would be automatic, and narrowing conversions on
16-bit platforms would (presumably) make compilers yell very loudly.

The only reason /not/ to do that is marginally worse performance on some
64-bit platforms.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: fsfs-stats potentially busted on some architectures

Posted by Ben Reser <be...@reser.org>.
On 2/19/14, 5:30 PM, Branko Čibej wrote:
> 3) Do nothing. Trying to support 16-bit architectures is futile. No-one will
> host Subversion servers on microcontrollers.

I'm perfectly fine with that decision.  I just wanted some feedback before I
hid the limitation with a typecast.

Re: fsfs-stats potentially busted on some architectures

Posted by Branko Čibej <br...@wandisco.com>.
On 20.02.2014 00:00, Ben Reser wrote:
> Found via the following compiler warning on OS X:
> /Users/breser/wandisco/share/wcs/svn-trunk/tools/server-side/fsfs-stats.c:1556:61:
> warning: implicit conversion loses integer precision: 'long' to 'int'
> [-Wshorten-64-to-32]
>                                     ffd->youngest_rev_cache + 1,
>                                     ~~~~~~~~~~~~~~~~~~~~~~~~^~~
> 1 warning generated.
>
>
> The rest of the statement for that line:
> [[[
>   /* create data containers and caches */
>   (*fs)->revisions = apr_array_make(pool,
>                                     ffd->youngest_rev_cache + 1,
>                                     sizeof(revision_info_t *));
> ]]]
>
> apr_array_make is declared as:
> apr_array_header_t * 	apr_array_make (apr_pool_t *p, int nelts, int elt_size)
>
> int is only guaranteed to be 16 bits.  We support up to 32-bit revision numbers
> (which technically some platforms supported up to 64-bit revision numbers until
> we recently started enforcing that it stayed under 32-bits in our revision
> string parser).
>
> Thus running fsfs-stats on a machine with a 16-bit int and a repository with
> more than 32768 revisions should have a problem.
>
> I'm not sure if we should care about this.  I'm guessing we in general don't
> support 16-bit architectures, which is where this is likely to be a problem.
> We almost certainly have a lot worse problems elsewhere.
>
> I only see two paths to fixing this:
>
> 1) Create a new APR array interface that uses something other than int for the
> size of the array.
>
> 2) Rework the code so that it doesn't need to maintain information about every
> revision.  However, this seems like it'd make things much slower due to needing
> to look up rep sharing.

3) Do nothing. Trying to support 16-bit architectures is futile. No-one
will host Subversion servers on microcontrollers.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com