You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Blair Zajac <bl...@orcaware.com> on 2012/04/18 07:04:47 UTC

Changing svn_checksum__from_digest()'s signature

In case of an illegal svn_checksum_kind_t being passed to 
svn_checksum__from_digest(), I want to change it from

svn_checksum_t *
svn_checksum__from_digest(const unsigned char *digest,
                           svn_checksum_kind_t kind,
                           apr_pool_t *result_pool);

to

svn_error_t *
svn_checksum__from_digest(svn_checksum_t **checksum,
                           const unsigned char *digest,
                           svn_checksum_kind_t kind,
                           apr_pool_t *result_pool);

This is to prevent a core dump we've observed with our RPC server.

The function is in a public .h file but shown as private.  Do I need to 
add svn_checksum__from_digest2() or can I just change it?

Blair

Re: Changing svn_checksum__from_digest()'s signature

Posted by Greg Stein <gs...@gmail.com>.
On Apr 18, 2012 1:05 AM, "Blair Zajac" <bl...@orcaware.com> wrote:
>
> In case of an illegal svn_checksum_kind_t being passed to
svn_checksum__from_digest(), I want to change it from
>
> svn_checksum_t *
> svn_checksum__from_digest(const unsigned char *digest,
>                          svn_checksum_kind_t kind,
>                          apr_pool_t *result_pool);
>
> to
>
> svn_error_t *
> svn_checksum__from_digest(svn_checksum_t **checksum,
>                          const unsigned char *digest,
>                          svn_checksum_kind_t kind,
>                          apr_pool_t *result_pool);
>
> This is to prevent a core dump we've observed with our RPC server.
>
> The function is in a public .h file but shown as private.  Do I need to
add svn_checksum__from_digest2() or can I just change it?

I'd suggest moving it into private/svn_subr_private.h, and changing it to
whatever you need.

(read: it should never have been in a public .h file)

Cheers,
-g

Re: Changing svn_checksum__from_digest()'s signature

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Apr 18, 2012 at 09:47, Hyrum K Wright <hy...@wandisco.com> wrote:
> On Wed, Apr 18, 2012 at 8:18 AM, Julian Foad <ju...@btopenworld.com> wrote:
>...
>> In response to consistency: after writing the above I've just realized why this function really is different from the others.  Most of the checksum functions that take a 'kind' input will create whatever kind you like, independently of their other params.  On the other hand, *this* function takes two parameters that are required to agree with each other: the 'digest' parameter has to point to an MD5 digest iff 'kind'=md5, and to a SHA1 digest iff kind=sha1.  It's not like 'const char *digest' is the universal standard external representation of an arbitrary checksum kind; although it would be possible for any checksum kind to have an external repr that can be addressed as 'char *', the next kind we add might prefer to use a different external representation such as 'struct foo'.  So it's not wierd to have one 'checksum' constructor function per checksum kind.  That would be the normal pattern for constructor functions.
>
> That argument makes sense.

Agreed.

I also dislike functions that have this pattern:

function(type, ...)
{
  switch (type)
    case 1:
      do_something_for_1()
      break
    case 2:
      do_something_for_2()
      break;
}

IOW, just unroll the switch statement and make for_1 and for_2
publicly available. That is the API Julian suggested. And when TYPE is
a constant, it *really* makes sense to unroll.

One more thing: do we actually need svn_checksum__from_digest_sha1()
?? Are there any such digests anywhere in our system? We certainly
used MD5 digests in lots of our APIs and in our storage. But SHA1
digests?

Cheers,
-g

Re: Changing svn_checksum__from_digest()'s signature

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Wed, Apr 18, 2012 at 8:18 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Hyrum K Wright wrote:
>
>> On Wed, Apr 18, 2012 at 2:15 AM, Julian Foad wrote:
>>>  Blair Zajac wrote:
>>>>  In case of an illegal svn_checksum_kind_t being passed to
>>>>  svn_checksum__from_digest(), I want to change it from
>>>>
>>>>  svn_checksum_t *
>>>>  svn_checksum__from_digest(const unsigned char *digest,
>>>>                            svn_checksum_kind_t kind,
>>>>                            apr_pool_t *result_pool);
>>>>
>>>>  to
>>>>
>>>>  svn_error_t *
>>>>  svn_checksum__from_digest(svn_checksum_t **checksum,
>>>>                            const unsigned char *digest,
>>>>                            svn_checksum_kind_t kind,
>>>>                            apr_pool_t *result_pool);
>>>
>>>  Wouldn't a better API be:
>>>
>>>    svn_checksum_t *
>>>    svn_checksum__from_digest_md5(digest, result_pool);
>>>    svn_checksum_t *
>>>    svn_checksum__from_digest_sha1(digest, result_pool);
>>>
>>>  since all current callers want just one specific kind (apart from internal
>>> calls from the implementations of svn_checksum_dup and
>> svn_checksum_empty_checksum)?
>>
>> No.  This would be the only case where a specific checksum kind is
>> embedded in the API name rather than as a parameter.  The latter makes
>> extending the API easier in the future, and localizes the 'switch'
>> statement to that method.
>
> It *would* make extending the API easier and localize the 'switch' *if* there were parameterized calls... but there aren't any.  So there is no need for any 'switch' statement, in practice.  Look at the use cases.  There are 14 callers.  The 12 callers outside of checksum.c all want an already-known kind, not a parameterized kind.  The two internal callers each handle any kind, but one of them (_empty_checksum) already contains a switch on the kind.  That leaves only one caller, the implementation of _dup(), where a parameterized API could be beneficial, but such implementation is of course free to use local static functions and other short-cuts.

There aren't any such callers *now*.  That does not preclude the
addition of them in the future.

> We could certainly keep the parameterized API for cases when it would be useful, and if it were public that would be the right thing to do, but it's private.
>
>>  What would be the benefit of separate APIs?
>
> Consistency with the style of related APIs is good, granted.  Simplicity
>  of APIs is good too.  Simplicity prefers avoiding possible error
> conditions by design, rather than coding to detect and handle errors.
> Omitting the 'kind' input and the 'error' output (where the only possible error is 'you passed an invalid kind') makes a much simpler API.
>
> In response to consistency: after writing the above I've just realized why this function really is different from the others.  Most of the checksum functions that take a 'kind' input will create whatever kind you like, independently of their other params.  On the other hand, *this* function takes two parameters that are required to agree with each other: the 'digest' parameter has to point to an MD5 digest iff 'kind'=md5, and to a SHA1 digest iff kind=sha1.  It's not like 'const char *digest' is the universal standard external representation of an arbitrary checksum kind; although it would be possible for any checksum kind to have an external repr that can be addressed as 'char *', the next kind we add might prefer to use a different external representation such as 'struct foo'.  So it's not wierd to have one 'checksum' constructor function per checksum kind.  That would be the normal pattern for constructor functions.

That argument makes sense.

-Hyrum

>>>  Notice that svn_checksum_empty_checksum() returns NULL if the kind is not
>> recognized, while svn_checksum_dup() does no such check and would pass the
>> unknown kind into ...from_digest().
>>>
>>>  Therefore it appears that _dup() is the only caller that could have been
>> passing a bad kind -- at least in current trunk code; it may be different in
>> whatever version you're running.  And so, to fix the crash, you will still
>> need to decide how _dup() should handle a bogus checksum struct.  You could have
>> _dup() return NULL, but unless the callers check for null that would just defer
>> the crash.  It would be a little better to call SVN_ERR_MALFUNCTION_NO_RETURN(),
>> to give the client program a little more awareness of the abnormal termination.
>>>
>>>
>>>>  This is to prevent a core dump we've observed with our RPC server.
>>>>
>>>>  The function is in a public .h file but shown as private.  Do I need to
>> add
>>>>  svn_checksum__from_digest2() or can I just change it?
>>>
>>>  We're allowed to just change it, as I understand it.
>>>
>>>  (The closest thing I can find in 'hacking' is
>> <http://subversion.apache.org/docs/community-guide/conventions.html#interface-visibility>.
>> But that doesn't seem to mention the nuances of ABI vs. API and the issue
>> of, IIRC, all symbols in a shared library getting into a Windows DLL ABI even if
>> they're supposed to be private, or whatever exactly that issue was.)
>>
>> Nobody should be calling this function anyway, so ABI shouldn't be an
>> issue, IIUC.
>>
>> -Hyrum



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: Changing svn_checksum__from_digest()'s signature

Posted by Julian Foad <ju...@btopenworld.com>.
Hyrum K Wright wrote:

> On Wed, Apr 18, 2012 at 2:15 AM, Julian Foad wrote:
>>  Blair Zajac wrote:
>>>  In case of an illegal svn_checksum_kind_t being passed to
>>>  svn_checksum__from_digest(), I want to change it from
>>> 
>>>  svn_checksum_t *
>>>  svn_checksum__from_digest(const unsigned char *digest,
>>>                            svn_checksum_kind_t kind,
>>>                            apr_pool_t *result_pool);
>>> 
>>>  to
>>> 
>>>  svn_error_t *
>>>  svn_checksum__from_digest(svn_checksum_t **checksum,
>>>                            const unsigned char *digest,
>>>                            svn_checksum_kind_t kind,
>>>                            apr_pool_t *result_pool);
>> 
>>  Wouldn't a better API be:
>> 
>>    svn_checksum_t *
>>    svn_checksum__from_digest_md5(digest, result_pool);
>>    svn_checksum_t *
>>    svn_checksum__from_digest_sha1(digest, result_pool);
>> 
>>  since all current callers want just one specific kind (apart from internal 
>> calls from the implementations of svn_checksum_dup and 
> svn_checksum_empty_checksum)?
> 
> No.  This would be the only case where a specific checksum kind is
> embedded in the API name rather than as a parameter.  The latter makes
> extending the API easier in the future, and localizes the 'switch'
> statement to that method.

It *would* make extending the API easier and localize the 'switch' *if* there were parameterized calls... but there aren't any.  So there is no need for any 'switch' statement, in practice.  Look at the use cases.  There are 14 callers.  The 12 callers outside of checksum.c all want an already-known kind, not a parameterized kind.  The two internal callers each handle any kind, but one of them (_empty_checksum) already contains a switch on the kind.  That leaves only one caller, the implementation of _dup(), where a parameterized API could be beneficial, but such implementation is of course free to use local static functions and other short-cuts.

We could certainly keep the parameterized API for cases when it would be useful, and if it were public that would be the right thing to do, but it's private.

>  What would be the benefit of separate APIs?

Consistency with the style of related APIs is good, granted.  Simplicity
 of APIs is good too.  Simplicity prefers avoiding possible error 
conditions by design, rather than coding to detect and handle errors.  
Omitting the 'kind' input and the 'error' output (where the only possible error is 'you passed an invalid kind') makes a much simpler API.

In response to consistency: after writing the above I've just realized why this function really is different from the others.  Most of the checksum functions that take a 'kind' input will create whatever kind you like, independently of their other params.  On the other hand, *this* function takes two parameters that are required to agree with each other: the 'digest' parameter has to point to an MD5 digest iff 'kind'=md5, and to a SHA1 digest iff kind=sha1.  It's not like 'const char *digest' is the universal standard external representation of an arbitrary checksum kind; although it would be possible for any checksum kind to have an external repr that can be addressed as 'char *', the next kind we add might prefer to use a different external representation such as 'struct foo'.  So it's not wierd to have one 'checksum' constructor function per checksum kind.  That would be the normal pattern for constructor functions.

- Julian


>>  Notice that svn_checksum_empty_checksum() returns NULL if the kind is not 
> recognized, while svn_checksum_dup() does no such check and would pass the 
> unknown kind into ...from_digest().
>> 
>>  Therefore it appears that _dup() is the only caller that could have been 
> passing a bad kind -- at least in current trunk code; it may be different in 
> whatever version you're running.  And so, to fix the crash, you will still 
> need to decide how _dup() should handle a bogus checksum struct.  You could have 
> _dup() return NULL, but unless the callers check for null that would just defer 
> the crash.  It would be a little better to call SVN_ERR_MALFUNCTION_NO_RETURN(), 
> to give the client program a little more awareness of the abnormal termination.
>> 
>> 
>>>  This is to prevent a core dump we've observed with our RPC server.
>>> 
>>>  The function is in a public .h file but shown as private.  Do I need to 
> add
>>>  svn_checksum__from_digest2() or can I just change it?
>> 
>>  We're allowed to just change it, as I understand it.
>> 
>>  (The closest thing I can find in 'hacking' is 
> <http://subversion.apache.org/docs/community-guide/conventions.html#interface-visibility>.  
> But that doesn't seem to mention the nuances of ABI vs. API and the issue 
> of, IIRC, all symbols in a shared library getting into a Windows DLL ABI even if 
> they're supposed to be private, or whatever exactly that issue was.)
> 
> Nobody should be calling this function anyway, so ABI shouldn't be an
> issue, IIUC.
> 
> -Hyrum

Re: Changing svn_checksum__from_digest()'s signature

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Wed, Apr 18, 2012 at 2:15 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Blair Zajac wrote:
>
>> In case of an illegal svn_checksum_kind_t being passed to
>> svn_checksum__from_digest(), I want to change it from
>>
>> svn_checksum_t *
>> svn_checksum__from_digest(const unsigned char *digest,
>>                           svn_checksum_kind_t kind,
>>                           apr_pool_t *result_pool);
>>
>> to
>>
>> svn_error_t *
>> svn_checksum__from_digest(svn_checksum_t **checksum,
>>                           const unsigned char *digest,
>>                           svn_checksum_kind_t kind,
>>                           apr_pool_t *result_pool);
>
> Wouldn't a better API be:
>
>   svn_checksum_t *
>   svn_checksum__from_digest_md5(digest, result_pool);
>   svn_checksum_t *
>   svn_checksum__from_digest_sha1(digest, result_pool);
>
> since all current callers want just one specific kind (apart from internal calls from the implementations of svn_checksum_dup and svn_checksum_empty_checksum)?

No.  This would be the only case where a specific checksum kind is
embedded in the API name rather than as a parameter.  The latter makes
extending the API easier in the future, and localizes the 'switch'
statement to that method.  What would be the benefit of separate APIs?

> Notice that svn_checksum_empty_checksum() returns NULL if the kind is not recognized, while svn_checksum_dup() does no such check and would pass the unknown kind into ...from_digest().
>
> Therefore it appears that _dup() is the only caller that could have been passing a bad kind -- at least in current trunk code; it may be different in whatever version you're running.  And so, to fix the crash, you will still need to decide how _dup() should handle a bogus checksum struct.  You could have _dup() return NULL, but unless the callers check for null that would just defer the crash.  It would be a little better to call SVN_ERR_MALFUNCTION_NO_RETURN(), to give the client program a little more awareness of the abnormal termination.
>
>
>> This is to prevent a core dump we've observed with our RPC server.
>>
>> The function is in a public .h file but shown as private.  Do I need to add
>> svn_checksum__from_digest2() or can I just change it?
>
> We're allowed to just change it, as I understand it.
>
> (The closest thing I can find in 'hacking' is <http://subversion.apache.org/docs/community-guide/conventions.html#interface-visibility>.  But that doesn't seem to mention the nuances of ABI vs. API and the issue of, IIRC, all symbols in a shared library getting into a Windows DLL ABI even if they're supposed to be private, or whatever exactly that issue was.)

Nobody should be calling this function anyway, so ABI shouldn't be an
issue, IIUC.

-Hyrum



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: Changing svn_checksum__from_digest()'s signature

Posted by Blair Zajac <bl...@orcaware.com>.
On 04/18/2012 12:15 AM, Julian Foad wrote:
> Blair Zajac wrote:
>
>> In case of an illegal svn_checksum_kind_t being passed to
>> svn_checksum__from_digest(), I want to change it from
>>
>> svn_checksum_t *
>> svn_checksum__from_digest(const unsigned char *digest,
>>                            svn_checksum_kind_t kind,
>>                            apr_pool_t *result_pool);
>>
>> to
>>
>> svn_error_t *
>> svn_checksum__from_digest(svn_checksum_t **checksum,
>>                            const unsigned char *digest,
>>                            svn_checksum_kind_t kind,
>>                            apr_pool_t *result_pool);
>
> Wouldn't a better API be:
>
>    svn_checksum_t *
>    svn_checksum__from_digest_md5(digest, result_pool);
>    svn_checksum_t *
>    svn_checksum__from_digest_sha1(digest, result_pool);
>
> since all current callers want just one specific kind (apart from internal calls from the implementations of svn_checksum_dup and svn_checksum_empty_checksum)?
>
> Notice that svn_checksum_empty_checksum() returns NULL if the kind is not recognized, while svn_checksum_dup() does no such check and would pass the unknown kind into ...from_digest().
>
> Therefore it appears that _dup() is the only caller that could have been passing a bad kind -- at least in current trunk code; it may be different in whatever version you're running.  And so, to fix the crash, you will still need to decide how _dup() should handle a bogus checksum struct.  You could have _dup() return NULL, but unless the callers check for null that would just defer the crash.  It would be a little better to call SVN_ERR_MALFUNCTION_NO_RETURN(), to give the client program a little more awareness of the abnormal termination.

That's what I'm seeing, _dup() is calling __from_digest().  It appears 
to be a lifetime issue where it's trying to dup a svn_checksum_t * 
pointing to bad data.

It would be nice if _dup() returned a svn_error_t * instead of causing a 
core dump later or calling SVN_ERR_MALFUNCTION_NO_RETURN(), but that 
would be a messier API and not consistent with all our other _dup()'s. 
Too bad we're not working in C++ where we could throw an exception.

>> This is to prevent a core dump we've observed with our RPC server.
>>
>> The function is in a public .h file but shown as private.  Do I need to add
>> svn_checksum__from_digest2() or can I just change it?
>
> We're allowed to just change it, as I understand it.
>
> (The closest thing I can find in 'hacking' is<http://subversion.apache.org/docs/community-guide/conventions.html#interface-visibility>.  But that doesn't seem to mention the nuances of ABI vs. API and the issue of, IIRC, all symbols in a shared library getting into a Windows DLL ABI even if they're supposed to be private, or whatever exactly that issue was.)

Thanks.  I'm just deleting it following this discussion.

Blair



Re: Changing svn_checksum__from_digest()'s signature

Posted by Julian Foad <ju...@btopenworld.com>.
Blair Zajac wrote:

> In case of an illegal svn_checksum_kind_t being passed to 
> svn_checksum__from_digest(), I want to change it from
> 
> svn_checksum_t *
> svn_checksum__from_digest(const unsigned char *digest,
>                           svn_checksum_kind_t kind,
>                           apr_pool_t *result_pool);
> 
> to
> 
> svn_error_t *
> svn_checksum__from_digest(svn_checksum_t **checksum,
>                           const unsigned char *digest,
>                           svn_checksum_kind_t kind,
>                           apr_pool_t *result_pool);

Wouldn't a better API be:

  svn_checksum_t *
  svn_checksum__from_digest_md5(digest, result_pool);
  svn_checksum_t *
  svn_checksum__from_digest_sha1(digest, result_pool);

since all current callers want just one specific kind (apart from internal calls from the implementations of svn_checksum_dup and svn_checksum_empty_checksum)?

Notice that svn_checksum_empty_checksum() returns NULL if the kind is not recognized, while svn_checksum_dup() does no such check and would pass the unknown kind into ...from_digest().

Therefore it appears that _dup() is the only caller that could have been passing a bad kind -- at least in current trunk code; it may be different in whatever version you're running.  And so, to fix the crash, you will still need to decide how _dup() should handle a bogus checksum struct.  You could have _dup() return NULL, but unless the callers check for null that would just defer the crash.  It would be a little better to call SVN_ERR_MALFUNCTION_NO_RETURN(), to give the client program a little more awareness of the abnormal termination.


> This is to prevent a core dump we've observed with our RPC server.
> 
> The function is in a public .h file but shown as private.  Do I need to add 
> svn_checksum__from_digest2() or can I just change it?

We're allowed to just change it, as I understand it.

(The closest thing I can find in 'hacking' is <http://subversion.apache.org/docs/community-guide/conventions.html#interface-visibility>.  But that doesn't seem to mention the nuances of ABI vs. API and the issue of, IIRC, all symbols in a shared library getting into a Windows DLL ABI even if they're supposed to be private, or whatever exactly that issue was.)

- Julian