You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Neels Janosch Hofmeyr <ne...@elego.de> on 2009/05/19 02:16:13 UTC

checksum / compare_files question

Hi all,

I came across this snippet of code:

subversion/libsvn_subr/checksum.c:
[[[
svn_boolean_t
svn_checksum_match(const svn_checksum_t *checksum1,
                   const svn_checksum_t *checksum2)
{
  if (checksum1 == NULL || checksum2 == NULL)
    return TRUE;
]]]

I don't see how that makes sense. If it does, then the function name is
misleading. The comment from include/checksum.h is ... tricky:

[[[
/** Compare checksums @a checksum1 and @a checksum2.  If their kinds do not
 * match or if neither is all zeros, and their content does not match, then
 * return FALSE; else return TRUE.
 *
 * @since New in 1.6.
 */
svn_boolean_t
svn_checksum_match(const svn_checksum_t *checksum1,
                   const svn_checksum_t *checksum2);
]]]


So, which way should I fix this. Like this:
[[[
  if (checksum1 == NULL || checksum2 == NULL)
    return (checksum1 == checksum2)? TRUE:FALSE;
]]]
[[[
/** Compare checksums @a checksum1 and @a checksum2.  Return TRUE if both
 * their kinds and content match, FALSE otherwise.
]]]


Or like this:
[[[
/** Compare checksums @a checksum1 and @a checksum2.  Return TRUE if both
 * their kinds and content match, FALSE otherwise. However, if one of the
 * checksums is NULL, return TRUE, even if the other one is non-NULL.
]]]

?

I am not really getting though what the reference "if neither is all zeros"
in the original comment is trying to say -- whether it's the NULLness of the
pointer or whether the checksum digits are all '0'.

Thanks,
~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2305880

Re: checksum / compare_files question

Posted by Edmund Wong <ed...@belfordhk.com>.
Neels Janosch Hofmeyr wrote:
> Hi all,
> 
> I came across this snippet of code:
> 
> subversion/libsvn_subr/checksum.c:
> [[[
> svn_boolean_t
> svn_checksum_match(const svn_checksum_t *checksum1,
>                    const svn_checksum_t *checksum2)
> {
>   if (checksum1 == NULL || checksum2 == NULL)
>     return TRUE;
> ]]]
> 
> I don't see how that makes sense. If it does, then the function name is
> misleading. The comment from include/checksum.h is ... tricky:

I would hazard a guess that it should be returning 'FALSE' and not
'TRUE', which would explain the 'make no sense' part.  If it return'd
TRUE, wouldn't something later on choke thinking that the two checksums
equal to each other, when in fact, they don't?

> 
> [[[
> /** Compare checksums @a checksum1 and @a checksum2.  If their kinds do not
>  * match or if neither is all zeros, and their content does not match, then
>  * return FALSE; else return TRUE.
>  *
>  * @since New in 1.6.
>  */
> svn_boolean_t
> svn_checksum_match(const svn_checksum_t *checksum1,
>                    const svn_checksum_t *checksum2);
> ]]]
> 
> 
> So, which way should I fix this. Like this:
> [[[
>   if (checksum1 == NULL || checksum2 == NULL)
>     return (checksum1 == checksum2)? TRUE:FALSE;
> ]]]

This means if both were NULL, it will return TRUE.
Otherwise FALSE, right?

Perhaps my understanding of this function isn't to
clear.  If I've made any errors in concept, please
correct.  I would think if they were both NULL, that
this would turn out to be FALSE; because,  the point
of this function is to determine the 'valid sameness'
of the checksums.   If either or both checksums are
NULL, then would that not throw the validity out the
window, so to speak?   Whatever called this function
has basically supplied two NULL pointers.  Sure, they're
the same, but it defeats the purpose of this
function?


> [[[
> /** Compare checksums @a checksum1 and @a checksum2.  Return TRUE if both
>  * their kinds and content match, FALSE otherwise.
> ]]]
> 
> 
> Or like this:
> [[[
> /** Compare checksums @a checksum1 and @a checksum2.  Return TRUE if both
>  * their kinds and content match, FALSE otherwise. However, if one of the
>  * checksums is NULL, return TRUE, even if the other one is non-NULL.
> ]]]

That last part would implicate checksum1 != checksum2, which AFAIK,
should still return FALSE.

> 
> I am not really getting though what the reference "if neither is all zeros"
> in the original comment is trying to say -- whether it's the NULLness of the
> pointer or whether the checksum digits are all '0'.

I would think the reference is to check if the checksum digits are all
'0's , and not the NULLness of the pointers.

Any corrections in my understanding definitely appreciated.

Edmund

Btw, I don't know if I'm in the right position to say this, but
nice find, Neels.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2307172

Re: checksum / compare_files question

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tue, May 19, 2009 at 6:12 PM, Neels Janosch Hofmeyr <ne...@elego.de> wrote:
[snip]
> I'd agree, but have one problem remaining: is "the all-zeros wildcard"
> applicable to both SHA1 and MD5 checksums? Futhermore, is it then desired
> that an MD5 wildcard matches a SHA1 checksum (and vice versa)?
>
> The current code returns true for a wildcard checksum, even if the kinds
> were originally different. Say if I give it an MD5 all-zeros and check it
> against any SHA1 checksum, I still get a match.

I personally don't think this is a huge issue.  Is there potential we
could confuse something, and accidentally compare a wildcard intended
for MD5 with SHA1 (assuming your using a wildcard "kind")?  Sure.
Would it go unnoticed for years to come?  No.  The wildcard matching
occurs extremely infrequently.  At the moment, it would only occur if
you edited the backend repository and insert the wildcard checksum in
place of what was there.  Chances are you'd end up with some other
sort of failure (probably from one of the tests) showing the error of
your ways because you were trying to compare an MD5 with a SHA1, for
real.

> To stay out of that kind of trouble, I'd suggest something like this:
>
> [[[
> const svn_checksum_t svn_checksum_wildcard_md5 =
>    { "00000000000000000000000000000000", svn_checksum_md5 };
> #define SVN_CHECKSUM_WILDCARD_MD5 (&svn_checksum_wildcard_md5)
>
> const svn_checksum_t svn_checksum_wildcard_sha1 =
>    { "00000000 00000000 00000000 00000000 00000000", svn_checksum_sha1 };
>
> #define SVN_CHECKSUM_WILDCARD_SHA1 (&svn_checksum_wildcard_sha1)
> ]]]
>
> Wherever the current code replaces an all-zero checksum with a NULL
> svn_checksum_t*, do this instead:
> [[[
>  if (checksum.kind == svn_checksum_md5 &&
>      strcmp(checksum->digest, SVN_CHECKSUM_WILDCARD_MD5->digest) == 0)
>    checksum = SVN_CHECKSUM_WILDCARD_MD5;  // replace with constant
> ]]]
> (And same for sha1)


I'm not sure you can do that, since you have to return a
'svn_checksum_t *' (a non-const pointer).

You could create a copy though.

At any rate, I'm not trying to stand guard for the original code.  I
was simply trying to answer your original question and show that NULL
does have a meaning. :-)  If you think this is a better way, go for
it! :-)  I certainly won't stand in your way.  Anything that makes the
current code more understandable is a Good Thing.  How far you want to
go with it is completely up to you. :-)

-John

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2326883


Re: checksum / compare_files question

Posted by Neels Janosch Hofmeyr <ne...@elego.de>.
John Szakmeister wrote:
> On Tue, May 19, 2009 at 7:59 AM, Stefan Sperling <st...@elego.de> wrote:
> [snip]
>> I'd say a better representation of a wildcard checksum is in order.
>> The API currently overloads the meaning of NULL, which I don't think
>> is a good idea. Whenever I see a NULL pointer, I associate that with
>> "uninitialized" and hence "not OK to use".
> 
> Yeah, I'm not typically a fan of that either, although I have to cope
> with that sort behavior quite often (mainly from 3rd party libraries)
> so it doesn't bother me much.
> 
>> I don't really care if it's done by setting pointers to NULL,
>> but then we should at least
>>
>> #define SVN_WILDCARD_CHECKSUM NULL
>>
>> and have the code say:
>>
>>   if (checksum1 == SVN_WILDCARD_CHECKSUM ||
>>      checksum2 == SVN_WILDCARD_CHECKSUM)
>>     return TRUE;
>>
>> Or going even further, we could add a "wildchard" checksum type.
>> Something like this:
> 
>> Index: subversion/include/svn_checksum.h
>> ===================================================================
>> --- subversion/include/svn_checksum.h   (revision 37767)
>> +++ subversion/include/svn_checksum.h   (working copy)
>> @@ -43,7 +43,11 @@ typedef enum
>>   svn_checksum_md5,
>>
>>   /** The checksum is (or should be set to) a SHA1 checksum. */
>> -  svn_checksum_sha1
>> +  svn_checksum_sha1,
>> +
>> +  /** The checksum is a wildcard.
>> +   * @since New in 1.7 */
>> +  svn_checksum_wildcard
>>  } svn_checksum_kind_t;
>>
> 
> [snip]
>> svn_checksum_parse_hex() would need a bump of course, but the diff
>> above shows just the docstring diff on purpose.
>>
>> Either approach would make the API more clear I think.
> 
> Explicit is better than implicit. :-)  FWIW, I think both ideas are
> great.  I kind of like the second suggestion more, since it officially
> makes the wildcard it's own "type".

I'd agree, but have one problem remaining: is "the all-zeros wildcard"
applicable to both SHA1 and MD5 checksums? Futhermore, is it then desired
that an MD5 wildcard matches a SHA1 checksum (and vice versa)?

The current code returns true for a wildcard checksum, even if the kinds
were originally different. Say if I give it an MD5 all-zeros and check it
against any SHA1 checksum, I still get a match.

To stay out of that kind of trouble, I'd suggest something like this:

[[[
const svn_checksum_t svn_checksum_wildcard_md5 =
    { "00000000000000000000000000000000", svn_checksum_md5 };
#define SVN_CHECKSUM_WILDCARD_MD5 (&svn_checksum_wildcard_md5)

const svn_checksum_t svn_checksum_wildcard_sha1 =
    { "00000000 00000000 00000000 00000000 00000000", svn_checksum_sha1 };

#define SVN_CHECKSUM_WILDCARD_SHA1 (&svn_checksum_wildcard_sha1)
]]]

Wherever the current code replaces an all-zero checksum with a NULL
svn_checksum_t*, do this instead:
[[[
  if (checksum.kind == svn_checksum_md5 &&
      strcmp(checksum->digest, SVN_CHECKSUM_WILDCARD_MD5->digest) == 0)
    checksum = SVN_CHECKSUM_WILDCARD_MD5;  // replace with constant
]]]
(And same for sha1)

Then, use these constants, but also reverse the order of the checks, i.e.
make the "kind" check before the wildcard check. Now, NULL checksums mean
"this is undefined".

[[[
svn_boolean_t
svn_checksum_match(const svn_checksum_t *checksum1,
                   const svn_checksum_t *checksum2)
{
  assert(checksum1 != NULL && checksum2 != NULL);

  if (checksum1->kind != checksum2->kind)
    return FALSE;

  // No need to explicitly check the kind here. Only checking for
  // identity with wildcard constants:
  if (checksum1 == SVN_CHECKSUM_WILDCARD_MD5 ||
      checksum2 == SVN_CHECKSUM_WILDCARD_MD5 ||
      checksum1 == SVN_CHECKSUM_WILDCARD_SHA1 ||
      checksum2 == SVN_CHECKSUM_WILDCARD_SHA1)
    return TRUE;
...
]]]

This way the wildcard checksum constants are literal structs -- a fixed and
reserved address in mem which, if anyone chose to print it to stdout, even
produces sensible output and doesn't crash anything.

Also, the currently overlapping meanings of "is not defined" vs. "is all
zeros" are separated.

Plus, the checksum kind is not lost by marking it a wildcard. It remains
clear whether this is a MD5 or SHA1 wildcard.

But, of course, this is quite some code bloat. The question remaining is
whether it is good to amount the absence of a checksum to a wildcard
checksum, and whether it is good to match a wildcard checksum of one kind to
a checksum of any other kind. If these are both good, we should merely
expand the comment in subversion/include/checksum.h to explain all of this
in detail. If either one of these is no good, I'd rather go all the way, as
in my example.

~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2320668

Re: checksum / compare_files question

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tue, May 19, 2009 at 7:59 AM, Stefan Sperling <st...@elego.de> wrote:
[snip]
> I'd say a better representation of a wildcard checksum is in order.
> The API currently overloads the meaning of NULL, which I don't think
> is a good idea. Whenever I see a NULL pointer, I associate that with
> "uninitialized" and hence "not OK to use".

Yeah, I'm not typically a fan of that either, although I have to cope
with that sort behavior quite often (mainly from 3rd party libraries)
so it doesn't bother me much.

> I don't really care if it's done by setting pointers to NULL,
> but then we should at least
>
> #define SVN_WILDCARD_CHECKSUM NULL
>
> and have the code say:
>
>   if (checksum1 == SVN_WILDCARD_CHECKSUM ||
>      checksum2 == SVN_WILDCARD_CHECKSUM)
>     return TRUE;
>
> Or going even further, we could add a "wildchard" checksum type.
> Something like this:

> Index: subversion/include/svn_checksum.h
> ===================================================================
> --- subversion/include/svn_checksum.h   (revision 37767)
> +++ subversion/include/svn_checksum.h   (working copy)
> @@ -43,7 +43,11 @@ typedef enum
>   svn_checksum_md5,
>
>   /** The checksum is (or should be set to) a SHA1 checksum. */
> -  svn_checksum_sha1
> +  svn_checksum_sha1,
> +
> +  /** The checksum is a wildcard.
> +   * @since New in 1.7 */
> +  svn_checksum_wildcard
>  } svn_checksum_kind_t;
>

[snip]
> svn_checksum_parse_hex() would need a bump of course, but the diff
> above shows just the docstring diff on purpose.
>
> Either approach would make the API more clear I think.

Explicit is better than implicit. :-)  FWIW, I think both ideas are
great.  I kind of like the second suggestion more, since it officially
makes the wildcard it's own "type".

-John

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2311595


Re: checksum / compare_files question

Posted by Stefan Sperling <st...@elego.de>.
On Tue, May 19, 2009 at 04:32:13AM -0400, John Szakmeister wrote:
> On Mon, May 18, 2009 at 10:16 PM, Neels Janosch Hofmeyr <ne...@elego.de> wrote:
> > Hi all,
> >
> > I came across this snippet of code:
> >
> > subversion/libsvn_subr/checksum.c:
> > [[[
> > svn_boolean_t
> > svn_checksum_match(const svn_checksum_t *checksum1,
> >                   const svn_checksum_t *checksum2)
> > {
> >  if (checksum1 == NULL || checksum2 == NULL)
> >    return TRUE;
> > ]]]
> >
> > I don't see how that makes sense. If it does, then the function name is
> > misleading. The comment from include/checksum.h is ... tricky:
> 
> It is a bit tricky, but the answer lies in svn_checksum_parse_hex().
> If the checksum provided is all zeros (e.g., if the string read in
> from node rep was '0000000000000000'), then svn_checksum_parse_hex()
> will actually return NULL.  IOW, NULL is the wildcard checksum even
> though it's a string of zeros in the rep.
> 
> I'd say that better documentation of that fact is in order, although
> I'm not sure where exactly that should be documented.

I'd say a better representation of a wildcard checksum is in order.
The API currently overloads the meaning of NULL, which I don't think
is a good idea. Whenever I see a NULL pointer, I associate that with
"uninitialized" and hence "not OK to use". 

I don't really care if it's done by setting pointers to NULL,
but then we should at least

#define SVN_WILDCARD_CHECKSUM NULL

and have the code say:

  if (checksum1 == SVN_WILDCARD_CHECKSUM ||
      checksum2 == SVN_WILDCARD_CHECKSUM)
    return TRUE;

Or going even further, we could add a "wildchard" checksum type.
Something like this:

Index: subversion/include/svn_checksum.h
===================================================================
--- subversion/include/svn_checksum.h	(revision 37767)
+++ subversion/include/svn_checksum.h	(working copy)
@@ -43,7 +43,11 @@ typedef enum
   svn_checksum_md5,
 
   /** The checksum is (or should be set to) a SHA1 checksum. */
-  svn_checksum_sha1
+  svn_checksum_sha1,
+
+  /** The checksum is a wildcard.
+   * @since New in 1.7 */
+  svn_checksum_wildcard
 } svn_checksum_kind_t;
 
 /**
@@ -128,8 +132,9 @@ svn_checksum_to_cstring(const svn_checksum_t *chec
 /** Parse the hex representation @a hex of a checksum of kind @a kind and
  * set @a *checksum to the result, allocating in @a pool.
  *
- * If @a hex is @c NULL or is the all-zeros checksum, then set @a *checksum
- * to @c NULL.
+ * If @a hex is @c NULL then set @a *checksum * to @c NULL.
+ * If @a hex is the all-zeros checksum, then set @a *checksum to
+ * a wildcard checksum instead of a checksum of the requested kind.
  *
  * @since New in 1.6.
  */

svn_checksum_parse_hex() would need a bump of course, but the diff
above shows just the docstring diff on purpose.

Either approach would make the API more clear I think.

Stefan


Re: checksum / compare_files question

Posted by John Szakmeister <jo...@szakmeister.net>.
On Mon, May 18, 2009 at 10:16 PM, Neels Janosch Hofmeyr <ne...@elego.de> wrote:
> Hi all,
>
> I came across this snippet of code:
>
> subversion/libsvn_subr/checksum.c:
> [[[
> svn_boolean_t
> svn_checksum_match(const svn_checksum_t *checksum1,
>                   const svn_checksum_t *checksum2)
> {
>  if (checksum1 == NULL || checksum2 == NULL)
>    return TRUE;
> ]]]
>
> I don't see how that makes sense. If it does, then the function name is
> misleading. The comment from include/checksum.h is ... tricky:

It is a bit tricky, but the answer lies in svn_checksum_parse_hex().
If the checksum provided is all zeros (e.g., if the string read in
from node rep was '0000000000000000'), then svn_checksum_parse_hex()
will actually return NULL.  IOW, NULL is the wildcard checksum even
though it's a string of zeros in the rep.

I'd say that better documentation of that fact is in order, although
I'm not sure where exactly that should be documented.

HTH!

-John

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2309057