You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by pe...@apache.org on 2010/12/05 10:16:53 UTC

svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Author: peters
Date: Sun Dec  5 09:15:42 2010
New Revision: 1042294

URL: http://svn.apache.org/viewvc?rev=1042294&view=rev
Log:
Optimize and simplify svn_checksum_parse_hex().  In my tests, the new
version is 20% to 30% faster.  The EBCDIC port would probably need an
#ifdef here.

Also happens to fix the case of input with uppercase A-F characters.
We neither parsed them correctly nor rejected them as invalid.
Now we accept _and_ parse them correctly.

* subversion/libsvn_subr/checksum.c
  (svn_checksum_parse_hex): Rewrite to do a single table lookup to find
   the validity and value of a hex byte, rather than two lookups
   (svn_ctype_isxdigit, svn_ctype_isalpha) plus some ASCII math.
   Also, don't bother to qualify char as unsigned except where actually
   needed, and rename is_zeros to is_nonzero to match its meaning.

Modified:
    subversion/trunk/subversion/libsvn_subr/checksum.c

Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/checksum.c?rev=1042294&r1=1042293&r2=1042294&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/checksum.c (original)
+++ subversion/trunk/subversion/libsvn_subr/checksum.c Sun Dec  5 09:15:42 2010
@@ -193,9 +193,27 @@ svn_checksum_parse_hex(svn_checksum_t **
                        const char *hex,
                        apr_pool_t *pool)
 {
-  int len;
-  int i;
-  unsigned char is_zeros = '\0';
+  int i, len;
+  char is_nonzero = '\0';
+  static const char xdigitval[256] =
+    {
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+       0, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,   /* 0-9 */
+      -1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,   /* A-F */
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,10,11,12,13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,   /* a-f */
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
+    };
 
   if (hex == NULL)
     {
@@ -210,19 +228,15 @@ svn_checksum_parse_hex(svn_checksum_t **
 
   for (i = 0; i < len; i++)
     {
-      if ((! svn_ctype_isxdigit(hex[i * 2])) ||
-          (! svn_ctype_isxdigit(hex[i * 2 + 1])))
+      char x1 = xdigitval[(unsigned char)hex[i * 2]];
+      char x2 = xdigitval[(unsigned char)hex[i * 2 + 1]];
+      if (x1 == (char)-1 || x2 == (char)-1)
         return svn_error_create(SVN_ERR_BAD_CHECKSUM_PARSE, NULL, NULL);
 
-      ((unsigned char *)(*checksum)->digest)[i] =
-        ((svn_ctype_isalpha(hex[i*2]) ? hex[i*2] - 'a' + 10
-                                      : hex[i*2] - '0') << 4) |
-        (svn_ctype_isalpha(hex[i*2+1]) ? hex[i*2+1] - 'a' + 10
-                                       : hex[i*2+1] - '0');
-      is_zeros |= (*checksum)->digest[i];
+      is_nonzero |= ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
     }
 
-  if (is_zeros == '\0')
+  if (!is_nonzero)
     *checksum = NULL;
 
   return SVN_NO_ERROR;



Re: svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Dec 05, 2010 at 05:14:20AM -0600, Peter Samuelson wrote:
> 
> [Stefan Sperling]
> > > -      is_zeros |= (*checksum)->digest[i];
> > > +      is_nonzero |= ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
> > 
> > At the very least, this needs parenthesis unless you want everyone
> > to pull out their copy of K&R to check operator precedence rules.
> 
> Can do.  x1 << 4 | x2 didn't seem too ambiguous to me, but I can see
> how it might be.
> 
> > Can we do one assignment per line instead? Maybe that's easier to parse.
> 
> Hmmm.
> 
>       ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
>       is_nonzero |= ((char *)(*checksum)->digest)[i];
> 
> ...The repeated expression is complex, so you have to visually verify
> that it is indeed identical.  That seems harder, to me.  Or did you
> mean:
> 
>       is_nonzero |= 
>         ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
> 
> ...Which doesn't seem much easier to parse either.

What about something like this (pseudo code):

      x = x1 << 4 | x2;
      ((char *)(*checksum)->digest)[i] = x;
      is_nonzero |= x;

Stefan

Re: svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Posted by Peter Samuelson <pe...@p12n.org>.
[Daniel Shahaf]
>     char *digest = (char *) checksum->digest;
>     for (i = 0; ...)
>       {
>       	is_nonzero |= (x1 | x2);
>       	digest[i] = (x1 << 4) | x2;
>       }

r1042319.  I left both lines as (x1 << 4) | x2, even though is_nonzero
doesn't need the << 4, so that the compiler will only calculate a
single expression.

Re: svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Peter Samuelson wrote on Sun, Dec 05, 2010 at 05:14:20 -0600:
> 
> [Stefan Sperling]
> > > -      is_zeros |= (*checksum)->digest[i];
> > > +      is_nonzero |= ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
> > 
> > At the very least, this needs parenthesis unless you want everyone
> > to pull out their copy of K&R to check operator precedence rules.
> 
> Can do.  x1 << 4 | x2 didn't seem too ambiguous to me, but I can see
> how it might be.
> 
> > Can we do one assignment per line instead? Maybe that's easier to parse.
> 
> Hmmm.
> 
>       ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
>       is_nonzero |= ((char *)(*checksum)->digest)[i];
> 
> ...The repeated expression is complex, so you have to visually verify
> that it is indeed identical.  That seems harder, to me.  Or did you
> mean:
> 
>       is_nonzero |= 
>         ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
> 

    char *digest = (char *) checksum->digest;
    for (i = 0; ...)
      {
      	is_nonzero |= (x1 | x2);
      	digest[i] = (x1 << 4) | x2;
      }

> ...Which doesn't seem much easier to parse either.
> 
> Peter

Re: svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Posted by Peter Samuelson <pe...@p12n.org>.
[Stefan Sperling]
> > -      is_zeros |= (*checksum)->digest[i];
> > +      is_nonzero |= ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
> 
> At the very least, this needs parenthesis unless you want everyone
> to pull out their copy of K&R to check operator precedence rules.

Can do.  x1 << 4 | x2 didn't seem too ambiguous to me, but I can see
how it might be.

> Can we do one assignment per line instead? Maybe that's easier to parse.

Hmmm.

      ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;
      is_nonzero |= ((char *)(*checksum)->digest)[i];

...The repeated expression is complex, so you have to visually verify
that it is indeed identical.  That seems harder, to me.  Or did you
mean:

      is_nonzero |= 
        ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;

...Which doesn't seem much easier to parse either.

Peter

Re: svn commit: r1042294 - /subversion/trunk/subversion/libsvn_subr/checksum.c

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Dec 05, 2010 at 09:16:53AM -0000, peters@apache.org wrote:
> Author: peters
> Date: Sun Dec  5 09:15:42 2010
> New Revision: 1042294
> 
> URL: http://svn.apache.org/viewvc?rev=1042294&view=rev
> Log:
> Optimize and simplify svn_checksum_parse_hex().  In my tests, the new
> version is 20% to 30% faster.  The EBCDIC port would probably need an
> #ifdef here.
> 
> Also happens to fix the case of input with uppercase A-F characters.
> We neither parsed them correctly nor rejected them as invalid.
> Now we accept _and_ parse them correctly.
> 
> * subversion/libsvn_subr/checksum.c
>   (svn_checksum_parse_hex): Rewrite to do a single table lookup to find
>    the validity and value of a hex byte, rather than two lookups
>    (svn_ctype_isxdigit, svn_ctype_isalpha) plus some ASCII math.
>    Also, don't bother to qualify char as unsigned except where actually
>    needed, and rename is_zeros to is_nonzero to match its meaning.
> 
> Modified:
>     subversion/trunk/subversion/libsvn_subr/checksum.c
 
> -      is_zeros |= (*checksum)->digest[i];
> +      is_nonzero |= ((char *)(*checksum)->digest)[i] = x1 << 4 | x2;

At the very least, this needs parenthesis unless you want everyone
to pull out their copy of K&R to check operator precedence rules.

Can we do one assignment per line instead? Maybe that's easier to parse.

Stefan