You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by David Mankin <ma...@ants.com> on 2002/12/06 03:06:44 UTC

[PATCH] strlen() optimizations (1)

A couple weeks ago, someone was doing profiling on the code.  It was 
briefly claimed (and later discounted) that strlen() was causing a 
noticeable slow-down.  So, I went through the code base and looked at 
each use of strlen() to find out if it was necessary.  It turns out the 
svn code is not doing very many excessive strlen calls.  But there are 
a few.  Find below the first strlen optimization patches.  Probably, in 
the grand scheme of things, it's not worth doing this optimization.  
However, since I've already done it, I say it should be reviewed and 
applied. :-)

These all pass make check for ra_local, but I have no way to run 
ra_dav.  So please review the svn_ra_dav/fetch.c change extra carefully.

-David Mankin

A few STRLEN optimizations.

    * cmdline/feedback.c (notify): removed redundant strlen() call.

    * auth.c (get_password): removed unnecessary strlen() call.

    * svn_string.h, svn_string.c (svn_cstring_ensure_len,
      svn_cstring_strlencmp): new functions

    * fetch.c (svn_ra_dav__do_checkout): use new svn_cstring_strlencmp
      function for faster string length comparison.

Index: subversion/include/svn_string.h
===================================================================
--- subversion/include/svn_string.h	(revision 3951)
+++ subversion/include/svn_string.h	(working copy)
@@ -262,6 +262,24 @@
                                 svn_boolean_t chop_whitespace,
                                 apr_pool_t *pool);

+/* Checks to see if cstring S is at least LEN characters.
+ * Better than calling strlen because it only compares at most
+ * LEN characters.
+ *
+ * Returns TRUE when S is at least LEN characters, FALSE otherwise.
+ */
+svn_boolean_t svn_cstring_ensure_len (const char *s,
+                                      apr_size_t len);
+
+
+/* Returns 1 if A is longer than B, -1 if B is longer than A, and
+ * 0 if A and B are the same length.
+ *
+ * This is better than using (strlen(a) > strlen(b)) because only
+ * O(min(len(a), len(b))) chars are examined instead of O(len(a) + 
len(b))
+ */
+int svn_cstring_strlencmp (const char* a,
+                           const char* b);

  
  #ifdef __cplusplus
Index: subversion/libsvn_subr/svn_string.c
===================================================================
--- subversion/libsvn_subr/svn_string.c	(revision 3951)
+++ subversion/libsvn_subr/svn_string.c	(working copy)
@@ -579,3 +579,35 @@
    svn_cstring_split_append (a, input, sep_chars, chop_whitespace, 
pool);
    return a;
  }
+
+
+svn_boolean_t
+svn_cstring_ensure_len (const char *s, apr_size_t len)
+{
+  apr_size_t i;
+  for (i = 0 ; i < len ; i ++)
+    {
+      if (s[i] == '\0')
+        return FALSE;
+    }
+  return TRUE;
+}
+
+int
+svn_cstring_strlencmp (const char* a, const char* b)
+{
+  apr_size_t i;
+
+  for (i = 0 ; ; i++)
+    {
+      if ((b[i] == '\0') && (a[i] != '\0')) /* B is shorter */
+        return 1;
+      if (a[i] == '\0')
+        {
+          if (b[i] != '\0')                 /* A is shorter */
+            return -1;
+          else                              /* must be they're equal */
+            return 0;
+        }
+    }
+}
Index: subversion/libsvn_client/auth.c
===================================================================
--- subversion/libsvn_client/auth.c	(revision 3951)
+++ subversion/libsvn_client/auth.c	(working copy)
@@ -149,7 +149,7 @@
    svn_client__callback_baton_t *cb = baton;
    svn_client_auth_baton_t *ab = cb->auth_baton;

-  if (strlen(username) > 0)
+  if (*username)
      prompt = apr_psprintf (pool, "%s's password: ", username);
    else
      prompt = apr_psprintf (pool, "password: ");
Index: subversion/clients/cmdline/feedback.c
===================================================================
--- subversion/clients/cmdline/feedback.c	(revision 3951)
+++ subversion/clients/cmdline/feedback.c	(working copy)
@@ -191,7 +191,6 @@

      case svn_wc_notify_commit_added:
        if (mime_type
-          && ((strlen (mime_type)) > 5)
            && ((strncmp (mime_type, "text/", 5)) != 0))
          printf ("Adding  (bin)  %s\n", path_native);
        else
Index: subversion/libsvn_ra_dav/fetch.c
===================================================================
--- subversion/libsvn_ra_dav/fetch.c	(revision 3951)
+++ subversion/libsvn_ra_dav/fetch.c	(working copy)
@@ -1224,7 +1224,7 @@
              }
          }

-      if (strlen(url) > strlen(bc_root))
+      if (svn_cstring_strlencmp(url, bc_root) > 0)
          {
            const char *comp;
            comp = svn_path_uri_decode(svn_path_basename(url, subpool),


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] strlen() optimizations (1)

Posted by Brandon Ehle <az...@yahoo.com>.

> 
>
>>A couple weeks ago, someone was doing profiling on the code.  It was
>>briefly claimed (and later discounted) that strlen() was causing a
>>noticeable slow-down.  So, I went through the code base and looked at
>>each use of strlen() to find out if it was necessary.  It turns out
>>the svn code is not doing very many excessive strlen calls.  But there
>>are a few.  Find below the first strlen optimization patches. 
>>Probably, in the grand scheme of things, it's not worth doing this
>>optimization.  However, since I've already done it, I say it should be
>>reviewed and applied. :-)
>>    
>>
>
>Well, here's the review, and it should not be applied (except possibly
>the one correct change).
>  
>
I don't think any of these changes should be noticeably faster.  The 
reason why strlen is showing up in your profile is not because strlen is 
slow, but because the data strlen is examining probably does not reside 
in the cache.  If you optimize the strlen out, the slowdown should show 
up in the function instead of strlen, and may give the appearance that 
its faster, when in fact the whole execution time may not have changed.

>>+/* Returns 1 if A is longer than B, -1 if B is longer than A, and
>>+ * 0 if A and B are the same length.
>>+ *
>>+ * This is better than using (strlen(a) > strlen(b)) because only
>>+ * O(min(len(a), len(b))) chars are examined instead of O(len(a) +
>>len(b))
>>
>>    
>>
Chances are this won't be much faster if at all, because a cacheline 
usually can encompass the entire string.  You may see some speed 
increase with this, but if that occurs, the better way to fix this is to 
fix apr's malloc alignment so the string resides entirely within a 
cacheline.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] strlen() optimizations (1)

Posted by Branko Čibej <br...@xbc.nu>.

David Mankin wrote:

> Well, thanks to Brane's comments I did some benchmarks on my strlencmp
> method.  I don't have svn setup so I can get realistic measurements of
> the function in real life, so I tested it in isolation.

Heh. Those results don't mean a thing. You have to measure how the code
behaves in the program you're actually trying to improve. When you test
things in isolation, different cache behaviour will skew the results,
often by so much that the results don't mean anything any more.

I got burned by this ince myself. I had a huge progem (1 Mloc) where bad
string handling used up about 25% of a crirical operation. So I wrote
"faster" macros and indeed, isolated measurements showed speedups of 2x
to 10x. So I spent two weeks reworking all the string handling code in
the critical path and guess what? After two weeks, I had shortened the
execution time by less than 2%, which was well within the error margin.
And of course, the code was less readable afterwards.

Measure first -- and always with real data.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] strlen() optimizations (1)

Posted by David Mankin <ma...@ants.com>.

On Thursday, December 5, 2002, at 07:33  PM, Branko Čibej wrote:
>> +/* Returns 1 if A is longer than B, -1 if B is longer than A, and
>> + * 0 if A and B are the same length.
>> + *
>> + * This is better than using (strlen(a) > strlen(b)) because only
>> + * O(min(len(a), len(b))) chars are examined instead of O(len(a) +
>> len(b))
>> + */
>> +int svn_cstring_strlencmp (const char* a,
>> +                           const char* b);
>
> It's a common misconception that hand-coded functions will be faster
> than what the compiler provides. Modern compilers will generate highly
> optimized inline code for string functions that uses word-length
> instructions where possible. So it's entirely possible, expecially
> taking function call overhead into account, that your functions are
> actually slower than what was being used now. You should _always_
> measure the effects of  your "optimizations", even though they might
> seem oviously correct to you.

Well, thanks to Brane's comments I did some benchmarks on my strlencmp 
method.  I don't have svn setup so I can get realistic measurements of 
the function in real life, so I tested it in isolation.  I can post 
more details if anyone is really interested, but here is the summary:

If min(len(a), len(b)) <= 6, my strlencmp() is always faster (by up to 
150%).   If min(len(a), len(b)) >= 17, two calls to strlen() is always 
faster (by up to 100%).  11-12 seems to be the time when either one is 
about as likely to be faster as slower.  (This is on my Powerbook 
G4/667MHz, Mac OS X 10.2.2, gcc -O2 version 3.1.)

I could put checks in strlencmp to call strlen() once it crosses the 
short-string threshold, but since I don't know how real data fits into 
these length requirements, I don't know how they change on different 
architectures/compilers/C libraries, and I doubt that a even 150% 
speedup in the few places where we're checking to see which string is 
longer will make any noticeable difference, I withdraw my patch to add 
strlencmp.

I'll fix my other problem (thanks to Brane & C-Mike for pointing out 
the problem and to C-Mike for a suggestion for repair) and resubmit.

>> Index: subversion/libsvn_ra_dav/fetch.c
>> ===================================================================
>> --- subversion/libsvn_ra_dav/fetch.c    (revision 3951)
>> +++ subversion/libsvn_ra_dav/fetch.c    (working copy)
>> @@ -1224,7 +1224,7 @@
>>              }
>>          }
>>
>> -      if (strlen(url) > strlen(bc_root))
>> +      if (svn_cstring_strlencmp(url, bc_root) > 0)
>>          {
>>            const char *comp;
>>            comp = svn_path_uri_decode(svn_path_basename(url, subpool),
>
> See above. -1 on committing things like this until you can prove that
> your change actually makes a measurable difference -- for the better.
>
> -- 
> Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

-David Mankin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] strlen() optimizations (1)

Posted by Branko Čibej <br...@xbc.nu>.

David Mankin wrote:

> A couple weeks ago, someone was doing profiling on the code.  It was
> briefly claimed (and later discounted) that strlen() was causing a
> noticeable slow-down.  So, I went through the code base and looked at
> each use of strlen() to find out if it was necessary.  It turns out
> the svn code is not doing very many excessive strlen calls.  But there
> are a few.  Find below the first strlen optimization patches. 
> Probably, in the grand scheme of things, it's not worth doing this
> optimization.  However, since I've already done it, I say it should be
> reviewed and applied. :-)

Well, here's the review, and it should not be applied (except possibly
the one correct change).

> Index: subversion/include/svn_string.h
> ===================================================================
> --- subversion/include/svn_string.h    (revision 3951)
> +++ subversion/include/svn_string.h    (working copy)
> @@ -262,6 +262,24 @@
>                                 svn_boolean_t chop_whitespace,
>                                 apr_pool_t *pool);
>
> +/* Checks to see if cstring S is at least LEN characters.
> + * Better than calling strlen because it only compares at most
> + * LEN characters.
> + *
> + * Returns TRUE when S is at least LEN characters, FALSE otherwise.
> + */
> +svn_boolean_t svn_cstring_ensure_len (const char *s,
> +                                      apr_size_t len);
> +
> +
> +/* Returns 1 if A is longer than B, -1 if B is longer than A, and
> + * 0 if A and B are the same length.
> + *
> + * This is better than using (strlen(a) > strlen(b)) because only
> + * O(min(len(a), len(b))) chars are examined instead of O(len(a) +
> len(b))
> + */
> +int svn_cstring_strlencmp (const char* a,
> +                           const char* b);

It's a common misconception that hand-coded functions will be faster
than what the compiler provides. Modern compilers will generate highly
optimized inline code for string functions that uses word-length
instructions where possible. So it's entirely possible, expecially
taking function call overhead into account, that your functions are
actually slower than what was being used now. You should _always_
measure the effects of  your "optimizations", even though they might
seem oviously correct to you.

> Index: subversion/libsvn_client/auth.c
> ===================================================================
> --- subversion/libsvn_client/auth.c    (revision 3951)
> +++ subversion/libsvn_client/auth.c    (working copy)
> @@ -149,7 +149,7 @@
>    svn_client__callback_baton_t *cb = baton;
>    svn_client_auth_baton_t *ab = cb->auth_baton;
>
> -  if (strlen(username) > 0)
> +  if (*username)
>      prompt = apr_psprintf (pool, "%s's password: ", username);
>    else
>      prompt = apr_psprintf (pool, "password: ");

This change is correct.

> Index: subversion/clients/cmdline/feedback.c
> ===================================================================
> --- subversion/clients/cmdline/feedback.c    (revision 3951)
> +++ subversion/clients/cmdline/feedback.c    (working copy)
> @@ -191,7 +191,6 @@
>
>      case svn_wc_notify_commit_added:
>        if (mime_type
> -          && ((strlen (mime_type)) > 5)
>            && ((strncmp (mime_type, "text/", 5)) != 0))
>          printf ("Adding  (bin)  %s\n", path_native);
>        else

This one, OTOH, is _wrong_. Your change will match "text/", which is
_not_ a valid MIME type!

> Index: subversion/libsvn_ra_dav/fetch.c
> ===================================================================
> --- subversion/libsvn_ra_dav/fetch.c    (revision 3951)
> +++ subversion/libsvn_ra_dav/fetch.c    (working copy)
> @@ -1224,7 +1224,7 @@
>              }
>          }
>
> -      if (strlen(url) > strlen(bc_root))
> +      if (svn_cstring_strlencmp(url, bc_root) > 0)
>          {
>            const char *comp;
>            comp = svn_path_uri_decode(svn_path_basename(url, subpool),

See above. -1 on committing things like this until you can prove that
your change actually makes a measurable difference -- for the better.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org