Posted to dev@subversion.apache.org by Sander Roobol <ph...@wanadoo.nl> on 2003/04/25 17:39:24 UTC

Performance: MD4 vs MD5

When it came to my attention that Subversion uses MD5 for its checksums, I
suddenly remembered that MD5 was designed as a safer but slower version of MD4.
This is what RFC 1321 (MD5 Message-Digest Algorithm) says:

  "MD5 is slightly slower than MD4, but is more "conservative" in design. (...)
  because MD4 was designed to be exceptionally fast, it is "at the edge" in
  terms of risking successful cryptanalytic attack. MD5 backs off a bit, giving
  up a little in speed for a much greater likelihood of ultimate security."

As the RFC is not very clear about MD5's performance, I searched the web for
some benchmarks. This one is about a python module that needed MD4 and/or MD5,
on http://minkirri.apana.org.au/~abo/projects/pysync/swf/librsync.diary

  "Quick benchmark of md4 vs md5 using 256K sums of 1K blocks random data. I
  used native md4 and [md5] modules on a Cel-366. Results; md4 12.8secs, md5
  15.6secs."
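
A benchmark of the kind quoted above (256K sums of 1K blocks of random data)
can be sketched in Python roughly as follows. This is only an illustration,
not the code behind the quoted numbers: the names time_digest and
run_benchmark are made up here, and hashlib only offers "md4" when the
underlying OpenSSL build provides it, so the sketch reports None for an
algorithm it cannot find.

```python
import hashlib
import os
import time


def time_digest(name, blocks):
    """Return the seconds spent hashing every block with algorithm `name`,
    or None if this Python/OpenSSL build does not provide that algorithm."""
    try:
        hashlib.new(name)
    except ValueError:
        return None
    start = time.perf_counter()
    for block in blocks:
        hashlib.new(name, block).digest()
    return time.perf_counter() - start


def run_benchmark(block_count=256 * 1024, block_size=1024,
                  names=("md4", "md5")):
    """Time each algorithm over `block_count` blocks of `block_size` bytes.

    To keep memory use small, at most 1024 distinct random blocks are
    generated and then reused in rotation (an assumption of this sketch).
    """
    pool = [os.urandom(block_size) for _ in range(min(block_count, 1024))]
    blocks = [pool[i % len(pool)] for i in range(block_count)]
    return {name: time_digest(name, blocks) for name in names}


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        if seconds is None:
            print(f"{name}: not available in this build")
        else:
            print(f"{name}: {seconds:.2f} s")
```

Absolute timings from such a loop depend heavily on the interpreter and the
digest implementation, so only the ratio between algorithms on the same
machine is meaningful.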

I also found an archived thread comparing MD5 to SHA-1:

Re: Performance: MD4 vs MD5

Posted by Branko Čibej <br...@xbc.nu>.
Sander Roobol wrote:

>When it came to my attention that Subversion uses MD5 for its checksums, I
>suddenly remembered that MD5 was designed as a safer but slower version of MD4.
>This is what RFC 1321 (MD5 Message-Digest Algorithm) says:
>
>  "MD5 is slightly slower than MD4, but is more "conservative" in design. (...)
>  because MD4 was designed to be exceptionally fast, it is "at the edge" in
>  terms of risking successful cryptanalytic attack. MD5 backs off a bit, giving
>  up a little in speed for a much greater likelihood of ultimate security."
>
>As the RFC is not very clear about MD5's performance, I searched the web for
>some benchmarks. This one is about a python module that needed MD4 and/or MD5,
>on http://minkirri.apana.org.au/~abo/projects/pysync/swf/librsync.diary
>
>  "Quick benchmark of md4 vs md5 using 256K sums of 1K blocks random data. I
>  used native md4 and [md5] modules on a Cel-366. Results; md4 12.8secs, md5
>  15.6secs."
>
>I also found an archived thread comparing MD5 to SHA-1:
>From http://www.sandelman.ottawa.on.ca/ipsec/1996/06/msg00003.html
>Someone did some benchmarks and included MD4 too:
>
>  "Performance in Megabytes per Second on a 90 MHz Pentium
>     MD4    MD5   SHA-1   RIPEMD RIPEMD-128 RIPEMD-160
>    20.9   14.2    6.1     10.3     8.0       5.0"
>
>I ran some benchmarks too, on an AMD XP 1700+ and an Intel Pentium 4 1700MHz,
>using openssl(1). The AMD box runs OpenSSL 0.9.7a on Debian Sid, the Pentium
>runs OpenSSL 0.9.6b on Red Hat 7.2.
>The table below shows the time in milliseconds needed to checksum a 100MB file
>(cached in memory) filled with random garbage.
>I also ran md5sum(1) on that file, and that gives some surprising results.
>Apart from a clear speed difference between MD4 and MD5, there also appears to
>be a quite significant difference between various MD5 implementations.
>
>             MD4   MD5   md5sum(1)
>  XP 1800+   550   600   1100
>  P4 1700    805   1040  4950
>
>
>These three benchmarks together show that MD4 is approximately 10 to 25% faster
>than MD5. I can't imagine that subversion needs cryptographic safety of its
>checksums, although I don't exactly know what the checksumming code does. So
>my question is, why is subversion using MD5?
>  
>

Heh, another case of misplaced benchmarking. :-)

What you really want to measure is the percentage of time spent in
checksum calculation during checkout/update/commit. I'd guess it's on
the order of several % -- in other words, not worth optimizing until we
have a much faster server, client and libsvn_wc.
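
One way to measure that share is to accumulate the time spent inside the
checksum call and divide it by the wall-clock time of the whole operation.
The sketch below is hypothetical: simulated_commit is only a stand-in that
compresses and checksums each buffer, not what Subversion actually does, and
ChecksumTimer and checksum_share are names invented for the example.

```python
import hashlib
import os
import time
import zlib


class ChecksumTimer:
    """Accumulates the time spent computing MD5 digests."""

    def __init__(self):
        self.seconds = 0.0

    def md5(self, data):
        start = time.perf_counter()
        digest = hashlib.md5(data).hexdigest()
        self.seconds += time.perf_counter() - start
        return digest


def simulated_commit(files, timer):
    """Hypothetical workload: zlib compression stands in for all the
    non-checksum work, followed by one MD5 digest per buffer."""
    for data in files:
        zlib.compress(data)
        timer.md5(data)


def checksum_share(files):
    """Return the fraction of total time that went into checksumming."""
    timer = ChecksumTimer()
    start = time.perf_counter()
    simulated_commit(files, timer)
    total = time.perf_counter() - start
    return timer.seconds / total if total > 0 else 0.0


if __name__ == "__main__":
    files = [os.urandom(64 * 1024) for _ in range(100)]
    print(f"checksum share: {checksum_share(files):.1%}")
```

The number this prints says nothing about Subversion itself; the point is
the measurement, which would have to be taken around a real
checkout/update/commit to answer the question.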


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Performance: MD4 vs MD5

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
> 
>
>Yeah, a CRC32 checksum would probably be fine, really.  But given all
>the lossage we saw going to checksums in the first place, I don't
>imagine we want to change the algorithm now.
>
>  
>
I totally agree!

gat



Re: Performance: MD4 vs MD5

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2003-04-25 at 14:07, Glenn A. Thompson wrote:
> Hun, the check sums are stored per representation (file).  Who cares if 
> two different files have the same check sum.  I thought the point is to 
> detect if unexpected changes have occurred in the storage of a specific 
> version of a file.

Yeah, a CRC32 checksum would probably be fine, really.  But given all
the lossage we saw going to checksums in the first place, I don't
imagine we want to change the algorithm now.
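
For illustration only, detecting storage corruption with a recorded checksum
might look like the following in Python. Both CRC32 and MD5 are shown; the
function names and the sample data are assumptions of this sketch and have
nothing to do with Subversion's actual code.

```python
import hashlib
import zlib


def checksums(data):
    """Compute the checksums that would be recorded with a representation."""
    return {
        "crc32": zlib.crc32(data) & 0xFFFFFFFF,
        "md5": hashlib.md5(data).hexdigest(),
    }


def flip_bit(data, bit_index):
    """Return a copy of `data` with one bit inverted, simulating corruption."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_intact(data, recorded):
    """True if `data` still matches the checksums recorded at write time."""
    return checksums(data) == recorded


if __name__ == "__main__":
    stored = b"contents of one version of a file\n"
    recorded = checksums(stored)
    print(is_intact(stored, recorded))
    print(is_intact(flip_bit(stored, 5), recorded))
```

Either checksum catches a single flipped bit. They differ in how they cope
with deliberately constructed input, which is not the failure mode being
discussed here.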



Re: Performance: MD4 vs MD5

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
> 
>
>Your data are *extremely* old.  The problem with MD4 is that it's easy
>to generate collisions, to the point where it's likely to be a practical
>concern for a repository storing a fairly large number of files.  That
>trumps any performance concerns.
>  
>
Hun, the check sums are stored per representation (file).  Who cares if 
two different files have the same check sum.  I thought the point is to 
detect if unexpected changes have occurred in the storage of a specific 
version of a file.  i.e. has the DB/hardware stepped on our data 
somewhere along the line.  'dem bits can be sitting around for a long 
time between accesses:-)

gat



Re: Performance: MD4 vs MD5

Posted by Bryan O'Sullivan <bo...@serpentine.com>.
On Fri, 2003-04-25 at 10:39, Sander Roobol wrote:
> When it came to my attention that Subversion uses MD5 for its checksums, I
> suddenly remembered that MD5 was designed as a safer but slower version of MD4.
> This is what RFC 1321 (MD5 Message-Digest Algorithm) says:

Your data are *extremely* old.  The problem with MD4 is that it's easy
to generate collisions, to the point where it's likely to be a practical
concern for a repository storing a fairly large number of files.  That
trumps any performance concerns.

Nobody uses MD5 for cryptographic integrity in new software, so that's
not really an issue.

	<b



Re: Performance: MD4 vs MD5

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Apr 25, 2003 at 10:52:29AM -0700, Justin Erenkrantz wrote:
> --On Friday, April 25, 2003 19:47:53 +0200 Sander Striker 
> <st...@apache.org> wrote:
> 
> >>safety of its checksums, although I don't exactly know what the
> >>checksumming code does. So my question is, why is subversion using MD5?
> >
> >Because we do have apr_md5_xxx, but don't have apr_md4_xxx :)
> 
> It does have it.  -- justin

$ cvs log -r1.1 apr_md4.c
...
----------------------------
revision 1.1
date: 2001/06/01 22:34:06;  author: jerenkrantz;  state: Exp;
MD4 implementation based on code sample in RFC 1320.  The appropriate
copyright notices should be present (ASF and RSA).

This patch is a modified version of one submitted by Sander Striker.

Obtained from: RFC 1320 / RSA Data Security, Inc.
Submitted by:  Sander Striker <st...@samba-tng.org>
Reviewed by:   Justin Erenkrantz (applied in a modified form)
=============================================================================


*snicker*

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


RE: Performance: MD4 vs MD5

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Friday, April 25, 2003 19:47:53 +0200 Sander Striker 
<st...@apache.org> wrote:

>> safety of its checksums, although I don't exactly know what the
>> checksumming code does. So my question is, why is subversion using MD5?
>
> Because we do have apr_md5_xxx, but don't have apr_md4_xxx :)

It does have it.  -- justin


RE: Performance: MD4 vs MD5

Posted by Sander Striker <st...@apache.org>.
> From: Sander Roobol [mailto:phy@wanadoo.nl]
> Sent: Friday, April 25, 2003 7:39 PM

[...]
> These three benchmarks together show that MD4 is approximately 10 to 25% faster
> than MD5. I can't imagine that subversion needs cryptographic safety of its
> checksums, although I don't exactly know what the checksumming code does. So
> my question is, why is subversion using MD5?

Because we do have apr_md5_xxx, but don't have apr_md4_xxx :)

Sander

