Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/11/06 20:55:41 UTC

[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #5 from Mark Martinec <Ma...@ijs.si> 2009-11-06 11:55:37 UTC ---
In case people worry about a change in performance, I did some
benchmarking. Generally, the sha1 function in Digest::SHA has a higher
per-call cost but processes data faster per byte. This means Digest::SHA
is better for digesting long strings (like generating a message ID from
half of the body text), while Digest::SHA1 is better for digesting many
short strings (like hashing tokens in Bayes).
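The trade-off above can be sketched as a linear cost model: each call costs a fixed overhead plus a per-byte cost, and the implementation with the higher overhead but lower per-byte cost wins only above some crossover input length. This is a minimal Python sketch; the class, function names and all constants are hypothetical, chosen only to show the shape of the trade-off, and are not measurements of Digest::SHA or Digest::SHA1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashCost:
    """Hypothetical linear cost model: time = per_call + per_byte * length."""
    name: str
    per_call_us: float  # fixed overhead of one call, in microseconds
    per_byte_us: float  # marginal cost per input byte, in microseconds

    def time_us(self, length: int) -> float:
        return self.per_call_us + self.per_byte_us * length


def crossover_length(a: HashCost, b: HashCost) -> float:
    """Input length at which a and b take equal time.

    Requires a to have the higher per-call cost and the lower per-byte
    cost, the situation described for Digest::SHA vs Digest::SHA1.
    """
    if not (a.per_call_us > b.per_call_us and a.per_byte_us < b.per_byte_us):
        raise ValueError("expected a: higher call cost, lower per-byte cost")
    return (a.per_call_us - b.per_call_us) / (b.per_byte_us - a.per_byte_us)


# Hypothetical constants for illustration only; NOT real measurements.
sha = HashCost("Digest::SHA (hypothetical)", per_call_us=2.0, per_byte_us=0.005)
sha1 = HashCost("Digest::SHA1 (hypothetical)", per_call_us=1.0, per_byte_us=0.010)

n_star = crossover_length(sha, sha1)
print(f"crossover at about {n_star:.0f} bytes")
for length in (12, 100_000):  # a short token vs. a long chunk of body text
    cheaper = min((sha, sha1), key=lambda h: h.time_us(length))
    print(f"{length:>7} bytes: {cheaper.name} is cheaper")
```

With these made-up numbers, short inputs such as a 12-byte token favor the lower-overhead implementation, while long inputs favor the faster per-byte one. Real crossover points depend on the Perl builds and hardware involved.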

Probably the heaviest consumer of sha1 digesting is converting tokens to
hashes in Bayes.pm, so I instrumented it to gather some statistics and
let it run on real mail arriving at our mailer. A typical mail message
requires processing 120 to 200 tokens, occasionally more. The average
token length is typically between 9 and 14 characters.

The loop that goes through all the tokens and hashes them typically
takes about 1 ms per message, and the difference between Digest::SHA and
Digest::SHA1 is about 0.2 ms per message. So there is nothing to worry
about; the difference is hardly measurable. If one wanted to squeeze out
the last drop, MD5 hashing should have been used.
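A rough way to reproduce this kind of per-message measurement is to hash a synthetic "message" of short tokens in a loop and average the time. The sketch below is a stand-in: it uses Python's hashlib rather than the Perl modules, so it does not measure Digest::SHA or Digest::SHA1. The token count (160) and lengths (9 to 14 characters) simply follow the statistics above, and the helper names are hypothetical. For the Perl modules themselves, one would instead compare Digest::SHA::sha1 against Digest::SHA1::sha1, for example with Perl's Benchmark module.

```python
import hashlib
import random
import string
import time
from typing import Any, Callable


def make_tokens(count: int, min_len: int = 9, max_len: int = 14,
                seed: int = 0) -> list[bytes]:
    """Generate reproducible random ASCII tokens with lengths in [min_len, max_len]."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    return [
        "".join(rng.choices(alphabet, k=rng.randint(min_len, max_len))).encode()
        for _ in range(count)
    ]


def ms_per_message(hash_fn: Callable[[bytes], Any], tokens: list[bytes],
                   repeats: int = 200) -> float:
    """Average wall-clock milliseconds to hash every token of one message once."""
    start = time.perf_counter()
    for _ in range(repeats):
        for tok in tokens:
            hash_fn(tok).digest()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats


tokens = make_tokens(160)  # inside the 120..200 tokens-per-message range
for name, fn in (("sha1", hashlib.sha1), ("md5", hashlib.md5)):
    print(f"{name}: {ms_per_message(fn, tokens):.4f} ms per message")
```

To put the reported numbers in per-token terms: a 0.2 ms difference spread over 120 to 200 tokens is roughly 1 to 1.7 microseconds per token.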
