Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/06/28 08:58:31 UTC

Digest::SHA1 quandary

So, given the performance numbers for Bayes when Digest::SHA1 is not
present and we use our own perl implementation, I think we have four
good options for 3.0:

  - ship with a copy of Digest::SHA1 or equivalent XS implementation
    of SHA1 digest
  - ship with a copy of something else which is just as fast such as
    FNV-1 or MD5 (if it makes sense)
  - don't do Bayes without Digest::SHA1
  - don't do SHA1 in Bayes without Digest::SHA1

I think the current option of horribly slow Bayes without Digest::SHA1
is unacceptable.

There's also the related issue of the extra function call overhead; it
would be nice to get rid of that.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Digest::SHA1 quandary

Posted by Sidney Markowitz <si...@sidney.com>.
Daniel Quinlan wrote:
> I think this makes the best case for requiring
> Digest::SHA1 and not using FNV-1

I always meant that part of deciding to use FNV-1 would be for me to 
submit an xs version of it as Digest::FNV to CPAN (which I intend to do 
anyway). But that isn't relevant given that there is not enough 
compelling reason to switch hash functions. I'm +1 on requiring 
Digest::SHA1.

  -- sidney


Re: Digest::SHA1 quandary

Posted by Daniel Quinlan <qu...@pathname.com>.
Sidney Markowitz <si...@sidney.com> writes:

> If we ship an xs version of SHA1, why is that so much easier on people 
> than getting a copy from CPAN? We don't ship every other CPAN module 
> that we require. I'm uneasy about supporting a native code module. spamc 
> is bad enough. We can't assume people not on linux/unix platforms have a 
> C compiler.

I think this makes the best case for requiring Digest::SHA1 and not
using FNV-1, so I'll submit a bug to that effect.
 
> I'm in favor of just requiring the module. It's used in the Habeas eval 
> test and the hashcash plugin as well as in Bayes.

Agreed.  We could require it only for those tests, or we could simply
require it outright; I favor the latter.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Digest::SHA1 quandary

Posted by Sidney Markowitz <si...@sidney.com>.
The only way to get the speed is to have an xs version of a hash 
function. FNV-1 will not help if it is pure perl.

The profile seems to show that the xs version of SHA1 is fast enough 
that there is little reason to change the database format in order to 
use FNV-1. I can't think of any advantage that MD5 has over FNV-1 except 
that an MD5 xs module exists in CPAN already. MD5 would give us even 
less performance improvement over SHA1 than FNV-1.
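
For readers unfamiliar with FNV-1, here is a minimal sketch of the
32-bit variant next to SHA-1. It is written in Python purely for
illustration; it is not Perl and not SpamAssassin code, and the
function names fnv1_32 and sha1_token are made up for this example.
The constants are the standard 32-bit FNV offset basis and prime.

```python
import hashlib

FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1_32(data: bytes) -> int:
    """Return the 32-bit FNV-1 hash of data (multiply first, then XOR)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the product within 32 bits
        h ^= byte
    return h


def sha1_token(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of data via the compiled hashlib."""
    return hashlib.sha1(data).digest()
```

The per-byte loop in fnv1_32 runs in the interpreter, while
hashlib.sha1 runs in compiled code. That is roughly the situation
described above: a simpler hash does not help much if it is the
interpreted one.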

If we ship an xs version of SHA1, why is that so much easier on people 
than getting a copy from CPAN? We don't ship every other CPAN module 
that we require. I'm uneasy about supporting a native code module. spamc 
is bad enough. We can't assume people not on linux/unix platforms have a 
C compiler.

I'm in favor of just requiring the module. It's used in the Habeas eval 
test and the hashcash plugin as well as in Bayes.

  -- sidney

Re: Digest::SHA1 quandary

Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Monday 28 June 2004 09:06 CET Daniel Quinlan wrote:
> Daniel Quinlan <qu...@pathname.com> writes:
> > So, given the performance numbers for Bayes when Digest::SHA1 is not
> > present and we use our own perl implementation, I think we have four
> > good options for 3.0:
> >
> >   - ship with a copy of Digest::SHA1 or equivalent XS implementation
> >     of SHA1 digest

-1

> >   - ship with a copy of something else which is just as fast such as
> >     FNV-1 or MD5 (if it makes sense)

-1

> >   - don't do Bayes without Digest::SHA1

+0.9

> >   - don't do SHA1 in Bayes without Digest::SHA1

Dunno if that's possible.

> Oh, I forgot to mention one obvious option that precedes these:
>
>    - simply require Digest::SHA1

+1

Cheers,
Malte

-- 
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
      <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
      <http://www.catb.org/~esr/faqs/smart-questions.html>

Re: Digest::SHA1 quandary

Posted by Michael Parker <pa...@pobox.com>.
I think this will be an echo of what's already been said, but thought
I would give my $.02 anyway.

On Mon, Jun 28, 2004 at 12:06:16AM -0700, Daniel Quinlan wrote:
> Daniel Quinlan <qu...@pathname.com> writes:
> 
> > So, given the performance numbers for Bayes when Digest::SHA1 is not
> > present and we use our own perl implementation, I think we have four
> > good options for 3.0:
> > 
> >   - ship with a copy of Digest::SHA1 or equivalent XS implementation
> >     of SHA1 digest
> >   - ship with a copy of something else which is just as fast such as
> >     FNV-1 or MD5 (if it makes sense)

Anything we include will most likely need to be compiled.  As we just
saw with the spamc stuff, this is a bad thing.

> >   - don't do Bayes without Digest::SHA1

Yes, but what a pain to check and turn off if it's not there, and oh
the support questions.

> >   - don't do SHA1 in Bayes without Digest::SHA1

A headache I don't even want to consider, with even worse support
implications: people will have mixed databases, different formats,
etc.  I don't mind eventually providing the optional token, but we
should stick to the hash as the primary token storage.

> 
> Oh, I forgot to mention one obvious option that precedes these:
> 
>    - simply require Digest::SHA1 
> 

I'm all over this.  It offers the best of all worlds.

It is a C/XS implementation, but Windows folks without a compiler
can install it with ppm (I just checked and it appears to be available).

Razor appears to use it, so it may already be installed if folks have
already installed razor.

We can get rid of the pure-perl implementations.

So, +1 for requiring Digest::SHA1.
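
The difference between "simply require it" and "check and turn off if
it's not there" can be sketched as two small helpers. This is a Python
analogy under stated assumptions, not SpamAssassin code: the names
require_module and optional_module are hypothetical, and the real
project would do the equivalent check in Perl.

```python
import importlib


def require_module(name: str):
    """Import a hard dependency, or fail at startup with a clear message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(f"required module {name!r} is not installed") from exc


def optional_module(name: str):
    """Import an optional dependency; return None so the caller can
    disable the feature that needs it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

With require_module there is one code path and one obvious error. With
optional_module every caller has to handle the None case, which is
where the extra checks and support questions come from.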

Michael

Re: Digest::SHA1 quandary

Posted by Daniel Quinlan <qu...@pathname.com>.
Daniel Quinlan <qu...@pathname.com> writes:

> So, given the performance numbers for Bayes when Digest::SHA1 is not
> present and we use our own perl implementation, I think we have four
> good options for 3.0:
> 
>   - ship with a copy of Digest::SHA1 or equivalent XS implementation
>     of SHA1 digest
>   - ship with a copy of something else which is just as fast such as
>     FNV-1 or MD5 (if it makes sense)
>   - don't do Bayes without Digest::SHA1
>   - don't do SHA1 in Bayes without Digest::SHA1

Oh, I forgot to mention one obvious option that precedes these:

   - simply require Digest::SHA1 

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/