Posted to dev@spamassassin.apache.org by Michael Parker <pa...@pobox.com> on 2005/08/03 03:40:57 UTC

Floating a Possible BayesSQL Code Change for 3.1

Howdy,

Just wanted to float this idea.  With the help of Matthew Schumacher
I've worked up a patch for BayesSQL that results in a fairly large speed
boost for the SQL based configurations.  I'm still running baseline and
new code benchmarks, but rough estimates put the speedup in the
neighborhood of 2x for MySQL and 4x for PostgreSQL.

The problem is that it involves some changes to _put_tokens, which is
the most called method in the BayesSQL code and adding an API method to
the BayesStore API.  Obviously not something to be taken lightly.

So, thoughts on getting this change into 3.1?  It would be HUGE in terms
of performance, that is the only reason I'm asking, otherwise I would
hold off for 3.2.

Michael


Re: Floating a Possible BayesSQL Code Change for 3.1

Posted by Daniel Quinlan <qu...@pathname.com>.
Michael Parker <pa...@pobox.com> writes:

> Just wanted to float this idea.  With the help of Matthew Schumacher
> I've worked up a patch for BayesSQL that results in a fairly large speed
> boost for the SQL based configurations.  I'm still running baseline and
> new code benchmarks, but rough estimates put the speedup in the
> neighborhood of 2x for MySQL and 4x for PostgreSQL.
> 
> The problem is that it involves some changes to _put_tokens, which is
> the most called method in the BayesSQL code and adding an API method to
> the BayesStore API.  Obviously not something to be taken lightly.
> 
> So, thoughts on getting this change into 3.1?  It would be HUGE in terms
> of performance, that is the only reason I'm asking, otherwise I would
> hold off for 3.2.

I would be fine with it as long as we get it into pre1 and if there
are any issues, we yank it.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Floating a Possible BayesSQL Code Change for 3.1

Posted by Michael Parker <pa...@pobox.com>.
Warren Togami wrote:

> Michael Parker wrote:
>
>> Howdy,
>>
>> Just wanted to float this idea.  With the help of Matthew Schumacher
>> I've worked up a patch for BayesSQL that results in a fairly large speed
>> boost for the SQL based configurations.  I'm still running baseline and
>> new code benchmarks, but rough estimates put the speedup in the
>> neighborhood of 2x for MySQL and 4x for PostgreSQL.
>>
>> The problem is that it involves some changes to _put_tokens, which is
>> the most called method in the BayesSQL code and adding an API method to
>> the BayesStore API.  Obviously not something to be taken lightly.
>
>
> Does this break software using spamassassin's 3.0 API?
>
> Does any 3rd party software use these particular functions?
>

No, it is ADDING an API call.  Although I do notice a typo in my
original message: the change is to _put_token, not _put_tokens (the
latter is a new one).  I was less concerned with 3rd party use issues
than I am with changing the underlying code of a method that is called
so often (every time a token is learned or forgotten).

We've already added a stored procedure that must be loaded for
PostgreSQL users, but this is IMO no big deal because anyone using the
code with PostgreSQL now is broken.  This removes one procedure and
replaces it with another, so in theory, getting this into 3.1 will make
package maintainers' lives easier down the road.  Adding/updating a
stored procedure is doable in a package, via script, but a real PITA.

Michael
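
For readers following along: the patch itself is not shown in this
thread, so the following is only a rough sketch of the general idea of a
per-token write next to a batched one.  It assumes (the thread does not
confirm this) that the new plural method handles a whole list of tokens
in one transaction.  Everything here is hypothetical: the table layout,
the function names put_token/put_tokens, and the use of SQLite as a
stand-in for MySQL/PostgreSQL.  It is not the SpamAssassin code and does
not model the PostgreSQL stored procedure.

```python
import sqlite3

# Hypothetical schema, loosely modeled on a token/count table.
INSERT_SQL = "INSERT OR IGNORE INTO bayes_token (token) VALUES (?)"
UPDATE_SQL = (
    "UPDATE bayes_token "
    "SET spam_count = spam_count + ?, ham_count = ham_count + ? "
    "WHERE token = ?"
)


def make_db():
    """Create an in-memory database with an empty token table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " token TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def put_token(conn, token, spam_delta, ham_delta):
    """Per-token path: two statements and one commit for every token."""
    conn.execute(INSERT_SQL, (token,))
    conn.execute(UPDATE_SQL, (spam_delta, ham_delta, token))
    conn.commit()


def put_tokens(conn, tokens, spam_delta, ham_delta):
    """Batched path (assumed): all tokens share a single transaction."""
    tokens = list(tokens)
    with conn:  # commits once on success, rolls back on error
        conn.executemany(INSERT_SQL, [(t,) for t in tokens])
        conn.executemany(
            UPDATE_SQL, [(spam_delta, ham_delta, t) for t in tokens]
        )


def counts(conn):
    """Return all rows sorted by token, for comparing the two paths."""
    return conn.execute(
        "SELECT token, spam_count, ham_count FROM bayes_token ORDER BY token"
    ).fetchall()
```

Both functions leave the table in the same state for the same input; the
batched version just makes fewer round trips and commits.  Whether that
is where the reported speedup comes from is not stated in the thread.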

Re: Floating a Possible BayesSQL Code Change for 3.1

Posted by Warren Togami <wt...@redhat.com>.
Michael Parker wrote:
> Howdy,
> 
> Just wanted to float this idea.  With the help of Matthew Schumacher
> I've worked up a patch for BayesSQL that results in a fairly large speed
> boost for the SQL based configurations.  I'm still running baseline and
> new code benchmarks, but rough estimates put the speedup in the
> neighborhood of 2x for MySQL and 4x for PostgreSQL.
> 
> The problem is that it involves some changes to _put_tokens, which is
> the most called method in the BayesSQL code and adding an API method to
> the BayesStore API.  Obviously not something to be taken lightly.

Does this break software using spamassassin's 3.0 API?

Does any 3rd party software use these particular functions?

Warren Togami
wtogami@redhat.com

Re: Floating a Possible BayesSQL Code Change for 3.1

Posted by Michael Parker <pa...@pobox.com>.
Michael Parker wrote:

>Howdy,
>
>Just wanted to float this idea.  With the help of Matthew Schumacher
>I've worked up a patch for BayesSQL that results in a fairly large speed
>boost for the SQL based configurations.  I'm still running baseline and
>new code benchmarks, but rough estimates put the speedup in the
>neighborhood of 2x for MySQL and 4x for PostgreSQL.

It's actually more like a 7x speedup overall for PostgreSQL; in some
cases it's a 27x speedup (for learning).

MySQL is about 2x, with a possibility of even more speedup with some
cleanups.

No change to the SDBM/DBM code.

Sounds like we're gonna be good to go.  I've got some more work to do
before it is ready, including some verification tests which might
completely invalidate this whole exercise, so if anyone objects, speak
now.

Michael