Posted to users@spamassassin.apache.org by Steven Manross <st...@manross.net> on 2005/05/14 03:53:28 UTC

RE: SQL Question -- FIX

***This now works (with minor mods to the SA distro file [SQL.pm] and
the creation of an additional MS SQL user-defined function).

I've mocked up an MS SQL version of RPAD that could easily be added to
the README code that creates the bayes tables and sets the version.
(Please correct the SQL for RPAD if I've defined part of it
incorrectly.)

spamassassin -D <input.txt >output.txt

...showed bayes activity and marked spam/ham accordingly.

The only remaining catch is that when you call the MS SQL RPAD, you
have to call it like so:

dbo.RPAD('this',5,' ')

So the only change necessary is to prepend "dbo." to the function name
in the:

Mail\SpamAssassin\BayesStore\SQL.pm

file.
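
For readers who do not have the mocked-up function to hand, here is a
minimal sketch of what an MS SQL RPAD could look like.  This is an
illustration only, not the function actually used above: the parameter
names (@str, @len, @pad) and the VARCHAR(8000) sizes are assumptions.
MS SQL requires scalar user-defined functions to be called with their
owner name, which is why the call has to be written as dbo.RPAD.

```sql
-- Hypothetical RPAD for MS SQL: pad @str on the right with @pad up to
-- @len characters, truncating if @str is already longer than @len.
CREATE FUNCTION dbo.RPAD (@str VARCHAR(8000), @len INT, @pad VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    -- LEFT() raises an error on a negative length, so bail out early.
    IF @len IS NULL OR @len < 0
        RETURN NULL

    -- Append more padding than could ever be needed, then cut to @len.
    RETURN LEFT(@str + REPLICATE(@pad, @len), @len)
END
```

With this in place, SELECT dbo.RPAD('this', 5, ' ') should return
'this' followed by a single space.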

Thanks be to both Michael Parker and Daryl O'Shea for their help.

How would I get this kind of suggestion considered for implementation in
a future release, or even just in the docs?  Through Bugzilla?

Thanks to all involved in SA.  I've been using SA for a couple of years
now, and it's a truly remarkable piece of work!

Thanks,
Steven

Re: SQL Question -- FIX

Posted by Michael Parker <pa...@pobox.com>.
On Fri, May 13, 2005 at 06:53:28PM -0700, Steven Manross wrote:
> ***This now works (with minor mods to the SA distro files [SQL.pm] and
> the creation of an additional MS SQL User defined function)
> 
> I've mocked up an MS SQL Version of RPAD that could be easily introduced
> into the readme code that creates the bayes tables, and sets the
> version. (please correct the SQL for RPAD if I've incorrectly defined
> part of it). 
> 
> spamassassin -D <input.txt >output.txt
> 
> ...showed bayes activity and marked spam/ham accordingly.
> 
> The only problem now being is that when you call MS SQL RPAD, you need
> to do so, like so:
> 
> dbo.RPAD('this',5,' ')

If it were straight SQL (i.e. select token, spam_count, ham_count etc.),
what would the token portion have to look like for MS SQL?

Something like:

select substring(token,1,len(token)) + replicate(' ',5-len(token)),
ham_count, spam_count etc etc

?

If so, it would simply be a matter (in 3.1 at least) of creating an
MSSQL.pm module that inherits from SQL.pm and overrides
_token_select_string.  Of course, you can still do that with the RPAD
function and make the call:
select dbo.RPAD(token,5,' '), spam_count, ham_count etc etc

Seems like a reasonable thing to do, and in the future we might find
some other MS SQL specific things we want to override to make things
faster.
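
The inline padding expression suggested above can be tried on its own
before touching any module.  The snippet below is only a rough sketch:
the variable @token is a hypothetical stand-in for the token column, and
5 is the width used in this thread.  Note that MS SQL's LEN() ignores
trailing spaces, so DATALENGTH() is used to check the padded width.

```sql
-- Hypothetical stand-alone check of the inline padding expression.
-- @token stands in for the token column; 5 is the target width.
DECLARE @token VARCHAR(5)
SET @token = 'ab'

SELECT SUBSTRING(@token, 1, LEN(@token))
       + REPLICATE(' ', 5 - LEN(@token)) AS padded_token,
       -- DATALENGTH counts the trailing spaces that LEN would ignore.
       DATALENGTH(SUBSTRING(@token, 1, LEN(@token))
       + REPLICATE(' ', 5 - LEN(@token))) AS padded_bytes
```

If the expression pads as intended, padded_bytes comes back as 5.  A
value longer than 5 characters would make REPLICATE receive a negative
count and return NULL, so the whole expression would be NULL in that
case.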

FYI, to answer your question about why not just use varchar: we found
that creating variable-length rows really slowed down the SQL, so it's
best to keep things a constant length; things move much faster that way.

Michael