Posted to users@spamassassin.apache.org by "Daniel M. Drucker" <dm...@3e.org> on 2004/09/30 15:37:53 UTC

sa-learn with SQL everything?

I'm trying to start using Bayes and sa-learn for the first time, now
that Bayes supports SQL.

I run a smallish system (about 80 users spread over three domains).
The basic setup is Exim -> SpamAssassin 3 -> Exim -> amavis -> Exim ->
delivery. (That is -- SA and amavis are Exim router-transport pipes;
neither knows of the other's existence.)

Apart from me, none of my users have home directories; Exim uses SQL
for all account information. Mail is stored in Maildir format in
/mail/DOMAIN/USER.

The majority of my users use Squirrelmail.

I would like to enable some sort of false-negative/false-positive
reporting for them, as I would imagine that the Bayes system is not
very useful if it's getting uncorrected FN/FP data. However, every
piece of documentation I've seen for sa-learn assumes (1) a unix
account to correspond to the mailbox owner, and (2) that SQL is not
being used for anything.

Can someone point me in the right direction? I'd really like to take
advantage of Bayes, but the documentation is so haphazard right now
that I just don't know what to do.




-- 
Daniel Drucker / dmd@3e.org


Re: sa-learn with SQL everything?

Posted by Ryan Moore <ry...@perigee.net>.
Daniel M. Drucker wrote:
>>I couldn't find anyone who has done this already, so I did it myself -
> 
> 
> Nice work!
> 
> How does this interact with the use/nonuse of report_safe? It seems to
> me that (with report_safe 1) you end up training bayes on the
> encapsulation, or (with report_safe 0) you end up training it on the
> reciprocal of the spamassassin-added headers.
> 
> 

To my knowledge, sa-learn removes/ignores any SpamAssassin headers, so it
shouldn't skew your data.


Ryan Moore
----------
Perigee.net Corporation
704-849-8355 (sales)
704-849-8017 (tech)
www.perigee.net

Re: sa-learn with SQL everything?

Posted by "Daniel M. Drucker" <dm...@3e.org>.
> I couldn't find anyone who has done this already, so I did it myself -

Nice work!

How does this interact with the use/nonuse of report_safe? It seems to
me that (with report_safe 1) you end up training bayes on the
encapsulation, or (with report_safe 0) you end up training it on the
reciprocal of the spamassassin-added headers.


-- 
Daniel Drucker / dmd@3e.org


Re: sa-learn with SQL everything?

Posted by Keith Hackworth <ke...@rpemail.com>.
I couldn't find anyone who has done this already, so I did it myself -
anyone who needs this is welcome to use my solution/code.  My solution
requires an IMAP server and bayes to be in mysql.  It also requires
SquirrelMail.  It also requires a /tmp directory.  Since SquirrelMail
requires a unix-like system (I believe) and IMAP, you should be all set.

I'm not using Exim - I run Postfix, but that shouldn't make a difference.
I use amavisd-new/sa with SquirrelMail on top.  I downloaded the
SquirrelMail amavisnewsql 0.7.2-1.4 plugin and modified it from there.

The plugin includes a "whitelist user" and other sa controls for the
specific user, but totally misses the bayes aspect of spamassassin.  The
plugin takes some time to configure, but is simple if you just follow the
directions.

Once you get that working, replace the setup.php and create a new
bayes.php in the plugins/amavisnewsql directory.  I've attached the
bayes.php and setup.php files as .txt files, so remove the .txt extension.

Once you do that, at the top of every message, there will be a "This is
spam" and "This is NOT spam" link.  Each link runs
"/usr/local/bin/sa-learn -D --[sp|h]am", so make sure sa-learn is
installed in /usr/local/bin.
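
As a rough illustration of the command those links issue, here is a
minimal Python sketch that only builds the sa-learn argument list. The
/usr/local/bin/sa-learn path and the -D, --spam and --ham flags come
from the description above. The function name, the example paths, and
the optional -u (sa-learn's --username override, which I believe is
meant for virtual users with the SQL backend) are assumptions for
illustration, not part of the attached plugin.

```python
from typing import List, Optional


def build_sa_learn_argv(
    kind: str,
    message_path: str,
    username: Optional[str] = None,
    sa_learn: str = "/usr/local/bin/sa-learn",
    debug: bool = False,
) -> List[str]:
    """Return the argument list for one sa-learn run; nothing is executed."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = [sa_learn]
    if debug:
        argv.append("-D")  # debug output, as in the command quoted above
    if username:
        # Assumption: train on behalf of a virtual (non-unix) user.
        argv += ["-u", username]
    argv += ["--" + kind, message_path]
    return argv


# Hypothetical usage; the message path is a placeholder:
#   import subprocess
#   subprocess.run(build_sa_learn_argv("spam", "/tmp/msg.eml"), check=True)
```

Passing the arguments as a list rather than a shell string avoids
quoting problems if a mailbox path or username contains odd characters.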

***IMPORTANT:  One last thing - make sure you turn on the
"bayes_sql_override_username <user that runs spamassassin or spamd>" in
local.cf or your bayes database will only work for the user that the
webserver runs as.  It took me a while to figure this one out...

If you have any questions or problems with this, please email me.

Keith Hackworth
keith@rpemail.com


Re: SpamAssassin 3.0 and sa-learn problem.

Posted by Andy Biddle <an...@the-space.net>.
I've done a CPAN "force install Digest::SHA1" and get the same issue...

On Thu, 30 Sep 2004, Theo Van Dinter wrote:

> On Thu, Sep 30, 2004 at 07:47:35AM -0700, Andy Biddle wrote:
> > Use of inherited AUTOLOAD for non-method Digest::SHA1::sha1_hex() is
> > deprecated at
> > /usr/local/lib/perl5/site_perl/5.8.2/Mail/SpamAssassin/Bayes.pm line 983.
> > Learned from 0 message(s) (1 message(s) examined).
> > Can't locate auto/Digest/SHA1/sha1_hex.al in @INC (@INC contains: lib
>
> This indicates that your Digest::SHA1 installation is botched.
>
> > SHA1 is installed and up to date.
>
> I'd blow away what you have and reinstall the module.
>
> --
> Randomly Generated Tagline:
> DOS: n., A small annoying boot virus that causes random spontaneous system
>       crashes, usually just before saving a massive project.  Easily cured by
>       UNIX.  See also MS-DOS, IBM-DOS, DR-DOS.
>  (from David Vicker's .plan)
>

Re: SpamAssassin 3.0 and sa-learn problem.

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Sep 30, 2004 at 07:47:35AM -0700, Andy Biddle wrote:
> Use of inherited AUTOLOAD for non-method Digest::SHA1::sha1_hex() is
> deprecated at
> /usr/local/lib/perl5/site_perl/5.8.2/Mail/SpamAssassin/Bayes.pm line 983.
> Learned from 0 message(s) (1 message(s) examined).
> Can't locate auto/Digest/SHA1/sha1_hex.al in @INC (@INC contains: lib

This indicates that your Digest::SHA1 installation is botched.

> SHA1 is installed and up to date.

I'd blow away what you have and reinstall the module.
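
One way to confirm whether a reinstall helped, sketched in Python under
the assumption that a perl binary is on the PATH: call sha1_hex (the
function named in the error above) on a known string and compare the
result with Python's own SHA-1. The function names here are made up for
illustration.

```python
import hashlib
import subprocess
from typing import List


def sha1_probe_argv(data: str, perl: str = "perl") -> List[str]:
    """Build a perl one-liner that calls Digest::SHA1::sha1_hex on data."""
    # The probe exercises the same function that the error message names.
    return [perl, "-MDigest::SHA1=sha1_hex", "-e",
            "print sha1_hex($ARGV[0])", data]


def digest_sha1_works(data: str = "abc", perl: str = "perl") -> bool:
    """Return True only if perl's sha1_hex output matches hashlib's SHA-1."""
    expected = hashlib.sha1(data.encode("utf-8")).hexdigest()
    try:
        result = subprocess.run(
            sha1_probe_argv(data, perl),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # perl missing, not executable, or hung
        return False
    return result.returncode == 0 and result.stdout.strip() == expected
```

If digest_sha1_works() returns False with the same perl that runs
sa-learn, the module is probably still broken for that interpreter.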

-- 
Randomly Generated Tagline:
DOS: n., A small annoying boot virus that causes random spontaneous system
      crashes, usually just before saving a massive project.  Easily cured by
      UNIX.  See also MS-DOS, IBM-DOS, DR-DOS.
 (from David Vicker's .plan)

SpamAssassin 3.0 and sa-learn problem.

Posted by Andy Biddle <an...@the-space.net>.
I recently sent out a request for help regarding always getting
"autolearn=unavailable" messages.  When I try to train it with sa-learn, I
get:

Use of inherited AUTOLOAD for non-method Digest::SHA1::sha1_hex() is
deprecated at
/usr/local/lib/perl5/site_perl/5.8.2/Mail/SpamAssassin/Bayes.pm line 983.
Learned from 0 message(s) (1 message(s) examined).
Can't locate auto/Digest/SHA1/sha1_hex.al in @INC (@INC contains: lib
/usr/local/lib/perl5/site_perl/5.8.2
/usr/local/lib/perl5/site_perl/5.8.2/mach
/usr/local/lib/perl5/site_perl/5.8.0/i386-freebsd
/usr/local/lib/perl5/site_perl/5.8.0 /usr/local/lib/perl5/site_perl/5.6.1
/usr/local/lib/perl5/site_perl/5.005 /usr/local/lib/perl5/site_perl
/usr/local/lib/perl5/5.8.2/BSDPAN /usr/local/lib/perl5/5.8.2/mach
/usr/local/lib/perl5/5.8.2) at
/usr/local/lib/perl5/site_perl/5.8.2/Mail/SpamAssassin/Bayes.pm line 983

SHA1 is installed and up to date.  SpamAssassin was installed via CPAN.
My system is FreeBSD...

Assuming no one jumps up and points out specifically how to fix this, I'm
considering just wiping out my installation and rebuilding. Am I correct
in thinking that this is probably just something wrong with my
installation?

Is there a good way to blow away SpamAssassin and everything it requires?
If I use CPAN to re-install SpamAssassin, shouldn't it re-install anything
it then requires?

Sorry, can't figure out why I'm having dependency issues and I really want
to get this fixed.  Ugh.


Re: sa-learn with SQL everything?

Posted by Sune Kloppenborg Jeppesen <su...@dir.dk>.
On Thursday 30 September 2004 15:37, Daniel M. Drucker wrote:
> I'm trying to start using Bayes and sa-learn for the first time, now
> that Bayes supports SQL.
>
> I run a smallish system (about 80 users spread over three domains).
> The basic setup is Exim -> SpamAssassin 3 -> Exim -> amavis -> Exim ->
> delivery. (That is -- SA and amavis are Exim router-transport pipes;
> neither knows of the other's existence.)
>
> Apart from me, none of my users have home directories; Exim uses SQL
> for all account information. Mail is stored in Maildir format in
> /mail/DOMAIN/USER.
>
> The majority of my users use Squirrelmail.
>
> I would like to enable some sort of false-negative/false-positive
> reporting for them, as I would imagine that the Bayes system is not
> very useful if it's getting uncorrected FN/FP data. However, every
> piece of documentation I've seen for sa-learn assumes (1) a unix
> account to correspond to the mailbox owner, and (2) that SQL is not
> being used for anything.
>
> Can someone point me in the right direction? I'd really like to take
> advantage of Bayes, but the documentation is so haphazard right now
> that I just don't know what to do.
You could set up a dedicated SA user and have a site-wide Bayes database.
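
A site-wide setup along these lines might look roughly like the
following local.cf fragment. The option names are SpamAssassin's SQL
Bayes settings as I understand them. The DSN, database name,
credentials, and the "spamd" username are placeholders, not values from
this thread.

```
# local.cf (sketch; every value below is a placeholder)
bayes_store_module           Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn                DBI:mysql:spamassassin:localhost
bayes_sql_username           sa_user
bayes_sql_password           changeme
# Store and read all Bayes data under one name, whoever runs SA:
bayes_sql_override_username  spamd
```

With the override set, scanning and training should share one database
regardless of which unix or web-server user invokes SpamAssassin.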

-- 
Regards

Sune Kloppenborg Jeppesen
