Posted to users@spamassassin.apache.org by James <bj...@lockie.ca> on 2012/09/22 21:29:59 UTC

latest rules

I've been getting more spam recently so I did sa-update for the first time in a year (I thought it was automatic :-()).

I restarted the spamassassin service (Ubuntu).

$ /var/lib/spamassassin$ ll
total 16
drwxr-xr-x  4 root root 4096 Oct 15  2011 ./
drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
drwxr-xr-x  3 root root 4096 Aug 27  2011 3.003001/
drwxr-xr-x  3 root root 4096 Sep 21 23:59 3.003002/

Will it use the latest rules or do I need to delete the older ones?

Where in a mail header does it say what version of the rules it used?
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
X-Spam-Level: ***
X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
	RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no version=3.3.2
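
For readers who also expected rule updates to be automatic, here is a minimal sketch of a wrapper that could be run from cron. It assumes sa-update's documented exit statuses (0 when new rules were installed, 1 when nothing new was available, higher values for problems). The function name check_update and the commented restart line are illustrative and not from the thread.

```shell
#!/bin/bash
# Classify the outcome of an update command from its exit status.
# Assumption: 0 = new rules installed, 1 = nothing new, anything else = problem.
check_update() {
    local rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "updated" ;;
        1) echo "no-change" ;;
        *) echo "failed" ;;
    esac
}

# Only attempt the update where sa-update is actually installed.
if command -v sa-update >/dev/null 2>&1; then
    result=$(check_update sa-update)
    echo "sa-update: $result"
    # Restart only when new rules arrived (service name as used on Ubuntu):
    # if [ "$result" = updated ]; then sudo service spamassassin restart; fi
fi
```

Restarting only on "updated" avoids bouncing the daemon every night when nothing has changed.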

Re: latest rules

Posted by Bowie Bailey <Bo...@BUC.com>.

On 9/22/2012 4:31 PM, James wrote:
> I am lowering the required score to 3.
> If I still get spam, I will block everything and just use whitelisting.

SpamAssassin's scores have been set to work best with a required score 
of 5.  If you find you have to significantly reduce this score to get 
good results, then there is probably something wrong with your setup.  
It may be worth your while to spend a little more time working with the 
people here on the list (who are generally very helpful) to figure out 
how to fix your system so that SA will work properly.

-- 
Bowie

Re: latest rules

Posted by John Hardin <jh...@impsec.org>.

On Sun, 23 Sep 2012, James wrote:

> On 09/23/12 18:28, John Hardin wrote:
>> On Sun, 23 Sep 2012, James wrote:
>>
>>> I wrote this little script to update the bayes rules. I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.
>>>
>>> Any comments on my script?
>>
>> Bear in mind that spams which don't score high enough to be quarantined or discarded will end up in your inbox, as will false negatives. Training all of the mail in all of your inboxes as ham will train these messages as ham and make any small error in classification much worse over time.

> I will manually move spam to a Junk folder (not a .INBOX* folder).

Good.

>>
>> During the initial training period you want to manually review messages and build a ham corpus and a spam corpus. Once bayes is running you generally only want to train on misclassified messages. This decisionmaking process cannot be automated, or the errors wouldn't occur in the first place.
>>
>> You should set up per-user train-as-ham and train-as-spam mailboxes, and only train from those, only for the users whose judgement you trust. Then, those users should copy misclassified messages to the appropriate folder and may also add samples of ham to the train-as-ham folder whenever desired.
>
> The only user is me. :-)

:) Should be easy then.

> Is there a way to convert my Thunderbird bayes to spamassassin?

Not that I know of.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Joan Peterson is like that: you expect at least a pseudological
   argument, but instead you get the weird ramblings of a woman with
   the critical thinking abilities of an 18th century peasant.  -- Ken
-----------------------------------------------------------------------
  115 days since the first successful private support mission to ISS (SpaceX)

Re: latest rules

Posted by James <bj...@lockie.ca>.

On 09/23/12 18:28, John Hardin wrote:
> On Sun, 23 Sep 2012, James wrote:
> 
>> I wrote this little script to update the bayes rules. I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.
>>
>> Any comments on my script?
> 
> Bear in mind that spams which don't score high enough to be quarantined or discarded will end up in your inbox, as will false negatives. Training all of the mail in all of your inboxes as ham will train these messages as ham and make any small error in classification much worse over time.
I will manually move spam to a Junk folder (not a .INBOX* folder).
> 
> During the initial training period you want to manually review messages and build a ham corpus and a spam corpus. Once bayes is running you generally only want to train on misclassified messages. This decisionmaking process cannot be automated, or the errors wouldn't occur in the first place.
> 
> You should set up per-user train-as-ham and train-as-spam mailboxes, and only train from those, only for the users whose judgement you trust. Then, those users should copy misclassified messages to the appropriate folder and may also add samples of ham to the train-as-ham folder whenever desired.

The only user is me. :-)
Is there a way to convert my Thunderbird bayes to spamassassin?
> 
>>
>> #!/bin/bash
>>
>> IFS=$'\n'
>> FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
>>
>> for i in $FOLDERLIST; do
>>    echo "Processing $i"
>> #    sudo sa-learn --ham "$i"
>> done
>>
>> # sudo sa-learn --spam Maildir/.Junk
>>
> 
>

Re: latest rules

Posted by John Hardin <jh...@impsec.org>.

On Sun, 23 Sep 2012, James wrote:

> I wrote this little script to update the bayes rules. I can do this on 
> my imap account but my pop3 account gets way more spam and the messages 
> are no longer on the machine with sa once I pop them off.
>
> Any comments on my script?

Bear in mind that spams which don't score high enough to be quarantined or 
discarded will end up in your inbox, as will false negatives. Training all 
of the mail in all of your inboxes as ham will train these messages as ham 
and make any small error in classification much worse over time.

During the initial training period you want to manually review messages 
and build a ham corpus and a spam corpus. Once bayes is running you 
generally only want to train on misclassified messages. This 
decisionmaking process cannot be automated, or the errors wouldn't occur 
in the first place.

You should set up per-user train-as-ham and train-as-spam mailboxes, and 
only train from those, only for the users whose judgement you trust. Then, 
those users should copy misclassified messages to the appropriate folder 
and may also add samples of ham to the train-as-ham folder whenever 
desired.

>
> #!/bin/bash
>
> IFS=$'\n'
> FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
>
> for i in $FOLDERLIST; do
>    echo "Processing $i"
> #    sudo sa-learn --ham "$i"
> done
>
> # sudo sa-learn --spam Maildir/.Junk
>

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Users mistake widespread adoption of Microsoft Office for the
   development of a document format standard.
-----------------------------------------------------------------------
  115 days since the first successful private support mission to ISS (SpaceX)

Re: latest rules

Posted by James <bj...@lockie.ca>.

On 09/22/12 22:38, John Hardin wrote:
> On Sat, 22 Sep 2012, James wrote:
>> On 09/22/12 20:50, Glenn Sieb wrote:
>>> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>>>> On 9/22/12 8:32 PM, James wrote:
>>>>> It didn't help. :-(
>>>>> I got spam with a low score.
>>>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>>>>     INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>>>>     version=3.3.2
>>
>> I haven't trained my bayes, it seemed complex when I last looked.
> 
> The "BAYES_00" in the rule hits above strongly suggests that mistrained Bayes is the primary culprit (if this is indeed a FN).
> 
> If you haven't trained bayes, then return your required score to 5 and disable Bayes and see if it behaves better.
> 
> Then, collect a few hundred ham and spam messages, wipe your bayes database, train it properly using those messages, and reenable it.
> 
> Training is pretty simple. All you have to do is collect representative ham and spam messages in a couple of mail folders, and tell sa-learn to process them. The critical bit that usually causes problems is you must run sa-learn as the user that the MTA is running spamassassin as, or make sure that your configuration defines a system-global Bayes database.
> 
> You will probably want to set up a nightly cron job to run sa-learn against your ham and spam corpus folders. Then you can just add misclassified messages (spams from your inbox, and hams from your spam quarantine) to those folders as you encounter them.
> 


I wrote this little script to update the bayes rules.
I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.

Any comments on my script?


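#!/bin/bash

# Split find's output on newlines only, so folder names with spaces survive.
IFS=$'\n'
FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)

for i in $FOLDERLIST; do
    echo "Processing $i"
    # Dry run: uncomment the next line to train this folder as ham.
#    sudo sa-learn --ham "$i"
done

# Uncomment to train the Junk folder as spam.
# sudo sa-learn --spam Maildir/.Junk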

Re: latest rules

Posted by John Hardin <jh...@impsec.org>.

On Sat, 22 Sep 2012, James wrote:
> On 09/22/12 20:50, Glenn Sieb wrote:
>> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>>> On 9/22/12 8:32 PM, James wrote:
>>>> It didn't help. :-(
>>>> I got spam with a low score.
>>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>>> 	INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>>> 	version=3.3.2
>
> I haven't trained my bayes, it seemed complex when I last looked.

The "BAYES_00" in the rule hits above strongly suggests that mistrained 
Bayes is the primary culprit (if this is indeed a FN).
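
To make the point about BAYES_00 concrete: the digits in a BAYES_nn rule name reflect the spam probability Bayes assigned, so BAYES_00 on a real spam means the classifier judged it almost certainly ham. A small sketch for pulling that hit out of a header follows; the function name bayes_hit is made up for illustration, and the header is the one quoted above.

```shell
#!/bin/bash
# Print the first BAYES_* rule named in X-Spam-Status header text on stdin.
# Prints nothing when no Bayes rule fired (for example, when Bayes is disabled).
bayes_hit() {
    grep -Eo 'BAYES_[0-9]+' | head -n 1
}

header='X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
    INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
    version=3.3.2'

printf '%s\n' "$header" | bayes_hit
```

Run against a handful of known spams, a steady stream of low BAYES_nn values is a hint that the database needs wiping and retraining.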

If you haven't trained bayes, then return your required score to 5 and 
disable Bayes and see if it behaves better.

Then, collect a few hundred ham and spam messages, wipe your bayes 
database, train it properly using those messages, and reenable it.

Training is pretty simple. All you have to do is collect representative 
ham and spam messages in a couple of mail folders, and tell sa-learn to 
process them. The critical bit that usually causes problems is you must 
run sa-learn as the user that the MTA is running spamassassin as, or make 
sure that your configuration defines a system-global Bayes database.

You will probably want to set up a nightly cron job to run sa-learn 
against your ham and spam corpus folders. Then you can just add 
misclassified messages (spams from your inbox, and hams from your spam 
quarantine) to those folders as you encounter them.
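
The nightly job described above might look roughly like the sketch below. The folder names (.train-as-ham, .train-as-spam) and the SA_LEARN variable are assumptions for illustration; by default the script only prints the sa-learn commands it would run, so it is safe to try before pointing it at a real Bayes database.

```shell
#!/bin/bash
# Sketch of a nightly Bayes training job. Folder names are assumptions.
HAM_DIR="${HAM_DIR:-$HOME/Maildir/.train-as-ham}"
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.train-as-spam}"
# Default is a dry run that prints the command; set SA_LEARN=sa-learn to train.
SA_LEARN="${SA_LEARN:-echo sa-learn}"

train_folder() {
    # $1 = --ham or --spam, $2 = folder of hand-sorted messages
    local mode="$1" dir="$2"
    if [ ! -d "$dir" ]; then
        echo "skipping missing folder: $dir" >&2
        return 0
    fi
    # SA_LEARN is deliberately unquoted so "echo sa-learn" splits into two words.
    $SA_LEARN "$mode" "$dir"
}

train_folder --ham "$HAM_DIR"
train_folder --spam "$SPAM_DIR"
```

As noted above, this has to run as the same user that spamassassin runs as for incoming mail (or against a system-wide Bayes database), otherwise the training lands in a database that is never consulted.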

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   If "healthcare is a Right" means that the government is obligated
   to provide the people with hospitals, physicians, treatments and
   medications at low or no cost, then the right to free speech means
   the government is obligated to provide the people with printing
   presses and public address systems, the right to freedom of
   religion means the government is obligated to build churches for the
   people, and the right to keep and bear arms means the government is
   obligated to provide the people with guns, all at low or no cost.
-----------------------------------------------------------------------
  114 days since the first successful private support mission to ISS (SpaceX)

Re: latest rules

Posted by James <bj...@lockie.ca>.

On 09/22/12 20:50, Glenn Sieb wrote:
> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>> On 9/22/12 8:32 PM, James wrote:
>>> It didn't help. :-(
>>> I got spam with a low score.
>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>> 	INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>> 	version=3.3.2
>>
>> If it's spam, then why does it say "X-Spam-Status: No"?
>>
>> (Hint: It's not spam.)
> 
> Sorry, my bad. I misread the email.
> 
> Are you training your bayes? Do you have your local.cf properly set up?
> 
> Best,
> --G.
> 

I haven't trained my bayes, it seemed complex when I last looked.

The local.cf is the one that comes with Ubuntu (probably the stock one).

Re: latest rules

Posted by Glenn Sieb <ge...@wingfoot.org>.

On 9/22/12 8:36 PM, Glenn Sieb wrote:
> On 9/22/12 8:32 PM, James wrote:
>> It didn't help. :-(
>> I got spam with a low score.
>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>> 	INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>> 	version=3.3.2
> 
> If it's spam, then why does it say "X-Spam-Status: No"?
> 
> (Hint: It's not spam.)

Sorry, my bad. I misread the email.

Are you training your bayes? Do you have your local.cf properly set up?

Best,
--G.

Re: latest rules

Posted by Glenn Sieb <ge...@wingfoot.org>.

On 9/22/12 8:32 PM, James wrote:
> It didn't help. :-(
> I got spam with a low score.
> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
> 	INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
> 	version=3.3.2

If it's spam, then why does it say "X-Spam-Status: No"?

(Hint: It's not spam.)

--Glenn

Re: latest rules

Posted by James <bj...@lockie.ca>.

On 09/22/12 17:11, Daniel McDonald wrote:
> 
> 
> 
> On 9/22/12 3:31 PM, "James" <bj...@lockie.ca> wrote:
> 
>> Great thanks.
>>
>> I am lowering the required score to 3.
> 
> That is generally not a desirable practice.
It didn't help. :-(
I got spam with a low score.
X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
	INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
	version=3.3.2
> 
>> If I still get spam, I will block everything and just use whitelisting.
> 
> I see that you have bayes enabled.  You should train your bayes every now
> and again.
> 
> You may want to look at a few spams and write a rule just for them.  For
> example, we received a spam asking for a loan of a small amount of money.
> It scored about 3.5.  I wrote the following:
> 
> body    __WORD_LOAN        /\bloan\b/
> describe    __WORD_LOAN        Describes a loan
> 
> body    __WORD_URGENT        /\burgent/
> describe    __WORD_URGENT        Something is urgent or urgently needed
> 
> meta    AE_SMALL_URGENT_LOAN    __FRAUD_DBI && __WORD_LOAN && __WORD_URGENT && __REPLY_FREEMAIL
> describe    AE_SMALL_URGENT_LOAN    urgent loan for a small dollar figure to freemail user
> score    AE_SMALL_URGENT_LOAN    2.3

Does anyone have a good rule for "To cancel your subscription", etc.?
It seems a lot of spam has opt-out links (but I never opted in).
None of my friends have opt-out links, so it would be fine to block everything that does.

> 
> 
> 
> It's not the most elegant rule, but that's the real power of spamassassin -
> custom rules to kill off the spam.
> 
>

Re: latest rules

Posted by Daniel McDonald <da...@austinenergy.com>.

On 9/22/12 3:31 PM, "James" <bj...@lockie.ca> wrote:

> Great thanks.
> 
> I am lowering the required score to 3.

That is generally not a desirable practice.

> If I still get spam, I will block everything and just use whitelisting.

I see that you have bayes enabled.  You should train your bayes every now
and again.

You may want to look at a few spams and write a rule just for them.  For
example, we received a spam asking for a loan of a small amount of money.
It scored about 3.5.  I wrote the following:

body    __WORD_LOAN        /\bloan\b/
describe    __WORD_LOAN        Describes a loan

body    __WORD_URGENT        /\burgent/
describe    __WORD_URGENT        Something is urgent or urgently needed

meta    AE_SMALL_URGENT_LOAN    __FRAUD_DBI && __WORD_LOAN && __WORD_URGENT && __REPLY_FREEMAIL
describe    AE_SMALL_URGENT_LOAN    urgent loan for a small dollar figure to freemail user
score    AE_SMALL_URGENT_LOAN    2.3

It's not the most elegant rule, but that's the real power of spamassassin -
custom rules to kill off the spam.
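
Before putting patterns like the two above into a rule, it can help to try them against sample text. The sketch below does that with grep. The helper name hits and the sample sentence are made up, and \b in grep -E is a GNU extension. SpamAssassin matches body rules against its own rendered version of the message, so this is only a rough check.

```shell
#!/bin/bash
# Try the two body patterns against sample text before writing the rule.
hits() {
    # $1 = pattern (ERE; \b is a GNU grep extension), $2 = text; prints yes or no
    if printf '%s\n' "$2" | grep -Eq "$1"; then echo yes; else echo no; fi
}

sample='I have an urgent need for a small loan of $500.'

echo "loan:   $(hits '\bloan\b' "$sample")"
echo "urgent: $(hits '\burgent' "$sample")"
```

Once the rule is saved in the local configuration, running spamassassin --lint checks that the rule files parse.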

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281

Re: latest rules

Posted by James <bj...@lockie.ca>.

On 09/22/12 15:38, Martin Hepworth wrote:
> It'll use the latest rules you downloaded, as these are your core rules now.
> 
> Martin
> 
> On Saturday, 22 September 2012, James wrote:
> 
>     I've been getting more spam recently so I did sa-update for the first time in a year (I thought it was automatic :-()).
> 
>     I restarted the spamassassin service (Ubuntu).
> 
>     $ /var/lib/spamassassin$ ll
>     total 16
>     drwxr-xr-x  4 root root 4096 Oct 15  2011 ./
>     drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
>     drwxr-xr-x  3 root root 4096 Aug 27  2011 3.003001/
>     drwxr-xr-x  3 root root 4096 Sep 21 23:59 3.003002/
> 
>     Will it use the latest rules or do I need to delete the older ones?
> 
>     Where in a mail header does it say what version of the rules it used?
>     X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
>     X-Spam-Level: ***
>     X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
>             RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no version=3.3.2
> 
> 
> 
> -- 
> -- 
> Martin Hepworth, CISSP
> Oxford, UK
Great thanks.

I am lowering the required score to 3.
If I still get spam, I will block everything and just use whitelisting.

Re: latest rules

Posted by Martin Hepworth <ma...@gmail.com>.

It'll use the latest rules you downloaded, as these are your core rules now.

Martin

On Saturday, 22 September 2012, James wrote:

> I've been getting more spam recently so I did sa-update for the first time
> in a year (I thought it was automatic :-()).
>
> I restarted the spamassassin service (Ubuntu).
>
> $ /var/lib/spamassassin$ ll
> total 16
> drwxr-xr-x  4 root root 4096 Oct 15  2011 ./
> drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
> drwxr-xr-x  3 root root 4096 Aug 27  2011 3.003001/
> drwxr-xr-x  3 root root 4096 Sep 21 23:59 3.003002/
>
> Will it use the latest rules or do I need to delete the older ones?
>
> Where in a mail header does it say what version of the rules it used?
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
>         RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no
> version=3.3.2
>
>

-- 
-- 
Martin Hepworth, CISSP
Oxford, UK