Posted to users@spamassassin.apache.org by James <bj...@lockie.ca> on 2012/09/22 21:29:59 UTC
latest rules
I've been getting more spam recently so I did sa-update for the first time in a year (I thought it was automatic :-()).
I restarted the spamassassin service (Ubuntu).
$ /var/lib/spamassassin$ ll
total 16
drwxr-xr-x 4 root root 4096 Oct 15 2011 ./
drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
drwxr-xr-x 3 root root 4096 Aug 27 2011 3.003001/
drwxr-xr-x 3 root root 4096 Sep 21 23:59 3.003002/
Will it use the latest rules or do I need to delete the older ones?
Where in a mail header does it say what version of the rules it used?
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
X-Spam-Level: ***
X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no version=3.3.2
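For reading headers like the ones above, here is a minimal sketch that pulls a single field out of X-Spam-Status. The function name is made up for illustration. Note that version= in this header (like X-Spam-Checker-Version) reports the SpamAssassin release, not which rule update was in use.

```shell
#!/bin/sh
# Hypothetical helper: print one field (score, required, tests, autolearn,
# version) from an X-Spam-Status header supplied on stdin.
spam_status_field() {
    # Unfold the header onto a single line, close up the gap left after
    # each comma by folding, then print the value following "name=".
    tr '\r\n\t' '   ' |
        sed -n -e 's/, */,/g' -e "s/.*[ ,]$1=\([^ ]*\).*/\1/p"
}

# Example:
#   grep -A2 '^X-Spam-Status:' message.eml | spam_status_field tests
```

The tests field is usually the most useful one when asking for help, since it shows which rules fired.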
Re: latest rules
Posted by Bowie Bailey <Bo...@BUC.com>.
On 9/22/2012 4:31 PM, James wrote:
> I am lowering the required score to 3.
> If I still get spam, I will block everything and just use whitelisting.
SpamAssassin's scores have been set to work best with a required score
of 5. If you find you have to significantly reduce this score to get
good results, then there is probably something wrong with your setup.
It may be worth your while to spend a little more time working with the
people here on the list (who are generally very helpful) to figure out
how to fix your system so that SA will work properly.
--
Bowie
Re: latest rules
Posted by John Hardin <jh...@impsec.org>.
On Sun, 23 Sep 2012, James wrote:
> On 09/23/12 18:28, John Hardin wrote:
>> On Sun, 23 Sep 2012, James wrote:
>>
>>> I wrote this little script to update the bayes rules. I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.
>>>
>>> Any comments on my script?
>>
>> Bear in mind that spams which don't score high enough to be quarantined or discarded will end up in your inbox, as will false negatives. Training all of the mail in all of your inboxes as ham will train these messages as ham and make any small error in classification much worse over time.
> I will manually move spam to a Junk folder (not a .INBOX* one).
Good.
>>
>> During the initial training period you want to manually review messages and build a ham corpus and a spam corpus. Once bayes is running you generally only want to train on misclassified messages. This decisionmaking process cannot be automated, or the errors wouldn't occur in the first place.
>>
>> You should set up per-user train-as-ham and train-as-spam mailboxes, and only train from those, only for the users whose judgement you trust. Then, those users should copy misclassified messages to the appropriate folder and may also add samples of ham to the train-as-ham folder whenever desired.
>
> The only user is me. :-)
:) Should be easy then.
> Is there a way to convert my Thunderbird bayes to spamassassin?
Not that I know of.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: latest rules
Posted by James <bj...@lockie.ca>.
On 09/23/12 18:28, John Hardin wrote:
> On Sun, 23 Sep 2012, James wrote:
>
>> I wrote this little script to update the bayes rules. I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.
>>
>> Any comments on my script?
>
> Bear in mind that spams which don't score high enough to be quarantined or discarded will end up in your inbox, as will false negatives. Training all of the mail in all of your inboxes as ham will train these messages as ham and make any small error in classification much worse over time.
I will manually move spam to a Junk folder (not a .INBOX* one).
>
> During the initial training period you want to manually review messages and build a ham corpus and a spam corpus. Once bayes is running you generally only want to train on misclassified messages. This decisionmaking process cannot be automated, or the errors wouldn't occur in the first place.
>
> You should set up per-user train-as-ham and train-as-spam mailboxes, and only train from those, only for the users whose judgement you trust. Then, those users should copy misclassified messages to the appropriate folder and may also add samples of ham to the train-as-ham folder whenever desired.
The only user is me. :-)
Is there a way to convert my Thunderbird bayes to spamassassin?
>
>>
>> #!/bin/bash
>>
>> IFS=$'\n'
>> FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
>>
>> for i in $FOLDERLIST; do
>>     echo "Processing $i"
>>     # sudo sa-learn --ham "$i"
>> done
>>
>> # sudo sa-learn --spam Maildir/.Junk
>>
>
>
Re: latest rules
Posted by John Hardin <jh...@impsec.org>.
On Sun, 23 Sep 2012, James wrote:
> I wrote this little script to update the bayes rules. I can do this on
> my imap account but my pop3 account gets way more spam and the messages
> are no longer on the machine with sa once I pop them off.
>
> Any comments on my script?
Bear in mind that spams which don't score high enough to be quarantined or
discarded will end up in your inbox, as will false negatives. Training all
of the mail in all of your inboxes as ham will train these messages as ham
and make any small error in classification much worse over time.
During the initial training period you want to manually review messages
and build a ham corpus and a spam corpus. Once bayes is running you
generally only want to train on misclassified messages. This
decisionmaking process cannot be automated, or the errors wouldn't occur
in the first place.
You should set up per-user train-as-ham and train-as-spam mailboxes, and
only train from those, only for the users whose judgement you trust. Then,
those users should copy misclassified messages to the appropriate folder
and may also add samples of ham to the train-as-ham folder whenever
desired.
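A rough sketch of training only from dedicated folders, as suggested above, rather than from every inbox. The Maildir folder names .train-as-ham and .train-as-spam, the function name and the SA_LEARN override are assumptions made for illustration.

```shell
#!/bin/sh
# SA_LEARN can be overridden (e.g. SA_LEARN=echo) to preview the commands.
SA_LEARN=${SA_LEARN:-sa-learn}

# Learn from <maildir>/.train-as-ham and <maildir>/.train-as-spam only.
# A missing folder is reported and skipped; nothing else is touched.
learn_from_training_folders() {
    maildir=$1
    for kind in ham spam; do
        folder="$maildir/.train-as-$kind"
        if [ -d "$folder" ]; then
            $SA_LEARN "--$kind" "$folder" || return 1
        else
            echo "skipping $folder (not found)" >&2
        fi
    done
}

# Example:
#   learn_from_training_folders "$HOME/Maildir"
```

Because only hand-sorted messages ever reach these folders, a false negative sitting in the inbox is never learned as ham by accident.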
>
> #!/bin/bash
>
> IFS=$'\n'
> FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
>
> for i in $FOLDERLIST; do
>     echo "Processing $i"
>     # sudo sa-learn --ham "$i"
> done
>
> # sudo sa-learn --spam Maildir/.Junk
>
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: latest rules
Posted by James <bj...@lockie.ca>.
On 09/22/12 22:38, John Hardin wrote:
> On Sat, 22 Sep 2012, James wrote:
>> On 09/22/12 20:50, Glenn Sieb wrote:
>>> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>>>> On 9/22/12 8:32 PM, James wrote:
>>>>> It didn't help. :-(
>>>>> I got spam with a low score.
>>>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>>>> INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>>>> version=3.3.2
>>
>> I haven't trained my bayes, it seemed complex when I last looked.
>
> The "BAYES_00" in the rule hits above strongly suggests that mistrained Bayes is the primary culprit (if this is indeed a FN).
>
> If you haven't trained bayes, then return your required score to 5 and disable Bayes and see if it behaves better.
>
> Then, collect a few hundred ham and spam messages, wipe your bayes database, train it properly using those messages, and reenable it.
>
> Training is pretty simple. All you have to do is collect representative ham and spam messages in a couple of mail folders, and tell sa-learn to process them. The critical bit that usually causes problems is you must run sa-learn as the user that the MTA is running spamassassin as, or make sure that your configuration defines a system-global Bayes database.
>
> You will probably want to set up a nightly cron job to run sa-learn against your ham and spam corpus folders. Then you can just add misclassified messages (spams from your inbox, and hams from your spam quarantine) to those folders as you encounter them.
>
I wrote this little script to update the bayes rules.
I can do this on my imap account but my pop3 account gets way more spam and the messages are no longer on the machine with sa once I pop them off.
Any comments on my script?
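#!/bin/bash
# Train every Maildir folder named .INBOX* as ham and .Junk as spam.
# The sa-learn calls are still commented out (dry run).
IFS=$'\n'
FOLDERLIST=$(find Maildir -name '.INBOX*' -type d)
for i in $FOLDERLIST; do
    echo "Processing $i"
    # sudo sa-learn --ham "$i"
done
# sudo sa-learn --spam Maildir/.Junk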
Re: latest rules
Posted by John Hardin <jh...@impsec.org>.
On Sat, 22 Sep 2012, James wrote:
> On 09/22/12 20:50, Glenn Sieb wrote:
>> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>>> On 9/22/12 8:32 PM, James wrote:
>>>> It didn't help. :-(
>>>> I got spam with a low score.
>>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>>> INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>>> version=3.3.2
>
> I haven't trained my bayes, it seemed complex when I last looked.
The "BAYES_00" in the rule hits above strongly suggests that mistrained
Bayes is the primary culprit (if this is indeed a FN).
If you haven't trained bayes, then return your required score to 5 and
disable Bayes and see if it behaves better.
Then, collect a few hundred ham and spam messages, wipe your bayes
database, train it properly using those messages, and reenable it.
Training is pretty simple. All you have to do is collect representative
ham and spam messages in a couple of mail folders, and tell sa-learn to
process them. The critical bit that usually causes problems is you must
run sa-learn as the user that the MTA is running spamassassin as, or make
sure that your configuration defines a system-global Bayes database.
You will probably want to set up a nightly cron job to run sa-learn
against your ham and spam corpus folders. Then you can just add
misclassified messages (spams from your inbox, and hams from your spam
quarantine) to those folders as you encounter them.
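The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names, the SA_LEARN override and the corpus paths are invented for illustration, while --clear, --ham, --spam and --dump magic are standard sa-learn options.

```shell
#!/bin/sh
# SA_LEARN can be overridden (e.g. SA_LEARN=echo) to preview the commands.
SA_LEARN=${SA_LEARN:-sa-learn}

# Incremental training: suitable for a nightly run, since sa-learn skips
# messages it has already learned.
train_corpus() {
    ham_dir=$1
    spam_dir=$2
    $SA_LEARN --ham "$ham_dir" &&
    $SA_LEARN --spam "$spam_dir"
}

# One-off reset: wipe the Bayes database, retrain from the two corpus
# folders, then print the database counters.
retrain_bayes() {
    $SA_LEARN --clear &&
    train_corpus "$1" "$2" &&
    $SA_LEARN --dump magic
}

# Example (placeholder paths; run as the user SpamAssassin scans mail as):
#   retrain_bayes "$HOME/Maildir/.ham-corpus" "$HOME/Maildir/.spam-corpus"
```

While the corpus is being collected, Bayes can be switched off with "use_bayes 0" in local.cf and switched back on once retraining is done. A nightly cron entry for the same user would then only need to call train_corpus.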
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: latest rules
Posted by James <bj...@lockie.ca>.
On 09/22/12 20:50, Glenn Sieb wrote:
> On 9/22/12 8:36 PM, Glenn Sieb wrote:
>> On 9/22/12 8:32 PM, James wrote:
>>> It didn't help. :-(
>>> I got spam with a low score.
>>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>>> INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>>> version=3.3.2
>>
>> If it's spam, then why does it say "X-Spam-Status: No"?
>>
>> (Hint: It's not spam.)
>
> Sorry, my bad. I misread the email.
>
> Are you training your bayes? Do you have your local.cf properly set up?
>
> Best,
> --G.
>
I haven't trained my bayes; it seemed complex when I last looked.
The local.cf is the one that comes with Ubuntu (probably the stock one).
Re: latest rules
Posted by Glenn Sieb <ge...@wingfoot.org>.
On 9/22/12 8:36 PM, Glenn Sieb wrote:
> On 9/22/12 8:32 PM, James wrote:
>> It didn't help. :-(
>> I got spam with a low score.
>> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
>> INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
>> version=3.3.2
>
> If it's spam, then why does it say "X-Spam-Status: No"?
>
> (Hint: It's not spam.)
Sorry, my bad. I misread the email.
Are you training your bayes? Do you have your local.cf properly set up?
Best,
--G.
Re: latest rules
Posted by Glenn Sieb <ge...@wingfoot.org>.
On 9/22/12 8:32 PM, James wrote:
> It didn't help. :-(
> I got spam with a low score.
> X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
> INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
> version=3.3.2
If it's spam, then why does it say "X-Spam-Status: No"?
(Hint: It's not spam.)
--Glenn
Re: latest rules
Posted by James <bj...@lockie.ca>.
On 09/22/12 17:11, Daniel McDonald wrote:
>
>
>
> On 9/22/12 3:31 PM, "James" <bj...@lockie.ca> wrote:
>
>> Great thanks.
>>
>> I am lowering the required score to 3.
>
> That is generally not a desirable practice.
It didn't help. :-(
I got spam with a low score.
X-Spam-Status: No, score=2.4 required=3.0 tests=BAYES_00,HTML_MESSAGE,
INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,URIBL_DBL_SPAM autolearn=no
version=3.3.2
>
>> If I still get spam, I will block everything and just use whitelisting.
>
> I see that you have bayes enabled. You should train your bayes every now
> and again.
>
> You may want to look at a few spams and write a rule just for them. For
> example, we received a spam asking for a loan of a small amount of money.
> It scored about 3.5. I wrote the following:
>
> body __WORD_LOAN /\bloan\b/
> describe __WORD_LOAN Describes a loan
>
> body __WORD_URGENT /\burgent/
> describe __WORD_URGENT Something is urgent or urgently needed
>
> meta AE_SMALL_URGENT_LOAN __FRAUD_DBI && __WORD_LOAN && __WORD_URGENT && __REPLY_FREEMAIL
> describe AE_SMALL_URGENT_LOAN urgent loan for a small dollar figure to freemail user
> score AE_SMALL_URGENT_LOAN 2.3
Does anyone have a good rule for "To cancel your subscription", etc.?
It seems a lot of spam has opt-out links (but I never opted in).
None of my friends send mail with opt-out links, so it would be fine to block everything that has one.
>
>
>
> It's not the most elegant rule, but that's the real power of spamassassin -
> custom rules to kill off the spam.
>
>
Re: latest rules
Posted by Daniel McDonald <da...@austinenergy.com>.
On 9/22/12 3:31 PM, "James" <bj...@lockie.ca> wrote:
> Great thanks.
>
> I am lowering the required score to 3.
That is generally not a desirable practice.
> If I still get spam, I will block everything and just use whitelisting.
I see that you have bayes enabled. You should train your bayes every now
and again.
You may want to look at a few spams and write a rule just for them. For
example, we received a spam asking for a loan of a small amount of money.
It scored about 3.5. I wrote the following:
body __WORD_LOAN /\bloan\b/
describe __WORD_LOAN Describes a loan
body __WORD_URGENT /\burgent/
describe __WORD_URGENT Something is urgent or urgently needed
meta AE_SMALL_URGENT_LOAN __FRAUD_DBI && __WORD_LOAN && __WORD_URGENT && __REPLY_FREEMAIL
describe AE_SMALL_URGENT_LOAN urgent loan for a small dollar figure to freemail user
score AE_SMALL_URGENT_LOAN 2.3
It's not the most elegant rule, but that's the real power of spamassassin -
custom rules to kill off the spam.
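Along the same lines, a rule for the opt-out phrases asked about elsewhere in this thread could be prototyped with grep before it goes into local.cf. Everything here is an assumption for illustration: the pattern, the rule name LOCAL_UNSUB_PHRASE and the 1.0 score are placeholders, and the grep pattern only approximates the Perl regex in the rule.

```shell
#!/bin/sh
# Candidate pattern for trying out against saved messages.
UNSUB_RE='to (cancel|stop|end) (your|this) subscription|to unsubscribe'

# Succeeds when the text on stdin contains an opt-out phrase.
has_unsub_phrase() {
    grep -Eiq "$UNSUB_RE"
}

# Print a corresponding rule in SpamAssassin syntax, ready to paste into
# local.cf once the pattern looks right.
print_unsub_rule() {
    cat <<'EOF'
body     LOCAL_UNSUB_PHRASE /\bto (?:(?:cancel|stop|end) (?:your|this) subscription|unsubscribe)\b/i
describe LOCAL_UNSUB_PHRASE Message contains an opt-out phrase
score    LOCAL_UNSUB_PHRASE 1.0
EOF
}

# Example:
#   has_unsub_phrase < suspect.eml && print_unsub_rule
```

Legitimate newsletters and receipts carry the same phrases, so a modest score that only adds to other hits is probably safer than blocking on this rule alone. Running "spamassassin --lint" after editing local.cf catches syntax mistakes.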
--
Daniel J McDonald, CCIE # 2495, CISSP # 78281
Re: latest rules
Posted by James <bj...@lockie.ca>.
On 09/22/12 15:38, Martin Hepworth wrote:
> It'll use the latest rules you downloaded, as these are your core rules now.
>
> Martin
>
> On Saturday, 22 September 2012, James wrote:
>
> I've been getting more spam recently so I did sa-update for the first time in a year (I thought it was automatic :-()).
>
> I restarted the spamassassin service (Ubuntu).
>
> $ /var/lib/spamassassin$ ll
> total 16
> drwxr-xr-x 4 root root 4096 Oct 15 2011 ./
> drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
> drwxr-xr-x 3 root root 4096 Aug 27 2011 3.003001/
> drwxr-xr-x 3 root root 4096 Sep 21 23:59 3.003002/
>
> Will it use the latest rules or do I need to delete the older ones?
>
> Where in a mail header does it say what version of the rules it used?
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
> RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no version=3.3.2
>
>
>
> --
> --
> Martin Hepworth, CISSP
> Oxford, UK
Great thanks.
I am lowering the required score to 3.
If I still get spam, I will block everything and just use whitelisting.
Re: latest rules
Posted by Martin Hepworth <ma...@gmail.com>.
It'll use the latest rules you downloaded, as these are your core rules now.
Martin
On Saturday, 22 September 2012, James wrote:
> I've been getting more spam recently so I did sa-update for the first time
> in a year (I thought it was automatic :-()).
>
> I restarted the spamassassin service (Ubuntu).
>
> $ /var/lib/spamassassin$ ll
> total 16
> drwxr-xr-x 4 root root 4096 Oct 15 2011 ./
> drwxr-xr-x 45 root root 4096 Jun 12 06:40 ../
> drwxr-xr-x 3 root root 4096 Aug 27 2011 3.003001/
> drwxr-xr-x 3 root root 4096 Sep 21 23:59 3.003002/
>
> Will it use the latest rules or do I need to delete the older ones?
>
> Where in a mail header does it say what version of the rules it used?
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_40,HTML_MESSAGE,
> RCVD_IN_BRBL_LASTEXT,SPF_HELO_PASS,URIBL_BLACK autolearn=no
> version=3.3.2
>
>
--
Martin Hepworth, CISSP
Oxford, UK
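To make the answer above concrete: the directories under /var/lib/spamassassin are named after the SpamAssassin release, so 3.003002 belongs to 3.3.2 and 3.003001 is a leftover from 3.3.1 that the current install does not read. The sketch below shows the naming and a way to read the update number from a channel file. The function names are invented, and the "# UPDATE version" comment line is an assumption about what sa-update writes at the top of that file.

```shell
#!/bin/sh
# Map a release such as 3.3.2 to the rules directory name 3.003002.
sa_rules_dir_name() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "3.3.2" into 3, 3 and 2
    IFS=$old_ifs
    printf '%d.%03d%03d\n' "$1" "$2" "$3"
}

# Print the update number from a channel .cf file, assuming it carries a
# "# UPDATE version NNN" comment line.
sa_update_version() {
    sed -n 's/^# UPDATE version \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example (the path assumes the default updates.spamassassin.org channel):
#   dir=/var/lib/spamassassin/$(sa_rules_dir_name 3.3.2)
#   sa_update_version "$dir/updates_spamassassin_org.cf"
```

Running "spamassassin -D --lint" also prints, in its debug output, which directories the rules were actually loaded from.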