Posted to users@spamassassin.apache.org by Shlomi Fish <sh...@iglu.org.il> on 2010/08/21 10:58:33 UTC

A lot of Recent Spam is Uncaught by SpamAssassin

Hi all,

Recently I've noticed that a lot of the spam I am getting is not caught by
SpamAssassin, even though it is very similar to other spam that I received and
marked as spam using the Bayesian training. I've placed a sample of some of
the recent messages here:

http://www.shlomifish.org/sa-uncaught-spam/

A lot of it is "I want to have a relationship with you, please contact me so I 
can send you my picture, etc.".

I've also noticed that the latest version of SpamAssassin, 3.3.1, was released 
on 20-March-2010, which is quite a long time ago:

http://freshmeat.net/projects/spamassassin/

Is there a new release planned soon?

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Understand what Open Source is - http://shlom.in/oss-fs

God considered inflicting XSLT as the tenth plague of Egypt, but then
decided against it because he thought it would be too evil.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Martin Gregorie <ma...@gregorie.org>.

On Mon, 2010-08-23 at 20:09 +0300, Shlomi Fish wrote:
> This email reads:
> 
> {{{
> Subject: Site suggestion: www.careerjet.co.il
> }}}
> 
That looks to be worth a local rule along the lines of:

header LINK_RQST Subject =~ /(Site|Visit).+www\..+\..+/

if you're getting many of these with a URL in the header.
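
A quick way to sanity-check a pattern like this before putting it in local.cf is to run it against the sample subject. The sketch below uses Python's `re` module rather than SpamAssassin itself, on the assumption that this simple pattern behaves the same in both regex dialects. The helper name `subject_matches` and the second and third sample subjects are made up for illustration.

```python
import re

# Same pattern as in the LINK_RQST rule above (Perl delimiters removed).
LINK_RQST = re.compile(r"(Site|Visit).+www\..+\..+")

def subject_matches(subject: str) -> bool:
    """Return True if the Subject text would trigger the pattern."""
    return LINK_RQST.search(subject) is not None

# Sample subjects; only the first one comes from the thread.
samples = [
    "Site suggestion: www.careerjet.co.il",
    "Visit us at www.example.com",
    "Re: patch for the website build",
]
for s in samples:
    print(f"{subject_matches(s)!s:5}  {s}")
```

Note that the pattern is case-sensitive as written, so a subject starting with "site suggestion" in lower case would slip through unless the rule gets an /i modifier.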


Martin

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Shlomi Fish <sh...@iglu.org.il>.

On Sunday 22 August 2010 12:55:43 Giampaolo Tomassoni wrote:
> > I cannot find any special setting for autolearn (either
> > bayes_auto_learn_threshold_nonspam or
> > bayes_auto_learn_threshold_spam ) in either
> > /etc/mail/spamassassin/local.cf or
> > ~/.spamassassin/user_prefs . It probably means they are the default
> > which is
> > 0.1 and 12.0 respectively which seems reasonable enough.
> 
> In your set, these are the scores of your ham-learnt messages:
> 
> 	1282213227.7828.1nhON:2,S	 0.8
> 	1282213227.7828.AHPct:2,S	 0.8
> 	1282213227.7828.CQy9k:2,S	 0.8
> 	1282213227.7828.VLBN5:2,S	-2.0
> 	1282213227.7828.zvzWO:2,S	 0.8
> 
> Since they score 0.8, they wouldn't be learnt as ham with the default
> bayes_auto_learn_threshold_nonspam value. So, probably at least this
> setting had been overridden somewhere in your SA conf...
> 
> Also, since most of the above messages hit BAYES_50, I would say you have
> network tests disabled in your setup.
> 
> If this is the case, I would suggest to either:
> 
> 1) enable and appropriately configure network test, because they may help
> SA to better classify your messages;
> 
> 2) revise the default auto-learn threshold. Most of the SA user base
> nowadays uses network test, which means that the default auto-learn
> thresholds may reflect the wider spam-to-ham score spread this case yields.
> 
> Also note 1282213227.7828.VLBN5:2,S : is it really spam? Review your ham
> corpora: isn't there any spam in them?
> 

This email reads:

{{{
Subject: Site suggestion: www.careerjet.co.il
}}}

I've classified it as spam because it appears to be one of the many link 
requests that I've been getting and which I don't really want.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Understand what Open Source is - http://shlom.in/oss-fs

God considered inflicting XSLT as the tenth plague of Egypt, but then
decided against it because he thought it would be too evil.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Jacek Politowski <jp...@jp.pl.eu.org>.

On Sun, Aug 22, 2010 at 11:40:43AM +0300, Shlomi Fish wrote:

>I cannot find any special setting for autolearn (either 
>bayes_auto_learn_threshold_nonspam or 
>bayes_auto_learn_threshold_spam ) in either /etc/mail/spamassassin/local.cf or 
>~/.spamassassin/user_prefs . It probably means they are the default which is 
>0.1 and 12.0 respectively which seems reasonable enough.

I've seen quite a few spams scoring zero (or, rarely, even negative)
points on my personal server, so I decided to lower the autolearn
threshold for ham to -0.5 points.

-- 
Jacek Politowski

RE: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Giampaolo Tomassoni <Gi...@Tomassoni.biz>.

> > Since they score 0.8, they wouldn't be learnt as ham with the default
> > bayes_auto_learn_threshold_nonspam value. So, probably at least this
> > setting had been overridden somewhere in your SA conf...
> 
> Probably not. 0.8 is the score for BAYES_50, which isn't counted in the
> autolearning score. That leaves a score of 0.0 which is below the
> default threshold of 0.1

Ah, right: I forgot that the Bayes score is not counted for autolearning
purposes.

Sorry about that, Shlomi.

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by RW <rw...@googlemail.com>.

On Sun, 22 Aug 2010 11:55:43 +0200
"Giampaolo Tomassoni" <Gi...@Tomassoni.biz> wrote:

> > I cannot find any special setting for autolearn (either
> > bayes_auto_learn_threshold_nonspam or
> > bayes_auto_learn_threshold_spam ) in either
> > /etc/mail/spamassassin/local.cf or
> > ~/.spamassassin/user_prefs . It probably means they are the default
> > which is
> > 0.1 and 12.0 respectively which seems reasonable enough.
> 
> In your set, these are the scores of your ham-learnt messages:
> 
> 	1282213227.7828.1nhON:2,S	 0.8
> 	1282213227.7828.AHPct:2,S	 0.8
> 	1282213227.7828.CQy9k:2,S	 0.8
> 	1282213227.7828.VLBN5:2,S	-2.0
> 	1282213227.7828.zvzWO:2,S	 0.8
> 
> Since they score 0.8, they wouldn't be learnt as ham with the default
> bayes_auto_learn_threshold_nonspam value. So, probably at least this
> setting had been overridden somewhere in your SA conf...

Probably not. 0.8 is the score for BAYES_50, which isn't counted in the
autolearning score. That leaves a score of 0.0, which is below the
default threshold of 0.1.
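
The arithmetic here can be sketched in a few lines of Python. This is a deliberately simplified model, not SpamAssassin's actual code: it only drops the BAYES_* hits and compares the remaining sum against the two thresholds, and it ignores the other conditions the real autolearn logic applies. The function name, the exact comparison operators and the lowered threshold in the second call are assumptions made for illustration.

```python
def autolearn_decision(hits, nonspam_threshold=0.1, spam_threshold=12.0):
    """Simplified autolearn model: sum all rule scores except BAYES_* hits.

    hits is a list of (rule_name, score) pairs.
    Returns "ham", "spam" or "no" (not autolearned).
    """
    # Bayes hits are left out so the classifier does not train on its own verdict.
    score = sum(s for name, s in hits if not name.startswith("BAYES_"))
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"

# A message whose only hit is BAYES_50 (0.8): total 0.8, autolearn score 0.0.
print(autolearn_decision([("BAYES_50", 0.8)]))

# The same message with a stricter ham threshold of -0.5.
print(autolearn_decision([("BAYES_50", 0.8)], nonspam_threshold=-0.5))
```

With the default thresholds the first call reports the message as learnable ham, which matches the autolearn=ham results seen in the sample set. With the stricter threshold the same message is left alone.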

RE: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Giampaolo Tomassoni <Gi...@Tomassoni.biz>.

> I cannot find any special setting for autolearn (either
> bayes_auto_learn_threshold_nonspam or
> bayes_auto_learn_threshold_spam ) in either
> /etc/mail/spamassassin/local.cf or
> ~/.spamassassin/user_prefs . It probably means they are the default
> which is
> 0.1 and 12.0 respectively which seems reasonable enough.

In your set, these are the scores of your ham-learnt messages:

	1282213227.7828.1nhON:2,S	 0.8
	1282213227.7828.AHPct:2,S	 0.8
	1282213227.7828.CQy9k:2,S	 0.8
	1282213227.7828.VLBN5:2,S	-2.0
	1282213227.7828.zvzWO:2,S	 0.8

Since they score 0.8, they wouldn't be learnt as ham with the default
bayes_auto_learn_threshold_nonspam value. So at least this setting has
probably been overridden somewhere in your SA conf...

Also, since most of the above messages hit BAYES_50, I would say you have
network tests disabled in your setup.

If this is the case, I would suggest either:

1) enabling and appropriately configuring network tests, because they may
help SA to better classify your messages; or

2) revising the default auto-learn thresholds. Most of the SA user base
nowadays uses network tests, which means that the default auto-learn
thresholds may reflect the wider spam-to-ham score spread that this case
yields.

Also note 1282213227.7828.VLBN5:2,S : is it really spam? Review your ham
corpora: isn't there any spam in them?


> I'll try to retrain the Bayesian filter.

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Shlomi Fish <sh...@iglu.org.il>.

On Saturday 21 August 2010 14:43:33 Giampaolo Tomassoni wrote:
> > Hi all,
> > 
> > Recently I've noticed that a lot of spam that I am getting is not
> > caught by
> > SpamAssassin, despite the fact that it is very similar to other spam I
> > got and
> > that I marked as spam using the Bayesian training. I've placed a sample
> > of
> > some of the recent messages I got here:
> > 
> > http://www.shlomifish.org/sa-uncaught-spam/
> > 
> > A lot of it is "I want to have a relationship with you, please contact
> > me so I
> > can send you my picture, etc.".
> > 
> > I've also noticed that the latest version of SpamAssassin, 3.3.1, was
> > released
> > on 20-March-2010, which is quite a long time ago:
> > 
> > http://freshmeat.net/projects/spamassassin/
> > 
> > Is there a new release planned soon?
> 
> I don't believe a new SA release would be of any help.
> 
> Most of the messages in your set hit BAYES_99 on my system.
> 
> The X-Spam-Status header of some of the messages in your set reports that
> you enabled the autolearning facility, and that it has learnt the message
> as ham. This may be why you have such low Bayes hits.
> 
> I would suggest revising the Bayes autolearn thresholds in your SA setup.
> 

I cannot find any special setting for autolearn (either
bayes_auto_learn_threshold_nonspam or
bayes_auto_learn_threshold_spam) in either /etc/mail/spamassassin/local.cf or
~/.spamassassin/user_prefs. That probably means they are at their defaults,
0.1 and 12.0 respectively, which seem reasonable enough.

I'll try to retrain the Bayesian filter.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Understand what Open Source is - http://shlom.in/oss-fs

God considered inflicting XSLT as the tenth plague of Egypt, but then
decided against it because he thought it would be too evil.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Re: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Shlomi Fish <sh...@iglu.org.il>.

On Saturday 21 August 2010 18:39:41 John Hardin wrote:
> On Sat, 21 Aug 2010, Giampaolo Tomassoni wrote:
> > The X-Spam-Status header of some of the messages in your set reports that
> > you enabled the autolearning facility, and that it has learnt the message
> > as ham. This may be why you have such low Bayes hits.
> > 
> > I would suggest revising the Bayes autolearn thresholds in your SA setup.
> 

OK, I will.

> ...and, as your bayes database is now polluted, wipe the database and
> retrain properly from scratch.
> 

OK, thanks.

> You _did_ retain your manual training corpora, right? :)

Yes, I did. :-).

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
What does "Zionism" mean? - http://shlom.in/def-zionism

God considered inflicting XSLT as the tenth plague of Egypt, but then
decided against it because he thought it would be too evil.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

RE: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by John Hardin <jh...@impsec.org>.

On Sat, 21 Aug 2010, Giampaolo Tomassoni wrote:

> The X-Spam-Status header of some of the messages in your set reports that
> you enabled the autolearning facility, and that it has learnt the message
> as ham. This may be why you have such low Bayes hits.
>
> I would suggest revising the Bayes autolearn thresholds in your SA setup.

...and, as your bayes database is now polluted, wipe the database and 
retrain properly from scratch.

You _did_ retain your manual training corpora, right? :)
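
The wipe-and-retrain sequence could be scripted roughly as follows. This is only a sketch: it assumes the hand-sorted corpora live in two directories that sa-learn can read, the paths shown are hypothetical, and it defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess

def retrain_commands(spam_dir: str, ham_dir: str):
    """Build the sa-learn invocations for a wipe-and-retrain, in order."""
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--spam", spam_dir],  # relearn the hand-sorted spam
        ["sa-learn", "--ham", ham_dir],    # relearn the hand-sorted ham
    ]

def retrain(spam_dir: str, ham_dir: str, dry_run: bool = True):
    """Print each command; run it as well when dry_run is False."""
    for cmd in retrain_commands(spam_dir, ham_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# Hypothetical corpus locations; dry_run=True only prints the commands.
retrain("/home/user/corpus/spam", "/home/user/corpus/ham")
```

It should be run as the user whose Bayes database is being rebuilt, and it makes sense to turn off or tighten autolearning first so the fresh database is not polluted again.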

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Watch... Wallet... Gun... Knee...                    -- Denny Crane
-----------------------------------------------------------------------
  3 days until the 1931st anniversary of the destruction of Pompeii

RE: A lot of Recent Spam is Uncaught by SpamAssassin

Posted by Giampaolo Tomassoni <Gi...@Tomassoni.biz>.

> Hi all,
> 
> Recently I've noticed that a lot of spam that I am getting is not
> caught by
> SpamAssassin, despite the fact that it is very similar to other spam I
> got and
> that I marked as spam using the Bayesian training. I've placed a sample
> of
> some of the recent messages I got here:
> 
> http://www.shlomifish.org/sa-uncaught-spam/
> 
> A lot of it is "I want to have a relationship with you, please contact
> me so I
> can send you my picture, etc.".
> 
> I've also noticed that the latest version of SpamAssassin, 3.3.1, was
> released
> on 20-March-2010, which is quite a long time ago:
> 
> http://freshmeat.net/projects/spamassassin/
> 
> Is there a new release planned soon?

I don't believe a new SA release would be of any help.

Most of the messages in your set hit BAYES_99 on my system.

The X-Spam-Status header of some of the messages in your set reports that you
enabled the autolearning facility, and that it has learnt the message as
ham. This may be why you have such low Bayes hits.

I would suggest revising the Bayes autolearn thresholds in your SA setup.
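
To check which messages were autolearned, the autolearn field can be pulled out of the X-Spam-Status header. The snippet below is a minimal sketch: the helper name is made up, and the sample header value is invented in the usual layout rather than copied from the sample set.

```python
import re

def autolearn_status(header_value: str):
    """Return the autolearn=... value from an X-Spam-Status header, or None."""
    m = re.search(r"\bautolearn=(\S+)", header_value)
    return m.group(1) if m else None

# Made-up header value in the usual layout; not taken from the sample set.
sample = "No, score=0.8 required=5.0 tests=BAYES_50 autolearn=ham version=3.3.1"
print(autolearn_status(sample))
```

A value of "ham" on a message that is actually spam means the Bayes database has been trained in the wrong direction on that message.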

