Posted to users@spamassassin.apache.org by Christoph Reichenberger <ch...@ergonis.com> on 2006/06/05 12:59:52 UTC

How to train SpamAssassin to catch this kind of spam

Hi,

I am pretty new to SpamAssassin, so I apologize if this has been posted in
the past. I have SA integrated in Communigate Pro with CGPSA, and it has
already started to filter out a lot of spam messages right out of the box.
However, I am still a bit unsure about how to train it.

I get a lot of spam messages like the following, which SA does not recognize
as spam (this one, e.g., got a score of 0.0).
================================================

Here I wanted to add the original mail, but when I did so, my posting was
rejected by the list server with "552 spam score (19.7) exceeded threshold".
So on the one hand this is good news, since I now know that SA can be
trained to catch it, but how can I show you what this spam message looks
like? I'll try to describe it:
The first part contains 8 lines holding the names of the pills that help
male human beings ... ;-)  - you know?
Then follows a line with " all 50 % off" and a URL where to order, and this
is followed by a paragraph of pretty "normal" text. I am keeping this text
here, since I assume it should not trigger the score anyhow.

 From the original mail:

"If ever you are passing my way, said Bilbo, dont wait to knock!
Tea is at four; but any of you are welcome at any time!
Then he turned away.
The elf-host was on the march;. and if it was sadly lessened, yet
many were glad, for now the northern world would be merrier for many a
long day. The dragon was dead, and the goblins overthrown, and their"

=================================================

I am now wondering whether there is even a chance for SA to catch this if I
train it on this message, or whether training on messages like this is not
even a good idea, because the "normal text" at the end would confuse the
Bayesian corpus more than it would help. As I mentioned above, it seems that
it can be trained, but how? I have already trained a lot of messages like
that with sa-train, but the score is still 0.0.

Thanks for any hint you can provide on how best to deal with this kind of
message.

thanks

Christoph Reichenberger
-- 
Christoph Reichenberger - ergonis software gmbh

Re: How to train SpamAssassin to catch this kind of spam

Posted by Christoph Reichenberger <ch...@ergonis.com>.

Dear Anthony Peacock,
dear Sander Holthaus,
dear John DeYoung,

thank you very much for all your answers. I am pretty impressed by your
helpfulness. Thanks to your feedback, I think I am a big step further now.
I have completely rebuilt the Bayes database, and I now see a file named
bayes_journal that did not exist before.
Also, I just found the first couple of mails that had BAYES_99 or BAYES_80
in them, which I never saw before. So I assume that my Bayesian filter just
didn't work at all (for whatever reason) until now.

I will now wait and see how it evolves further, manually train it with the
spams it didn't catch and with FP hams, and then continue to work on
additional rules as you suggested (probably Razor and URIDNSBL at first),
and then look into integrating some kind of RBL.

But for now, I feel much better than hours ago - thanks to your help!

You made my day!

Thanks again.

Christoph

-- 
Christoph Reichenberger - ergonis software gmbh

Re: How to train SpamAssassin to catch this kind of spam

Posted by Anthony Peacock <a....@chime.ucl.ac.uk>.

Hi Christoph,

Christoph Reichenberger wrote:
> Hi Anthony,
> In order not to swamp the list with my beginner questions too much
> (thanks again BTW, that you take the time for your answers), I am
> writing directly to you.

Please don't do that.  I occasionally reply to list emails, but I don't
provide one-to-one support.  You will get much better answers on the list,
as more people will be able to offer advice, and if I get an answer
wrong, someone is likely to correct my reply.  On top of that, all
list traffic is archived, so any discussions we have on the list may
help someone else in the future.

I have replied to the list for this message.

> It seems that I have a pretty standard installation.
> When looking at your results file, I see that you are using many
> rules, that my configuration is probably missing.
> How do I enable mangled and other rules you are running (URIBL_*)
> and spamcop?
> Would it be better to have spamcop as a SA-rule or to enable the
> check in the MTA ?

What OS are you running on?

I don't know CommuniGate Pro, so the following comments assume a
standard installation on a Linux flavour.

To enable the mangled.cf rules, go to the RulesEmporium web site,
download the file and place it in your local SpamAssassin directory (on
my system this is /etc/mail/spamassassin). Then run spamassassin --lint
from the command line to make sure there are no errors.  Depending on
how you call SpamAssassin, you may need to restart CommuniGate Pro or spamd.

Mangled.cf =>  http://www.rulesemporium.com/rules/mangled.cf

RulesEmporium => http://www.rulesemporium.com/

Have a look at the other rules on the Rules Emporium web site, the ones 
I use are:

70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_obfu0.cf
70_sare_random.cf
70_sare_specific.cf
70_sare_stocks.cf
99_sare_fraud_post25x.cf
chickenpox.cf
mangled.cf
weeds_2.cf

Read the descriptions on the web site and make your own decision about 
their usefulness to you.

To enable the URL blacklists, look in the init.pre or v310.pre files in
the same directory and make sure that the following line is uncommented:

loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

I don't use SpamCop so can't comment about that.  But I don't generally
switch on RBLs at the MTA level.  I am not happy trusting an RBL
completely to block at the MTA level; I like the fact that SA allows me
to build a fuller picture with a number of indicators, and that I can
check the Junk Folder to weed out any false positives.  Having said that,
my mail server is fairly low volume (~3,000 msgs per day); blocking at
the MTA would reduce the amount of resources scanning takes up on the
server.

> The SA documentation said that special rules would not be necessary,
> so I tried to just start with the standard configuration.

SA out of the box does a very good job.  But everybody's mail feed and
concept of ham/spam is different, so some tweaking will be necessary to
get the absolutely best results for any particular local configuration.

When I first set up SA I used the basic rule set and manually trained 
the Bayes database.  This worked really well hitting about 85% of spam 
correctly.  With some tweaking and the addition of SARE rules I now run 
at 99%+.

> I also reset the bayes files and turned autolearn off for now.

Manually train your Bayes with at least 200 sample ham and 200 sample 
spam messages.  Your Bayes wasn't even working for that last email.  My 
Bayes works really well at the moment.  Once you have the Bayes system 
working well manually you can then turn on auto-learning, but change the 
auto-learning thresholds to something like:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0

You can add these lines into a file called local.cf found in the 
/etc/mail/spamassassin directory.
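
To illustrate why the lower nonspam threshold matters for a message that
scored 0.0, here is a minimal sketch in Python of the auto-learn decision.
It is a simplification, not SpamAssassin's actual code: the function names
are made up for illustration, the exact boundary comparisons are an
assumption, and the real auto-learner applies further conditions that are
left out. The values 0.1 and 12.0 are taken to be the stock defaults for the
two thresholds, and 200 is the sample count mentioned above.

```python
# Simplified sketch of the auto-learn threshold decision.
# Assumption: real SpamAssassin applies extra conditions not modelled here.

DEFAULT_NONSPAM = 0.1   # assumed default bayes_auto_learn_threshold_nonspam
DEFAULT_SPAM = 12.0     # assumed default bayes_auto_learn_threshold_spam


def autolearn_decision(score, nonspam_threshold=DEFAULT_NONSPAM,
                       spam_threshold=DEFAULT_SPAM):
    """Return 'ham', 'spam', or None (not learned) for a message score."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


def bayes_is_active(num_spam, num_ham, min_spam=200, min_ham=200):
    """Bayes is only consulted once enough spam and ham have been learned."""
    return num_spam >= min_spam and num_ham >= min_ham


if __name__ == "__main__":
    # A missed spam that scored 0.0, first with the assumed defaults,
    # then with the nonspam threshold suggested above.
    print(autolearn_decision(0.0))
    print(autolearn_decision(0.0, nonspam_threshold=-0.1))
    print(bayes_is_active(num_spam=150, num_ham=400))
```

With the assumed default of 0.1, a missed spam scoring 0.0 falls below the
nonspam threshold and is learned as ham; with -0.1 it is left alone.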

Also, you could do a lot worse than have a look at the SpamAssassin wiki.

Remember, all of the advice about files and directories assumes a standard
SpamAssassin install.  Your installation may be slightly different.

> How would you recommend to continue now?
> 
> Thanks a lot
> 
> christoph
> 
> On 05.06.2006, at 14:08, Anthony Peacock wrote:
> 
>> Hi,
>>
>> Christoph Reichenberger wrote:
>>> Hi,
>>> thank you so much for your prompt reply and for your offer to look 
>>> into this and help me.
>>> I have saved the full message in a text file and put it at:
>>>   http://www.ergonis.com/downloads/public/TheSpamMessage.txt
>>> Also, I even saw in the header that it Autolearned it as HAM - so 
>>> this may be even worse, isn't it?
>>> Any help is highly appreciated.
>>
>> OK!  I ran that through my SpamAssassin system and got the following
>> results:
>>
>> http://www.chime.ucl.ac.uk/~rmhiajp/SA-results-20060605a.txt
>>
>>
>> This shows me that my Bayes system was 99-100% sure the message was
>> spam, and the message was hit by a load of network tests, and a rule 
>> from the mangled.cf file.
>>
>> I don't use CommuniGate so I can't really advise on how to configure
>> your specific set up, but it looks to me as if you should switch on some
>> network tests and get Bayes working.
>>
>> The mangled.cf file can be found here:
>>
>> http://www.rulesemporium.com/rules/mangled.cf
>>
>>
>> <SNIP>
>>
>> --Anthony Peacock
>> CHIME, Royal Free & University College Medical School
>> WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
>> "The problem with defending the purity of the English language is that
>> English is about as pure as a cribhouse whore. We don't just borrow
>> words; on occasion, English has pursued other languages down alleyways
>> to beat them unconscious and rifle their pockets for new
>> vocabulary."  -- James D. Nicoll
>>
>>
>>
>>
> 
> --Christoph Reichenberger - ergonis software gmbh
> 
> 
> 
> 

-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new
vocabulary."  -- James D. Nicoll

Re: How to train SpamAssassin to catch this kind of spam

Posted by Anthony Peacock <a....@chime.ucl.ac.uk>.

Hi,

Christoph Reichenberger wrote:
> Hi,
> 
> thank you so much for your prompt reply and for your offer to look into 
> this and help me.
> I have saved the full message in a text file and put it at:
>   http://www.ergonis.com/downloads/public/TheSpamMessage.txt
> 
> Also, I even saw in the header that it Autolearned it as HAM - so this 
> may be even worse, isn't it?
> 
> Any help is highly appreciated.

OK!  I ran that through my SpamAssassin system and got the following
results:

http://www.chime.ucl.ac.uk/~rmhiajp/SA-results-20060605a.txt

This shows me that my Bayes system was 99-100% sure the message was
spam, and the message was hit by a load of network tests, and a rule 
from the mangled.cf file.

I don't use CommuniGate so I can't really advise on how to configure
your specific set up, but it looks to me as if you should switch on some
network tests and get Bayes working.

The mangled.cf file can be found here:

http://www.rulesemporium.com/rules/mangled.cf

<SNIP>

-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new
vocabulary."  -- James D. Nicoll

[RESEND] Re: How to train SpamAssassin to catch this kind of spam

Posted by Sander Holthaus <in...@orangexl.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Christoph Reichenberger wrote:
> Hi,
>
> thank you so much for your prompt reply and for your offer to look
> into this and help me. I have saved the full message in a text file
> and put it at:
> http://www.ergonis.com/downloads/public/TheSpamMessage.txt
>
> Also, I even saw in the header that it Autolearned it as HAM - so
> this may be even worse, isn't it?
>
> Any help is highly appreciated.
>
> Thank you
>
> Christoph
>
> On 05.06.2006, at 13:24, Anthony Peacock wrote:
>
>> Hi,
>>
>> Sander Holthaus wrote:
>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Christoph
>>> Reichenberger wrote:
>>>> Hi,
>>>>
>>>> I am pretty new to SpamAssassin, so I apologize, if this has
>>>> been posted in the past. I have SA integrated in Communigate
>>>> Pro with CGPSA, and it has already started to filter out a
>>>> lot of spam messages right out of the box. However, I am
>>>> still a bit unsure about how to train it.
>>>>
>>>> I get a lot of spam messages like that, which SA does not
>>>> recognize as spam. (this one, e..g, got score 0.0)
> [... snip ...]
>
>>> Training Bayes to catch there kind of messages is difficult.
>>> Your best bets are to use some rule-sets from SARE
>>> (www.rulesemporium.com) and make sure you use several network
>>> tests (rbl's, surbl's, dcc, razor, pyzor).
>>
>> Actually training Bayes for these can be very easy.
>>
>> To work out the reason that your system is not catching these we
>> would need to see the full email message including the original
>> headers.
>>
>> If you can save the full message as a text file and put it
>> somewhere on a web site, people here will be able to tell you
>> exactly which rules should catch the spam.
>>
>>
>> --Anthony Peacock CHIME, Royal Free & University College Medical
>> School WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/ "The problem
>> with defending the purity of the English language is that English
>> is about as pure as a cribhouse whore. We don't just borrow
>> words; on occasion, English has pursued other languages down
>> alleyways to beat them unconscious and rifle their pockets for
>> new vocabulary."  -- James D. Nicoll
>
> --Christoph Reichenberger - ergonis software gmbh
>
>
>
>
Content analysis details:   (31.7 points, 10.0 required)

(unfortunately, the mailing list does not allow me to include the
detailed report)

Auto-learning should only be enabled with extreme care because there
is indeed a high probability that spam may be learned as ham and
vice-versa.

Kind Regards,
Sander Holthaus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)

iD8DBQFEhB2VVf373DysOTURAjsFAJ0aje5ESF1efBCTIAF9AMkLyvxuzgCgs5ts
UCMAglQ7yQF8jHcHNL+E6Oo=
=pxsw
-----END PGP SIGNATURE-----

Re: URIDNSBL does not work

Posted by Kai Schaetzl <ma...@conactive.com>.

Theo Van Dinter wrote on Mon, 5 Jun 2006 12:42:13 -0400:

> btw, "spamassassin --lint -D uridnsbl" will just output the uridnsbl 
> stuff. :)

Thanks for the info, Theo!

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com

Re: URIDNSBL does not work

Posted by Theo Van Dinter <fe...@apache.org>.

On Mon, Jun 05, 2006 at 06:38:51PM +0200, maillists@conactive.com wrote:
> You may want to run a simple debug:
> 
> spamassassin -D --lint
> 
> and look if URIDNSBL gets used and throws no errors.

btw, "spamassassin --lint -D uridnsbl" will just output the uridnsbl
stuff. :)

-- 
Randomly Generated Tagline:
Borg Mac!  The burger that eats you!

Re: URIDNSBL does not work

Posted by ma...@conactive.com.


Christoph Reichenberger wrote on Mon, 5 Jun 2006 18:30:53 +0200:

> I already received a couple of spams that got BAYES_99, but got 
> a total of less than 5. All these mails are so suspicious that I 
> assume that they should get additional points from URIDNSBL. 

Do they contain URIs? If so, check one of the "untagged" URIs on the SURBL
and URIBL pages.
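
Besides the lookup pages, a domain can also be checked by hand with a DNS
query against the list zone. The following is only a rough Python sketch:
the helper names are hypothetical, and it assumes the zone names
multi.surbl.org and multi.uribl.com, that the caller passes the registered
domain taken from the URI, and that an answer in 127.0.0.0/8 means "listed".

```python
import socket

# Assumed zone names for the two URI blocklists mentioned above.
SURBL_ZONE = "multi.surbl.org"
URIBL_ZONE = "multi.uribl.com"


def uribl_query_name(domain, zone):
    """Build the DNS name to look up: '<domain>.<zone>'."""
    return f"{domain.strip().rstrip('.').lower()}.{zone}"


def is_listed(domain, zone):
    """Return True if the zone answers with a 127.x.x.x address.

    A failed lookup (typically NXDOMAIN) is treated as "not listed".
    Needs working DNS; the answer can be misleading if the list operator
    refuses queries from your resolver.
    """
    try:
        answer = socket.gethostbyname(uribl_query_name(domain, zone))
    except OSError:
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Only builds the query names; call is_listed() for a live lookup.
    for zone in (SURBL_ZONE, URIBL_ZONE):
        print(uribl_query_name("Example.COM", zone))
```

If a domain from a missed spam is not listed anywhere, the URIDNSBL rules
cannot add points for it, however well the plugin is configured.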

> b) and made sure that  RBL checks are performed, i.e. 
> skip_rbl_checks 0 

URIDNSBL checks are not affected by this (they are carried out even with
skip_rbl_checks 1).

>  
> c) made sure that the firewall allows connections from the mail   
> server to the WAN 

You may want to run a simple debug:

spamassassin -D --lint

and check whether URIDNSBL gets used and throws no errors. BTW: here's the
SpamAssassin documentation
http://spamassassin.apache.org/doc.html
and here's the wiki
http://wiki.apache.org/spamassassin/
which has everything the docs don't have.



Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com

URIDNSBL does not work

Posted by Christoph Reichenberger <ch...@ergonis.com>.

Hi,

it's me once again. After all your help my BAYES is now running fine.
Although I said I wanted to wait a couple of days, BAYES is running so
well now that I could not resist going further.
I have already received a couple of spams that got BAYES_99, but got
a total of less than 5. All these mails are so suspicious that I
assume that they should get additional points from URIDNSBL.

So I tried to activate it by doing the following:

a) uncommented the following line in init.pre (in fact it was even  
uncommented in the default installation);
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

b) and made sure that  RBL checks are performed, i.e.
skip_rbl_checks 0

c) made sure that the firewall allows connections from the mail  
server to the WAN

But no message gets tagged with URIDNSBL points.

Is there anything else I need to do in order to enable URIDNSBL?

P.S.  I promise - if I get this running, I won't bother you anymore  
(today) ;-).

Thanks

Christoph
-- 
Christoph Reichenberger - ergonis software gmbh

Re: How to train SpamAssassin to catch this kind of spam

Posted by Sander Holthaus <in...@orangexl.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Christoph Reichenberger wrote:
> Hi Sander,
>
> thanks a lot for your analysis. Since you sent your original message
> with CC to my
> address, I even got the detailed report. Thanks a lot! But, what can
> I learn from that.
> I see that you have  Razor enabled as well as a couple of RBL servers,
> what I have not.  But even without those points, it would reach a
> high score in
> your configuration.
>
> So the question is:  What is wrong in my configuration?
> Where should I look?
>
> Another question: I had spamcop based blocking enabled in my MTA
> (Communigate Pro) before I installed SpamAssassin, but disabled it
> temporary to make sure it would not conflict. Should I enable it again
> in Communigate Pro, or as part of SpamAssassin. If your answer is
> the 2nd choice, how do I do that?
>
> BTW, I have already disabled auto-learning for now.
> Should i delete the whole database and start from scratch?
>
> thanks a lot for your help.
>
> christoph
>
For me, Razor recognizes the most spam, followed by DCC and Pyzor
(I don't have up-to-date statistics, but it used to be ~80, 50 and 30%
recognition). IMHO SURBLs are an important part of SpamAssassin as they
have a low FP rate and high recognition (again, no up-to-date
statistics, but it used to be ~85%).

Perhaps you could post your configuration somewhere? Getting the
SpamAssassin configuration right takes some time and depends on the
traffic you receive. In my case, for example, I have a few accounts
that receive legitimate mail from Africa and Asia which at times has
poor English spelling that trips a lot of rules, hence I run with a
threshold of 10 (I will probably be increasing it to 11 in the near future,
depending on the results 3.1.2 gives me).

I personally use several RBLs on my MTA (virbl.dnsbl.bit.nl and
sbl-xbl.spamhaus.org) to make sure SpamAssassin and ClamAV
(virus scanner) don't get overloaded with mail from IPs that are
almost certain to be sending out junk (mind you, it may still
cause FPs), but again, your mileage may vary.  Because I already use
those two, I don't use Spamcop as an RBL in my MTA. In your case, you
can go either way (perhaps someone on this list has some stats
about Spamcop's performance?).

Last, yes, I would remove the database and start over. A poisoned
Bayes database will get you in trouble. Before enabling auto-learn,
make sure your SpamAssassin configuration performs well without Bayes.
I would also suggest keeping an eye on what is auto-learned for the first
few days / weeks (depending on the traffic you receive), though that may
or may not be possible due to the privacy of your customers.

Kind Regards,
Sander Holthaus

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)

iD8DBQFEhCdpVf373DysOTURAk0+AJ47VWc4+AlitvZdFEaItVplFsHnkgCgpKrP
krotmFjsnRbAsNGg7Aihyiw=
=Qb2H
-----END PGP SIGNATURE-----

Re: How to train SpamAssassin to catch this kind of spam

Posted by Christoph Reichenberger <ch...@ergonis.com>.

Hi Sander,

thanks a lot for your analysis. Since you sent your original message with
CC to my address, I even got the detailed report. Thanks a lot! But what
can I learn from that?
I see that you have Razor enabled as well as a couple of RBL servers,
which I do not.  But even without those points, it would reach a high
score in your configuration.

So the question is:  What is wrong in my configuration?
Where should I look?

Another question: I had spamcop-based blocking enabled in my MTA
(Communigate Pro) before I installed SpamAssassin, but disabled it
temporarily to make sure it would not conflict. Should I enable it again
in Communigate Pro, or as part of SpamAssassin? If your answer is
the 2nd choice, how do I do that?

BTW, I have already disabled auto-learning for now.
Should I delete the whole database and start from scratch?

thanks a lot for your help.

christoph

On 05.06.2006, at 13:55, Sander Holthaus wrote:
>>
>>
[...snip...]

>
> Auto-learning should only be enabled with extreme care because there
> is indeed a high probability that spam may be learned as ham and
> vice-versa.
>
> Kind Regards,
> Sander Holthaus
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (MingW32)
>
> iD8DBQFEhBupVf373DysOTURAiYZAKDPNpaGGoXK36AZIhoOjJfAxMOwhwCfYyDa
> rxVRvZ3iMUToa2W9rlJx9fk=
> =o61L
> -----END PGP SIGNATURE-----
>

-- 
Christoph Reichenberger - ergonis software gmbh

Re: How to train SpamAssassin to catch this kind of spam

Posted by Christoph Reichenberger <ch...@ergonis.com>.

Hi,

thank you so much for your prompt reply and for your offer to look into
this and help me.
I have saved the full message in a text file and put it at:
   http://www.ergonis.com/downloads/public/TheSpamMessage.txt

Also, I even saw in the header that it was auto-learned as HAM - so this
may be even worse, mightn't it?

Any help is highly appreciated.

Thank you

Christoph

On 05.06.2006, at 13:24, Anthony Peacock wrote:

> Hi,
>
> Sander Holthaus wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>  Christoph Reichenberger wrote:
>>> Hi,
>>>
>>> I am pretty new to SpamAssassin, so I apologize, if this has been
>>> posted in the past. I have SA integrated in Communigate Pro with
>>> CGPSA, and it has already started to filter out a lot of spam
>>> messages right out of the box. However, I am still a bit unsure
>>> about how to train it.
>>>
>>> I get a lot of spam messages like that, which SA does not recognize
>>>  as spam. (this one, e..g, got score 0.0)
[... snip ...]

>> Training Bayes to catch there kind of messages is difficult. Your  
>> best
>> bets are to use some rule-sets from SARE (www.rulesemporium.com) and
>> make sure you use several network tests (rbl's, surbl's, dcc, razor,
>> pyzor).
>
> Actually training Bayes for these can be very easy.
>
> To work out the reason that your system is not catching these we  
> would need to see the full email message including the original  
> headers.
>
> If you can save the full message as a text file and put it  
> somewhere on a web site, people here will be able to tell you  
> exactly which rules should catch the spam.
>
>
> -- 
> Anthony Peacock
> CHIME, Royal Free & University College Medical School
> WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
> "The problem with defending the purity of the English language is that
> English is about as pure as a cribhouse whore. We don't just borrow
> words; on occasion, English has pursued other languages down alleyways
> to beat them unconscious and rifle their pockets for new
> vocabulary."  -- James D. Nicoll

-- 
Christoph Reichenberger - ergonis software gmbh

Re: How to train SpamAssassin to catch this kind of spam

Posted by Anthony Peacock <a....@chime.ucl.ac.uk>.

Hi,

Sander Holthaus wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>  
> Christoph Reichenberger wrote:
>> Hi,
>>
>> I am pretty new to SpamAssassin, so I apologize, if this has been
>> posted in the past. I have SA integrated in Communigate Pro with
>> CGPSA, and it has already started to filter out a lot of spam
>> messages right out of the box. However, I am still a bit unsure
>> about how to train it.
>>
>> I get a lot of spam messages like that, which SA does not recognize
>>  as spam. (this one, e..g, got score 0.0)
>> ================================================
>>
>> Here I wanted to add the original mail, but when I did so, my
>> posting was rejected from the list server with 552 spam score
>> (19.7) exceeded threshold. So this is on the one hand good news,
>> since I now know, that SA can be trained to catch it, but how can I
>> tell you how this spam message looks like. I'll try to describe:
>> The first part contains 8 lines, holding the names of the pills
>> that help male human beings ... ;-)  - you know? Then follows a
>> line with " all 50 % off"  and an URL where to order, and  this is
>> followed by  a paragraph with pretty "normal" text. I try to keep
>> this text here, since I assume this should not trigger the score
>> anyhow.
>>
>> From the original mail:
>>
>> "If ever you are passing my way, said Bilbo, dont wait to knock!
>> Tea is at four; but any of you are welcome at any time! Then he
>> turned away. The elf-host was on the march;. and if it was sadly
>> lessened, yet many were glad, for now the northern world would be
>> merrier for many a long day. The dragon was dead, and the goblins
>> overthrown, and their"
>>
>> =================================================
>>
>> I am now wondering, whether there is even a chance for SA to catch
>> this, if I train this message, or whether training messages like
>> this is not even a good idea, because the "normal text" at the end
>> would confuse the bayesian corpus more than it would help.  As I
>> mentioned above, it seems that it can be trained, but how?  I
>> trained already a lot of messages like that with sa-train, but the
>> score is still 0.0.
>>
>> Thanks for any hint you can provide how to best deal with this kind
>>  of messages.
>>
>> thanks
>>
>> Christoph Reichenberger --Christoph Reichenberger - ergonis
>> software gmbh
>>
>>
>>
>>
> Training Bayes to catch there kind of messages is difficult. Your best
> bets are to use some rule-sets from SARE (www.rulesemporium.com) and
> make sure you use several network tests (rbl's, surbl's, dcc, razor,
> pyzor).

Actually training Bayes for these can be very easy.

To work out the reason that your system is not catching these we would 
need to see the full email message including the original headers.

If you can save the full message as a text file and put it somewhere on 
a web site, people here will be able to tell you exactly which rules 
should catch the spam.


-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new
vocabulary."  -- James D. Nicoll

Re: How to train SpamAssassin to catch this kind of spam

Posted by Sander Holthaus <in...@orangexl.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
Christoph Reichenberger wrote:
> Hi,
>
> I am pretty new to SpamAssassin, so I apologize, if this has been
> posted in the past. I have SA integrated in Communigate Pro with
> CGPSA, and it has already started to filter out a lot of spam
> messages right out of the box. However, I am still a bit unsure
> about how to train it.
>
> I get a lot of spam messages like that, which SA does not recognize
>  as spam. (this one, e..g, got score 0.0)
> ================================================
>
> Here I wanted to add the original mail, but when I did so, my
> posting was rejected from the list server with 552 spam score
> (19.7) exceeded threshold. So this is on the one hand good news,
> since I now know, that SA can be trained to catch it, but how can I
> tell you how this spam message looks like. I'll try to describe:
> The first part contains 8 lines, holding the names of the pills
> that help male human beings ... ;-)  - you know? Then follows a
> line with " all 50 % off"  and an URL where to order, and  this is
> followed by  a paragraph with pretty "normal" text. I try to keep
> this text here, since I assume this should not trigger the score
> anyhow.
>
> From the original mail:
>
> "If ever you are passing my way, said Bilbo, dont wait to knock!
> Tea is at four; but any of you are welcome at any time! Then he
> turned away. The elf-host was on the march;. and if it was sadly
> lessened, yet many were glad, for now the northern world would be
> merrier for many a long day. The dragon was dead, and the goblins
> overthrown, and their"
>
> =================================================
>
> I am now wondering, whether there is even a chance for SA to catch
> this, if I train this message, or whether training messages like
> this is not even a good idea, because the "normal text" at the end
> would confuse the bayesian corpus more than it would help.  As I
> mentioned above, it seems that it can be trained, but how?  I
> trained already a lot of messages like that with sa-train, but the
> score is still 0.0.
>
> Thanks for any hint you can provide how to best deal with this kind
>  of messages.
>
> thanks
>
> Christoph Reichenberger --Christoph Reichenberger - ergonis
> software gmbh
>
>
>
>
Training Bayes to catch this kind of message is difficult. Your best
bets are to use some rule sets from SARE (www.rulesemporium.com) and
make sure you use several network tests (RBLs, SURBLs, DCC, Razor,
Pyzor).

The SpamAssassin wiki and the mailing-list archives should point you
in the right direction.

Kind Regards,
Sander Holthaus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
 
iD8DBQFEhBNlVf373DysOTURAr8eAJwI8MrGQot5LQ6uKTzKmWE8EjfhrgCeMuGa
t8eO33atBeVIZQCOt+cTqmc=
=3B6h
-----END PGP SIGNATURE-----