Posted to users@spamassassin.apache.org by Mike Jackson <mj...@barking-dog.net> on 2005/08/15 21:33:31 UTC
test for multipart/alternative discrepancies?
I've been getting quite a few spams (which slipped past SA) in the last few
minutes with subject lines like "dies in McDonalds", so I looked at the
message source to see how they were scoring (which I've included below). In
all the cases, the HTML content (at least as displayed in Outlook Express)
was fairly consistent, but the plain text version looked like typical Bayes
poisoning text.
Would it be possible to craft a rule that roughly compares the text/plain
and HTML-stripped text/html versions of a message and scored against them if
the words they contained were significantly different? Or is that
technically infeasible?
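To make the idea concrete, here is a minimal sketch in Python (not a SpamAssassin rule, and not how any existing rule is implemented). It assumes the comparison is a simple Jaccard overlap of the word sets in the two parts; the helper names `words`, `part_text` and `alt_similarity` are made up for illustration.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words(text):
    """Return the lower-cased set of alphabetic words found in text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def part_text(part):
    """Return the decoded body of a MIME part as a str."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


def alt_similarity(raw_message):
    """Jaccard similarity of the text/plain and text/html word sets.

    Returns None when either part is missing or both are empty.
    """
    msg = message_from_string(raw_message)
    plain = html = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = words(part_text(part))
        elif ctype == "text/html" and html is None:
            extractor = _TextExtractor()
            extractor.feed(part_text(part))
            html = words(" ".join(extractor.chunks))
    if plain is None or html is None or not (plain | html):
        return None
    return len(plain & html) / len(plain | html)
```

A score near 1.0 means the two alternatives say roughly the same thing; a score near 0.0 is the pattern described above. Where to put the cutoff is an open question, since legitimate mailers also produce sloppy text/plain parts.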
Content-Type: text/plain;
Hello,
5. Kislovodsk: Literally `acid waters, a popular resort in t=
he =
`Thats wonderful! Koroviev yelled. Somewhat stunned by his =
chatter,that one could execute such a man. There had been no =
execution! Nocloser, youll see the details.midnight moon. A greenish =
kerchief of night-light fell from the window-sillup still more ... She =
greedily began gulping down caviar.up to the footboard of an A tram =
waiting at a stop, brazenly elbow aside a Here he applauded, but =
quite alone, while a confident smile played onthat might occur at the =
time of the execution in the city of Yershalaim, sospeaking, I had =
nothing more to do, and I lived from one meeting with her toPetrakovs. =
Placing his bulging briefcase on the table, Boba immediately =
putposts?[6]horizon. He did not rejoice in the staggeringly beautiful =
view which openedpaying or free, but even changes countenance at any =
theatrical conversation.what she was going to tell the neighbours the =
next day.phrase:
#########
Content-Type: text/html;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2800.1106" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial>A court has sentenced a man to life in jail for the
=
=
bombing of a McDonald's restaurant, which left three people =
dead.</FONT></DIV>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial>The man, Agung Abdul Hamid, was found guilty of =
financing
and co-ordinating the attack.</FONT></DIV>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial><A href=3D"http://www.ildhd.lastrez.com">Read full =
=
story.</A></FONT></DIV>
<DIV> </DIV></BODY></HTML>
Re: test for multipart/alternative discrepancies?
Posted by Matt Kettler <mk...@evi-inc.com>.
Mike Jackson wrote:
> I've been getting quite a few spams (which slipped past SA) in the last
> few minutes with subject lines like "dies in McDonalds", so I looked at
> the message source to see how they were scoring (which I've included
> below). In all the cases, the HTML content (at least as displayed in
> Outlook Express) was fairly consistent, but the plain text version
> looked like typical Bayes poisoning text.
>
Really, I'd be looking into why the messages got past SA. Did they get a decent
BAYES_ score? The bayes "poison" really shouldn't be a problem.
The use of chi-squared combining makes bayes poisoning pretty ineffective as
long as you're training your bayes often and training well.
And by "training well" I specifically mean you must train spam messages
containing "poison" as spam. If you're avoiding training "poison", then you
yourself are making that poison effective.
(Bayes can only be as accurate as its training. If it's not getting realistic
training, it won't do well with realistic mail.)
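As a rough illustration of the point, here is a sketch of Fisher-style chi-squared combining in Python, along the lines of the scheme popularized by SpamBayes. SpamAssassin's own code may differ in detail, and the per-token probabilities below are invented for the example, not taken from any real database.

```python
import math


def chi2q(x2, dof):
    """P(chi-squared with an even number of degrees of freedom >= x2)."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    ln_spam = sum(math.log(1.0 - p) for p in probs)
    ln_ham = sum(math.log(p) for p in probs)
    spamminess = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


# Hypothetical token probabilities, for illustration only.
spam_tokens = [0.99] * 10        # tokens the database already knows as spammy
poison_untrained = [0.05] * 10   # poison words that so far only appeared in ham
poison_trained = [0.80] * 10     # the same words after being trained as spam

score_poisoned = combine(spam_tokens + poison_untrained)
score_trained = combine(spam_tokens + poison_trained)
```

With strong evidence on both sides, the combined score lands near the middle rather than swinging to ham, so the poison at worst buys an "unsure" result. Once the poison words have been trained as spam, the same message scores close to 1.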
Re: test for multipart/alternative discrepancies?
Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Aug 15, 2005 at 07:04:36PM -0700, Loren Wilton wrote:
> I just want a rule that checks the text/plain part for zero uris and the
> html part for > 0 uris. That would catch 99+% of this trash without trying
> very hard.
FWIW, I put in a test rule for this:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
   21255    18255     3000  0.859   0.00   0.00  (all messages)
 100.000  85.8857  14.1143  0.859   0.00   0.00  (all messages as %)
  21.938  25.5327   0.0667  0.997   0.00   0.01  T_URI_HTML_ONLY
nice. :)
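For anyone reading the table: the T_URI_HTML_ONLY row is consistent with S/O being SPAM% / (SPAM% + HAM%). The Python sketch below reproduces that row. The hit counts (4661 spam, 2 ham) are not in the output above; they are inferred from the percentages and should be treated as an assumption.

```python
# Corpus sizes from the "(all messages)" row.
TOTAL_SPAM = 18255
TOTAL_HAM = 3000

# Assumed hit counts, inferred from the percentages (not given in the post).
SPAM_HITS = 4661
HAM_HITS = 2


def hit_stats(spam_hits, ham_hits, total_spam, total_ham):
    """Return (overall %, spam %, ham %, S/O) for one rule."""
    spam_pct = 100.0 * spam_hits / total_spam
    ham_pct = 100.0 * ham_hits / total_ham
    overall_pct = 100.0 * (spam_hits + ham_hits) / (total_spam + total_ham)
    denom = spam_pct + ham_pct
    s_o = spam_pct / denom if denom else 0.0
    return overall_pct, spam_pct, ham_pct, s_o


overall_pct, spam_pct, ham_pct, s_o = hit_stats(
    SPAM_HITS, HAM_HITS, TOTAL_SPAM, TOTAL_HAM
)
```

Because S/O is computed from the percentages rather than the raw counts, it is not skewed by the corpus holding about six times as much spam as ham.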
--
Randomly Generated Tagline:
There are no threads in a.b.p.erotica, so there's no gain in using a
threaded news reader.
(Unknown source)
Re: test for multipart/alternative discrepancies?
Posted by Loren Wilton <lw...@earthlink.net>.
> Would it be possible to craft a rule that roughly compares the text/plain
> and HTML-stripped text/html versions of a message and scored against them
> if the words they contained were significantly different? Or is that
> technically infeasible?
I just want a rule that checks the text/plain part for zero uris and the
html part for > 0 uris. That would catch 99+% of this trash without trying
very hard.
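Roughly this, sketched in Python rather than as a real SpamAssassin rule. The URI regex is deliberately crude (http/https only), and the name `uri_html_only` is made up for the example:

```python
import re
from email import message_from_string

# Crude URI matcher for illustration; real URI extraction is more involved.
URI_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def part_text(part):
    """Return the decoded body of a MIME part as a str."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


def uri_html_only(raw_message):
    """True if the message has both parts and only the HTML one has URIs."""
    msg = message_from_string(raw_message)
    plain_uris = html_uris = 0
    saw_plain = saw_html = False
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            saw_plain = True
            plain_uris += len(URI_RE.findall(part_text(part)))
        elif ctype == "text/html":
            saw_html = True
            html_uris += len(URI_RE.findall(part_text(part)))
    return saw_plain and saw_html and plain_uris == 0 and html_uris > 0
```

Messages that have only one of the two parts never match, so plain HTML-only mail is left to other rules.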
Loren
Re: test for multipart/alternative discrepancies?
Posted by Mike Jackson <mj...@barking-dog.net>.
> On Mon, Aug 15, 2005 at 12:33:31PM -0700, Mike Jackson wrote:
> > Would it be possible to craft a rule that roughly compares the text/plain
> > and HTML-stripped text/html versions of a message and scored against them
> > if the words they contained were significantly different? Or is that
> > technically infeasible?
>
> You mean MPART_ALT_DIFF ? ;)
Well blow me down :) Strange that I didn't see that rule hit on this
message though.
Re: test for multipart/alternative discrepancies?
Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Aug 15, 2005 at 12:33:31PM -0700, Mike Jackson wrote:
> Would it be possible to craft a rule that roughly compares the text/plain
> and HTML-stripped text/html versions of a message and scored against them
> if the words they contained were significantly different? Or is that
> technically infeasible?
You mean MPART_ALT_DIFF ? ;)
--
Randomly Generated Tagline:
"You're not significant until someone complains about you publically."
- Theo Van Dinter
RE: test for multipart/alternative discrepancies?
Posted by Herb Martin <He...@learnquick.com>.
> -----Original Message-----
> From: Mike Jackson [mailto:mjackson@barking-dog.net]
> Sent: Monday, August 15, 2005 2:34 PM
> To: users@spamassassin.apache.org
> Subject: test for multipart/alternative discrepancies?
>
> I've been getting quite a few spams (which slipped past SA)
> in the last few minutes with subject lines like "dies in
> McDonalds", so I looked at the message source to see how they
> were scoring (which I've included below). In all the cases,
> the HTML content (at least as displayed in Outlook Express)
> was fairly consistent, but the plain text version looked like
> typical Bayes poisoning text.
>
> Would it be possible to craft a rule that roughly compares
> the text/plain and HTML-stripped text/html versions of a
> message and scored against them if the words they contained
> were significantly different? Or is that technically infeasible?
Found one in my trap -- SpamAssassin (3.10rc1) with lots of SARE
and many network tests scored it at 29.2.
Bayes only scored it at 50%, which was good for only 0.9 points.
Content analysis details:   (29.2 points, 6.0 required)
 pts rule name                 description
---- ------------------------- --------------------------------------------------
 1.1 SPF_FAIL                  SPF: sender does not match SPF record (fail)
                               [SPF failed: Please see
                               http://spf.pobox.com/why.html?sender=tiffiny%40karta.com%3E%0Atiffiny%40karta.com&ip=58.51.205.72&receiver=www.LearnQuick.Com]
 3.5 SPF_HELO_FAIL             SPF: HELO does not match SPF record (fail)
                               [SPF failed: Please see
                               http://spf.pobox.com/why.html?sender=karta.com&ip=58.51.205.72&receiver=www.LearnQuick.Com]
 0.7 MPART_ALT_DIFF_COUNT      BODY: HTML and text parts are different
 1.0 HTML_MESSAGE              BODY: HTML included in message
 0.9 BAYES_50                  BODY: Bayesian spam probability is 40 to 60%
                               [score: 0.4999]
 0.7 Y_SILLY_SALUTATION        RAW: Foobar,+ salutation
 1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level above 50%
                               [cf: 100]
 0.5 RAZOR2_CHECK              Listed in Razor2 (http://razor.sf.net/)
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level above 50%
                               [cf: 100]
 2.0 RAZOR2_CF_RANGE_51_100    Razor2 gives confidence level above 50%
                               [cf: 100]
 3.7 PYZOR_CHECK               Listed in Pyzor (http://pyzor.sf.net/)
 1.5 NO_DNS_FOR_FROM           DNS: Envelope sender has no MX or A DNS records
 1.6 URIBL_SBL                 Contains an URL listed in the SBL blocklist
                               [URIs: lastrez.com]
 2.5 URIBL_BLACK               Contains an URL listed in the URIBL blacklist
                               [URIs: lastrez.com]
 4.5 URIBL_SC2_SURBL           Has URI in SC2 at http://www.surbl.org/lists.html
                               [URIs: lastrez.com]
 1.0 DIGEST_MULTIPLE           Message hits more than one network digest check
 0.9 FM_NO_STYLE               FM_NO_STYLE
Subject: ***** SPAM *****_29.2 McDonÂld's bomber jailed
--
Herb Martin