Posted to users@spamassassin.apache.org by Martin Gregorie <ma...@gregorie.org> on 2013/07/01 00:31:28 UTC
Re: LONGWORDS not hitting?
On Sun, 2013-06-30 at 20:44 +0100, RW wrote:
> On Sun, 30 Jun 2013 12:42:53 -0600
> Amir 'CG' Caspi wrote:
>
> > Hi all,
> >
> > Just got this spam:
> >
> > http://pastebin.com/KM5paaZ9
> >
>
> > (And yes, I know it only hit BAYES_50... I really think these
> > gibberish strings are confusing Bayes.
>
> I don't think Bayes tokenizes html. When I displayed it in claws mail
> (with the dillo plugin) I just saw 4 links. Bayes is just seeing the
> displayed texts from those links and some tokens from the URIs.
>
Yes. All the textual garbage is in two HTML comments, i.e. between
"<!--" and "-->", so it's quite possible that SA's HTML converter
skips it, because the recipient would never see it.
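To illustrate why comment text would not reach Bayes, here is a minimal sketch using Python's standard-library html.parser. It is not SpamAssassin's own converter (SA is written in Perl); it only shows the general idea that text inside <!-- ... --> is separated from the rendered text. The sample message and the function name visible_text are made up for the example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect rendered text nodes; comment bodies are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Text between tags, which a mail client would display.
        if data.strip():
            self.chunks.append(data.strip())

    def handle_comment(self, data):
        # Text inside <!-- ... --> is never rendered, so ignore it.
        pass


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical message: gibberish hidden in a comment, plus one link.
sample = ("<html><body><!-- xqzv plorf gribble -->"
          "<a href='http://example.com/'>Click here</a></body></html>")
print(visible_text(sample))  # prints: Click here
```

Only the link text survives; the gibberish inside the comment never appears in the extracted text.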
However, its HTML is unbalanced: there are two <body> tags but only one
</body> in the message, so maybe that's why the HTML_TAG_BALANCE_BODY rule fired?
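The imbalance can be checked by counting opening and closing tags. The sketch below is a simplified illustration in Python, again using html.parser. It is not the actual implementation of SpamAssassin's HTML_TAG_BALANCE_BODY rule, and the helper name tag_balance is hypothetical. A nonzero result means the opening and closing tags do not pair up.

```python
from collections import Counter
from html.parser import HTMLParser


class TagBalanceCounter(HTMLParser):
    """Count opening and closing tags by (lowercased) tag name."""

    def __init__(self):
        super().__init__()
        self.opens = Counter()
        self.closes = Counter()

    def handle_starttag(self, tag, attrs):
        self.opens[tag] += 1

    def handle_endtag(self, tag):
        self.closes[tag] += 1


def tag_balance(html: str, tag: str) -> int:
    # Opening-tag count minus closing-tag count; 0 means balanced.
    counter = TagBalanceCounter()
    counter.feed(html)
    counter.close()
    return counter.opens[tag] - counter.closes[tag]


# Hypothetical message with two <body> tags and one </body>.
sample = "<html><body><p>one</p><body><p>two</p></body></html>"
print(tag_balance(sample, "body"))  # prints: 1
```

Because html.parser routes comments to handle_comment, any tags hidden inside comments are not counted here.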
Martin