Posted to users@spamassassin.apache.org by Martin Gregorie <ma...@gregorie.org> on 2013/07/01 00:31:28 UTC

Re: LONGWORDS not hitting?

On Sun, 2013-06-30 at 20:44 +0100, RW wrote:
> On Sun, 30 Jun 2013 12:42:53 -0600
> Amir 'CG' Caspi wrote:
> 
> > Hi all,
> > 
> > 	Just got this spam:
> > 
> > http://pastebin.com/KM5paaZ9
> > 
> 
> > (And yes, I know it only hit BAYES_50... I really think these 
> > gibberish strings are confusing Bayes.  
> 
> I don't think Bayes tokenizes html. When I displayed it in claws mail
> (with the dillo plugin) I just saw 4 links. Bayes is just seeing the
> displayed texts from those links and some tokens from the URIs.
> 
Yes. All the textual garbage is in two HTML comments, i.e. between
"<!--" and "-->", so it's quite possible that SA's HTML converter would
skip it because the recipient wouldn't see it.
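
To illustrate the idea (this is only a rough sketch in Python using the
standard html.parser module, not SA's actual converter, which I haven't
checked), a renderer that keeps displayed text and drops comments would
behave roughly like this. The class name, function name and sample
markup are made up for the example:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: separates displayed text from HTML comments."""

    def __init__(self):
        super().__init__()
        self.visible = []   # text a recipient would actually see
        self.comments = []  # text hidden inside <!-- ... -->

    def handle_data(self, data):
        # Called for text between tags; ignore whitespace-only runs.
        if data.strip():
            self.visible.append(data.strip())

    def handle_comment(self, data):
        # Called with the body of each <!-- ... --> comment.
        self.comments.append(data.strip())


def split_visible_and_hidden(html):
    """Return (visible_text_chunks, comment_chunks) for an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.comments


# Made-up sample: gibberish in a comment, one visible link text.
sample = ('<body><!-- qwzx gibberish filler -->'
          '<a href="http://example.com/">Click here</a></body>')
visible, hidden = split_visible_and_hidden(sample)
```

Here only "Click here" ends up in the visible list, and a tokenizer fed
from that list would never see the filler.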

However, its HTML is unbalanced: there are two <body> tags and only one
</body> in the message, so maybe that's why the HTML_TAG_BALANCE_BODY
rule fired?
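
For anyone wanting to check a message for this by hand, here is a
minimal Python sketch that counts opening and closing <body> tags. It
assumes nothing about how HTML_TAG_BALANCE_BODY is really implemented;
the class and function names are hypothetical and the sample markup is
invented, not taken from the pastebin:

```python
from html.parser import HTMLParser


class TagBalanceCounter(HTMLParser):
    """Hypothetical helper: counts open and close occurrences of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this.
        if tag == self.tag:
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.closed += 1


def body_tag_counts(html):
    """Return (number of <body> tags, number of </body> tags)."""
    counter = TagBalanceCounter("body")
    counter.feed(html)
    counter.close()
    return counter.opened, counter.closed


# Invented sample with two <body> tags and a single </body>.
sample = "<html><body><p>one</p><body><p>two</p></body></html>"
opened, closed = body_tag_counts(sample)
unbalanced = opened != closed
```
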


Martin