Posted to dev@spamassassin.apache.org by Philip Prindeville <ph...@redfish-solutions.com> on 2016/11/01 19:47:30 UTC

Re: Poor performance for rule based on 8-bit chars in supposedly text/plain, 7bit message

> On Oct 31, 2016, at 3:09 PM, RW <rw...@googlemail.com> wrote:
> 
> On Mon, 31 Oct 2016 12:28:27 -0600
> Philip Prindeville wrote:
> 
> 
>>> PP_MIME_FAKE_ASCII_TEXT:  bad, avg S/O=0.62 avg Spam%=0.64 avg
>>> Ham%=0.36  
> 
>> I’m going back through the performance of this rule and I have to say
>> I’m disappointed that it performed so poorly on the general corpus.
>> 
>> It was helping locally, but then I generally get English-only text
>> (and occasionally some French-language text), which most MUAs
>> encode correctly.
> 
> My guess is that the FPs are from emails that are generated by scripts
> or bespoke software in much the same way as spam is created. This is a
> common problem with rules based on standards violations.
> 
> One thing that you might do is look for hits on emails that pretend to
> be from an ordinary MUA that would have got it right.
> 

Good point.  Though at least one email I received that triggered this rule carried:

X-Mailer: pMachine/PHP

so there wasn’t any pretending going on there.
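
To make the idea concrete, here is a rough Python sketch of the combined check: 8-bit bytes in a part declared text/plain with a 7bit transfer encoding, and an X-Mailer that claims a mainstream MUA. This is not the actual PP_MIME_FAKE_ASCII_TEXT definition or anything SpamAssassin ships; the function names and the MAINSTREAM_MUA_HINTS list are made up for illustration, and a real version would need a vetted list of MUA strings.

```python
import email

# Hypothetical X-Mailer fragments for "ordinary" MUAs (illustrative only).
MAINSTREAM_MUA_HINTS = ("apple mail", "microsoft outlook", "thunderbird")


def has_fake_ascii_text(msg):
    """True if a text/plain part declared (or defaulting to) 7bit has 8-bit bytes."""
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() != "text/plain":
            continue
        # A missing Content-Transfer-Encoding header defaults to 7bit.
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if cte != "7bit":
            continue
        # For 7bit parts, decode=True returns the raw body bytes unchanged.
        body = part.get_payload(decode=True) or b""
        if any(byte >= 0x80 for byte in body):
            return True
    return False


def claims_mainstream_mua(msg):
    """True if X-Mailer contains one of the (assumed) mainstream MUA hints."""
    mailer = str(msg.get("X-Mailer", "")).lower()
    return any(hint in mailer for hint in MAINSTREAM_MUA_HINTS)


def looks_suspicious(raw_bytes):
    """Combine both checks: the violation counts only if an ordinary MUA is claimed."""
    msg = email.message_from_bytes(raw_bytes)
    return has_fake_ascii_text(msg) and claims_mainstream_mua(msg)
```

With this, the pMachine/PHP message above would still trip the 8-bit check but would not be flagged by the combined test, since it does not claim to be an ordinary MUA.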

-Philip