Posted to users@spamassassin.apache.org by Eric Shubert <ej...@shubes.net> on 2014/10/04 18:17:33 UTC

Re: random low contrast text with bayes [Solved]

On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:
>>> On Sun, 31 Aug 2014, Eric Shubert wrote:
>>>> I've seen an uptick of spam lately with random low contrast (hidden)
>>>> text. This appears to be lowering bayes probabilities.
>
>> On 08/31/2014 10:26 PM, John Hardin wrote:
>>> Learn them as spam. That will tend to eliminate that effect.
>
> On 31.08.14 22:54, Eric Shubert wrote:
>> Been doing that (learning them) for quite a while. I've had that
>> mechanism set up for several years now, and it's working fairly well
>> (after I adjusted the scoring upwards for bayes rules).
>>
>> It appears to me that the hidden text is being randomly generated.
>> Even saw a random function of some sort in there. I presume it's been
>> designed to 'poison' bayes by virtue of the random text (and a sizable
>> amount of it).
>
> Note that even the code for low-contrast HTML may be caught as spam...
>
> Bayes poisoning has been considered a myth. With good training, and using
> hapaxes (enabled by default), it can even help detect the spam.
>

John Hardin was instrumental in helping me identify the problem. The 
rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to 
3.4.0, which appears to have fixed the problem.
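
For anyone wanting to check the same thing on their own mail, here is a rough sketch (not what was actually used in this thread) that lists which rules fired on an already-scanned message by reading its X-Spam-Status header. It assumes the default header layout with a "tests=" field. The rule name HTML_FONT_LOW_CONTRAST is an assumption as well, since the thread never names the rule, and fired_rules is just an illustrative helper name.

```python
import re
from email import message_from_string


def fired_rules(raw_message):
    """Return the set of rule names in the X-Spam-Status 'tests=' field.

    Assumes the default SpamAssassin header layout, e.g.
    'Yes, score=7.2 required=5.0 tests=A,B,C autolearn=no version=3.4.0'.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # A long tests= list is folded across lines after a comma; drop the
    # whitespace after commas so the list reads as one token.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S+)", status)
    if match is None or match.group(1) == "none":
        return set()
    return {name for name in match.group(1).split(",") if name}


# Hypothetical sample message; the rule names are only for illustration.
sample = (
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,\n"
    "\tHTML_FONT_LOW_CONTRAST,HTML_MESSAGE autolearn=no version=3.4.0\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

rules = fired_rules(sample)
print("low contrast rule fired:", "HTML_FONT_LOW_CONTRAST" in rules)
```

If the rule never shows up on messages that clearly contain hidden text, that points to the same symptom described above.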

Many thanks John!

P.S. I did have to apply a patch to 3.4.0 in order for spamd to function 
properly. Sorry I neglected to note the bug number (searching closed 
bugs throws an error at this time). The patch can be found here:
https://github.com/QMailToaster/spamassassin/blob/master/v340-util.patch

-- 
-Eric 'shubes'