Posted to users@spamassassin.apache.org by Eric Shubert <ej...@shubes.net> on 2014/09/01 04:48:40 UTC

random low contrast text with bayes

I've seen an uptick of spam lately with random low contrast (hidden) 
text. This appears to be lowering bayes probabilities.

I'd like to strip low contrast text from messages before they're learned 
by sa-learn in order to combat this.

1) Does anyone have some guidance for building such a filter?

2) Is there perhaps a better way of dealing with this type of spam?

Thanks.

-- 
-Eric 'shubes'
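
For readers wondering what such a filter might look like, here is a minimal
sketch in Python. It is not SpamAssassin's own low-contrast detection and is
not taken from this thread. It assumes colors are set only through the
color/bgcolor attributes or inline style declarations, that the page defaults
to black text on a white background, and that a WCAG contrast ratio below 1.5
counts as "hidden". The names visible_text, LowContrastStripper and min_ratio
are hypothetical.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "area", "base", "col", "wbr"}
NAMED = {"white": (255, 255, 255), "black": (0, 0, 0)}  # deliberately tiny table


def parse_color(value):
    """Return (r, g, b) for 'rrggbb', '#rrggbb', '#rgb' or a known name, else None."""
    value = value.strip().lower()
    if value in NAMED:
        return NAMED[value]
    m = re.fullmatch(r"#?([0-9a-f]{6})", value)
    if m:
        return tuple(int(m.group(1)[i:i + 2], 16) for i in (0, 2, 4))
    m = re.fullmatch(r"#([0-9a-f]{3})", value)
    if m:
        return tuple(int(c * 2, 16) for c in m.group(1))
    return None


def luminance(rgb):
    """Relative luminance as defined by WCAG 2.0."""
    def chan(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (chan(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


class LowContrastStripper(HTMLParser):
    def __init__(self, min_ratio=1.5):  # threshold is an assumption, tune it
        super().__init__(convert_charrefs=True)
        self.min_ratio = min_ratio
        # Stack of (tag, foreground, background); the root entry is the page default.
        self.stack = [("", (0, 0, 0), (255, 255, 255))]
        self.kept = []

    def handle_starttag(self, tag, attrs):
        _, fg, bg = self.stack[-1]  # inherit from the enclosing element
        attrs = dict(attrs)
        fg = parse_color(attrs.get("color") or "") or fg
        bg = parse_color(attrs.get("bgcolor") or "") or bg
        style = (attrs.get("style") or "").lower()
        for prop, val in re.findall(r"([\w-]+)\s*:\s*([^;]+)", style):
            if prop == "color":
                fg = parse_color(val) or fg
            elif prop in ("background", "background-color"):
                bg = parse_color(val) or bg
        if tag not in VOID_TAGS:
            self.stack.append((tag, fg, bg))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        tag, fg, bg = self.stack[-1]
        if tag in ("script", "style"):
            return
        if contrast_ratio(fg, bg) >= self.min_ratio:
            self.kept.append(data)


def visible_text(html, min_ratio=1.5):
    """Return the text of an HTML part with low-contrast runs removed."""
    parser = LowContrastStripper(min_ratio)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.kept).split())


sample = ('<html><body bgcolor="#ffffff"><p>Buy now</p>'
          '<font color="#fefefe">qzx random filler</font>'
          '<span style="color:#FFF; background-color: white">more filler</span>'
          '<p>Click here</p></body></html>')
print(visible_text(sample))
```

The sketch only handles a single HTML body. Extracting that body from a MIME
message, and putting the cleaned text back into something sa-learn accepts,
is left out. Stylesheets, rgb() values and most named colors are ignored, so
text hidden by those means would pass through untouched.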


Re: random low contrast text with bayes [Solved]

Posted by Eric Shubert <ej...@shubes.net>.
On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:
>>> On Sun, 31 Aug 2014, Eric Shubert wrote:
>>>> I've seen an uptick of spam lately with random low contrast (hidden)
>>>> text. This appears to be lowering bayes probabilities.
>
>> On 08/31/2014 10:26 PM, John Hardin wrote:
>>> Learn them as spam. That will tend to eliminate that effect.
>
> On 31.08.14 22:54, Eric Shubert wrote:
>> Been doing that (learning them) for quite a while. I've had that
>> mechanism set up for several years now, and it's working fairly well
>> (after I adjusted the scoring upwards for bayes rules).
>>
>> It appears to me that the hidden text is being randomly generated.
>> Even saw a random function of some sort in there. I presume it's been
>> designed to 'poison' bayes by virtue of the random text (and a sizable
>> amount of it).
>
> note that even the code for low-contrast HTML may be caught as spam...
>
> bayes poisoning has been considered a myth. With good training, and using
> hapaxes (enabled by default) it can even help detect the spam.
>

John Hardin was instrumental in helping me identify the problem. The 
rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to 
3.4.0, which appears to have fixed the problem.

Many thanks John!

P.S. I did have to apply a patch to 3.4.0 in order for spamd to function 
properly. Sorry I neglected to note the bug number (searching closed 
bugs throws an error at this time). The patch can be found here:
https://github.com/QMailToaster/spamassassin/blob/master/v340-util.patch

-- 
-Eric 'shubes'


Re: random low contrast text with bayes

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On Sun, 31 Aug 2014, Eric Shubert wrote:
>>>I've seen an uptick of spam lately with random low contrast (hidden)
>>>text. This appears to be lowering bayes probabilities.

>On 08/31/2014 10:26 PM, John Hardin wrote:
>>Learn them as spam. That will tend to eliminate that effect.

On 31.08.14 22:54, Eric Shubert wrote:
>Been doing that (learning them) for quite a while. I've had that 
>mechanism set up for several years now, and it's working fairly well 
>(after I adjusted the scoring upwards for bayes rules).
>
>It appears to me that the hidden text is being randomly generated. 
>Even saw a random function of some sort in there. I presume it's been 
>designed to 'poison' bayes by virtue of the random text (and a sizable 
>amount of it).

note that even the code for low-contrast HTML may be caught as spam...

bayes poisoning has been considered a myth. With good training, and using
hapaxes (enabled by default) it can even help detect the spam.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot. 
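
The point about hapaxes (tokens seen only once) can be illustrated with a toy
calculation. This is a sketch of Robinson-style token smoothing, not
SpamAssassin's actual code or constants. The function name, the strength
s=1.0 and the prior x=0.5 are assumptions chosen to keep the arithmetic easy
to follow.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token.

    spam_hits / ham_hits: learned spam / ham messages containing the token.
    n_spam / n_ham: total learned spam / ham messages (both assumed > 0).
    s, x: assumed prior strength and prior probability for unseen tokens.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x  # never seen: fall back to the neutral prior
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    p = spam_freq / (spam_freq + ham_freq)
    return (s * x + n * p) / (s + n)


# A random filler token learned from a single spam and never seen in ham.
print(token_spam_probability(1, 0, n_spam=1000, n_ham=1000))
# The same kind of token before any message containing it was learned.
print(token_spam_probability(0, 0, n_spam=1000, n_ham=1000))
```

Under these assumptions, a random token that has been learned once as spam
scores above the neutral prior, so it counts against a later message that
reuses it. Random text that is never reused stays at the prior and neither
helps nor hurts.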

Re: random low contrast text with bayes

Posted by Eric Shubert <ej...@shubes.net>.
On 08/31/2014 10:26 PM, John Hardin wrote:
> On Sun, 31 Aug 2014, Eric Shubert wrote:
>
>> I've seen an uptick of spam lately with random low contrast (hidden)
>> text. This appears to be lowering bayes probabilities.
>
> Learn them as spam. That will tend to eliminate that effect.
>

Been doing that (learning them) for quite a while. I've had that 
mechanism set up for several years now, and it's working fairly well 
(after I adjusted the scoring upwards for bayes rules).

It appears to me that the hidden text is being randomly generated. Even 
saw a random function of some sort in there. I presume it's been 
designed to 'poison' bayes by virtue of the random text (and a sizable 
amount of it).

Thanks.
-- 
-Eric 'shubes'


Re: random low contrast text with bayes

Posted by John Hardin <jh...@impsec.org>.
On Sun, 31 Aug 2014, Eric Shubert wrote:

> I've seen an uptick of spam lately with random low contrast (hidden) text. 
> This appears to be lowering bayes probabilities.

Learn them as spam. That will tend to eliminate that effect.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It is criminal to teach a man not to defend himself when he is the
   constant victim of brutal attacks.              -- Malcolm X (1964)
-----------------------------------------------------------------------
  822 days since the first successful private support mission to ISS (SpaceX)