Posted to users@spamassassin.apache.org by Eric Shubert <ej...@shubes.net> on 2014/09/01 04:48:40 UTC
random low contrast text with bayes
I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.
I'd like to strip low contrast text from messages before they're learned
by sa-learn in order to combat this.
1) Does anyone have some guidance for building such a filter?
2) Is there perhaps a better way of dealing with this type of spam?
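For question 1, one possible starting point is a small HTML pass that drops text whose color is too close to its background before the message is handed to sa-learn. The sketch below is a hypothetical helper and is not part of SpamAssassin. It assumes several things:

- Colors come only from inline `style`, `color` and `bgcolor` attributes in #rgb/#rrggbb form. CSS classes, named colors and rgb() are ignored.
- The default is black text on a white background.
- The WCAG 2.x contrast-ratio formula is used with an arbitrary threshold (`MIN_CONTRAST = 1.5`). This may differ from whatever SpamAssassin's own low-contrast rule computes.

```python
import re
from html.parser import HTMLParser

# Hypothetical threshold: text whose contrast ratio against its background
# falls below this is treated as hidden. 1.5 is an assumption, not an SA setting.
MIN_CONTRAST = 1.5

HEX_RE = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def parse_hex(value):
    """Return an (r, g, b) tuple for a #rgb or #rrggbb color, else None."""
    m = HEX_RE.search(value or "")
    if not m:
        return None
    h = m.group(1)
    if len(h) == 3:
        h = "".join(c * 2 for c in h)
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def luminance(rgb):
    """Relative luminance as defined by WCAG 2.x."""
    def channel(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def style_colors(attrs):
    """Extract (fg, bg) from a tag's color/bgcolor/style attributes; None if unset."""
    d = dict(attrs)
    fg = parse_hex(d.get("color"))
    bg = parse_hex(d.get("bgcolor"))
    for decl in (d.get("style") or "").split(";"):
        name, _, value = decl.partition(":")
        name = name.strip().lower()
        if name == "color":
            fg = parse_hex(value) or fg
        elif name in ("background", "background-color"):
            bg = parse_hex(value) or bg
    return fg, bg


class LowContrastStripper(HTMLParser):
    """Collects text, skipping text whose color is too close to its background."""

    VOID = {"br", "img", "hr", "meta", "input", "link", "wbr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Stack of (tag, fg, bg); the root assumes black text on white.
        self.stack = [(None, (0, 0, 0), (255, 255, 255))]
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return  # void elements never get a closing tag, so don't push them
        _, fg, bg = self.stack[-1]
        new_fg, new_bg = style_colors(attrs)
        # Unset colors are inherited from the enclosing element.
        self.stack.append((tag, new_fg or fg, new_bg or bg))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        tag, fg, bg = self.stack[-1]
        if tag in ("script", "style"):
            return  # not rendered text
        if contrast(fg, bg) >= MIN_CONTRAST:
            self.kept.append(data)


def strip_low_contrast(html):
    """Return the visible text of an HTML part with whitespace normalized."""
    parser = LowContrastStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.kept).split())
```

For example, with a white-background `<div>` containing a `<span style="color:#fefefe">`, the span's words are dropped and the rest are kept. The function only returns the text judged visible. Rebuilding a message around it for sa-learn is left out of this sketch.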
Thanks.
--
-Eric 'shubes'
Re: random low contrast text with bayes [Solved]
Posted by Eric Shubert <ej...@shubes.net>.
On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:
>>> On Sun, 31 Aug 2014, Eric Shubert wrote:
>>>> I've seen an uptick of spam lately with random low contrast (hidden)
>>>> text. This appears to be lowering bayes probabilities.
>
>> On 08/31/2014 10:26 PM, John Hardin wrote:
>>> Learn them as spam. That will tend to eliminate that effect.
>
> On 31.08.14 22:54, Eric Shubert wrote:
>> Been doing that (learning them) for quite a while. I've had that
>> mechanism set up for several years now, and it's working fairly well
>> (after I adjusted the scoring upwards for bayes rules).
>>
>> It appears to me that the hidden text is being randomly generated.
>> Even saw a random function of some sort in there. I presume it's been
>> designed to 'poison' bayes by virtue of the random text (and a sizable
>> amount of it).
>
> Note that even the code for low-contrast HTML may be caught as spam...
>
> Bayes poisoning has been considered a myth. With good training, and using
> hapaxes (enabled by default), it can even help in detecting spam.
>
John Hardin was instrumental in helping me identify the problem. The
rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to
3.4.0, which appears to have fixed the problem.
Many thanks John!
P.S. I did have to apply a patch to 3.4.0 in order for spamd to function
properly. Sorry, I neglected to note the bug number (searching closed
bugs throws an error at this time). The patch can be found here:
https://github.com/QMailToaster/spamassassin/blob/master/v340-util.patch
--
-Eric 'shubes'
Re: random low contrast text with bayes
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On Sun, 31 Aug 2014, Eric Shubert wrote:
>>>I've seen an uptick of spam lately with random low contrast (hidden)
>>>text. This appears to be lowering bayes probabilities.
>On 08/31/2014 10:26 PM, John Hardin wrote:
>>Learn them as spam. That will tend to eliminate that effect.
On 31.08.14 22:54, Eric Shubert wrote:
>Been doing that (learning them) for quite a while. I've had that
>mechanism set up for several years now, and it's working fairly well
>(after I adjusted the scoring upwards for bayes rules).
>
>It appears to me that the hidden text is being randomly generated.
>Even saw a random function of some sort in there. I presume it's been
>designed to 'poison' bayes by virtue of the random text (and a sizable
>amount of it).
Note that even the code for low-contrast HTML may be caught as spam...
Bayes poisoning has been considered a myth. With good training, and using
hapaxes (enabled by default), it can even help in detecting spam.
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
42.7 percent of all statistics are made up on the spot.
Re: random low contrast text with bayes
Posted by Eric Shubert <ej...@shubes.net>.
On 08/31/2014 10:26 PM, John Hardin wrote:
> On Sun, 31 Aug 2014, Eric Shubert wrote:
>
>> I've seen an uptick of spam lately with random low contrast (hidden)
>> text. This appears to be lowering bayes probabilities.
>
> Learn them as spam. That will tend to eliminate that effect.
>
Been doing that (learning them) for quite a while. I've had that
mechanism set up for several years now, and it's working fairly well
(after I adjusted the scoring upwards for bayes rules).
It appears to me that the hidden text is being randomly generated. Even
saw a random function of some sort in there. I presume it's been
designed to 'poison' bayes by virtue of the random text (and a sizable
amount of it).
Thanks.
--
-Eric 'shubes'
Re: random low contrast text with bayes
Posted by John Hardin <jh...@impsec.org>.
On Sun, 31 Aug 2014, Eric Shubert wrote:
> I've seen an uptick of spam lately with random low contrast (hidden) text.
> This appears to be lowering bayes probabilities.
Learn them as spam. That will tend to eliminate that effect.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
It is criminal to teach a man not to defend himself when he is the
constant victim of brutal attacks. -- Malcolm X (1964)
-----------------------------------------------------------------------
822 days since the first successful private support mission to ISS (SpaceX)