Posted to users@spamassassin.apache.org by Alex Regan <my...@gmail.com> on 2015/05/19 17:11:34 UTC

SEO Spam

Hi,

I'm wondering if anyone is interested in helping to develop a set of 
rules to catch SEO spam? Here's one such example:

http://pastebin.com/S6Jeappj

It's those emails that talk about how they can improve your SEO such as:

..."diverse projects consisting of SEO, PPC, SMM, Affiliate
Marketing, Google Adsense, Blogging, Copy writing, Web Analytic, Local
Search Marketing, Lead Generation, Inbound Marketing, Screen Casting etc"

I've created a series of test rules like:

body	__SEOSPAM1  /Affiliate marketing/i

and meta'd them together, requiring at least three to hit, but it's 
tough keeping up with them all and it's often not enough based just on 
those types of keywords.
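
A minimal sketch of that kind of meta, for anyone following along. Only the
first pattern is one of the actual test rules; the other subrules, the
SEOSPAM_MULTI name, and the score are hypothetical placeholders:

```
# Keyword subrules. The double-underscore prefix means they carry no
# score of their own. Only __SEOSPAM1 comes from the real rule set;
# the rest are illustrative.
body      __SEOSPAM1     /Affiliate marketing/i
body      __SEOSPAM2     /\bLead generation\b/i
body      __SEOSPAM3     /\bInbound marketing\b/i
body      __SEOSPAM4     /\bLocal search marketing\b/i

# Each subrule evaluates to 1 when it hits, so the sum counts how many
# distinct keywords were seen. Require at least three.
meta      SEOSPAM_MULTI  (__SEOSPAM1 + __SEOSPAM2 + __SEOSPAM3 + __SEOSPAM4 >= 3)
describe  SEOSPAM_MULTI  Three or more SEO service keywords in body
# Placeholder score; tune it against your own ham and spam.
score     SEOSPAM_MULTI  1.5
```

Every new keyword needs both a new subrule and a change to the meta
expression, which is part of why keeping up with them is tedious.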

I thought it would fit well with other types of fraud rules, such as 
LOTSA_MONEY and others.

Ideas greatly appreciated.
Alex

Re: SEO Spam

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2015-05-19 at 14:38 -0400, Alex Regan wrote:
> I've got more than a dozen now. It's a regular thing. I was just trying 
> to somehow gain support for somehow being more proactive with these.
> 
Here are a couple of ideas that may help. Both use lists of alternate
patterns, e.g.  body RULE /(man|woman|child)/i :

1) if the phrases you're matching fall into groups, such as sales
phrases and product names: 

  'big discounts' available on 'Mickey Mouse Chronometers' 

where I've quoted the candidate sales and product patterns, it may pay
to make each group into a separate list of alternates with a minimal
score and put the punitive score on a meta that requires at least one
hit in each of the groups before it will fire. The benefit of this is
that as the lists get a bit bigger they'll start matching
combinations that you haven't seen before. This approach seems to be
fairly resistant to FPs.

2) If you can't split the matches into categories, consider using a
single list of alternates with the tflags multiple flag set and a
moderate score chosen so that it will only classify the message as spam
if three or more alternates match. Again, this will hit combinations
you haven't seen earlier, though it's probably a bit more FP-prone than
my first suggestion.
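
The two ideas above could be sketched roughly as follows. Treat this as an
illustration under assumptions rather than tested rules: every rule name,
score, and pattern is hypothetical (apart from 'big discounts' and the
keywords quoted from the sample spam). The second variant counts hits in a
meta instead of relying on the rule's own score, because the SpamAssassin
documentation describes tflags multiple as being for use with meta rules.

```
# 1) One list of alternates per group, each with a token score, and a
#    meta that carries the real score and needs a hit in both groups.
body      SEO_SALES_PHRASE       /\b(?:big discounts|guaranteed results|first page ranking)\b/i
describe  SEO_SALES_PHRASE       Sales phrase seen in SEO spam
score     SEO_SALES_PHRASE       0.01

body      SEO_SERVICE_NAME       /\b(?:affiliate marketing|lead generation|inbound marketing)\b/i
describe  SEO_SERVICE_NAME       Service name seen in SEO spam
score     SEO_SERVICE_NAME       0.01

meta      SEO_SALES_AND_SERVICE  (SEO_SALES_PHRASE && SEO_SERVICE_NAME)
describe  SEO_SALES_AND_SERVICE  Sales phrase plus SEO service name
# Placeholder score; tune it against your own mail.
score     SEO_SALES_AND_SERVICE  3.0

# 2) A single list of alternates evaluated repeatedly. With
#    "tflags multiple" the subrule's value in a meta is its hit count;
#    maxhits=3 stops counting once the threshold is reached (on
#    versions that support the option).
body      __SEO_TERM             /\b(?:SEO|PPC|SMM|affiliate marketing|lead generation|inbound marketing)\b/i
tflags    __SEO_TERM             multiple maxhits=3

meta      SEO_MANY_TERMS         (__SEO_TERM >= 3)
describe  SEO_MANY_TERMS         Three or more SEO keyword hits in body
# Placeholder score; tune it against your own mail.
score     SEO_MANY_TERMS         2.0
```

Note that in the second variant the same alternate appearing three times
counts just as much as three different alternates, which is one reason it is
likely to be more FP-prone than the first.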

The disadvantage of both approaches is that manually editing large
alternate lists is painful. So, I developed a scripted solution, based
on awk/gawk, that lets you keep the list of matching patterns in an
editor-friendly form and generates SA rules from the edited list. Here's
a link to it:
http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

HTH

Martin





Re: SEO Spam

Posted by Axb <ax...@gmail.com>.
On 19.05.2015 20:38, Alex Regan wrote:
> Hi,
>
> On 05/19/2015 11:40 AM, Reindl Harald wrote:
>>
>> Am 19.05.2015 um 17:11 schrieb Alex Regan:
>>> I'm wondering if anyone is interested in helping to develop a set of
>>> rules to catch SEO spam? Here's one such example:
>>>
>>> http://pastebin.com/S6Jeappj
>>>
>>> It's those emails that talk about how they can improve your SEO such as:
>>>
>>> ..."diverse projects consisting of SEO, PPC, SMM, Affiliate
>>> Marketing, Google Adsense, Blogging, Copy writing, Web Analytic, Local
>>> Search Marketing, Lead Generation, Inbound Marketing, Screen Casting
>>> etc"
>>>
>>> I've created a series of test rules like:
>>>
>>> body    __SEOSPAM1  /Affiliate marketing/i
>>>
>>> and meta'd them together, requiring at least three to hit, but it's
>>> tough keeping up with them all and it's often not enough based just on
>>> those types of keywords.
>>>
>>> I thought it would fit well with other types of fraud rules, such as
>>> LOTSA_MONEY and others
>>
>> they are changing all the time and so hard to catch with rules but over
>> the long bayes training should catch them
>>
>>
>>   5.5 CUST_DNSBL_4           RBL: zen.spamhaus.org (pbl.spamhaus.org)
>>                              [62.11.26.3 listed in zen.spamhaus.org]
>>   5.5 CUST_DNSBL_2           RBL: dnsbl.sorbs.net (dul.dnsbl.sorbs.net)
>>                              [62.11.26.3 listed in dnsbl.sorbs.net]
>>   4.5 CUST_DNSBL_7           RBL: b.barracudacentral.org
>>                              [62.11.26.3 listed in
>> b.barracudacentral.org]
>
> These are the internal bogus addresses I used for this test.
>
>>   7.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
>>                              [score: 1.0000]
>>   0.5 MARKETING_PARTNERS     BODY: Claims you registered with a partner
>>   0.0 HTML_MESSAGE           BODY: HTML included in message
>
> I'm not sure I'd go so far as to change the default bayes to more than
> double the default, and I've found too much ham to change
> MARKETING_MESSAGE here.
>
> Axb wrote:
>  > when you have a dozen or so and add them up, you should get them
>  > easily...
>
> I've got more than a dozen now. It's a regular thing. I was just trying
> to somehow gain support for somehow being more proactive with these.
>
> I'll keep it going, so let me know if you'd like to help or would like
> me to share the rules I have (there's probably 60 or more now).

zip up those rules in one file and send me them OFFLIST!!!!!

I have an idea....



Re: SEO Spam

Posted by Alex Regan <my...@gmail.com>.
Hi,

On 05/19/2015 11:40 AM, Reindl Harald wrote:
>
> Am 19.05.2015 um 17:11 schrieb Alex Regan:
>> I'm wondering if anyone is interested in helping to develop a set of
>> rules to catch SEO spam? Here's one such example:
>>
>> http://pastebin.com/S6Jeappj
>>
>> It's those emails that talk about how they can improve your SEO such as:
>>
>> ..."diverse projects consisting of SEO, PPC, SMM, Affiliate
>> Marketing, Google Adsense, Blogging, Copy writing, Web Analytic, Local
>> Search Marketing, Lead Generation, Inbound Marketing, Screen Casting etc"
>>
>> I've created a series of test rules like:
>>
>> body    __SEOSPAM1  /Affiliate marketing/i
>>
>> and meta'd them together, requiring at least three to hit, but it's
>> tough keeping up with them all and it's often not enough based just on
>> those types of keywords.
>>
>> I thought it would fit well with other types of fraud rules, such as
>> LOTSA_MONEY and others
>
> they are changing all the time and so hard to catch with rules but over
> the long bayes training should catch them
>
>
>   5.5 CUST_DNSBL_4           RBL: zen.spamhaus.org (pbl.spamhaus.org)
>                              [62.11.26.3 listed in zen.spamhaus.org]
>   5.5 CUST_DNSBL_2           RBL: dnsbl.sorbs.net (dul.dnsbl.sorbs.net)
>                              [62.11.26.3 listed in dnsbl.sorbs.net]
>   4.5 CUST_DNSBL_7           RBL: b.barracudacentral.org
>                              [62.11.26.3 listed in b.barracudacentral.org]

These are the internal bogus addresses I used for this test.

>   7.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
>                              [score: 1.0000]
>   0.5 MARKETING_PARTNERS     BODY: Claims you registered with a partner
>   0.0 HTML_MESSAGE           BODY: HTML included in message

I'm not sure I'd go so far as to raise the BAYES_99 score to more than 
double the default, and I've found too much ham to change 
MARKETING_MESSAGE here.

Axb wrote:
 > when you have a dozen or so and add them up, you should get them
 > easily...

I've got more than a dozen now. It's a regular thing. I was just trying 
to gain support for being more proactive with these.

I'll keep it going, so let me know if you'd like to help or would like 
me to share the rules I have (there's probably 60 or more now).

Thanks,
Alex

Re: SEO Spam

Posted by Reindl Harald <h....@thelounge.net>.
Am 19.05.2015 um 17:11 schrieb Alex Regan:
> I'm wondering if anyone is interested in helping to develop a set of
> rules to catch SEO spam? Here's one such example:
>
> http://pastebin.com/S6Jeappj
>
> It's those emails that talk about how they can improve your SEO such as:
>
> ..."diverse projects consisting of SEO, PPC, SMM, Affiliate
> Marketing, Google Adsense, Blogging, Copy writing, Web Analytic, Local
> Search Marketing, Lead Generation, Inbound Marketing, Screen Casting etc"
>
> I've created a series of test rules like:
>
> body    __SEOSPAM1  /Affiliate marketing/i
>
> and meta'd them together, requiring at least three to hit, but it's
> tough keeping up with them all and it's often not enough based just on
> those types of keywords.
>
> I thought it would fit well with other types of fraud rules, such as
> LOTSA_MONEY and others

they are changing all the time and so are hard to catch with rules, but 
over the long run bayes training should catch them


  5.5 CUST_DNSBL_4           RBL: zen.spamhaus.org (pbl.spamhaus.org)
                             [62.11.26.3 listed in zen.spamhaus.org]
  5.5 CUST_DNSBL_2           RBL: dnsbl.sorbs.net (dul.dnsbl.sorbs.net)
                             [62.11.26.3 listed in dnsbl.sorbs.net]
  4.5 CUST_DNSBL_7           RBL: b.barracudacentral.org
                             [62.11.26.3 listed in b.barracudacentral.org]
  7.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                             [score: 1.0000]
  0.5 MARKETING_PARTNERS     BODY: Claims you registered with a partner
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.4 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                             [score: 1.0000]


Re: SEO Spam

Posted by Axb <ax...@gmail.com>.
On 19.05.2015 17:11, Alex Regan wrote:
> Hi,
>
> I'm wondering if anyone is interested in helping to develop a set of
> rules to catch SEO spam? Here's one such example:
>
> http://pastebin.com/S6Jeappj
>
> It's those emails that talk about how they can improve your SEO such as:
>
> ..."diverse projects consisting of SEO, PPC, SMM, Affiliate
> Marketing, Google Adsense, Blogging, Copy writing, Web Analytic, Local
> Search Marketing, Lead Generation, Inbound Marketing, Screen Casting etc"
>
> I've created a series of test rules like:
>
> body    __SEOSPAM1  /Affiliate marketing/i
>
> and meta'd them together, requiring at least three to hit, but it's
> tough keeping up with them all and it's often not enough based just on
> those types of keywords.
>
> I thought it would fit well with other types of fraud rules, such as
> LOTSA_MONEY and others.
>
> Ideas greatly appreciated.
> Alex


when you have a dozen or so and add them up, you should get them easily...