Posted to users@spamassassin.apache.org by Kenneth Porter <sh...@sewingwitch.com> on 2005/10/08 18:43:03 UTC

Explosion in uk.geocities.com spam

Lately I've been seeing quite a bit of uncaught spam with a link to 
uk.geocities.com.  Using 3.1.0 release with net tests. Here's my "uncaught" 
(false negatives) folder for October (which I feed nightly into sa-learn):

<http://home.sewingwitch.com:8000/Stuff/Uncaught-200510.mbox>

Re: Explosion in uk.geocities.com spam

Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Hi!

> Same here, so I decided to add some points to uk.geocities.com URLs. I'm 
> assigning high scores if such a link is on a line by itself and low scores 
> otherwise. Not really happy, though.
>
> Harder to catch are some spams which start with some greeting, then give a 
> short table of drugs (the V-one, the C-one, etc.) and politely say goodbye.

There are SARE rules to catch those.

http://www.rulesemporium.com/rules/70_sare_specific.cf

> They use HTML and tables very cleverly, thus evading Bayes rules. 
> Basically it is an invisible table with one row and several columns. 
> The first column contains the first letter of every line, separated by 
> "<BR>" and optionally some style tags (b, i, etc.). The next column contains 
> several more characters for each line, etc.
>
> Bayes and text matching are currently quite useless, and DNS blacklists or 
> SURBL rarely hit. I'm quite lost....

Add the ruleset above; it is pretty effective. On my end, however, 90% 
still hit SURBL.

Bye,
Raymond.


Re: Explosion in uk.geocities.com spam

Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Mon, Oct 10, 2005 at 04:21:50PM +0200, Maurice Lucas wrote:
> Matthew Newton wrote:
> >On the assumption that "normal" URLs don't use the construct /? in
> >them, and especially at geocities (are CGI scripts even allowed
> >there?) how about the following?
> >
> >full      UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
> >describe  UOLCC_UKGEO UK Geocities exploitation
> >score     UOLCC_UKGEO 4.0
> 
> I saw somebody else use
> uri  UK_GEOCITIES   m'^http://uk\.geocities\.com\b'i
> describe UK_GEOCITIES Body contains spammed domain
> score   UK_GEOCITIES 3.0

This only checks the domain name, so I guess it may have fairly
high FPs (depends on how often people send legit e-mail with those
domains in, I suppose).

I wrote my rule specifically to check for the domain name and the
use of "/?", which (I believe) should not occur in normal usage.
Someone on list will probably prove me wrong, though... ;-)
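
The difference between the two rules can be tried outside SpamAssassin with a small Python sketch. The patterns are copied from the rules above (with the dots in the host name escaped); the sample URLs are made up for illustration, and Python's re module only approximates how SpamAssassin applies uri and full rules.

```python
import re

# Patterns taken from the two rules discussed above
# (dots in the host name are escaped here).
DOMAIN_ONLY = re.compile(r"^http://uk\.geocities\.com\b", re.IGNORECASE)
WITH_QUERY = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}"
)

# Hypothetical sample URLs, not taken from real mail.
PLAIN_PAGE = "http://uk.geocities.com/some_user/"
SPAM_STYLE = "http://uk.geocities.com/Some_User12/?abc=def"


def hits(url):
    """Return the names of the rules whose pattern matches the URL."""
    found = []
    if DOMAIN_ONLY.search(url):
        found.append("UK_GEOCITIES")
    if WITH_QUERY.search(url):
        found.append("UOLCC_UKGEO")
    return found


if __name__ == "__main__":
    for url in (PLAIN_PAGE, SPAM_STYLE):
        print(url, hits(url))
```

In this sketch the plain home page URL only trips the domain-only pattern, while the URL with "/?" trips both.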

Thanks

Matthew


-- 
Matthew Newton <mc...@le.ac.uk>

UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom

Re: Explosion in uk.geocities.com spam

Posted by An...@blm.gov.



We've been using this with great success:

uri GEOCITIES_SPAM m'^https?://[a-z]+\.geocities\.com/([a-z]+/)+[?][a-z]+'i

Andrew Hoying

Re: Explosion in uk.geocities.com spam

Posted by Maurice Lucas <ms...@taos-it.nl>.
Matthew Newton wrote:
> On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
>>> They use HTML and tables very cleverly, thus evading Bayes rules.
>>> Basically it is an invisible table with one row and several
>>> columns. The first column contains the first letter of every line,
>>> separated by "<BR>" and optionally some style tags (b, i, etc.).
>>> The next column contains several more characters for each line, etc.
>>
>> Leo.  There are a good 9 or 10 variations on this now.  The SARE
>> rulesets have a number of rules that catch many of these, though not
>> all of them.
>
> On the assumption that "normal" URLs don't use the construct /? in
> them, and especially at geocities (are CGI scripts even allowed
> there?) how about the following?
>
> full      UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
> describe  UOLCC_UKGEO UK Geocities exploitation
> score     UOLCC_UKGEO 4.0
>
> I've been testing this for a couple of weeks now, and have had no
> complaints yet (but I do not have a corpus of spam to test it
> with, though, so can't be too sure).
>
> It could possibly also be condensed to the following (completely
> untested):
>
> full      UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/

I saw somebody else use
uri  UK_GEOCITIES   m'^http://uk\.geocities\.com\b'i
describe UK_GEOCITIES Body contains spammed domain
score   UK_GEOCITIES 3.0
uri  MSN_SPACES  m'^http://spaces\.msn\.com\/members\b'i
describe MSN_SPACES Body contains spammed domain
score   MSN_SPACES 3.0
uri  IT_GEOCITIES   m'^http://it\.geocities\.com\b'i
describe IT_GEOCITIES Body contains spammed domain
score   IT_GEOCITIES 3.0

PLEASE NOTE: I haven't used these rules myself, so I don't know their 
FP rate.

With kind regards,
Met vriendelijke groet,

Maurice Lucas
TAOS-IT


Re: Explosion in uk.geocities.com spam

Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
> > They use HTML and tables very cleverly, thus evading Bayes rules.
> > Basically it is an invisible table with one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style tags (b, i, etc.). The next column contains
> > several more characters for each line, etc.
> 
> Leo.  There are a good 9 or 10 variations on this now.  The SARE rulesets
> have a number of rules that catch many of these, though not all of them.

On the assumption that "normal" URLs don't use the construct /? in
them, and especially at geocities (are CGI scripts even allowed
there?) how about the following?

full      UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
describe  UOLCC_UKGEO UK Geocities exploitation
score     UOLCC_UKGEO 4.0

I've been testing this for a couple of weeks now, and have had no
complaints yet (but I do not have a corpus of spam to test it
with, though, so can't be too sure).

It could possibly also be condensed to the following (completely
untested):

full      UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
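
Since the condensed form is untested, one quick way to see how it differs from the longer rule is a Python sketch like the one below. Both patterns are copied from this message (with the dots in uk.geocities.com escaped); the sample URLs are invented for illustration, and Python's re module is only a stand-in for how SpamAssassin runs a full rule.

```python
import re

# Longer rule from this message (host name dots escaped).
ORIGINAL = re.compile(
    r"http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}"
)
# Condensed rule from this message.
CONDENSED = re.compile(
    r"http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}"
)

# Hypothetical sample URLs, not taken from real mail.
SAMPLES = [
    "http://uk.geocities.com/Some_User12/?abc=def",
    "http://it.geocities.com/Some_User12/?abc=def",
    "http://uk.geocities.com/someuser/?abc=def",
    "http://uk.geocities.com/some_user/index.html",
]


def compare(url):
    """Return (original_matches, condensed_matches) for one URL."""
    return (
        ORIGINAL.search(url) is not None,
        CONDENSED.search(url) is not None,
    )


if __name__ == "__main__":
    for url in SAMPLES:
        print(url, compare(url))
```

On these samples the condensed pattern is the broader of the two: it also matches other two-letter subdomains and account names without an underscore, so it is not a drop-in equivalent of the longer rule.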

Matthew


-- 
Matthew Newton <mc...@le.ac.uk>

UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom

Re: Explosion in uk.geocities.com spam

Posted by MATSUDA Yoh-ichi <yo...@flcl.org>.
Hello.

From: "Loren Wilton" <lw...@earthlink.net>
Subject: Re: Explosion in uk.geocities.com spam
Date: Sat, 8 Oct 2005 22:01:22 -0700

> > They use HTML and tables very cleverly, thus evading Bayes rules.
> > Basically it is an invisible table with one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style tags (b, i, etc.). The next column contains
> > several more characters for each line, etc.
> 
> Leo.  There are a good 9 or 10 variations on this now.  The SARE rulesets
> have a number of rules that catch many of these, though not all of them.
> 
>         Loren

The "uk.geocities" spams come from "CHINANET" or "CHINA RAILWAY
TELECOMMUNICATIONS CENTER".

You can match the IP addresses of these two ISPs in the Received header:

header CHINANET Received =~ /from .+(5[89]\.(3[2-9]|[45][0-9]|6[0-3])|60\.1([6-8][0-9]|9[01])|61\.1(2[89]|[3-8][0-9]|9[01])|218\.([0-9]|[12][0-9]|3[01]|5[6-9]|[678][0-9]|9[0-5])|219\.1(2[89]|[345][0-9])|220\.1([678][0-9]|9[01])|222\.(6[4-9]|[78][0-9]|9[0-5]|1(2[89]|3[0-9]|4[0-3])))(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]/
describe CHINANET Chinanet - large provider in China
score CHINANET 0.5

header CRTC Received =~ /from .+(61\.23[2-7]|222\.(3[2-9]|[45][0-9]|6[0-3]))(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]/
describe CRTC CHINA RAILWAY TELECOMMUNICATIONS CENTER
score CRTC 0.5


And you can match the uk.geocities URI strings in the message body:

body UKGEOCITIES /http:\/\/[a-z]{2,5}\.geocities\.com\/[A-Za-z0-9_]+\/\?{0,1}[A-Za-z0-9_-]+/
describe UKGEOCITIES http://uk.geocities.com/Hoge_Hoge/?Fuga=tekitou
score UKGEOCITIES 0.5

So you'll be able to catch the "uk.geocities" spams with a meta rule:

meta CHINAUKGEO (CHINANET || CRTC) && UKGEOCITIES && BAYES_99
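
How the pieces combine can be sketched in Python. This is only an illustration under some assumptions: the CRTC and UKGEOCITIES patterns are copied from the rules above, the Received line and body text are made up, the CHINANET and BAYES_99 results are passed in as plain booleans, and a single Received string stands in for the headers SpamAssassin would actually check.

```python
import re

# CRTC header pattern and UKGEOCITIES body pattern, copied from the rules above.
CRTC = re.compile(
    r"from .+(61\.23[2-7]|222\.(3[2-9]|[45][0-9]|6[0-3]))"
    r"(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]"
)
UKGEOCITIES = re.compile(
    r"http:\/\/[a-z]{2,5}\.geocities\.com\/[A-Za-z0-9_]+\/\?{0,1}[A-Za-z0-9_-]+"
)

# Hypothetical sample data, not taken from real mail.
SPAM_RECEIVED = "from mail.example.net (unknown [222.40.1.2]) by mx.example.org"
OTHER_RECEIVED = "from mail.example.net (unknown [192.0.2.1]) by mx.example.org"
SPAM_BODY = "Visit http://uk.geocities.com/Hoge_Hoge/?Fuga=tekitou now"


def china_uk_geo(received, body, bayes_99, chinanet=False):
    """Mirror the meta expression (CHINANET || CRTC) && UKGEOCITIES && BAYES_99."""
    crtc = CRTC.search(received) is not None
    ukgeo = UKGEOCITIES.search(body) is not None
    return (chinanet or crtc) and ukgeo and bayes_99


if __name__ == "__main__":
    print(china_uk_geo(SPAM_RECEIVED, SPAM_BODY, bayes_99=True))
    print(china_uk_geo(OTHER_RECEIVED, SPAM_BODY, bayes_99=True))
```

The point of the meta rule is that no single test has to carry a high score: each sub-rule stays at 0.5, and only the combination of sender network, URI shape and a strong Bayes result is treated as a signal.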

--
Nothing but a peace sign.
MATSUDA Yoh-ichi(yoh)
mailto:yoh@flcl.org
http://www.flcl.org/~yoh/diary/ (only Japanese)

Re: Explosion in uk.geocities.com spam

Posted by Nix <ni...@esperi.org.uk>.
On Sat, 8 Oct 2005, Loren Wilton murmured woefully:
>> They use HTML and tables very cleverly, thus evading Bayes rules.
>> Basically it is an invisible table with one row and several columns.
>> The first column contains the first letter of every line, separated by
>> "<BR>" and optionally some style tags (b, i, etc.). The next column contains
>> several more characters for each line, etc.
> 
> Leo.  There are a good 9 or 10 variations on this now.  The SARE rulesets
> have a number of rules that catch many of these, though not all of them.

For me, all of them are caught simply via

uri NIX_GEOCITIES      /^http:\/\/[a-z0-9-]{1,30}\.geocities\.com\b/i
describe NIX_GEOCITIES Contains a URL in the geocities domain.
score NIX_GEOCITIES    1.1

(1.1 being the minimum score required for that plus a Bayes hit to push
it over 5.0.)

(This might not be a good rule if you often get nonspam from Geocities
addresses which your Bayes thinks is spam.)

-- 
`Next: FEMA neglects to take into account the possibility of
fire in Old Balsawood Town (currently in its fifth year of drought
and home of the General Grant Home for Compulsive Arsonists).'
            --- James Nicoll

Re: Explosion in uk.geocities.com spam

Posted by Loren Wilton <lw...@earthlink.net>.
> They use HTML and tables very cleverly, thus evading Bayes rules.
> Basically it is an invisible table with one row and several columns.
> The first column contains the first letter of every line, separated by
> "<BR>" and optionally some style tags (b, i, etc.). The next column contains
> several more characters for each line, etc.

Leo.  There are a good 9 or 10 variations on this now.  The SARE rulesets
have a number of rules that catch many of these, though not all of them.

        Loren


Re: Explosion in uk.geocities.com spam

Posted by Patrick von der Hagen <pa...@wudika.de>.
Kenneth Porter wrote:
> Lately I've been seeing quite a bit of uncaught spam with a link to 
> uk.geocities.com.  Using 3.1.0 release with net tests. Here's my 
Same here, so I decided to add some points to uk.geocities.com URLs. I'm 
assigning high scores if such a link is on a line by itself and low 
scores otherwise. Not really happy, though.

Harder to catch are some spams which start with some greeting, then give 
a short table of drugs (the V-one, the C-one, etc.) and politely say goodbye.

They use HTML and tables very cleverly, thus evading Bayes rules. 
Basically it is an invisible table with one row and several columns. 
The first column contains the first letter of every line, separated by 
"<BR>" and optionally some style tags (b, i, etc.). The next column contains 
several more characters for each line, etc.

Bayes and text matching are currently quite useless, and DNS blacklists or 
SURBL rarely hit. I'm quite lost....
-- 
CU,
    Patrick.