Posted to users@spamassassin.apache.org by Kenneth Porter <sh...@sewingwitch.com> on 2005/10/08 18:43:03 UTC
Explosion in uk.geocities.com spam
Lately I've been seeing quite a bit of uncaught spam with a link to
uk.geocities.com. Using 3.1.0 release with net tests. Here's my "uncaught"
(false negatives) folder for October (which I feed nightly into sa-learn):
<http://home.sewingwitch.com:8000/Stuff/Uncaught-200510.mbox>
Re: Explosion in uk.geocities.com spam
Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Hi!
> Same here, so I decided to add some points to uk.geocities.com-URLs. I'm
> assigning high scores if such a link is on a line by itself and low scores
> otherwise. Not really happy, though.
>
> Harder to catch are some spams which start with some greeting, then give a
> short table of drugs (the V-one, the C-one, etc.) and politely say goodbye.
There are SARE rules to catch those:
http://www.rulesemporium.com/rules/70_sare_specific.cf
> They use HTML and tables very cleverly, thus avoiding Bayes rules.
> Basically it is an invisible table, using one row and several columns.
> The first column contains the first letter of every line, separated by
> "<BR>" and optionally some style-tags (b, i, etc.). Next column contains
> several more characters for each line, etc.
>
> Bayes and text matching are currently quite useless, and DNS blacklists or
> SURBL rarely hit. I'm quite lost....
Add the ruleset above; it is pretty effective. On my end, however, 90% still
hit SURBL.
Bye,
Raymond.
Re: Explosion in uk.geocities.com spam
Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Mon, Oct 10, 2005 at 04:21:50PM +0200, Maurice Lucas wrote:
> Matthew Newton wrote:
> >On the assumption that "normal" URLs don't use the construct /? in
> >them, and especially at geocities (are CGI scripts even allowed
> >there?) how about the following?
> >
> >full UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
> >describe UOLCC_UKGEO UK Geocities exploitation
> >score UOLCC_UKGEO 4.0
>
> I saw somebody else use
> uri UK_GEOCITIES m'^http://uk\.geocities\.com\b'i
> describe UK_GEOCITIES Body contains spammed domain
> score UK_GEOCITIES 3.0
This only checks the domain name, so I guess it may have fairly
high FPs (depends on how often people send legit e-mail with those
domains in, I suppose).
I wrote my rule specifically to check for the domain name and the
use of "/?", which (I believe) should not occur in normal usage.
Someone on list will probably prove me wrong, though... ;-)
Thanks
Matthew
--
Matthew Newton <mc...@le.ac.uk>
UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
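The difference between the domain-only rule and the "/?" rule can be illustrated outside SpamAssassin. What follows is only a rough sketch in Python, whose `re` syntax is close enough to Perl's for these two patterns. The sample URLs are invented for illustration, and the host-name dots are escaped here, which is an assumption about the intent rather than what was posted.

```python
import re

# Domain-only pattern from the UK_GEOCITIES uri rule quoted above.
DOMAIN_ONLY = re.compile(r"^http://uk\.geocities\.com\b", re.I)

# Pattern from UOLCC_UKGEO, with the host-name dots escaped (an assumption).
SLASH_QUERY = re.compile(
    r"http://uk\.geocities\.com/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}/\?[\w=.]{3}"
)

# Hypothetical sample URLs, made up for this sketch.
samples = [
    "http://uk.geocities.com/some_user/index.html",  # ordinary page
    "http://uk.geocities.com/Some_user12/?abc=def",  # uses the "/?" construct
]


def classify(url):
    """Report which of the two patterns match the given URL."""
    return {
        "domain_only": bool(DOMAIN_ONLY.search(url)),
        "slash_query": bool(SLASH_QUERY.search(url)),
    }


for url in samples:
    print(url, classify(url))
```

In this sketch the domain-only pattern fires on both URLs, while the "/?" pattern fires only on the second, which is the FP trade-off described above.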
Re: Explosion in uk.geocities.com spam
Posted by An...@blm.gov.
We've been using this with great success:
uri GEOCITIES_SPAM m'^https?://[a-z]+\.geocities\.com/([a-z]+/)+[?][a-z]+'i
Andrew Hoying
From: "Maurice Lucas" <mslucas@taos-it.nl>
To: "Loren Wilton" <lw...@earthlink.net>, "Matthew Newton" <mc...@leicester.ac.uk>
cc: <us...@spamassassin.apache.org>
Date: 10/10/2005 08:21 AM
Subject: Re: Explosion in uk.geocities.com spam
Matthew Newton wrote:
> On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
>>> They use HTML and tables very cleverly, thus avoiding Bayes rules.
>>> Basically it is an invisible table, using one row and several
>>> columns. The first column contains the first letter of every line,
>>> separated by "<BR>" and optionally some style-tags (b, i, etc.).
>>> Next column contains several more characters for each line, etc.
>>
>> Leo. There are a good 9 or 10 variations on this now. The SARE
>> rulesets have a number of rules that catch many of these, though not
>> all of them.
>
> On the assumption that "normal" URLs don't use the construct /? in
> them, and especially at geocities (are CGI scripts even allowed
> there?) how about the following?
>
> full UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
> describe UOLCC_UKGEO UK Geocities exploitation
> score UOLCC_UKGEO 4.0
>
> I've been testing this for a couple of weeks now, and have had no
> complaints yet (but I do not have a corpus of spam to test it
> with, though, so can't be too sure).
>
> It could possibly also be condensed to the following (completely
> untested):
>
> full UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
I saw somebody else use
uri UK_GEOCITIES m'^http://uk\.geocities\.com\b'i
describe UK_GEOCITIES Body contains spammed domain
score UK_GEOCITIES 3.0
uri MSN_SPACES m'^http://spaces\.msn\.com\/members\b'i
describe MSN_SPACES Body contains spammed domain
score MSN_SPACES 3.0
uri IT_GEOCITIES m'^http://it\.geocities\.com\b'i
describe IT_GEOCITIES Body contains spammed domain
score IT_GEOCITIES 3.0
PLEASE NOTE: I haven't used these rules myself, so I don't know their FP count.
With kind regards,
Maurice Lucas
TAOS-IT
Re: Explosion in uk.geocities.com spam
Posted by Maurice Lucas <ms...@taos-it.nl>.
Matthew Newton wrote:
> On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
>>> They use HTML and tables very cleverly, thus avoiding Bayes rules.
>>> Basically it is an invisible table, using one row and several
>>> columns. The first column contains the first letter of every line,
>>> separated by "<BR>" and optionally some style-tags (b, i, etc.).
>>> Next column contains several more characters for each line, etc.
>>
>> Leo. There are a good 9 or 10 variations on this now. The SARE
>> rulesets have a number of rules that catch many of these, though not
>> all of them.
>
> On the assumption that "normal" URLs don't use the construct /? in
> them, and especially at geocities (are CGI scripts even allowed
> there?) how about the following?
>
> full UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
> describe UOLCC_UKGEO UK Geocities exploitation
> score UOLCC_UKGEO 4.0
>
> I've been testing this for a couple of weeks now, and have had no
> complaints yet (but I do not have a corpus of spam to test it
> with, though, so can't be too sure).
>
> It could possibly also be condensed to the following (completely
> untested):
>
> full UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
I saw somebody else use
uri UK_GEOCITIES m'^http://uk\.geocities\.com\b'i
describe UK_GEOCITIES Body contains spammed domain
score UK_GEOCITIES 3.0
uri MSN_SPACES m'^http://spaces\.msn\.com\/members\b'i
describe MSN_SPACES Body contains spammed domain
score MSN_SPACES 3.0
uri IT_GEOCITIES m'^http://it\.geocities\.com\b'i
describe IT_GEOCITIES Body contains spammed domain
score IT_GEOCITIES 3.0
PLEASE NOTE: I haven't used these rules myself, so I don't know their FP count.
With kind regards,
Maurice Lucas
TAOS-IT
Re: Explosion in uk.geocities.com spam
Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Sat, Oct 08, 2005 at 10:01:22PM -0700, Loren Wilton wrote:
> > They use HTML and tables very cleverly, thus avoiding Bayes rules.
> > Basically it is an invisible table, using one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style-tags (b, i, etc.). Next column contains
> > several more characters for each line, etc.
>
> Leo. There are a good 9 or 10 variations on this now. The SARE rulesets
> have a number of rules that catch many of these, though not all of them.
On the assumption that "normal" URLs don't use the construct /? in
them, and especially at geocities (are CGI scripts even allowed
there?) how about the following?
full UOLCC_UKGEO /http:\/\/uk\.geocities\.com\/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}(?:_[A-Z]?[a-z]{2,20})?\d{0,4}\/\?[\w=\.]{3}/
describe UOLCC_UKGEO UK Geocities exploitation
score UOLCC_UKGEO 4.0
I've been testing this for a couple of weeks now, and have had no
complaints yet (but I do not have a corpus of spam to test it
with, though, so can't be too sure).
It could possibly also be condensed to the following (completely
untested):
full UOLCC_UKGEO /http:\/\/..\.geocities\.com\/[A-Za-z0-9_]{2,40}\/\?[\w=\.]{3}/
Matthew
--
Matthew Newton <mc...@le.ac.uk>
UNIX and e-mail Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
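To see what the condensed form accepts beyond the strict one, the two UOLCC_UKGEO patterns can be tried side by side. This is only a sketch in Python rather than a SpamAssassin test run. The first URL is the example given elsewhere in this thread, the other two are invented for illustration, and the host-name dots in the strict pattern are escaped here as an assumption about the intent.

```python
import re

# Strict form: uk.geocities.com only, two or three underscore-separated
# name parts, optional trailing digits, then "/?" and three more characters.
STRICT = re.compile(
    r"http://uk\.geocities\.com/[A-Z]?[a-z]{2,20}_[A-Z]?[a-z]{2,20}"
    r"(?:_[A-Z]?[a-z]{2,20})?\d{0,4}/\?[\w=.]{3}"
)

# Condensed form: any two-character subdomain and any 2-40 character name.
CONDENSED = re.compile(
    r"http://..\.geocities\.com/[A-Za-z0-9_]{2,40}/\?[\w=.]{3}"
)


def matches(url):
    """Return (strict_hit, condensed_hit) for a URL."""
    return (bool(STRICT.search(url)), bool(CONDENSED.search(url)))


urls = [
    "http://uk.geocities.com/Hoge_Hoge/?Fuga=tekitou",  # example from this thread
    "http://it.geocities.com/someuser/?abc",            # hypothetical
    "http://uk.geocities.com/some_user/page.html",      # hypothetical, no "/?"
]

for url in urls:
    print(url, matches(url))
```

In this sketch the condensed pattern also catches other two-letter subdomains and names without an underscore, so it is broader than the strict form and would need its own FP check.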
Re: Explosion in uk.geocities.com spam
Posted by MATSUDA Yoh-ichi <yo...@flcl.org>.
Hello.
From: "Loren Wilton" <lw...@earthlink.net>
Subject: Re: Explosion in uk.geocities.com spam
Date: Sat, 8 Oct 2005 22:01:22 -0700
> > They use HTML and tables very cleverly, thus avoiding Bayes rules.
> > Basically it is an invisible table, using one row and several columns.
> > The first column contains the first letter of every line, separated by
> > "<BR>" and optionally some style-tags (b, i, etc.). Next column contains
> > several more characters for each line, etc.
>
> Leo. There are a good 9 or 10 variations on this now. The SARE rulesets
> have a number of rules that catch many of these, though not all of them.
>
> Loren
The "uk.geocities" spams come from "CHINANET" or "CHINA RAILWAY
TELECOMMUNICATIONS CENTER".
You can catch the above two ISPs' IP addresses in a Received header:
header CHINANET Received =~ /from .+(5[89]\.(3[2-9]|[45][0-9]|6[0-3])|60\.1([6-8][0-9]|9[01])|61\.1(2[89]|[3-8][0-9]|9[01])|218\.([0-9]|[12][0-9]|3[01]|5[6-9]|[678][0-9]|9[0-5])|219\.1(2[89]|[345][0-9])|220\.1([678][0-9]|9[01])|222\.(6[4-9]|[78][0-9]|9[0-5]|1(2[89]|3[0-9]|4[0-3])))(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]/
describe CHINANET Chinanet - large provider in China
score CHINANET 0.5
header CRTC Received =~ /from .+(61\.23[2-7]|222\.(3[2-9]|[45][0-9]|6[0-3]))(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]/
describe CRTC CHINA RAILWAY TELECOMMUNICATIONS CENTER
score CRTC 0.5
And, you can catch uk.geo's URI strings in a message body:
body UKGEOCITIES /http:\/\/[a-z]{2,5}\.geocities\.com\/[A-Za-z0-9_]+\/\?{0,1}[A-Za-z0-9_-]+/
describe UKGEOCITIES http://uk.geocities.com/Hoge_Hoge/?Fuga=tekitou
score UKGEOCITIES 0.5
So, you'll be able to catch the "uk.geocities" spams with a meta rule:
meta CHINAUKGEO (CHINANET || CRTC) && UKGEOCITIES && BAYES_99
--
Nothing but a peace sign.
MATSUDA Yoh-ichi(yoh)
mailto:yoh@flcl.org
http://www.flcl.org/~yoh/diary/ (only Japanese)
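The Received-header check and the meta combination above can be tried out in isolation. The following is a minimal sketch in Python, not a SpamAssassin run. The Received strings are hypothetical, and the two helper functions are names invented for this sketch.

```python
import re

# CRTC pattern from the rule above, written as a Python raw string.
CRTC = re.compile(
    r"from .+(61\.23[2-7]|222\.(3[2-9]|[45][0-9]|6[0-3]))"
    r"(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){2,2}[\)\] ]"
)


def crtc_hit(received):
    """True if the Received header value matches the CRTC pattern."""
    return CRTC.search(received) is not None


def china_uk_geo(chinanet, crtc, ukgeocities, bayes_99):
    """Boolean logic of the CHINAUKGEO meta rule."""
    return bool((chinanet or crtc) and ukgeocities and bayes_99)


# Hypothetical Received header values, invented for this sketch.
inside = "from unknown (HELO mail) (61.233.10.20) by mx.example.org"
outside = "from unknown (HELO mail) (61.231.10.20) by mx.example.org"
lookalike = "from unknown (HELO mail) (161.233.10.20) by mx.example.org"

for value in (inside, outside, lookalike):
    print(crtc_hit(value), value)

print(china_uk_geo(False, crtc_hit(inside), True, True))
```

One thing the sketch makes visible: the pattern is not anchored at the start of the address, so 161.233.10.20 also matches via its trailing "61.233.10.20". Requiring a non-digit before the first octet would be one way to tighten it.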
Re: Explosion in uk.geocities.com spam
Posted by Nix <ni...@esperi.org.uk>.
On Sat, 8 Oct 2005, Loren Wilton murmured woefully:
>> They use HTML and tables very cleverly, thus avoiding Bayes rules.
>> Basically it is an invisible table, using one row and several columns.
>> The first column contains the first letter of every line, separated by
>> "<BR>" and optionally some style-tags (b, i, etc.). Next column contains
>> several more characters for each line, etc.
>
> Leo. There are a good 9 or 10 variations on this now. The SARE rulesets
> have a number of rules that catch many of these, though not all of them.
For me, all of them are caught simply via
uri NIX_GEOCITIES /^http:\/\/[a-z0-9-]{1,30}\.geocities\.com\b/i
describe NIX_GEOCITIES Contains a URL in the geocities domain.
score NIX_GEOCITIES 1.1
(1.1 being the minimum score required for that plus a Bayes hit to push
it over 5.0.)
(This might not be a good rule if you often get nonspam from Geocities
addresses which your Bayes thinks is spam.)
--
`Next: FEMA neglects to take into account the possibility of
fire in Old Balsawood Town (currently in its fifth year of drought
and home of the General Grant Home for Compulsive Arsonists).'
--- James Nicoll
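For a feel of how broad the NIX_GEOCITIES pattern is, it can be exercised on a few URLs. This is a small sketch in Python, assuming `re` behaves like Perl for this pattern. The first URL is the example from this thread and the others are invented for illustration.

```python
import re

# Pattern from the NIX_GEOCITIES uri rule; the /i flag becomes re.I.
NIX_GEOCITIES = re.compile(r"^http://[a-z0-9-]{1,30}\.geocities\.com\b", re.I)


def hits(url):
    """True if the URL would match the NIX_GEOCITIES pattern."""
    return NIX_GEOCITIES.search(url) is not None


urls = [
    "http://uk.geocities.com/Hoge_Hoge/?Fuga=tekitou",  # example from this thread
    "http://IT.GEOCITIES.COM/whatever",                 # hypothetical, upper case
    "http://geocities.com/whatever",                    # hypothetical, no subdomain
    "http://www.example.com/geocities.com",             # hypothetical, other host
]

for url in urls:
    print(hits(url), url)
```

Any subdomain of geocities.com matches regardless of path, which is why the rule carries a low score and relies on Bayes to supply the rest.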
Re: Explosion in uk.geocities.com spam
Posted by Loren Wilton <lw...@earthlink.net>.
> They use HTML and tables very cleverly, thus avoiding Bayes rules.
> Basically it is an invisible table, using one row and several columns.
> The first column contains the first letter of every line, separated by
> "<BR>" and optionally some style-tags (b, i, etc.). Next column contains
> several more characters for each line, etc.
Leo. There are a good 9 or 10 variations on this now. The SARE rulesets
have a number of rules that catch many of these, though not all of them.
Loren
Re: Explosion in uk.geocities.com spam
Posted by Patrick von der Hagen <pa...@wudika.de>.
Kenneth Porter wrote:
> Lately I've been seeing quite a bit of uncaught spam with a link to
> uk.geocities.com. Using 3.1.0 release with net tests. Here's my
Same here, so I decided to add some points to uk.geocities.com-URLs. I'm
assigning high scores if such a link is on a line by itself and low
scores otherwise. Not really happy, though.
Harder to catch are some spams which start with some greeting, then give
a short table of drugs (the V-one, the C-one, etc.) and politely say goodbye.
They use HTML and tables very cleverly, thus avoiding Bayes rules.
Basically it is an invisible table, using one row and several columns.
The first column contains the first letter of every line, separated by
"<BR>" and optionally some style-tags (b, i, etc.). Next column contains
several more characters for each line, etc.
Bayes and text matching are currently quite useless, and DNS blacklists or
SURBL rarely hit. I'm quite lost....
--
CU,
Patrick.