Posted to users@spamassassin.apache.org by Markus Mayer <mm...@blastwave.org> on 2006/01/01 22:46:40 UTC

Parse and check URIs

Hi,

I am running SpamAssassin (spamd) with Exim 4.50.

When the Geocities spam picked up a couple of months ago, I searched
the Internet for how to check for specific URIs in message bodies and
added these rules:

uri GEOCITIES_CHECK1 /^http:\/\/..\.geocities\.com\//
score GEOCITIES_CHECK1  8.0
describe GEOCITIES_CHECK1 GEOCITIES_CHECK1, Body

uri GEOCITIES_CHECK2 /^http:\/\/geocities\.yahoo\.com\...\//
score GEOCITIES_CHECK2  8.0
describe GEOCITIES_CHECK2 GEOCITIES_CHECK2, Body

I also added country-rules like this to mark mails that come from, say
China:

header RCVD_FROM_CHINA eval:check_rbl_txt('country_cn','cn.countries.blackholes.us.')
describe RCVD_FROM_CHINA Received from China
tflags RCVD_FROM_CHINA net
score RCVD_FROM_CHINA 5.0

Since most Geocities mails came from China, these rules combined worked
really well. Unfortunately, now we are at the point where they are no
longer using simple Geocities links. Now it seems to be sites hosted
somewhere in China, with arbitrary domain-names. I have been getting
more and more of those lately. They slip right through Spamassassin
with very low spam-scores.

What I would like to do now is to somewhat combine the two approaches:
parse mail-bodies for URIs (just as it's done with the Geocities
example), then use the IP the link resolves to with the countries
blackhole-list to find out if the site is in China, Korea or any of the
other well-known spammer-countries (as done in the blackhole rules
above) and assign a spam-score based on that.

Is there a way to do that and is it a reasonable thing to do? The
mail-volume is fairly low, so the overhead involved should not be too
bad.

Many thanks.

-Markus

Re: Parse and check URIs

Posted by Derek Harding <de...@innovyx.com>.
Markus Mayer wrote:
> What I would like to do now is to somewhat combine the two approaches:
> parse mail-bodies for URIs (just as it's done with the Geocities
> example), then use the IP the link resolves to with the countries
> blackhole-list to find out if the site is in China, Korea or any of the
> other well-known spammer-countries (as done in the blackhole rules
> above) and assign a spam-score based on that.
>   
I wrote a plugin that checks the country of a URI. It's available at
http://wiki.apache.org/spamassassin/URICountryPlugin

Derek


Re: Parse and check URIs

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, January 1, 2006, 1:46:40 PM, Markus Mayer wrote:
> Now it seems to be sites hosted
> somewhere in China, with arbitrary domain-names. I have been getting
> more and more of those lately. They slip right through Spamassassin
> with very low spam-scores.

Do you have SURBLs enabled?  Most of the bad guy domains should
be on SURBLs.

  http://www.surbl.org/

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: Parse and check URIs

Posted by Markus Mayer <mm...@blastwave.org>.
Hi,

> Why? There is no relation between the client and the target (the URI).
> Change your rules to account for other URIs.

I'm not sure I follow. I was probably a bit unclear as to what I'm 
trying to achieve.

I want to be able to assign spam-scores based on the country/countries 
hosting the web-server (or servers) mentioned in the body of an e-mail 
(if there are any links in the mail).

I want to assign a spam-score of, say, 4.0 if a mail contains a URI
that resolves to a server in China, so a message like this would be flagged:

----- Sample -----

It's time to start obtaining the right curative and feeling better!
You're not going to believe what I discovered today!

http://<chinese-pharmacy-hoster> [actual URL removed as I don't want to 
help them spread their site, the IP resolves to 218.106.33.103, though]

----- Sample -----

To do what I want, SpamAssassin needs to parse the body for http://
URIs, resolve the hostname(s), look up the IP address(es) in the
blackholes list and assign a spam-score if any of them is listed.

The question is: Does this work? And how?

Thanks!

-Markus

Re: Parse and check URIs

Posted by mouss <us...@free.fr>.
Markus Mayer wrote:
> Hi,
> 
> I am running SpamAssassin (spamd) with Exim 4.50.
> 
> When the Geocities spams picked up a couple of months ago, I added
> these rules after searching the Internet as to how to check for
> specific URIs in message-bodies.
> 
> uri GEOCITIES_CHECK1 /^http:\/\/..\.geocities\.com\//
> score GEOCITIES_CHECK1  8.0
> describe GEOCITIES_CHECK1 GEOCITIES_CHECK1, Body

I guess your problem is that your rules match neither
	www.geocities-DEMUNGE.com
	otherplace.geocities-DEMUNGE.com
	*.geocities-DEMUNGED.uk
nor other free hosters.


> 
> uri GEOCITIES_CHECK2 /^http:\/\/geocities\.yahoo\.com\...\//
> score GEOCITIES_CHECK2  8.0
> describe GEOCITIES_CHECK2 GEOCITIES_CHECK2, Body
> 
> I also added country-rules like this to mark mails that come from, say
> China:
> 
> header RCVD_FROM_CHINA eval:check_rbl_txt('country_cn','cn.countries.blackholes.us.')
> describe RCVD_FROM_CHINA Received from China
> tflags RCVD_FROM_CHINA net
> score RCVD_FROM_CHINA 5.0
> 

This is only OK if you don't receive mail from there. Otherwise, you
should use a meta rule to combine both checks.
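
A minimal sketch of such a meta rule, reusing the rule names from the
original post. The name GEOCITIES_FROM_CHINA and all score values here are
illustrative assumptions, not tested settings:

```
# Lower the individual scores so that neither check is decisive alone
# (values are assumptions for illustration)
score GEOCITIES_CHECK1 1.0
score GEOCITIES_CHECK2 1.0
score RCVD_FROM_CHINA  1.0

# Hypothetical combined rule: fires only when a Geocities URI appears
# in a message that also hit the China country check
meta     GEOCITIES_FROM_CHINA ((GEOCITIES_CHECK1 || GEOCITIES_CHECK2) && RCVD_FROM_CHINA)
describe GEOCITIES_FROM_CHINA Geocities URI in mail received from China
score    GEOCITIES_FROM_CHINA 6.0
```

With this arrangement, legitimate mail relayed through China picks up only
the small individual score, while the combination still scores high.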

> Since most Geocities mails came from China, these rules combined worked
> really well. Unfortunately, now we are at the point where they are no
> longer using simple Geocities links. Now it seems to be sites hosted
> somewhere in China, with arbitrary domain-names. I have been getting
> more and more of those lately. They slip right through Spamassassin
> with very low spam-scores.
> 
> What I would like to do now is to somewhat combine the two approaches:
> parse mail-bodies for URIs (just as it's done with the Geocities
> example), then use the IP the link resolves to with the countries
> blackhole-list to find out if the site is in China, Korea or any of the
> other well-known spammer-countries (as done in the blackhole rules
> above) and assign a spam-score based on that.
>

Why? There is no relation between the client and the target (the URI).
Change your rules to account for other URIs.