Posted to users@spamassassin.apache.org by ev...@coolrunningconcepts.com on 2005/05/24 01:45:12 UTC

Additional SPAM recognition method

I'd like to contribute some research I've done on spam that doesn't use
traditional Bayes filters, other scoring methods, or traditional DNS BLs. It's
either spam or it's not, but I'd like to see this technique in SpamAssassin,
possibly with really high scores for things that this method says are "spam".

Advantages:
  * Simple method, no magic numbers
  * Targets the information spammers need you to see
  * Difficult to counter
  * Low false positive rate (almost none)
  * Incredibly high success rate

Disadvantages:
  * High network traffic overhead: a DNS cache is pretty much required. I use
djbdns's dnscache.
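Why a cache matters can be shown with a small in-process one. This is only a sketch: it swaps in Python's `functools.lru_cache` for a real caching resolver such as dnscache, it ignores DNS TTLs, and `make_cached_resolver` and `system_resolve` are hypothetical helper names, not part of SpamAssassin.

```python
import functools
import socket
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def system_resolve(hostname: str) -> Optional[str]:
    """Look a name up with the system resolver; None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        return None


def make_cached_resolver(resolve: Resolver = system_resolve, maxsize: int = 4096) -> Resolver:
    """Wrap a hostname -> IP lookup so repeated names are answered from memory.

    Unlike a real DNS cache this ignores record TTLs, so it suits a single
    batch run rather than a long-lived daemon. Negative answers (None) are
    cached too.
    """

    @functools.lru_cache(maxsize=maxsize)
    def _cached(name: str) -> Optional[str]:
        return resolve(name)

    def lookup(hostname: str) -> Optional[str]:
        # Normalise case and a trailing dot so equivalent names share one entry.
        return _cached(hostname.lower().rstrip("."))

    return lookup
```

Usage: build `lookup = make_cached_resolver()` once and pass `lookup` to every hostname and blacklist query, so a spam run that repeats the same few domains hits the network only once per name.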


Here's the algorithm:

  1  Decode any URL-encoding in the message
  2  Un-MIME the message
  3  Scan all parts of the message for URLs and email addresses (these can be
links, IMG tags, mailto: links, or even just text that looks like a web
address or email address). Do NOT scan the headers.
  4  For each address, resolve the hostname to an IP and then look up that IP
in your favorite DNS RBL - I use "sbl-xbl.spamhaus.org" as it caches the most,
but you can also add bl.spamcop.net and relays.ordb.net
  5  As soon as any lookup in step 4 comes back positive, the message is spam
and you can move on to the next message.
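Steps 3 to 5 can be sketched roughly as follows (Python 3.9+). The regular expressions are deliberately loose assumptions, not SpamAssassin's URI parser, and `extract_hosts`, `dnsbl_query_name` and `is_spam` are hypothetical names. The blacklist query follows the usual IPv4 DNSBL convention: reverse the address's octets, append the list's zone, and treat an A-record answer in 127.0.0.0/8 as "listed". The resolver is passed in as a function so a cache (or a test table) can be substituted.

```python
import re
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit

Resolver = Callable[[str], Optional[str]]

# Deliberately loose patterns (an assumption, not SpamAssassin's URI parser).
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def system_resolve(name: str) -> Optional[str]:
    """Resolve via the system resolver; None if the name does not resolve."""
    try:
        return socket.gethostbyname(name)
    except (OSError, UnicodeError):
        return None


def extract_hosts(text: str) -> list[str]:
    """Step 3: hostnames from URLs, bare www. addresses and e-mail addresses."""
    hosts = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if not url.lower().startswith("http"):
            url = "http://" + url  # bare "www.example.com" style address
        try:
            host = urlsplit(url).hostname  # lower-cased by urlsplit
        except ValueError:  # e.g. a malformed "[...]" IPv6 literal
            continue
        if host:
            hosts.append(host.rstrip("."))
    for match in EMAIL_RE.finditer(text):
        hosts.append(match.group(1).lower())
    return list(dict.fromkeys(hosts))  # de-duplicate, keep first-seen order


def dnsbl_query_name(ip: str, zone: str) -> str:
    """192.0.2.10 checked against zone Z becomes 10.2.0.192.Z (IPv4 only)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_spam(text: str, zones: Iterable[str], resolve: Resolver = system_resolve) -> bool:
    """Steps 4 and 5: resolve each host, query each list, stop at the first hit."""
    zones = list(zones)
    for host in extract_hosts(text):
        ip = resolve(host)
        if ip is None:
            continue  # unresolvable host: nothing to look up
        for zone in zones:
            answer = resolve(dnsbl_query_name(ip, zone))
            # DNSBLs answer with a 127.0.0.x address for listed IPs.
            if answer is not None and answer.startswith("127."):
                return True
    return False
```

Because `is_spam` returns on the first listed address, clean messages are the expensive case: every host is checked against every zone before the message is passed.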

The last email server I set up had an additional step: each MIME part was
written to a separate file (by ripmime) and passed to a small Perl program that
scanned it for URLs. This step also sent each file to ClamAV for virus scanning,
configured to search inside archives (but not mailboxes).
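The per-part split that ripmime performed can be approximated with Python's standard `email` package, shown here in place of ripmime rather than as a description of it. In this sketch (`decoded_text_parts` is a hypothetical name), each text part is first decoded from its transfer encoding and charset, and only then URL-decoded with `urllib.parse.unquote`, because percent-escapes inside a base64 or quoted-printable body are not visible until the transfer encoding is removed. Headers are never included, and non-text parts are skipped; in the setup described above they would go to ClamAV instead.

```python
import email
from email import policy
from urllib.parse import unquote


def decoded_text_parts(raw: bytes) -> list[str]:
    """Un-MIME a message and return the URL-decoded body of each text part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if part.get_content_maintype() != "text":
            continue  # attachments: candidates for a virus scanner instead
        try:
            body = part.get_content()  # undoes base64 / quoted-printable and the charset
        except (LookupError, UnicodeError):
            # Unknown or bogus charset: fall back to the raw bytes, decoded losslessly.
            body = (part.get_payload(decode=True) or b"").decode("latin-1")
        texts.append(unquote(body))  # %XX escapes, only after the MIME layer is gone
    return texts
```

Each returned string can then be handed to a URL and address extractor as in step 3.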

-- Evan Langlois

Re: Additional SPAM recognition method

Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, May 23, 2005 at 06:45:12PM -0500, evan@coolrunningconcepts.com wrote:
> Here's the algorithm:
> 
>   1  Decode any URL-encoding in the message
>   2  Un-MIME the message

Wrong order?

>   3  Scan all parts of the message for URLs and email addresses (this can be
> links, IMG tags, mailto:'s, or even just something that looks like a web
> address or email address).  Do NOT scan the headers.

get_uri_list().

>   4  For each address, resolve the hostname to an IP and then look up that IP
> in your favorite DNS RBL - I use "sbl-xbl.spamhaus.org" as it caches the most,
> but you can also add bl.spamcop.net and relays.ordb.net

SURBL?

-- 
Randomly Generated Tagline:
 Professor: Being captain is about intuition and heart. A good 
  captain can't have either one. That's why cold, logical Bender 
  is perfect for the job.
  Bender: Well, I do think of human life as expendable. 
