Posted to users@spamassassin.apache.org by Matt <lm...@gmail.com> on 2007/07/03 17:11:13 UTC

MD5 Hash of URL's

Why can't SpamAssassin take an MD5 hash of any URLs in a message
and check them against a database?  I just think it would help catch
things like geocities.com/spamer123/ or spamer123.tripod.com, etc.
It would also work for TinyURL links and the like.
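As a rough illustration of the proposal, here is a Python sketch that assumes a local set of MD5 digests of known spam URLs. The `KNOWN_BAD` set and the `url_md5` and `find_listed_urls` helpers are hypothetical (SpamAssassin has no such built-in rule), and the regular expression is a deliberately crude stand-in for real URI extraction.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Hypothetical "database": MD5 hex digests of URLs already seen in spam.
KNOWN_BAD = {
    hashlib.md5(b"http://geocities.com/spamer123/").hexdigest(),
}

# Crude URL matcher; real URI extraction from mail is far more involved.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_md5(url: str) -> str:
    """MD5 of the URL after lowercasing scheme and host (a simple canonical form)."""
    parts = urlsplit(url)
    canonical = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    if parts.query:
        canonical += "?" + parts.query
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_listed_urls(body: str) -> list[str]:
    """Return the URLs in a message body whose hash appears in KNOWN_BAD."""
    return [u for u in URL_RE.findall(body) if url_md5(u) in KNOWN_BAD]
```

For example, `find_listed_urls("Visit http://GeoCities.com/spamer123/ now")` flags the URL because it canonicalizes to the listed string. An exact-URL hash only matches that exact string, though, which is the weakness raised elsewhere in the thread.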

Matt

Re: MD5 Hash of URL's

Posted by Daniel J McDonald <da...@austinenergy.com>.
On Tue, 2007-07-03 at 10:11 -0500, Matt wrote:
> Why can't SpamAssassin take an MD5 hash of any URLs in a message
> and check them against a database?

Well, not MD5, but Whiplash type 8 signatures in Razor-2 are pretty
similar.

> I just think it would help catch
> things like geocities.com/spamer123/ or spamer123.tripod.com, etc.

Again, Razor does a fair job of finding these, as long as people report them.


> It would also work for TinyURL links and the like.

Google recently came out with an anti-malware API that uses various MD5
hashes of URIs, but they have not yet licensed it for the world, and I
only briefly thought about writing a plugin to call it.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com

Re: MD5 Hash of URL's

Posted by Messaging Directories <me...@gmail.com>.
Funny you should mention that.  I recently wrote a proof-of-concept plugin
that does exactly what you're talking about.  The point was to check URLs
against Google's Safe Browsing list, which was just announced.

Unfortunately, the results were rather poor.  The only hits I got were
on messages that already scored 10+ points, plus a few false positives:
last I checked, the main MySpace page was listed in the malware list (I
believe).

If anyone's interested, the (very rough) code for syncing Google's lists
and for checking a database containing the hashes is available.

Austin.

Re: MD5 Hash of URL's

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Matt wrote:
> Why can't SpamAssassin take an MD5 hash of any URLs in a message
> and check them against a database?

Because there isn't such a database?

Daryl

Re: MD5 Hash of URL's

Posted by "John D. Hardin" <jh...@impsec.org>.
On Thu, 5 Jul 2007, Kelson wrote:

> > On Tue, 3 Jul 2007, Matt wrote:
> > 
> > >> Why can't SpamAssassin take an MD5 hash of any URLs in a
> > >> message and check them against a database?  I just think it would
> > >> help catch things like geocities.com/spamer123/ or
> > >> spamer123.tripod.com, etc.
> 
> The concept might still be useful for specific known "grey" hosts
> with a mix of legit sites and spam sites (GeoCities, Tripod,
> Blogspot, etc.) where the URL patterns are known.  If you know
> the pattern is account.example.com, or example.com/account, then
> throw away the rest of the URL and list or look up the base pattern.

True. The plugin doing the analysis would have a list of domains and 
slice points (how much of the URL to discard before hashing). I 
presume the MD5 sum would be checked via a DNS lookup? That would be 
the only way to get a reasonable response time for new URLs to block.
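The slice-point idea might look something like the Python sketch below. Everything here is an assumption for illustration: the `SLICE_RULES` table, the `base_pattern` and `dns_query_name` helpers, and the zone `urihash.example.invalid` are hypothetical, not an existing SpamAssassin plugin or blocklist. It keeps only the account-level part of a URL on a known grey host, hashes that, and forms a query name in the style of DNS-based URI blocklists.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical slice rules for "grey" hosts: how much of the URL to keep.
#   "path"      -> keep the domain plus the first path segment (example.com/account)
#   "subdomain" -> keep only the account-level hostname (account.example.com)
SLICE_RULES = {
    "geocities.com": "path",
    "tripod.com": "subdomain",
}


def base_pattern(url: str) -> Optional[str]:
    """Reduce a URL on a known grey host to its account-level pattern."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for domain, rule in SLICE_RULES.items():
        if rule == "path" and host in (domain, "www." + domain):
            segments = [s for s in parts.path.split("/") if s]
            if segments:
                return f"{domain}/{segments[0]}"
        elif rule == "subdomain" and host.endswith("." + domain):
            # Keep only the label immediately to the left of the domain.
            label = host[: -(len(domain) + 1)].split(".")[-1]
            return f"{label}.{domain}"
    return None  # Not a grey host we know how to slice.


def dns_query_name(url: str, zone: str = "urihash.example.invalid") -> Optional[str]:
    """Build '<md5 of base pattern>.<zone>' for a blocklist-style DNS lookup."""
    pattern = base_pattern(url)
    if pattern is None:
        return None
    digest = hashlib.md5(pattern.encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"


print(dns_query_name("http://www.geocities.com/spamer123/index.html"))
```

A plugin would then resolve that name and treat any answer as a listing, the way existing URI blocklists work; the 32-character hex digest fits comfortably within the 63-character DNS label limit.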

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  If someone has a gun and is trying to kill you, it would be
  reasonable to shoot back with your own gun.
                                      -- the Dalai Lama, May 15, 2001
-----------------------------------------------------------------------
 2 days until Robert Heinlein's 100th birthday


Re: MD5 Hash of URL's

Posted by Kelson <ke...@speed.net>.
John D. Hardin wrote:
> On Tue, 3 Jul 2007, Matt wrote:
> 
>> Why can't SpamAssassin take an MD5 hash of any URLs in a
>> message and check them against a database?  I just think it would
>> help catch things like geocities.com/spamer123/ or
>> spamer123.tripod.com, etc.
> 
> Too easy to defeat using a URI with random parameters pointing to a
> PHP (or similar) page that ignores parameters (assuming you include
> parameters in the hash) or via wildcard DNS using random third- or
> fourth-level hostnames.

Even the path could be made random if they use mod_rewrite or 
equivalent.  If http://example.com/random/path/gets/ignored always 
serves up the contents of salespitch.html, they can generate as many 
URLs as they want.

The concept might still be useful for specific known "grey" hosts with a
mix of legit sites and spam sites (GeoCities, Tripod, Blogspot, etc.)
where the URL patterns are known.  If you know the pattern is
account.example.com, or example.com/account, then throw away the rest of
the URL and list or look up the base pattern.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: MD5 Hash of URL's

Posted by "John D. Hardin" <jh...@impsec.org>.
On Tue, 3 Jul 2007, Matt wrote:

> Why can't SpamAssassin take an MD5 hash of any URLs in a
> message and check them against a database?  I just think it would
> help catch things like geocities.com/spamer123/ or
> spamer123.tripod.com, etc.

Too easy to defeat using a URI with random parameters pointing to a
PHP (or similar) page that ignores parameters (assuming you include
parameters in the hash) or via wildcard DNS using random third- or
fourth-level hostnames.
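A short Python sketch of why exact hashing breaks down under these tricks. The `evasive_variants` and `md5_hex` helpers and the example.com URLs are hypothetical, and a simple counter stands in for the random tokens a spammer would generate. Each URL would reach the same spam page, yet every one produces a different digest.

```python
import hashlib


def evasive_variants(n: int) -> list[str]:
    """Build 2*n distinct URLs that could all serve the same spam page."""
    urls = []
    for i in range(n):
        token = f"x{i:07d}"  # stands in for a random token
        # Random query parameter, ignored by a PHP (or similar) landing page.
        urls.append(f"http://example.com/pitch.php?id={token}")
        # Random third-level hostname, answered by a wildcard DNS record.
        urls.append(f"http://{token}.example.com/pitch.php")
    return urls


def md5_hex(url: str) -> str:
    return hashlib.md5(url.encode("utf-8")).hexdigest()


variants = evasive_variants(3)
digests = {md5_hex(u) for u in variants}
# Each variant hashes differently, so a hash list built from URLs already
# seen in spam will not match the URLs in the next message.
print(len(variants), len(digests))
```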

--
-----------------------------------------------------------------------
  Is there a Special Olympics for terrorists going on in the UK this
  week?                                 -- Bruce Schneier, 07/02/2007
-----------------------------------------------------------------------
 Tomorrow: The 231st anniversary of the Declaration of Independence