Posted to users@spamassassin.apache.org by Giampaolo Tomassoni <g....@libero.it> on 2007/09/26 03:41:49 UTC

URIWhois plugin

Dear all,

I have just finished version 0.01 of the URIWhois plugin.

Its main purpose is to detect spam containing URIs that point to sites in
brand-new domains, in domains whose whois and DNS records conflict, or in
domains served by specific DNS servers.

So, it is meant to do something I believe someone else is already doing in
their SA, but this plugin is completely asynchronous in order to minimize
any performance impact.

Also, it caches whois results. But the best thing is that, if you run several
SA copies on the same computer (for example, if you use amavis), when one copy
is asked to issue a whois query for a domain which another copy is already
querying, the first copy waits for the results obtained by the other one!
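
A minimal sketch of that "wait for the other copy's result" idea. This is not
the plugin's code: it assumes threads inside one Python process rather than
separate SA processes sharing a BerkeleyDB file, and the class name
CoalescingCache and the lookup callable are hypothetical, chosen only for
illustration.

```python
import threading


class CoalescingCache:
    """Cache in which concurrent callers share one in-flight lookup."""

    def __init__(self, lookup):
        self._lookup = lookup      # hypothetical callable: domain -> result
        self._lock = threading.Lock()
        self._results = {}         # domain -> cached result
        self._pending = {}         # domain -> Event set when the lookup ends

    def get(self, domain):
        with self._lock:
            if domain in self._results:
                return self._results[domain]
            event = self._pending.get(domain)
            owner = event is None
            if owner:
                # Nobody is querying this domain yet: this caller does it.
                event = threading.Event()
                self._pending[domain] = event
        if owner:
            try:
                result = self._lookup(domain)
                with self._lock:
                    self._results[domain] = result
            finally:
                with self._lock:
                    del self._pending[domain]
                event.set()        # wake up every waiting caller
            return result
        # Another caller is already querying: wait for its result.
        event.wait()
        with self._lock:
            # None here means the owner's lookup failed.
            return self._results.get(domain)
```

With this shape, only the first caller for a given domain performs the slow
query; later callers either hit the cache or block until that query finishes.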

Finally, it is easy to configure to suit your own needs: you can even avoid
whois queries altogether by not using some of the rules. See the perldoc for
more details.

Please note this is not stable stuff. It is... well, what's before alpha?

The URIWhois plugin needs SA v.3.002003 (or above?) and would certainly
benefit from a fairly recent copy of BerkeleyDB (I'm using 0.31 with v.4.5 of
the berkeleydb libraries).

You can download it from here:
http://www.tomassoni.biz/download/URIWhois-0.01.tar.bz2 (come on, it is 17
KB...).

Untar it in the /etc/spamassassin directory and you are (almost) done.
Review the settings in the /etc/spamassassin/URIWhois.cf file.

I would like you to review this code, since I'm not very familiar with
the async thingeries in SA.

Enjoy!

Giampaolo

Re: New domains (was: URIWhois plugin)

Posted by Dave Pooser <da...@pooserville.com>.

> 2.  As mentioned above the whois data is sometimes populated *after* the
> domains start appearing in spams.  Remember that the whois data is still
> mostly batch processed once or twice a day.  Many of the TLD zone files
> (where the DNS delegations actually come from) are updated in near real
> time.  IOW the whois data can significantly lag usable domains.

I'd be tempted to try using the whois data as the basis for a RHSBL
not-blacklist (i.e: if the domain name DOESN'T match the RHSBL, then it's
not old enough to accept mail from). However, as a practical matter we'd be
talking about a multi-gigabyte zone file and I expect that would prove a
little problematic for whatever sucker^H^H^H kind soul ended up hosting the
thing. 
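
A rough sketch of the inverted ("not-blacklist") logic, under stated
assumptions: the zone name known-domains.example is made up, and resolve is a
hypothetical callable that returns an address string when the name exists in
the zone and None when it does not. No such list is known to exist.

```python
def not_old_enough(domain, zone, resolve):
    """Inverted RHSBL check: a domain absent from the zone counts as too new.

    `resolve` is a hypothetical lookup callable: it returns an address
    string for names present in the zone and None for names that are not.
    """
    # RHSBL-style query name: the domain prepended to the list's zone.
    query_name = f"{domain.rstrip('.').lower()}.{zone}"
    return resolve(query_name) is None
```

Note that the sense is reversed compared with a normal blacklist: a failed
lookup is the suspicious case, so a resolver outage would flag everything.
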
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving
safely in one pretty and well-preserved piece, but to slide across the
finish line broadside, thoroughly used up, worn out, leaking oil, and
shouting GERONIMO!!!" -- Bill McKenna

Re: New domains (was: URIWhois plugin)

Posted by Jeff Chan <je...@surbl.org>.

Quoting Jonas Eckerman <jo...@frukt.org>:

> (The idea below is not mine, someone else (I'm sorry, but I
> forgot who) wrote about it here (I think) before.)
>
> Giampaolo Tomassoni wrote:
>
> > brand-new domains,
>
> Something that could work for this without the problems inherent
> in using whois or registry databases is to simply check how long
> ago a domain was first seen being used for sending mail or in
> URIs in mail. (People might already be doing this locally, but
> doing it centralized could work better.)
>
> A specialized DNS server could be done for this. It'd work
> something like this:
>
> 1: It receives a query.
>
> 2: It checks its database.
>
> 3.a, found in database:
> * Return result indicating how long ago domain was added.
>
> 3.b: not found:
> * Adds the domain to the database.
> * Return result indicating new domain.

This is a very good idea, and could be used as a partial substitute for the Day
Old Bread list.  I particularly like that it could be relatively lightweight
and automatic.

Naturally there are some complications:

1.  What happens if the domain is re-registered before it's delisted?

2.  What happens with tasting (kited) domains that are used for 5 days, then
unregistered, re-registered, etc.?

3.  It wouldn't distinguish good domains from bad (but nor does DOB).

As Giampaolo points out, it could be fairly trivially poisoned by the bad guys
submitting misleading queries.  Remember that many spam URI domains appear
before they exist in the whois data or are even delegated from the TLD zones.
(The former is a problem for URIWhois too, it would seem.)

The quick answer to these issues is that whois is only partially useful.

1.  The contact information on spam domains is often false, misleading or stolen
as part of identity theft, so it's not always useful.  (That said, there are
sometimes useful patterns in the contact info.)

2.  As mentioned above the whois data is sometimes populated *after* the domains
start appearing in spams.  Remember that the whois data is still mostly batch
processed once or twice a day.  Many of the TLD zone files (where the DNS
delegations actually come from) are updated in near real time.  IOW the whois
data can significantly lag usable domains.

3.  Nameserver information in whois can be misleading:

  A.  The nameservers and/or their IPs are sometimes changed after the domain is
initially registered, either before or after the domain appears in messages.

  B.  The nameservers in the whois are not always the ones that finally resolve
a domain.  There can be long chains of delegations before getting a final
answer.  Sometimes only 1 of the whois-listed nameservers actually works. 
Sometimes none of them work or exist only in DNS caches.

  C.  Spam nameservers have been known to give misleading responses or block
access to anti-spammers, malware researchers, etc.

There are many other factors besides domain age that can be used to help
identify spam domains, and SURBL does use many of them, though we expect to use
them more effectively going forward.  This is largely without reference to the
whois data, which for some of the reasons mentioned above is not always
reliable or useful.

Domain age is only one factor, and considering other factors can make for a
significantly more useful blacklist.

Cheers,

Jeff C.

Re: New domains

Posted by Jonas Eckerman <jo...@frukt.org>.

mouss wrote:
 > Wouldn't this be reinventing /etc/hosts?

No.

The hosts file contained all individual *hosts* a machine needed 
to know about, and still contains all hosts a machine needs to 
know about without using the DNS.

This database would contain all *domains* that have been used in mail.

 > I mean, if you list all domains, you end up with a huge
 > database...

Yes. A huge (and dynamic) database.

Regards
/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/

Re: New domains

Posted by mouss <mo...@netoyen.net>.

Jonas Eckerman wrote:
> (The idea below is not mine, someone else (I'm sorry, but I forgot 
> who) wrote about it here (I think) before.)
>
> Giampaolo Tomassoni wrote:
>
>> brand-new domains,
>
> Something that could work for this without the problems inherent in
> using whois or registry databases is to simply check how long ago a
> domain was first seen being used for sending mail or in URIs in mail.
> (People might already be doing this locally, but doing it centralized
> could work better.)
>
> A specialized DNS server could be done for this. It'd work something 
> like this:
>
> 1: It receives a query.
>
> 2: It checks its database.
>
> 3.a, found in database:
> * Return result indicating how long ago domain was added.
>
> 3.b: not found:
> * Adds the domain to the database.
> * Return result indicating new domain.
>
> (It might be a good idea to also save last queried time for each
> domain (meaning 3.a will need to update the database) in order to be
> able to clean out domains that haven't been seen for a long time.)
>
> In order to be effective, such a DNS list must be used by a lot of
> different systems spread all over the world and used by different types
> of organizations.
>
> It will also take time until it can be used in an effective
> manner, so enough people would have to be using it for some time with
> very low scores just to seed it.

Wouldn't this be reinventing /etc/hosts? I mean, if you list all 
domains, you end up with a huge database...  or am I missing something?

>
> I could probably throw together a proof-of-concept DNS thingy in perl 
> for this, but I don't have the hardware to host it for production use, 
> nor the time to do it properly (perl would probably not be the best 
> language to do it in).
>
> The best way might be to actually implement this in an existing
> DNS-list server, so it could be seeded through queries for that list.
>
> If, just as an example, SURBL did this, the list would be seeded by
> all systems already using SURBL lists, and the results could be
> included in multi.surbl.org.
>
> (Please note: I have no idea whether implementing this in SURBL's DNS
> system is feasible in any way (with regard to software, hardware, lunch
> breaks, or whatever); it was just an example.)
>
> Regards
> /Jonas

Re: R: New domains (was: URIWhois plugin)

Posted by Jeff Chan <je...@surbl.org>.

Quoting Kenneth Porter <sh...@sewingwitch.com>:

> --On Thursday, September 27, 2007 7:05 PM +0200 Giampaolo Tomassoni
> <g....@libero.it> wrote:
>
> > The only problem is that a spammer could "query" it days before it will
> > bulk send, thereby impairing the effectiveness of such an approach.
> >
> > I think we need some "official" data like the domain's creation time: at
> > least, spammers will be forced to buy domains a couple of months before
> > using them...
>
> I recall reading about a scam used by registrars to buy up large blocks of
> names, then cancel the purchase and incur no penalty, then buy them up
> again, over and over, to hold them off the market at no cost. How would you
> detect such a thing?

It's called domain tasting or domain kiting:

  http://en.wikipedia.org/wiki/Domain_tasting
  http://www.bobparsons.com/DomainKiting.html

Jeff C.

Re: R: New domains (was: URIWhois plugin)

Posted by Kenneth Porter <sh...@sewingwitch.com>.

--On Thursday, September 27, 2007 7:05 PM +0200 Giampaolo Tomassoni 
<g....@libero.it> wrote:

> The only problem is that a spammer could "query" it days before it will
> bulk send, thereby impairing the effectiveness of such an approach.
>
> I think we need some "official" data like the domain's creation time: at
> least, spammers will be forced to buy domains a couple of months before
> using them...

I recall reading about a scam used by registrars to buy up large blocks of 
names, then cancel the purchase and incur no penalty, then buy them up 
again, over and over, to hold them off the market at no cost. How would you 
detect such a thing?

Re: R: New domains

Posted by Jonas Eckerman <jo...@frukt.org>.

Giampaolo Tomassoni wrote:

> The only problem is that a spammer could "query" it days before it will bulk
> send, thereby impairing the effectiveness of such an approach.

So we check (with DNS) whether an MX record exists for the domain before
adding it to the database. Something like this:

1: It receives a query.

2: It checks its database.

3.a, found in database:
* Return result indicating how long ago domain was added.

3.b: not found:
* Adds the domain to the database if an MX record exists in DNS.
* Return result indicating new domain.
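
The steps above could be sketched roughly as follows. This is only an
illustration: the function name handle_query, the plain dict used as the
database, and the has_mx callable (standing in for a real DNS MX lookup) are
all assumptions, not an existing implementation.

```python
import time


def handle_query(db, domain, has_mx, now=None):
    """Return the domain's age in seconds, or None if it is new.

    `db` maps domain -> first-seen timestamp. `has_mx` is a hypothetical
    callable that reports whether an MX record exists for the domain.
    """
    now = time.time() if now is None else now
    key = domain.rstrip(".").lower()
    if key in db:
        # Step 3.a: known domain, report how long ago it was added.
        return now - db[key]
    # Step 3.b: unknown domain, add it only if an MX record exists.
    if has_mx(key):
        db[key] = now
    return None
```

A domain queried before it has an MX record is reported as new every time
and never stored, so pre-querying it does not start its clock.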

/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/

R: New domains (was: URIWhois plugin)

Posted by Giampaolo Tomassoni <g....@libero.it>.

> -----Original Message-----
> From: Jonas Eckerman [mailto:jonas_lists@frukt.org]
> Sent: Thursday, 27 September 2007 18:17
> To: users@spamassassin.apache.org
> Subject: New domains (was: URIWhois plugin)
> 
> (The idea below is not mine, someone else (I'm sorry, but I
> forgot who) wrote about it here (I think) before.)
> 
> Giampaolo Tomassoni wrote:
> 
> > brand-new domains,
> 
> Something that could work for this without the problems inherent
> in using whois or registry databases is to simply check how long
> ago a domain was first seen being used for sending mail or in
> URIs in mail. (People might already be doing this locally, but
> doing it centralized could work better.)
> 
> A specialized DNS server could be done for this. It'd work
> something like this:
> 
> 1: It receives a query.
> 
> 2: It checks its database.
> 
> 3.a, found in database:
> * Return result indicating how long ago domain was added.
> 
> 3.b: not found:
> * Adds the domain to the database.
> * Return result indicating new domain.

This is really a good idea.

The only problem is that a spammer could "query" it days before it will bulk
send, thereby impairing the effectiveness of such an approach.

I think we need some "official" data like the domain's creation time: at
least, spammers will be forced to buy domains a couple of months before
using them...

Giampaolo


> 
> (It might be a good idea to also save last queried time for each
> domain (meaning 3.a will need to update the database) in order to
> be able to clean out domains that haven't been seen for a long time.)
> 
> In order to be effective, such a DNS list must be used by a lot
> of different systems spread all over the world and used by
> different types of organizations.
> 
> It will also take time until it can be used in an effective
> manner, so enough people would have to be using it for some time
> with very low scores just to seed it.
> 
> I could probably throw together a proof-of-concept DNS thingy in
> perl for this, but I don't have the hardware to host it for
> production use, nor the time to do it properly (perl would
> probably not be the best language to do it in).
> 
> The best way might be to actually implement this in an existing
> DNS-list server, so it could be seeded through queries for that
> list.
> 
> If, just as an example, SURBL did this, the list would be seeded
> by all systems already using SURBL lists, and the results could
> be included in multi.surbl.org.
> 
> (Please note: I have no idea whether implementing this in SURBL's DNS
> system is feasible in any way (with regard to software, hardware, lunch
> breaks, or whatever); it was just an example.)
> 
> Regards
> /Jonas
> --
> Jonas Eckerman, FSDB & Fruktträdet
> http://whatever.frukt.org/
> http://www.fsdb.org/
> http://www.frukt.org/

New domains (was: URIWhois plugin)

Posted by Jonas Eckerman <jo...@frukt.org>.

(The idea below is not mine, someone else (I'm sorry, but I 
forgot who) wrote about it here (I think) before.)

Giampaolo Tomassoni wrote:

> brand-new domains,

Something that could work for this without the problems inherent 
in using whois or registry databases is to simply check how long 
ago a domain was first seen being used for sending mail or in
URIs in mail. (People might already be doing this locally, but
doing it centralized could work better.)

A specialized DNS server could be done for this. It'd work 
something like this:

1: It receives a query.

2: It checks its database.

3.a, found in database:
* Return result indicating how long ago domain was added.

3.b: not found:
* Adds the domain to the database.
* Return result indicating new domain.

(It might be a good idea to also save last queried time for each
domain (meaning 3.a will need to update the database) in order to
be able to clean out domains that haven't been seen for a long time.)
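
A minimal in-memory sketch of the lookup steps and the last-queried cleanup
described above. It only models the database behind such a DNS server, not
the DNS protocol itself; the class name FirstSeenDB, its method names, and
the injectable clock are hypothetical and exist purely for illustration.

```python
import time


class FirstSeenDB:
    """In-memory stand-in for the database behind the proposed DNS list."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._domains = {}   # domain -> [first_seen, last_queried]

    def query(self, domain):
        """Return the domain's age in seconds; 0.0 means it was just added."""
        now = self._clock()
        key = domain.rstrip(".").lower()
        entry = self._domains.get(key)
        if entry is None:
            # Step 3.b: unknown domain, record it and report it as new.
            self._domains[key] = [now, now]
            return 0.0
        # Step 3.a: known domain, refresh last-queried and report its age.
        entry[1] = now
        return now - entry[0]

    def purge(self, max_idle):
        """Drop domains not queried for more than max_idle seconds."""
        now = self._clock()
        stale = [d for d, (_, last) in self._domains.items()
                 if now - last > max_idle]
        for d in stale:
            del self._domains[d]
        return len(stale)
```

A real server would encode the age in the DNS answer (for example as
different return addresses per age bucket), which is left out here.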

In order to be effective, such a DNS list must be used by a lot
of different systems spread all over the world and used by
different types of organizations.

It will also take time until it can be used in an effective
manner, so enough people would have to be using it for some time
with very low scores just to seed it.

I could probably throw together a proof-of-concept DNS thingy in 
perl for this, but I don't have the hardware to host it for 
production use, nor the time to do it properly (perl would 
probably not be the best language to do it in).

The best way might be to actually implement this in an existing
DNS-list server, so it could be seeded through queries for that
list.

If, just as an example, SURBL did this, the list would be seeded
by all systems already using SURBL lists, and the results could
be included in multi.surbl.org.

(Please note: I have no idea whether implementing this in SURBL's DNS
system is feasible in any way (with regard to software, hardware, lunch
breaks, or whatever); it was just an example.)

Regards
/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/

Re: URIWhois plugin

Posted by "Michele Neylon :: Blacknight" <mi...@blacknight.ie>.

Jeff Chan wrote:

> 
> 
> In principle, this is a good concept; using domain whois data to spot bad
> domains can be useful.
> 
> In practice, it's a really, really, really bad idea since the public whois
> infrastructure is not designed for this kind of high volume use.  If many
> people did it, it would result in an effective DDOS against whois service, even
> with caching and delays.  Please don't do it.
> 
> It's much better to let URI blacklist operators such as SURBL handle these
> domains in a centralized way and publish the domain data via our four dozen DNS
> servers, etc.
> 
> Jeff C.

The other thing is that a LOT of registrars and registries rate limit 
whois lookups, so it won't work after you've done X lookups in a 24 hour 
period ....
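
For anyone who queries whois anyway, a client-side budget is one way to stay
under such a limit. The sketch below is an assumption-laden illustration:
the class name WhoisBudget is made up, and the actual limits (the "X" above)
differ per registrar and registry and are not given here.

```python
import time
from collections import deque


class WhoisBudget:
    """Allow at most `limit` lookups per rolling `window` seconds."""

    def __init__(self, limit, window=24 * 3600, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._stamps = deque()   # timestamps of lookups inside the window

    def try_acquire(self):
        """Return True and record a lookup if the budget allows one."""
        now = self._clock()
        # Forget lookups that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._limit:
            return False
        self._stamps.append(now)
        return True
```

A caller would check try_acquire() before each whois query and skip the
query (or fall back to cached data) when it returns False.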


-- 
Mr Michele Neylon
Blacknight Solutions
Hosting & Colocation, Brand Protection
http://www.blacknight.ie/
http://blog.blacknight.ie/
Tel. 1850 929 929
Intl. +353 (0) 59  9183072
Direct Dial: +353 (0)59 9183090
Fax. +353 (0) 1 4811 763
-------------------------------
Blacknight Internet Solutions Ltd, Unit 12A,Barrowside Business
Park,Sleaty Road,Graiguecullen,Carlow,Ireland  Company No.: 370845

Re: R: URIWhois plugin

Posted by Jeff Chan <je...@surbl.org>.

Quoting Giampaolo Tomassoni <g....@libero.it>:

> How do they "handle these domains in a centralized way"? Do they simply
> relay a whois request for not-yet-seen domains? If so, they need to tune
> their whois parsers a bit: dob.sibl.support-intelligence.net, for example,
> reports both libero.it and tomassoni.biz as Day Old Bread, although they
> have been around for years...

Day Old Bread has had some errors before.  All of .org was blacklisted for a
while for example.  Aside from the occasional errors, it's a useful list in
concept since it shows recently registered domains.

However I was referring to URI blacklists such as SURBL.org, not DOB.  SURBL
doesn't catch everything, but it catches much, and we seek to catch more.  It's
much better if we do the whois and other queries in a centralized way, do a lot
of (quick) testing, then distribute the resulting data as a blacklist which
everyone can use as relatively efficient DNS queries or rsync files.

Jeff C.

R: URIWhois plugin

Posted by Giampaolo Tomassoni <g....@libero.it>.

> -----Original Message-----
> From: Jeff Chan [mailto:jeffc@surbl.org]
> 
> In principle, this is a good concept; using domain whois data to spot
> bad domains can be useful.
> 
> In practice, it's a really, really, really bad idea since the public
> whois infrastructure is not designed for this kind of high volume use.
> If many people did it, it would result in an effective DDOS against
> whois service, even with caching and delays.  Please don't do it.
> 
> It's much better to let URI blacklist operators such as SURBL handle
> these domains in a centralized way and publish the domain data via our
> four dozen DNS servers, etc.

How do they "handle these domains in a centralized way"? Do they simply
relay a whois request for not-yet-seen domains? If so, they need to tune
their whois parsers a bit: dob.sibl.support-intelligence.net, for example,
reports both libero.it and tomassoni.biz as Day Old Bread, although they
have been around for years...

Giampaolo

> 
> Jeff C.

Re: URIWhois plugin

Posted by Jeff Chan <je...@surbl.org>.

Quoting Giampaolo Tomassoni <g....@libero.it>:

> Dears,
>
> well, I just did version 0.01 of the URIWhois plugin.
>
> Its purpose is mainly to detect some spam containing URIs to sites in
> brand-new domains, or having some conflict in whois and dns records, or
> being driven by specific dns servers.
>
> So, it is meant to do something I believe someone else is already doing in
> their SA, but this plugin is completely asynchronous in order to minimize
> any performance impact.
>
> Also, it caches whois results. But the best thing is that, if you run several
> SA copies on the same computer (for example, if you use amavis), when one copy
> is asked to issue a whois query for a domain which another copy is already
> querying, the first copy waits for the results obtained by the other one!
>
> Finally, it is easily configurable to adapt to your own mileage: you may
> even avoid whois queries by not using some of the rules. More details by
> perldoc.
>
> Please note this is not stable stuff. It is... well, what's before alpha?
>
> The URIWhois plugin needs SA v.3.002003 (or above?) and would surely
> appreciate a quite recent copy of BerkeleyDB (I'm using 0.31 with v.4.5 of
> the berkeleydb libraries).
>
> You can download it from here:
> http://www.tomassoni.biz/download/URIWhois-0.01.tar.bz2 (come on, it is 17
> KB...).
>
> Untar it in the /etc/spamassassin directory and you are (almost) done.
> Review settings from the /etc/spamassassin/URIWhois.cf file.
>
> I would like to have this code reviewed by you, since I'm not that much used
> to the async thingeries in SA.
>
> Enjoy!
>
> Giampaolo



In principle, this is a good concept; using domain whois data to spot bad
domains can be useful.

In practice, it's a really, really, really bad idea since the public whois
infrastructure is not designed for this kind of high volume use.  If many
people did it, it would result in an effective DDOS against whois service, even
with caching and delays.  Please don't do it.

It's much better to let URI blacklist operators such as SURBL handle these
domains in a centralized way and publish the domain data via our four dozen DNS
servers, etc.

Jeff C.