Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2008/04/26 18:49:58 UTC
Starting a URIBL - Howto? [OT]
I was just wondering from those of you who have done it - how to start a
URIBL. I'm guessing the process (simplified) is:
1) Mine messages for links
2) Subtract out anything matching a fairly large white list
So my first question here is - what do most of you use to mine the
links in a message? Can someone point me in the right direction?
Also - I'm willing to work with and share data with others who are
already doing this.
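
The two steps above could be sketched roughly as follows. This is only an illustration, not how any existing URIBL does it: the regular expression, the `WHITELIST` contents, and the function names are all assumptions made up for the example, and a real list would normally reduce hosts to registered domains and use a far larger whitelist.

```python
import re
from email import message_from_string
from urllib.parse import urlsplit

# Hypothetical whitelist for illustration; a real one would be far larger.
WHITELIST = {"apache.org", "w3.org"}

# Deliberately simple pattern: http(s) URLs up to whitespace or a closing delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def extract_hosts(raw_message):
    """Step 1: return the set of hostnames linked from the text parts of a message."""
    msg = message_from_string(raw_message)
    hosts = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        try:
            text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        except LookupError:  # unknown charset label in the message
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(url).hostname
            except ValueError:  # malformed URL, e.g. a broken IPv6 literal
                continue
            if host:
                hosts.add(host.lower().rstrip("."))
    return hosts


def is_whitelisted(host, whitelist=WHITELIST):
    """True if the host or any parent domain of it is on the whitelist."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def candidate_hosts(raw_message, whitelist=WHITELIST):
    """Step 2: mined hosts minus anything matching the whitelist."""
    return {h for h in extract_hosts(raw_message) if not is_whitelisted(h, whitelist)}
```

Matching parent domains against the whitelist keeps a whitelisted domain from leaking through under a different subdomain.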
Re: Starting a URIBL - Howto? [OT]
Posted by Rob McEwen <ro...@invaluement.com>.
Dallas Engelken wrote:
> No, you're right, thats not fair. If I compare only recent reactive
> listings, minus the subdomain hosters that we list, you hit about 60%
> whereas before it was more like 27%.
>
> imvURI stats from last 5000 URIBL black listings
> -> 2981 hits
> -> 2019 misses
Dallas, I've made some recent *substantial* improvements to ivmURI. (1)
I've added *several* new spam sources... it was always a weakness of
ivmURI that the raw data that fed ivmURI wasn't "wide" enough. That
incoming data is much wider now! ...and... (2) I improved ivmSIP's
response time (previously, it was getting bogged down in some auditing
tasks that were delaying writes to the rsync files... that has been fixed).
RESULTS...
stats from 5/23/2008 (a few minutes ago).
---------------------
322/500 (ivmURI hits from the latest 500 URIBL listings)
(whereas a couple of tests in April showed 186/500 and 225/500)
301/500 (URIBL hits from the latest 500 ivmURI listings)
NOTE: to compare apples-to-apples, subdomain listings in URIBL were removed
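
For reference, 322/500 is 64.4% and 301/500 is 60.2%, while the April runs of 186/500 and 225/500 were 37.2% and 45.0%. A cross-check of this kind could be sketched as below; the hoster list and the function names are assumptions for illustration, not the actual tooling used for these numbers.

```python
# Illustrative hoster list; the real set of excluded subdomain hosters is not given here.
SUBDOMAIN_HOSTERS = ("geocities.com", "blogspot.com")


def strip_hosted(entries, hosters=SUBDOMAIN_HOSTERS):
    """Drop entries that are a subdomain hoster or live under one."""
    return [
        e for e in entries
        if not any(e == h or e.endswith("." + h) for h in hosters)
    ]


def overlap(latest, other_list):
    """Return (hits, total, rate): how many of `latest` also appear in `other_list`."""
    other = set(other_list)
    hits = sum(1 for e in latest if e in other)
    total = len(latest)
    return hits, total, (hits / total if total else 0.0)
```

Running `overlap` in both directions, with the same delay between snapshot and check each time, gives the pair of figures reported above.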
Let me know if you'd like a snapshot of ivmURI for your own analysis of
these latest improvements.
ALSO...
In spite of your off-list explanation, I'm STILL confused about what you
mean when you refer to URIBL's *pro-active listings*???
You must be either referring to:
(A) Listings *currently* in URIBL-GOLD, but *not* *yet* in URIBL-BLACK
--or--
(B) Listings *currently* in URIBL-BLACK which were *previously* listed
in URIBL-GOLD
Which is it? "A" or "B"? (or something else?)
OF COURSE: The silly part about all these stats is that the *superior*
comparison between DNSBLs is "hit rates" on spams sent to mail servers
combined with low FP rates. It is possible for a DNSBL to have far fewer
listings, but, in "real world" testing, hit on higher numbers of spams
with fewer FPs.
Rob McEwen
Re: Re: Starting a URIBL - Howto? [OT]
Posted by Dallas Engelken <da...@uribl.com>.
Rob McEwen wrote:
> Dallas Engelken wrote:
>> Yes, of course, but you're results.txt is biased as it only shows
>> where imvURI hits.
>>
>> Based on the last 20k adds to URIBL, it appears to me that imvURI
>> has less coverage?
>> <snip>:
> Dallas,
>
> Yes, you are right!
>
> URIBL *does* cast a wider net than ivmURI.
>
> So, in general, I agree with your statement that ivmURI has less
> coverage than URIBL. But I'm confused about your stats... and they
> looks really weird. (but maybe I'm just not understanding them?)
>
> So here is what I did.
>
> I took the last 500 additions to URIBL, (not including geocity and
> blogspot items... so that this comparison would compare apples to
> apples!) I then ran those against ivmURI.
>
> 186 of the 500 latest additions to URIBL were also found in ivmURI.
>
> I then reversed this testing and ran URIBL against the last 500
> additions to ivmURI.
>
> 328 of the latest 500 additions to ivmURI were listed on URIBL.
>
> So yes, basically, you're right, URIBL does have greater coverage than
> ivmURI.
>
> Your point is well made. For the most part, URIBL casts a wider net
> than ivmURI. Also, if you were to include geocity and blogspot hits,
> of course, that would throw the comparison wildly in URIBL's favor...
> but I'm not so sure that would be a fair comparison.
No, you're right, that's not fair. If I compare only recent reactive
listings, minus the subdomain hosters that we list, you hit about 60%
whereas before it was more like 27%.
ivmURI stats from last 5000 URIBL black listings
-> 2981 hits
-> 2019 misses
>
> (In both tests, I checked against the 2nd list just about 2-3 minutes
> after grabbing the lastest data from first list. This is important as
> I was seeing those stats quickly grow for BOTH after my initial
> collection of stats... because items not yet in both lists are
> continuously getting into the other list fast. So timing is mission
> critical in this kind of testing and the time between gathering and
> checking MUST be the same both ways.)
>
> However, I think you missed my point about
> http://invaluement.com/results.txt
>
> I wasn't saying that this proved that ivmURI is better than URIBL or
> SURBL. Only that this proves ivmURI as being *relevant* and *useful*
> ...even for those who are already using *both* URIBL and SURBL. (and
> this is just one such proof!)
you said,
"and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE: http://invaluement.com/results.txt )"
your EXAMPLE contradicts the statement that precedes it. I can only take it in the context of how I read it.
>
> For example, if ivmURI were only catching stuff already caught by
> URIBL and SURBL, ivmURI wouldn't be relevant or helpful to anyone.
> Moreover, I believe that URIBL or SURBL could easily create a
> similarly impressive page as my http://invaluement.com/results.txt page.
Probably.
>
> Bottom line is that you are correct... AND... I'm sorry you took this
> as me dissing URIBL!
>
I didn't take it that way. I was just pointing out that your statement
didn't match your accompanying example.
> Simply put, there are some series of spams that each of the three URI
> blacklists are better at catching than the other two. That is ALL that
> I meant by this.
>
Okay, if you had said that, I would have agreed and never posted :)
--
Dallas Engelken
dallase@uribl.com
http://uribl.com
Re: Starting a URIBL - Howto? [OT]
Posted by Rob McEwen <ro...@invaluement.com>.
Dallas Engelken wrote:
> Yes, of course, but you're results.txt is biased as it only shows
> where imvURI hits.
>
> Based on the last 20k adds to URIBL, it appears to me that imvURI has
> less coverage?
> <snip>:
Dallas,
Yes, you are right!
URIBL *does* cast a wider net than ivmURI.
So, in general, I agree with your statement that ivmURI has less
coverage than URIBL. But I'm confused about your stats... and they look
really weird. (but maybe I'm just not understanding them?)
So here is what I did.
I took the last 500 additions to URIBL (not including geocities and
blogspot items... so that this comparison would compare apples to
apples!) I then ran those against ivmURI.
186 of the 500 latest additions to URIBL were also found in ivmURI.
I then reversed this testing and ran URIBL against the last 500
additions to ivmURI.
328 of the latest 500 additions to ivmURI were listed on URIBL.
So yes, basically, you're right, URIBL does have greater coverage than
ivmURI.
Your point is well made. For the most part, URIBL casts a wider net than
ivmURI. Also, if you were to include geocities and blogspot hits, of
course, that would throw the comparison wildly in URIBL's favor... but
I'm not so sure that would be a fair comparison.
(In both tests, I checked against the 2nd list just about 2-3 minutes
after grabbing the latest data from the first list. This is important as I
was seeing those stats quickly grow for BOTH after my initial collection
of stats... because items not yet in both lists are continuously getting
into the other list fast. So timing is mission critical in this kind of
testing and the time between gathering and checking MUST be the same
both ways.)
However, I think you missed my point about
http://invaluement.com/results.txt
I wasn't saying that this proved that ivmURI is better than URIBL or
SURBL. Only that this proves ivmURI as being *relevant* and *useful*
...even for those who are already using *both* URIBL and SURBL. (and
this is just one such proof!)
For example, if ivmURI were only catching stuff already caught by URIBL
and SURBL, ivmURI wouldn't be relevant or helpful to anyone. Moreover, I
believe that URIBL or SURBL could easily create a similarly impressive
page as my http://invaluement.com/results.txt page.
Bottom line is that you are correct... AND... I'm sorry you took this as
me dissing URIBL!
Simply put, there are some series of spams that each of the three URI
blacklists are better at catching than the other two. That is ALL that I
meant by this.
I'm trying to NOT turn this into a pissing contest. Can we end this
here? (Frankly, I'm keeping a LOT of powder dry right now as a gesture
of good-will.)
BTW - How do you have access? Direct queries are not allowed... even for
my paying subscribers. And I don't recall ever setting you up for RSYNC
access? (I recall offering... I just don't recall it ever happening.)
Where is your access coming from?
ALSO: Does this mean that I now am not allowed to make the official
invaluement.com site launch announcement on the URIBL list? ...I hope
not... then again, we might all be old and gray by the time that happens :)
Rob McEwen
Re: Starting a URIBL - Howto? [OT]
Posted by Clayton Keller <in...@ruraltel.net>.
Dallas Engelken wrote:
> Rob McEwen wrote:
>> (on-list follow-up)
>>
>> By "proactive listings", I discovered in my off-list conversation with
>> Dallas that this refers to URIBL-Gold listings... where items are
>> listed in "uribl-gold" in advance of seeing them in actual spams. But
>> this uribl-gold list isn't available to the public and is not even
>> prescribed as a list to use for fighting spam.
>
> We do ask anyone with access to it to use it. Since its basically
> uribl black for domains that we believe will show up in future spam
> campaigns, there is no reason not to. I'm sure there are some on this
> list that can comment further in regards to its effectiveness.
>
>> I'm really disappointed that Dallas would have presented that kind of
>> comparison to ivmURI. This is like comparing some kid's best
>> basketball game on an X-Box to Michael Jordan's best basketball game
>> on the court. I'm glad that URIBL-Gold is helping URIBL black get
>> better... but until the listing actually makes it into URIBL-Black...
>> and is then actually *usable* for blocking spam...
>
> From a RBL perspective, the purpose of the data in there is to catch
> the front end of spam runs. Assuming it takes ~5 minutes to list,
> rebuild, and redistribute new zone data in reactive mode, we could miss
> 50% of a 10 minute campaign. Obviously the longer the campaign draws
> out, the better the miss rate looks. But those using gold+black have
> 100% hitrates on alot of these campaigns, which is something that is
> difficult if not impossible to achieve on a reactive blacklist based
> soley on trap data or user feed back.
>
> As you can see at http://www.uribl.com/gold.shtml, over 20% (14k of 57k)
> of the domains that have been listed in gold for hours, days, even
> weeks, have since moved to black. So, assume each of those 14k
> domains returned NXDOMAIN on black.uribl.com for the first ~5 minutes of
> each of their campaigns, how much spam do you think we missed? Quite a
> lot I'd say. That short window is what we are targetting here. It
> doesnt result in a huge hitrate because it only hits in gold during the
> rebuild and redistribute window, but it does serve its purpose quite well.
>
> Aside from client side spam filtering, I could see
> registries/registrars, web hosts, ip space owners and the like
> benefiting from this data as well. Knowing there is potential for abuse
> prior to the abuse actually occurs could be quite a powerful tool.
> For example, I can tell you that ns1.tuhaerge.com is the next NS that
> will be spewing up VPXL crapmail
> (http://www.spamtrackers.hk/wiki/index.php?title=VPXL).. That NS and
> every domain registred against that NS should be instantly nuked, but
> getting those Chinese registrars to action anything like this, even with
> proper evidence, is nearly impossible... just think if you asked them to
> kill it before the abuse started. ;)
Hi, I just wanted to comment that only a few hours after Dallas sent his
last email we did see that NS spewing junk.
I know it's a little late in response, but I thought I'd pass this info
along to everyone involved in the thread just so you know your work does
appear to be paying off.
Re: Starting a URIBL - Howto? [OT]
Posted by Dallas Engelken <da...@uribl.com>.
Rob McEwen wrote:
> (on-list follow-up)
>
> By "proactive listings", I discovered in my off-list conversation with
> Dallas that this refers to URIBL-Gold listings... where items are
> listed in "uribl-gold" in advance of seeing them in actual spams. But
> this uribl-gold list isn't available to the public and is not even
> prescribed as a list to use for fighting spam.
We do ask anyone with access to it to use it. Since it's basically
uribl black for domains that we believe will show up in future spam
campaigns, there is no reason not to. I'm sure there are some on this
list that can comment further regarding its effectiveness.
> I'm really disappointed that Dallas would have presented that kind of
> comparison to ivmURI. This is like comparing some kid's best
> basketball game on an X-Box to Michael Jordan's best basketball game
> on the court. I'm glad that URIBL-Gold is helping URIBL black get
> better... but until the listing actually makes it into URIBL-Black...
> and is then actually *usable* for blocking spam...
From an RBL perspective, the purpose of the data in there is to catch
the front end of spam runs. Assuming it takes ~5 minutes to list,
rebuild, and redistribute new zone data in reactive mode, we could miss
50% of a 10 minute campaign. Obviously the longer the campaign draws
out, the better the miss rate looks. But those using gold+black have
100% hitrates on a lot of these campaigns, which is something that is
difficult if not impossible to achieve on a reactive blacklist based
solely on trap data or user feedback.
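
The arithmetic behind that 50% figure can be written out as a small sketch. It assumes the campaign sends at a constant rate and that the listing delay is counted from the first message; both are simplifications, and `missed_fraction` is just an illustrative name.

```python
def missed_fraction(listing_delay_min, campaign_min):
    """Fraction of a campaign sent before the domain becomes listed.

    Assumes a constant send rate and a delay measured from the campaign start.
    """
    if campaign_min <= 0:
        raise ValueError("campaign length must be positive")
    if listing_delay_min < 0:
        raise ValueError("listing delay cannot be negative")
    return min(listing_delay_min, campaign_min) / campaign_min


# 5 minute delay on a 10 minute campaign: half the run is missed.
print(missed_fraction(5, 10))
# Same delay on a 60 minute campaign: the miss rate looks much better.
print(missed_fraction(5, 60))
# Domain already listed before the run starts (the gold case): nothing missed.
print(missed_fraction(0, 10))
```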
As you can see at http://www.uribl.com/gold.shtml, over 20% (14k of 57k)
of the domains that have been listed in gold for hours, days, even
weeks, have since moved to black. So, assume each of those 14k
domains returned NXDOMAIN on black.uribl.com for the first ~5 minutes of
each of their campaigns, how much spam do you think we missed? Quite a
lot I'd say. That short window is what we are targeting here. It
doesn't result in a huge hitrate because it only hits in gold during the
rebuild and redistribute window, but it does serve its purpose quite well.
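
For anyone unfamiliar with what "returned NXDOMAIN on black.uribl.com" means in practice, here is a minimal sketch of a domain lookup against a DNSBL-style zone. The helper names and the injectable `resolve` parameter are assumptions for the example, and treating any 127.x answer as "listed" is a simplification.

```python
import socket


def query_name(domain, zone="black.uribl.com"):
    """Build the DNS name to look up: <domain>.<zone>."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone="black.uribl.com", resolve=socket.gethostbyname):
    """Return True if the zone answers with a 127.x address for this domain.

    A failed lookup (NXDOMAIN, but also timeouts and other resolver errors,
    which this sketch does not distinguish) is treated as "not listed".
    """
    try:
        answer = resolve(query_name(domain, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")
```

The `resolve` argument exists only so the function can be exercised without touching the network; by default it uses the system resolver.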
Aside from client side spam filtering, I could see
registries/registrars, web hosts, IP space owners and the like
benefiting from this data as well. Knowing there is potential for abuse
before the abuse actually occurs could be quite a powerful tool.
For example, I can tell you that ns1.tuhaerge.com is the next NS that
will be spewing up VPXL crapmail
(http://www.spamtrackers.hk/wiki/index.php?title=VPXL). That NS and
every domain registered against that NS should be instantly nuked, but
getting those Chinese registrars to action anything like this, even with
proper evidence, is nearly impossible... just think if you asked them to
kill it before the abuse started. ;)
--
Dallas Engelken
dallase@uribl.com
http://uribl.com
Re: Starting a URIBL - Howto? [OT]
Posted by Rob McEwen <ro...@invaluement.com>.
(on-list follow-up)
First, earlier I presented these stats:
186/500 (ivmURI hits from the latest 500 URIBL listings)
328/500 (URIBL hits from the latest 500 ivmURI listings)
A follow-up *identical* test... only conducted later... gave these stats:
225/500 (ivmURI hits from the latest 500 URIBL listings)
282/500 (URIBL hits from the latest 500 ivmURI listings)
(geocities/blogspots/etc URIs excluded from both tests)
Why the difference? Why the improvement in ivmURI? How did ivmURI
*significantly* narrow that gap?
Two reasons:
(1) ivmURI's engine works faster during non-EST-business hours and
weekend hours (for various reasons) ...(I'm working on ivmURI's engine
right now. I've made these needed improvements with ivmSIP... now I just
need to do the same with ivmURI)
(2) While much of URIBL is automated, user-submissions to URIBL wane a
bit when both America and Europe are experiencing non-business hours..
even non-waking hours... and weekend hours
That is the reason why ivmURI did BETTER in that testing than it did
several hours ago.
...but none of this matters that much... as I'll prove later... but I
present this anyways "for the record"
Dallas Engelken wrote:
> ivmURI stats from last 20000 URIBL reactive listings.
> -> 5519 hits
> -> 14481 misses
Dallas confirmed that these initial stats he posted DID include all
those geocities, blogspot, and other subdomains in URIBL that ivmURI
doesn't even try to catch... and there are TONS of those now in the
URIBL list. So Dallas's stats here are comparing "apples to oranges".
According to Dallas's off-list comments to me, when the "subdomains" are
removed, the ivmURI hits on recent URIBL listings are significantly
higher than these stats he originally posted. Of course, I don't make it
my goal in life to list every last domain in URIBL. But this would
partially explain why my stats look so different from Dallas's stats...
and why these stats (unfairly and artificially) made ivmURI look so bad.
> ivmURI stats from last 20000 URIBL proactive listings.
> -> 351 hits
> -> 19649 misses
By "proactive listings", I discovered in my off-list conversation with
Dallas that this refers to URIBL-Gold listings... where items are listed
in "uribl-gold" in advance of seeing them in actual spams. But this
uribl-gold list isn't available to the public and is not even prescribed
as a list to use for fighting spam. I'm really disappointed that Dallas
would have presented that kind of comparison to ivmURI. This is like
comparing some kid's best basketball game on an X-Box to Michael
Jordan's best basketball game on the court. I'm glad that URIBL-Gold is
helping URIBL black get better... but until the listing actually makes
it into URIBL-Black... and is then actually *usable* for blocking
spam... it really doesn't count for anything. Therefore, such a
comparison is not only unfair, it is downright laughable. (To be extra
clear, in contrast to URIBL-gold, ALL the items reported on
http://invaluement.com/results.txt HAVE been seen "in the wild" and I do
have corresponding evidence spams "on file")
A LARGER QUESTION:
What matters more, how many items are in a list? Or (1) the amount of
"real world" spam sent to *real* users (NOT dictionary attack spam sent
to "unknown users") that a list "hits" on? Along with (2) low FP-rates.
At the moment:
SURBL has 1.34 MILLION listings
URIBL has 310K listings
ivmURI has 233K listings
But those numbers don't tell the whole story. ivmURI stands up quite
well when measuring real world "hits" on spam sent to real users. When
measured in the real world, ivmURI compares quite well in
head-to-head-to-head tests against SURBL and URIBL... even with its
smaller footprint... and ivmURI is at least as good in the low-FPs
department.
But, like I said, ALL three lists are indispensable and block spam that
the other two miss.
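
To make that kind of "real world" comparison concrete, here is a rough sketch of the metrics involved: hit rate on spam, FP rate on legitimate mail, and what each list catches that the others miss. The data layout (one domain per message) and the function names are assumptions for illustration; this is not the method behind any stats quoted in this thread.

```python
def rates(listed, spam_domains, ham_domains):
    """Return (hit_rate, fp_rate) for one blacklist.

    `spam_domains` and `ham_domains` hold one domain per message, so a domain
    seen in many messages counts once per message (volume-weighted).
    """
    hit_rate = sum(d in listed for d in spam_domains) / len(spam_domains)
    fp_rate = sum(d in listed for d in ham_domains) / len(ham_domains)
    return hit_rate, fp_rate


def unique_catches(lists, spam_domains):
    """For each list, the spam domains it hits that no other list hits."""
    result = {}
    for name, listed in lists.items():
        others = set().union(*(l for n, l in lists.items() if n != name))
        result[name] = {d for d in spam_domains if d in listed and d not in others}
    return result
```

A list with fewer entries can still come out ahead on `rates` if its entries are the ones that actually show up in delivered spam.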
Rob McEwen
Re: Re: Starting a URIBL - Howto? [OT]
Posted by Dallas Engelken <da...@uribl.com>.
Rob McEwen wrote:
>
> and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE:
> http://invaluement.com/results.txt )
>
Yes, of course, but your results.txt is biased as it only shows where
ivmURI hits.
Based on the last 20k adds to URIBL, it appears to me that ivmURI has
less coverage?
ivmURI stats from last 20000 URIBL reactive listings.
-> 5519 hits
-> 14481 misses
ivmURI stats from last 20000 URIBL proactive listings.
-> 351 hits
-> 19649 misses
--
Dallas Engelken
dallase@uribl.com
http://uribl.com
Re: Starting a URIBL - Howto? [OT]
Posted by Rob McEwen <ro...@invaluement.com>.
Jeremy Fairbrass wrote:
> Hi Rob,
> Are your invaluement.com DNSBLs available for us to use? Your
> http://invaluement.com/results.txt page tells me why I should be using
> it TODAY ;) but I can't find any info about how...!!
>
> Cheers,
> Jeremy
[Note, others have asked the same on-line. This will be my only on-list
answer. The others will get the same answer off-list. Anyone else
interested should e-mail me off-list, rob@invaluement.com ]
Jeremy,
The beta testing period is over. I now only allow paying subscribers.
Hopefully, the web site to sign up will be available soon. (I keep
trying to finish it... but it gets delayed all the time as I continually
get carried away finding new ways to improve my lists!) In the meantime,
you can get access immediately... before the web site launches... by
filling out the following form below (anyone is welcome to do this...
just make sure you send this back to *me* and not to the list). I'll then
respond with further instructions as well as a button where you can
subscribe via PayPal. The subscription includes a trial period where you
pay $1 for the first 10 days. You can cancel at any time. (Other methods
of payment are available upon request and I often grant very large
potential subscribers longer periods of free testing time.)
Also, direct queries to my DNSBLs are never allowed and will always
fail... even for subscribers! Instead, subscribers get access via RSYNC
to either rbldnsd-formatted files, or BIND-formatted files. (I provide
detailed instructions about that!) Additionally, I have to have a clear
understanding of who and what each subscription will be used for. (For
example, at this point, I don't know enough about Jeremy Fairbrass to
send that subscription button... but I'm sure he will help me out with that!)
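
For readers building their own list, here is a small sketch of what writing an rbldnsd-style domain set could look like. It assumes the `dnset` conventions as I understand them from the rbldnsd documentation (a leading `:A-value:text` line sets the default answer, and a leading dot on an entry covers the domain and its subdomains). The actual layout of the invaluement files is not described here, and the return value and text are made up for the example.

```python
def render_dnset(domains, a_value="127.0.0.2", txt="Listed by example URIBL"):
    """Render domains as text intended for an rbldnsd `dnset` zone file.

    The default-value line and the leading-dot wildcard form follow my reading
    of the rbldnsd docs; check them against your rbldnsd version before use.
    """
    lines = [f":{a_value}:{txt}"]
    for domain in sorted({d.strip(".").lower() for d in domains}):
        lines.append("." + domain)
    return "\n".join(lines) + "\n"


print(render_dnset(["B.test", "a.test.", "a.test"]), end="")
```

A file like this would then be published to subscribers (for example over rsync) and served locally by their own rbldnsd instance.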
*********************************************************
Obtaining a subscription to the invaluement.com DNSBLs
(1) Name & contact information, including phone number & e-mail address,
company, etc.
(2) Tell me the approximate number of mailboxes/users that your use of this
product will protect if/when you decide to officially subscribe. (Also
include your spam filtering customers' users if applicable... basically,
anyone who benefits by your use of this product should be included in the
total.)
(3) Let me know if you provide either spam filtering software or
filtering appliances or DNSBLs or any other spam filtering technologies
to third parties where the actual filtering is then done outside of your
network. There is a different pricing plan for those situations.
(4) What type of access do you require:
(a) RSYNC to rbldnsd-formatted files (RECOMMENDED!)
...OR..
(b) RSYNC to (dns) bind-formatted files
(5) For what IP address should I grant your RSYNC client permission to
access the lists? (A backup IP is welcome.)
******************************************
Send that information and I'll respond with further instructions as well
as my now-finalized price list (the same one that will be posted on my
web site soon).
Thanks for your interest!
Rob McEwen
Re: Starting a URIBL - Howto? [OT]
Posted by Jeremy Fairbrass <je...@fairbrass.co.nz>.
"Rob McEwen" <ro...@invaluement.com> wrote in message news:48137B9C.7090906@invaluement.com...
> Marc Perkel wrote:
>> I was just wondering from those of you who have done it - how to start a URIBL. I'm guessing the process (simplified) is:
>>
>> 1) Mine messages for links
>> 2) Subtract out anything matching a fairly large white list
>>
>> So my first question here is - what do most of you used to mine the links in a message with? Can someone point me in the right
>> direction? Also - I'm willing to work with and share data with others who are already doing this.
>>
> Marc,
>
> Just like a regular sender's IP dnsbl (aka "RBL"), the hardest part is not having FPs... in fact, this is probably *harder* for
> URIBLs compared to RBLs. The second hardest part is being able to list spammer's URIs *quickly* (particularly since trying to do
> so exacerbates the first problem.)
>
> The process you described is the best way to start... it is where everyone starts. But many have started with amazing whitelists,
> done what you described, and have failed. It take much more than a great whitelist to make a great blacklist.
>
> In fact, I know someone who frequents these anti-spam lists ...who I consider smarter than either you or me... and I happen to
> consider him the world's foremost authority on how to create and maintain a *great* RBL. (I'm not allowed to mention who he is...
> in this context... but just about everyone reading this would recognize his name... NO, this is NOT Steve Linford... please, no
> questions or guesses about this!) Anyway, over the past several months... he tried to create a great URIBL and, so far, his URIBL
> falls far short of SURBL and URIBL and ivmURI.
>
> Marc, if I had to make a short list of those who I thought might be able to pull this off... you'd definitely be on the short
> list.
>
> However, don't be discouraged if you come up short and/or if it takes many months... even years... to accomplish what you seek. If
> the guy I described can't do it (at least last I checked...), then believe me, this is NOT an easy task.
>
> I know MUCH about this. I've been one of the admins for SURBL for the past 4+ years. Additionally, I created own URIBL called
> "ivmURI", which is now *easily* in the same league as SURBL and URIBL... In fact, ivmSIP is probably even better... at least,
> according to the hit stats and FP stats that some of my users have provided me where all three URI blacklists are compared to each
> other. (Of course, all three lists are indispensable... I use ALL of them in my spam filtering... and ALL 3 catch stuff the other
> 2 miss... FOR EXAMPLE: http://invaluement.com/results.txt )
>
> At this time, there is no other publicly available URI blacklist that comes close to SURBL and URIBL and ivmURI. No "close" 4th
> place. Again, *not* *even* *close*.
>
> I hope this helps and doesn't discourage you. I had a wise college professor tell me "big problem, big solution... little problem,
> little solution". Spammer's URIs is a big problem that requires a big solution. Knowing what you're up against in creating a URI
> blacklist might seem discouraging in the short term, but might give you the proper long-term focus and patience you need to really
> pull this off.
>
> Best wishes for your success in this endeavor!
>
> Rob McEwen
> (creator of the "invaluement.com" DNSBLs, ivmURI & ivmSIP)
>
Hi Rob,
Are your invaluement.com DNSBLs available for us to use? Your http://invaluement.com/results.txt page tells me why I should be using
it TODAY ;) but I can't find any info about how...!!
Cheers,
Jeremy
Re: Starting a URIBL - Howto? [OT]
Posted by Rob McEwen <ro...@invaluement.com>.
Marc Perkel wrote:
> I was just wondering from those of you who have done it - how to start
> a URIBL. I'm guessing the process (simplified) is:
>
> 1) Mine messages for links
> 2) Subtract out anything matching a fairly large white list
>
> So my first question here is - what do most of you used to mine the
> links in a message with? Can someone point me in the right direction?
> Also - I'm willing to work with and share data with others who are
> already doing this.
>
Marc,
Just like a regular sender's IP dnsbl (aka "RBL"), the hardest part is
not having FPs... in fact, this is probably *harder* for URIBLs compared
to RBLs. The second hardest part is being able to list spammer's URIs
*quickly* (particularly since trying to do so exacerbates the first
problem.)
The process you described is the best way to start... it is where
everyone starts. But many have started with amazing whitelists, done
what you described, and have failed. It takes much more than a great
whitelist to make a great blacklist.
In fact, I know someone who frequents these anti-spam lists ...who I
consider smarter than either you or me... and I happen to consider him
the world's foremost authority on how to create and maintain a *great*
RBL. (I'm not allowed to mention who he is... in this context... but
just about everyone reading this would recognize his name... NO, this is
NOT Steve Linford... please, no questions or guesses about this!)
Anyway, over the past several months... he tried to create a great URIBL
and, so far, his URIBL falls far short of SURBL and URIBL and ivmURI.
Marc, if I had to make a short list of those who I thought might be able
to pull this off... you'd definitely be on the short list.
However, don't be discouraged if you come up short and/or if it takes
many months... even years... to accomplish what you seek. If the guy I
described can't do it (at least last I checked...), then believe me,
this is NOT an easy task.
I know MUCH about this. I've been one of the admins for SURBL for the
past 4+ years. Additionally, I created my own URIBL called "ivmURI", which
is now *easily* in the same league as SURBL and URIBL... In fact, ivmSIP
is probably even better... at least, according to the hit stats and FP
stats that some of my users have provided me where all three URI
blacklists are compared to each other. (Of course, all three lists are
indispensable... I use ALL of them in my spam filtering... and ALL 3
catch stuff the other 2 miss... FOR EXAMPLE:
http://invaluement.com/results.txt )
At this time, there is no other publicly available URI blacklist that
comes close to SURBL and URIBL and ivmURI. No "close" 4th place. Again,
*not* *even* *close*.
I hope this helps and doesn't discourage you. I had a wise college
professor tell me "big problem, big solution... little problem, little
solution". Spammers' URIs are a big problem that requires a big solution.
Knowing what you're up against in creating a URI blacklist might seem
discouraging in the short term, but might give you the proper long-term
focus and patience you need to really pull this off.
Best wishes for your success in this endeavor!
Rob McEwen
(creator of the "invaluement.com" DNSBLs, ivmURI & ivmSIP)