Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2008/01/02 17:22:40 UTC

Question about getting a blacklist included in SA

I was wondering about how to get a blacklist included in the SA
distribution. I have a blacklist and whitelist that are both very good.
I've been publishing them for about a year now. But I have a few questions.

What are the licensing requirements that I have to meet to be included?
I assume it has to be unrestricted?

What kind of bandwidth does it usually pull from servers when it is part 
of the default distribution?

I have 5 servers now at 3 locations and soon to add a 6th at a 4th 
location. Is this enough?

What other issues does SA look for when it comes to inclusion?

Here's the info on my lists:
http://wiki.ctyme.com/index.php/Spam_DNS_Lists

Re: Question about getting a blacklist included in SA

Posted by Marc Perkel <ma...@perkel.com>.

Matthias Leisi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> Matt Kettler wrote:
>
>   
>> Comparatively speaking, 6 might be inadequate. I don't know how much of
>> that scale is really "necessary" for minimal operation, and how much is
>> just needed for scalability against DDoS attacks.
>>     
>
> dnswl.org runs on 10 servers(*). Given that a whitelist has a lower DDoS
> risk than a blacklist (spammers don't gain from DoSing a whitelist), a
> lower number seems sufficient for a pure whitelist.
>
> Traffic for the list.dnswl.org zone is well above 100 GByte/month, and
> rising. The dnswl.org zone adds circa 15 GByte/month; rsync is only
> about 5 GByte/month (all numbers per mirror).
>
> With the inclusion of dnswl.org rules into the SA default ruleset,
> traffic roughly tripled in a short time. However I have no clue how much
> of the current traffic can *now* be attributed to these default rules.
>
> [Interestingly, we have a noticeable traffic peak around late afternoon
> Central European Time. I'm not sure why this happens, as I would have
> expected a more uniform worldwide / timezone / load distribution.]
>
>   

Maybe what I need to do is start with my white list, which is easier to
maintain and more accurate, and see how that goes. Being DDoSed worries me
some, and I'm not sure I'm quite up to dealing with it yet. But I would
ask that the SA developers test my lists to see if they are at least
interesting. Also, I provide a lot of data for several other lists, and
maybe what I should do is just work behind the scenes and provide my
data to someone else's list.

BTW, if anyone wants my data or a feed of my spam, contact me privately
and we'll work something out.

Re: Question about getting a blacklist included in SA

Posted by Matthias Leisi <ma...@leisi.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Matt Kettler wrote:

> Comparatively speaking, 6 might be inadequate. I don't know how much of
> that scale is really "necessary" for minimal operation, and how much is
> just needed for scalability against DDoS attacks.

dnswl.org runs on 10 servers(*). Given that a whitelist has a lower DDoS
risk than a blacklist (spammers don't gain from DoSing a whitelist), a
lower number seems sufficient for a pure whitelist.

Traffic for the list.dnswl.org zone is well above 100 GByte/month, and
rising. The dnswl.org zone adds circa 15 GByte/month; rsync is only
about 5 GByte/month (all numbers per mirror).

With the inclusion of dnswl.org rules into the SA default ruleset,
traffic roughly tripled in a short time. However I have no clue how much
of the current traffic can *now* be attributed to these default rules.

[Interestingly, we have a noticeable traffic peak around late afternoon
Central European Time. I'm not sure why this happens, as I would have
expected a more uniform worldwide / timezone / load distribution.]

- -- Matthias (for dnswl.org)

(*) Expansion is a priority for the next couple of weeks. So if you have
a VMWare, an IP address and some bandwidth to spare... ;-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)

iD8DBQFHfL6dxbHw2nyi/okRAtQAAKDJ6DRPJABZ0/Nj952JiSrIMcy/TgCfRVt3
whh7c4lAw66Ii9L7NazXqHs=
=SlbP
-----END PGP SIGNATURE-----
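
To put the per-mirror figures above in perspective, here is a rough
back-of-the-envelope conversion to sustained bandwidth. It is only a
sketch under assumptions that are not in the message: a 30-day month,
decimal gigabytes, and "well above 100 GByte/month" treated as a floor
of exactly 100. The function name is made up for illustration.

```python
# Rough conversion of monthly per-mirror traffic to average bandwidth.
# Assumptions (not from the thread): 30-day month, 1 GByte = 10**9 bytes,
# and "well above 100 GByte/month" taken as a lower bound of 100.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_kbit_per_s(gbytes_per_month: float) -> float:
    """Average sustained rate in kbit/s for a given monthly volume."""
    bits = gbytes_per_month * 10**9 * 8
    return bits / SECONDS_PER_MONTH / 1000


# Per-mirror figures as quoted in the message.
per_mirror_gbytes = {
    "list.dnswl.org zone": 100,  # lower bound only
    "dnswl.org zone": 15,
    "rsync": 5,
}

total_gbytes = sum(per_mirror_gbytes.values())

if __name__ == "__main__":
    for name, gb in per_mirror_gbytes.items():
        print(f"{name}: {gb} GByte/month ~ {average_kbit_per_s(gb):.0f} kbit/s")
    print(f"total: {total_gbytes} GByte/month "
          f"~ {average_kbit_per_s(total_gbytes):.0f} kbit/s")
```

Averaged over a month this is a modest sustained rate per mirror; the
figure says nothing about peaks, which is where the late-afternoon
spike and any DDoS headroom would matter.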

Re: Question about getting a blacklist included in SA

Posted by Matt Kettler <mk...@verizon.net>.
Per Jessen wrote:
> Matt Kettler wrote:
>
>   
>>> What kind of bandwidth does it usually pull from servers when it is
>>> part of the default distribution?
>>>
>>> I have 5 servers now at 3 locations and soon to add a 6th at a 4th
>>> location. Is this enough?
>>>       
>> For that, I have no clue.. probably not a lot of bandwidth, but
>> probably quite a lot of queries. Perhaps one of the URIBL or SURBL
>> folks that hang out in this list could give you a better idea.
>>     
>
> Over the last 24 hours, our public SURBL DNS had an average of 200
> queries per second, with peaks of 250 queries/sec.  Total volume was
> about 15 million queries, roughly 6Gb traffic.
>
>   
Interesting stats, thanks Per.

I suppose another useful metric for Marc would be to look at the number
of servers and locations hosting URIBL or SURBL.

URIBL appears to have 24 server locations: 11 in the US, 11 in Europe, 1
in South America and 1 in Asia. I don't know how many of those locations
contain more than one server, but it does give you a sense of scale.

http://www.uribl.com/mirrors.shtml

SURBL doesn't have a neat Google map to give you an idea of locations,
but they have 53 servers; at least, that's my count.

http://www.surbl.org/nameservers-output.html

Comparatively speaking, 6 might be inadequate. I don't know how much of
that scale is really "necessary" for minimal operation, and how much is
just needed for scalability against DDoS attacks.


Re: Question about getting a blacklist included in SA

Posted by Per Jessen <pe...@computer.org>.
Matt Kettler wrote:

>> What kind of bandwidth does it usually pull from servers when it is
>> part of the default distribution?
>>
>> I have 5 servers now at 3 locations and soon to add a 6th at a 4th
>> location. Is this enough?
>
> For that, I have no clue.. probably not a lot of bandwidth, but
> probably quite a lot of queries. Perhaps one of the URIBL or SURBL
> folks that hang out in this list could give you a better idea.

Over the last 24 hours, our public SURBL DNS had an average of 200
queries per second, with peaks of 250 queries/sec.  Total volume was
about 15 million queries, roughly 6Gb traffic.


/Per Jessen, Zürich
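
As a quick sanity check on the 24-hour figures above, the arithmetic
can be sketched as follows. One assumption is mine and not from the
message: "6Gb" is read as 6 gigabytes (10**9 bytes each) rather than
gigabits. Whether that traffic counts queries, responses, or both is
not stated either.

```python
# Consistency check on the 24-hour SURBL figures quoted above.
# Assumption: "6Gb" means 6 gigabytes (10**9 bytes each), not gigabits.

SECONDS_PER_DAY = 24 * 60 * 60

avg_qps = 200               # average queries per second, as quoted
total_queries = 15_000_000  # "about 15 million queries"
total_bytes = 6 * 10**9     # "roughly 6Gb traffic" (assumed gigabytes)

# Queries per day if the quoted average rate held for the full 24 hours.
queries_at_avg_rate = avg_qps * SECONDS_PER_DAY

# Average rate implied by the quoted daily total.
implied_avg_qps = total_queries / SECONDS_PER_DAY

# Average traffic per query under the gigabyte assumption.
bytes_per_query = total_bytes / total_queries

if __name__ == "__main__":
    print(f"queries/day at {avg_qps} qps: {queries_at_avg_rate:,}")
    print(f"qps implied by daily total: {implied_avg_qps:.0f}")
    print(f"bytes per query: {bytes_per_query:.0f}")
```

The two rates land in the same ballpark (about 174 versus 200 queries
per second), which is what you would expect from rounded numbers.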


Re: Question about getting a blacklist included in SA

Posted by Marc Perkel <ma...@perkel.com>.

Matt Kettler wrote:
> Marc Perkel wrote:
>   
>> I was wondering about how to get a blacklist included in the SA
>> distribution. I have a blacklist and whitelist that are both very
>> good. I've been publishing them for about a year now. But I have a few
>> questions.
>>
>> What are the licensing requirements that I have to meet to be
>> included? I assume it has to be unrestricted?
>>     
> Yes, or at least unrestricted enough that almost anyone, including
> businesses, can use it freely.
>
> IMO, your list usage criteria are currently a bit too restrictive.
>
> Personally, I think for inclusion in SA it should be free for anyone,
> possibly with some exceptions to protect the list from being overloaded..
>
>  Spamhaus fits this kind of "self defense against overload" model IMO,
> limiting free usage to sites under 80k emails and 320k queries per day,
> and not allowing free use by appliances or reseller services. However,
> they are open to free use by businesses that aren't selling spam
> filtering as long as they're under the 80/320k limits. IMO, anything
> more restrictive than that probably shouldn't be in SA. Actually,
> personally I wish Spamhaus's limits were a bit more liberal, but I'm not
> vastly uncomfortable with them.
>   
>> What kind of bandwidth does it usually pull from servers when it is
>> part of the default distribution?
>>
>> I have 5 servers now at 3 locations and soon to add a 6th at a 4th
>> location. Is this enough?
>>     
> For that, I have no clue.. probably not a lot of bandwidth, but probably
> quite a lot of queries. Perhaps one of the URIBL or SURBL folks that
> hang out in this list could give you a better idea. Although the usage
> is different, they're also almost exclusively used by SA. Other RBL
> operators get a lot of usage by tools other than SA.
>
>   
>> What other issues does SA look for when it comes to inclusion?
>>     
> 1) S/O performance in some of the preliminary mass-checks.
>
> I see you've got some SA rules posted on your site.. perhaps we could
> sandbox them and see how they do in the nightly mass-checks.. Any devs
> curious? Anyone want to set me up with access to my sandbox so I can put
> them in?
>
> 2) Well documented listing/delisting policies, having those policies be
> compatible with SA ideals (e.g. SPEWS was not compatible), and a track
> record of sticking to them..
>
> One thing that strikes me as missing is that permanent
> black/whitelisting is mentioned, but there are no distinct policies about
> what gets permanently listed.  Track records only come with time, but
> you need policies in order to start building one :)
>
> 3)  I would say it needs to be distinctive from "all the others". If an
> RBL has 100% overlap with other lists, it's not adding any useful
> coverage. That could also be checked out in the mass-checks.
>
>   

Thanks Matt,

Depending on other issues, I'd be willing to change my terms to be SA
compatible. I can work on the docs. My list is definitely not another
"me too" list. I've found a way to identify spambots on their first
attempt by looking at things like hits on fake high-numbered MX records
and tracking which servers fail to issue a QUIT to close the connection.
I'm tracking around 750k spambots that I blacklist.
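
To make the two signals mentioned above concrete, here is a minimal
sketch of how they might be recorded per connection. Everything in it
is hypothetical: the record layout, the field names and the signal
labels are invented for illustration, and how the signals are actually
weighed or combined into a listing decision is not described here, so
the function only reports which signals fired.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmtpSession:
    """One inbound SMTP connection (hypothetical record layout)."""
    client_ip: str
    connected_to_decoy_mx: bool  # client hit a fake, high-numbered MX host
    sent_quit: bool              # client issued QUIT before disconnecting


def suspicious_signals(session: SmtpSession) -> List[str]:
    """Return labels for the spambot-like signals seen in this session."""
    signals = []
    if session.connected_to_decoy_mx:
        signals.append("decoy-mx")
    if not session.sent_quit:
        signals.append("no-quit")
    return signals


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation-only address range.
    print(suspicious_signals(SmtpSession("192.0.2.1", True, False)))
```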

Permanent black/white listing comes from some lists that I download
from other places where use of port 25 is prohibited. As for white lists,
that is somewhat automatic. I think I have the biggest white list on the
planet.

I am interested in having some testing done to see if there are any
false positive issues and to see how effective it is.

Also, I have an interesting feature, yellow listing: if you check my
yellow list and a host is listed there, you can ignore all other lists.
Yellow-listed hosts are mixed ham/spam sources like Yahoo, Gmail, and
Hotmail that should never be either black or white listed.
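
The yellow-list rule described above amounts to a simple precedence
check, which can be sketched like this. The function and list names
are made up for illustration, and how black and white hits should be
weighed against each other is deliberately left out, since that is not
covered here.

```python
from typing import Dict, List


def effective_hits(yellow_listed: bool, other_hits: Dict[str, bool]) -> List[str]:
    """Return the names of the other lists whose hits should still count.

    A yellow-list hit marks a mixed ham/spam source, so every other
    list result is ignored for that host.
    """
    if yellow_listed:
        return []
    return [name for name, hit in other_hits.items() if hit]


if __name__ == "__main__":
    lookups = {"black": True, "white": False}  # hypothetical lookup results
    print(effective_hits(True, lookups))   # yellow listed: nothing counts
    print(effective_hits(False, lookups))  # not yellow listed: black counts
```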


Re: Question about getting a blacklist included in SA

Posted by Matt Kettler <mk...@verizon.net>.
Marc Perkel wrote:
> I was wondering about how to get a blacklist included in the SA
> distribution. I have a blacklist and whitelist that are both very
> good. I've been publishing them for about a year now. But I have a few
> questions.
>
> What are the licensing requirements that I have to meet to be
> included? I assume it has to be unrestricted?
Yes, or at least unrestricted enough that almost anyone, including
businesses, can use it freely.

IMO, your list usage criteria are currently a bit too restrictive.

Personally, I think for inclusion in SA it should be free for anyone,
possibly with some exceptions to protect the list from being overloaded..

 Spamhaus fits this kind of "self defense against overload" model IMO,
limiting free usage to sites under 80k emails and 320k queries per day,
and not allowing free use by appliances or reseller services. However,
they are open to free use by businesses that aren't selling spam
filtering as long as they're under the 80/320k limits. IMO, anything
more restrictive than that probably shouldn't be in SA. Actually,
personally I wish Spamhaus's limits were a bit more liberal, but I'm not
vastly uncomfortable with them.
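
For illustration, the free-use thresholds mentioned above could be
expressed as a small check like the one below. The numbers are the
ones quoted in this message and may not match Spamhaus's actual or
current terms; the function and parameter names are invented, and
"appliances or reseller services" is collapsed into a single flag.

```python
# Thresholds as quoted in this message, not an authoritative statement
# of Spamhaus's terms of use.
MAX_EMAILS_PER_DAY = 80_000
MAX_QUERIES_PER_DAY = 320_000


def within_free_limits(emails_per_day: int,
                       queries_per_day: int,
                       sells_filtering: bool = False) -> bool:
    """True if a site fits the free-use model described above."""
    if sells_filtering:
        # Appliances and reseller services are excluded regardless of volume.
        return False
    return (emails_per_day < MAX_EMAILS_PER_DAY
            and queries_per_day < MAX_QUERIES_PER_DAY)


if __name__ == "__main__":
    print(within_free_limits(50_000, 200_000))
    print(within_free_limits(50_000, 200_000, sells_filtering=True))
```

Note that the two limits work out to four queries per email, so a site
doing many lookups per message would hit the query limit first.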
>
> What kind of bandwidth does it usually pull from servers when it is
> part of the default distribution?
>
> I have 5 servers now at 3 locations and soon to add a 6th at a 4th
> location. Is this enough?
For that, I have no clue.. probably not a lot of bandwidth, but probably
quite a lot of queries. Perhaps one of the URIBL or SURBL folks that
hang out in this list could give you a better idea. Although the usage
is different, they're also almost exclusively used by SA. Other RBL
operators get a lot of usage by tools other than SA.

> What other issues does SA look for when it comes to inclusion?
1) S/O performance in some of the preliminary mass-checks.

I see you've got some SA rules posted on your site.. perhaps we could
sandbox them and see how they do in the nightly mass-checks.. Any devs
curious? Anyone want to set me up with access to my sandbox so I can put
them in?

2) Well documented listing/delisting policies, having those policies be
compatible with SA ideals (e.g. SPEWS was not compatible), and a track
record of sticking to them..

One thing that strikes me as missing is that permanent
black/whitelisting is mentioned, but there are no distinct policies about
what gets permanently listed.  Track records only come with time, but
you need policies in order to start building one :)

3)  I would say it needs to be distinctive from "all the others". If an
RBL has 100% overlap with other lists, it's not adding any useful
coverage. That could also be checked out in the mass-checks.
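
Criteria 1 and 3 are both things you can measure from mass-check
style results, roughly as sketched below. This is an assumption-laden
illustration, not the actual mass-check tooling: S/O is taken here to
be the spam hit rate divided by the sum of the spam and ham hit rates,
overlap is taken as the share of a list's hits that other lists also
caught, and all names and sample numbers are made up.

```python
from typing import Set


def hit_rate(hits: int, total: int) -> float:
    """Fraction of a corpus that a rule hit (0.0 for an empty corpus)."""
    return hits / total if total else 0.0


def s_o_ratio(spam_hits: int, spam_total: int,
              ham_hits: int, ham_total: int) -> float:
    """Assumed S/O definition: spam rate / (spam rate + ham rate).

    Values near 1.0 mean the rule hits almost only spam; values near
    0.0 mean it hits almost only ham.
    """
    spam_rate = hit_rate(spam_hits, spam_total)
    ham_rate = hit_rate(ham_hits, ham_total)
    if spam_rate + ham_rate == 0:
        return 0.0
    return spam_rate / (spam_rate + ham_rate)


def overlap_fraction(candidate: Set[str], others: Set[str]) -> float:
    """Share of the candidate list's hits that other lists also caught."""
    if not candidate:
        return 0.0
    return len(candidate & others) / len(candidate)


if __name__ == "__main__":
    # Made-up sample: 900 of 1000 spam hit, 1 of 1000 ham hit.
    print(f"S/O: {s_o_ratio(900, 1000, 1, 1000):.4f}")
    # Made-up message IDs hit by the candidate list and by existing lists.
    print(f"overlap: {overlap_fraction({'m1', 'm2', 'm3', 'm4'}, {'m1', 'm2', 'm9'}):.2f}")
```

An overlap of 1.0 is the "100% overlap" case above: the list catches
nothing that the existing lists do not already catch.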






Re: Question about getting a blacklist included in SA

Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 2 Jan 2008, Marc Perkel wrote:

> Here's the info on my lists:
> http://wiki.ctyme.com/index.php/Spam_DNS_Lists

Get somebody to proofread that page.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  I'll have that son of a bitch eating out of dumpsters in less than
  two years.       -- MS CEO Steve Ballmer, on RedHat CEO Matt Szulik
-----------------------------------------------------------------------
 15 days until Benjamin Franklin's 302nd Birthday