Posted to dev@spamassassin.apache.org by "Warren Togami Jr." <wt...@gmail.com> on 2011/01/01 13:20:52 UTC

Count of uridnsbl skip list domains?

I noticed that spam frequently contains lists of boring legitimate URLs in
an apparent attempt to overload URIBL-based detection and hide the true
URL.  It does not appear that SpamAssassin is the target of this evasive
behavior.  In any case I am wondering if it is possible to count the number
of uridnsbl "skip list" domains in order to assign a score to this behavior.
I'm guessing this would require modifying the plugin in order to access the
count from a rule?

Example spamassassin -D output
=======================

Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain alexa.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain gmail.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain bing.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain doubleclick.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain craigslist.org in skip list
Jan  1 05:43:33.578 [24371] dbg: uridnsbl: domain nytimes.com in skip list
Jan  1 05:43:33.578 [24371] dbg: uridnsbl: domain baidu.com in skip list
Jan  1 05:43:33.578 [24371] dbg: uridnsbl: domain ebay.com in skip list
Jan  1 05:43:33.578 [24371] dbg: uridnsbl: domain google.co.in in skip list
Jan  1 05:43:33.579 [24371] dbg: uridnsbl: domain apple.com in skip list
Jan  1 05:43:33.579 [24371] dbg: uridnsbl: domain google.it in skip list
Jan  1 05:43:33.579 [24371] dbg: uridnsbl: domain bbc.co.uk in skip list
Jan  1 05:43:33.579 [24371] dbg: uridnsbl: domain rediff.com in skip list
Jan  1 05:43:33.579 [24371] dbg: uridnsbl: domain aol.com in skip list
Jan  1 05:43:33.580 [24371] dbg: uridnsbl: domain amazon.com in skip list
Jan  1 05:43:33.580 [24371] dbg: uridnsbl: domains to query: rapidshare.com google.fr googleusercontent.com live.com conduit.com 212.162.53.170
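
As a rough illustration of the count being asked for, the debug output above can be post-processed outside SpamAssassin. This is only a sketch in Python that parses the "in skip list" debug lines; it is not how the URIDNSBL plugin would expose the count to a rule, and the function name is made up for this example.

```python
import re

# Matches debug lines of the form shown above, e.g.
# "... dbg: uridnsbl: domain alexa.com in skip list".
# \s+ between "skip" and "list" tolerates a line wrapped at that point.
SKIP_LINE = re.compile(r"dbg: uridnsbl: domain (\S+) in skip\s+list")


def count_skipped_domains(debug_output: str) -> int:
    """Count distinct domains reported as being in the skip list."""
    return len(set(SKIP_LINE.findall(debug_output)))


sample = """\
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain alexa.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain gmail.com in skip list
Jan  1 05:43:33.577 [24371] dbg: uridnsbl: domain doubleclick.com in skip
list
"""

print(count_skipped_domains(sample))  # 3
```

A rule inside SpamAssassin would still need the plugin to publish this number, which is the open question above.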

Warren

Re: Count of uridnsbl skip list domains?

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, January 5, 2011, 2:18:33 PM, Kevin McGrail wrote:

>> This made me realize that the URI skip list we have in 25_uribl.cf
>> could be improved to avoid lots of useless
>> lookups.  The old skip list had not been updated for so long, that 
>> twitter and facebook among others were not listed. I have already 
>> requested an updated skip list from SURBL.
>>
>> Would anyone object to expanding the current list of 200 skip domains 
>> to maybe 500?  I suspect this will only benefit us to avoid many 
>> useless queries.
> +1 from me.

> Regards,
> KAM

+1 here

Given that SpamAssassin administrators can locally whitelist,
blacklist and soon override the skip list with the new
clear_uridnsbl_skip_domain, SURBL would definitely prefer that
the common hammy domains continue to be excluded from queries,
and that the skip list at least be updated to take into account
new very common domains such as facebook.com.

Operationally it doesn't make much sense to check google.com,
facebook.com, pfizer.com, etc., trillions of times when they will
never be blacklisted.

Another advantage is that the skip list may help defeat decoy
URIs from overloading the URI checking limit, depending on how
that's implemented.
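
A minimal sketch of that last point, assuming the skip list is applied before a per-message query cap. The function name, the cap parameter, and the example domains are hypothetical; this is not the plugin's actual code, and as noted the real effect depends on how the limit is implemented.

```python
def domains_to_query(uri_domains, skip_list, max_queries):
    """Drop skip-list domains first, then apply the per-message query cap."""
    seen = set()
    candidates = []
    for domain in uri_domains:
        domain = domain.lower()
        if domain in skip_list or domain in seen:
            continue
        seen.add(domain)
        candidates.append(domain)
    return candidates[:max_queries]


skip = {"google.com", "facebook.com", "pfizer.com"}
message = ["google.com", "facebook.com", "pfizer.com", "spammy.example"]

# Without a skip list, the decoys use up a cap of 2 and the real target is never checked.
print(domains_to_query(message, set(), 2))  # ['google.com', 'facebook.com']
# With the skip list, the decoys are removed before the cap is applied.
print(domains_to_query(message, skip, 2))   # ['spammy.example']
```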

Cheers,

Jeff C.


Re: Count of uridnsbl skip list domains?

Posted by Benny Pedersen <me...@junc.org>.
On Thu 06 Jan 2011 17:46:37 CET, Yet Another Ninja wrote
> I do use a local URI WL for a very small subset of domains which  
> assigns negative scores but I wouldn't impose it on anyone :-)

Same here. I have local ham, grey, and spam lists, plus a meta rule so that
ham does not apply when any domain is listed in grey or spam.

So if a spammer fills in ham domains but there is one grey or spam domain,
then ham does not give a negative score. Try again :)
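
The logic described above could be sketched like this. It is only an illustration in Python, not SpamAssassin rule syntax; the list contents and the function name are hypothetical.

```python
def ham_bonus_applies(domains, ham, grey, spam):
    """Ham domains earn a negative score only if no grey or spam domain is present."""
    has_ham = any(d in ham for d in domains)
    has_bad = any(d in grey or d in spam for d in domains)
    return has_ham and not has_bad


ham = {"goodbank.example"}
grey = {"shortener.example"}
spam = {"pills.example"}

print(ham_bonus_applies(["goodbank.example"], ham, grey, spam))                   # True
print(ham_bonus_applies(["goodbank.example", "pills.example"], ham, grey, spam))  # False
```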

But I know there is a bug in SpamAssassin that is still not solved, although
IMHO some see it as a feature: email domains are included in URL testing :(

-- 
xpoint


Re: Count of uridnsbl skip list domains?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2011-01-06 at 17:46 +0100, Yet Another Ninja wrote:
> On 2011-01-06 17:42, Karsten Bräckelmann wrote:

> > > > Huh? There is no such thing. The whole concept of a whitelist in this
> > > > context just does not work.
> > >
> > > what do you mean "there is no such thing" ?
> > >
> > > host hotmail.com.white.uribl.com
> > > hotmail.com.white.uribl.com has address 127.0.0.2
> > >
> > > there are some ppl who mirror it and query it
> >
> > Sure -- but they do not assign negative scores based on the mere
> > existence of such a link, do they?
> 
> I have no idea who uses it and what for.

OK, let me clarify my statements. :)  Yes, URI DNSWLs do exist --
usually not publicly accessible, though.

My main point, however, was the applicability of the concept. Unlike IP-based
whitelists, which pretty much work exactly like IP-based blacklists with a
different sign, this inversion does not work for URI DNSWLs. A whitelisted
domain in the body of a message is just too easy to inject. And in the case
of e.g. Yahoo and Hotmail freemailer accounts, the spammer often gets the
link "for free", force-appended by the webmail provider itself...


> I do use a local URI WL for a very small subset of domains which assigns 
> negative scores but I wouldn't impose it on anyone :-)

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 17:42, Karsten Bräckelmann wrote:
> On Thu, 2011-01-06 at 17:06 +0100, Yet Another Ninja wrote:
>> On 2011-01-06 16:28, Karsten Bräckelmann wrote:
>>>> Has anybody thought that by ignoring/skipping such a domain also breaks
>>>> queries to a uri WL?
>>>
>>> Huh? There is no such thing. The whole concept of a whitelist in this
>>> context just does not work.
>>
>> what do you mean "there is no such thing" ?
>>
>> host hotmail.com.white.uribl.com
>> hotmail.com.white.uribl.com has address 127.0.0.2
>>
>> there are some ppl who mirror it and query it
>
> Sure -- but they do not assign negative scores based on the mere
> existence of such a link, do they?

I have no idea who uses it and what for.

I do use a local URI WL for a very small subset of domains which assigns 
negative scores but I wouldn't impose it on anyone :-)



Re: Count of uridnsbl skip list domains?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2011-01-06 at 17:06 +0100, Yet Another Ninja wrote:
> On 2011-01-06 16:28, Karsten Bräckelmann wrote:
> > > Has anybody thought that by ignoring/skipping such a domain also breaks
> > > queries to a uri WL?
> >
> > Huh? There is no such thing. The whole concept of a whitelist in this
> > context just does not work.
> 
> what do you mean "there is no such thing" ?
> 
> host hotmail.com.white.uribl.com
> hotmail.com.white.uribl.com has address 127.0.0.2
> 
> there are some ppl who mirror it and query it

Sure -- but they do not assign negative scores based on the mere
existence of such a link, do they?

This sub-thread is actually pretty much the same as your FAQ-pointing
"URIBL's WL is not what you think it is" reply to Benny -- which
coincidentally I only read after posting.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 16:28, Karsten Bräckelmann wrote:
>> Has anybody thought that by ignoring/skipping such a domain also breaks
>> queries to a uri WL?
>
> Huh? There is no such thing. The whole concept of a whitelist in this
> context just does not work.

what do you mean "there is no such thing" ?

host hotmail.com.white.uribl.com
hotmail.com.white.uribl.com has address 127.0.0.2

there are some ppl who mirror it and query it
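
For readers unfamiliar with how such a lookup is formed, here is a small Python sketch. The query name is just the domain prepended to the list's zone, as in the host command above, and an answer in 127.0.0.0/8 is treated as "listed". The helper names are made up for this example, and the network lookup function is shown but not called.

```python
import socket


def list_query_name(domain: str, zone: str) -> str:
    """Build the DNS name to query: <domain>.<zone>."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(addresses) -> bool:
    """Treat any 127.x.x.x answer as a listing; no answer means not listed."""
    return any(addr.startswith("127.") for addr in addresses)


def lookup(domain: str, zone: str):
    """Resolve the query name; returns [] when the name does not exist."""
    try:
        return socket.gethostbyname_ex(list_query_name(domain, zone))[2]
    except socket.gaierror:
        return []


print(list_query_name("hotmail.com", "white.uribl.com"))  # hotmail.com.white.uribl.com
print(is_listed(["127.0.0.2"]))                           # True
```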


Re: Count of uridnsbl skip list domains?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> Has anybody thought that by ignoring/skipping such a domain also breaks 
> queries to a uri WL?

Huh? There is no such thing. The whole concept of a whitelist in this
context just does not work.

(It's a different story with BL internal "do not list" $color lists.)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Count of uridnsbl skip list domains?

Posted by Benny Pedersen <me...@junc.org>.
On Thu 06 Jan 2011 07:47:54 CET, Yet Another Ninja wrote

> enable_uridnsbl_skip_domain yes.

or clear_uridnsbl_skip_domain

same syntax as clear_internal_networks / clear_trusted_networks ...

-- 
xpoint


Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 6:24, Jeff Chan wrote:
> On Wednesday, January 5, 2011, 3:09:40 PM, Yet Ninja wrote:
>> On 2011-01-05 23:51, Warren Togami Jr. wrote:
>
>>> However they would never need this option.  The very nature of this list is
>>> it is domains that are so common that there is no reason they will ever be
>>> blacklisted.  I asked SURBL to verify that this old 200 list is still
>>> whitelisted, and they confirmed.
>
>> The nature of the list was to save SURBL queries, for the days when
>> SURBL was the only uri BL and had few mirrors, users were on a 128kb
>> pipe and SA was more or less the only app doing SURBL queries (via
>> SpamcopURI)
>
> Not really.  It was done to avoid unnecessary queries.


as I stated above, to save queries.

Has anybody thought that by ignoring/skipping such a domain also breaks 
queries to a uri WL?


could someone pls test if having a domain in uridnsbl_skip_domain list 
renders it useless as util_rb_2tld/util_rb_3tld (as in the case Warren 
proposes with wordpress.com)

I'd welcome having uridnsbl_skip_domain as a switchable feature:

enable_uridnsbl_skip_domain yes.




Re: Count of uridnsbl skip list domains?

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, January 5, 2011, 3:09:40 PM, Yet Ninja wrote:
> On 2011-01-05 23:51, Warren Togami Jr. wrote:

>> However they would never need this option.  The very nature of this list is
>> it is domains that are so common that there is no reason they will ever be
>> blacklisted.  I asked SURBL to verify that this old 200 list is still
>> whitelisted, and they confirmed.

> The nature of the list was to save SURBL queries, for the days when
> SURBL was the only uri BL and had few mirrors, users were on a 128kb
> pipe and SA was more or less the only app doing SURBL queries (via
> SpamcopURI)

Not really.  It was done to avoid unnecessary queries.

Cheers,

Jeff C.


Re: Count of uridnsbl skip list domains?

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, January 5, 2011, 3:14:32 PM, Kevin McGrail wrote:

>>> What are you referring to specifically?
>>
>> follow the changes in the plugin.

> My $0.02 is the list already exists so making it more accurate is the
> best course of action.  The list shouldn't be astronomical but I doubt
> that 200 vs 500 is a huge memory footprint.  Perhaps 10 kilobytes?

About 8k bytes or less for 500 short domains.

Cheers,

Jeff C.


Re: Count of uridnsbl skip list domains?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> That's a good suggestion to keep everyone happy, and is fairly easy to
> implement. Will do it - but please open a ticket to follow the procedure
> properly.
Bug open. https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6531

Regards,
KAM

Re: Count of uridnsbl skip list domains?

Posted by Mark Martinec <Ma...@ijs.si>.
Kevin wrote:
> The list shouldn't be astronomical but I doubt 
> that 200 vs 500 is a huge memory footprint.  Perhaps 10 kilobytes?

Jeff C wrote:
> About 8k bytes or less for 500 short domains.

The current list of 195 domains takes 15488 bytes of storage (64-bit perl),
as reported by Devel::Size::total_size().  It is stored as a hash of strings,
and perl rounds up the allocation size of each string to its allocation quantum.
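
As a back-of-the-envelope check, the measured figure can be extrapolated to 500 domains. The Python below is plain arithmetic on the numbers above; it assumes the new domains are of similar length and that the cost grows linearly, which ignores any fixed overhead of the hash itself.

```python
measured_bytes = 15488    # Devel::Size::total_size() for the current list
measured_domains = 195    # number of domains in the current list

bytes_per_domain = measured_bytes / measured_domains
estimate_500 = bytes_per_domain * 500

print(round(bytes_per_domain, 1))  # 79.4
print(round(estimate_500))         # 39713
```

So under these assumptions a 500-domain list would be closer to 40 kilobytes than to 8 or 10, which is still small.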


Yet Another Ninja wrote:
> I'd welcome having uridnsbl_skip_domain as a switchable feature:
> enable_uridnsbl_skip_domain yes.
> or clear_uridnsbl_skip_domain
> same syntax as clear_internal_networks / clear_trusted_networks ...

Warren writes:
> How about we implement a clear_* parameter so this becomes a non-issue?
> In reality it is counter-productive to actually use such an option, but
> if you insist it must be possible...

That's a good suggestion to keep everyone happy, and is fairly easy to
implement. Will do it - but please open a ticket to follow the procedure
properly.

Here is what I have in mind:


 =item clear_uridnsbl_skip_domain [domain1 domain2 ...]

 If no argument is given, then clears the entire list of domains declared
 by I<uridnsbl_skip_domain> configuration directives so far. Any subsequent
 I<uridnsbl_skip_domain> directives will start creating a new list of skip
 domains.

 When given a list of domains as arguments, only the specified domains
 are removed from the list of skipped domains.
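
The intended semantics can be modelled in a few lines. This is a Python sketch of the behaviour described in the text above, not the Perl that would go into the plugin; the class and method names are invented for the example.

```python
class SkipDomains:
    """Toy model of uridnsbl_skip_domain / clear_uridnsbl_skip_domain."""

    def __init__(self):
        self._domains = set()

    def skip(self, *domains):
        """Model of uridnsbl_skip_domain: add domains to the skip list."""
        self._domains.update(d.lower() for d in domains)

    def clear(self, *domains):
        """Model of clear_uridnsbl_skip_domain."""
        if not domains:
            # No argument: forget the whole list declared so far.
            self._domains.clear()
        else:
            # With arguments: remove only the named domains.
            self._domains.difference_update(d.lower() for d in domains)

    def __contains__(self, domain):
        return domain.lower() in self._domains


skips = SkipDomains()
skips.skip("google.com", "facebook.com", "wordpress.com")
skips.clear("wordpress.com")
print("wordpress.com" in skips, "google.com" in skips)  # False True
skips.clear()
skips.skip("example.org")
print("google.com" in skips, "example.org" in skips)    # False True
```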


Mark

Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Thu, Jan 6, 2011 at 6:54 AM, Yet Another Ninja <ax...@gmail.com> wrote:

>> It is clear that making the default removing the majority of the skip list
>> will only make things worse.
>
> Warren,
>
> what would be worse? nobody spent a second thought on this till you started
> making a lot of noise. It's not like it was bugging anybody or was a
> showstopper.
>
I noticed that this was a performance issue.  Sure, it is minor, but we might
as well fix it.  We'll get the updated list of domains from SURBL soon, along
with % of queries, so we can decide for ourselves where to cut off our skip
list.

Warren

Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 17:35, Warren Togami Jr. wrote:
> On Wed, Jan 5, 2011 at 1:14 PM, Kevin A. McGrail <KM...@pccc.com> wrote:
>
>>>> What are you referring to specifically?
>>>
>>> follow the changes in the plugin.
>>
>> My $0.02 is the list already exists so making it more accurate is the best
>> course of action.  The list shouldn't be astronomical but I doubt that 200
>> vs 500 is a huge memory footprint.  Perhaps 10 kilobytes?
>>
>> If you want the ability NOT to use the list, i.e. override the list and
>> that functionality doesn't exist, then a ticket for that needs to be open
>> (it might be) but that shouldn't be a blocker.
>>
>>
>>> These queries are *not* at all useless. Query counts are used for
>>> statistics and reputation.
>>
>> Not if they hit caches...  As a public nameserver for a number of BL and
>> based on my experience listening to David F Skoll who authors MimeDefang,
>> DNS queries are a big bottle neck for SA and reducing them is good when it
>> can be done IMO.
>>
>> regards,
>> KAM
>>
>
> It is clear that making the default removing the majority of the skip list
> will only make things worse.

Warren,

what would be worse? nobody spent a second thought on this till you
started making a lot of noise. It's not like it was bugging anybody or
was a showstopper.

> How about we implement a clear_* parameter so this becomes a non-issue?  In
> reality it is counter-productive to actually use such an option, but if you
> insist it must be possible...

sure.. implement it...

> In any case, what we have now is clearly in need of update.
is it?

Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Wed, Jan 5, 2011 at 1:14 PM, Kevin A. McGrail <KM...@pccc.com> wrote:

>>> What are you referring to specifically?
>>
>> follow the changes in the plugin.
>
> My $0.02 is the list already exists so making it more accurate is the best
> course of action.  The list shouldn't be astronomical but I doubt that 200
> vs 500 is a huge memory footprint.  Perhaps 10 kilobytes?
>
> If you want the ability NOT to use the list, i.e. override the list and
> that functionality doesn't exist, then a ticket for that needs to be open
> (it might be) but that shouldn't be a blocker.
>
>
>> These queries are *not* at all useless. Query counts are used for
>> statistics and reputation.
>
> Not if they hit caches...  As a public nameserver for a number of BL and
> based on my experience listening to David F Skoll who authors MimeDefang,
> DNS queries are a big bottle neck for SA and reducing them is good when it
> can be done IMO.
>
> regards,
> KAM
>

It is clear that making the default removing the majority of the skip list
will only make things worse.

How about we implement a clear_* parameter so this becomes a non-issue?  In
reality it is counter-productive to actually use such an option, but if you
insist it must be possible...

In any case, what we have now is clearly in need of update.

Warren

Re: Count of uridnsbl skip list domains?

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, January 5, 2011, 3:26:13 PM, Yet Ninja wrote:
> On 2011-01-06 0:14, Kevin A. McGrail wrote:
>>
>>>> What are you referring to specifically?
>>>
>>> follow the changes in the plugin.
>> My $0.02 is the list already exists so making it more accurate is the
>> best course of action. The list shouldn't be astronomical but I doubt
>> that 200 vs 500 is a huge memory footprint. Perhaps 10 kilobytes?

> Define accurate?

> a list of "bypass queries to avoid traffic" which means "give domain a
> free pass" ???

Yes, I think google.com, yahoo.com, w3c.com, facebook.com should
get a free pass.  YMMV.  :)

Jeff C.


Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 0:14, Kevin A. McGrail wrote:
>
>>> What are you referring to specifically?
>>
>> follow the changes in the plugin.
> My $0.02 is the list already exists so making it more accurate is the
> best course of action. The list shouldn't be astronomical but I doubt
> that 200 vs 500 is a huge memory footprint. Perhaps 10 kilobytes?

Define accurate?

a list of "bypass queries to avoid traffic" which means "give domain a 
free pass" ???

The list is legacy for legacy issues.

>
> If you want the ability NOT to use the list, i.e. override the list and
> that functionality doesn't exist, then a ticket for that needs to be
> open (it might be) but that shouldn't be a blocker.

The subject has come up but it was of low priority.

>
>> These queries are *not* at all useless. Query counts are used for
>> statistics and reputation.
> Not if they hit caches... As a public nameserver for a number of BL and
> based on my experience listening to David F Skoll who authors
> MimeDefang, DNS queries are a big bottle neck for SA and reducing them
> is good when it can be done IMO.

reducing yes, but by skipping you do more than that.




Re: Count of uridnsbl skip list domains?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
>> What are you referring to specifically?
>
> follow the changes in the plugin.
My $0.02 is the list already exists so making it more accurate is the 
best course of action.  The list shouldn't be astronomical but I doubt 
that 200 vs 500 is a huge memory footprint.  Perhaps 10 kilobytes?

If you want the ability NOT to use the list, i.e. override the list and 
that functionality doesn't exist, then a ticket for that needs to be 
open (it might be) but that shouldn't be a blocker.

> These queries are *not* at all useless. Query counts are used for 
> statistics and reputation.
Not if they hit caches...  As someone who runs a public nameserver for a
number of BLs, and based on my experience listening to David F Skoll, who
authors MimeDefang, DNS queries are a big bottleneck for SA, and reducing
them is good when it can be done, IMO.

regards,
KAM

Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-05 23:51, Warren Togami Jr. wrote:
> On Wed, Jan 5, 2011 at 12:31 PM, Yet Another Ninja <ax...@gmail.com> wrote:
>
>> On 2011-01-05 23:18, Kevin A. McGrail wrote:
>>
>>>
>>>
>>>> This made me realize that the URI skip list we have in 25_uribl.cf
>>>> could be improved to avoid lots of useless
>>>> lookups. The old skip list had not been updated for so long, that
>>>> twitter and facebook among others were not listed. I have already
>>>> requested an updated skip list from SURBL.
>>>>
>>>> Would anyone object to expanding the current list of 200 skip domains
>>>> to maybe 500? I suspect this will only benefit us to avoid many
>>>> useless queries.
>>>>
>>> +1 from me.
>>>
>>
>> -1
>>
>> 1. This list imposes a restriction on ppl who may run local URI lists and
>> SA provides no unwhitelist_uridnsbl_skip_domain to bypass the default
>> uridnsbl_skip_domain entries.
>> This was discussed ages ago.
>>
>
> Do you advocate that we change nothing?  The current list is lacking obvious
> domains like facebook, twitter and many others.  By your logic we should
> entirely remove the existing list as it is restricting the local admins.

Yes, if it were up to me, except for a dozen domains, I'd remove it.


> However they would never need this option.  The very nature of this list is
> it is domains that are so common that there is no reason they will ever be
> blacklisted.  I asked SURBL to verify that this old 200 list is still
> whitelisted, and they confirmed.

The nature of the list was to save SURBL queries, for the days when 
SURBL was the only uri BL and had few mirrors, users were on a 128kb 
pipe and SA was more or less the only app doing SURBL queries (via 
SpamcopURI)

> The existing rules have an arbitrary line drawn based upon very old data.
> Certainly it needs improvement.  But how do we draw a new line?


>>
>> 2. Expanding it only increases memory use, no speed benefit.
>> The query cost is minimal as most of the proposed domains will already be
>> cached close to the SA instance.
>>
>> in SA 3.4 SVN code we have whitelisting of URIs for ppl who want to avoid
>> queries.
>>
>
> What are you referring to specifically?

follow the changes in the plugin.

>>
>> I assume Warren doesn't have insight on query traffic affecting either
>> SURBL or URIBL or any other URI BL so his opinion is only based on his gut
>> feeling.
>>
>>
> I had a gut feeling, but I needed to verify with statistics.  My
> investigation into URI_SKIPPED_* revealed that we were doing a great many
> useless DNS queries because our skip list is so old.

These queries are *not* at all useless. Query counts are used for 
statistics and reputation.








Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Wed, Jan 5, 2011 at 12:31 PM, Yet Another Ninja <ax...@gmail.com> wrote:

> On 2011-01-05 23:18, Kevin A. McGrail wrote:
>
>>
>>
>>> This made me realize that the URI skip list we have in 25_uribl.cf
>>> could be improved to avoid lots of useless
>>> lookups. The old skip list had not been updated for so long, that
>>> twitter and facebook among others were not listed. I have already
>>> requested an updated skip list from SURBL.
>>>
>>> Would anyone object to expanding the current list of 200 skip domains
>>> to maybe 500? I suspect this will only benefit us to avoid many
>>> useless queries.
>>>
>> +1 from me.
>>
>
> -1
>
> 1. This list imposes a restriction on ppl who may run local URI lists and
> SA provides no unwhitelist_uridnsbl_skip_domain to bypass the default
> uridnsbl_skip_domain entries.
> This was discussed ages ago.
>

Do you advocate that we change nothing?  The current list is lacking obvious
domains like facebook, twitter and many others.  By your logic we should
entirely remove the existing list as it is restricting the local admins.
However they would never need this option.  The very nature of this list is
it is domains that are so common that there is no reason they will ever be
blacklisted.  I asked SURBL to verify that this old 200 list is still
whitelisted, and they confirmed.

The existing rules have an arbitrary line drawn based upon very old data.
Certainly it needs improvement.  But how do we draw a new line?


>
> 2. Expanding it only increases memory use, no speed benefit.
> The query cost is minimal as most of the proposed domains will already be
> cached close to the SA instance.
>
> in SA 3.4 SVN code we have whitelisting of URIs for ppl who want to avoid
> queries.
>

What are you referring to specifically?


>
> I assume Warren doesn't have insight on query traffic affecting either
> SURBL or URIBL or any other URI BL so his opinion is only based on his gut
> feeling.
>
>
I had a gut feeling, but I needed to verify with statistics.  My
investigation into URI_SKIPPED_* revealed that we were doing a great many
useless DNS queries because our skip list is so old.  I then asked SURBL to
recommend an updated skip list based upon their actual query traffic.
Meanwhile, I asked for opinions here on this list.  I do not act
arbitrarily.

Warren

Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-05 23:18, Kevin A. McGrail wrote:
>
>>
>> This made me realize that the URI skip list we have in 25_uribl.cf
>> could be improved to avoid lots of useless
>> lookups. The old skip list had not been updated for so long, that
>> twitter and facebook among others were not listed. I have already
>> requested an updated skip list from SURBL.
>>
>> Would anyone object to expanding the current list of 200 skip domains
>> to maybe 500? I suspect this will only benefit us to avoid many
>> useless queries.
> +1 from me.

-1

1. This list imposes a restriction on ppl who may run local URI lists 
and SA provides no unwhitelist_uridnsbl_skip_domain to bypass the 
default uridnsbl_skip_domain entries.
This was discussed ages ago.

2. Expanding it only increases memory use, no speed benefit.
The query cost is minimal as most of the proposed domains will already 
be cached close to the SA instance.

in SA 3.4 SVN code we have whitelisting of URIs for ppl who want to 
avoid queries.

I assume Warren doesn't have insight on query traffic affecting either
SURBL or URIBL or any other URI BL so his opinion is only based on his
gut feeling.




Re: Count of uridnsbl skip list domains?

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-01-06 12:27, Benny Pedersen wrote:
> On Wed 05 Jan 2011 23:18:33 CET, "Kevin A. McGrail" wrote
>
>>> Would anyone object to expanding the current list of 200 skip domains
>>> to maybe 500? I suspect this will only benefit us to avoid many
>>> useless queries.
>> +1 from me.
>
> or turn it into a update channel ?
>
> long time i asked why uribl_white is not added to skip, uribl answered
> it was not there intention to go that broadway, but i can see why not :(

http://www.uribl.com/faq.shtml#q3

"Will you whitelist my domain?" explains why URIBL's WL is not what you
think it is





Re: Count of uridnsbl skip list domains?

Posted by Benny Pedersen <me...@junc.org>.
On Wed 05 Jan 2011 23:18:33 CET, "Kevin A. McGrail" wrote

>> Would anyone object to expanding the current list of 200 skip  
>> domains to maybe 500?  I suspect this will only benefit us to avoid  
>> many useless queries.
> +1 from me.

Or turn it into an update channel?

A long time ago I asked why uribl_white is not added to the skip list. URIBL
answered that it was not their intention to go that broad, but I can see why not :(

What are the criteria to be in the skip list?

I don't know if white has been dropped on uribl.com, but it was there back
when I was a mirror nameserver.

-- 
xpoint


Re: Count of uridnsbl skip list domains?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
>
> This made me realize that the URI skip list we have in 25_uribl.cf
> could be improved to avoid lots of useless
> lookups.  The old skip list had not been updated for so long, that 
> twitter and facebook among others were not listed. I have already 
> requested an updated skip list from SURBL.
>
> Would anyone object to expanding the current list of 200 skip domains 
> to maybe 500?  I suspect this will only benefit us to avoid many 
> useless queries.
+1 from me.

Regards,
KAM

Re: Count of uridnsbl skip list domains?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> SURBL has now given a script to all their DNS servers to parse their 
> logs and find their next most common whitelisted hits.  We can then 
> decide what to do with it after they provide the new list.
>
I'm a public mirror for SURBL if you want me to run it against my 
stats.  I have a 1.7GB stat file ;-)

Regards,
KAM

Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
2011/1/5 Karsten Bräckelmann <gu...@rudersport.de>

> > Would anyone object to expanding the current list of 200 skip domains
> > to maybe 500?  I suspect this will only benefit us to avoid many
> > useless queries.
>
> I would prefer to base the number of domains on DNS query stats, and
> where a cut-off actually might make sense, rather than an arbitrary,
> fixed number.
>
>
I would agree, except given that we've had a hardcoded 200 list for so long,
their DNS query stats are skewed significantly away from our existing 200
common list.

SURBL has now given a script to all their DNS servers to parse their logs
and find their next most common whitelisted hits.  We can then decide what
to do with it after they provide the new list.

Warren

Re: Count of uridnsbl skip list domains?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Wed, 2011-01-05 at 12:13 -1000, Warren Togami Jr. wrote:
> http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm?view=diff&r1=1055287&r2=1055288&pathrev=1055288
> I backed out this change for reasons written in the commit.  I did
> never figure out why it caused seemingly unrelated lint failures in
> trunk.

The eval() rule should accept parameters for ranges, to dynamically
generate the various rules. Not hard code them. (See SVN link below for
the rules.)
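
To illustrate what "parameters for ranges" means here, a small Python sketch follows. The real eval function would be Perl inside the plugin; the function name and the thresholds below are invented for this example and are not taken from the sandbox rules.

```python
def skipped_count_in_range(skipped_count, low, high=None):
    """True if low <= skipped_count <= high; no upper bound when high is None."""
    if skipped_count < low:
        return False
    return high is None or skipped_count <= high


# One parameterized check can back several rules with different ranges.
print(skipped_count_in_range(15, 10, 20))  # True
print(skipped_count_in_range(15, 21))      # False
print(skipped_count_in_range(40, 21))      # True
```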

Also, while the idea might be worthwhile, I see two issues with it.
First, the spam you have in mind should score rather high anyway, based
on samples of a recent-ish campaign I have seen that matches this.
Second, this might be aiming short -- the real info would be "number of
domains in skip list, plus number of domains not listed". Possibly
compared to the total number of domains. Such an extended measure
probably would be even more prone to FPs, though.
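
The extended measure could be expressed as a simple ratio. This Python sketch uses made-up numbers and a hypothetical function name; it only shows the arithmetic and says nothing about where a sensible threshold would lie.

```python
def unlisted_ratio(total_domains, skipped, queried_not_listed):
    """Share of a message's domains that are either skipped or queried but not listed."""
    if total_domains == 0:
        return 0.0
    return (skipped + queried_not_listed) / total_domains


# Example: 21 domains in total, 15 in the skip list, 5 queried but not listed.
print(round(unlisted_ratio(21, 15, 5), 3))  # 0.952
```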


> This made me realize that the URI skip list we have in 25_uribl.cf
> could be improved to avoid lots of useless lookups.  The old skip list
> had not been updated for so long, that twitter and facebook among
> others were not listed. I have already requested an updated skip list
> from SURBL.
> 
> Would anyone object to expanding the current list of 200 skip domains
> to maybe 500?  I suspect this will only benefit us to avoid many
> useless queries.

I would prefer to base the number of domains on DNS query stats, and
where a cut-off actually might make sense, rather than an arbitrary,
fixed number.
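
One way such a stats-based cut-off could work is to keep adding the most-queried domains until they cover a chosen share of all queries. The Python below is a sketch with invented counts, and it assumes the input already contains only domains known never to be listed. The function name and the 80% target are arbitrary.

```python
def cutoff_by_coverage(query_counts, target_share):
    """Return the most-queried domains that together reach target_share of all queries."""
    total = sum(query_counts.values())
    chosen = []
    covered = 0
    ranked = sorted(query_counts.items(), key=lambda kv: kv[1], reverse=True)
    for domain, count in ranked:
        if total == 0 or covered / total >= target_share:
            break
        chosen.append(domain)
        covered += count
    return chosen


stats = {"a.example": 500, "b.example": 300, "c.example": 150, "d.example": 50}
print(cutoff_by_coverage(stats, 0.8))  # ['a.example', 'b.example']
```

With this approach the length of the list falls out of the data instead of being fixed at 200 or 500.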


> On Sun, Jan 2, 2011 at 1:34 AM, Warren Togami Jr. wrote:
> > http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_uri_skipped.cf?view=markup&pathrev=1054357


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm?view=diff&r1=1055287&r2=1055288&pathrev=1055288
I backed out this change for reasons written in the commit.  I never did
figure out why it caused seemingly unrelated lint failures in trunk.

This made me realize that the URI skip list we have in 25_uribl.cf could be
improved to avoid lots of useless lookups.  The old skip list had not been
updated for so long, that twitter and facebook among others were not listed.
I have already requested an updated skip list from SURBL.

Would anyone object to expanding the current list of 200 skip domains to
maybe 500?  I suspect this will only benefit us to avoid many useless
queries.

Warren

On Sun, Jan 2, 2011 at 1:34 AM, Warren Togami Jr. <wt...@gmail.com> wrote:

>
> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_uri_skipped.cf?view=markup&pathrev=1054357
> This rule worked locally without errors visible in spamassassin -D.  I
> don't understand how it causes seemingly unrelated complaints in the lint?
>
> I backed out the rules for now so Hudson will stop complaining.
>
> Warren
>
>
> On Sat, Jan 1, 2011 at 2:20 AM, Warren Togami Jr. <wt...@gmail.com> wrote:
>
>> I noticed that spam frequently contains lists of boring legitimate URL's
>> in an apparent attempt to overload URIBL-based detection and hide their true
>> URL.  It does not appear that spamassassin is the target of this evasive
>> behavior.  In any case I am wondering if it is possible to count the number
>> of uridnsbl "skip list" domains in order to grant a score to this behavior.
>> I'm guessing this would require modifying the plugin in order to access the
>> count from a rule?
>>
>>

Re: Count of uridnsbl skip list domains?

Posted by "Warren Togami Jr." <wt...@gmail.com>.
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_uri_skipped.cf?view=markup&pathrev=1054357
This rule worked locally without errors visible in spamassassin -D.  I don't
understand how it causes seemingly unrelated complaints in the lint.

I backed out the rules for now so Hudson will stop complaining.

Warren

On Sat, Jan 1, 2011 at 2:20 AM, Warren Togami Jr. <wt...@gmail.com> wrote:

> I noticed that spam frequently contains lists of boring legitimate URL's in
> an apparent attempt to overload URIBL-based detection and hide their true
> URL.  It does not appear that spamassassin is the target of this evasive
> behavior.  In any case I am wondering if it is possible to count the number
> of uridnsbl "skip list" domains in order to grant a score to this behavior.
> I'm guessing this would require modifying the plugin in order to access the
> count from a rule?
>
>