Posted to users@spamassassin.apache.org by Nick Edwards <ni...@gmail.com> on 2013/11/29 04:30:18 UTC

uribl problem

Hi, have a problem with our internal uribl

urirhsbl        INT_URI uri.int.lan. A
body            INT_URI eval:check_uridnsbl('INT_URI')
describe        INT_URI Contains a URI listed in internal URIBL
tflags          INT_URI net
score           INT_URI 3


This rule performs lookups when the domain appears in the plain text of
the body; however, if we have it inside HTML, it does not look it up. E.g.

"hi see example.org"  looks up example.org
but
"hi see <a href="http://example.org">example.net</a>"
it will look up example.net, not example.org.
Is this correct, or do I need some other lookup method in local.cf?

thanks

Re: uribl problem

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2013-12-02 at 07:58 +1000, Nick Edwards wrote:
> On 12/1/13, Karsten Bräckelmann <gu...@rudersport.de> wrote:

> > The general SA method of verifying which domains are queried for, is to
> > have a look at the debug output. In your case, you can also check your
> > local DNSBL's logs.
> >
> >   spamassassin -D uridnsbl  < msg
> 
> Ahh ok, this produces output I missed in the 2000 lines of normal
> debug output. It turns out SA is seeing that host/domain for a lookup.
> However, in the case that prompted me to ask this question, it was not
> looking up the domain because, as your suggested debug output easily
> shows, that domain is in a skip list.

That's what grep or searching in less is for. :)  You would have quickly
noticed example.net being skipped by searching the output for your
test-case...


> Is there an easy way to exempt this host/domain from the skip list, or
> to disable the skip list altogether? The closest I can find is
> skip_rbl_checks.

See the URIDNSBL plugin documentation, (clear_)uridnsbl_skip_domain
options.
  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_URIDNSBL.html

Before dropping any domain from the default skip list in 25_uribl.cf,
keep in mind that this affects all URI DNSBLs. Unless a domain in the skip
list turns severely rogue, it will never be listed by DNSBLs anyway.
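
For illustration, a minimal sketch of dropping specific domains from the
skip list. It assumes that clear_uridnsbl_skip_domain, when given
arguments, removes only the named domains, as the plugin documentation
linked above describes. The file name local.cf.example is a placeholder,
not a real SpamAssassin path.

```shell
# Placeholder file name; a real setup would edit its site-wide local.cf.
LOCAL_CF="${LOCAL_CF:-./local.cf.example}"

cat > "$LOCAL_CF" <<'EOF'
# Assumption: with arguments, only the named domains leave the skip list.
# This affects every URI DNSBL rule, not just INT_URI.
clear_uridnsbl_skip_domain example.org example.net
EOF

# Show what was written
cat "$LOCAL_CF"
```

Naming the domains explicitly keeps the rest of the default skip list in
place, which matters given the caveat above.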

  $ grep example.net 25_uribl.cf
  uridnsbl_skip_domain example.com example.net example.org
  $ grep uridnsbl_skip_domain 25_uribl.cf | wc -l -w
       51     254

The default skip list is about 200 domains, generated from URI DNSBL
data of domains frequently appearing in mail and thus (previously) being
looked up.
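
As a rough cross-check of that figure: the wc output above shows 254
words on 51 lines, and each line starts with the uridnsbl_skip_domain
keyword, so 254 - 51 = 203 domains. A small sketch that counts them
directly follows. count_skip_domains is a made-up helper name, and the
sample file only stands in for the real 25_uribl.cf.

```shell
# Hypothetical helper: count domains across all uridnsbl_skip_domain lines.
# Each matching line contributes its field count minus the keyword itself.
count_skip_domains() {
  awk '$1 == "uridnsbl_skip_domain" { n += NF - 1 } END { print n + 0 }' "$1"
}

# Tiny stand-in for 25_uribl.cf, only to make the sketch runnable.
sample=$(mktemp)
printf '%s\n' \
  'uridnsbl_skip_domain example.com example.net example.org' \
  'uridnsbl_skip_domain a.example b.example' > "$sample"

count_skip_domains "$sample"   # prints 5 for the sample above
```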


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: uribl problem

Posted by Nick Edwards <ni...@gmail.com>.
Hi Karsten,

On 12/1/13, Karsten Bräckelmann <gu...@rudersport.de> wrote:
> On Fri, 2013-11-29 at 13:30 +1000, Nick Edwards wrote:
>> Hi, have a problem with our internal uribl
>>
>> urirhsbl        INT_URI uri.int.lan. A
>> body            INT_URI eval:check_uridnsbl('INT_URI')
>> describe        INT_URI Contains a URI listed in internal URIBL
>> tflags          INT_URI net
>> score           INT_URI 3
>
> That's correct.
>

Thanks

>> This rule performs lookups when the domain appears in the plain text of
>> the body; however, if we have it inside HTML, it does not look it up. E.g.
>>
>> "hi see example.org"  looks up example.org
>> but
>> "hi see <a href="http://example.org">example.net</a>"
>> it will look up example.net, not example.org.
>
> How can you tell that SA does not look up the domain in the HTML anchor href?
>

I ran debug and viewed the scrollback (see below)

> The general SA method of verifying which domains are queried for, is to
> have a look at the debug output. In your case, you can also check your
> local DNSBL's logs.
>
>   spamassassin -D uridnsbl  < msg
>

Ahh ok, this produces output I missed in the 2000 lines of normal
debug output. It turns out SA is seeing that host/domain for a lookup.
However, in the case that prompted me to ask this question, it was not
looking up the domain because, as your suggested debug output easily
shows, that domain is in a skip list.

Is there an easy way to exempt this host/domain from the skip list, or
to disable the skip list altogether? The closest I can find is
skip_rbl_checks.


> To see more of the URIDNSBL plugin activity, including which DNSBLs are
> queried and what domains are looked up, you can use e.g.
>
>   spamassassin -D  < msg  2>&1 | grep URI-DNSBL
>
> To limit that to your local DNSBL, grep for DNSBL:uri.int.lan.
>

right, added that to my cheats list :)

>
> Note: The absence of a rule match for the second domain in the Report
> header is NOT an indicator of a missing query. If more than one domain
> is listed in the DNSBL, the urirhsbl rule will still be triggered once
> only, showing one domain, not all listed domains:
>
>   X-Spam-Report:
>     *  3.0 INT_URI Contains a URI listed in internal URIBL
>     *      [URIs: example.net]
>
> Despite the plural in the automatically added detail, it does list one
> domain only. Probably a bug in the URIDNSBL plugin, though might also be
> intended.
>
> Since the DNSBL lookups are asynchronous, it is likely undefined which
> listed domain will trigger the rule to hit and be reported, influenced
> by lookup time and the order they are parsed from the message.
>
>

Awesome, thank you.

Re: uribl problem

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2013-11-29 at 13:30 +1000, Nick Edwards wrote:
> Hi, have a problem with our internal uribl
> 
> urirhsbl        INT_URI uri.int.lan. A
> body            INT_URI eval:check_uridnsbl('INT_URI')
> describe        INT_URI Contains a URI listed in internal URIBL
> tflags          INT_URI net
> score           INT_URI 3

That's correct.

> This rule performs lookups when the domain appears in the plain text of
> the body; however, if we have it inside HTML, it does not look it up. E.g.
> 
> "hi see example.org"  looks up example.org
> but
> "hi see <a href="http://example.org">example.net</a>"
> it will look up example.net, not example.org.

How can you tell that SA does not look up the domain in the HTML anchor href?

The general SA method of verifying which domains are queried for, is to
have a look at the debug output. In your case, you can also check your
local DNSBL's logs.

  spamassassin -D uridnsbl  < msg

will limit the debug output to the URIDNSBL plugin, which would look
like this:

  dbg: uridnsbl: domain example.net in skip list
  dbg: uridnsbl: domain example.com in skip list
  dbg: uridnsbl: domains to query: anchor-text.net anchor-href.net
  dbg: uridnsbl: domain "anchor-text.net" listed (INT_URI): 127.0.0.2
  dbg: uridnsbl: domain "anchor-href.net" listed (INT_URI): 127.0.0.2

Note the placeholder domains as found in the HTML anchor href and parsed
from the text. The example.(net|com) domains you used are perfect for
the HTML sample snippet, but won't work for actual debugging, since they
are in the default skip list.
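
To pull just the skipped domains out of that output, something along
these lines may help. The sed pattern is an assumption based on the debug
lines shown above, and skipped_domains is a hypothetical helper name. In
real use, feed it the output of "spamassassin -D uridnsbl < msg 2>&1".

```shell
# Hypothetical helper: print each domain the debug output reports as skipped.
skipped_domains() {
  sed -n 's/.*uridnsbl: domain \([^ ]*\) in skip list.*/\1/p'
}

# Sample lines copied from the debug output above.
debug_sample='dbg: uridnsbl: domain example.net in skip list
dbg: uridnsbl: domain example.com in skip list
dbg: uridnsbl: domains to query: anchor-text.net anchor-href.net'

# Only the two "in skip list" lines produce output.
printf '%s\n' "$debug_sample" | skipped_domains
```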

To see more of the URIDNSBL plugin activity, including which DNSBLs are
queried and what domains are looked up, you can use e.g.

  spamassassin -D  < msg  2>&1 | grep URI-DNSBL

To limit that to your local DNSBL, grep for DNSBL:uri.int.lan.


Note: The absence of a rule match for the second domain in the Report
header is NOT an indicator of a missing query. If more than one domain
is listed in the DNSBL, the urirhsbl rule will still be triggered once
only, showing one domain, not all listed domains:

  X-Spam-Report:
    *  3.0 INT_URI Contains a URI listed in internal URIBL
    *      [URIs: example.net]

Despite the plural in the automatically added detail, it does list one
domain only. Probably a bug in the URIDNSBL plugin, though might also be
intended.

Since the DNSBL lookups are asynchronous, it is likely undefined which
listed domain will trigger the rule to hit and be reported, influenced
by lookup time and the order they are parsed from the message.




Re: uribl problem

Posted by Benny Pedersen <me...@junc.eu>.
Nick Edwards wrote on 2013-11-29 04:30:

> urirhsbl        INT_URI uri.int.lan. A
> body            INT_URI eval:check_uridnsbl('INT_URI')
> describe        INT_URI Contains a URI listed in internal URIBL
> tflags          INT_URI net
> score           INT_URI 3

rule is okay as designed

> This rule performs lookups when the domain appears in the plain text of
> the body; however, if we have it inside HTML, it does not look it up. E.g.
> 
> "hi see example.org"  looks up example.org

grep example.org msg | wc -l

> but
> "hi see <a href="http://example.org">example.net</a>"
> it will look up example.net, not example.org.

grep example.org msg | wc -l
grep example.net msg | wc -l

Do both report one line?

The rule of thumb here is how email is designed, which means headers are
part of body testing. So if example.net exists in From:, it will be
tested in uridnsbl as well. Basically it's a feature that is IMHO not
meant to be so, but it's a nice bug :=)

The problem here is that it also tests uridnsbl domains as sender domains,
i.e. it is not filtered by what type of domain it is. Testing that these
domains have MX records would also cause more failures, and testing
whether domains have A or AAAA records is of limited use for detecting spam.

> Is this correct, or do I need some other lookup method in local.cf?

you will need another plugin to make this specific test imho