Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/04/06 13:52:54 UTC

updated SURBL results

Justin added a "urirhsbl" test to the URIBL module, so I retested on my
last 4 days of spam (the ham here ranges from 0 to 10 months old) using
SURBL and it exceeded my highest expectations.

  OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
     6491     1497     4994    0.231   0.00    0.00  (all messages)
  100.000  23.0627  76.9373    0.231   0.00    0.00  (all messages as %)
   13.804  59.8530   0.0000    1.000   1.00    0.01  T_URIBL_SC_SURBL
   14.374  61.5230   0.2403    0.996   0.99    1.00  URIBL_SBL
    0.277   0.8016   0.1201    0.870   0.55    1.00  URIBL_DSBL
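
As a rough cross-check of the table above: the S/O and OVERALL% columns look
consistent with S/O = SPAM% / (SPAM% + HAM%) and OVERALL% = (spam hits + ham
hits) / all messages.  Below is a minimal Python sketch under that assumption.
The function names are illustrative only and are not part of mass-check or
hit-frequencies.

```python
# Sketch: recompute S/O and OVERALL% from the SPAM% and HAM% columns.
# Assumes S/O = SPAM% / (SPAM% + HAM%); function names are hypothetical.

N_SPAM, N_HAM = 1497, 4994  # message counts from the "(all messages)" row


def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's (percentage-normalized) hits that fall on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def overall_pct(spam_pct: float, ham_pct: float,
                n_spam: int = N_SPAM, n_ham: int = N_HAM) -> float:
    """Percentage of all messages (spam plus ham) that the rule hit."""
    hits = spam_pct / 100 * n_spam + ham_pct / 100 * n_ham
    return 100 * hits / (n_spam + n_ham)


if __name__ == "__main__":
    rows = [
        ("T_URIBL_SC_SURBL", 59.8530, 0.0000),
        ("URIBL_SBL", 61.5230, 0.2403),
        ("URIBL_DSBL", 0.8016, 0.1201),
    ]
    for name, spam_pct, ham_pct in rows:
        print(f"{name:18s} S/O={so_ratio(spam_pct, ham_pct):.3f} "
              f"OVERALL%={overall_pct(spam_pct, ham_pct):.3f}")
```

Note that S/O is computed from the percentages rather than from raw hit
counts, so it is not skewed by the corpus holding more ham than spam.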

I went ahead and promoted T_URIBL_SC_SURBL to URIBL_SC_SURBL.

I also demoted URIBL_DSBL while I was at it.  URIBL_DSBL doesn't seem to
work so well on recent spam (and given that DSBL isn't a spamhaus-style
list, I'm not surprised).  I suspect its real-time results are pretty
poor.  The weird thing is that the weekly results for URIBL_DSBL are
pretty good.  This suggests to me that spammers eventually get around to
using compromised machines as web servers, but it could be something
else weird going on.

 (weekly corpora results)
 46.667  56.8786   0.5695    0.990   0.96    1.00  URIBL_SBL
  6.313   7.6724   0.1770    0.977   0.86    1.00  URIBL_DSBL

I tested DSBL on some 6 month old spam I have and it does fare about the
same (0.5% spam hit rate), so I'm not really sure what's up with DSBL.
For now, I just demoted it rather than removing it entirely.  Even at 8%
hits, it's probably not worth it, so if someone wants to go ahead and
remove it, I won't complain.

It also seems like I get a large increase in both URIBL_SBL and
URIBL_DSBL hits if I increase the timeout and/or let stuff cache up
locally by running mass-check twice.  URIBL_SBL goes from 61.5% hits to
79.2% and URIBL_DSBL goes from 0.8% to 1.4%.  However, SURBL barely
increases, as one would expect.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: updated SURBL results

Posted by Jeff Chan <je...@surbl.org>.
On Tuesday, April 6, 2004, 5:01:45 PM, Daniel Quinlan wrote:
> Jeff Chan <je...@surbl.org> writes:

>> Thanks much for the test data Daniel!  Can I ask for a clarification
>> of whether urirhsbl in URIBL is doing name resolution before comparing
>> to SURBL, or whether it's comparing "names to names"?  I tried to look
>> for urirhsbl in the sources at:
>> 
>>   http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

> Hmmm... that's not the current tree.  That must be the nightly snapshot.
> This is current:

> http://cvs.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm?rev=9881&root=Apache-SVN&view=markup

> which says:

>        urirhsbl NAME_OF_RULE rhsbl_zone lookuptype
>            Specify a RHSBL-style domain lookup.  "NAME_OF_RULE"
>            is the name of the rule to be used, "rhsbl_zone" is
>            the zone to look up domain names in, and "lookuptype"
>            is the type of lookup (TXT or A).   Note that you must
>            also define a header-eval rule calling
>            "check_uridnsbl" to use this.

>            An RHSBL zone is one where the domain name is looked
>            up, as a string; e.g. a URI using the domain "foo.com"
>            will cause a lookup of "foo.com.uriblzone.net".  Note
>            that hostnames are stripped from the domain used in
>            the URIBL lookup, so the domain "foo.bar.com" will
>            look up "bar.com.uriblzone.net", and "foo.bar.co.uk"
>            will look up "bar.co.uk.uriblzone.net".

> The code just looks up the domain.  Here are some non-dot-com-net-org
> domains looked up as an example: 

>   domain "dkldhfg33.us" listed
>   domain "hfgti6.info" listed
>   domain "hfgr33.us" listed
>   domain "net-click.net.ph" listed
>   domain "dkldhfg33.us" listed
>   domain "net-click.net.ph" listed
>   domain "dkldhfg33.us" listed
>   domain "dkldhfg33.us" listed
>   domain "hfgti6.info" listed
>   domain "dfhfks333.info" listed

Perfect!   Thanks for adding urirhsbl Justin.  And thanks for testing
it with SURBL Daniel.

> Jeff Chan <je...@surbl.org> writes:

>> Given the 60% hit rate, I assume it's doing name to name.

> Yep.  We'll see how FPs look on other corpora this weekend, but it looks
> good right now.

Thanks!

> If you want to do experiments with threshold tweaking or any other
> changes at some point and you're able to set up a test DNSBL
> (test.surbl.org), we could try comparing the two.  Since it's working
> well so far, I'd suggest leaving the current algorithm as-is until a
> replacement algorithm is shown to be better through testing (and I think
> improvements on either end should be quite possible considering we just
> got the query working).

Yes, the next version of the SURBL engine is in the works.
It will use some of the concepts we have been discussing,
such as a longer default expiration (probably 10 days) and,
most likely, variable thresholding and expiration based on
SBL and country inclusion.  That alone will be a big
improvement, though we may tweak more after that.

We are setting the new engine up separately, and a separate test
RBL is an excellent idea for side by side comparison before
any changing over of the production list.

Until then the current engine will continue to run untouched.
It seems reasonably stable, fast, useful, etc., even though
I see some of the same issues you do with domains expiring
then coming back on etc.  I'm focused on the new version
instead of tweaking the old one, which is adequate for now.

Thanks all for the continuing support!

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Re: updated SURBL results

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> except that T_URIBL_SC_SURBL is now URIBL_SC_SURBL, right?

Yes.  A prefix of "T_" means it is a test rule.  I promoted the rule
into the main rule set, removing the "T_" prefix.

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: updated SURBL results

Posted by Jeff Chan <je...@surbl.org>.
On Wednesday, April 7, 2004, 12:32:23 AM, Jeff Chan wrote:
> Could I ask you for a sample config or rule to cause urirhsbl
> to use SURBL?  I'd like to add it to the quick start at the top
> of our web page to help get people using it easily.

I think I mostly answered my own question by spotting the
sample Justin added to the cf file:

  http://www.spamassassin.org/full/3.0.x/dist/rules/25_uribl.cf

urirhsbl        URIBL_SC_SURBL  sc.surbl.org.   A
header          URIBL_SC_SURBL  eval:check_uridnsbl('T_URIBL_SC_SURBL')
describe        URIBL_SC_SURBL  Contains a URL listed in the SC SURBL blocklist
tflags          URIBL_SC_SURBL  net

except that T_URIBL_SC_SURBL is now URIBL_SC_SURBL, right?
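
For illustration, here is roughly what an "A"-type lookup against a zone like
sc.surbl.org amounts to at the DNS level, assuming the usual DNSBL convention
that a listed name resolves and an unlisted one does not.  This is a Python
sketch, not the plugin's code; is_listed and its resolve parameter are
hypothetical names.

```python
# Sketch of an RHSBL "A" lookup: a domain counts as listed if
# "<domain>.<zone>" resolves.  Not the URIDNSBL plugin's implementation.
import socket
from typing import Callable


def is_listed(domain: str, zone: str = "sc.surbl.org",
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if domain appears to be listed in the given RHSBL zone."""
    query = f"{domain.rstrip('.')}.{zone.strip('.')}"
    try:
        resolve(query)  # any A record is treated as "listed"
    except socket.gaierror:
        return False    # resolution failed: treated here as "not listed"
    return True
```

The resolve parameter exists only so the check can be exercised without
network access; by default it performs a real DNS query.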

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Re: updated SURBL results

Posted by Jeff Chan <je...@surbl.org>.
On Tuesday, April 6, 2004, 5:01:45 PM, Daniel Quinlan wrote:

> http://cvs.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm?rev=9881&root=Apache-SVN&view=markup

> which says:

>        urirhsbl NAME_OF_RULE rhsbl_zone lookuptype
>            Specify a RHSBL-style domain lookup.  "NAME_OF_RULE"
>            is the name of the rule to be used, "rhsbl_zone" is
>            the zone to look up domain names in, and "lookuptype"
>            is the type of lookup (TXT or A).   Note that you must
>            also define a header-eval rule calling
>            "check_uridnsbl" to use this.

>            An RHSBL zone is one where the domain name is looked
>            up, as a string; e.g. a URI using the domain "foo.com"
>            will cause a lookup of "foo.com.uriblzone.net".  Note
>            that hostnames are stripped from the domain used in
>            the URIBL lookup, so the domain "foo.bar.com" will
>            look up "bar.com.uriblzone.net", and "foo.bar.co.uk"
>            will look up "bar.co.uk.uriblzone.net".

Could I ask you for a sample config or rule to cause urirhsbl
to use SURBL?  I'd like to add it to the quick start at the top
of our web page to help get people using it easily.

Also where does the check_uridnsbl rule go?

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/


Re: updated SURBL results

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> Thanks much for the test data Daniel!  Can I ask for a clarification
> of whether urirhsbl in URIBL is doing name resolution before comparing
> to SURBL, or whether it's comparing "names to names"?  I tried to look
> for urirhsbl in the sources at:
> 
>   http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

Hmmm... that's not the current tree.  That must be the nightly snapshot.
This is current:

http://cvs.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm?rev=9881&root=Apache-SVN&view=markup

which says:

       urirhsbl NAME_OF_RULE rhsbl_zone lookuptype
           Specify a RHSBL-style domain lookup.  "NAME_OF_RULE"
           is the name of the rule to be used, "rhsbl_zone" is
           the zone to look up domain names in, and "lookuptype"
           is the type of lookup (TXT or A).   Note that you must
           also define a header-eval rule calling
           "check_uridnsbl" to use this.

           An RHSBL zone is one where the domain name is looked
           up, as a string; e.g. a URI using the domain "foo.com"
           will cause a lookup of "foo.com.uriblzone.net".  Note
           that hostnames are stripped from the domain used in
           the URIBL lookup, so the domain "foo.bar.com" will
           look up "bar.com.uriblzone.net", and "foo.bar.co.uk"
           will look up "bar.co.uk.uriblzone.net".

The code just looks up the domain.  Here are some non-dot-com-net-org
domains looked up as an example: 

  domain "dkldhfg33.us" listed
  domain "hfgti6.info" listed
  domain "hfgr33.us" listed
  domain "net-click.net.ph" listed
  domain "dkldhfg33.us" listed
  domain "net-click.net.ph" listed
  domain "dkldhfg33.us" listed
  domain "dkldhfg33.us" listed
  domain "hfgti6.info" listed
  domain "dfhfks333.info" listed
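
The hostname stripping described in the documentation above can be sketched
in Python as follows.  This is only an illustration: the two-level suffix set
is a tiny hypothetical stand-in for whatever table the plugin actually uses,
and the function names are made up for this example.

```python
# Sketch: build the RHSBL query name for a URI host, following the
# documented examples.  Not the URIDNSBL plugin's implementation.

# Hypothetical, deliberately tiny set of two-level suffixes.
TWO_LEVEL_SUFFIXES = {"co.uk", "net.ph"}


def registered_domain(host: str) -> str:
    """Strip hostname labels, keeping only the registered domain."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels under a two-level suffix (bar.co.uk), else two.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def rhsbl_query_name(host: str, zone: str) -> str:
    """Return the name to look up, e.g. bar.com.uriblzone.net."""
    return f"{registered_domain(host)}.{zone.strip('.')}"


if __name__ == "__main__":
    for host in ("foo.com", "foo.bar.com", "foo.bar.co.uk"):
        print(host, "->", rhsbl_query_name(host, "uriblzone.net"))
```

The domain is compared as a string; no address resolution of the URI host
itself is involved.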

Jeff Chan <je...@surbl.org> writes:

> Given the 60% hit rate, I assume it's doing name to name.

Yep.  We'll see how FPs look on other corpora this weekend, but it looks
good right now.

If you want to do experiments with threshold tweaking or any other
changes at some point and you're able to set up a test DNSBL
(test.surbl.org), we could try comparing the two.  Since it's working
well so far, I'd suggest leaving the current algorithm as-is until a
replacement algorithm is shown to be better through testing (and I think
improvements on either end should be quite possible considering we just
got the query working).

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

Re: updated SURBL results

Posted by Jeff Chan <je...@surbl.org>.
On Tuesday, April 6, 2004, 4:52:54 AM, Daniel Quinlan wrote:
> Justin added a "urirhsbl" test to the URIBL module, so I retested on my
> last 4 days of spam (the ham here ranges from 0 to 10 months old) using
> SURBL and it exceeded my highest expectations.

>   OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>      6491     1497     4994    0.231   0.00    0.00  (all messages)
>   100.000  23.0627  76.9373    0.231   0.00    0.00  (all messages as %)
>    13.804  59.8530   0.0000    1.000   1.00    0.01  T_URIBL_SC_SURBL
>    14.374  61.5230   0.2403    0.996   0.99    1.00  URIBL_SBL
>     0.277   0.8016   0.1201    0.870   0.55    1.00  URIBL_DSBL

> I went ahead and promoted T_URIBL_SC_SURBL to URIBL_SC_SURBL

Thanks much for the test data Daniel!  Can I ask for a
clarification of whether urirhsbl in URIBL is doing name
resolution before comparing to SURBL, or whether it's comparing
"names to names"?  I tried to look for urirhsbl in the sources at:

  http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

but I must not be looking in the right place, since I don't see it there.

Given the 60% hit rate, I assume it's doing name to name.  If so,
awesome!

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/