Posted to users@spamassassin.apache.org by Don Newcomer <ne...@dickinson.edu> on 2004/07/02 15:06:16 UTC

Re: [SURBL-Discuss] Re: New SURBL additions

Here are my counts since 6:50 PM yesterday for all URI_RBL rules sorted by
spam and ham:

URI_RBL spam counts:

 3577 - AB_URI_RBL (5.0) - surbl.cf
 2499 - DS_URI_RBL (0.33) - surbl.cf
 7282 - OB_URI_RBL (4.0) - surbl.cf
 4279 - SPAMCOP_URI_RBL (3.0) - surbl.cf
 5458 - WS_URI_RBL (3.0) - surbl.cf

URI_RBL ham counts:

  231 - DS_URI_RBL (0.33) - surbl.cf
   18 - OB_URI_RBL (4.0) - surbl.cf
    1 - SPAMCOP_URI_RBL (3.0) - surbl.cf
   29 - WS_URI_RBL (3.0) - surbl.cf

Interesting that AB_URI_RBL has no false positives yet...  Still, we
haven't released spam filtering to our users yet so my Bayes training is
based pretty much on all of the SA rulesets' interpretation of spam (which
isn't necessarily a bad thing).

Don Newcomer
Senior Manager, Systems
Infrastructure Systems Department
Library and Information Services
Dickinson College
P.O. Box 1773
Carlisle, PA  17013
717-245-1256 (Voice)
717-245-1690 (FAX)
newcomer@dickinson.edu

On Thu, 1 Jul 2004, Jeff Chan wrote:

> Still looking for anyone's spam detection rates and false
> positive rates with all the lists:
>
>   sc.surbl.org - SpamCop spamvertised sites
>   ws.surbl.org - sa-blacklist, BigEvil and other data
>   ob.surbl.org - OutBlaze spamvertised sites
>   ab.surbl.org - AbuseButler spamvertised sites
>
>   ds.surbl.org (beta, 6dos data)
>
> Jeff C.
> --
> Jeff Chan
> mailto:jeffc@surbl.org
> http://www.surbl.org/
>
> _______________________________________________
> Discuss mailing list
> Discuss@lists.surbl.org
> http://lists.surbl.org/mailman/listinfo/discuss
>

Re: [SURBL-Discuss] Re: New SURBL additions

Posted by Jeff Chan <je...@surbl.org>.
On Friday, July 2, 2004, 6:06:16 AM, Don Newcomer wrote:
> Here are my counts since 6:50 PM yesterday for all URI_RBL rules sorted by
> spam and ham:

> URI_RBL spam counts:

>  3577 - AB_URI_RBL (5.0) - surbl.cf
>  2499 - DS_URI_RBL (0.33) - surbl.cf
>  7282 - OB_URI_RBL (4.0) - surbl.cf
>  4279 - SPAMCOP_URI_RBL (3.0) - surbl.cf
>  5458 - WS_URI_RBL (3.0) - surbl.cf

> URI_RBL ham counts:

>   231 - DS_URI_RBL (0.33) - surbl.cf
>    18 - OB_URI_RBL (4.0) - surbl.cf
>     1 - SPAMCOP_URI_RBL (3.0) - surbl.cf
>    29 - WS_URI_RBL (3.0) - surbl.cf

> Interesting that AB_URI_RBL has no false positives yet...  Still, we
> haven't released spam filtering to our users yet so my Bayes training is
> based pretty much on all of the SA rulesets' interpretation of spam (which
> isn't necessarily a bad thing).

Thanks very much for the data, Don, particularly the false positive
hits.  Does anyone else have any to share?  If so, please post
them here.

ab.surbl.org is based on SpamCop data plus some manual reports,
as is sc.surbl.org, but ab has different inclusion criteria,
taking the top 500 most often reported (less www. duplicates
and whitelist hits) over 7 days, whereas sc has an arbitrary
inclusion threshold of 10 reports over 4 days.  1 FP for sc
is pretty good, though zero is better.  :-)
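
To make the difference between the two inclusion rules concrete, here
is a rough Python sketch. It is only an illustration, not the code
that builds either list: the function names, the (domain, timestamp)
report format, and the choice to apply the www. stripping and
whitelist to both lists are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def normalize(domain):
    """Lowercase and drop a leading 'www.' so duplicates collapse."""
    domain = domain.lower()
    return domain[4:] if domain.startswith("www.") else domain

def count_reports(reports, now, days, whitelist=frozenset()):
    """Count reports per domain within the last `days` days.

    `reports` is an iterable of (domain, datetime) pairs (hypothetical format).
    Whitelisted domains are skipped entirely.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for domain, when in reports:
        name = normalize(domain)
        if when >= cutoff and name not in whitelist:
            counts[name] += 1
    return counts

def sc_style(reports, now, threshold=10, days=4, whitelist=frozenset()):
    """Threshold rule: every domain with at least `threshold` reports in the window."""
    counts = count_reports(reports, now, days, whitelist)
    return {d for d, n in counts.items() if n >= threshold}

def ab_style(reports, now, top_n=500, days=7, whitelist=frozenset()):
    """Ranking rule: the `top_n` most often reported domains in the window."""
    counts = count_reports(reports, now, days, whitelist)
    return {d for d, _ in counts.most_common(top_n)}

# Made-up sample data, purely for illustration.
now = datetime(2004, 7, 2)
reports = (
    [("www.spam.example", now - timedelta(days=1))] * 6
    + [("spam.example", now - timedelta(days=2))] * 6
    + [("rare.example", now - timedelta(days=6))] * 3
    + [("good.example", now - timedelta(days=1))] * 20
)
whitelist = frozenset({"good.example"})
print(sorted(sc_style(reports, now, whitelist=whitelist)))
print(sorted(ab_style(reports, now, whitelist=whitelist)))
```

The threshold rule grows and shrinks with report volume, while the
ranking rule always yields at most a fixed number of entries, so a
rarely reported domain can make the ranked list on a quiet week but
never passes the threshold.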

ob is pretty impressive in terms of hit rate and relatively
low FP rate, at least as a percentage of hits.
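
For reference, "FP rate as a percentage of hits" can be worked out
directly from Don's counts. The short Python sketch below does the
arithmetic; the counts are copied from his post, while the variable
and function names are just for the example. Note that this is the
share of each rule's hits that landed on ham, not a true false
positive rate, which would need the total number of ham messages
scanned (not given in the thread).

```python
# Hit counts copied from Don's post.
spam_hits = {
    "AB_URI_RBL": 3577,
    "DS_URI_RBL": 2499,
    "OB_URI_RBL": 7282,
    "SPAMCOP_URI_RBL": 4279,
    "WS_URI_RBL": 5458,
}
ham_hits = {
    "DS_URI_RBL": 231,
    "OB_URI_RBL": 18,
    "SPAMCOP_URI_RBL": 1,
    "WS_URI_RBL": 29,
}

def ham_share(rule):
    """Percentage of a rule's total hits that were on ham."""
    spam = spam_hits[rule]
    ham = ham_hits.get(rule, 0)  # AB_URI_RBL had no ham hits listed
    return 100.0 * ham / (spam + ham)

# Print rules from lowest to highest ham share.
for rule in sorted(spam_hits, key=ham_share):
    print(f"{rule:<16} {ham_share(rule):6.2f}% of hits were ham")
```

By this measure DS_URI_RBL comes out near 8.5% while the other four
lists are each under 1%.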

Note that ds.surbl.org (based on 6dos data) is now up on 5 name
servers so it may be ok to use on production servers for beta
testing.

Please note that I probably won't be able to check email for
about a week so hopefully others will help answer SURBL
questions, etc.

Cheers,

Jeff C.


Re: [SURBL-Discuss] Re: New SURBL additions

Posted by Alex Pleiner <pl...@zeitform.de>.
* Martin <ma...@idkommunikation.com> [2004-07-02 16:02]:
> Don Newcomer wrote:

> >It's just a shell script using awk, grep, sed, and sort.  Not a big deal
> >really.

> OK, would you mind sharing the script?

You might find the attached script helpful. Edit @rulesdir if you need
to, and feed your spam mail to it via STDIN. The script will grep for
X-Spam-Status (change if necessary) and count the rules found.

Usage: sa-stats.pl < spam.box

Alex

Output (for the last few days):
   1 BAYES_99                            -  102 - 23_bayes.cf              
   2 WS_URI_RBL                          -   61 - spamcop_uri-local.cf     
   3 NO_REAL_NAME                        -   59 - 20_head_tests.cf         
   4 RCVD_IN_SORBS                       -   42 - 20_dnsbl_tests.cf        
   5 SPAMCOP_URI_RBL                     -   34 - spamcop_uri.cf           
   6 HTML_MESSAGE                        -   33 - 20_html_tests.cf         
   7 RCVD_IN_BL_SPAMCOP_NET              -   31 - 20_dnsbl_tests.cf        
   8 RAZOR2_CHECK                        -   30 - 20_body_tests.cf         
   9 RAZOR2_CF_RANGE_51_100              -   28 - 20_body_tests.cf         
  10 RCVD_IN_DYNABLOCK                   -   23 - 20_dnsbl_tests.cf        
[...snip...]
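
For anyone reading this in the archive without the attachment, the
counting step can be sketched in a few lines of Python. This is not
the attached sa-stats.pl, just a minimal illustration of the same
idea: it assumes the rule names appear in a tests= field of the
X-Spam-Status header, possibly folded over several lines, and it
does not look up which .cf file each rule comes from. The sample
header lines are made up.

```python
import re
from collections import Counter

# Assumes the header carries the rule names as "tests=RULE_A,RULE_B,...".
TESTS_RE = re.compile(r"tests=([A-Za-z0-9_,]+)")

def extract_rules(header):
    """Return the rule names listed in one unfolded X-Spam-Status header."""
    match = TESTS_RE.search(header)
    if not match:
        return []
    return [name for name in match.group(1).split(",") if name and name != "none"]

def count_rules(lines):
    """Count rule names across all X-Spam-Status headers in `lines`."""
    counts = Counter()
    header = None  # text of the header currently being collected, if any
    for line in lines:
        if header is not None and line[:1] in (" ", "\t"):
            # Folded continuation: glue directly after a trailing comma,
            # otherwise keep a space between fields.
            piece = line.strip()
            header += piece if header.endswith(",") else " " + piece
            continue
        if header is not None:
            counts.update(extract_rules(header))
            header = None
        if line.lower().startswith("x-spam-status:"):
            header = line.strip()
    if header is not None:
        counts.update(extract_rules(header))
    return counts

# Made-up sample input standing in for a spam mailbox.
sample = [
    "From a@example.invalid Fri Jul  2 10:00:00 2004\n",
    "X-Spam-Status: Yes, hits=12.1 required=5.0 tests=BAYES_99,WS_URI_RBL,\n",
    "\tNO_REAL_NAME autolearn=no\n",
    "Subject: first\n",
    "\n",
    "From b@example.invalid Fri Jul  2 11:00:00 2004\n",
    "X-Spam-Status: Yes, hits=8.0 required=5.0 tests=BAYES_99,WS_URI_RBL\n",
    "\tautolearn=no\n",
]
for rank, (rule, hits) in enumerate(count_rules(sample).most_common(), 1):
    print(f"{rank:4d} {rule:<35} - {hits:4d}")
```

Passing sys.stdin to count_rules() instead of the sample list would
give the same "script < spam.box" style of use.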


> Thanks!

> / Martin


-- 
Alex Pleiner
zeitform Internet Dienste OHG     Fraunhoferstr. 5
                                  64283 Darmstadt, Germany
http://www.zeitform.de            Tel.: +49 (0)6151 155-635
mailto:pleiner@zeitform.de        Fax:  +49 (0)6151 155-634
GnuPG/PGP Key-ID: 0x613C21EA

Re: [SURBL-Discuss] Re: New SURBL additions

Posted by Martin <ma...@idkommunikation.com>.
Don Newcomer wrote:

> It's just a shell script using awk, grep, sed, and sort.  Not a big deal
> really.

OK, would you mind sharing the script?

Thanks!

/ Martin


Re: [SURBL-Discuss] Re: New SURBL additions

Posted by Don Newcomer <ne...@dickinson.edu>.
It's just a shell script using awk, grep, sed, and sort.  Not a big deal
really.

Don Newcomer
Senior Manager, Systems
Infrastructure Systems Department
Library and Information Services
Dickinson College
P.O. Box 1773
Carlisle, PA  17013
717-245-1256 (Voice)
717-245-1690 (FAX)
newcomer@dickinson.edu

On Fri, 2 Jul 2004, Martin wrote:

> Don Newcomer wrote:
>
> > Here are my counts since 6:50 PM yesterday for all URI_RBL rules sorted by
> > spam and ham:
> >
> > URI_RBL spam counts:
> >
> >  3577 - AB_URI_RBL (5.0) - surbl.cf
> >  2499 - DS_URI_RBL (0.33) - surbl.cf
> >  7282 - OB_URI_RBL (4.0) - surbl.cf
> >  4279 - SPAMCOP_URI_RBL (3.0) - surbl.cf
> >  5458 - WS_URI_RBL (3.0) - surbl.cf
>
> How do you collect the stats like this? I want to do this too :)
>
> / Martin
>

Re: [SURBL-Discuss] Re: New SURBL additions

Posted by Martin <ma...@idkommunikation.com>.
Don Newcomer wrote:

> Here are my counts since 6:50 PM yesterday for all URI_RBL rules sorted by
> spam and ham:
> 
> URI_RBL spam counts:
> 
>  3577 - AB_URI_RBL (5.0) - surbl.cf
>  2499 - DS_URI_RBL (0.33) - surbl.cf
>  7282 - OB_URI_RBL (4.0) - surbl.cf
>  4279 - SPAMCOP_URI_RBL (3.0) - surbl.cf
>  5458 - WS_URI_RBL (3.0) - surbl.cf

How do you collect the stats like this? I want to do this too :)

/ Martin