Posted to dev@spamassassin.apache.org by "Warren Togami Jr." <wt...@gmail.com> on 2011/02/10 00:21:13 UTC

Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf

Most of these are redundant to rules in my sandbox.  Please remove the 
redundant parts and I guess put the missing rules into my existing sem file.

http://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/wtogami/20_bug_6220_sem.cf

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6527
Note that the URI rules aren't working in the masscheck because of this bug.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6297#c14
That bug was introduced by the incomplete fix in the comment linked above.

Warren

On 2/9/2011 11:51 AM, smf@apache.org wrote:
> Author: smf
> Date: Wed Feb  9 21:51:24 2011
> New Revision: 1069129
>
> URL: http://svn.apache.org/viewvc?rev=1069129&view=rev
> Log:
> Updated sandbox to mass-check SEM DNS lists
>
> Added:
>      spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
>
> Added: spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf?rev=1069129&view=auto
> ==============================================================================
> --- spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf (added)
> +++ spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf Wed Feb  9 21:51:24 2011
> @@ -0,0 +1,53 @@
> +#testrules
> +
> +# DNSEval rules
> +ifplugin Mail::SpamAssassin::Plugin::DNSEval
> +
> +header    SMF_SEM_BLACK    eval:check_rbl('semblack-lastexternal', 'bl.spameatingmonkey.net')
> +tflags    SMF_SEM_BLACK    net
> +describe  SMF_SEM_BLACK    Received from an IP listed by SEM-BLACK
> +score     SMF_SEM_BLACK    0.1
> +reuse     SMF_SEM_BLACK
> +
> +endif
> +
> +
> +# URIDNSBL rules
> +ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
> +
> +urirhssub SMF_SEM_URI      uribl.spameatingmonkey.net. A 2
> +body      SMF_SEM_URI      eval:check_uridnsbl('SEM_URI')
> +describe  SMF_SEM_URI      Contains a URI listed by SEM-URI
> +tflags    SMF_SEM_URI      net
> +score     SMF_SEM_URI      0.1
> +reuse     SMF_SEM_URI
> +
> +urirhssub SMF_SEM_URIRED   urired.spameatingmonkey.net. A 2
> +body      SMF_SEM_URIRED   eval:check_uridnsbl('SEM_URIRED')
> +describe  SMF_SEM_URIRED   Contains a URI listed by SEM-URIRED
> +tflags    SMF_SEM_URIRED   net
> +score     SMF_SEM_URIRED   0.1
> +reuse     SMF_SEM_URIRED
> +
> +urirhssub SMF_SEM_FRESH    fresh.spameatingmonkey.net. A 2
> +body      SMF_SEM_FRESH    eval:check_uridnsbl('SEM_FRESH')
> +describe  SMF_SEM_FRESH    Contains a domain registered less than 5 days ago
> +tflags    SMF_SEM_FRESH    net
> +score     SMF_SEM_FRESH    0.1
> +reuse	  SMF_SEM_FRESH
> +
> +urirhssub SMF_SEM_FRESH_10 fresh10.spameatingmonkey.net. A 2
> +body      SMF_SEM_FRESH_10 eval:check_uridnsbl('SEM_FRESH')
> +describe  SMF_SEM_FRESH_10 Contains a domain registered less than 5 days ago
> +tflags    SMF_SEM_FRESH_10 net
> +score     SMF_SEM_FRESH_10 0.1
> +reuse     SMF_SEM_FRESH_10
> +
> +urirhssub SMF_SEM_FRESH_15 fresh15.spameatingmonkey.net. A 2
> +body      SMF_SEM_FRESH_15 eval:check_uridnsbl('SEM_FRESH')
> +describe  SMF_SEM_FRESH_15 Contains a domain registered less than 5 days ago
> +tflags    SMF_SEM_FRESH_15 net
> +score     SMF_SEM_FRESH_15 0.1
> +reuse     SMF_SEM_FRESH_15
> +
> +endif
>
>


Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/9/2011 2:41 PM, Steve Freegard wrote:
> On 09/02/11 23:21, Warren Togami Jr. wrote:
>> Most of these are redundant to rules in my sandbox. Please remove the
>> redundant parts and I guess put the missing rules into my existing sem
>> file.
>
> Are you sure this is a bug and not because you forgot to assign a score
> to them? IIRC network tests like these are skipped if they do not have a
> positive score.
>
> I think I'll therefore leave these in my sandbox for now and see what
> happens; the names are different and yours aren't being run anyway so I
> can't see any harm.
>
> Regards,
> Steve.

Hmm, it is possible that assigning a score is a workaround for the bug.
Prior to the introduction of that bug it wasn't necessary to assign a
score in the sandbox.  I actually avoided doing so because it is
misleading; the score is not actually used for anything.

Anyway, please consolidate the SEM rules to a single file (I don't care 
where).

Warren

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 11:08 PM, Steve Freegard wrote:
> It was my understanding that multiple SEM lists were already being
> included in the weekly masscheck. I have no objections to it and have
> plenty of capacity to handle the queries.
>
> --Blaine Fleming
> SEM Admin

Alrighty then.  Could you ask him to make a combined zone so that all of
his lists can be queried with a single DNS lookup?

>
>> I'm going ahead with the several changes I suggested but keeping the
>> URI rules disabled for now.
>
> I will update my rules with the correct naming; I already ran a local
> network mass-check last night, and I'm seeing the results I expected.
> SEM_FRESH, SEM_FRESH_10 and SEM_FRESH_15 all score better than URIBL_RHS_DOB:
>
> OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>        0    25000      424    0.983   0.00    0.00  (all messages)
> 0.00000  98.3323   1.6677    0.983   0.00    0.00  (all messages as %)
>   15.639  15.9040   0.0000    1.000   0.85    0.00  T_SMF_SEM_FRESH_15
>   15.623  15.8880   0.0000    1.000   0.85    0.00  T_SMF_SEM_FRESH_10
>   10.954  11.1400   0.0000    1.000   0.83    0.00  T_SMF_SEM_FRESH
>    9.271   9.4280   0.0000    1.000   0.81    0.00  URIBL_RHS_DOB

BTW, your FRESH rules had typos that indicate they were not behaving as
you expected.  See the changes I checked into svn.

Warren
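
The thread does not spell out which typos Warren fixed. One inconsistency
visible in the quoted 30_sem.cf is that SMF_SEM_FRESH_10 and
SMF_SEM_FRESH_15 both call check_uridnsbl('SEM_FRESH') and copy the
"5 days" description. A minimal sketch of the three FRESH rules that
follows the URIDNSBL plugin's documented convention, where each body rule
passes its own rule name to check_uridnsbl(), might look like the
following. The 10- and 15-day descriptions are assumptions based on the
zone names, and this is not necessarily the change that was committed.

```
# Hypothetical corrected FRESH rules (a sketch, not the actual svn change).
# Each body rule names itself in check_uridnsbl(), matching its urirhssub.
ifplugin Mail::SpamAssassin::Plugin::URIDNSBL

urirhssub SMF_SEM_FRESH    fresh.spameatingmonkey.net. A 2
body      SMF_SEM_FRESH    eval:check_uridnsbl('SMF_SEM_FRESH')
describe  SMF_SEM_FRESH    Contains a domain registered less than 5 days ago
tflags    SMF_SEM_FRESH    net
score     SMF_SEM_FRESH    0.1
reuse     SMF_SEM_FRESH

# Assumed: fresh10 lists domains registered within the last 10 days
urirhssub SMF_SEM_FRESH_10 fresh10.spameatingmonkey.net. A 2
body      SMF_SEM_FRESH_10 eval:check_uridnsbl('SMF_SEM_FRESH_10')
describe  SMF_SEM_FRESH_10 Contains a domain registered less than 10 days ago
tflags    SMF_SEM_FRESH_10 net
score     SMF_SEM_FRESH_10 0.1
reuse     SMF_SEM_FRESH_10

# Assumed: fresh15 lists domains registered within the last 15 days
urirhssub SMF_SEM_FRESH_15 fresh15.spameatingmonkey.net. A 2
body      SMF_SEM_FRESH_15 eval:check_uridnsbl('SMF_SEM_FRESH_15')
describe  SMF_SEM_FRESH_15 Contains a domain registered less than 15 days ago
tflags    SMF_SEM_FRESH_15 net
score     SMF_SEM_FRESH_15 0.1
reuse     SMF_SEM_FRESH_15

endif
```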

Posted by Steve Freegard <st...@stevefreegard.com>.
On 11/02/11 06:59, Warren Togami Jr. wrote:
> Steve, any comment on the previous post? 

Sorry, I was busy yesterday and waiting for a reply from Blaine.

> I don't think it is a good idea to enable the URI rules in masscheck 
> unless Blaine has approved of this.  He will suddenly be whacked by 
> millions of queries on Saturday and that could be nasty if he isn't 
> expecting it.
>

It was my understanding that multiple SEM lists were already being
included in the weekly masscheck.  I have no objections to it and have
plenty of capacity to handle the queries.

--Blaine Fleming
SEM Admin

> I'm going ahead with the several changes I suggested but keeping the 
> URI rules disabled for now.

I will update my rules with the correct naming; I already ran a local
network mass-check last night, and I'm seeing the results I expected.
SEM_FRESH, SEM_FRESH_10 and SEM_FRESH_15 all score better than URIBL_RHS_DOB:

OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0    25000      424    0.983   0.00    0.00  (all messages)
0.00000  98.3323   1.6677    0.983   0.00    0.00  (all messages as %)
  15.639  15.9040   0.0000    1.000   0.85    0.00  T_SMF_SEM_FRESH_15
  15.623  15.8880   0.0000    1.000   0.85    0.00  T_SMF_SEM_FRESH_10
  10.954  11.1400   0.0000    1.000   0.83    0.00  T_SMF_SEM_FRESH
   9.271   9.4280   0.0000    1.000   0.81    0.00  URIBL_RHS_DOB

Regards,
Steve.

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 11:16 PM, Steve Freegard wrote:
>> 2) Given the aforementioned issue where every URI will trigger
>> multiple DNS queries to SEM, we may want to strongly suggest to Blaine
>> for him to make a combined zone so all of his URIBL's can be queried
>> with a single DNS lookup.
>
> First things first. These rules need to be tested for their
> effectiveness and overlap before anything else happens. No point in
> asking for a big change like this if we end up not putting the rules
> into the main ruleset.
>

Good point.

As noted earlier, SEMBLACK is already a non-starter for inclusion.

I am interested to see the performance and overlap of the URIBLs.  I'm
enabling the newly renamed URIBL_SEM_FRESH* rules on my mail servers with
informational scores, so we'll have some "reuse" data.  Other masscheck
participants should do the same if they do tagging.

Warren

Posted by Steve Freegard <st...@stevefreegard.com>.
On 11/02/11 07:11, Warren Togami Jr. wrote:
> 1) His rule names like SEM_URI, SEM_URIRED, SEM_FRESH* don't follow 
> spamassassin's own conventions where URIBL rules begin with URIBL_*. 
> I'm thinking we should probably add that prefix.
>

I'll correct this.

> 2) Given the aforementioned issue where every URI will trigger 
> multiple DNS queries to SEM, we may want to strongly suggest to Blaine 
> for him to make a combined zone so all of his URIBL's can be queried 
> with a single DNS lookup.

First things first.  These rules need to be tested for their 
effectiveness and overlap before anything else happens.  No point in 
asking for a big change like this if we end up not putting the rules 
into the main ruleset.

Regards,
Steve.

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 8:59 PM, Warren Togami Jr. wrote:
> Steve, any comment on the previous post? I don't think it is a good idea
> to enable the URI rules in masscheck unless Blaine has approved of this.
> He will suddenly be whacked by millions of queries on Saturday and that
> could be nasty if he isn't expecting it.
>
> I'm going ahead with the several changes I suggested but keeping the URI
> rules disabled for now.
>

Two more thoughts:

1) His rule names like SEM_URI, SEM_URIRED and SEM_FRESH* don't follow
SpamAssassin's own convention, in which URIBL rule names begin with
URIBL_.  I'm thinking we should probably add that prefix.

2) Given the aforementioned issue where every URI will trigger multiple
DNS queries to SEM, we may want to strongly suggest that Blaine make a
combined zone so that all of his URIBLs can be queried with a single DNS
lookup.

Warren
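
Applying the URIBL_ prefix from point 1 to the two non-FRESH URI rules
could look roughly like this. The names URIBL_SEM_URI and
URIBL_SEM_URIRED are assumptions modeled on the URIBL_SEM_FRESH* names
mentioned elsewhere in the thread; the zones and subtests are copied from
the quoted 30_sem.cf, and the score and reuse lines are omitted for
brevity.

```
ifplugin Mail::SpamAssassin::Plugin::URIDNSBL

# Assumed names: SEM-URI and SEM-URIRED with the conventional URIBL_ prefix.
# The body rule passes its own name to check_uridnsbl().
urirhssub URIBL_SEM_URI    uribl.spameatingmonkey.net. A 2
body      URIBL_SEM_URI    eval:check_uridnsbl('URIBL_SEM_URI')
describe  URIBL_SEM_URI    Contains a URI listed by SEM-URI
tflags    URIBL_SEM_URI    net

urirhssub URIBL_SEM_URIRED urired.spameatingmonkey.net. A 2
body      URIBL_SEM_URIRED eval:check_uridnsbl('URIBL_SEM_URIRED')
describe  URIBL_SEM_URIRED Contains a URI listed by SEM-URIRED
tflags    URIBL_SEM_URIRED net

endif
```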

Posted by "Warren Togami Jr." <wt...@gmail.com>.
Steve, any comment on the previous post?  I don't think it is a good 
idea to enable the URI rules in masscheck unless Blaine has approved of 
this.  He will suddenly be whacked by millions of queries on Saturday 
and that could be nasty if he isn't expecting it.

I'm going ahead with the several changes I suggested but keeping the URI 
rules disabled for now.

Warren

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 02/09/2011 02:41 PM, Steve Freegard wrote:
> On 09/02/11 23:21, Warren Togami Jr. wrote:
>> Most of these are redundant to rules in my sandbox. Please remove the
>> redundant parts and I guess put the missing rules into my existing sem
>> file.
>
> Are you sure this is a bug and not because you forgot to assign a score
> to them? IIRC network tests like these are skipped if they do not have a
> positive score.

I figured it out.  The bug workaround has nothing to do with having a 
score or not.  You are working around the issue with your use of 
#testrules instead of tflags nopublish.

>
> I think I'll therefore leave these in my sandbox for now and see what
> happens; the names are different and yours aren't being run anyway so I
> can't see any harm.

Consolidate into single sandbox file
====================================
My SEM rules have been in the nightly masscheck since late 2009, and the
URI rules were working before Bug #6527 appeared.  RCVD_IN_SEMBLACK is
also the official rule name (from SEM's own website).  For network rules
like these, the precedent seems to be to use the official names.

Shall we consolidate the rules into a single file?  Let's use your file;
it doesn't matter where it lives in the sandbox.  I'll delete my file,
and let's rename your rules.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6220
Please note anything related to SEM here.

Reuse Changes
=============
"reuse" is only ideal for the FRESH lists.  We will have almost no 
readings at all if we use "reuse" for SEMBLACK, URI and URIRED.
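
Applied to the quoted 30_sem.cf, this suggestion amounts to keeping the
reuse lines only for the FRESH rules. The rationale in the comments is a
hedged reading rather than something stated in the thread: with reuse,
mass-check takes a rule's result from each message's original scan, so a
rule that few participants ran at delivery time yields few results, while
a domain's "freshness" at delivery time cannot be reproduced by a later
lookup anyway.

```
# Keep "reuse" only for the time-sensitive FRESH lists
reuse     SMF_SEM_FRESH
reuse     SMF_SEM_FRESH_10
reuse     SMF_SEM_FRESH_15

# SMF_SEM_BLACK, SMF_SEM_URI and SMF_SEM_URIRED: no "reuse" line,
# so a network mass-check performs their DNS lookups itself
```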

Is this wise?
=============
Did you ask Blaine if he approves of this?

List operators are required to request or approve SpamAssassin testing
of their lists.  We haven't been testing his URIBLs for a long time, and
if we suddenly begin testing MULTIPLE of his lists, we'll whack his DNS
servers with millions of lookups on Saturday.  The nightly masscheck
currently has ~820k messages, and many of those messages contain
multiple domain names.  Multiply that by the number of his URIBLs, and
that is a significant flood of DNS lookups coming out of nowhere.
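
As a rough upper bound before any DNS caching, the lookup count is the
product of N messages, d distinct domains per message and L lists
queried. N of about 820k is from the paragraph above and L = 5 is the
number of URI zones in the quoted 30_sem.cf; d = 3 is a purely
hypothetical average chosen for illustration.

```latex
\begin{aligned}
Q_{\text{separate}} &\approx N \cdot d \cdot L
  = 820{,}000 \times 3 \times 5 = 12{,}300{,}000 \\
Q_{\text{combined}} &\approx N \cdot d \cdot 1
  = 820{,}000 \times 3 = 2{,}460{,}000
\end{aligned}
```

Caching and domains repeated across messages would lower both real
numbers, but separate zones still cost roughly L times as many lookups as
one combined zone.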

SURBL avoids this multiplication problem for its multiple URIBLs because
they are all queried with a single DNS lookup and distinguished by
different return codes.
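
A combined SEM zone queried the same way as SURBL's might be configured
as below. The zone name multi.sem.example. and the bit values 2, 4 and 8
are hypothetical, since the thread only proposes such a zone; the rule
names are likewise assumed. With a decimal subtest, the URIDNSBL plugin
treats the number as a bitmask against the returned A record, and because
every rule shares one zone, each domain should need only a single query.

```
ifplugin Mail::SpamAssassin::Plugin::URIDNSBL

# Hypothetical combined zone: one A lookup per domain. A reply of
# 127.0.0.6 would set bits 2 and 4, hitting both of the first two rules.
urirhssub URIBL_SEM_URI    multi.sem.example. A 2
body      URIBL_SEM_URI    eval:check_uridnsbl('URIBL_SEM_URI')
tflags    URIBL_SEM_URI    net

urirhssub URIBL_SEM_URIRED multi.sem.example. A 4
body      URIBL_SEM_URIRED eval:check_uridnsbl('URIBL_SEM_URIRED')
tflags    URIBL_SEM_URIRED net

urirhssub URIBL_SEM_FRESH  multi.sem.example. A 8
body      URIBL_SEM_FRESH  eval:check_uridnsbl('URIBL_SEM_FRESH')
tflags    URIBL_SEM_FRESH  net

endif
```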

SEMBLACK should be avoided
==========================
http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html
http://ruleqa.spamassassin.org/20110205-r1067413-n/T_RCVD_IN_SEMBLACK/detail
I strongly recommend that folks not use RCVD_IN_SEMBLACK because of its
questionable safety record over the past years and its ~90% overlap with
the high-scoring Spamhaus RCVD_IN_PBL.  (Also old news: in late 2009 I
caught him outright copying PSBL, which he claimed was an innocent
mistake.  I don't know what methodology he uses now.)

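While its safety has improved in recent weeks, this high level of overlap
with PBL makes it both redundant and dangerous, since its score would
largely stack on top of PBL's.  Also look at the "set 0, score-map":
almost none of the spam hits scoring 5 points or below are SEMBLACK hits.
This means using SEMBLACK almost never helps you; the spam it hits is
already scoring above the default 5-point threshold.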

Warren Togami
warren@togami.com

Posted by Steve Freegard <st...@stevefreegard.com>.
On 09/02/11 23:21, Warren Togami Jr. wrote:
> Most of these are redundant to rules in my sandbox.  Please remove the
> redundant parts and I guess put the missing rules into my existing sem
> file.

Are you sure this is a bug and not because you forgot to assign a score 
to them?  IIRC network tests like these are skipped if they do not have 
a positive score.

I think I'll therefore leave these in my sandbox for now and see what 
happens; the names are different and yours aren't being run anyway so I 
can't see any harm.

Regards,
Steve.