Posted to dev@spamassassin.apache.org by "Warren Togami Jr." <wt...@gmail.com> on 2011/02/10 00:21:13 UTC
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Most of these are redundant to rules in my sandbox. Please remove the
redundant parts and I guess put the missing rules into my existing sem file.
http://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/wtogami/20_bug_6220_sem.cf
Note that the URI rules aren't working in the masscheck because of Bug 6527:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6527
That bug was introduced by an incomplete fix in Bug 6297, comment 14:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6297#c14
Warren
On 2/9/2011 11:51 AM, smf@apache.org wrote:
> Author: smf
> Date: Wed Feb 9 21:51:24 2011
> New Revision: 1069129
>
> URL: http://svn.apache.org/viewvc?rev=1069129&view=rev
> Log:
> Updated sandbox to mass-check SEM DNS lists
>
> Added:
> spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
>
> Added: spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf?rev=1069129&view=auto
> ==============================================================================
> --- spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf (added)
> +++ spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf Wed Feb 9 21:51:24 2011
> @@ -0,0 +1,53 @@
> +#testrules
> +
> +# DNSEval rules
> +ifplugin Mail::SpamAssassin::Plugin::DNSEval
> +
> +header SMF_SEM_BLACK eval:check_rbl('semblack-lastexternal', 'bl.spameatingmonkey.net')
> +tflags SMF_SEM_BLACK net
> +describe SMF_SEM_BLACK Received from an IP listed by SEM-BLACK
> +score SMF_SEM_BLACK 0.1
> +reuse SMF_SEM_BLACK
> +
> +endif
> +
> +
> +# URIDNSBL rules
> +ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
> +
> +urirhssub SMF_SEM_URI uribl.spameatingmonkey.net. A 2
> +body SMF_SEM_URI eval:check_uridnsbl('SEM_URI')
> +describe SMF_SEM_URI Contains a URI listed by SEM-URI
> +tflags SMF_SEM_URI net
> +score SMF_SEM_URI 0.1
> +reuse SMF_SEM_URI
> +
> +urirhssub SMF_SEM_URIRED urired.spameatingmonkey.net. A 2
> +body SMF_SEM_URIRED eval:check_uridnsbl('SEM_URIRED')
> +describe SMF_SEM_URIRED Contains a URI listed by SEM-URIRED
> +tflags SMF_SEM_URIRED net
> +score SMF_SEM_URIRED 0.1
> +reuse SMF_SEM_URIRED
> +
> +urirhssub SMF_SEM_FRESH fresh.spameatingmonkey.net. A 2
> +body SMF_SEM_FRESH eval:check_uridnsbl('SEM_FRESH')
> +describe SMF_SEM_FRESH Contains a domain registered less than 5 days ago
> +tflags SMF_SEM_FRESH net
> +score SMF_SEM_FRESH 0.1
> +reuse SMF_SEM_FRESH
> +
> +urirhssub SMF_SEM_FRESH_10 fresh10.spameatingmonkey.net. A 2
> +body SMF_SEM_FRESH_10 eval:check_uridnsbl('SEM_FRESH')
> +describe SMF_SEM_FRESH_10 Contains a domain registered less than 5 days ago
> +tflags SMF_SEM_FRESH_10 net
> +score SMF_SEM_FRESH_10 0.1
> +reuse SMF_SEM_FRESH_10
> +
> +urirhssub SMF_SEM_FRESH_15 fresh15.spameatingmonkey.net. A 2
> +body SMF_SEM_FRESH_15 eval:check_uridnsbl('SEM_FRESH')
> +describe SMF_SEM_FRESH_15 Contains a domain registered less than 5 days ago
> +tflags SMF_SEM_FRESH_15 net
> +score SMF_SEM_FRESH_15 0.1
> +reuse SMF_SEM_FRESH_15
> +
> +endif
>
>
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/9/2011 2:41 PM, Steve Freegard wrote:
> On 09/02/11 23:21, Warren Togami Jr. wrote:
>> Most of these are redundant to rules in my sandbox. Please remove the
>> redundant parts and I guess put the missing rules into my existing sem
>> file.
>
> Are you sure this is a bug and not because you forgot to assign a score
> to them? IIRC network tests like these are skipped if they do not have a
> positive score.
>
> I think I'll therefore leave these in my sandbox for now and see what
> happens; the names are different and yours aren't being run anyway so I
> can't see any harm.
>
> Regards,
> Steve.
Hmm, it is possible that assigning a score is a workaround for the bug.
Prior to the introduction of that bug it wasn't necessary to assign a
score in the sandbox. I actually avoided doing so because it is
misleading: the score is not actually used for anything.
Anyway, please consolidate the SEM rules to a single file (I don't care
where).
Warren
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 11:08 PM, Steve Freegard wrote:
> It was my understanding that multiple SEM lists were already being
> included in the weekly masscheck. I have no objections to it and have
> plenty of capacity to handle the queries.
>
> --Blaine Fleming
> SEM Admin
Alrighty then. Could you ask him to make a combined zone so it is
possible to query all with a single DNS lookup?
>
>> I'm going ahead with the several changes I suggested but keeping the
>> URI rules disabled for now.
>
> I will update my rules with the correct naming; I've already run a local
> network mass-check last night and I'm seeing the results I expected
> SEM_FRESH, SEM_FRESH_10, SEM_FRESH_15 all score better than URIBL_RHS_DOB:
>
> OVERALL    SPAM%    HAM%    S/O  RANK  SCORE  NAME
>       0    25000     424  0.983  0.00   0.00  (all messages)
> 0.00000  98.3323  1.6677  0.983  0.00   0.00  (all messages as %)
>  15.639  15.9040  0.0000  1.000  0.85   0.00  T_SMF_SEM_FRESH_15
>  15.623  15.8880  0.0000  1.000  0.85   0.00  T_SMF_SEM_FRESH_10
>  10.954  11.1400  0.0000  1.000  0.83   0.00  T_SMF_SEM_FRESH
>   9.271   9.4280  0.0000  1.000  0.81   0.00  URIBL_RHS_DOB
BTW, your FRESH rules had typos that indicate they were not behaving as
you expect. See the changes I checked into svn.
Warren
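The thread does not spell out which typos are meant. One visible candidate in the committed file is that the check_uridnsbl() argument does not match the rule name (for example, SMF_SEM_FRESH_10 evaluates 'SEM_FRESH'), whereas the URIDNSBL convention is for the two to be identical. A minimal Python sketch of a check for that mismatch is below. The function name find_mismatched_evals and the sample text are illustrative assumptions, not part of SpamAssassin.

```python
import re

# Matches lines of the form: body RULE_NAME eval:check_uridnsbl('ARGUMENT')
BODY_EVAL = re.compile(
    r"^\s*body\s+(\S+)\s+eval:check_uridnsbl\('([^']+)'\)"
)


def find_mismatched_evals(cf_text):
    """Return (rule_name, eval_argument) pairs where the two differ."""
    mismatches = []
    for line in cf_text.splitlines():
        # Drop any "> +" mail-quote/diff prefix so pasted text also works.
        line = re.sub(r"^[>\s+]*", "", line)
        match = BODY_EVAL.match(line)
        if match and match.group(1) != match.group(2):
            mismatches.append((match.group(1), match.group(2)))
    return mismatches


# Two lines taken from the quoted commit, plus one hypothetical
# consistent rule for contrast.
SAMPLE = """
> +urirhssub SMF_SEM_FRESH_10 fresh10.spameatingmonkey.net. A 2
> +body SMF_SEM_FRESH_10 eval:check_uridnsbl('SEM_FRESH')
body URIBL_SEM_FRESH eval:check_uridnsbl('URIBL_SEM_FRESH')
"""

if __name__ == "__main__":
    for rule, arg in find_mismatched_evals(SAMPLE):
        print(f"{rule}: eval argument is {arg!r}, expected the rule name")
```

This only flags the name mismatch; it does not validate zones, tflags or describe text.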
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by Steve Freegard <st...@stevefreegard.com>.
On 11/02/11 06:59, Warren Togami Jr. wrote:
> Steve, any comment on the previous post?
Sorry - but I was busy yesterday and waiting for a reply from Blaine.
> I don't think it is a good idea to enable the URI rules in masscheck
> unless Blaine has approved of this. He will suddenly be whacked by
> millions of queries on Saturday and that could be nasty if he isn't
> expecting it.
>
It was my understanding that multiple SEM lists were already being
included in the weekly masscheck. I have no objections to it and have
plenty of capacity to handle the queries.
--Blaine Fleming
SEM Admin
> I'm going ahead with the several changes I suggested but keeping the
> URI rules disabled for now.
I will update my rules with the correct naming. I already ran a local
network mass-check last night and I'm seeing the results I expected:
SEM_FRESH, SEM_FRESH_10 and SEM_FRESH_15 all score better than URIBL_RHS_DOB.
OVERALL    SPAM%    HAM%    S/O  RANK  SCORE  NAME
      0    25000     424  0.983  0.00   0.00  (all messages)
0.00000  98.3323  1.6677  0.983  0.00   0.00  (all messages as %)
 15.639  15.9040  0.0000  1.000  0.85   0.00  T_SMF_SEM_FRESH_15
 15.623  15.8880  0.0000  1.000  0.85   0.00  T_SMF_SEM_FRESH_10
 10.954  11.1400  0.0000  1.000  0.83   0.00  T_SMF_SEM_FRESH
  9.271   9.4280  0.0000  1.000  0.81   0.00  URIBL_RHS_DOB
Regards,
Steve.
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 11:16 PM, Steve Freegard wrote:
>> 2) Given the aforementioned issue where every URI will trigger
>> multiple DNS queries to SEM, we may want to strongly suggest to Blaine
>> for him to make a combined zone so all of his URIBL's can be queried
>> with a single DNS lookup.
>
> First things first. These rules need to be tested for their
> effectiveness and overlap before anything else happens. No point in
> asking for a big change like this if we end up not putting the rules
> into the main ruleset.
>
Good point.
As noted earlier, SEMBLACK is already a non-starter for inclusion.
I am interested to see the performance and overlap of the URIBLs. I'm
enabling the newly renamed URIBL_SEM_FRESH* rules on my mail servers with
informational scores so we'll have some "reuse" data. Other masscheck
participants should do the same if they do tagging.
Warren
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by Steve Freegard <st...@stevefreegard.com>.
On 11/02/11 07:11, Warren Togami Jr. wrote:
> 1) His rule names like SEM_URI, SEM_URIRED, SEM_FRESH* don't follow
> spamassassin's own conventions where URIBL rules begin with URIBL_*.
> I'm thinking we should probably add that prefix.
>
I'll correct this.
> 2) Given the aforementioned issue where every URI will trigger
> multiple DNS queries to SEM, we may want to strongly suggest to Blaine
> for him to make a combined zone so all of his URIBL's can be queried
> with a single DNS lookup.
First things first. These rules need to be tested for their
effectiveness and overlap before anything else happens. No point in
asking for a big change like this if we end up not putting the rules
into the main ruleset.
Regards,
Steve.
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 2/10/2011 8:59 PM, Warren Togami Jr. wrote:
> Steve, any comment on the previous post? I don't think it is a good idea
> to enable the URI rules in masscheck unless Blaine has approved of this.
> He will suddenly be whacked by millions of queries on Saturday and that
> could be nasty if he isn't expecting it.
>
> I'm going ahead with the several changes I suggested but keeping the URI
> rules disabled for now.
>
Two more thoughts:
1) His rule names like SEM_URI, SEM_URIRED, SEM_FRESH* don't follow
SpamAssassin's own convention that URIBL rule names begin with URIBL_.
I'm thinking we should probably add that prefix.
2) Given the aforementioned issue where every URI will trigger multiple
DNS queries to SEM, we may want to strongly suggest that Blaine make a
combined zone so all of his URIBLs can be queried with a single
DNS lookup.
Warren
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
Steve, any comment on the previous post? I don't think it is a good
idea to enable the URI rules in masscheck unless Blaine has approved of
this. He will suddenly be whacked by millions of queries on Saturday
and that could be nasty if he isn't expecting it.
I'm going ahead with the several changes I suggested but keeping the URI
rules disabled for now.
Warren
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by "Warren Togami Jr." <wt...@gmail.com>.
On 02/09/2011 02:41 PM, Steve Freegard wrote:
> On 09/02/11 23:21, Warren Togami Jr. wrote:
>> Most of these are redundant to rules in my sandbox. Please remove the
>> redundant parts and I guess put the missing rules into my existing sem
>> file.
>
> Are you sure this is a bug and not because you forgot to assign a score
> to them? IIRC network tests like these are skipped if they do not have a
> positive score.
I figured it out. The bug workaround has nothing to do with having a
score or not. You are working around the issue with your use of
#testrules instead of tflags nopublish.
>
> I think I'll therefore leave these in my sandbox for now and see what
> happens; the names are different and yours aren't being run anyway so I
> can't see any harm.
Consolidate into single sandbox file
====================================
My SEM rules have been in the nightly masscheck since late 2009, and the
URI rules were working before Bug #6527 happened.
RCVD_IN_SEMBLACK is also the official rule name (from SEM's own
website). For network rules like these, the precedent seems to be to use
the official names.
Shall we consolidate the rules into a single file? Let's use your file;
it doesn't matter where they are in the sandbox. I'll delete my file
and let's rename your rules.
Please note anything related to SEM in Bug 6220:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6220
Reuse Changes
=============
"reuse" is only ideal for the FRESH lists. We will have almost no
readings at all if we use "reuse" for SEMBLACK, URI and URIRED.
Is this wise?
=============
Did you ask Blaine if he approves of this?
List operators are required to request or approve SpamAssassin testing
of their lists. We haven't been testing his URIBLs for a long time, and
if we suddenly begin testing MULTIPLE of his lists we'll whack his DNS
servers with millions of lookups on Saturday.
The nightly masscheck currently has ~820k messages. Many of those
messages contain multiple domain names. Multiply that by his different
URIBLs, and that is a significant flood of DNS lookups coming out of nowhere.
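To put a rough number on that multiplication, here is a back-of-the-envelope sketch in Python. Only the ~820k message count comes from this post. The average of 2 domains per message is purely an assumption for illustration, and the count of 5 zones is taken from the five urirhssub zones in the committed file. DNS caching and repeated domains are ignored, so treat the result as an upper-bound style estimate, not a measurement.

```python
def estimate_uribl_queries(messages, domains_per_message, uribl_zones):
    """Upper-bound estimate: one DNS query per domain per zone per message."""
    return messages * domains_per_message * uribl_zones


MESSAGES = 820_000        # approximate nightly masscheck size quoted in the post
DOMAINS_PER_MESSAGE = 2   # ASSUMPTION for illustration, not a measured value
SEPARATE_ZONES = 5        # uribl, urired, fresh, fresh10, fresh15 in the commit

separate = estimate_uribl_queries(MESSAGES, DOMAINS_PER_MESSAGE, SEPARATE_ZONES)
combined = estimate_uribl_queries(MESSAGES, DOMAINS_PER_MESSAGE, 1)

if __name__ == "__main__":
    print(f"five separate zones: {separate:,} queries")
    print(f"one combined zone:   {combined:,} queries")
```

Under these assumptions, querying one zone instead of five cuts the lookup count by a factor of five.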
SURBL's multiple URIBLs avoid this multiplication issue because they
are all served from a single DNS lookup with different return codes.
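As a sketch of how a single lookup can carry several list memberships: a combined zone typically answers with one 127.0.0.x address whose last octet is a bitmask, one bit per list. The bit assignments below are hypothetical and chosen only for illustration; they are not SEM's or SURBL's actual values, and the function name lists_from_answer is likewise made up.

```python
import ipaddress

# HYPOTHETICAL bit assignments for an imagined combined zone.
LIST_BITS = {
    "URI": 2,
    "URIRED": 4,
    "FRESH": 8,
}


def lists_from_answer(a_record):
    """Decode the last octet of a 127.0.0.x answer into list names."""
    last_octet = int(ipaddress.IPv4Address(a_record)) & 0xFF
    return [name for name, bit in LIST_BITS.items() if last_octet & bit]


if __name__ == "__main__":
    # 10 = 8 + 2, so this single answer reports two lists at once.
    print(lists_from_answer("127.0.0.10"))
```

Each SpamAssassin rule would then test a different bit of the same cached answer, so adding more lists does not add more DNS queries.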
SEMBLACK should be avoided
==========================
http://www.spamtips.org/2011/01/dnsbl-safety-report-1232011.html
http://ruleqa.spamassassin.org/20110205-r1067413-n/T_RCVD_IN_SEMBLACK/detail
I strongly recommend that folks not use RCVD_IN_SEMBLACK because of its
questionable safety record over the past years and its ~90% overlap
with the high-scoring Spamhaus RCVD_IN_PBL. (Also old news: in late 2009 I
caught him outright copying PSBL, which he claimed was an innocent
mistake. I don't know what methodology he uses now.)
While its safety has improved in recent weeks, this high level of overlap
with PBL makes it dangerous and redundant. Also look at "set 0,
score-map": almost none of the spam hits scoring 5 points and below are
SEMBLACK. This means using SEMBLACK almost never helps you.
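For readers unfamiliar with the overlap figure, here is a minimal sketch of one way to read it: the fraction of messages hit by one rule that were also hit by another. The message IDs below are toy data invented for illustration, not ruleqa results, and the exact definition ruleqa uses may differ in detail.

```python
def overlap_fraction(hits_a, hits_b):
    """Fraction of messages hit by rule A that rule B also hit."""
    if not hits_a:
        return 0.0
    return len(hits_a & hits_b) / len(hits_a)


# TOY DATA: message IDs hit by each rule (hypothetical, for illustration only).
semblack_hits = set(range(1, 11))          # messages 1..10
pbl_hits = set(range(1, 10)) | {11, 12}    # messages 1..9 plus 11 and 12

if __name__ == "__main__":
    share = overlap_fraction(semblack_hits, pbl_hits)
    print(f"{share:.0%} of the SEMBLACK hits were also PBL hits")
```

When the overlap is that high, the second rule adds little new evidence, and a false positive on one list tends to be a false positive on both.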
Warren Togami
warren@togami.com
Re: svn commit: r1069129 - /spamassassin/trunk/rulesrc/sandbox/smf/30_sem.cf
Posted by Steve Freegard <st...@stevefreegard.com>.
On 09/02/11 23:21, Warren Togami Jr. wrote:
> Most of these are redundant to rules in my sandbox. Please remove the
> redundant parts and I guess put the missing rules into my existing sem
> file.
Are you sure this is a bug and not because you forgot to assign a score
to them? IIRC network tests like these are skipped if they do not have
a positive score.
I think I'll therefore leave these in my sandbox for now and see what
happens; the names are different and yours aren't being run anyway so I
can't see any harm.
Regards,
Steve.