Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2010/04/02 21:39:35 UTC

[Bug 6400] New: GA feedback for Mailspike DNSBL

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

           Summary: GA feedback for Mailspike DNSBL
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Score Generation
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: joao.gouveia@anubisnetworks.com


Would it be possible to include Mailspike DNSBL in the next GA run?

Currently we're assigning (high) scores (see
http://mailspike.org/anubis/implementation_sa.html) based on our experience,
but most likely these scores are far from ideal if we want to have
something that works for the SA community.

Rules are already in Warren's sandbox:
http://svn.apache.org/viewvc/spamassassin/branches/3.3/rulesrc/sandbox/wtogami/20_anubis.cf
(later we need to change the DNS zones to mailspike.net but the data is the
same, so that should not make any difference)

Is there anything we can do to help get this running?

Thanks!

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #7 from Kevin A. McGrail <km...@pccc.com> 2011-11-28 19:10:15 UTC ---
(In reply to comment #6)
> So consensus is that RCVD_IN_MSPIKE_BL should be added to the default rule set
> with a fixed very conservative score, until we have enough data to determine a
> more useful score?
> 
> So Warren should remove the "nopublish" flag, and set a fixed score, of what? 
> 1?  0.1?  
> 
> Latest ruleqa net results: 
> http://ruleqa.spamassassin.org/20111126/T_RCVD_IN_MSPIKE_BL/detail
> 
>   MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
>       0  74.6530   0.0067   1.000    0.99    0.00  T_RCVD_IN_MSPIKE_BL  
> 
> Which makes it the second best ranked blacklist, below RCVD_IN_XBL.  But that's
> skewed by the fact that it doesn't have reuse set.

I'd recommend the entire MSPIKE kit and caboodle.  I'm running with these
scores and recommend them:

# Scores
score RCVD_IN_MSPIKE_ZBI     3.5
score RCVD_IN_MSPIKE_L5      3.1
score RCVD_IN_MSPIKE_L4      2.5
score RCVD_IN_MSPIKE_L3      1.9
score RCVD_IN_MSPIKE_L2      0.7
score RCVD_IN_MSPIKE_H2      -1.0
score RCVD_IN_MSPIKE_H3      -1.9
score RCVD_IN_MSPIKE_H4      -2.0
score RCVD_IN_MSPIKE_H5      -2.5
score RCVD_IN_MSPIKE_BL      1.0
score RCVD_IN_MSPIKE_WL      -1.0


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |critical

--- Comment #43 from Kevin A. McGrail <km...@pccc.com> 2012-01-23 15:01:45 UTC ---
I cannot figure out what is causing the MSPIKE rules to have a manually set
rule in 50_scores.cf and a set of scores in 72_scores.cf from the rules update.

I've tried moving the rules from sandbox to rules.  Could it be the gen:mutable
from 50_scores?

# MAILSPIKE RBL ENABLED FOR SA3.4 and above - BUG 6400
if (version >= 3.004000)
  # FLOATING SCORES FOR GA - adjust after GA to make L3 - L5 linear
  # Probably adjust up slightly to make up for the "reuse" imperfection
# <gen:mutable>
  score RCVD_IN_MSPIKE_ZBI     2.7
  score RCVD_IN_MSPIKE_L5      2.5
  score RCVD_IN_MSPIKE_L4      1.7
  score RCVD_IN_MSPIKE_L3      0.9
# </gen:mutable>
  # FIXED SCORES
  # TEMPORARILY LOWERED - adjust these higher after GA is done
  # (pending discussion: Whitelists need scores, but they shouldn't affect the
  # scoring of spam detection rules.)
  score RCVD_IN_MSPIKE_H3      -0.01
  score RCVD_IN_MSPIKE_H4      -0.01
  score RCVD_IN_MSPIKE_H5      -1.0
  # FIXED SCORES - informational rules, useful only for statistical comparisons
  score RCVD_IN_MSPIKE_BL      0.01
  score RCVD_IN_MSPIKE_WL      -0.01
endif
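
For readers unfamiliar with the markers in the block above: scores between `# <gen:mutable>` and `# </gen:mutable>` are the ones the GA may adjust, and the rest are fixed. A minimal Python sketch of that split follows. The function name `split_mutable` and the parsing approach are assumptions made for illustration; this is not how the SpamAssassin rescoring tools are implemented.

```python
def split_mutable(cf_text):
    """Return (mutable, fixed) dicts mapping rule name -> list of score strings."""
    mutable, fixed = {}, {}
    in_mutable = False
    for raw in cf_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Comment lines either toggle the mutable section or are ignored.
            marker = line.lstrip("#").strip()
            if marker == "<gen:mutable>":
                in_mutable = True
            elif marker == "</gen:mutable>":
                in_mutable = False
            continue
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score":
            target = mutable if in_mutable else fixed
            target[parts[1]] = parts[2:]
    return mutable, fixed


SAMPLE = """\
# <gen:mutable>
  score RCVD_IN_MSPIKE_ZBI     2.7
  score RCVD_IN_MSPIKE_L5      2.5
# </gen:mutable>
  # FIXED SCORES
  score RCVD_IN_MSPIKE_H5      -1.0
  score RCVD_IN_MSPIKE_BL      0.01
"""

mutable, fixed = split_mutable(SAMPLE)
print("mutable:", sorted(mutable))
print("fixed:  ", sorted(fixed))
```

Running something like this over 50_scores.cf would at least show which MSPIKE rules the GA is allowed to touch.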


Here's what I see in sa-update.
50_scores.cf:  score RCVD_IN_MSPIKE_ZBI     2.7
50_scores.cf:  score RCVD_IN_MSPIKE_L5      2.5
50_scores.cf:  score RCVD_IN_MSPIKE_L4      1.7
50_scores.cf:  score RCVD_IN_MSPIKE_L3      0.9
50_scores.cf:  score RCVD_IN_MSPIKE_H3      -0.01
50_scores.cf:  score RCVD_IN_MSPIKE_H4      -0.01
50_scores.cf:  score RCVD_IN_MSPIKE_H5      -1.0
50_scores.cf:  score RCVD_IN_MSPIKE_BL      0.01
50_scores.cf:  score RCVD_IN_MSPIKE_WL      -0.01
72_scores.cf:score RCVD_IN_MSPIKE_BL                     0.001 0.010 0.001 0.010
72_scores.cf:score RCVD_IN_MSPIKE_H2                     0.001 -0.001 0.001 -0.001
72_scores.cf:score RCVD_IN_MSPIKE_H3                     0.001 -0.010 0.001 -0.010
72_scores.cf:score RCVD_IN_MSPIKE_H4                     0.001 -0.010 0.001 -0.010
72_scores.cf:score RCVD_IN_MSPIKE_H5                     0.001 0.001 0.001 0.001
72_scores.cf:score RCVD_IN_MSPIKE_L2                     0.001 0.001 0.001 0.001
72_scores.cf:score RCVD_IN_MSPIKE_L3                     0.001 0.001 0.001 0.001
72_scores.cf:score RCVD_IN_MSPIKE_L4                     0.001 3.996 0.001 3.996
72_scores.cf:score RCVD_IN_MSPIKE_L5                     0.001 3.676 0.001 3.676
72_scores.cf:score RCVD_IN_MSPIKE_WL                     0.001 -0.010 0.001 -0.010
72_scores.cf:score RCVD_IN_MSPIKE_ZBI                    0.001 0.001 0.001 0.001
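
A note on reading the listing: a `score` line carries either one value, used for every score set, or four values for score sets 0 to 3 (0: no network tests, no Bayes; 1: network tests only; 2: Bayes only; 3: both). The Python sketch below expands grep-style lines like the ones above into the four sets and reports rules that are scored in more than one file. The helper names are hypothetical, and the sketch only handles the one-value and four-value forms.

```python
from collections import defaultdict

SCORESETS = ("no-net/no-bayes", "net/no-bayes", "no-net/bayes", "net/bayes")


def parse_grep_line(line):
    """Parse 'file.cf: score NAME v [v v v]' into (file, rule, {scoreset: value})."""
    filename, _, rest = line.partition(":")
    parts = rest.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError("not a score line: %r" % line)
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        values = values * 4  # a single value applies to all four score sets
    if len(values) != 4:
        raise ValueError("expected 1 or 4 scores: %r" % line)
    return filename.strip(), parts[1], dict(zip(SCORESETS, values))


def rules_in_multiple_files(lines):
    """Map rule name -> sorted file list, for rules scored in more than one file."""
    seen = defaultdict(set)
    for line in lines:
        filename, rule, _ = parse_grep_line(line)
        seen[rule].add(filename)
    return {rule: sorted(files) for rule, files in seen.items() if len(files) > 1}


LINES = [
    "50_scores.cf:  score RCVD_IN_MSPIKE_L4      1.7",
    "72_scores.cf:score RCVD_IN_MSPIKE_L4                     0.001 3.996 0.001 3.996",
    "72_scores.cf:score RCVD_IN_MSPIKE_H2                     0.001 -0.001 0.001 -0.001",
]

print(parse_grep_line(LINES[1])[2])
print(rules_in_multiple_files(LINES))
```

In the 72_scores.cf lines above, sets 0 and 2 are all 0.001, presumably because a DNSBL rule cannot fire when network tests are off.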


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

D. Stussy <so...@kd6lvw.ampr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |software+spamassassin@kd6lv
                   |                            |w.ampr.org

--- Comment #28 from D. Stussy <so...@kd6lvw.ampr.org> 2011-12-06 23:24:47 UTC ---
Policy Issue:  The SA community should come up with a policy for addressing the
addition of new features or expansion of such (like new DNSBLs and DNSWLs).  I
suggest that for the first period of time (1 month?  3 months?  Whatever
"masscheck" needs to assign a score?), the default score for any new rule be
set to 0.001 (or -0.001 for whitelisting type functions).  Such a policy should
do these things:

1)  Eliminate any bickering over what the initial score should be.
2)  Give end users time to recognize that a new test has been installed.
    (And give them a chance for a local override score)
3)  Allow any spam feedback system time to evaluate the usefulness of the rule.

Some of this may already be done in the mass check system.  However, that's by
convention, not policy.  A separate update channel with rules under
consideration would be part of the idea I'm setting forth.

Along with this is a suggestion:  A dynamic web page that picks up rules under
testing (from the various sandboxes) and lists them, so all can see what is
under consideration.  If data regarding effectiveness is available, that too
would be nice to see.

Some have posted their personal stats with regard to their experience with the
rule/DNSBL under consideration in this bug/feature request, but that's not
quite the same as an automated system producing a combined result from multiple
feedback points.

In other words, perhaps SA needs a written, software-enforced policy and
procedure to determine initial scores.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #31 from D. Stussy <so...@kd6lvw.ampr.org> 2011-12-07 18:46:25 UTC ---
I don't think that you all recklessly add rules.  However, I do note that for
every proposed rule, this type of discussion is repeated ad nauseam.  If you
had a policy regarding addition, there would be no need for this discussion: 
After automated testing, either the rule makes the cut or not.

What I am saying is that if you have a [clear, written] policy (and not a mere
custom), it needs to be better defined.  If it were, there would be no discussion. 
The rule would simply be added or discarded.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #21 from Darxus <Da...@ChaosReigns.com> 2011-12-01 15:44:09 UTC ---
(In reply to comment #19)
> Nope, that incident was March 2011.
> 
> I actually was thinking along the lines of the dev@ threads "Other DNSBL's",
> Oct 2009, and "3.3.2 and MSPIKE", Dec 2010, and similar discussions about when
> and how to include new DNSBLs.
> 
> DNSBLs do cause additional network traffic and load, which in particular for
> larger sites is a real concern. Any addition like this definitely needs to be
> communicated loud and clear in the release announcement.
> 
> Regardless, whether excessive queries might result in FP return values.

Thanks for the references.  They happened before I subscribed to the dev list. 
I just read through them here:  
http://old.nabble.com/Other-DNSBL%27s-td25925640.html
http://old.nabble.com/3.3.2-and-MSPIKE-td30513395.html

There was *no* objection to adding tested DNSBLs to existing releases.  Except
for your claim, again, in the second thread, that there had been a previous
objection.  Which there wasn't, in those threads.  The only objection of any
kind was:

Henrik K, Dec 22, 2010
> Not that it isn't a worthy cause, but you can't just start adding arbitrary
> unknown lists to mass checks. Some of them might crumble from the sudden
> mass check flood.

Which is not relevant to this bug.  Also, I'm not convinced it's true.

And this is where you first claimed there had previously been a relevant
objection:

Karsten Bräckelmann-2, Oct 19, 2009
> A micro 3.3.x release probably is not the best opportunity, though. I
> recall there has been quite some discussion and resentment last time.
> Even when including new BLs for 3.4, we really need to communicate that
> added network load better to the user-base.

Was there another thread that I haven't read where this resentment was
discussed?

"MSPIKE (previously named ANBREP) has proven consistently in weekly masschecks
since before the release of 3.3.0" - Warren.  So MSPIKE has been good since
before 2010-01-27.  Two years.  And from what I can tell, for at least a year,
it hasn't been added to the default rule set because of your unsubstantiated
claim that somebody else objected.

Can you start over and tell me why you don't think MSPIKE should be added to
existing 3.3 releases?  I don't think anybody will mind the increased network
load.  I think everyone will appreciate the resulting increased accuracy.

I wouldn't care so much if it was easy for us to maintain two sets of rule
updates.  But we can't.  So not adding MSPIKE to 3.3 means never adding MSPIKE
to spamassassin, or at least sub-optimal scoring in one of the major releases,
which you don't seem okay with either.  Which results in spamassassin being
incapable of ever adding another useful DNSBL, which is not okay.


Somewhat related, spamassassin needs an announcement list, for announcing
changes like this, and releases. (bug 6714)


Not relevant to this bug, but I can't help repeating why we use reuse, from one
of those threads:

Bjoern Sikora, Oct 19, 2009
> Please pay attention that some blacklists do only list IP addresses for hours.
> When running the mass check you need realtime data to get reliable results.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #9 from Darxus <Da...@ChaosReigns.com> 2011-11-28 19:40:56 UTC ---
(In reply to comment #8)
> concerned that including all the components of the _BL rule will cause the
> rescorer to behave suboptimally with our relatively limited corpora.  Huh,

I meant once we have enough data to run them through the rescorer with reuse.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #37 from Kevin A. McGrail <km...@pccc.com> 2012-01-18 23:20:41 UTC ---
(In reply to comment #36)
> Looks like nobody is getting any hits: 
> http://ruleqa.spamassassin.org/?daterev=20120114-r1231463-n&rule=%2Fmspike
> And I'm running trunk (3.4.0), so I should be.
> Should all the instances of "(version >= 3.400000)" be replaced with "(version
> >= 3.004000)"?  "3.004000" is the directory sa-update downloads my rules to. 
> And one other example I found seems to match that pattern in 72_active.cf:
> 
> if version >= 3.003000
> ifplugin Mail::SpamAssassin::Plugin::WLBLEval

Not sure.  Can you test with a local .cf and confirm it hits?  I'm trying to
focus on other bugs and, going by the hits on T_ test rules in previous notes,
assumed it was hitting.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #41 from Darxus <Da...@ChaosReigns.com> 2012-01-20 15:41:10 UTC ---
I'm getting the scores from 72_scores.cf, all 0.001.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Benny Pedersen <me...@junc.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |me@junc.eu

--- Comment #44 from Benny Pedersen <me...@junc.eu> ---
it fails with DNSEval plugin disabled, missing ifplugin lines in cf files
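
One way to look for the missing guards is sketched below in Python: it walks a .cf file, tracks `if`/`ifplugin`/`endif` nesting, and flags `eval:check_rbl...` rules that are not inside an `ifplugin Mail::SpamAssassin::Plugin::DNSEval` block. The function name, the sample rule name, and the zone `dnsbl.example.invalid.` are made up for illustration, and `else` branches and `#` characters inside rule bodies are not handled.

```python
DNSEVAL = "Mail::SpamAssassin::Plugin::DNSEval"


def unguarded_rbl_rules(cf_text):
    """Return (line_number, line) pairs for check_rbl rules outside ifplugin DNSEval."""
    stack = []     # one entry per open if/ifplugin block
    findings = []
    for number, raw in enumerate(cf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line:
            continue
        words = line.split()
        if words[0] == "ifplugin" and len(words) > 1:
            stack.append(words[1])
        elif words[0] == "if" or words[0].startswith("if("):
            stack.append(None)  # a non-plugin conditional, e.g. a version check
        elif words[0] == "endif":
            if stack:
                stack.pop()
        elif "eval:check_rbl" in line and DNSEVAL not in stack:
            findings.append((number, line))
    return findings


GUARDED = """\
if (version >= 3.004000)
ifplugin Mail::SpamAssassin::Plugin::DNSEval
header RCVD_IN_EXAMPLE eval:check_rbl('example-lastexternal', 'dnsbl.example.invalid.')
endif
endif
"""

UNGUARDED = """\
if (version >= 3.004000)
header RCVD_IN_EXAMPLE eval:check_rbl('example-lastexternal', 'dnsbl.example.invalid.')
endif
"""

print(unguarded_rbl_rules(GUARDED))
print(unguarded_rbl_rules(UNGUARDED))
```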


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com
   Target Milestone|Undefined                   |3.4.0

--- Comment #5 from Kevin A. McGrail <km...@pccc.com> 2011-11-06 23:22:20 UTC ---
While a mass-check and algorithms are always ideal, Mailspike shows promise and
I'd like to target this for 3.4.0 even if we have to guess.  I've had mailspike
in place for some time on live servers so I should be able to look at the
statistics and make a good approximation for scores to move this forward.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #15 from Darxus <Da...@ChaosReigns.com> 2011-11-30 17:26:30 UTC ---
(In reply to comment #14)
> Very easily by pointing you to the uproar and lengthy discussion the last (and
> first) time a new DNSBL has been pushed with a rule update because it was added
> to the sandboxes.

That uproar was because that blacklist (spam eating monkey) was automatically
detecting "abuse" and causing many false positives.  (bug 6220)

> (In reply to comment #13)
> > I believe we can just encapsulate it in a version check.

An interesting suggestion.

> Would that not bias the re-scoring, and thus negatively impact 3.3?

And an excellent question.  

What do you folks think about polling the users list about adding a new DNSBL
to existing 3.3 releases?  Should I just post, asking?

What's the worst that will happen if these rules are enabled for existing 3.3.*
releases?  João, can you confirm that mailspike will not cause false positives
as a result of detecting high amounts of traffic?  So worst case, people get no
DNS response, and the score is not affected?  Also, worst case, we revert the
addition a day later.  

(All, of course, depending on getting rule updates happening again, bug 6702.)


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #35 from Kevin A. McGrail <km...@pccc.com> 2012-01-18 23:00:06 UTC ---
Looks resolved.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #39 from Kevin A. McGrail <km...@pccc.com> 2012-01-18 23:54:20 UTC ---
(In reply to comment #38)
> Yeah.  Running trunk, ANYTHING2 hits, and ANYTHING1 does not:
> 
> if (version >= 3.400000)
> body ANYTHING1 /./ 
> endif
> 
> if (version >= 3.004000)
> body ANYTHING2 /./ 
> endif

Sorry, I missed the nuance earlier.  Working on too many bugs at once.  3.004000
is absolutely correct.

I caught a few of these in sandboxes and rules so I fixed them all. Will keep
this open to check in a day or two after the next auto rules creation.

Good catch.
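
For anyone tripping over the same thing: the number compared in `if (version >= ...)` is the numeric form of the release, with the minor and patch parts each padded to three digits, so 3.4.0 is 3.004000 and 3.400000 never matches a 3.4.x install. A small Python sketch of the arithmetic (the function names are hypothetical, not part of SpamAssassin):

```python
def sa_numeric_version(dotted):
    """Convert a dotted release such as '3.4.0' to the padded numeric form."""
    major, minor, patch = (int(part) for part in dotted.split("."))
    return "%d.%03d%03d" % (major, minor, patch)


def version_at_least(running, required_numeric):
    """Mimic `if (version >= required_numeric)` for a dotted running version."""
    return float(sa_numeric_version(running)) >= float(required_numeric)


print(sa_numeric_version("3.4.0"))            # 3.004000
print(version_at_least("3.4.0", "3.400000"))  # False: the guard that never hit
print(version_at_least("3.4.0", "3.004000"))  # True
print(version_at_least("3.3.2", "3.004000"))  # False: 3.3.x stays excluded
```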


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #14 from Karsten Bräckelmann <gu...@rudersport.de> 2011-11-30 16:57:34 UTC ---
(In reply to comment #12)
> > We really do not want to introduce new DNS lookups in an existing branch, but
> > strictly with the next major/minor release only and a clear announcement in the
> > release notes.
> 
> Wait, you want to *never* include mailspike in the existing 3.3.* releases, and
> create a separate rule set for 3.4.*?  Because of increasing the network load
> by 1 DNS lookup, for a very useful looking RBL?  That sounds... not good.

*shrug*  Sounds good to me.

> How can you justify that?

Very easily by pointing you to the uproar and lengthy discussion the last (and
first) time a new DNSBL has been pushed with a rule update because it was added
to the sandboxes.

The conclusion of the discussion was clear -- DNSBLs MUST NOT be introduced via
the update channel.

Moreover, since the planned next version will be 3.4, discussing potential
inclusion in a 3.3.x micro version release is a moot point.


(In reply to comment #13)
> I believe we can just encapsulate it in a version check.

Would that not bias the re-scoring, and thus negatively impact 3.3?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #18 from João Gouveia <jo...@anubisnetworks.com> 2011-11-30 17:36:54 UTC ---
(In reply to comment #15)
> (In reply to comment #14)
> > Very easily by pointing you to the uproar and lengthy discussion the last (and
> > first) time a new DNSBL has been pushed with a rule update because it was added
> > to the sandboxes.
> 
> That uproar was because that blacklist (spam eating monkey) was automatically
> detecting "abuse" and causing many false positives.  (bug 6220)
> 
> > (In reply to comment #13)
> > > I believe we can just encapsulate it in a version check.
> 
> An interesting suggestion.
> 
> > Would that not bias the re-scoring, and thus negatively impact 3.3?
> 
> And an excellent question.  
> 
> What do you folks think about polling the users list about adding a new DNSBL
> to existing 3.3 releases?  Should I just post, asking?
> 
> What's the worst that will happen if these rules are enabled for existing 3.3.*
> releases?  João, can you confirm that mailspike will not cause false positives
> as a result of detecting high amounts of traffic?  So worst case, people get no
> DNS response, and the score is not affected?  Also, worst case, we revert the
> addition a day later.  

Yes, I can confirm that.
We have also deployed a DNSBL mirror network, which is on standby and ready to
kick in, in case we start getting too much traffic.

> 
> (All, of course, depending on getting rule updates happening again, bug 6702.)


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #6 from Darxus <Da...@ChaosReigns.com> 2011-11-28 19:04:15 UTC ---
So consensus is that RCVD_IN_MSPIKE_BL should be added to the default rule set
with a fixed very conservative score, until we have enough data to determine a
more useful score?

So Warren should remove the "nopublish" flag, and set a fixed score, of what? 
1?  0.1?  

Latest ruleqa net results: 
http://ruleqa.spamassassin.org/20111126/T_RCVD_IN_MSPIKE_BL/detail

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0  74.6530   0.0067   1.000    0.99    0.00  T_RCVD_IN_MSPIKE_BL  

Which makes it the second best ranked blacklist, below RCVD_IN_XBL.  But that's
skewed by the fact that it doesn't have reuse set.
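
As a reminder of how to read the S/O column: as far as I understand the ruleqa output, it is the share of a rule's hits that are spam, computed from the SPAM% and HAM% columns. A minimal Python sketch under that assumption (the function name is made up):

```python
def s_o_ratio(spam_pct, ham_pct):
    """Fraction of hits that are spam, from the SPAM% and HAM% columns."""
    total = spam_pct + ham_pct
    if total == 0:
        return None  # the rule hit nothing, so the ratio is undefined
    return spam_pct / total


# Figures from the T_RCVD_IN_MSPIKE_BL row above.
print("%.3f" % s_o_ratio(74.6530, 0.0067))  # 1.000
```

The unrounded value is about 0.9999, which is why the table shows 1.000 even though the rule does hit a little ham.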


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #11 from Karsten Bräckelmann <gu...@rudersport.de> 2011-11-30 12:06:13 UTC ---
(In reply to comment #6)
> So consensus is that RCVD_IN_MSPIKE_BL should be added to the default rule set
> with a fixed very conservative score, until we have enough data to determine a
> more useful score?
> 
> So Warren should remove the "nopublish" flag, and [...]

Unfortunately, no. Since this is targeted at 3.4 things are much more
complicated.

We really do not want to introduce new DNS lookups in an existing branch, but
strictly with the next major/minor release only and a clear announcement in the
release notes.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #3 from Darxus <Da...@ChaosReigns.com> 2011-10-12 00:49:22 UTC ---
If these rules get added to the default spamassassin set, optimal rule
generation would be part of that.  Theoretically, optimal scores for most rules
are generated every day, that's just currently broken (bug 6671).

The problem with your WL stats isn't just that it hits some spam, but that the
percentage of ham it hits is proportionally low.  For example,
RCVD_IN_DNSWL_MED hits more ham and less spam than the _WL rule and all of the
_H* rules except for _H4 which only hits 0.1% of ham.  Might get a better
result by, for example, combining _H5, _H4, and _H3, but it doesn't look like
it.

It also looks like everything that hits RCVD_IN_MSPIKE_WL also hits one of the
RCVD_IN_DNSWL_* rules, so it doesn't add much.  The overlap with the DNSWL HI,
LOW, and NONE totals 102%, because rounding errors are fun.  Although the lack
of DNSWL_MED in the overlap list is pretty weird.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #4 from Warren Togami <wt...@gmail.com> 2011-10-13 19:55:55 UTC ---
Adding Mailspike is not a simple matter of adding Mailspike.  In order to
properly include Mailspike we need a full rescore run, as its addition will
affect the scores of other rules.

I suspect we have two problems with a full rescoring:

* I fear that the quality and quantity of our corpus has diminished in the past
year.  I personally haven't been manually sorting mail in a LONG time.

https://fedorahosted.org/auto-mass-check/
João, it would be very helpful if you folks could join the nightly masscheck
with a variety of ham and a random sample of spam.

* masscheck of old and new network rules like DNSBLs is not an apples-to-apples
comparison.  For old rules like RCVD_IN_PSBL we rely on --reuse of
spamassassin tags in the corpora.  For new rules, tags are typically missing
from corpora, so we are testing old mail against a real-time blacklist.  This
is not good, as the old DNSBL reflects actual performance while the new
blacklist does not.  I don't have a solution for this other than all masscheck
participants having added MSPIKE rules from a year ago.
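
To make the reuse problem concrete, here is a rough Python sketch of the decision: a network rule's result can only be taken from the tags recorded at delivery time if the scanner already knew the rule back then. The header parsing is simplified, and `recorded_tests`, `reused_result`, and the `known_at_delivery` parameter are hypothetical names for illustration, not part of mass-check.

```python
import re


def recorded_tests(x_spam_status):
    """Extract rule names from the tests= field of an X-Spam-Status value."""
    joined = re.sub(r",\s+", ",", x_spam_status)  # undo header folding after commas
    match = re.search(r"\btests=(\S*)", joined)
    if not match:
        return None
    return {name for name in match.group(1).split(",") if name and name != "none"}


def reused_result(x_spam_status, rule, known_at_delivery):
    """True/False from the recorded tags, or None when nothing can be reused."""
    tests = recorded_tests(x_spam_status) if x_spam_status else None
    if tests is None or rule not in known_at_delivery:
        # No tags, or the rule did not exist at delivery time: absence from
        # tests= says nothing, so only a live lookup on old mail remains.
        return None
    return rule in tests


HEADER = ("No, score=-1.2 required=5.0 tests=BAYES_00,\n"
          "\tRCVD_IN_PSBL autolearn=ham version=3.3.1")
KNOWN = {"RCVD_IN_PSBL", "RCVD_IN_XBL"}

print(reused_result(HEADER, "RCVD_IN_PSBL", KNOWN))
print(reused_result(HEADER, "RCVD_IN_XBL", KNOWN))
print(reused_result(HEADER, "RCVD_IN_MSPIKE_BL", KNOWN))
```

The last case is the MSPIKE situation: the corpus mail carries no tag either way, so the rule ends up being tested against today's blacklist data.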


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

John Hardin <jh...@impsec.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhardin@impsec.org

--- Comment #20 from John Hardin <jh...@impsec.org> 2011-11-30 19:03:51 UTC ---
Put a 73_sandbox_manual_scores.cf with those rules' scores set to zero in the
3.3 SVN branch? Then 3.3 site admins can enable them in local config if they
choose, and the GA rescoring process will only automatically affect 3.4+


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #29 from Warren Togami <wt...@gmail.com> 2011-12-07 10:46:51 UTC ---
(In reply to comment #28)
> 
> Some have posted their personal stats with regard to their experience with the
> rule/DNSBL under consideration in this bug/feature request, but that's not
> quite the same as an automated system producing a combined result from multiple
> feedback points.
> 

Your tone seems to suggest that we recklessly add network rules.  The
policy has been clear for a long time now: we add network rules only at major
version releases.  The last new network rule was RCVD_IN_PSBL added in 3.3.0
after a year of excellent automated masscheck data.

Your post also demonstrates utter ignorance of our existing "automated system"
that has demonstrated over 2 years of consistently excellent performance and
safety for MSPIKE.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #25 from Darxus <Da...@ChaosReigns.com> 2011-12-05 23:45:49 UTC ---
(In reply to comment #24)
> +1 for adding Mailspike to 3.4
> 
> -1 on adding any new DNSBL via sa-update to existing releases, but strictly
>    limit this on actual releases with README and release notes

"-1 votes are vetos and kill the proposal dead until all vetoers withdraw their
-1 votes." - http://www.apache.org/foundation/voting.html

How would you like to go about adding it to 3.4?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #23 from Darxus <Da...@ChaosReigns.com> 2011-12-05 21:26:53 UTC ---
(In reply to comment #22)
> Again, I believe there was consensus about generally adding new DNSBLs with
> major/minor version updates only, *and* clearly communicating this in the

> new DNSBLs. Maybe you just trust my memory on that?

I don't.  I don't think you're lying, I just don't see a reason to assume you
remember it as I would interpret it.

> Hey, I welcome anyone to step up and tell me consensus has been otherwise.

From what I saw, consensus just said that people shouldn't add a bunch of new
DNSBLs to masscheck without discussing it.  And there was no consensus that new
DNSBLs could only be added on major/minor releases.

> What's unsubstantiated is your claim that I and my quote would have been the
> reason for not yet including Mailspike. There are NO votes by PMC members or
> committers. That's what matters and changes the game, not a single post on a
> mailing list.

Kevin's comment 7 should count as a +1.  And I worry that others aren't voting
because they believe there was previous consensus to only add DNSBLs at
releases, which I question.

> While you "don't think anybody will mind the increased network load", I do know
> it for a fact. Yes, there *are* systems out there, heavily tweaked for
> throughput with assorted DNSBLs, plugins and rules disabled.

Good.  Are there any of those people who don't read dev@, users@, announce@, or
proofread sa-updates before applying them?  
Do any of these systems you know of stand any actual risk of inadvertently
adding the MSPIKE rules via sa-update?
Would any of them be any more than slightly inconvenienced for a day?

> Thus the need for clearly communicating any such changes.

I agree with that.  I think being unwilling to make the change to existing
releases takes it too far.

> > Somewhat related, spamassassin needs an announcement list, for announcing
> > changes like this, and releases. (bug 6714)
> 
> Oh, yeah, *that* was a fun bug.

Yeah, sorry.  So posting to announce@ should be ample communication, right?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #13 from Kevin A. McGrail <km...@pccc.com> 2011-11-30 15:58:54 UTC ---
(In reply to comment #12)
> (In reply to comment #11)
> > > So Warren should remove the "nopublish" flag, and [...]
> > 
> > Unfortunately, no. Since this is targeted at 3.4 things are much more
> > complicated.
> 
> Well, with rules, any change is going to affect all 3.3.* releases as soon as
> there's another rule update, right?
> 
> > We really do not want to introduce new DNS lookups in an existing branch, but
> > strictly with the next major/minor release only and a clear announcement in the
> > release notes.
> 
> Wait, you want to *never* include mailspike in the existing 3.3.* releases, and
> create a separate rule set for 3.4.*?  Because of increasing the network load
> by 1 DNS lookup, for a very useful looking RBL?  That sounds... not good.  How
> can you justify that?

Let's just make the rules 3.4 only and add it to the README.  I want to get
3.4.0 out the damn door and I'm working like a slave on IPv6 stuff still to see
if we can get it finalized.

I believe we can just encapsulate it in a version check.

if (version >= 3.400000)
...
endif

Copacetic if we do that?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #27 from Warren Togami <wt...@gmail.com> 2011-12-06 09:09:33 UTC ---
> +1 for adding Mailspike to 3.4
> 
> -1 on adding any new DNSBL via sa-update to existing releases, but strictly
> limit this on actual releases with README and release notes

+1 to adding Mailspike to 3.4.  We just need to be VERY CAREFUL about exactly
which rules to add, and how scores are set.

1) I strongly warn against letting the individual L# rules float with the
rescorer.  We will NOT see a logical linear progression of higher scores for
_L3 _L4 and _L5.  I would recommend letting only the aggregate _BL float with
GA rescoring.  Include _L3, _L4, _L5 and ZBI only as informational rules to aid
in future statistical analysis.

2) BE VERY CAREFUL DURING GA RESCORING!
"reuse" is the best way to ensure we are doing a proper apples-to-apples
comparison of MSPIKE and the other DNSBL's as their combination mutually
rebalances their scores.  However, if we enable "reuse" for MSPIKE, we must be
*CERTAIN* that all masscheck participants have their spam tagged using MSPIKE
rules.  If not, they will artificially count as non-hits and throw off the
statistics, potentially fatally.

3) Please add _H whitelist rules only as informational.

http://www.mail-archive.com/users@spamassassin.apache.org/msg69546.html
As noted in this earlier analysis, our existing whitelists DO NOT IMPROVE the
results of Spamassassin.  Weekly masscheck results have consistently indicated
moderately poor performance of the existing whitelists, so whitelists may even
be making things slightly worse.

This belongs in a separate discussion, but I mention it here because it is related.

 * We would be better off again reducing the existing whitelist scores.  In
particular, DNSWL_LOW and IADB whitelist rules are consistently demonstrating
problems, probably due to poor enforcement.

 * We should artificially set all Whitelist rules to -0.01 during any future GA
rescoring.  Why?  We are testing the efficacy of the spam detection rules, and
the two are mutually independent.  Zeroing out the effect of whitelists during
score generation ensures that whitelists are not improperly affecting the score
setting of spam detection rules.
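
To put a number on the concern in point 2, here is a rough sketch of how untagged corpora would dilute the measured hit rate when "reuse" is enabled.  The corpus sizes and the 75% true hit rate are made-up illustration values, and the helper name is hypothetical; nothing here comes from the real masscheck tooling.

```python
def observed_hit_rate(corpora, true_rate):
    """Measured hit rate when only some corpora carry reused MSPIKE tags.

    corpora: list of (spam_message_count, was_tagged_with_mspike) pairs.
    true_rate: fraction of spam the rule would hit if every message had
    been tagged at delivery time (hypothetical value).
    Untagged messages count as non-hits, which is the failure mode
    described above.
    """
    total = sum(count for count, _ in corpora)
    hits = sum(count * true_rate for count, tagged in corpora if tagged)
    return hits / total


# Hypothetical masscheck participants: two tagged corpora, one untagged.
corpora = [(100_000, True), (60_000, True), (40_000, False)]
print(observed_hit_rate(corpora, 0.75))
```

With one fifth of the spam untagged, the rule appears to hit noticeably less spam than it really does, and the rescorer would set its score from that understated figure.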


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #33 from Darxus <Da...@ChaosReigns.com> 2011-12-27 21:07:24 UTC ---
I was just trying to figure out why I'm not getting any hits on these, and
noticed the rules all have the T_ prefix, as distributed by sa-update.  Why is
that?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #12 from Darxus <Da...@ChaosReigns.com> 2011-11-30 15:31:55 UTC ---
(In reply to comment #11)
> > So Warren should remove the "nopublish" flag, and [...]
> 
> Unfortunately, no. Since this is targeted at 3.4 things are much more
> complicated.

Well, with rules, any change is going to affect all 3.3.* releases as soon as
there's another rule update, right?

> We really do not want to introduce new DNS lookups in an existing branch, but
> strictly with the next major/minor release only and a clear announcement in the
> release notes.

Wait, you want to *never* include mailspike in the existing 3.3.* releases, and
create a separate rule set for 3.4.*?  Because of increasing the network load
by 1 DNS lookup, for a very useful looking RBL?  That sounds... not good.  How
can you justify that?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #38 from Darxus <Da...@ChaosReigns.com> 2012-01-18 23:39:04 UTC ---
Yeah.  Running trunk, ANYTHING2 hits, and ANYTHING1 does not:

if (version >= 3.400000)
body ANYTHING1 /./ 
endif

if (version >= 3.004000)
body ANYTHING2 /./ 
endif
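
For anyone following along, a minimal sketch of why the first check fails.  It assumes the version is written as the major number followed by the minor and patch numbers zero-padded to three digits each (consistent with "3.004000" and "3.003000" elsewhere in this bug), and that the check is a plain numeric comparison.  The helper names are made up for illustration.

```python
def sa_numeric_version(major: int, minor: int, patch: int) -> str:
    """Format a release number in the assumed major.MMMPPP form.

    Under this assumption 3.4.0 becomes "3.004000" and 3.3.0 becomes
    "3.003000".
    """
    return f"{major}.{minor:03d}{patch:03d}"


def version_check_passes(running: str, required: str) -> bool:
    """Mimic `if (version >= required)` as a numeric comparison."""
    return float(running) >= float(required)


trunk = sa_numeric_version(3, 4, 0)
print(trunk)                                    # 3.004000
print(version_check_passes(trunk, "3.400000"))  # False
print(version_check_passes(trunk, "3.004000"))  # True
```

Read this way, "3.400000" would correspond to a hypothetical 3.400.0 release, so the ANYTHING1 block is skipped on trunk.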


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #30 from Kevin A. McGrail <km...@pccc.com> 2011-12-07 16:06:26 UTC ---
"The policy has been clear for a long time now, we add network rules only at
major version releases"

I am unfortunately unaware of any such policy.  In fact, since I've sort of
spearheaded trying to write the policy for RBLs and their inclusion, I can tell
you definitively that there is no such policy.

However, network rules have caused angst and need to be treated with care on a
case-by-case basis.  That we can all agree on.

In this particular case, I believe it's a good list and worthy of inclusion by
default. 

If releasing it wrapped in a 3.4.0 version check makes people less anxious,
fine.  If people like my idea to have a disable_rules channel and
publish this as-is with a score 0.0 rules channel to disable the rule, I'm cool
with that too.

But I would like to see more focus on tickets that are holding up 3.4.0 and
compromise on those tickets even if we have some 3.4.1 targets, etc.

And overall, our release policy is that we release things for the MAJORITY of
users NOT for the hyper-minority that is processing 30 million emails a day and
can't afford an extra DNS query, etc.

My expectation is that people running the uber-systems have people on the dev
list, etc. and would know how to add the score 0's NOW prior to the updates
channel pushing out the rule.
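
As a sketch of what "adding the score 0's" could look like, the snippet below prints one zero-score override per rule, ready to paste into a site's local configuration.  The rule names are the ones from the score set proposed in this bug; whether they are published under exactly these names (without a T_ prefix) is an assumption, and the helper name is hypothetical.

```python
# Rule names taken from the score set proposed earlier in this bug.
MSPIKE_RULES = [
    "RCVD_IN_MSPIKE_ZBI",
    "RCVD_IN_MSPIKE_L5",
    "RCVD_IN_MSPIKE_L4",
    "RCVD_IN_MSPIKE_L3",
    "RCVD_IN_MSPIKE_L2",
    "RCVD_IN_MSPIKE_H2",
    "RCVD_IN_MSPIKE_H3",
    "RCVD_IN_MSPIKE_H4",
    "RCVD_IN_MSPIKE_H5",
    "RCVD_IN_MSPIKE_BL",
    "RCVD_IN_MSPIKE_WL",
]


def disable_lines(rules):
    """Return one 'score NAME 0' line per rule; a zero score disables a rule."""
    return [f"score {name} 0" for name in rules]


print("\n".join(disable_lines(MSPIKE_RULES)))
```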

Regards,
KAM


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #34 from Kevin A. McGrail <km...@pccc.com> 2011-12-28 22:58:44 UTC ---
(In reply to comment #33)
> I was just trying to figure out why I'm not getting any hits on these, and
> noticed the rules all have the T_ prefix, as distributed by sa-update.  Why is
> that?

Good question.

svn commit -m 'Adding to force_active to see if it resolves MSPIKE being T_
rules - bug 6400'
Sending        rulesrc/10_force_active.cf
Transmitting file data .
Committed revision 1225371.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #40 from Kevin A. McGrail <km...@pccc.com> 2012-01-20 15:13:05 UTC ---
The rules update now includes the 3.004000 check, which is good.  However, we have
scores in BOTH 72_scores.cf and 50_scores.cf.  I don't think that's a good
thing but perhaps it is expected.  

Darxus, with trunk, which scores do you see applied?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

João Gouveia <jo...@anubisnetworks.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joao.gouveia@anubisnetworks
                   |                            |.com

--- Comment #2 from João Gouveia <jo...@anubisnetworks.com> 2011-10-12 00:22:46 UTC ---

If the SpamAssassin project decides to include our IP Reputation (or just the
blacklist) as a default, I would think that computing optimal scores is
important.
Although we would very much like to give something back to the community
(that's why we're "here"), if this is not included in SpamAssassin, then there
isn't much point in going through that hurdle, even though it would still be
useful to many (up to you guys).

Regarding the WL_* lists, this is a fully automated reputation list, and the IP
addresses listed there are typically legit mailers, which of course doesn't
mean they don't send out junk.
A typical example would be yahoo and gmail mailers, which most likely would be
on the WL_* lists, but still do send lots of spam (and that's just getting
worse).
There's an explanation on the website (http://mailspike.org/usage.html), which
reads:

"
wl.mailspike.net: this zone lists all IP addresses with good reputation levels
between H2 and H5. The listed IP addresses may occasional send spam but since
they originate mostly legit traffic, they should not be blocked. This list can
be used as a feature when determining if a message should be considered spam or
not, but it should never be used exclusively for whitelisting purposes. 
"


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.4.0                       |3.4.1
           Severity|critical                    |major


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #32 from Kevin A. McGrail <km...@pccc.com> 2011-12-12 19:56:37 UTC ---
We have consensus for 3.4.0 so let's move forward on scores.

svn commit -m 'Enabling of Mailspike RBL for 3.4+ - bug 6400'
Sending        rules/50_scores.cf
Transmitting file data .
Committed revision 1213397.

Warren, can you please remove the no publish flag on the rules in your sandbox?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #24 from Karsten Bräckelmann <gu...@rudersport.de> 2011-12-05 23:37:44 UTC ---
(In reply to comment #23)
> > What's unsubstantiated is your claim that I and my quote would have been the
> > reason for not yet including Mailspike. There are NO votes by PMC members or
> > committers. That's what matters and changes the game, not a single post on a
> > mailing list.
> 
> Kevin's comment 7 should count as a +1.  And I worry that others aren't voting
> because they believe there was previous consensus to only add DNSBLs at
> releases, which I question.

I'd argue "Let's just make the rules 3.4 only" in comment 13 would imply a -1
by KAM. Imply, neither one is a vote.

If others actually "believe there was previous consensus", they might recall
the same discussions I do.

> > Thus the need for clearly communicating any such changes.
> 
> I agree with that.  I think being unwilling to make the change to existing
> releases takes it too far.

As you just said, these are EXISTING releases.

sa-update can even introduce new plugins, code. Disabled by default in the
client. Because that would take it too far, in any but tightly managed site
internal processes. Similarly, in my not so humble opinion, introducing new
DNSBLs via sa-update to existing releases would take it too far.

Anyway, for the formal part:

+1 for adding Mailspike to 3.4

-1 on adding any new DNSBL via sa-update to existing releases, but strictly
   limit this on actual releases with README and release notes


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #19 from Karsten Bräckelmann <gu...@rudersport.de> 2011-11-30 18:38:11 UTC ---
(In reply to comment #15)
> That uproar was because that blacklist (spam eating monkey) was automatically
> detecting "abuse" and causing many false positives.  (bug 6220)

Nope, that incident was March 2011.

I actually was thinking along the lines of the dev@ threads "Other DNSBL's",
Oct 2009, and "3.3.2 and MSPIKE", Dec 2010, and similar discussions about when
and how to include new DNSBLs.

DNSBLs do cause additional network traffic and load, which in particular for
larger sites is a real concern. Any addition like this definitely needs to be
communicated loud and clear in the release announcement.

Regardless of whether excessive queries might result in FP return values.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #10 from Darxus <Da...@ChaosReigns.com> 2011-11-29 23:52:00 UTC ---
(In reply to comment #7)
> score RCVD_IN_MSPIKE_ZBI     3.5
> score RCVD_IN_MSPIKE_L5      3.1
> score RCVD_IN_MSPIKE_L4      2.5
> score RCVD_IN_MSPIKE_L3      1.9
> score RCVD_IN_MSPIKE_L2      0.7
> score RCVD_IN_MSPIKE_H2      -1.0
> score RCVD_IN_MSPIKE_H3      -1.9
> score RCVD_IN_MSPIKE_H4      -2.0
> score RCVD_IN_MSPIKE_H5      -2.5
> score RCVD_IN_MSPIKE_BL      1.0
> score RCVD_IN_MSPIKE_WL      -1.0

Any chance we could get two more votes in favor of this score set?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #26 from Karsten Bräckelmann <gu...@rudersport.de> 2011-12-06 00:07:48 UTC ---
> "-1 votes are vetos and kill the proposal dead until all vetoers withdraw their
> -1 votes." - http://www.apache.org/foundation/voting.html

That applies to the Votes on Code Modification section and process only. The
section before describes voting in general, including fractions (hey, half a
veto?) and explicitly mentions '-1' to mean 'no'.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #16 from Darxus <Da...@ChaosReigns.com> 2011-11-30 17:28:27 UTC ---
(In reply to comment #14)
> > Wait, you want to *never* include mailspike in the existing 3.3.* releases, and
> > create a separate rule set for 3.4.*?  Because of increasing the network load
> > by 1 DNS lookup, for a very useful looking RBL?  That sounds... not good.
> 
> *shrug*  Sounds good to me.

Forgot to ask:  Do we even have a remote idea how to do this?  Maintain updates
of two separate rule sets?


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #22 from Karsten Bräckelmann <gu...@rudersport.de> 2011-12-02 00:55:10 UTC ---
(In reply to comment #21)
> Thanks for the references.  They happened before I subscribed to the dev list. 

> There was *no* objection to adding tested DNSBLs to existing releases.

Again, I believe there was consensus about generally adding new DNSBLs with
major/minor version updates only, *and* clearly communicating this in the
release announcement -- something that doesn't even exist with daily rule
updates.

There likely have been more than these two threads and discussions about it.
That's two years ago, and I was happy to find these references at all in my
archive.

(See below, one of the referenced nabble archives is horribly incomplete and
doesn't reflect the thread in my archive with the same Subject at all.)


> And this is where you first claimed there had previously been a relevant
> objection:
> 
> Karsten Bräckelmann-2, Oct 19, 2009
> > A micro 3.3.x release probably is not the best opportunity, though. I
> > recall there has been quite some discussion and resentment last time.
> > Even when including new BLs for 3.4, we really need to communicate that
> > added network load better to the user-base.

Oh, a two-year-old reference by me that this has been discussed before? Should I
have dug even deeper through my archive? Sorry, ain't gonna happen.

(And FWIW, that post by me was the other thread, Dec 23, 2010, not the early
one 2009. Which in the nabble archive even is broken and lacks an important
part of the full thread.)

So, you've not even been subscribed to dev@ at that point. OK, no problem. I
have, and I have been involved in discussions about WHEN and HOW to introduce
new DNSBLs. Maybe you just trust my memory on that?

Hey, I welcome anyone to step up and tell me consensus has been otherwise.


> "MSPIKE (previously named ANBREP) has proven consistently in weekly masschecks
> since before the release of 3.3.0" - Warren.  So MSPIKE has been good since
> before 2010-01-27.  Two years.  And from what I can tell, for at least a year,
> it hasn't been added to the default rule set because of your unsubstantiated
> claim that somebody else objected.

I do not question the Mailspike DNSBL in any way. Quite the contrary. However,
good performance is irrelevant to the point of when and how to add it.

What's unsubstantiated is your claim that I and my quote would have been the
reason for not yet including Mailspike. There are NO votes by PMC members or
committers. That's what matters and changes the game, not a single post on a
mailing list.


> Can you start over and tell me why you don't think MSPIKE should be added to
> existing 3.3 releases?  I don't think anybody will mind the increased network
> load.  I think everyone will appreciate the resulting increased accuracy.

While you "don't think anybody will mind the increased network load", I do know
it for a fact. Yes, there *are* systems out there, heavily tweaked for
throughput with assorted DNSBLs, plugins and rules disabled.

Thus the need for clearly communicating any such changes.


> Somewhat related, spamassassin needs an announcement list, for announcing
> changes like this, and releases. (bug 6714)

Oh, yeah, *that* was a fun bug.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #36 from Darxus <Da...@ChaosReigns.com> 2012-01-18 23:18:30 UTC ---
Looks like nobody is getting any hits: 
http://ruleqa.spamassassin.org/?daterev=20120114-r1231463-n&rule=%2Fmspike
And I'm running trunk (3.4.0), so I should be.
Should all the instances of "(version >= 3.400000)" be replaced with "(version
>= 3.004000)"?  "3.004000" is the directory sa-update downloads my rules to. 
And one other example I found seems to match that pattern in 72_active.cf:

if version >= 3.003000
ifplugin Mail::SpamAssassin::Plugin::WLBLEval


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #8 from Darxus <Da...@ChaosReigns.com> 2011-11-28 19:39:46 UTC ---
(In reply to comment #7)
> I'd recommend the entire MSPIKE kit and kaboodle.  I'm running with these
> scores and recommend them:

Really?  The ranks of the _WL rule and its components are kind of bad.  And I'm
concerned that including all the components of the _BL rule will cause the
rescorer to behave suboptimally with our relatively limited corpora.  Huh,
although effectively it looks like it's just two components, _L4 and _L5, since
the other two are empty, so that's probably fine.  But if I did have a vote I
certainly wouldn't vote against using the score set you recommended.  I'm just
curious what your reasoning is.

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0  74.6530   0.0067   1.000    0.99    0.00  T_RCVD_IN_MSPIKE_BL  
      0   0.0251   7.0172   0.004    0.83    0.00  T_RCVD_IN_MSPIKE_WL  
      0   0.8738        0   1.000    0.79    0.00  T_RCVD_IN_MSPIKE_ZBI  

Components of _BL:
      0        0        0   0.500    0.48    0.00  T_RCVD_IN_MSPIKE_L2  
      0        0        0   0.500    0.48    0.00  T_RCVD_IN_MSPIKE_L3  
      0  48.1830   0.0059   1.000    0.98    0.00  T_RCVD_IN_MSPIKE_L4  
      0  25.5962   0.0007   1.000    0.97    0.00  T_RCVD_IN_MSPIKE_L5  

Components of _WL:
      0   0.1684  13.3764   0.012    0.72    0.00  T_RCVD_IN_MSPIKE_H2  
      0   0.0241   6.9795   0.003    0.84    0.00  T_RCVD_IN_MSPIKE_H3  
      0   0.0010   0.0355   0.029    0.50    0.00  T_RCVD_IN_MSPIKE_H4  
      0        0   0.0022   0.000    0.48    0.00  T_RCVD_IN_MSPIKE_H5  

Somebody should create a graph, with number of randomly sampled emails from the
corpora on one axis, and accuracy rate on the other axis.  Get some actual
numbers related to how much email we need for what accuracy.
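
For readers unfamiliar with the S/O column in the table above, here is a small sketch of how it appears to be derived.  It assumes S/O is spam% / (spam% + ham%), with 0.500 shown when a rule has no hits at all.  That assumption reproduces most rows above from their rounded percentages; small differences are possible because the published figures presumably come from unrounded counts.  The function name is made up.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hit rate that comes from spam.

    Assumed formula: spam% / (spam% + ham%).  A rule with no hits in
    either corpus is reported as 0.5 (no evidence either way).
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# (SPAM%, HAM%) pairs copied from the table above.
rows = {
    "T_RCVD_IN_MSPIKE_BL": (74.6530, 0.0067),
    "T_RCVD_IN_MSPIKE_WL": (0.0251, 7.0172),
    "T_RCVD_IN_MSPIKE_L2": (0.0, 0.0),
    "T_RCVD_IN_MSPIKE_H2": (0.1684, 13.3764),
}
for name, (spam, ham) in rows.items():
    print(f"{name:22s} {s_over_o(spam, ham):.3f}")
```

A value near 1.000 means the rule hits almost only spam, which is what a blacklist rule should show; a value near 0.000 is what a whitelist rule should show.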


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #42 from Kevin A. McGrail <km...@pccc.com> 2012-01-20 16:23:12 UTC ---
(In reply to comment #41)
> I'm getting the scores from 72_scores.cf, all 0.001.

Interesting.  So I think I need to move this all from sandbox to rules.  That's
my best guess.

[root@devel trunk]# svn commit -m 'Remove mailspike from sandbox and moved to
rules.  remove mspike rules from force active.  Trying to fix the duplicate
scores in 72_scores.cf and 50_scores.cf.'
Sending        lib/Mail/SpamAssassin/Plugin/Bayes.pm
Adding         rules/20_mailspike.cf
Sending        rulesrc/10_force_active.cf
Deleting       rulesrc/sandbox/wtogami/20_mailspike.cf
Transmitting file data ..
Committed revision 1233974.

Ignore the Bayes plug-in change.  I'm reverting that. 

Let's see if this works.  If anyone has a better idea, I'm all ears.

Regards,
KAM


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #17 from Kevin A. McGrail <km...@pccc.com> 2011-11-30 17:36:06 UTC ---
(In reply to comment #16)
> (In reply to comment #14)
> > > Wait, you want to *never* include mailspike in the existing 3.3.* releases, and
> > > create a separate rule set for 3.4.*?  Because of increasing the network load
> > > by 1 DNS lookup, for a very useful looking RBL?  That sounds... not good.
> > 
> > *shrug*  Sounds good to me.
> 
> Forgot to ask:  Do we even have a remote idea how to do this?  Maintain updates
> of two separate rule sets?

I don't, and would view the version encapsulation as the best solution.  I'm
still working on getting a single ruleset published, as unfunny as that is.


[Bug 6400] GA feedback for Mailspike DNSBL

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Darxus@ChaosReigns.com

--- Comment #1 from Darxus <Da...@ChaosReigns.com> 2011-10-11 23:45:33 UTC ---
You're just looking for optimal scores to recommend to spamassassin users?  I
should be able to do that for you.  I could run the re-scorer, but I think
maybe it would be better to provide you with a score that would maximize
accuracy given the existing scores of the rest of the tests, instead of doing a
full re-scorer run?  Feel free to remind me.  I don't think we have enough
masscheck data to pick accurate scores for your L* and H* rules, but your _WL
and _BL rules shouldn't be a problem.

Looks like these rules moved to
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_mailspike.cf

RuleQA results for them: 
http://ruleqa.spamassassin.org/?daterev=20111008-r1180336-n&rule=%2Fmspike

Warren, you're recommending using RCVD_IN_MSPIKE_BL on spamtips.org, why isn't
it in the default rule set?  In the last masscheck run, it scored better than
RCVD_IN_PSBL.  MSPIKE_WL stuff isn't looking very good though, and for some
reason it's an additional DNS query.
