Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/02/29 20:37:09 UTC

[Bug 5842] New: Should SPF rules be "tflags net" or not?

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5842

           Summary: Should SPF rules be "tflags net" or not?
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Score Generation
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: spamassassin@dostech.ca


Now that bug 5239 enables the SPF plugin to reuse results from Received-SPF
headers, should SPF rules be "tflags net" or not?  The rules can work without
network tests enabled.
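
As a rough illustration of why no network access is needed in that case, here
is a minimal sketch in Python (not the plugin's actual Perl code; the function
name spf_result_from_headers is made up for this example). It only reads the
result keyword at the start of an existing Received-SPF header. A real
implementation would also have to decide which headers come from a trusted
relay, which this sketch ignores.

```python
import re
from email import message_from_string

# Result keywords defined for the Received-SPF header in RFC 4408.
SPF_RESULTS = {"pass", "fail", "softfail", "neutral",
               "none", "temperror", "permerror"}


def spf_result_from_headers(raw_message):
    """Return the SPF result recorded in the first usable Received-SPF
    header, or None if no such header is present.

    Hypothetical helper: it performs no DNS lookups and no trust checks.
    """
    msg = message_from_string(raw_message)
    for value in msg.get_all("Received-SPF", []):
        # The header value starts with the result keyword, e.g. "pass (...)".
        match = re.match(r"\s*([A-Za-z]+)", value)
        if match and match.group(1).lower() in SPF_RESULTS:
            return match.group(1).lower()
    return None
```

If the header is missing, the function returns None, which is the case where
a network lookup would still be the only way to get a result.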

For 3.2 we generated scores without "tflags net".  My r588457.

This was reverted in jm's r596095.

Now I'm not sure what to do.  We need to generate scores for the rules for set0
(so they shouldn't have tflags net) but those scores probably aren't going to be
very accurate since I don't think many of the mass-check contributors have
Received-SPF headers in their mail.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

------- Additional Comments From spamassassin@dostech.ca  2008-03-03 20:37 -------
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)

> > > what do we need to do to record Received-SPF: btw?
> I meant, what do we need to do *in our MTA configurations* ;)

Pick a milter, any milter:

http://svn.perl.org/viewvc/qpsmtpd/trunk/plugins/sender_permitted_from?view=markup

http://www.openspf.org/Implementations

> Now that I think about it though, I don't do any SPF lookups in any of my MTAs;
> I leave that to SpamAssassin.  So maybe we should add support for recording it
> (if there isn't a header already there).  Then we *can* use this header as a
> way to #reuse SPF lookups, *and* we are more standards-compliant (since I think
> that is dictated in the std).  That would help with generating scores for set0
> at least.

"It is RECOMMENDED that SMTP receivers record the result of SPF processing in
the message header."

We could, I guess.  It'd give us a positive indication that the SPF checks were
actually done for domains that don't publish SPF records.

> In the meantime I think it'd be acceptable to make these a special case.
> Generate their scores for set2/3, then simply copy those to set0/1.

1/3 and 0/2 :)

> if we
> don't have the data, we can't trust the GA, but we should be able to trust that
> the S/O ratios will be the same (since it's the same domains and the same
> lookup logic!).

Yeah, we're probably going to have to do some copying.  I'm not sure whose
corpus was responsible for the scores that were generated for 3.2.
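
The copying discussed above could look something like the following sketch.
It is Python rather than anything from the SpamAssassin build tools, the
function name copy_net_scores is hypothetical, and the score values in the
usage note are invented. It assumes the usual four-value "score" line, where
the values are for set0 (no Bayes, no net), set1 (no Bayes, net), set2
(Bayes, no net) and set3 (Bayes, net), and copies set1 to set0 and set3 to
set2.

```python
def copy_net_scores(score_line):
    """Copy the net scoreset values (set1, set3) of a four-value
    'score NAME s0 s1 s2 s3' line into the non-net sets (set0, set2).

    Hypothetical helper for illustration only.
    """
    parts = score_line.split()
    if len(parts) != 6 or parts[0] != "score":
        raise ValueError("expected 'score NAME s0 s1 s2 s3'")
    name = parts[1]
    _set0, set1, _set2, set3 = parts[2:]
    # set0 takes set1's value (both without Bayes);
    # set2 takes set3's value (both with Bayes).
    return " ".join(["score", name, set1, set1, set3, set3])
```

For example, with invented numbers, "score SPF_FAIL 0 1.333 0 1.142" would
become "score SPF_FAIL 1.333 1.333 1.142 1.142".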

> However that doesn't solve the core issue -- "tflags net".  we need to keep the
> network lookup code running with "tflags net". This is necessary for the
> --reuse support, so that it knows to set the rule score to 0 when attempting to
> reuse hits.

Really?  Isn't that what #reuse is supposed to be taking care of?  Why a dual
dependency on both #reuse and "tflags net"?

> However as you note, this means the code doesn't run in set0 at all.
> 
> What I did in the past with the ROUND_THE_WORLD test was to split it into two
> rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the
> former set2. What about doing that with the SPF rules -- adding a duplicate
> ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.?  (better names welcome of
> course.)

I *really* want to avoid different rules like above.  It'll confuse people and
cause those who aren't confused to write metas to combine the two versions.

> We could even move the current set to __SPF_PASS_LOCAL, __SPF_PASS_NET
> and combine them into a new SPF_PASS meta rule.  This would be
> --reuse-compatible, and clearly delineate set0 and set2 rules.

This might complicate score generation further. :(



------- Additional Comments From jm@jmason.org  2008-03-03 01:53 -------
(In reply to comment #2)
> (In reply to comment #1)
> > I say yes, they should be tflags net.  it's exactly analogous to the DNSBL
> > rules; they are network lookups, which can record their results in a header (the
> > reuse data), which can then be reused in set 0 if --reuse is specified.
> > IMO that makes sense for the SPF rules (although the recorded results are
> > put in Received-SPF:).
> 
> I'm not sure I'd consider them exactly analogous.  With a mail system that adds
> Received-SPF headers before SA sees the message, SPF results are used (in all,
> even non-net, scoresets) without SA *ever* doing the actual network checks.

ok, not exactly analogous.  Just partially ;)

> Reuse of the data later in mass-checks, etc, isn't of concern.  I'm not aware of
> --reuse applying to spamassassin or spamd.

true.

> > what do we need to do to record Received-SPF: btw?
> 
> Record Received-SPF?  We don't.  This would be added by a mail system that
> processes the mail before SA sees it.  For example, see the headers of any mail
> I've sent to an ASF mailing list.

I meant, what do we need to do *in our MTA configurations* ;)

Now that I think about it though, I don't do any SPF lookups in any of my MTAs;
I leave that to SpamAssassin.  So maybe we should add support for recording it
(if there isn't a header already there).  Then we *can* use this header as a
way to #reuse SPF lookups, *and* we are more standards-compliant (since I think
that is dictated in the std).  That would help with generating scores for set0
at least.

In the meantime I think it'd be acceptable to make these a special case.
Generate their scores for set2/3, then simply copy those to set0/1.    if we
don't have the data, we can't trust the GA, but we should be able to trust that
the S/O ratios will be the same (since it's the same domains and the same
lookup logic!).

However that doesn't solve the core issue -- "tflags net".  we need to keep the
network lookup code running with "tflags net". This is necessary for the
--reuse support, so that it knows to set the rule score to 0 when attempting to
reuse hits.   However as you note, this means the code doesn't run in set0 at
all.

What I did in the past with the ROUND_THE_WORLD test was to split it into two
rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the
former set2. What about doing that with the SPF rules -- adding a duplicate
ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.?  (better names welcome of
course.) We could even move the current set to __SPF_PASS_LOCAL, __SPF_PASS_NET
and combine them into a new SPF_PASS meta rule.  This would be
--reuse-compatible, and clearly delineate set0 and set2 rules.





------- Additional Comments From jm@jmason.org  2008-03-01 13:33 -------
I say yes, they should be tflags net.  it's exactly analogous to the DNSBL
rules; they are network lookups, which can record their results in a header (the
reuse data), which can then be reused in set 0 if --reuse is specified.
IMO that makes sense for the SPF rules (although the recorded results are
put in Received-SPF:).

what do we need to do to record Received-SPF: btw?



------- Additional Comments From spamassassin@dostech.ca  2008-03-01 21:16 -------
(In reply to comment #1)
> I say yes, they should be tflags net.  it's exactly analogous to the DNSBL
> rules; they are network lookups, which can record their results in a header (the
> reuse data), which can then be reused in set 0 if --reuse is specified.
> IMO that makes sense for the SPF rules (although the recorded results are
> put in Received-SPF:).

I'm not sure I'd consider them exactly analogous.  With a mail system that adds
Received-SPF headers before SA sees the message, SPF results are used (in all,
even non-net, scoresets) without SA *ever* doing the actual network checks.

Reuse of the data later in mass-checks, etc, isn't of concern.  I'm not aware of
--reuse applying to spamassassin or spamd.

> what do we need to do to record Received-SPF: btw?

Record Received-SPF?  We don't.  This would be added by a mail system that
processes the mail before SA sees it.  For example, see the headers of any mail
I've sent to an ASF mailing list.



------- Additional Comments From spamassassin@dostech.ca  2008-03-04 11:09 -------
(In reply to comment #5)
> well, we had no form of reuse for SPF, so the scores were based on whatever
> SPF records were in place at the time of mass-check.  in my opinion, basing
> scores on this is better than on no data at all.

Actually we did have #reuse for the SPF tests.  I was actually questioning where
the set0/2 scores for the SPF rules came from, but of course, they were
generated based on the results of a single set3 mass-check (just like any other
set0/2 rules).  So that's fine (assuming all/most of the corpus submitters have
SPF checks enabled).

> We could modify the code so that it knows that #reuse implies "tflags net",
> sure.  basically I didn't want the effects of #reuse to be too widespread
> in the main Mail::SpamAssassin classes, since it's only supposed to affect
> mass-checks.

I can't think of any reason why #reuse must or should only apply to net rules. 
If you wanted to #reuse bayes rules (which I've been considering for a while) I
think you should be able to.  Same goes for any other rule if it makes sense for
that rule.

> I see your point -- they are pretty messy.  other suggestions?

I don't see what the harm would be of *not* having "tflags net" for the SPF
rules and just having mass-check #reuse whatever we tell it to.



------- Additional Comments From jm@jmason.org  2008-03-04 01:21 -------
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> > if we don't have the data, we can't trust the GA, but we should be able to
> > trust that the S/O ratios will be the same (since it's the same domains and
> > the same lookup logic!).
> 
> Yeah, we're probably going to have to do some copying.  I'm not sure whose
> corpus was responsible for the scores that were generated for 3.2.

well, we had no form of reuse for SPF, so the scores were based on whatever
SPF records were in place at the time of mass-check.  in my opinion, basing
scores on this is better than on no data at all.

> > However that doesn't solve the core issue -- "tflags net".  we need to keep
> > the network lookup code running with "tflags net". This is necessary for
> > the --reuse support, so that it knows to set the rule score to 0 when
> > attempting to reuse hits.
> 
> Really?  Isn't that what #reuse is supposed to be taking care of?  Why a dual
> dependency on both #reuse and "tflags net"?

We could modify the code so that it knows that #reuse implies "tflags net",
sure.  basically I didn't want the effects of #reuse to be too widespread
in the main Mail::SpamAssassin classes, since it's only supposed to affect
mass-checks.

> > What I did in the past with the ROUND_THE_WORLD test was to split it into two
> > rules, ROUND_THE_WORLD and ROUND_THE_WORLD_LOCAL; the latter was set0, the
> > former set2. What about doing that with the SPF rules -- adding a duplicate
> > ruleset for SPF_PASS_LOCAL, SPF_NEUTRAL_LOCAL etc.?  (better names welcome of
> > course.)
> 
> I *really* want to avoid different rules like above.  It'll confuse people and
> cause those who aren't confused to write metas to combine the two versions.

I see your point -- they are pretty messy.  other suggestions?




spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.3.0





