Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2010/03/31 00:03:09 UTC
[Bug 6397] New: Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397
Summary: Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: All
Status: NEW
Severity: minor
Priority: P5
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: antispam@khopis.com
3.3.1 features both of these near-identical rules.
From updates.spamassassin.org at 928644,
header RCVD_HELO_IP_MISMATCH eval:helo_ip_mismatch()
describe RCVD_HELO_IP_MISMATCH Received: HELO and IP do not match, but should
header __TWO_IPS_RCVD Received =~ /[\[\(\s]((?!(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\.)(?:[12]?\d\d?\.){3}[12]?\d\d?)[\[\(\s][^\[\n;,]{0,99}\[(?!\1)\d/
meta TWO_IPS_RCVD __TWO_IPS_RCVD && !ALL_TRUSTED
describe TWO_IPS_RCVD Received: Relay identifies itself as wrong IP
score RCVD_HELO_IP_MISMATCH 1.680 1.186 2.362 2.368
score TWO_IPS_RCVD 0.001 2.764 0.001 2.764
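As a rough illustration of what the __TWO_IPS_RCVD pattern keys on, the sketch below runs the same expression through Python's re module, used here only as a stand-in for Perl's regex engine. The sample Received values, the function name two_ips_rcvd, and the all_trusted flag are assumptions made for this example and are not part of SpamAssassin.

```python
import re

# Pattern copied from the __TWO_IPS_RCVD rule in this report; Python's re
# module is only a stand-in for Perl's regex engine here.
TWO_IPS_RCVD = re.compile(
    r"[\[\(\s]((?!(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\.)"
    r"(?:[12]?\d\d?\.){3}[12]?\d\d?)[\[\(\s][^\[\n;,]{0,99}\[(?!\1)\d"
)

def two_ips_rcvd(received: str, all_trusted: bool = False) -> bool:
    """Mimic: meta TWO_IPS_RCVD __TWO_IPS_RCVD && !ALL_TRUSTED (hypothetical helper)."""
    return bool(TWO_IPS_RCVD.search(received)) and not all_trusted

# Hypothetical Received header values using documentation address ranges.
mismatch = "from 203.0.113.5 (unknown [198.51.100.7]) by mx.example.org"
same_ip = "from 203.0.113.5 (unknown [203.0.113.5]) by mx.example.org"
private = "from 192.168.1.10 (unknown [198.51.100.7]) by mx.example.org"

for header in (mismatch, same_ip, private):
    print(two_ips_rcvd(header), header)
```

Only the first sample should hit: the second announces the same address the relay connected from, and the third announces a private address, which the leading negative lookahead excludes.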
From yesterday's mass-check:
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME
    0  0.3273  0.0008  0.998  0.78  1.68   RCVD_HELO_IP_MISMATCH
    0  0.3277  0.0077  0.977  0.77  (n/a)  __TWO_IPS_RCVD
    0  0.3277  0.0077  0.977  0.77  1.00   TWO_IPS_RCVD
overlap spam: 98% of RCVD_HELO_IP_MISMATCH hits also hit TWO_IPS_RCVD; 98% of
TWO_IPS_RCVD hits also hit RCVD_HELO_IP_MISMATCH
overlap ham: 100% of RCVD_HELO_IP_MISMATCH hits also hit TWO_IPS_RCVD; 10% of
TWO_IPS_RCVD hits also hit RCVD_HELO_IP_MISMATCH
Problem: these rules almost always score together, which sums to a really high
score with network checks enabled (3.950 or 5.132 with Bayes).
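The combined totals quoted above can be reproduced from the two score lines. This is a minimal sketch; the dictionary and function names are made up for the example, and the score-set order (0 = no network and no Bayes, 1 = network only, 2 = Bayes only, 3 = network and Bayes) is the usual SpamAssassin convention.

```python
# Per-score-set values copied from the score lines in this report.
# Assumed set order: 0 = no net/no Bayes, 1 = net only,
# 2 = Bayes only, 3 = net + Bayes.
SCORES = {
    "RCVD_HELO_IP_MISMATCH": (1.680, 1.186, 2.362, 2.368),
    "TWO_IPS_RCVD": (0.001, 2.764, 0.001, 2.764),
}

def combined(scoreset: int) -> float:
    """Total score when both rules hit in the given score set."""
    return round(sum(values[scoreset] for values in SCORES.values()), 3)

for scoreset in range(4):
    print(scoreset, combined(scoreset))
```

Score sets 1 and 3 give the 3.950 and 5.132 figures mentioned above.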
We can remove TWO_IPS_RCVD since it performs slightly worse, or we can remove
RCVD_HELO_IP_MISMATCH since that would enable shrinking the eval code. The 2%
non-overlap equates to 0.0065% of the spam corpus, which hardly seems
worthwhile. The mismatch between their ham hits is unexpected but
negligible anyway, since it amounts to a difference of only 18 hams.
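For reference, the non-overlap figure follows directly from the mass-check numbers. A small sketch of the arithmetic, with variable names chosen only for this example:

```python
# Figures copied from the mass-check output in this report.
spam_hit_pct = 0.3273  # SPAM% for RCVD_HELO_IP_MISMATCH
overlap = 0.98         # share of its spam hits that TWO_IPS_RCVD also hits

# Spam that only one of the two rules catches, as a share of the corpus.
non_overlap_pct = spam_hit_pct * (1 - overlap)
print(f"{non_overlap_pct:.4f}% of the spam corpus")
```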
(I designed TWO_IPS_RCVD not knowing about RCVD_HELO_IP_MISMATCH.)
See also bug 4188, which cites SARE_RECV_SUSP_3, a third-party rule similar to
TWO_IPS_RCVD.
--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397
Kevin A. McGrail <km...@pccc.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |kmcgrail@pccc.com
         Resolution|                            |FIXED
--- Comment #3 from Kevin A. McGrail <km...@pccc.com> 2012-01-18 23:03:45 UTC ---
TWO_IPS_RCVD already appears to be gone, so this was resolved in the past.
[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397
Adam Katz <an...@khopis.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com
--- Comment #2 from Adam Katz <an...@khopis.com> 2010-04-12 18:54:21 EDT ---
(In reply to comment #1)
> (In reply to comment #0)
> > Problem: these rules almost always score together, which sums to a really
> > high score with network checks enabled (3.950 or 5.132 with Bayes).
> >
> > We can remove TWO_IPS_RCVD since it performs slightly worse, [...]
>
> If this is your opinion about your own rule, why not simply remove or
> tflags nopublish it?
The part you truncated contained my logic and asked if it was worthwhile:
> > We can remove TWO_IPS_RCVD since it performs slightly worse, or we can
> > remove RCVD_HELO_IP_MISMATCH since that would enable shrinking the eval
> > code.
If this is not considered worthwhile, I'll remove my rule.
Also note that this is a bug in the GA, which apparently fails to notice
(and/or account for) such overwhelming overlap. That raises the question of
whether such overlap should be specifically sought out rather than left to
inter-generational differences. (This would change this bug's Component
category to RuleQA or Score Generation.)
[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397
--- Comment #1 from Karsten Bräckelmann <gu...@rudersport.de> 2010-04-01 14:24:38 UTC ---
(In reply to comment #0)
> Problem: these rules almost always score together, which sums to a really high
> score with network checks enabled (3.950 or 5.132 with Bayes).
>
> We can remove TWO_IPS_RCVD since it performs slightly worse, [...]
If this is your opinion about your own rule, why not simply remove or tflags
nopublish it?
> (I designed TWO_IPS_RCVD not knowing about RCVD_HELO_IP_MISMATCH.)
$ find rules* -name '*.cf' | xargs grep -l TWO_IPS_RCVD
rulesrc/sandbox/khopesh/20_khop_general.cf
rulesrc/scores/72_scores.cf