Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2010/03/31 00:03:09 UTC

[Bug 6397] New: Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397

           Summary: Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Rules
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: antispam@khopis.com


3.3.1 features both of these near-identical rules.
From updates.spamassassin.org at 928644,

header   RCVD_HELO_IP_MISMATCH  eval:helo_ip_mismatch()
describe RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should

header   __TWO_IPS_RCVD Received =~ /[\[\(\s]((?!(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\.)(?:[12]?\d\d?\.){3}[12]?\d\d?)[\[\(\s][^\[\n;,]{0,99}\[(?!\1)\d/

meta     TWO_IPS_RCVD   __TWO_IPS_RCVD && !ALL_TRUSTED
describe TWO_IPS_RCVD   Received: Relay identifies itself as wrong IP

score RCVD_HELO_IP_MISMATCH  1.680 1.186 2.362 2.368
score TWO_IPS_RCVD           0.001 2.764 0.001 2.764
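
To illustrate what the __TWO_IPS_RCVD pattern quoted above matches, here is a
minimal sketch in Python (SpamAssassin itself is Perl). The pattern is copied
from the rule; the helper name two_ips() and the sample Received headers are
made up for illustration, and the ALL_TRUSTED part of the meta rule is not
modeled.

```python
import re

# Pattern copied from the __TWO_IPS_RCVD rule, split only for readability.
TWO_IPS_RCVD = re.compile(
    r"[\[\(\s]"
    r"((?!(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\.)"
    r"(?:[12]?\d\d?\.){3}[12]?\d\d?)"
    r"[\[\(\s][^\[\n;,]{0,99}\[(?!\1)\d"
)


def two_ips(received: str) -> bool:
    """Hypothetical helper: True if the header value matches the pattern."""
    return TWO_IPS_RCVD.search(received) is not None


# Made-up header values using documentation-range addresses.
mismatch = "from 192.0.2.1 (unknown [203.0.113.5]) by mx.example.org with SMTP"
same_ip = "from 192.0.2.1 (unknown [192.0.2.1]) by mx.example.org with SMTP"
private = "from 192.168.1.5 (unknown [203.0.113.5]) by mx.example.org with SMTP"

print(two_ips(mismatch))  # claimed IP differs from the bracketed IP
print(two_ips(same_ip))   # bracketed IP repeats the claimed IP
print(two_ips(private))   # claimed IP is in an excluded private range
```

Only the first header should match: the second is rejected by the (?!\1)
backreference check and the third by the private-range lookahead.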

From yesterday's mass-check:

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
      0   0.3273   0.0008   0.998    0.78    1.68  RCVD_HELO_IP_MISMATCH  
      0   0.3277   0.0077   0.977    0.77   (n/a)  __TWO_IPS_RCVD  
      0   0.3277   0.0077   0.977    0.77    1.00  TWO_IPS_RCVD  

overlap spam:  98% of RCVD_HELO_IP_MISMATCH hits also hit TWO_IPS_RCVD; 98% of
TWO_IPS_RCVD hits also hit RCVD_HELO_IP_MISMATCH 
overlap  ham: 100% of RCVD_HELO_IP_MISMATCH hits also hit TWO_IPS_RCVD; 10% of 
TWO_IPS_RCVD hits also hit RCVD_HELO_IP_MISMATCH 


Problem:  these rules almost always score together, which sums to a really high
score with network checks enabled (3.950 or 5.132 with Bayes).
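
The two totals can be reproduced from the score lines quoted earlier. A small
sketch, assuming the usual SpamAssassin score-set order (0 = no Bayes/no
network, 1 = network only, 2 = Bayes only, 3 = Bayes + network); the names
SCORES and combined() are made up for illustration.

```python
from decimal import Decimal

# Four-column score lines from the report, one value per score set.
SCORES = {
    "RCVD_HELO_IP_MISMATCH": ["1.680", "1.186", "2.362", "2.368"],
    "TWO_IPS_RCVD": ["0.001", "2.764", "0.001", "2.764"],
}


def combined(score_set: int) -> Decimal:
    """Sum both rules' scores for one score set (both rules hitting)."""
    return sum(Decimal(values[score_set]) for values in SCORES.values())


print("network, no Bayes:", combined(1))
print("network + Bayes:  ", combined(3))
```

Decimal is used instead of float so the sums keep the three decimal places
shown in the score lines.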

We can remove TWO_IPS_RCVD since it performs slightly worse, or we can remove
RCVD_HELO_IP_MISMATCH since that would enable shrinking the eval code.  The 2%
non-overlap equates to 0.0065% of the spam corpus, which hardly seems
worthwhile.  The lack of matching between their ham is unexpected but
negligible anyway since it's a difference of 18 hams.
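
A quick check of the figures above, using the rounded percentages from the
mass-check table and the overlap lines. This is only a sketch: the variable
names are made up, and since the inputs are rounded the results are
approximate.

```python
from decimal import Decimal

# SPAM% and HAM% columns from the mass-check output (percent of each corpus).
SPAM_PCT = Decimal("0.3273")  # RCVD_HELO_IP_MISMATCH
HAM_PCT = {
    "RCVD_HELO_IP_MISMATCH": Decimal("0.0008"),
    "TWO_IPS_RCVD": Decimal("0.0077"),
}

# 98% spam overlap leaves roughly 2% of hits unique to one rule.
NON_OVERLAP = Decimal("0.02")

# Share of the whole spam corpus covered only by the non-overlapping hits.
unique_spam_pct = SPAM_PCT * NON_OVERLAP
print(unique_spam_pct.quantize(Decimal("0.0001")), "% of spam")

# If every RCVD_HELO_IP_MISMATCH ham hit also hits TWO_IPS_RCVD (the 100%
# figure), the reverse overlap is just the ratio of the two HAM% values.
ham_overlap = HAM_PCT["RCVD_HELO_IP_MISMATCH"] / HAM_PCT["TWO_IPS_RCVD"]
print(round(ham_overlap * 100), "% of TWO_IPS_RCVD ham hits")
```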

(I designed TWO_IPS_RCVD not knowing about RCVD_HELO_IP_MISMATCH.)

See also bug 4188, which cites SARE_RECV_SUSP_3, a third-party rule similar to
TWO_IPS_RCVD.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |kmcgrail@pccc.com
         Resolution|                            |FIXED

--- Comment #3 from Kevin A. McGrail <km...@pccc.com> 2012-01-18 23:03:45 UTC ---
TWO_IPS_RCVD already appears to be gone, so this was resolved in the past.


[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #2 from Adam Katz <an...@khopis.com> 2010-04-12 18:54:21 EDT ---
(In reply to comment #1)
> (In reply to comment #0)
> > Problem:  these rules almost always score together, which sums to a really
> > high score with network checks enabled (3.950 or 5.132 with Bayes).
> > 
> > We can remove TWO_IPS_RCVD since it performs slightly worse, [...]
> 
> If this is your opinion about your own rule, why not simply remove or
> tflags nopublish it?

The part you truncated contained my logic and asked if it was worthwhile:
> > We can remove TWO_IPS_RCVD since it performs slightly worse, or we can
> > remove RCVD_HELO_IP_MISMATCH since that would enable shrinking the eval
> > code.

If this is not considered worthwhile, I'll remove my rule.


Also note this is a bug in the GA, which apparently fails to notice (and/or
account for) such overwhelming overlap, bringing up the question of whether it
should be specifically sought out rather than depending on inter-generational
differences.  (This would change this bug's Component category to RuleQA or
Score Generation.)


[Bug 6397] Compare RCVD_HELO_IP_MISMATCH with TWO_IPS_RCVD

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6397

--- Comment #1 from Karsten Bräckelmann <gu...@rudersport.de> 2010-04-01 14:24:38 UTC ---
(In reply to comment #0)
> Problem:  these rules almost always score together, which sums to a really high
> score with network checks enabled (3.950 or 5.132 with Bayes).
> 
> We can remove TWO_IPS_RCVD since it performs slightly worse, [...]

If this is your opinion about your own rule, why not simply remove or tflags
nopublish it?

> (I designed TWO_IPS_RCVD not knowing about RCVD_HELO_IP_MISMATCH.)

$ find rules* -name '*.cf' | xargs grep -l TWO_IPS_RCVD
rulesrc/sandbox/khopesh/20_khop_general.cf
rulesrc/scores/72_scores.cf
