Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/05/02 01:38:05 UTC

[Bug 6580] New: several FP-prone rules are scored too high

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

             Bug #: 6580
           Summary: several FP-prone rules are scored too high
           Product: Spamassassin
           Version: 3.3.1
          Platform: PC
        OS/Version: Windows 7
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: lawrencewilliams@nl.rogers.com
    Classification: Unclassified


Several default SA rules that are likely to FP are scored too high.
Unfortunately, I don't have enough time to find specific examples of e-mails
for each one. I have copied the contents of the rules-override file I use on
our servers. The comments should hopefully give some idea as to where FPs
occur.

# Many mailings discuss how to be removed
# 3.0 is too much for a system with a 5.0 spam threshold
# Bump down to 0.001 for now
score EXCUSE_REMOVE 0.001

# Hits upon e-mails from Windows Live notification
# score to 0.001 for now
score RCVD_ILLEGAL_IP 0.001

# Many valid e-mails come from an offers@ address
# score it 0.001 for now
score FROM_OFFERS 0.001

# A lot of people forward messages from their cell phones with no Subject
# score it 0.001 for now
score MISSING_SUBJECT 0.001

# Hits upon legit mail with caps subjects
# Reduce to 0.001
score SUBJ_ALL_CAPS 0.001

# Hits on valid senders too hard
score TO_NO_BRKTS_DIRECT 0.001

# Hits on e-mail TO 7097370906@message.aliant.net FROM
# 7097223730@message.aliant.net THAT is HTML-only
# This causes lots of FPs
score TO_EQ_FM_DOM_HTML_ONLY 0.001

# Only legit mail seems to set this off
# Spammers are not foolish enough to use a freemail
# Reply-To: that differs from the From:
score FREEMAIL_FORGED_REPLYTO 0.001

Regards,
Lawrence

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #6 from Lawrence <la...@nl.rogers.com> 2011-05-02 06:40:09 UTC ---
Sorry I couldn't post the body, but the rest of the e-mail was sensitive.

I would think that the majority of SA users have basic DNS checks for SPF and
DKIM enabled??

Thanks for working on this. If I had more time, I'd try to help in other ways
and contribute back to the SA project.


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #9 from John Hardin <jh...@impsec.org> ---
(In reply to Mark London from comment #8)
> The FROM_OFFERS rule should be removed.

Removed FROM_OFFERS from static scores file, so that the GA rescorer can score
it based on actual performance against the current corpora.

Revision 1540110


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

John Hardin <jh...@impsec.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhardin@impsec.org

--- Comment #1 from John Hardin <jh...@impsec.org> 2011-05-02 04:13:42 UTC ---
(In reply to comment #0)
> Several default SA rules that are likely to FP are scored too high.
> Unfortunately, I don't have enough time to find specific examples of e-mails
> for each one. I have copied the contents of the rules-override file I use on
> our servers.

I'd suggest that setting their scores to 0.001 is an overreaction.

> # Hits on valid senders too hard
> score TO_NO_BRKTS_DIRECT 0.001
> 
> # Hits on e-mail TO 7097370906@message.aliant.net FROM
> # 7097223730@message.aliant.net THAT is HTML-only
> # This causes lots of FPs
> score TO_EQ_FM_DOM_HTML_ONLY 0.001

I've been putting a lot of work into reducing the FPs on my rules, but rule
updates have been frustratingly sporadic.

Also, the results will only be as good as the masscheck corpus. If anyone can
provide ham that such rules FP on to the corpus, it will greatly help.


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #4 from Lawrence <la...@nl.rogers.com> 2011-05-02 04:51:47 UTC ---
Hi John,

For that particular rule, I found a ham sample that it triggers on

Return-path: <70...@message.aliant.net>
Envelope-to: info@eastcom.ca
Delivery-date: Mon, 21 Feb 2011 20:45:52 -0330
Received: from mawiw-nb02s0.aliant.net ([198.164.4.81])
    by athena.lcwsoft.com with esmtp (Exim 4.69)
    (envelope-from <70...@message.aliant.net>)
    id 1Prfuk-0002Ei-JI
    for info@eastcom.ca; Mon, 21 Feb 2011 20:45:43 -0330
To: 
 7097370906@message.aliant.net
From: 
   "7097536520"  <70...@message.aliant.net>
Received: from unknown (HELO ah02.acds1-sjnb.aliant.icn) ([192.168.139.131])
  by mawiw-nb02m1.aliant.icn with ESMTP; 21 Feb 2011 20:24:49 -0400
x-cds-complex-name: acds1-sjnb
Message-ID: <4d...@ah02.acds1-sjnb.aliant.icn>
Subject: =?iso-8859-1?Q?fax_from_7097536520?=
Date: Mon, 21 Feb 2011 20:15:34 -0400
MIME-Version: 1.0
Content-Type: multipart/mixed;
  boundary="=====>>MDAS 1298333734<<====="
X-cPanel-MailScanner-Information: Please contact the ISP for more information
X-cPanel-MailScanner-ID: 1Prfuk-0002Ei-JI
X-cPanel-MailScanner: Found to be clean
X-cPanel-MailScanner-SpamCheck: not spam, SpamAssassin (not cached,
    score=1.816, required 6, HTML_IMAGE_ONLY_20 0.70, HTML_MESSAGE 0.00,
    MIME_HTML_ONLY 1.10, SPF_PASS -0.00, TO_EQ_FM_DOM_HTML_ONLY 0.00,
    T_REMOTE_IMAGE 0.01)
X-cPanel-MailScanner-SpamScore: s
X-cPanel-MailScanner-From: 7097536520@message.aliant.net
X-Spam-Status: No

It hits SPF_PASS, which may be a better solution than making an exception.

Thoughts?


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #7 from John Hardin <jh...@impsec.org> 2011-05-02 13:36:03 UTC ---
(In reply to comment #6)
> Sorry I couldn't post the body, but the rest of the e-mail was sensitive.

That's okay, an HTML body is easy to fake up.

> I would think that the majority of SA users have basic DNS checks for SPF and
> DKIM enabled??

After some non-midnight-oil thought, I've decided that adding !SPF_HIT and
!DKIM_VALID doesn't really drag in net dependencies. They won't act to suppress
FPs in a non-net install, and will help if network tests are enabled. I will make
a pass through my rules adding them where appropriate.

In this specific case the phone-to-phone avoidance does work, and isn't
net-based.
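The reasoning above can be sketched in Python. The sketch assumes, as the comment says, that network rules such as SPF_PASS and DKIM_VALID never fire when network tests are disabled. It uses SPF_PASS for the SPF guard because that is the rule the ham sample hit. `__BASE_RULE` is a hypothetical name for the rule's existing, unguarded condition. This is a model of the logic only, not SpamAssassin code.

```python
# Hypothetical model of a meta rule guarded by negated network rules.
# Not SpamAssassin code; it only mirrors the boolean logic of a meta like
#   (__BASE_RULE && !SPF_PASS && !DKIM_VALID)

NET_RULES = {"SPF_PASS", "DKIM_VALID"}  # rules that need network access


def hits_for(message_rules, net_enabled):
    """Return the rules that actually fire; net rules never fire when network tests are off."""
    return {r for r in message_rules if net_enabled or r not in NET_RULES}


def guarded_rule(hits):
    """The base condition fires unless an SPF pass or a valid DKIM signature was seen."""
    return "__BASE_RULE" in hits and "SPF_PASS" not in hits and "DKIM_VALID" not in hits


# The fax-to-email ham from comment #4 matched the base condition and passed SPF.
ham = {"__BASE_RULE", "SPF_PASS"}

print(guarded_rule(hits_for(ham, net_enabled=False)))  # True: no network, guard is inert
print(guarded_rule(hits_for(ham, net_enabled=True)))   # False: SPF pass suppresses the FP
```

With network tests off, the negated terms are always true, so the rule behaves exactly as it did before. It gains no net dependency. With network tests on, a pass suppresses the hit.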


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #2 from Lawrence <la...@nl.rogers.com> 2011-05-02 04:34:31 UTC ---
Perhaps, but my limited knowledge of the rules in question made 0.001 the only
safe number I could use without causing too many possible FPs for our
production servers.

Unfortunately, most of these rescores are from 9 months ago, and we have
policies in place to only retain 6 months of e-mail, so I cannot provide ham
samples right now. Some of them should be pretty obvious though. Example:
TO_EQ_FM_DOM_HTML_ONLY, which is hit mostly by messaging services provided by
cell providers AND the message is HTML-only. Our national provider, Bell
Canada, sends messages in this format sometimes.

Regards,
Lawrence


Re: [Bug 6580] several FP-prone rules are scored too high

Posted by Axb <ax...@gmail.com>.
On 11/08/2013 05:13 PM, John Hardin wrote:
> On Fri, 8 Nov 2013, bugzilla-daemon@issues.apache.org wrote:
>
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580
>>
>> Mark London <mr...@psfc.mit.edu> changed:
>>
>>           What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>                 CC|                            |mrl@psfc.mit.edu
>>
>> --- Comment #8 from Mark London <mr...@psfc.mit.edu> ---
>> The FROM_OFFERS rule should be removed.  It's not useful in flagging any spam
>> that I've come across.  I've searched several months of my mail server's log
>> files.  In fact, it's only flagging valid email.  Credit card sites and AARP
>> are being tagged by this rule.
>
> The S/O for this rule is only .75 - .85 in recent masschecks, which is
> probably too high for the 2.5+ points fixed score; the S/O was 1.00 when
> the fixed score was assigned.
>
> Should we remove this from 50_scores.cf and let the GA score it?

+1 for that.


Re: [Bug 6580] several FP-prone rules are scored too high

Posted by John Hardin <jh...@impsec.org>.
On Fri, 8 Nov 2013, bugzilla-daemon@issues.apache.org wrote:

> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580
>
> Mark London <mr...@psfc.mit.edu> changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |mrl@psfc.mit.edu
>
> --- Comment #8 from Mark London <mr...@psfc.mit.edu> ---
> The FROM_OFFERS rule should be removed.  It's not useful in flagging any spam
> that I've come across.  I've searched several months of my mail server's log
> files.  In fact, it's only flagging valid email.  Credit card sites and AARP
> are being tagged by this rule.

The S/O for this rule is only .75 - .85 in recent masschecks, which is 
probably too high for the 2.5+ points fixed score; the S/O was 1.00 when 
the fixed score was assigned.

Should we remove this from 50_scores.cf and let the GA score it?
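Here is a small Python sketch of how to read an S/O figure like the .75 - .85 above. It assumes S/O is computed from each corpus's hit percentage, spam% / (spam% + ham%), so a larger spam corpus does not inflate it. The corpus counts in the example are made up.

```python
def s_o_ratio(spam_hits, spam_total, ham_hits, ham_total):
    """Spam/overall ratio computed from per-corpus hit percentages.

    Assumes S/O = spam% / (spam% + ham%). 1.0 means the rule only ever hit
    spam; 0.5 means it hit spam and ham equally often (as a percentage).
    """
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    if spam_pct + ham_pct == 0:
        raise ValueError("rule never hit either corpus")
    return spam_pct / (spam_pct + ham_pct)


# Made-up figures: 0.8% of spam and 0.2% of ham hit the rule.
print(round(s_o_ratio(spam_hits=800, spam_total=100_000, ham_hits=100, ham_total=50_000), 2))
```

An S/O of 0.8 means one in five of the rule's hits (by percentage) lands on ham. That is a lot of ham for a rule carrying a fixed 2.5+ point score.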


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

Mark London <mr...@psfc.mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mrl@psfc.mit.edu

--- Comment #8 from Mark London <mr...@psfc.mit.edu> ---
The FROM_OFFERS rule should be removed.  It's not useful in flagging any spam
that I've come across.  I've searched several months of my mail server's log
files.  In fact, it's only flagging valid email.  Credit card sites and AARP
are being tagged by this rule.


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

Lawrence <la...@nl.rogers.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lawrencewilliams@nl.rogers.com
         OS/Version|Windows 7                   |Linux


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #3 from John Hardin <jh...@impsec.org> 2011-05-02 04:40:32 UTC ---
(In reply to comment #2)
> Some of them should be pretty obvious though. Example:
> TO_EQ_FM_DOM_HTML_ONLY, which is hit mostly by messaging services provided by
> cell providers AND the message is HTML-only. Our national provider, Bell
> Canada, sends messages in this format sometimes.

Agreed, and thanks for providing that example. I've added FP avoidance to those
rules for "from and to address are both purely numeric", for the phone-to-phone
case. That'll be committed shortly (tomorrow?) after my local testing finishes.
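Below is a rough Python sketch of the "from and to address are both purely numeric" avoidance. It uses the addresses from the ham sample in comment #4. The function names are hypothetical. `to_eq_fm_dom_html_only` is only a simplified stand-in for the real rule, whose definition isn't shown in this thread; it assumes the rule means "same To/From domain plus an HTML-only body".

```python
import re

# Local part made only of ASCII digits, e.g. a phone number used as a mailbox.
NUMERIC_LOCAL = re.compile(r"^\d+@", re.ASCII)


def phone_to_phone(from_addr: str, to_addr: str) -> bool:
    """True when both the From and To local parts are purely numeric."""
    return bool(NUMERIC_LOCAL.match(from_addr.strip())) and bool(
        NUMERIC_LOCAL.match(to_addr.strip())
    )


def to_eq_fm_dom_html_only(from_addr: str, to_addr: str, html_only: bool) -> bool:
    """Simplified stand-in: same domain, HTML-only body, unless phone-to-phone."""
    from_dom = from_addr.strip().rsplit("@", 1)[-1].lower()
    to_dom = to_addr.strip().rsplit("@", 1)[-1].lower()
    return from_dom == to_dom and html_only and not phone_to_phone(from_addr, to_addr)


# The fax-to-email ham: numeric mailboxes on the same carrier domain.
print(to_eq_fm_dom_html_only("7097536520@message.aliant.net",
                             "7097370906@message.aliant.net", True))  # False
print(to_eq_fm_dom_html_only("offers@example.com", "bob@example.com", True))  # True
```

The exemption relies only on header contents, so it needs no network access.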


[Bug 6580] several FP-prone rules are scored too high

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6580

--- Comment #5 from John Hardin <jh...@impsec.org> 2011-05-02 05:50:23 UTC ---
(In reply to comment #4)
> Hi John,
> 
> For that particular rule, I found a ham sample that it triggers on

Thanks.

> It hits SPF_PASS, which may be a better solution than making an exception.
> 
> Thoughts?

...possibly, but those rules are not currently network tests; adding !SPF_PASS
and/or !DKIM_VALID would make them net tests, which I'd like to avoid if possible.
