Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2021/10/05 20:07:27 UTC

[Bug 7933] New: Catch really old mails

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

            Bug ID: 7933
           Summary: Catch really old mails
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Rules
          Assignee: dev@spamassassin.apache.org
          Reporter: jidanni@jidanni.org
  Target Milestone: Undefined

Maybe old dates like: 
Date: Mon, 06 Jul 2020 11:09:58 -0700 (PDT)
should trigger something.

"Hopdelta" says:
Sender                                      Recipient                            Time                 Delta
Start                                       gmail.com                            02:09:58 2020/07/07
[127.0.1.1]                                 smtp.gmail.com                       02:09:58 2020/07/07  0s
PDT                                         mail-wm1-x331.google.com             02:09:58 2020/07/07  0s
mail-wm1-x331.google.com                    shenron.openstreetmap.org            02:10:00 2020/07/07  2s
shenron.openstreetmap.org                   100.96.133.195                       22:53:01 2021/10/05  1s 43m 20h 455d
postfix-inbound-0.inbound.mailchannels.net  pdx1-sub0-mail-mx22.g.dreamhost.com  22:53:02 2021/10/05  1s
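The 455-day jump in the Delta column can be checked from the two timestamps alone. A minimal Python sketch, assuming the Time column is read as naive local times exactly as displayed; the hop list is copied from the output above, and `hop_deltas` is a hypothetical helper, not part of Hopdelta:

```python
from datetime import datetime

# (recipient, time) pairs copied from the Hopdelta output, in display order.
HOPS = [
    ("gmail.com", "02:09:58 2020/07/07"),
    ("smtp.gmail.com", "02:09:58 2020/07/07"),
    ("mail-wm1-x331.google.com", "02:09:58 2020/07/07"),
    ("shenron.openstreetmap.org", "02:10:00 2020/07/07"),
    ("100.96.133.195", "22:53:01 2021/10/05"),
    ("pdx1-sub0-mail-mx22.g.dreamhost.com", "22:53:02 2021/10/05"),
]

# Format of the Time column: "HH:MM:SS YYYY/MM/DD".
FMT = "%H:%M:%S %Y/%m/%d"


def hop_deltas(hops):
    """Return (recipient, delta) pairs, where delta is the time since the previous hop."""
    times = [datetime.strptime(ts, FMT) for _, ts in hops]
    return [(hops[i][0], times[i] - times[i - 1]) for i in range(1, len(hops))]


if __name__ == "__main__":
    for host, delta in hop_deltas(HOPS):
        print(f"{host:40} {delta}")
```

The hop into 100.96.133.195 comes out as 455 days, 20:43:01, the same gap Hopdelta prints as "1s 43m 20h 455d".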

Maybe even a 0.1 score would be good.
No, I don't know what is old enough: one week, one month, one year?
Maybe separate rules for each.

Also, some folks would in fact like to give it a negative score.
Well, if there were a rule for it, then they could.
Otherwise they would need to write a fancy parser...
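The "separate rules for each" idea could be sketched as a set of age tiers. This is a minimal Python sketch; the rule names OLD_MAIL_1W/OLD_MAIL_1M/OLD_MAIL_1Y, the 30-day "month", and `age_rules_hit` are all hypothetical and do not exist in SpamAssassin:

```python
from datetime import timedelta

# Hypothetical rule names and thresholds: one week, one month, one year.
AGE_RULES = [
    ("OLD_MAIL_1W", timedelta(days=7)),
    ("OLD_MAIL_1M", timedelta(days=30)),
    ("OLD_MAIL_1Y", timedelta(days=365)),
]


def age_rules_hit(age):
    """Return the names of every tier whose threshold the message age meets or exceeds."""
    return [name for name, threshold in AGE_RULES if age >= threshold]


if __name__ == "__main__":
    # Roughly the gap seen in the Hopdelta output for the example message.
    print(age_rules_hit(timedelta(days=455)))
```

Because each tier would be its own rule, a site could score them independently, including giving one of them a negative score.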

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7933] Catch really old mails

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

Loren Wilton <lw...@earthlink.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lwilton@earthlink.net

--- Comment #5 from Loren Wilton <lw...@earthlink.net> ---
I would agree that having one or more checks against the latest Received date
would be handy.

I've also seen a few cases where even the latest Received date is bogus (I'm not
an ISP), so the ability to check against the SA system date would be nice. I
trust my own system date.
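A check against the local system clock only needs the Date: header and the current time. A minimal Python sketch using the standard library's `email.utils.parsedate_to_datetime`; `date_age` is a hypothetical helper, and treating a Date: header without a usable zone as UTC is an assumption:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def date_age(date_header, now=None):
    """Age of a message measured from its Date: header to the local system clock.

    `now` defaults to the current UTC time; it is a parameter only so the
    check can be reproduced with a fixed clock.
    """
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # A "-0000" zone (or no zone) yields a naive datetime; assume UTC here.
        sent = sent.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - sent


if __name__ == "__main__":
    age = date_age("Mon, 06 Jul 2020 11:09:58 -0700 (PDT)")
    print(age.days, "days old by the local clock")
```

Since nothing here reads Received headers, a relay with a wrong clock cannot shift the result; only the local clock and the sender's Date: matter.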

[Bug 7933] Catch really old mails

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

Bill Cole <bi...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billcole@apache.org

--- Comment #1 from Bill Cole <bi...@apache.org> ---
(In reply to jidanni from comment #0)
> Maybe old dates like: 
> Date: Mon, 06 Jul 2020 11:09:58 -0700 (PDT)
> should trigger something.

Like the DATE_IN_PAST_* rules? 

Can you provide an example of a message that doesn't hit any of those which you
think should be hit by a new rule?

[Bug 7933] Catch really old mails

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

--- Comment #4 from jidanni@jidanni.org ---
Well fine, perhaps change

    describe DATE_IN_PAST_96_XX  Date: is 96 hours or more before Received: date

to

    describe DATE_IN_PAST_96_XX  Date: is 96 hours or more before EARLIEST Received: date

Anyway, maybe there should be a

    describe DATE_IN_PAST_96_XX2 Date: is 96 hours or more before LATEST Received: date

to really catch them all, even if they aren't spam. Perhaps score it 0.1
for now.

[Bug 7933] Catch really old mails

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

--- Comment #3 from Bill Cole <bi...@apache.org> ---
(In reply to jidanni from comment #2)
> Created attachment 5755 [details]
> Old mail not detected
> 
> Why doesn't this trigger
> 
> header DATE_IN_PAST_96_XX	eval:check_for_shifted_date('undef', '-96')
> describe DATE_IN_PAST_96_XX	Date: is 96 hours or more before Received: date

Good question... 

If I'm reading the code correctly, the reason for this is that there are
plausible and parseable Received headers which have times close to the Date
header. If I strip out the Received headers from 2020, it triggers that rule.

The comments in the code imply that not using the smallest Date/Received
difference resulted in false positives. 
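To illustrate why the rule misses this message, here is a simplified Python model of the behaviour described above: take Date: minus each parseable Received time and keep the difference with the smallest magnitude. It is not a port of the Perl in Plugin/HeaderEval.pm. The function names, the sign convention, and the UTC conversion of the sample times are assumptions; the Hopdelta output appears to be in UTC+8, since its 02:09:58 matches the Date: header's 11:09:58 -0700.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def smallest_date_diff(date_header, received_times):
    """Date: minus each Received time; return the difference with the smallest magnitude.

    Simplified model only, not the actual HeaderEval.pm logic.
    """
    sent = parsedate_to_datetime(date_header)
    diffs = [sent - rcvd for rcvd in received_times]
    return min(diffs, key=abs)


def date_in_past_96(date_header, received_times):
    """True if Date: is 96 hours or more before the closest Received time."""
    return smallest_date_diff(date_header, received_times) <= -timedelta(hours=96)


if __name__ == "__main__":
    utc = timezone.utc
    date_hdr = "Mon, 06 Jul 2020 11:09:58 -0700 (PDT)"
    received = [
        datetime(2020, 7, 6, 18, 9, 58, tzinfo=utc),   # Google hops, 2020
        datetime(2020, 7, 6, 18, 10, 0, tzinfo=utc),   # shenron.openstreetmap.org, 2020
        datetime(2021, 10, 5, 14, 53, 1, tzinfo=utc),  # re-delivery, 2021
        datetime(2021, 10, 5, 14, 53, 2, tzinfo=utc),  # final delivery, 2021
    ]
    print(smallest_date_diff(date_hdr, received))
    print(date_in_past_96(date_hdr, received))
```

With the 2020 Received times present, the 0-second difference wins and the check stays false; with only the 2021 times it becomes true, which matches the observation that stripping the 2020 Received headers makes DATE_IN_PAST_96_XX fire.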

Since DATE_IN_PAST_96_XX and its siblings are fairly strong rules with scores
set by the RuleQA process (current scores for DATE_IN_PAST_96_XX in the four
score sets: 2.600 2.070 1.233 3.405), I do not believe it would be polite to
users to modify the behavior of the underlying eval function at this point. It
currently measures the apparent delay between message composition and initial
submission, not total transit time. RuleQA shows that metric correlating
rather well with spamminess.

It may be useful to add a different test that looks at a more strictly
specified date comparison, such as using the last Received header or the last
"trusted" Received header instead of the current practice of using the smallest
time delta in a parseable Received header relative to the Date header. That
would require a new eval in Plugin/HeaderEval.pm. Whether a measurement of
putative total transit time actually correlates either way to ham or spam is
anyone's guess. In the sample case, it seems likely to me that the message is
not spam, but rather some sort of re-injected mail originally sent to a
discussion list.
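The alternative suggested here, comparing Date: with the most recent Received time instead of the closest one, could be modelled in Python as follows. This is a minimal sketch assuming "last" means the most recently added Received header, with sample times converted to UTC by hand; `latest_received_gap` and `date_in_past_96_latest` are hypothetical names, and a real version would be a new eval in Plugin/HeaderEval.pm.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def latest_received_gap(date_header, received_times):
    """Time from the Date: header to the most recent Received timestamp.

    Measures putative total transit time rather than the
    composition-to-submission delay.
    """
    sent = parsedate_to_datetime(date_header)
    return max(received_times) - sent


def date_in_past_96_latest(date_header, received_times):
    """Hypothetical DATE_IN_PAST_96_XX2-style check against the LATEST Received time."""
    return latest_received_gap(date_header, received_times) >= timedelta(hours=96)


if __name__ == "__main__":
    utc = timezone.utc
    date_hdr = "Mon, 06 Jul 2020 11:09:58 -0700 (PDT)"
    received = [
        datetime(2020, 7, 6, 18, 9, 58, tzinfo=utc),   # first relay, 2020
        datetime(2021, 10, 5, 14, 53, 2, tzinfo=utc),  # final delivery, 2021
    ]
    print(latest_received_gap(date_hdr, received))  # 455 days, 20:43:04
    print(date_in_past_96_latest(date_hdr, received))
```

A "last trusted" variant would pass in only the timestamps from headers added by trusted relays; how SpamAssassin decides which relays are trusted is not modelled here.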

[Bug 7933] Catch really old mails

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7933

jidanni@jidanni.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jidanni@jidanni.org

--- Comment #2 from jidanni@jidanni.org ---
Created attachment 5755
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5755&action=edit
Old mail not detected

Why doesn't this trigger

header DATE_IN_PAST_96_XX       eval:check_for_shifted_date('undef', '-96')
describe DATE_IN_PAST_96_XX     Date: is 96 hours or more before Received: date
