Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/05/29 18:38:50 UTC
[Bug 4927] New: Suggesting a rule to test for double Subject or double From
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
Summary: Suggesting a rule to test for double Subject or double From
Product: Spamassassin
Version: 3.1.2
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P5
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: Mark.Martinec@ijs.si
After noticing a spam message with two Subject header fields that
got through, I tested all our site's mail traffic for a couple of days,
watching for messages with multiple occurrences of header fields
which (according to RFC 2822) may occur at most once.
Here is a suggested new rule:
# Match a header block in which "Subject:" (or "From:") starts two separate lines
header __DOUBLE_SUBJ ALL =~ /^Subject:.*^Subject:/smi
header __DOUBLE_FROM ALL =~ /^From:.*^From:/smi
# Fire if either field is duplicated
meta DOUBLE_SUBJ_OR_FROM __DOUBLE_SUBJ || __DOUBLE_FROM
describe DOUBLE_SUBJ_OR_FROM Contains more than one Subject or From header
score DOUBLE_SUBJ_OR_FROM 2.0
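To make the matching logic of the proposed rules concrete, here is a minimal Python sketch that emulates the two header patterns and the OR metarule outside SpamAssassin. It assumes the ALL pseudo-header is presented as one "Name: value" line per field, with folded continuation lines starting with whitespace; the function and sample names are hypothetical.

```python
import re

# Python equivalents of the two SpamAssassin patterns above (/smi in Perl):
# re.S lets "." cross newlines, re.M makes "^" match at every line start,
# re.I makes the field name match case-insensitively.
DOUBLE_SUBJ = re.compile(r"^Subject:.*^Subject:", re.S | re.M | re.I)
DOUBLE_FROM = re.compile(r"^From:.*^From:", re.S | re.M | re.I)


def double_subj_or_from(all_headers: str) -> bool:
    """Emulate the DOUBLE_SUBJ_OR_FROM metarule on a raw header block."""
    return bool(DOUBLE_SUBJ.search(all_headers)) or bool(DOUBLE_FROM.search(all_headers))


normal = "From: a@example.com\nTo: b@example.com\nSubject: hello\n"
doubled = "From: a@example.com\nSubject: one\nTo: b@example.com\nSubject: two\n"

print(double_subj_or_from(normal))   # False
print(double_subj_or_from(doubled))  # True
```

Because `^` is anchored to line starts, fields such as Resent-From: or X-Subject: do not trigger the patterns.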
Here is the analysis.
First, looking at message counts with multiple header fields:
count multiple header fields present
----- ------------------------------
160 Subject
173 From
122 From AND Subject
333 From OR Subject
37 Subject AND NOT From
52 From AND NOT Subject
47 Message-ID
6 Reply-To
5 Sender
6 To
0 Cc
It seems that multiple Cc, To, Sender and Reply-To fields are infrequent
and probably not worth the trouble.
Multiple Message-ID fields occur more frequently, but according to the attached
diagram they seem to occur in non-spam mail as well(?), so a rule for them
could trigger false positives (though it may be worth re-evaluating this).
The presence of multiple From or multiple Subject header fields seems to be
a very good indication of spam, with not a single false positive (FP) in my
three-day sample. The two messages that did score below 5 were manually
re-checked and turned out to be spam or crippled spam.
A remaining question is how to combine the __DOUBLE_SUBJ and __DOUBLE_FROM
tests: whether to score each one individually, or to score a metarule on some
combination of the two (OR, AND, AND NOT).
Manually checking messages that match 'Subject AND NOT From'
as well as 'From AND NOT Subject' doesn't convince me that these
two would be more useful than each rule individually.
Although 'From AND Subject' hits quite frequently, it has neither
fewer false positives nor a better hit rate. 'From OR Subject' seems to
cover most cases with good quality, which is why I suggest a single
DOUBLE_SUBJ_OR_FROM metarule instead of scoring the individual
DOUBLE_SUBJ / DOUBLE_FROM rules.
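The rows of the table are not independent of each other. The following hypothetical Python sketch derives every row from per-message (double Subject, double From) flags, which shows how the OR, AND and AND NOT combinations discussed above relate; the input format is an assumption made for illustration.

```python
from collections import Counter


def tally(messages):
    """Count messages for each row of the table above.

    `messages` is an iterable of (double_subj, double_from) booleans, one
    pair per message; this input format is a hypothetical illustration.
    """
    counts = Counter()
    for subj, frm in messages:
        counts["Subject"] += subj
        counts["From"] += frm
        counts["From AND Subject"] += subj and frm
        counts["From OR Subject"] += subj or frm
        counts["Subject AND NOT From"] += subj and not frm
        counts["From AND NOT Subject"] += frm and not subj
    return counts


sample = [(True, True), (True, False), (False, True), (False, False)]
counts = tally(sample)
# The OR row is the disjoint union of the AND row and the two AND NOT rows.
assert counts["From OR Subject"] == (
    counts["From AND Subject"]
    + counts["Subject AND NOT From"]
    + counts["From AND NOT Subject"]
)
print(dict(counts))
```

Any set of counts gathered this way satisfies that identity automatically, so it doubles as a sanity check on a published table.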
It would be interesting to see how automatic score assignment evaluates the rule.
As an illustration, two diagrams are attached; the second one
is just a magnified detail of the left-hand side of the first one.
The x-axis shows the distribution (centiles) of all mail that hits each rule,
and the y-axis shows the score SA assigned to each message (SA 3.1.2, with
all the usual network tests enabled, Bayes, Razor, DCC, and the common SARE rules).
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
------- Additional Comments From Mark.Martinec@ijs.si 2006-05-29 16:40 -------
Created an attachment (id=3528)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3528&action=view)
Diagram: spam score vs. distribution for each proposed rule
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
felicity@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From felicity@apache.org 2006-11-22 19:30 -------
This one already existed, for Content-Type:
0.018 0.0205 0.0000 1.000 0.71 1.34 HEADER_COUNT_CTYPE
Here are my test rules:
0.115 0.1333 0.0000 1.000 1.00 1.00 HEADER_COUNT_SUBJECT
0.015 0.0000 0.1064 0.000 0.00 1.00 HEADER_COUNT_FROM
So I'm adding in the SUBJECT rule, but leaving out the FROM rule.
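The columns above appear to be SpamAssassin mass-check hit-frequencies output (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME); that reading is an assumption. Under it, S/O is the fraction of a rule's hits that were spam, spam% / (spam% + ham%), which this small Python sketch recomputes for the three rules. It shows why the FROM rule was left out: in this run all of its hits were in ham.

```python
def spam_over_all(spam_pct: float, ham_pct: float) -> float:
    """S/O ratio: share of a rule's hits (as corpus percentages) that were spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


# (SPAM%, HAM%) as listed above, assuming the hit-frequencies column order.
rules = {
    "HEADER_COUNT_CTYPE": (0.0205, 0.0000),
    "HEADER_COUNT_SUBJECT": (0.1333, 0.0000),
    "HEADER_COUNT_FROM": (0.0000, 0.1064),
}
for name, (spam, ham) in rules.items():
    print(f"{name}: S/O = {spam_over_all(spam, ham):.3f}")
```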
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
vseerror@lehigh.edu changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vseerror@lehigh.edu
------- Additional Comments From vseerror@lehigh.edu 2006-11-02 12:54 -------
In addition, see bug 1239. Spammers are using multiple Subject lines to fool
some mail clients that don't consistently use the first Subject line.
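The same duplication can be observed with Python's standard email module, where get_all() returns every occurrence of a field. This is only an illustration with a made-up message, not SpamAssassin code; a program that reads a single Subject value sees only one of the two.

```python
from email import message_from_string

# A made-up message carrying two Subject fields.
raw = (
    "From: sender@example.com\n"
    "Subject: Harmless looking subject\n"
    "To: user@example.com\n"
    "Subject: The real spam subject\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw)
subjects = msg.get_all("Subject", [])  # every Subject value, in header order
print(subjects)
# ['Harmless looking subject', 'The real spam subject']
print(len(subjects) > 1)  # True: the field is duplicated
```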
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
------- Additional Comments From peter@unlikejam.dreamhost.com 2006-05-29 22:46 -------
Related to this rule: currently, emails that have illegal multiple "From" headers can cause dud
entries in the AWL table. These entries may contain spaces or special characters like <. I tracked
down the related code at one stage but don't have it handy (I think it related to a join with spaces,
where the code assumed there was only one array entry for the requested header). I guess this should be
a separate AWL-related bug.
mysql> select email from awl where email like "% %";
meung@ultraz.dk <meung meung@ultraz.dk
_pickofthemonth2972@awplay.com <pickofthemonth2972_@inpuj.com
pickofthemonth2972@awplay.com
_jason _@example.com
_cedric roman stagy_@umpire.com
_ommonwealth bankcustomersupport_@no.domain.name.given
5 rows in set (0.10 sec)
Sample headers from an offending email:
To: superjoker <Su...@ultraz.dk>
Subject: Meung@ultraz.dk
From: Meung@ultraz.dk <Meung
Content-Type: multipart/mixed; boundary=13642f531fca8abc354f6fadb0b9b531
MIME-Version: 1.0
From: Meung@ultraz.dk
Subject: The hottest end of the year issue
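The following Python sketch is a hypothetical reconstruction of the failure mode described above, not the actual AWL code: if the From values are joined with a space and lowercased, the two From fields of the sample message produce exactly the first row of the mysql output. The `awl_key` guard, which refuses to build a key unless exactly one From field is present, is likewise only an illustrative assumption.

```python
def naive_awl_address(from_values):
    """Hypothetical: join every From value with a space, as if only one were expected."""
    return " ".join(from_values).lower()


def awl_key(from_values):
    """Hypothetical guard: derive an AWL key only from exactly one From field."""
    if len(from_values) != 1:
        return None
    return from_values[0].strip().lower()


# The two From values from the sample headers above.
froms = ["Meung@ultraz.dk <Meung", "Meung@ultraz.dk"]
print(naive_awl_address(froms))  # meung@ultraz.dk <meung meung@ultraz.dk
print(awl_key(froms))            # None
```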
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
------- Additional Comments From Mark.Martinec@ijs.si 2006-05-29 16:41 -------
Created an attachment (id=3529)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3529&action=view)
detail of the previous attachment, magnified low-end scores
Re: [Bug 4927] New: Suggesting a rule to test for double Subject or double From
Posted by Szalay Attila <sa...@balabit.hu>.
Hi All!
On Mon, 2006-05-29 at 16:38 +0000,
bugzilla-daemon@bugzilla.spamassassin.org wrote:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
>
> count multiple header fields present
> ----- ------------------------------
> 160 Subject a
> 173 From b
> 122 From AND Subject c
> 333 From OR Subject d
> 37 Subject AND NOT From e
> 52 From AND NOT Subject f
I know that it's not too important (for the original discussion), but
these numbers must be wrong.
The numbers are related to each other. I put a letter after
each line, and these are the relations:
c + e = a      (122 + 37 = 159, but a = 160)
c + f = b      (122 + 52 = 174, but b = 173)
e + f + c = d  (37 + 52 + 122 = 211, but d = 333)
The third one is why I am writing this letter. I could list some other
relations too (d - a = f, etc.), but those are just restatements
of these three.
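The consistency checks above can be run mechanically. This Python sketch uses the letters assigned to the table rows and adds the inclusion-exclusion identity |Subject OR From| = |Subject| + |From| - |Subject AND From|, which gives the same 211 as e + f + c.

```python
# Counts as reported in the original table, using the letters assigned above.
a, b, c, d, e, f = 160, 173, 122, 333, 37, 52

checks = {
    "c + e = a": (c + e, a),
    "c + f = b": (c + f, b),
    "e + f + c = d": (e + f + c, d),
    "a + b - c = d": (a + b - c, d),  # inclusion-exclusion for From OR Subject
}
for name, (computed, reported) in checks.items():
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name}: computed {computed}, reported {reported} -> {status}")
```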
--
Szalay Attila BalaBit IT Biztonságtechnikai Kft.
tel:(36-1)-371-05-40 1116 Bp. Csurgoi ut 20/b
fax:(36-1)-208-08-75 http://www.balabit.hu/
[Bug 4927] Suggesting a rule to test for double Subject or double From
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4927
felicity@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.2.0