Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/09/27 15:46:30 UTC
[Bug 5110] New: EXTRA_MPART_TYPE fires on valid multipart/related
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5110
Summary: EXTRA_MPART_TYPE fires on valid multipart/related
Product: Spamassassin
Version: 3.1.5
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: nj@leverton.org
EXTRA_MPART_TYPE matches any Content-Type header that has a type=
parameter. However, RFC 2387 specifies that "multipart/related" MUST have a
type parameter giving the MIME type of the root body part.
Around 10% of my hits on this rule are for Content-Type: multipart/related with
a type parameter, such as:
Content-Type: multipart/related; type="text/html"; boundary="..."
or
Content-Type: multipart/related; type="multipart/alternative";
In particular, M$ Exchange sometimes seems to use this format for
including background images. Much as I detest them, I'd prefer they
didn't contribute to a spam score!
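A minimal sketch of the reported false positive, using Python's re module as a stand-in for Perl's regex engine (the constructs involved behave the same in both). The pattern is the one quoted for EXTRA_MPART_TYPE later in this thread, /(?:\s*multipart\/)?.* type=/i; the header values are the two examples above without the "Content-Type:" prefix, with the elided boundary left as "...". The helper name fires() is illustrative only.

```python
import re

# Pattern quoted for EXTRA_MPART_TYPE in this bug; Perl's /i becomes re.IGNORECASE.
ORIGINAL = re.compile(r'(?:\s*multipart\/)?.* type=', re.IGNORECASE)

# The two Content-Type values from the report (RFC 2387 requires type= here).
EXAMPLES = [
    'multipart/related; type="text/html"; boundary="..."',
    'multipart/related; type="multipart/alternative";',
]


def fires(pattern, value):
    """Return True if the pattern matches anywhere in the value, like Perl's =~."""
    return pattern.search(value) is not None


for value in EXAMPLES:
    print(fires(ORIGINAL, value), value)
```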
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
------- Additional Comments From nj@leverton.org 2006-09-27 14:09 -------
I also don't understand why the RE for this rule checks for zero or one
occurrences of 'multipart'. It seems to me that
(?:\s*multipart\/)?
will match any input, since the optional group can always match the empty
string (testing the RE confirms that). Is this perhaps a typo for excluding
multiparts, or something else?
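The point about the optional group can be checked with a short Python sketch (Python's re standing in for Perl; the sample values are made up for illustration). Because (?:\s*multipart\/)? may match zero characters, it succeeds on every input, so the full pattern fires exactly when the literal " type=" appears, just like .* type= on its own.

```python
import re

OPTIONAL_GROUP = re.compile(r'(?:\s*multipart\/)?', re.IGNORECASE)
ORIGINAL = re.compile(r'(?:\s*multipart\/)?.* type=', re.IGNORECASE)
WITHOUT_GROUP = re.compile(r'.* type=', re.IGNORECASE)

# The optional group matches at the start of every input: it consumes
# "multipart/" when present and otherwise matches the empty string.
for value in ['', 'text/plain', 'multipart/mixed; boundary="b"']:
    print(repr(value), '->', repr(OPTIONAL_GROUP.match(value).group(0)))

# Hypothetical header values: dropping the group never changes the outcome.
SAMPLES = [
    'multipart/related; type="text/html"; boundary="x"',
    'text/plain; charset=us-ascii',
    'application/octet-stream; name="a"; type=foo',
    'multipart/mixed;type="x"',
]
for value in SAMPLES:
    assert bool(ORIGINAL.search(value)) == bool(WITHOUT_GROUP.search(value))
```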
May I suggest this rule for a trial, which excludes only multipart/related:
m/^\s*\b(?!multipart\/related\b).* type=/i
or perhaps this looser one, which excludes all multiparts:
m/^\s*\b(?!multipart\/).* type=/i
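The two suggested rules can be compared the same way. This is a sketch in Python's re rather than SpamAssassin itself, and the three sample header values are hypothetical: a compliant multipart/related, another multipart type carrying a type= parameter, and a non-multipart type.

```python
import re

# First suggestion: skip only multipart/related.
EXCLUDE_RELATED = re.compile(r'^\s*\b(?!multipart\/related\b).* type=', re.IGNORECASE)
# Looser suggestion: skip every multipart/* type.
EXCLUDE_MULTIPART = re.compile(r'^\s*\b(?!multipart\/).* type=', re.IGNORECASE)

# Hypothetical header values for comparison.
SAMPLES = {
    'related': 'multipart/related; type="text/html"; boundary="x"',
    'alternative': 'multipart/alternative; type="text/plain"; boundary="y"',
    'text': 'text/plain; charset=us-ascii; type=foo',
}

for name, value in SAMPLES.items():
    print(name,
          bool(EXCLUDE_RELATED.search(value)),
          bool(EXCLUDE_MULTIPART.search(value)))
```

The ^\s*\b prefix pins the lookahead to the start of the media type: with leading whitespace, \s* consumes it, and backtracking \s* to zero characters does not help because \b does not hold between the start of the string and a space.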
------- Additional Comments From felicity@apache.org 2007-01-09 23:32 -------
ok, I've removed the test rules from my sandbox.
Sending felicity/70_other.cf
Transmitting file data .
Committed revision 494754.
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard| |check after perceptron
------- Additional Comments From felicity@apache.org 2006-12-05 07:59 -------
Hrm. Interesting results:
OVERALL% SPAM% HAM% S/O RANK SCORE NAME
22.807 25.2886 0.1464 0.994 1.00 1.00 EXTRA_MPART_TYPE2
26.188 29.0237 0.2928 0.990 0.67 1.00 EXTRA_MPART_TYPE3
19.882 22.0423 0.1464 0.993 0.00 0.85 EXTRA_MPART_TYPE
In order:
- content-type includes /\btype=/i
- content-type starts with multipart/related
- original
So the diff between the original and #2 is probably that the original looks for
" type=" and #2 will accept ";type=", so that's an easy win.
I just threw in #3 because I was curious. Essentially, this means that
for my corpus, at least, multipart/related is very likely to be spam (and all of
the #2 hits also hit #3). Interestingly, the difference between #2 and #3 is
people ignoring RFC 2387 (multipart/related without a type parameter). :(
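The " type=" versus ";type=" difference between the original and #2 can be illustrated with a small Python sketch (Python's re as a stand-in for Perl; the sample values are illustrative). Since the original's leading (?:\s*multipart\/)?.* can match any prefix, only its literal " type=" decides whether it fires, whereas #2's \btype= accepts any word boundary before "type", including a semicolon.

```python
import re

# Because the original's leading (?:\s*multipart\/)?.* can match any prefix,
# only this literal suffix decides whether it fires.
ORIGINAL_SUFFIX = re.compile(r' type=', re.IGNORECASE)
# Rule #2: "type=" preceded by any word boundary.
TYPE2 = re.compile(r'\btype=', re.IGNORECASE)

for value in ['multipart/related; type="text/html"',
              'multipart/related;type="text/html"',
              'x-hypothetical/thing; subtype=y']:
    print(repr(value), bool(ORIGINAL_SUFFIX.search(value)), bool(TYPE2.search(value)))
```

The \b also keeps #2 from firing inside a longer parameter name, such as the hypothetical subtype= in the last sample.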
So part of me wants to just use #3. While doubling the ham rate looks daunting,
in absolute terms it means 4 ham hits instead of 2.
Thoughts?
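The figures above can be cross-checked with a little arithmetic. This sketch assumes the S/O column is SPAM% / (SPAM% + HAM%), the usual hit-frequencies definition, which reproduces all three reported values; the implied ham corpus size is only an estimate that assumes 2 hits correspond to exactly 0.1464% of the ham messages.

```python
# Rows from the hit-frequencies output above: (name, SPAM%, HAM%, reported S/O).
ROWS = [
    ('EXTRA_MPART_TYPE2', 25.2886, 0.1464, 0.994),
    ('EXTRA_MPART_TYPE3', 29.0237, 0.2928, 0.990),
    ('EXTRA_MPART_TYPE', 22.0423, 0.1464, 0.993),
]


def spam_over_overall(spam_pct, ham_pct):
    """S/O computed as SPAM% / (SPAM% + HAM%), normalised for corpus sizes."""
    return spam_pct / (spam_pct + ham_pct)


for name, spam, ham, reported in ROWS:
    print(f'{name}: S/O {spam_over_overall(spam, ham):.3f} (reported {reported:.3f})')

# "2x ham rate": TYPE3's HAM% divided by TYPE2's.
ham_ratio = ROWS[1][2] / ROWS[0][2]
# If 2 ham hits are 0.1464% of the ham corpus, it holds about 2 / 0.001464 messages.
implied_ham_corpus = 2 / (ROWS[0][2] / 100)
print(f'ham ratio {ham_ratio:.1f}, implied ham corpus about {implied_ham_corpus:.0f} messages')
```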
------- Additional Comments From jm@jmason.org 2006-12-05 08:09 -------
can you throw them into svn so that we can see how they do on all corpora?
this url will show it:
http://ruleqa.spamassassin.org/?daterev=last-night&rule=%2FEXTRA_MPART_TYPE
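As a sketch, the ruleqa query can be rebuilt for other dates or rules with Python's urllib.parse. The parameter names are simply copied from the URLs in this thread and nothing else about the ruleqa interface is assumed; ruleqa_url() is a hypothetical helper. The rule value /EXTRA_MPART_TYPE, URL-encoded as %2FEXTRA_MPART_TYPE, appears to select every rule whose name contains EXTRA_MPART_TYPE, judging by the results posted later.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = 'http://ruleqa.spamassassin.org/'


def ruleqa_url(daterev, rule):
    """Build a ruleqa query URL (hypothetical helper; params copied from this thread)."""
    return BASE + '?' + urlencode({'daterev': daterev, 'rule': rule})


print(ruleqa_url('last-night', '/EXTRA_MPART_TYPE'))

# Decoding the longer URL posted later in the thread shows its extra parameters.
later = ('http://ruleqa.spamassassin.org/?daterev=20061228-r490679-n'
         '&rule=%2FEXTRA_MPART_TYPE&srcpath=&s_detail=on&g=Change')
print(parse_qs(urlsplit(later).query, keep_blank_values=True))
```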
felicity@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.2.0
------- Additional Comments From felicity@apache.org 2006-09-27 15:09 -------
OVERALL% SPAM% HAM% S/O RANK SCORE NAME
9.916 11.8320 0.5927 0.952 0.69 0.85 EXTRA_MPART_TYPE
So FWIW, both ham and spam hit on this, and it's basically always
multipart/related, so trying to avoid hitting on that type essentially means
killing the rule. Perhaps there's a way to clean up the ham hits by looking at
both the type and the type="??" value. In general, I think this is just one of
those things that can definitely happen in ham but seems to be a lot more
prevalent in spam. The scores are relatively low, though, IMO, and they'd be
lower if representative mails were included in the next score generation run:
score EXTRA_MPART_TYPE 0.847 0.815 0.733 1.091
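One hypothetical way to look at the type="??" value as well, sketched with Python's standard email package rather than as a SpamAssassin rule: RFC 2387 says the type parameter gives the MIME type of the root body part, which is the first part unless a start parameter names another, so a compliant message should have a root part whose content type equals the declared type. This sketch ignores start=, the function names are illustrative, and whether the check actually separates ham from spam is untested.

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value


def related_type_matches_root(raw):
    """For multipart/related, check the declared type= against the root part.

    Returns None for other content types. The root is taken to be the first
    part; this sketch does not handle the optional start= parameter.
    """
    msg = message_from_string(raw)
    if msg.get_content_type() != 'multipart/related':
        return None
    declared = msg.get_param('type')
    if declared is None or not msg.is_multipart() or not msg.get_payload():
        return False  # type= missing (an RFC 2387 violation) or no parsable parts
    if isinstance(declared, tuple):  # RFC 2231-encoded parameter value
        declared = collapse_rfc2231_value(declared)
    root = msg.get_payload()[0]
    return root.get_content_type() == declared.strip().lower()


def related_message(declared, root_type):
    """Build a tiny two-part multipart/related message (illustrative only)."""
    return (
        f'Content-Type: multipart/related; type="{declared}"; boundary="XX"\n'
        '\n'
        '--XX\n'
        f'Content-Type: {root_type}\n'
        '\n'
        'root part\n'
        '--XX\n'
        'Content-Type: image/gif\n'
        '\n'
        'GIF89a\n'
        '--XX--\n'
    )


print(related_type_matches_root(related_message('text/html', 'text/html')))
print(related_type_matches_root(related_message('text/html', 'text/plain')))
print(related_type_matches_root('Content-Type: text/plain\n\nhello\n'))
```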
BTW: I'm also not sure why the multipart bit is in the RE:
Content-Type =~ /(?:\s*multipart\/)?.* type=/i
Seems strange.
------- Additional Comments From jm@jmason.org 2006-09-27 17:13 -------
hey --
looks to me like the /(?:\s*multipart\/)?/ thing may have started off being just
"multipart/", then the (?:...)? was added later. it's certainly superfluous.
Bug 5110 depends on bug 5270, which changed state.
Bug 5270 Summary: 3.2.0 rescoring
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
What |Old Value |New Value
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|check after perceptron |post-perceptron
Bug 5110 depends on bug 5270, which changed state.
Bug 5270 Summary: 3.2.0 rescoring
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
What |Old Value |New Value
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
Bug 5110 depends on bug 5270, which changed state.
Bug 5270 Summary: 3.2.0 rescoring
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
What |Old Value |New Value
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Additional Comments From jm@jmason.org 2006-12-28 09:53 -------
here's what they get in this run:
http://ruleqa.spamassassin.org/?daterev=20061228-r490679-n&rule=%2FEXTRA_MPART_TYPE&srcpath=&s_detail=on&g=Change
0.00000 17.9181 0.1088 0.994 0.85 0.85 EXTRA_MPART_TYPE
0.00000 2.1400 0.0000 1.000 0.66 0.85 EXTRA_MPART_TYPE bb-doc
0.00000 10.1073 1.5162 0.870 0.79 0.85 EXTRA_MPART_TYPE bb-fredt
0.00000 26.8785 0.0120 1.000 0.99 0.85 EXTRA_MPART_TYPE bb-jm
0.00000 12.0507 0.0000 1.000 0.87 0.85 EXTRA_MPART_TYPE bb-zmi
0.00000 11.0359 0.2279 0.980 0.85 0.85 EXTRA_MPART_TYPE cthielen
0.00000 9.1298 0.0352 0.996 0.91 0.85 EXTRA_MPART_TYPE daf
0.00000 20.4588 0.0133 0.999 0.98 0.85 EXTRA_MPART_TYPE jm
0.00000 19.7498 0.1335 0.993 0.83 0.85 EXTRA_MPART_TYPE theo
0.00000 20.7805 0.2403 0.989 0.77 0.00 T_EXTRA_MPART_TYPE2
0.00000 2.2867 0.0000 1.000 0.67 0.00 T_EXTRA_MPART_TYPE2 bb-doc
0.00000 10.5674 2.2054 0.827 0.73 0.00 T_EXTRA_MPART_TYPE2 bb-fredt
0.00000 34.4643 0.1801 0.995 0.90 0.00 T_EXTRA_MPART_TYPE2 bb-jm
0.00000 13.9682 0.0000 1.000 0.88 0.00 T_EXTRA_MPART_TYPE2 bb-zmi
0.00000 14.6210 0.7443 0.952 0.77 0.00 T_EXTRA_MPART_TYPE2 cthielen
0.00000 12.2418 0.2114 0.983 0.86 0.00 T_EXTRA_MPART_TYPE2 daf
0.00000 27.3740 0.3070 0.989 0.86 0.00 T_EXTRA_MPART_TYPE2 jm
0.00000 22.3068 0.1590 0.993 0.82 0.00 T_EXTRA_MPART_TYPE2 theo
0.00000 26.3436 0.3369 0.987 0.73 0.00 T_EXTRA_MPART_TYPE3
0.00000 6.4333 0.0000 1.000 0.84 0.00 T_EXTRA_MPART_TYPE3 bb-doc
0.00000 20.3347 2.3432 0.897 0.80 0.00 T_EXTRA_MPART_TYPE3 bb-fredt
0.00000 39.6935 0.0760 0.998 0.96 0.00 T_EXTRA_MPART_TYPE3 bb-jm
0.00000 19.3598 0.0000 1.000 0.90 0.00 T_EXTRA_MPART_TYPE3 bb-zmi
0.00000 20.6144 1.3975 0.937 0.72 0.00 T_EXTRA_MPART_TYPE3 cthielen
0.00000 14.8961 0.0470 0.997 0.94 0.00 T_EXTRA_MPART_TYPE3 daf
0.00000 28.7010 0.1601 0.994 0.93 0.00 T_EXTRA_MPART_TYPE3 jm
0.00000 28.3078 0.3630 0.987 0.74 0.00 T_EXTRA_MPART_TYPE3 theo
to be honest, I'm quite worried about the high ham rates in some corpora;
I've been finding (and seeing reports on the users list) that a few new rules
are firing on legit Outlook Express mails (EXTRA_MPART_TYPE allegedly being one).
I think I'd stick with EXTRA_MPART_TYPE, and lower its score.
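To put numbers on the high ham rates in some corpora, this sketch lists, for each of the three rules, the corpora whose HAM% exceeds an arbitrary 0.1% threshold. The values are copied from the ruleqa output above; the threshold and the function name are illustrative assumptions.

```python
# HAM% per corpus, copied from the ruleqa output above (aggregate rows omitted).
HAM_PCT = {
    'EXTRA_MPART_TYPE': {
        'bb-doc': 0.0000, 'bb-fredt': 1.5162, 'bb-jm': 0.0120, 'bb-zmi': 0.0000,
        'cthielen': 0.2279, 'daf': 0.0352, 'jm': 0.0133, 'theo': 0.1335,
    },
    'T_EXTRA_MPART_TYPE2': {
        'bb-doc': 0.0000, 'bb-fredt': 2.2054, 'bb-jm': 0.1801, 'bb-zmi': 0.0000,
        'cthielen': 0.7443, 'daf': 0.2114, 'jm': 0.3070, 'theo': 0.1590,
    },
    'T_EXTRA_MPART_TYPE3': {
        'bb-doc': 0.0000, 'bb-fredt': 2.3432, 'bb-jm': 0.0760, 'bb-zmi': 0.0000,
        'cthielen': 1.3975, 'daf': 0.0470, 'jm': 0.1601, 'theo': 0.3630,
    },
}


def noisy_corpora(rates, threshold):
    """Return the corpora whose HAM% exceeds threshold, worst first."""
    noisy = [name for name, pct in rates.items() if pct > threshold]
    return sorted(noisy, key=lambda name: rates[name], reverse=True)


THRESHOLD = 0.1  # arbitrary cut-off in percent, for illustration only
for rule, rates in HAM_PCT.items():
    print(rule, noisy_corpora(rates, THRESHOLD))
```

With that threshold, EXTRA_MPART_TYPE exceeds it on three corpora, T_EXTRA_MPART_TYPE2 on six and T_EXTRA_MPART_TYPE3 on four.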
Bug 5110 depends on bug 5270, which changed state.
Bug 5270 Summary: 3.2.0 rescoring
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
What |Old Value |New Value
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
Bug 5110 depends on bug 5270, which changed state.
Bug 5270 Summary: 3.2.0 rescoring
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270
What |Old Value |New Value
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |5270
------- Additional Comments From jm@jmason.org 2007-01-05 05:30 -------
I think we can just comment/remove T_EXTRA_MPART_TYPE2, T_EXTRA_MPART_TYPE3
during the perceptron run, and see what score it gives to EXTRA_MPART_TYPE; if
it's judged too high, we may need to manually score it (low) and re-run the
perceptron to take that into account.
marking this as a dependency of 5270 accordingly.
------- Additional Comments From jm@jmason.org 2007-02-24 13:08 -------
It's been given a much higher score by the GA:
score EXTRA_MPART_TYPE 2.501 2.636 1.359 1.404
as per comment 7, let's score that down to a manually-scored 1.0
and see how that affects accuracy. see bug 5270 for more details of that...
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From jm@jmason.org 2007-02-25 08:30 -------
ok -- set to 1.0, see bug 5270 for more details.