Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/14 20:25:17 UTC
[Bug 3271] New: new MIME parser FPs much more often on Mailman admin messages
http://bugzilla.spamassassin.org/show_bug.cgi?id=3271
Summary: new MIME parser FPs much more often on Mailman admin messages
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: major
Priority: P5
Component: Libraries
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: jm@jmason.org
Mailman 2.1.x has a (nifty) new feature. When a list is set to require admin
approval for non-members to post, it'll send the moderation-required message in
this format:
From: list-owner@example.com
Subject: blah post from hywworqlgwq@summitoh.net requires approval
Content-type: multipart/mixed ...
The multipart/mixed parts are:
[text/plain]: a brief "please authorize this posting" msg
[message/rfc822]: the original message
[message/rfc822]: an approval message suitable for use as response
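
For illustration only, the layout above can be reproduced roughly as follows. This is a sketch using Python's standard `email` package, not Mailman's or SpamAssassin's own code; `build_moderation_notice`, the addresses, and the message bodies are hypothetical placeholders. It also shows that a naive recursive walk reaches the HTML inside the held post.

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_moderation_notice(original, approval):
    """Wrap a held post and an approval reply in a multipart/mixed notice."""
    notice = MIMEMultipart("mixed")
    notice["From"] = "list-owner@example.com"
    notice["Subject"] = "blah post from someone@example.net requires approval"
    # Part 1: short human-readable request to the moderator.
    notice.attach(MIMEText("Please authorize this posting.\n", "plain"))
    # Parts 2 and 3: whole messages encapsulated as message/rfc822.
    notice.attach(MIMEMessage(original))
    notice.attach(MIMEMessage(approval))
    return notice


# Hypothetical held post (HTML, as spam often is) and approval reply.
original = MIMEText("<html><body>Buy now</body></html>", "html")
original["From"] = "someone@example.net"
original["Subject"] = "held post"

approval = MIMEText("Reply to this message to approve the post.\n", "plain")
approval["Subject"] = "confirm"

notice = build_moderation_notice(original, approval)

# Content types of the three direct children of the notice.
top_level = [part.get_content_type() for part in notice.get_payload()]
# Content types seen by a full recursive walk, including nested messages.
walked = [part.get_content_type() for part in notice.walk()]

print(top_level)
print(walked)
```

The recursive walk includes the `text/html` part of the encapsulated post, which is the kind of content a scanner would then score.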
This is great for list moderation to fend off spam.
Now, the problem is -- in 2.63 this was fine, and got through no problem,
presumably because of limitations in the 2.6x MIME parser. However, I've *just*
installed 3.0.0svn on my server for dogfooding, and it doesn't handle them at
all well; every single 'requires approval' message that relates to a spam has
been caught as spam.
It looks like the new MIME parser is descending into the message/rfc822 part.
Here's the rules hit from one msg:
X-spam-report:
* 0.2 NO_REAL_NAME From: does not include a real name
* 1.0 HTML_OBFUSCATE_20_30 BODY: Message is 20% to 30% HTML obfuscation
* 0.0 HTML_10_20 BODY: Message is 10% to 20% HTML
* 1.2 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
* 1.0 HTML_BADTAG_40_50 BODY: HTML message is 40% to 50% bad tags
* -0.0 BAYES_44 BODY: Bayesian spam probability is 44 to 50%
* [score: 0.5000]
* 3.0 MPART_ALT_DIFF BODY: HTML and text parts are different
* 1.0 HTML_NONELEMENT_60_70 BODY: 60% to 70% of HTML elements are non-standard
* 0.1 HTML_MESSAGE BODY: HTML included in message
* 0.6 MIME_HTML_NO_CHARSET RAW: Message text in HTML without charset
* 1.0 URIBL_SBL Contains a URL listed in the SBL blocklist
* [URIs: monnsid.com]
* 1.0 LONGWORDS Long string of long words
* -1.8 AWL AWL: From: address is in the auto white-list
X-spam-status: Yes, score=8.2 required=5.0 tests=AWL,BAYES_44,HTML_10_20,
HTML_BADTAG_40_50,HTML_MESSAGE,HTML_NONELEMENT_60_70,
HTML_OBFUSCATE_20_30,LONGWORDS,MIME_HTML_MOSTLY,MIME_HTML_NO_CHARSET,
MPART_ALT_DIFF,NO_REAL_NAME,URIBL_SBL autolearn=no version=3.0.0-r9952
(msg attached)
I've manually whitelisted my list admin addresses to work around this, but I do
get a stack of spam sent directly to those addresses as well, so that's
nonoptimal: it's kludgy and requires user configuration, therefore not good.
IMO it'd be better to just not descend into message/rfc822 parts. After all,
*WE* use message/rfc822 as a "safe" encapsulation format, ourselves!
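
A minimal sketch of that idea, assuming a scanner that collects leaf parts before running body rules. It uses Python's standard `email` package purely for illustration; SpamAssassin's parser is Perl and is structured differently, and `scannable_parts` and the sample message are hypothetical.

```python
from email import message_from_string

# Hypothetical moderation notice: one text part plus one encapsulated message.
RAW = """\
From: list-owner@example.com
Subject: post requires approval
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

Please authorize this posting.
--XX
Content-Type: message/rfc822

From: someone@example.net
Subject: held post
Content-Type: text/html

<html><body>Buy now</body></html>
--XX--
"""


def scannable_parts(msg):
    """Yield leaf parts, treating message/rfc822 attachments as opaque."""
    if msg.get_content_type() == "message/rfc822":
        return  # do not descend into encapsulated messages
    if msg.is_multipart():
        for sub in msg.get_payload():
            yield from scannable_parts(sub)
    else:
        yield msg


msg = message_from_string(RAW)

# Parts a scanner would see with and without descending into message/rfc822.
skipping = [p.get_content_type() for p in scannable_parts(msg)]
descending = [p.get_content_type() for p in msg.walk()]

print(skipping)
print(descending)
```

With the encapsulated message skipped, only the moderator-facing text part is left to scan; the trade-off is that spam forwarded as an attachment would also go unscanned.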