Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/14 20:25:17 UTC
[Bug 3271] New: new MIME parser FPs much more often on Mailman admin messages

http://bugzilla.spamassassin.org/show_bug.cgi?id=3271

           Summary: new MIME parser FPs much more often on Mailman admin
                    messages
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Libraries
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: jm@jmason.org


Mailman 2.1.x has a (nifty) new feature.  When a list is set to require admin
approval for non-members to post, it'll send the moderation-required message in
this format:

From: list-owner@example.com
Subject: blah post from hywworqlgwq@summitoh.net requires approval
Content-type: multipart/mixed ...

The multipart/mixed parts are:

   [text/plain]: a brief "please authorize this posting" msg
   [message/rfc822]: the original message
   [message/rfc822]: an approval message suitable for use as a response
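
For anyone wanting to reproduce that layout, here is a minimal sketch using
Python's standard `email` package (not SpamAssassin's own Perl parser, and not
Mailman's actual code). The function names, addresses, subjects and body text
are all made up for illustration; only the three-part structure comes from the
description above.

```python
from email.message import EmailMessage


def build_moderation_notice() -> EmailMessage:
    """Build a multipart/mixed message shaped like the notice described above.

    All addresses and texts are hypothetical placeholders.
    """
    # Stand-in for the original (held) posting.
    held = EmailMessage()
    held["From"] = "sender@example.net"
    held["Subject"] = "original posting"
    held.set_content("body of the held message")

    # Stand-in for the approval message a moderator could send back.
    approval = EmailMessage()
    approval["From"] = "list-request@example.com"
    approval["Subject"] = "confirm (placeholder token)"
    approval.set_content("reply to this message to approve the posting")

    notice = EmailMessage()
    notice["From"] = "list-owner@example.com"
    notice["Subject"] = "blah post from sender@example.net requires approval"
    notice.set_content("Please authorize this posting.")  # text/plain part
    notice.add_attachment(held)      # attached as message/rfc822
    notice.add_attachment(approval)  # attached as message/rfc822
    return notice


def top_level_types(msg: EmailMessage) -> list:
    """Return the content types of the direct children of a multipart message."""
    return [part.get_content_type() for part in msg.iter_parts()]


if __name__ == "__main__":
    print(top_level_types(build_moderation_notice()))
```

Attaching an `EmailMessage` object via `add_attachment` is what makes the
standard library wrap it as message/rfc822 rather than as plain text.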

This is great for list moderation to fend off spam.

Now, the problem: in 2.63 this was fine and these messages got through without
trouble, presumably because of limitations in the 2.6x MIME parser.  However,
I've *just* installed 3.0.0svn on my server for dogfooding, and it doesn't
handle them at all well; every single 'requires approval' message that relates
to a spam message has itself been caught as spam.

It looks like the new MIME parser is descending into the message/rfc822 part.
Here are the rules hit by one msg:

X-spam-report: 
	*  0.2 NO_REAL_NAME From: does not include a real name
	*  1.0 HTML_OBFUSCATE_20_30 BODY: Message is 20% to 30% HTML obfuscation
	*  0.0 HTML_10_20 BODY: Message is 10% to 20% HTML
	*  1.2 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
	*  1.0 HTML_BADTAG_40_50 BODY: HTML message is 40% to 50% bad tags
	* -0.0 BAYES_44 BODY: Bayesian spam probability is 44 to 50%
	*      [score: 0.5000]
	*  3.0 MPART_ALT_DIFF BODY: HTML and text parts are different
	*  1.0 HTML_NONELEMENT_60_70 BODY: 60% to 70% of HTML elements are non-standard
	*  0.1 HTML_MESSAGE BODY: HTML included in message
	*  0.6 MIME_HTML_NO_CHARSET RAW: Message text in HTML without charset
	*  1.0 URIBL_SBL Contains a URL listed in the SBL blocklist
	*      [URIs: monnsid.com]
	*  1.0 LONGWORDS Long string of long words
	* -1.8 AWL AWL: From: address is in the auto white-list
X-spam-status: Yes, score=8.2 required=5.0 tests=AWL,BAYES_44,HTML_10_20,
	HTML_BADTAG_40_50,HTML_MESSAGE,HTML_NONELEMENT_60_70,
	HTML_OBFUSCATE_20_30,LONGWORDS,MIME_HTML_MOSTLY,MIME_HTML_NO_CHARSET,
	MPART_ALT_DIFF,NO_REAL_NAME,URIBL_SBL autolearn=no version=3.0.0-r9952

(msg attached)

I've manually whitelisted my list admin addresses to work around this, but I do
get a stack of spam sent directly to those addrs as well, so that's nonoptimal:
it's kludgy and requires user configuration, therefore not good.

IMO it'd be better to just not descend into message/rfc822 parts.  After all,
*WE* use message/rfc822 as a "safe" encapsulation format, ourselves!
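
To make the proposal concrete, here is a rough sketch of "don't descend into
message/rfc822", again in Python's standard `email` package rather than in
SpamAssassin's Perl parser. `walk_skipping_rfc822` and the sample message are
hypothetical; the sketch only shows the traversal idea, not how the real
parser would implement it.

```python
import email
from email import policy

# Hypothetical sample: a notice wrapping one held HTML message.
RAW = """\
From: list-owner@example.com
Subject: blah post requires approval
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain

Please authorize this posting.
--BOUND
Content-Type: message/rfc822

From: sender@example.net
Subject: held posting
MIME-Version: 1.0
Content-Type: text/html

<html><body>buy now</body></html>
--BOUND--
"""


def walk_skipping_rfc822(part):
    """Yield parts depth-first, treating message/rfc822 parts as opaque leaves."""
    yield part
    # Only recurse into real multipart/* containers. A message/rfc822 part
    # also holds a list payload (the wrapped message), so checking
    # is_multipart() alone would descend into it.
    if part.get_content_maintype() == "multipart":
        for sub in part.get_payload():
            yield from walk_skipping_rfc822(sub)


msg = email.message_from_string(RAW, policy=policy.default)
full = [p.get_content_type() for p in msg.walk()]
shallow = [p.get_content_type() for p in walk_skipping_rfc822(msg)]
print("walk():              ", full)
print("skipping rfc822:     ", shallow)
```

The stock `walk()` reaches the text/html body of the wrapped message, which is
the analogue of the behaviour seen here; the restricted walker stops at the
message/rfc822 wrapper, so body and HTML rules would never see the held spam.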


