Posted to dev@spamassassin.apache.org by Sidney Markowitz <si...@sidney.com> on 2007/05/01 21:22:24 UTC

Question about EXTRA_MPART_TYPE FP

I just noticed some ham email that was composed using Outlook Express
that triggers EXTRA_MPART_TYPE and I was curious.

Can someone easily check a corpus or the mass check data to see what the
stats are for

 EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE

compared to plain EXTRA_MPART_TYPE?

In other words, how much spam that has EXTRA_MPART_TYPE really is from
Outlook Express? How much of the FPs of EXTRA_MPART_TYPE do or do not
come from Outlook Express?
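
A minimal sketch of the comparison being asked for, in Python. It assumes
each mass-check result can be reduced to a spam/ham flag plus the set of
rule names that hit, and that S/O is the spam hit percentage divided by
the sum of the spam and ham hit percentages. The corpus below is a made-up
toy, not real mass-check data, and the helper names are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

# One record per message: (is_spam, names of the rules that hit).
Record = Tuple[bool, Set[str]]


def plain_empt(hits: Set[str]) -> bool:
    """Plain EXTRA_MPART_TYPE."""
    return "EXTRA_MPART_TYPE" in hits


def oe_empt(hits: Set[str]) -> bool:
    """EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE"""
    return (
        "EXTRA_MPART_TYPE" in hits
        and "__OE_MUA" in hits
        and "__FORGED_OE" not in hits
    )


def so_ratio(corpus: Iterable[Record], rule: Callable[[Set[str]], bool]) -> float:
    """S/O under the assumption stated above: spam% / (spam% + ham%)."""
    spam_total = ham_total = spam_hit = ham_hit = 0
    for is_spam, hits in corpus:
        if is_spam:
            spam_total += 1
            spam_hit += rule(hits)
        else:
            ham_total += 1
            ham_hit += rule(hits)
    spam_pct = spam_hit / spam_total if spam_total else 0.0
    ham_pct = ham_hit / ham_total if ham_total else 0.0
    if spam_pct + ham_pct == 0:
        return float("nan")  # the rule never hit anything
    return spam_pct / (spam_pct + ham_pct)


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    (True, {"EXTRA_MPART_TYPE", "__OE_MUA", "__FORGED_OE"}),
    (True, {"EXTRA_MPART_TYPE"}),
    (True, {"EXTRA_MPART_TYPE", "__OE_MUA"}),
    (True, set()),
    (False, {"EXTRA_MPART_TYPE", "__OE_MUA"}),
    (False, set()),
    (False, set()),
    (False, set()),
]

if __name__ == "__main__":
    print("plain:", so_ratio(toy_corpus, plain_empt))
    print("OE only:", so_ratio(toy_corpus, oe_empt))
```

Comparing the two numbers on real data would show whether genuine Outlook
Express mail accounts for the spam hits, the ham hits, or both.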

Sorry that I have yet to become familiar with manipulating our saved data
to get these kinds of stats easily.

Thanks,

 -- sidney


Re: Question about EXTRA_MPART_TYPE FP

Posted by Sidney Markowitz <si...@sidney.com>.
Sidney Markowitz wrote, On 2/5/07 7:22 AM:
>  EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE

I've come up with some information and some questions about this after
looking at the results of a set of rules T_SIDNEY_* that I put into my
sandbox.

Here is the situation: EXTRA_MPART_TYPE looks for a Content-Type header
that contains both a content-type multipart/ specification and another
"type=" content-type specification. At first glance that seems wrong and
redundant, and a good spam sign given the rule's good S/O ratio and rank.

However, it turns out that RFC 2387 specifies Content-Type
multipart/related as having a type= field that describes the
content-type of its root MIME section. The EXTRA_MPART_TYPE rule will
fire on any RFC-compliant multipart/related message. multipart/related is
the correct MIME type to use for a message that includes components
referenced by
other components. The common example would be an HTML message that
includes images that are not external links.
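
To illustrate the point, here is a minimal Python sketch that approximates
the check as described above (a multipart/ content type plus a "type="
parameter). It is not the actual rule definition, which lives in the
SpamAssassin rules files. The function name and sample header values are
hypothetical.

```python
from email.message import Message


def looks_like_extra_mpart_type(content_type_value: str) -> bool:
    """Approximate the described check: multipart/* with a type= parameter."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    is_multipart = msg.get_content_maintype() == "multipart"
    # get_param returns None when the parameter is absent.
    has_type_param = msg.get_param("type") is not None
    return is_multipart and has_type_param


# A header of the shape RFC 2387 describes for multipart/related.
rfc2387_style = 'multipart/related; type="text/html"; boundary="b1"'
# An ordinary multipart header with no type= parameter.
plain_mixed = 'multipart/mixed; boundary="b2"'

if __name__ == "__main__":
    print(looks_like_extra_mpart_type(rfc2387_style))
    print(looks_like_extra_mpart_type(plain_mixed))
```

Under this approximation, the RFC 2387 style header matches and the plain
multipart/mixed header does not, which is the concern raised here.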

Please look at past discussion on this list and in bug 5224 about
OE_MULTIPART_RELATED. That rule was proposed in that bug and turned out
to have a good S/O ratio. However, it was pointed out that legitimate
emails trigger it and that nothing in the multipart/related header of
Outlook Express mail distinguishes spam from ham. The end result of the
discussion was that
Justin agreed that the rule should not be promoted out of testing.

Which brings me to EXTRA_MPART_TYPE. That rule also matches something
which is legitimate RFC-compliant recommended usage when you want to
send HTML mail with embedded images. If it doesn't get quite as good an
S/O as OE_MULTIPART_RELATED, that is perhaps because a bit more ham does
this without using OE or forged OE. It does mean that you would see a
more accurate, slightly lower S/O for OE_MULTIPART_RELATED if you removed
from its hits anything that also hit FORGED_OE.

So should we really be using the EXTRA_MPART_TYPE rule?

To get a more fine-grained idea about what is going on with it, see the
T_SIDNEY* rules from my sandbox. The names show what they are testing,
with "OE" meaning Outlook Express excluding forged OE, HTML matching
messages with HTML, EMPT meaning messages that match EXTRA_MPART_TYPE,
and an "N" prefix to any of those three being a "Not".

I also just added T_SIDNEY_EMPT_NMPREL, T_SIDNEY_OE_EMPT_NMPREL,
T_SIDNEY_NOE_EMPT_NMPREL to see if there are any EXTRA_MPART_TYPE emails
that are not actually RFC 2387 multipart/related messages. Those rules
haven't been run through mass test yet as I type this.

 -- sidney

Re: Question about EXTRA_MPART_TYPE FP

Posted by Sidney Markowitz <si...@sidney.com>.
Daryl C. W. O'Shea wrote, On 2/5/07 7:48 AM:
> T_SIDNEY_20070501B

You can remove the two rules you set up for me. I've got my own sandbox
to play in now.

Thanks,

 -- sidney

Re: Question about EXTRA_MPART_TYPE FP

Posted by Sidney Markowitz <si...@sidney.com>.
Daryl C. W. O'Shea wrote, On 2/5/07 7:48 AM:
> It's easy to create your own sandbox, too, BTW.  Just checkout:
> 
> https://svn.apache.org/repos/asf/spamassassin/rules/trunk/sandbox
> 
> create a 'sidney' directory in the checkout and place .cf files in it.

I just read up about it on the wiki. I've always concentrated on the
code, but I see we've set things up to make rule hacking very easy. Now
I know how to do it for myself next time.

Thanks!

 -- sidney

Re: Question about EXTRA_MPART_TYPE FP

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Sidney Markowitz wrote:
> Daryl C. W. O'Shea wrote, On 2/5/07 7:34 AM:
>> I checked meta T_SIDNEY_20070501 into my sandbox for you.  Probably 
>> easiest to just check out the results for that tomorrow.
> 
> Ok, if it's that easy, could you also check in one for
> 
>  EXTRA_MPART_TYPE && ( ! __OE_MUA || __FORGED_OE )
> 
> That way we'll have the direct comparison right there.
> 
> Thanks,
> 
>  -- sidney
> 

T_SIDNEY_20070501B


It's easy to create your own sandbox, too, BTW.  Just check out:

https://svn.apache.org/repos/asf/spamassassin/rules/trunk/sandbox

create a 'sidney' directory in the checkout and place .cf files in it.


Daryl

Re: Question about EXTRA_MPART_TYPE FP

Posted by Sidney Markowitz <si...@sidney.com>.
Daryl C. W. O'Shea wrote, On 2/5/07 7:34 AM:
> I checked meta T_SIDNEY_20070501 into my sandbox for you.  Probably 
> easiest to just check out the results for that tomorrow.

Ok, if it's that easy, could you also check in one for

 EXTRA_MPART_TYPE && ( ! __OE_MUA || __FORGED_OE )

That way we'll have the direct comparison right there.
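
As a quick sanity check that the two expressions really do split the
EXTRA_MPART_TYPE hits into two non-overlapping groups, here is a small
Python sketch that walks the full truth table. The function names are
hypothetical. The three booleans stand for whether EXTRA_MPART_TYPE,
__OE_MUA and __FORGED_OE hit on a message.

```python
from itertools import product


def oe_meta(empt: bool, oe_mua: bool, forged_oe: bool) -> bool:
    """EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE"""
    return empt and oe_mua and not forged_oe


def non_oe_meta(empt: bool, oe_mua: bool, forged_oe: bool) -> bool:
    """EXTRA_MPART_TYPE && ( ! __OE_MUA || __FORGED_OE )"""
    return empt and (not oe_mua or forged_oe)


def metas_partition_empt() -> bool:
    """True if exactly one meta fires whenever EXTRA_MPART_TYPE fires,
    and neither fires when it does not."""
    for empt, oe_mua, forged_oe in product([False, True], repeat=3):
        a = oe_meta(empt, oe_mua, forged_oe)
        b = non_oe_meta(empt, oe_mua, forged_oe)
        if a and b:
            return False  # overlap
        if (a or b) != empt:
            return False  # a hit is missed, or a meta fires without EMPT
    return True


if __name__ == "__main__":
    print(metas_partition_empt())
```

Because the second expression is the De Morgan complement of the OE
condition, the hit counts of the two metas should add up to the hit count
of plain EXTRA_MPART_TYPE.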

Thanks,

 -- sidney

Re: Question about EXTRA_MPART_TYPE FP

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Sidney Markowitz wrote:
> I just noticed some ham email that was composed using Outlook Express
> that triggers EXTRA_MPART_TYPE and I was curious.
> 
> Can someone easily check a corpus or the mass check data to see what the
> stats are for
> 
>  EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE
> 
> compared to plain EXTRA_MPART_TYPE?
> 
> In other words, how much spam that has EXTRA_MPART_TYPE really is from
> Outlook Express? How much of the FPs of EXTRA_MPART_TYPE do or do not
> come from Outlook Express?
> 
> Sorry that I have yet to become failiar with manipulating our saved data
> to get these kinds of stats easily.
> 
> Thanks,
> 
>  -- sidney
> 

I checked meta T_SIDNEY_20070501 into my sandbox for you.  Probably 
easiest to just check out the results for that tomorrow.

Daryl