Posted to users@spamassassin.apache.org by ha...@t-online.de on 2006/12/04 07:20:48 UTC
Re: New Rule: OE_MULTIPART_RELATED
>>
>> Hello list,
>>
>> For your consideration:
>>
>> header __MULTIPART_RELATED Content-Type =~ /multipart\/related/
>>
>> meta OE_MULTIPART_RELATED (__OE_MUA && __MULTIPART_RELATED)
>> describe OE_MULTIPART_RELATED Possible image spam forged as from MS Outlook
>>
>> The false positive rate on my corpus is 0.1%. I can't tell you the false
>> negative rate, since I don't keep my spam (only my ham).
>>
>> This rule works very well on the pump-and-dump image spam that has been
>> escaping my SpamAssassin installation for the last few months. Although
>> Outlook Express is capable of generating messages with multipart/related MIME
>> type, it only does that if the user creates an HTML message with inline
>> images. This happens occasionally but rarely (hence the 0.1%). I expect the
>> perceptron might give this rule a score of perhaps +0.5, which is not enough
>> to catch the pump-and-dump image spam by itself, but works well in
>> conjunction with Mail::SpamAssassin::Plugin::ImageInfo.
>>
>> Thoughts on this rule?
>>
>> --Ian Turner
>>
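The logic of the proposed meta rule can be sketched in Python as follows. This is only an illustration, not how SpamAssassin evaluates rules internally. The multipart/related pattern is taken from the rule above; the OE_MUA pattern is an assumed stand-in for the stock __OE_MUA sub-rule, whose real definition is not shown in this thread, and the function name is hypothetical.

```python
import re
from email import message_from_string

# Same pattern as the proposed __MULTIPART_RELATED header test.
MULTIPART_RELATED = re.compile(r"multipart/related")

# ASSUMPTION: a stand-in for the stock __OE_MUA sub-rule. The real
# definition is not shown in this thread and is probably stricter.
OE_MUA = re.compile(r"Outlook Express")


def oe_multipart_related(raw_message: str) -> bool:
    """Return True only when both sub-tests hit, like the meta rule."""
    msg = message_from_string(raw_message)
    content_type = msg.get("Content-Type", "")
    mailer = msg.get("X-Mailer", "")
    hit_related = bool(MULTIPART_RELATED.search(content_type))
    hit_oe = bool(OE_MUA.search(mailer))
    return hit_oe and hit_related
```

A message claiming Outlook Express in X-Mailer but carrying a text/plain Content-Type would not hit, which matches the intent of combining the two sub-rules.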
Hi Ian,
This would trap mail using Outlook "stationery".
I don't really like it, but I do get it in wanted mail.
Generally, I believe that rules scoring valid use of mail (cid addressing, MIME types) should
be avoided, unless you want to block, e.g., mail with images or mail sent from Outlook
generally.
Rather, try to find a subtle difference between the way real Outlook builds the message and the
way the spammers do it, one that would really reveal it is not from Outlook.
Wolfgang Hamann
Re: New Rule: OE_MULTIPART_RELATED
Posted by Ian Turner <ve...@vectro.org>.
Followup on my earlier message...
On Monday 04 December 2006 11:11, Ian Turner wrote:
> Yup. All of the FPs in my corpus are Outlook messages with inline images.
> But it turns out that some of those are also spam; the actual FP rate is
The actual FP rate, after eliminating the apparent false positives that turned
out to be spam (i.e., after corpus cleaning), is 4 messages in 4773, or 0.08%.
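For reference, the arithmetic behind that figure, as a quick check (the variable names are only illustrative):

```python
# Figures taken from the message above.
false_positives = 4
corpus_size = 4773

# False positive rate as a percentage of the cleaned ham corpus.
fp_rate_percent = 100 * false_positives / corpus_size
print(f"{fp_rate_percent:.2f}%")  # prints 0.08%
```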
> That's what I'm trying to do, but this particular spammer seems to have
> been very careful (or really used Outlook to generate the message) -- it
> seems to match exactly, at least at the MIME and RFC822 layers. I'm looking
> into HTML now.
A careful review of HTML messages from this class of spam and HTML messages
from my corpus reveals nothing distinctive about the spam; the message
template was almost certainly generated using Outlook Express itself. The
rule I've already suggested (OE_MULTIPART_RELATED) is the most distinctive
aspect I can find, barring any analysis of the image itself (which I leave to
the ImageInfo or OCR plugins).
Cheers,
--Ian
Re: New Rule: OE_MULTIPART_RELATED
Posted by Ian Turner <ve...@vectro.org>.
On Monday 04 December 2006 16:19, John D. Hardin wrote:
> On Mon, 4 Dec 2006, Ian Turner wrote:
> > When used in combination with, say, DC_GIF_UNO_LARGO,
> > RCVD_IN_NJABL_DUL, and RCVD_IN_BL_SPAMCOP_NET, this rule can help
> > make a more solid prediction.
>
> The perceptron doesn't create meta rules, does it?
Nope, although you can always create them and see what score it gives them.
But what I actually meant when I said "in combination" was not meta rules,
but simply the sum-of-scores rule aggregation that SpamAssassin already does.
Each rule may suggest spam, but most rules are not scored high enough to mark
an e-mail as spam on their own -- several rules must match in order to make a
"spam" decision.
Cheers,
--Ian Turner
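The sum-of-scores aggregation described in the message above can be sketched roughly as follows. This is a toy model, not SpamAssassin's actual code. The 0.5 for OE_MULTIPART_RELATED is the score Ian guessed the perceptron might assign; every other score is invented purely for illustration, and the function names are hypothetical. The threshold of 5.0 is SpamAssassin's default required_score.

```python
# SpamAssassin's default required_score.
REQUIRED_SCORE = 5.0

# ASSUMPTION: illustrative scores only. Apart from the 0.5 guess quoted
# in the thread, these values are made up and are not the real scores.
HYPOTHETICAL_SCORES = {
    "OE_MULTIPART_RELATED": 0.5,
    "DC_GIF_UNO_LARGO": 2.0,
    "RCVD_IN_NJABL_DUL": 1.5,
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
}


def total_score(hits, scores=HYPOTHETICAL_SCORES):
    """Add up the scores of every rule that matched the message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, required=REQUIRED_SCORE):
    """A message is flagged once its summed score reaches the threshold."""
    return total_score(hits) >= required
```

With these made-up numbers, a hit on OE_MULTIPART_RELATED alone stays far below the threshold, while hits on all four rules together cross it.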
Re: New Rule: OE_MULTIPART_RELATED
Posted by "John D. Hardin" <jh...@impsec.org>.
On Mon, 4 Dec 2006, Ian Turner wrote:
> When used in combination with, say, DC_GIF_UNO_LARGO,
> RCVD_IN_NJABL_DUL, and RCVD_IN_BL_SPAMCOP_NET, this rule can help
> make a more solid prediction.
The perceptron doesn't create meta rules, does it?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...to announce there must be no criticism of the President or to
stand by the President right or wrong is not only unpatriotic and
servile, but is morally treasonous to the American public.
-- Theodore Roosevelt, 1918
-----------------------------------------------------------------------
11 days until Bill of Rights day
Re: New Rule: OE_MULTIPART_RELATED
Posted by Ian Turner <ve...@vectro.org>.
On Monday 04 December 2006 01:20, hamann.w@t-online.de wrote:
> This would trap mail using Outlook "stationery".
> I don't really like it, but I do get it in wanted mail.
Yup. All of the FPs in my corpus are Outlook messages with inline images. But
it turns out that some of those are also spam; the actual FP rate is
> Generally, I believe that rules scoring valid use of mail (cid addressing,
> MIME types) should be avoided
Actually, I disagree -- we already have lots of rules that match valid use of
mail, such as CHARSET_FARAWAY, DOMAIN_RATIO, NO_REAL_NAME, TO_EMPTY, and
nearly all of the SUBJ_ rules.
A SpamAssassin rule need not stand alone; it still has predictive power when
used in combination with other rules, as long as it shows a statistically
significant difference in spam/ham hit-rates. We use the perceptron to figure
out exactly /how much/ predictive power it has.
When used in combination with, say, DC_GIF_UNO_LARGO, RCVD_IN_NJABL_DUL, and
RCVD_IN_BL_SPAMCOP_NET, this rule can help make a more solid prediction.
> Rather, try to find a subtle difference between the way real Outlook builds
> the message and the way the spammers do it, one that would really reveal it
> is not from Outlook.
That's what I'm trying to do, but this particular spammer seems to have been
very careful (or really used Outlook to generate the message) -- it seems to
match exactly, at least at the MIME and RFC822 layers. I'm looking into HTML
now.
Cheers,
--Ian