Posted to users@spamassassin.apache.org by Jo Rhett <jr...@lizardarts.com> on 2007/08/14 21:13:13 UTC

PDF rule not matching -- split line content type?

So I've been getting a metric ton of PDF spam.  Investigating the  
rule that is supposed to match this, I see

rawbody __TVD_BODY              /\S{4}/
header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
describe TVD_PDF_FINGER01       Mail matches standard pdf spam fingerprint

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i

The following message appears to match perfectly with this, except  
for perhaps that the content type is spread across two lines?  I  
haven't checked the code, but would this matter?
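
As a rough illustration of why a folded header could matter to a regex in general: the sketch below is plain Python, not SpamAssassin's own matching code, and the helper names are made up for the example. Whether SpamAssassin unfolds the header before testing it is not something this sketch shows. It only demonstrates that `.` does not cross a newline unless the pattern runs in single-line (/s) mode.

```python
import re

# Pattern copied from __TVD_MIME_ATT_AOPDF; helper names are hypothetical.
PATTERN = r"^application/octet-stream.*\.pdf"

# Content-Type value of the attachment, folded as in the sample message.
folded = "application/octet-stream;\n\tname=marketing-jrhett.pdf"

# Unfold the way a header parser might: line break plus leading
# whitespace collapses to a single space.
unfolded = re.sub(r"\r?\n[ \t]+", " ", folded)


def matches(value: str, flags: int = re.IGNORECASE) -> bool:
    """Return True if PATTERN matches value under the given regex flags."""
    return re.search(PATTERN, value, flags) is not None


if __name__ == "__main__":
    print("folded, /i   :", matches(folded))
    print("folded, /is  :", matches(folded, re.IGNORECASE | re.DOTALL))
    print("unfolded, /i :", matches(unfolded))
```

If the value reaches the regex already unfolded, the /i pattern is enough; only a raw folded value would need /s.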

Return-Path: <Yo...@nic.za.net>
Received: from mail.netconsonance.com ([unix socket])
	 by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
	 Tue, 14 Aug 2007 06:27:16 -0700
Received: from [84.21.29.58] ([84.21.29.58])
	by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id l7EDR4UU095951
	for <jr...@lizardarts.com>; Tue, 14 Aug 2007 06:27:08 -0700 (PDT)
	(envelope-from Yohann@nic.za.net)
X-Virus-Scanned: amavisd-new at netconsonance.com
X-Spam-Score: 2.033
X-Spam-Level: **
X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
	tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
	MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58] with
	Microsoft SMTPSVC(6.0.3790.1830);
	Tue, 14 Aug 2007 15:27:01 +0200
Message-ID: <00...@x6of7ca27m39al>
From: "Yohann michels" <Yo...@nic.za.net>
To: jrhett@lizardarts.com
Subject: bill-jrhett
Date: Tue, 14 Aug 2007 15:26:28 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138


------=_NextPart_000_000E_01C7DE87.7C1E24D0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"


------=_NextPart_001_000F_01C7DE87.7C1E24D0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1250


------=_NextPart_001_000F_01C7DE87.7C1E24D0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=windows-1250

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dwindows-1250">
<META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>

------=_NextPart_001_000F_01C7DE87.7C1E24D0--

------=_NextPart_000_000E_01C7DE87.7C1E24D0
Content-Transfer-Encoding: base64
Content-Type: application/octet-stream;
	name=marketing-jrhett.pdf
Content-Disposition: attachment;
	filename=marketing-jrhett.pdf

JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5
L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg

*snip*


-- 
Jo Rhett




Re: PDF rule not matching -- split line content type?

Posted by Theo Van Dinter <fe...@apache.org>.
The rawbody rule finds the text/html part non-empty, so __TVD_BODY is
true and !__TVD_BODY is false, which makes the TVD_PDF_FINGER01 meta false.
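
A minimal sketch of that check in Python, assuming the rawbody test can be approximated as "does any line of a text part contain four consecutive non-whitespace characters". The variable and function names are hypothetical, and the real SpamAssassin body handling is more involved than this.

```python
import re

# Same pattern as the __TVD_BODY rawbody rule.
BODY_RE = re.compile(r"\S{4}")

# Body lines of the two text parts in the sample message.
text_plain_lines = [""]  # the text/plain part is empty
text_html_lines = [
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">',
    "<HTML><HEAD>",
    "<BODY bgColor=3D#ffffff>",
]


def tvd_body(lines):
    """Model __TVD_BODY: true if any line has 4+ non-space characters in a row."""
    return any(BODY_RE.search(line) for line in lines)


if __name__ == "__main__":
    print("text/plain only   :", tvd_body(text_plain_lines))
    print("with text/html    :", tvd_body(text_plain_lines + text_html_lines))
```

With only the empty text/plain part the model reports no body, so the fingerprint could fire; the HTML markup alone is enough to flip it.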

On Tue, Aug 14, 2007 at 10:16:42PM -0700, Jo Rhett wrote:
> Can someone clue me in on why this rule isn't matching?
> 
> Jo Rhett wrote:
> >So I've been getting a metric ton of PDF spam.  Investigating the rule 
> >that is supposed to match this, I see
> >
> >rawbody __TVD_BODY              /\S{4}/
> >header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
> >meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
> >meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
> >__TVD_MIME_ATT && !__TVD_BODY
> >describe TVD_PDF_FINGER01       Mail matches standard pdf spam fingerprint
> >
> >mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> >mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> >/^application\/octet-stream.*\.pdf/i
> >
> >The following message appears to match perfectly with this, except for 
> >perhaps that the content type is spread across two lines?  I haven't 
> >checked the code, but would this matter?
> >
> 
> 
> -- 
> Jo Rhett
> Net Consonance ... net philanthropy, open source and other randomness

-- 
Randomly Selected Tagline:
"Low probability events do happen, which is why people still play the lottery."
                                         - Elizabeth Zwicky at LISA '99

Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Sorry, Theo -- I keep working on the rule without changing the subject 
line.  The rule does have a different problem as I detailed below.

I'm trying to find a useful way to provide a different rule that matches 
what you identified in that one, but I haven't had the time this week to 
work on it.  (and lack a good testing ground, as mentioned in the 
previous message)

Theo Van Dinter wrote:
> FWIW, I responded a few days ago with an explanation of why the rule isn't
> hitting.  It has nothing to do with content-type headers and everything to do
> with the fact that the message body isn't empty, there's HTML content.
> 
> 
> On Thu, Aug 16, 2007 at 10:15:03AM +0100, Justin Mason wrote:
>> Jo --
>>
>> I've checked that in as 'TVD_PDF_FINGER01_JO'.  You can track its progress
>> at http://ruleqa.SpamAssassin.org .
>>
>> by the way -- it's pretty easy for you to test your own rules in your own
>> environment, actually, and I recommend you try it out.  These are the
>> tools we use:
>>
>>   http://wiki.apache.org/spamassassin/MassCheck
>>   http://wiki.apache.org/spamassassin/HitFrequencies
>>
>> They are bundled with SpamAssassin in the "masses" folder.  All the
>> documentation is there on the wiki.
>>
>> --j.
>>
>> Jo Rhett writes:
>>> Since nobody is paying attention, let me clarify.  The current rule is 
>>> wrong:
>>>
>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>> /^application\/octet-stream.*\.pdf/i
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>>> __TVD_MIME_ATT && !__TVD_BODY
>>>
>>> This evaluates to exactly the same as this:
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY
>>>
>>> I believe that the original rule's intent was this:
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
>>>
>>> Can someone with commit rights please test and commit this change?
>>> Thank you.
>>>
>>> Jo Rhett wrote:
>>>> Well actually I think the rule has a bug.  Why OR the two mime types as 
>>>> a new meta, and then require one of the two in the final meta?   The net 
>>>> effect is that if ATT_TP is true it matches, but if ATT_AOPDF is true it 
>>>> will never match.
>>>>
>>>> I believe that the following will work better - work in every situation 
>>>> that it worked before, and not fail when the mime type is octet-stream:
>>>>    meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT && 
>>>> !__TVD_BODY
>>>>
>>>> Would someone kindly evaluate this change and possibly fix the rule?  
>>>> Thanks.
>>>>
>>>> On Aug 14, 2007, at 10:41 PM, Loren Wilton wrote:
>>>>>>> rawbody __TVD_BODY              /\S{4}/
>>>>> true
>>>>>
>>>>>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>>>>> true
>>>>>
>>>>>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>>>> false
>>>>>
>>>>>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>>>>>> /^application\/octet-stream.*\.pdf/i
>>>>> maybe true, maybe not.  I would hope newlines were translated to 
>>>>> spaces by the mimehdr plugin, but maybe they weren't.  Try /is instead 
>>>>> of /i and see if it helps.
>>>>>
>>>>>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
>>>>>>> __TVD_MIME_ATT_AOPDF
>>>>> maybe true
>>>>>
>>>>>>> meta TVD_PDF_FINGER01
>>>>>    __TVD_MIME_CT_MM
>>>>> true
>>>>>    && __TVD_MIME_ATT_TP
>>>>> undefined here, can't say
>>>>>    && __TVD_MIME_ATT
>>>>> maybe true
>>>>>    && !__TVD_BODY
>>>>> true
>>>>>
>>>>> So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if it 
>>>>> will fire, since that is part of an 'and'.  If I assume it to be true 
>>>>> then I'm still not sure because of the multiline possibility in 
>>>>> __TVD_MIME_ATT.
>>>>>
>>>>>        Loren
>>>>>
>>>>>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>>>>>>> fingerprint
>>>>>
>>>>> ----- Original Message ----- From: "Jo Rhett" <jr...@netconsonance.com>
>>>>> To: "SpamAssassin Users" <us...@spamassassin.apache.org>
>>>>> Sent: Tuesday, August 14, 2007 10:16 PM
>>>>> Subject: Re: PDF rule not matching -- split line content type?
>>>>>
>>>>>
>>>>>> Can someone clue me in on why this rule isn't matching?
>>>>>>
>>>>>> Jo Rhett wrote:
>>>>>>> So I've been getting a metric ton of PDF spam.  Investigating the 
>>>>>>> rule that is supposed to match this, I see
>>>>>>>
>>>>>>> rawbody __TVD_BODY              /\S{4}/
>>>>>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>>>>>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
>>>>>>> __TVD_MIME_ATT_AOPDF
>>>>>>> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && 
>>>>>>> __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
>>>>>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>>>>>>> fingerprint
>>>>>>>
>>>>>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>>>>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>>>>>> /^application\/octet-stream.*\.pdf/i
>>>>>>>
>>>>>>> The following message appears to match perfectly with this, except 
>>>>>>> for perhaps that the content type is spread across two lines?  I 
>>>>>>> haven't checked the code, but would this matter?
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Jo Rhett
>>>>>> Net Consonance ... net philanthropy, open source and other randomness
>>>>>
>>>
>>> -- 
>>> Jo Rhett
>>> Net Consonance ... net philanthropy, open source and other randomness
> 


-- 
Jo Rhett
Net Consonance ... net philanthropy, open source and other randomness

Re: PDF rule not matching -- split line content type?

Posted by Theo Van Dinter <fe...@apache.org>.
FWIW, I responded a few days ago with an explanation of why the rule isn't
hitting.  It has nothing to do with content-type headers and everything to do
with the fact that the message body isn't empty, there's HTML content.


On Thu, Aug 16, 2007 at 10:15:03AM +0100, Justin Mason wrote:
> 
> Jo --
> 
> I've checked that in as 'TVD_PDF_FINGER01_JO'.  You can track its progress
> at http://ruleqa.SpamAssassin.org .
> 
> by the way -- it's pretty easy for you to test your own rules in your own
> environment, actually, and I recommend you try it out.  These are the
> tools we use:
> 
>   http://wiki.apache.org/spamassassin/MassCheck
>   http://wiki.apache.org/spamassassin/HitFrequencies
> 
> They are bundled with SpamAssassin in the "masses" folder.  All the
> documentation is there on the wiki.
> 
> --j.
> 
> Jo Rhett writes:
> > Since nobody is paying attention, let me clarify.  The current rule is 
> > wrong:
> > 
> > mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> > mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> > /^application\/octet-stream.*\.pdf/i
> > 
> > meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
> > __TVD_MIME_ATT && !__TVD_BODY
> > 
> > This evaluates to exactly the same as this:
> > 
> > meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY
> > 
> > I believe that the original rule's intent was this:
> > 
> > meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
> > 
> > Can someone with commit rights please test and commit this change?
> > Thank you.
> > 
> > Jo Rhett wrote:
> > > Well actually I think the rule has a bug.  Why OR the two mime types as 
> > > a new meta, and then require one of the two in the final meta?   The net 
> > > effect is that if ATT_TP is true it matches, but if ATT_AOPDF is true it 
> > > will never match.
> > > 
> > > I believe that the following will work better - work in every situation 
> > > that it worked before, and not fail when the mime type is octet-stream:
> > >    meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT && 
> > > !__TVD_BODY
> > > 
> > > Would someone kindly evaluate this change and possibly fix the rule?  
> > > Thanks.
> > > 
> > > On Aug 14, 2007, at 10:41 PM, Loren Wilton wrote:
> > >>>> rawbody __TVD_BODY              /\S{4}/
> > >>
> > >> true
> > >>
> > >>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
> > >>
> > >> true
> > >>
> > >>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> > >>
> > >> false
> > >>
> > >>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> > >>>> /^application\/octet-stream.*\.pdf/i
> > >>
> > >> maybe true, maybe not.  I would hope newlines were translated to 
> > >> spaces by the mimehdr plugin, but maybe they weren't.  Try /is instead 
> > >> of /i and see if it helps.
> > >>
> > >>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
> > >>>> __TVD_MIME_ATT_AOPDF
> > >>
> > >> maybe true
> > >>
> > >>>> meta TVD_PDF_FINGER01
> > >>    __TVD_MIME_CT_MM
> > >> true
> > >>    && __TVD_MIME_ATT_TP
> > >> undefined here, can't say
> > >>    && __TVD_MIME_ATT
> > >> maybe true
> > >>    && !__TVD_BODY
> > >> true
> > >>
> > >> So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if it 
> > >> will fire, since that is part of an 'and'.  If I assume it to be true 
> > >> then I'm still not sure because of the multiline possibility in 
> > >> __TVD_MIME_ATT.
> > >>
> > >>        Loren
> > >>
> > >>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
> > >>>> fingerprint
> > >>
> > >>
> > >> ----- Original Message ----- From: "Jo Rhett" <jr...@netconsonance.com>
> > >> To: "SpamAssassin Users" <us...@spamassassin.apache.org>
> > >> Sent: Tuesday, August 14, 2007 10:16 PM
> > >> Subject: Re: PDF rule not matching -- split line content type?
> > >>
> > >>
> > >>> Can someone clue me in on why this rule isn't matching?
> > >>>
> > >>> Jo Rhett wrote:
> > >>>> So I've been getting a metric ton of PDF spam.  Investigating the 
> > >>>> rule that is supposed to match this, I see
> > >>>>
> > >>>> rawbody __TVD_BODY              /\S{4}/
> > >>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
> > >>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
> > >>>> __TVD_MIME_ATT_AOPDF
> > >>>> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && 
> > >>>> __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
> > >>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
> > >>>> fingerprint
> > >>>>
> > >>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> > >>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> > >>>> /^application\/octet-stream.*\.pdf/i
> > >>>>
> > >>>> The following message appears to match perfectly with this, except 
> > >>>> for perhaps that the content type is spread across two lines?  I 
> > >>>> haven't checked the code, but would this matter?
> > >>>>
> > >>>
> > >>>
> > >>> -- 
> > >>> Jo Rhett
> > >>> Net Consonance ... net philanthropy, open source and other randomness
> > >>
> > >>
> > > 
> > 
> > 
> > -- 
> > Jo Rhett
> > Net Consonance ... net philanthropy, open source and other randomness

-- 
Randomly Selected Tagline:
"Premature optimisation is the root of all evil." - Knuth

Re: PDF rule not matching -- split line content type?

Posted by Chris Lear <ch...@laculine.com>.
Jo Rhett wrote:
> Chris Lear wrote:
>> * Jo Rhett wrote (16/08/07 07:41):
>>> Since nobody is paying attention
>>
>> Or they're asleep. Your messages were at 23:44 and 07:41 here.
>>
>>> , let me clarify.  The current rule is wrong:
>>>
>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>> /^application\/octet-stream.*\.pdf/i
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>>> __TVD_MIME_ATT && !__TVD_BODY
>>>
>>> This evaluates to exactly the same as this:
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>>> !__TVD_BODY
>>>
>>> I believe that the original rule's intent was this:
>>>
>>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
>>
>> I don't think you're right.
>>
>> The rule looks like this to me:
>>
>> meta TVD_PDF_FINGER01
>> __TVD_MIME_CT_MM             # content-type is multi-part mixed
>> && __TVD_MIME_ATT_TP         # and has a text-plain part
>> && __TVD_MIME_ATT            # and has an attachment that is either
>>     __TVD_MIME_ATT_AP    # application/pdf
>>     __TVD_MIME_ATT_AOPDF # or application/octet-stream.*.pdf
>> && !__TVD_BODY               # and has no non-whitespace text content
>>
>> Your rule would seem to match anything with no non-whitespace text 
>> content regardless of whether or not a pdf was attached.
> 
> I did a full analysis of why the rule is broken, line by line in the 
> message you replied to.  But I'll do it again.
> 
> (dropping "__TVT_MIME_" for ease of typing)
> 
> ATT is a meta of ATT_AP *or* ATT_AOPDF.
> 
> But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 
> really it will only work if ATT_TP matches.  If ATT_AOPDF matches then 
> it won't match.
> 
> Now go back up and read the text I quoted at the top.  Because if this is 
> the author's intent then you can shorten the rule, but I somehow don't 
> think so.

I read it. I think you got it wrong. The author's intent seems to accord 
with my analysis.

> 
>> I was looking into this very rule about 3 days ago, because of false 
>> positives (client mailing out auto-generated pdfs which are being 
>> rejected by messagelabs), and I found that spamassassin -D told me all 
>> I needed to know about why some e-mail hit this rule and some didn't.
> 
> Perhaps.  But maybe you have difficulty reading the line by line 
> analysis I posted below, hm?  I have ~200 messages here that are 100% 
> spam that would match the fixed rule, which seems to be the author's intent.
> 

As I say, I read it. It was clear from the start that you didn't 
understand why the rule wasn't firing (and TVD, the rule author, 
explained that). It also appeared to me that your rewrite of the rule 
was the result of a misreading of the logic (or a misunderstanding of 
multipart mime). I thought I could elucidate. I stand by my comments, 
except that I misread your rewrite and thought it was looking only for 
text/plain, whereas it's looking only for pdf mime parts. Theo has 
explained it all now anyway, so there's no more to add.

But forgive me. I should have known better than to step into a Jo Rhett 
thread. I'll try not to do it again.

Chris

Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Theo Van Dinter wrote:
> On Thu, Aug 16, 2007 at 09:47:06AM -0700, Jo Rhett wrote:
>> (dropping "__TVT_MIME_" for ease of typing)
> 
> You just don't like typing my initials...  ;)

Honestly not.  I just skip common prefixes when typing ;-)

>> ATT is a meta of ATT_AP *or* ATT_AOPDF.
>> But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 

*smacks head*

Sorry, somewhere I got confused and mind-melded _AP and _TP into the 
same thing.  My bad.   Sorry about that.

> And since I haven't had time to pay attention to the newer spams, there could
> definitely be room for a ...02 which targets them.  :)

Time.  I want to write one, but I keep getting hauled off to do other 
things every time I sit down to try.


Re: PDF rule not matching -- split line content type?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Aug 16, 2007 at 09:47:06AM -0700, Jo Rhett wrote:
> (dropping "__TVT_MIME_" for ease of typing)

You just don't like typing my initials...  ;)

> ATT is a meta of ATT_AP *or* ATT_AOPDF.

Right.

> But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 
> really it will only work if ATT_TP matches.  If ATT_AOPDF matches then 
> it won't match.

Right, of course: that's the fingerprint.  It needs a text/plain
part as well as a PDF part (which could be either application/pdf or
application/octet-stream w/ a "pdf" filename).  If it only has one or
the other, we don't want to target it.
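
To make the logic concrete, here is a small truth-table sketch in Python. It models the rules as plain booleans; the function names are made up for the illustration, and __TVD_MIME_ATT_TP is taken to mean "has a text/plain part" as described above, since its definition was not posted in the thread.

```python
from itertools import product


def att(att_ap, att_aopdf):
    """__TVD_MIME_ATT: a PDF part of either MIME type."""
    return att_ap or att_aopdf


def finger01(ct_mm, att_tp, att_ap, att_aopdf, body):
    """TVD_PDF_FINGER01 as posted in the thread."""
    return ct_mm and att_tp and att(att_ap, att_aopdf) and not body


def without_att(ct_mm, att_tp, att_ap, att_aopdf, body):
    """The variant claimed to be equivalent: __TVD_MIME_ATT dropped."""
    return ct_mm and att_tp and not body


def differing_inputs(f, g):
    """All input combinations on which the two rule variants disagree."""
    return [bits for bits in product([False, True], repeat=5)
            if f(*bits) != g(*bits)]


if __name__ == "__main__":
    # Order of each tuple: ct_mm, att_tp, att_ap, att_aopdf, body
    print(differing_inputs(finger01, without_att))
    # Sample message: mixed, text/plain present, octet-stream PDF, non-empty body
    print(finger01(True, True, False, True, True))
```

The two variants disagree on exactly one input: a multipart/mixed message with a text/plain part, no PDF of either type, and an empty body. So __TVD_MIME_ATT is not redundant, and an octet-stream PDF satisfies the rule just as application/pdf does.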

Scoring for just a PDF attachment is going to seriously FP.

Now arguably, newer messages could include no text/plain part but a text/html
part and a PDF, and the rule won't match those.  So perhaps just looking for
/^text\b/ would be more beneficial?  Also, as previously mentioned in the
thread, your mail has an empty text/plain part but a non-empty text/html part,
which defeats the empty-body check -- I didn't want to write a plugin just to
look for an empty text/plain, so I went the easy way w/ rawbody.
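
A quick sketch of what the /^text\b/ idea would accept, written in Python rather than as a rule. The helper name is hypothetical and the Content-Type values are taken from the sample message in unfolded form.

```python
import re

# The looser pattern suggested above, applied case-insensitively.
TEXT_RE = re.compile(r"^text\b", re.IGNORECASE)

content_types = [
    "text/plain; charset=windows-1250",
    "text/html; charset=windows-1250",
    "application/octet-stream; name=marketing-jrhett.pdf",
]


def is_text_part(content_type: str) -> bool:
    """Return True for any text/* part, not just text/plain."""
    return TEXT_RE.search(content_type) is not None


if __name__ == "__main__":
    for ct in content_types:
        print(is_text_part(ct), ct)
```

This only widens the "has a text part" test; a non-empty HTML part would still trip the rawbody check.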

But anyway, the rule is doing what I intended it to do when it was written.
The rule is still working well, according to the nightly test results:

  1.537   2.2811   0.0000    1.000   0.85    0.00  TVD_PDF_FINGER01

And since I haven't had time to pay attention to the newer spams, there could
definitely be room for a ...02 which targets them.  :)

-- 
Randomly Selected Tagline:
"When you say 'I wrote a program that crashed Windows,' people just stare
 at you blankly and say 'Hey, I got those with the system, *for free*.'"
                      - Linus Torvalds

Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Chris Lear wrote:
> * Jo Rhett wrote (16/08/07 07:41):
>> Since nobody is paying attention
> 
> Or they're asleep. Your messages were at 23:44 and 07:41 here.
> 
>> , let me clarify.  The current rule is wrong:
>>
>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>> /^application\/octet-stream.*\.pdf/i
>>
>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>> __TVD_MIME_ATT && !__TVD_BODY
>>
>> This evaluates to exactly the same as this:
>>
>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>> !__TVD_BODY
>>
>> I believe that the original rule's intent was this:
>>
>> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
> 
> I don't think you're right.
> 
> The rule looks like this to me:
> 
> meta TVD_PDF_FINGER01
> __TVD_MIME_CT_MM             # content-type is multi-part mixed
> && __TVD_MIME_ATT_TP         # and has a text-plain part
> && __TVD_MIME_ATT            # and has an attachment that is either
>     __TVD_MIME_ATT_AP    # application/pdf
>     __TVD_MIME_ATT_AOPDF # or application/octet-stream.*.pdf
> && !__TVD_BODY               # and has no non-whitespace text content
> 
> Your rule would seem to match anything with no non-whitespace text 
> content regardless of whether or not a pdf was attached.

I did a full analysis of why the rule is broken, line by line in the 
message you replied to.  But I'll do it again.

(dropping "__TVD_MIME_" for ease of typing)

ATT is a meta of ATT_AP *or* ATT_AOPDF.

But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 
really it will only work if ATT_TP matches.  If ATT_AOPDF matches then 
it won't match.

Now go back up and read the text I quoted at the top.  Because if this is 
the author's intent then you can shorten the rule, but I somehow don't 
think so.

> I was looking into this very rule about 3 days ago, because of false 
> positives (client mailing out auto-generated pdfs which are being 
> rejected by messagelabs), and I found that spamassassin -D told me all I 
> needed to know about why some e-mail hit this rule and some didn't.

Perhaps.  But maybe you have difficulty reading the line by line 
analysis I posted below, hm?  I have ~200 messages here that are 100% 
spam that would match the fixed rule, which seems to be the author's intent.

-- 
Jo Rhett
Net Consonance ... net philanthropy, open source and other randomness

Re: PDF rule not matching -- split line content type?

Posted by Chris Lear <ch...@laculine.com>.
* Jo Rhett wrote (16/08/07 07:41):
> Since nobody is paying attention

Or they're asleep. Your messages were at 23:44 and 07:41 here.

>, let me clarify.  The current rule is 
> wrong:
> 
> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> /^application\/octet-stream.*\.pdf/i
> 
> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
> __TVD_MIME_ATT && !__TVD_BODY
> 
> This evaluates to exactly the same as this:
> 
> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY
> 
> I believe that the original rule's intent was this:
> 
> meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY

I don't think you're right.

The rule looks like this to me:

meta TVD_PDF_FINGER01
__TVD_MIME_CT_MM             # content-type is multi-part mixed
&& __TVD_MIME_ATT_TP         # and has a text-plain part
&& __TVD_MIME_ATT            # and has an attachment that is either
	__TVD_MIME_ATT_AP    # application/pdf
	__TVD_MIME_ATT_AOPDF # or application/octet-stream.*.pdf
&& !__TVD_BODY               # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text 
content regardless of whether or not a pdf was attached.
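
One way to check the "evaluates to exactly the same" claim is to enumerate every combination of sub-rule results. The sketch below is plain Python boolean logic, not SpamAssassin itself; the function names are made up, and treating __TVD_MIME_ATT_TP as an independent "has a text/plain part" test is an assumption, since its definition is not shown in this thread.

```python
from itertools import product

def original(ct_mm, att_tp, att_ap, att_aopdf, body):
    att = att_ap or att_aopdf              # meta __TVD_MIME_ATT
    return ct_mm and att_tp and att and not body

def without_att(ct_mm, att_tp, att_ap, att_aopdf, body):
    # The variant claimed to be equivalent to the original.
    return ct_mm and att_tp and not body

def without_tp(ct_mm, att_tp, att_ap, att_aopdf, body):
    # The proposed replacement rule.
    att = att_ap or att_aopdf
    return ct_mm and att and not body

cases = list(product([False, True], repeat=5))
differs_att = [c for c in cases if original(*c) != without_att(*c)]
differs_tp = [c for c in cases if original(*c) != without_tp(*c)]

print(differs_att)      # [(True, True, False, False, False)]
print(len(differs_tp))  # 3
```

Under these assumptions the two forms are not equivalent: dropping __TVD_MIME_ATT adds a hit for an empty multipart/mixed message with a text/plain part and no PDF, while dropping __TVD_MIME_ATT_TP adds hits for PDF messages without a text/plain part.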

I was looking into this very rule about 3 days ago, because of false 
positives (client mailing out auto-generated pdfs which are being 
rejected by messagelabs), and I found that spamassassin -D told me all I 
needed to know about why some e-mail hit this rule and some didn't.

Chris

Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Since nobody is paying attention, let me clarify.  The current rule is 
wrong:

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
/^application\/octet-stream.*\.pdf/i

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
__TVD_MIME_ATT && !__TVD_BODY

This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY

I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY

Can someone with commit rights please test and commit this change?
Thank you.

Jo Rhett wrote:
> Well actually I think the rule has a bug.  Why OR the two mime types as 
> a new meta, and then require one of the two in the final meta?   The net 
> effect is that if ATT_TP is true it matches, but if ATT_AOPDF is true it 
> will never match.
> 
> I believe that the following will work better - work in every situation 
> that it worked before, and not fail when the mime type is octet-stream:
>    meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT && 
> !__TVD_BODY
> 
> Would someone kindly evaluate this change and possibly fix the rule?  
> Thanks.
> 
> On Aug 14, 2007, at 10:41 PM, Loren Wilton wrote:
>>>> rawbody __TVD_BODY              /\S{4}/
>>
>> true
>>
>>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>>
>> true
>>
>>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>
>> false
>>
>>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>>> /^application\/octet-stream.*\.pdf/i
>>
>> maybe true, maybe not.  I would hope newlines were translated to 
>> spaces by the mimehdr plugin, but maybe they weren't.  Try /is instead 
>> of /i and see if it helps.
>>
>>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
>>>> __TVD_MIME_ATT_AOPDF
>>
>> maybe true
>>
>>>> meta TVD_PDF_FINGER01
>>    __TVD_MIME_CT_MM
>> true
>>    && __TVD_MIME_ATT_TP
>> undefined here, can't say
>>    && __TVD_MIME_ATT
>> maybe true
>>    && !__TVD_BODY
>> true
>>
>> So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if it 
>> will fire, since that is part of an 'and'.  If I assume it to be true 
>> then I'm still not sure because of the multiline possibility in 
>> __TVD_MIME_ATT.
>>
>>        Loren
>>
>>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>>>> fingerprint
>>
>>
>> ----- Original Message ----- From: "Jo Rhett" <jr...@netconsonance.com>
>> To: "SpamAssassin Users" <us...@spamassassin.apache.org>
>> Sent: Tuesday, August 14, 2007 10:16 PM
>> Subject: Re: PDF rule not matching -- split line content type?
>>
>>
>>> Can someone clue me in on why this rule isn't matching?
>>>
>>> Jo Rhett wrote:
>>>> So I've been getting a metric ton of PDF spam.  Investigating the 
>>>> rule that is supposed to match this, I see
>>>>
>>>> rawbody __TVD_BODY              /\S{4}/
>>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || 
>>>> __TVD_MIME_ATT_AOPDF
>>>> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && 
>>>> __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
>>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>>>> fingerprint
>>>>
>>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>>>> /^application\/octet-stream.*\.pdf/i
>>>>
>>>> The following message appears to match perfectly with this, except 
>>>> for perhaps that the content type is spread across two lines?  I 
>>>> haven't checked the code, but would this matter?
>>>>
>>>> Return-Path: <Yo...@nic.za.net>
>>>> Received: from mail.netconsonance.com ([unix socket])
>>>>      by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
>>>>      Tue, 14 Aug 2007 06:27:16 -0700
>>>> Received: from [84.21.29.58] ([84.21.29.58])
>>>>     by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id 
>>>> l7EDR4UU095951
>>>>     for <jr...@lizardarts.com>; Tue, 14 Aug 2007 06:27:08 -0700 (PDT)
>>>>     (envelope-from Yohann@nic.za.net)
>>>> X-Virus-Scanned: amavisd-new at netconsonance.com
>>>> X-Spam-Score: 2.033
>>>> X-Spam-Level: **
>>>> X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
>>>>     tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
>>>>     MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
>>>> Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58] 
>>>> with Microsoft SMTPSVC(6.0.3790.1830);
>>>>     Tue, 14 Aug 2007 15:27:01 +0200
>>>> Message-ID: <00...@x6of7ca27m39al>
>>>> From: "Yohann michels" <Yo...@nic.za.net>
>>>> To: jrhett@lizardarts.com
>>>> Subject: bill-jrhett
>>>> Date: Tue, 14 Aug 2007 15:26:28 +0200
>>>> MIME-Version: 1.0
>>>> Content-Type: multipart/mixed;
>>>>     boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
>>>> X-Priority: 3
>>>> X-MSMail-Priority: Normal
>>>> X-Mailer: Microsoft Outlook Express 6.00.2900.3138
>>>> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
>>>>
>>>>
>>>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>>>> Content-Type: multipart/alternative;
>>>>     boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"
>>>>
>>>>
>>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>>>> Content-Transfer-Encoding: quoted-printable
>>>> Content-Type: text/plain;
>>>>     charset=windows-1250
>>>>
>>>>
>>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>>>> Content-Transfer-Encoding: quoted-printable
>>>> Content-Type: text/html;
>>>>     charset=windows-1250
>>>>
>>>> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
>>>> <HTML><HEAD>
>>>> <META http-equiv=3DContent-Type content=3D"text/html; =
>>>> charset=3Dwindows-1250">
>>>> <META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
>>>> <STYLE></STYLE>
>>>> </HEAD>
>>>> <BODY bgColor=3D#ffffff>
>>>> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>
>>>>
>>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0--
>>>>
>>>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>>>> Content-Transfer-Encoding: base64
>>>> Content-Type: application/octet-stream;
>>>>     name=marketing-jrhett.pdf
>>>> Content-Disposition: attachment;
>>>>     filename=marketing-jrhett.pdf
>>>>
>>>> JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5 
>>>> L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg 
>>>> *snip*
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Jo Rhett
>>> Net Consonance ... net philanthropy, open source and other randomness
>>
>>
> 


-- 
Jo Rhett
Net Consonance ... net philanthropy, open source and other randomness

Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Well actually I think the rule has a bug.  Why OR the two mime types  
as a new meta, and then require one of the two in the final meta?    
The net effect is that if ATT_TP is true it matches, but if ATT_AOPDF  
is true it will never match.

I believe that the following will work better - work in every  
situation that it worked before, and not fail when the mime type is  
octet-stream:
    meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM &&  
__TVD_MIME_ATT && !__TVD_BODY

Would someone kindly evaluate this change and possibly fix the rule?   
Thanks.

On Aug 14, 2007, at 10:41 PM, Loren Wilton wrote:
>>> rawbody __TVD_BODY              /\S{4}/
>
> true
>
>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>
> true
>
>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>
> false
>
>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~
>>> /^application\/octet-stream.*\.pdf/i
>
> maybe true, maybe not.  I would hope newlines were translated to  
> spaces by the mimehdr plugin, but maybe they weren't.  Try /is  
> instead of /i and see if it helps.
>
>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP ||  
>>> __TVD_MIME_ATT_AOPDF
>
> maybe true
>
>>> meta TVD_PDF_FINGER01
>    __TVD_MIME_CT_MM
> true
>    && __TVD_MIME_ATT_TP
> undefined here, can't say
>    && __TVD_MIME_ATT
> maybe true
>    && !__TVD_BODY
> true
>
> So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if  
> it will fire, since that is part of an 'and'.  If I assume it to be  
> true then I'm still not sure because of the multiline possibility  
> in __TVD_MIME_ATT.
>
>        Loren
>
>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam  
>>> fingerprint
>
>
> ----- Original Message ----- From: "Jo Rhett"  
> <jr...@netconsonance.com>
> To: "SpamAssassin Users" <us...@spamassassin.apache.org>
> Sent: Tuesday, August 14, 2007 10:16 PM
> Subject: Re: PDF rule not matching -- split line content type?
>
>
>> Can someone clue me in on why this rule isn't matching?
>>
>> Jo Rhett wrote:
>>> So I've been getting a metric ton of PDF spam.  Investigating the  
>>> rule that is supposed to match this, I see
>>>
>>> rawbody __TVD_BODY              /\S{4}/
>>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP ||  
>>> __TVD_MIME_ATT_AOPDF
>>> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM &&  
>>> __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
>>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam  
>>> fingerprint
>>>
>>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~
>>> /^application\/octet-stream.*\.pdf/i
>>>
>>> The following message appears to match perfectly with this,  
>>> except for perhaps that the content type is spread across two  
>>> lines?  I haven't checked the code, but would this matter?
>>>
>>> Return-Path: <Yo...@nic.za.net>
>>> Received: from mail.netconsonance.com ([unix socket])
>>>      by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
>>>      Tue, 14 Aug 2007 06:27:16 -0700
>>> Received: from [84.21.29.58] ([84.21.29.58])
>>>     by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id  
>>> l7EDR4UU095951
>>>     for <jr...@lizardarts.com>; Tue, 14 Aug 2007 06:27:08 -0700  
>>> (PDT)
>>>     (envelope-from Yohann@nic.za.net)
>>> X-Virus-Scanned: amavisd-new at netconsonance.com
>>> X-Spam-Score: 2.033
>>> X-Spam-Level: **
>>> X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
>>>     tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
>>>     MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
>>> Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58]  
>>> with Microsoft SMTPSVC(6.0.3790.1830);
>>>     Tue, 14 Aug 2007 15:27:01 +0200
>>> Message-ID: <00...@x6of7ca27m39al>
>>> From: "Yohann michels" <Yo...@nic.za.net>
>>> To: jrhett@lizardarts.com
>>> Subject: bill-jrhett
>>> Date: Tue, 14 Aug 2007 15:26:28 +0200
>>> MIME-Version: 1.0
>>> Content-Type: multipart/mixed;
>>>     boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
>>> X-Priority: 3
>>> X-MSMail-Priority: Normal
>>> X-Mailer: Microsoft Outlook Express 6.00.2900.3138
>>> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
>>>
>>>
>>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>>> Content-Type: multipart/alternative;
>>>     boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"
>>>
>>>
>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>>> Content-Transfer-Encoding: quoted-printable
>>> Content-Type: text/plain;
>>>     charset=windows-1250
>>>
>>>
>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>>> Content-Transfer-Encoding: quoted-printable
>>> Content-Type: text/html;
>>>     charset=windows-1250
>>>
>>> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
>>> <HTML><HEAD>
>>> <META http-equiv=3DContent-Type content=3D"text/html; =
>>> charset=3Dwindows-1250">
>>> <META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
>>> <STYLE></STYLE>
>>> </HEAD>
>>> <BODY bgColor=3D#ffffff>
>>> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>
>>>
>>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0--
>>>
>>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>>> Content-Transfer-Encoding: base64
>>> Content-Type: application/octet-stream;
>>>     name=marketing-jrhett.pdf
>>> Content-Disposition: attachment;
>>>     filename=marketing-jrhett.pdf
>>>
>>> JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5
>>> L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg
>>> *snip*
>>>
>>>
>>
>>
>> -- 
>> Jo Rhett
>> Net Consonance ... net philanthropy, open source and other randomness
>
>

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness



Re: PDF rule not matching -- split line content type?

Posted by Loren Wilton <lw...@earthlink.net>.
>> rawbody __TVD_BODY              /\S{4}/

true

>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i

true

>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i

false

>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>> /^application\/octet-stream.*\.pdf/i

maybe true, maybe not.  I would hope newlines were translated to spaces by 
the mimehdr plugin, but maybe they weren't.  Try /is instead of /i and see 
if it helps.
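
To illustrate the /is versus /i point, here is a small Python sketch. Python's re.DOTALL plays the role of Perl's /s modifier (letting "." match a newline); the header strings are taken from the sample message's PDF part. Whether the mimeheader plugin hands the regex a folded or an unfolded value is exactly the open question, so both cases are shown and neither is claimed to be what SpamAssassin does.

```python
import re

# The __TVD_MIME_ATT_AOPDF pattern, written as a Python raw string.
PATTERN = r"^application/octet-stream.*\.pdf"

# Content-Type value as it appears on the wire (folded) and after unfolding.
folded = "application/octet-stream;\n\tname=marketing-jrhett.pdf"
unfolded = "application/octet-stream; name=marketing-jrhett.pdf"

plain = re.compile(PATTERN, re.IGNORECASE)               # like /i
dotall = re.compile(PATTERN, re.IGNORECASE | re.DOTALL)  # like /is

print(bool(plain.search(folded)))    # False: "." stops at the newline
print(bool(dotall.search(folded)))   # True
print(bool(plain.search(unfolded)))  # True
```

So the modifier only matters if the newline survives into the string being matched.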

>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF

maybe true

>> meta TVD_PDF_FINGER01
    __TVD_MIME_CT_MM
true
    && __TVD_MIME_ATT_TP
undefined here, can't say
    && __TVD_MIME_ATT
maybe true
    && !__TVD_BODY
true

So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if it will 
fire, since that is part of an 'and'.  If I assume it to be true then I'm 
still not sure because of the multiline possibility in __TVD_MIME_ATT.
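
The "can't say" conclusion can be made explicit with three-valued logic, where None stands for "unknown". This is only a sketch in Python, with and3 as a made-up helper; the input values are the verdicts from the walkthrough above, taken as given.

```python
def and3(*values):
    """Three-valued AND: any False gives False, else any None gives None."""
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True

# Verdicts from the walkthrough: CT_MM true, ATT_TP unknown,
# ATT "maybe" (unknown), !BODY true.
print(and3(True, None, None, True))   # None: outcome depends on the unknowns

# If the negated body test were false instead, the unknowns would not matter.
print(and3(True, None, None, False))  # False
```

In other words, an unknown term leaves the result open only while every known term is true; a single false term settles the meta on its own.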

        Loren

>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>> fingerprint


----- Original Message ----- 
From: "Jo Rhett" <jr...@netconsonance.com>
To: "SpamAssassin Users" <us...@spamassassin.apache.org>
Sent: Tuesday, August 14, 2007 10:16 PM
Subject: Re: PDF rule not matching -- split line content type?


> Can someone clue me in on why this rule isn't matching?
>
> Jo Rhett wrote:
>> So I've been getting a metric ton of PDF spam.  Investigating the rule 
>> that is supposed to match this, I see
>>
>> rawbody __TVD_BODY              /\S{4}/
>> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
>> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
>> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
>> __TVD_MIME_ATT && !__TVD_BODY
>> describe TVD_PDF_FINGER01       Mail matches standard pdf spam 
>> fingerprint
>>
>> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
>> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
>> /^application\/octet-stream.*\.pdf/i
>>
>> The following message appears to match perfectly with this, except for 
>> perhaps that the content type is spread across two lines?  I haven't 
>> checked the code, but would this matter?
>>
>> Return-Path: <Yo...@nic.za.net>
>> Received: from mail.netconsonance.com ([unix socket])
>>      by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
>>      Tue, 14 Aug 2007 06:27:16 -0700
>> Received: from [84.21.29.58] ([84.21.29.58])
>>     by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id 
>> l7EDR4UU095951
>>     for <jr...@lizardarts.com>; Tue, 14 Aug 2007 06:27:08 -0700 (PDT)
>>     (envelope-from Yohann@nic.za.net)
>> X-Virus-Scanned: amavisd-new at netconsonance.com
>> X-Spam-Score: 2.033
>> X-Spam-Level: **
>> X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
>>     tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
>>     MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
>> Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58] with 
>> Microsoft SMTPSVC(6.0.3790.1830);
>>     Tue, 14 Aug 2007 15:27:01 +0200
>> Message-ID: <00...@x6of7ca27m39al>
>> From: "Yohann michels" <Yo...@nic.za.net>
>> To: jrhett@lizardarts.com
>> Subject: bill-jrhett
>> Date: Tue, 14 Aug 2007 15:26:28 +0200
>> MIME-Version: 1.0
>> Content-Type: multipart/mixed;
>>     boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
>> X-Priority: 3
>> X-MSMail-Priority: Normal
>> X-Mailer: Microsoft Outlook Express 6.00.2900.3138
>> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
>>
>>
>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>> Content-Type: multipart/alternative;
>>     boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"
>>
>>
>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>> Content-Transfer-Encoding: quoted-printable
>> Content-Type: text/plain;
>>     charset=windows-1250
>>
>>
>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
>> Content-Transfer-Encoding: quoted-printable
>> Content-Type: text/html;
>>     charset=windows-1250
>>
>> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
>> <HTML><HEAD>
>> <META http-equiv=3DContent-Type content=3D"text/html; =
>> charset=3Dwindows-1250">
>> <META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
>> <STYLE></STYLE>
>> </HEAD>
>> <BODY bgColor=3D#ffffff>
>> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>
>>
>> ------=_NextPart_001_000F_01C7DE87.7C1E24D0--
>>
>> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
>> Content-Transfer-Encoding: base64
>> Content-Type: application/octet-stream;
>>     name=marketing-jrhett.pdf
>> Content-Disposition: attachment;
>>     filename=marketing-jrhett.pdf
>>
>> JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5 
>> L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg 
>> *snip*
>>
>>
>
>
> -- 
> Jo Rhett
> Net Consonance ... net philanthropy, open source and other randomness 



Re: PDF rule not matching -- split line content type?

Posted by Jo Rhett <jr...@netconsonance.com>.
Can someone clue me in on why this rule isn't matching?

Jo Rhett wrote:
> So I've been getting a metric ton of PDF spam.  Investigating the rule 
> that is supposed to match this, I see
> 
> rawbody __TVD_BODY              /\S{4}/
> header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
> meta __TVD_MIME_ATT             __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
> meta TVD_PDF_FINGER01           __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && 
> __TVD_MIME_ATT && !__TVD_BODY
> describe TVD_PDF_FINGER01       Mail matches standard pdf spam fingerprint
> 
> mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
> mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ 
> /^application\/octet-stream.*\.pdf/i
> 
> The following message appears to match perfectly with this, except for 
> perhaps that the content type is spread across two lines?  I haven't 
> checked the code, but would this matter?
> 
> Return-Path: <Yo...@nic.za.net>
> Received: from mail.netconsonance.com ([unix socket])
>      by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
>      Tue, 14 Aug 2007 06:27:16 -0700
> Received: from [84.21.29.58] ([84.21.29.58])
>     by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id l7EDR4UU095951
>     for <jr...@lizardarts.com>; Tue, 14 Aug 2007 06:27:08 -0700 (PDT)
>     (envelope-from Yohann@nic.za.net)
> X-Virus-Scanned: amavisd-new at netconsonance.com
> X-Spam-Score: 2.033
> X-Spam-Level: **
> X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
>     tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
>     MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
> Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58] with 
> Microsoft SMTPSVC(6.0.3790.1830);
>     Tue, 14 Aug 2007 15:27:01 +0200
> Message-ID: <00...@x6of7ca27m39al>
> From: "Yohann michels" <Yo...@nic.za.net>
> To: jrhett@lizardarts.com
> Subject: bill-jrhett
> Date: Tue, 14 Aug 2007 15:26:28 +0200
> MIME-Version: 1.0
> Content-Type: multipart/mixed;
>     boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
> X-Priority: 3
> X-MSMail-Priority: Normal
> X-Mailer: Microsoft Outlook Express 6.00.2900.3138
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
> 
> 
> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
> Content-Type: multipart/alternative;
>     boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"
> 
> 
> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
> Content-Transfer-Encoding: quoted-printable
> Content-Type: text/plain;
>     charset=windows-1250
> 
> 
> ------=_NextPart_001_000F_01C7DE87.7C1E24D0
> Content-Transfer-Encoding: quoted-printable
> Content-Type: text/html;
>     charset=windows-1250
> 
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML><HEAD>
> <META http-equiv=3DContent-Type content=3D"text/html; =
> charset=3Dwindows-1250">
> <META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
> <STYLE></STYLE>
> </HEAD>
> <BODY bgColor=3D#ffffff>
> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>
> 
> ------=_NextPart_001_000F_01C7DE87.7C1E24D0--
> 
> ------=_NextPart_000_000E_01C7DE87.7C1E24D0
> Content-Transfer-Encoding: base64
> Content-Type: application/octet-stream;
>     name=marketing-jrhett.pdf
> Content-Disposition: attachment;
>     filename=marketing-jrhett.pdf
> 
> JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5
> L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg
> *snip*
> 
> 


-- 
Jo Rhett
Net Consonance ... net philanthropy, open source and other randomness