Posted to users@spamassassin.apache.org by Philip Prindeville <ph...@redfish-solutions.com> on 2022/05/10 22:10:23 UTC

Rule to detect non-standard headers that aren't X- prefixed

Anyone have a rule to detect the following nonsense headers seen in this message I got?

Return-Path: <co...@uakron.edu>
Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
	by mail (envelope-sender <co...@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
	for <xy...@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
To: "xyzzy@redfish-solutions.com" <xy...@redfish-solutions.com>
From: "Nabil, Home Depot" <co...@uakron.edu>
Message-ID: <35...@uakron.edu>
Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
Minicomputers-Exhume: sides
Subject: Nabil, 1 searches this week
Malthus-Films: 88976dea
List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
Parasitic-Homogeneity: db5da28ba3e69a
MIME-Version: 1.0
Capitalizations-Grievously: oilers
Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"

Obviously, the following bogus header names are present:

Minicomputers-Exhume
Malthus-Films
Parasitic-Homogeneity
Capitalizations-Grievously

The list of legitimate headers is quite small, per RFC-2822 Sections 3.6 and 3.6.7 (odd that 3.6.8 doesn't call out the X-* requirement).

I'd like to fingerprint messages based on non-standard header names.

Has anyone undertaken this already?  I tried playing with:

header __L_NON_STD_HEADERS      ALL !~ /^(Return-Path|Received|Resent-Date|Resent-From|Resent-Sender|Resent-To|Resent-Cc|Resent-Bcc|Resent-Message-ID|Date|From|Sender|Reply-To|To|Cc|Bcc|Message-ID|In-Reply-To|References|Subject|Comments|Keywords|Content-Type|Content-Transfer-Encoding|MIME-Version|DKIM-Signature|X-([A-Z][a-z]+(-[A-Z][a-z]*)*))\:/m

But that will only match if *none* of the headers are standard ones, so that won't work... I really need to examine the headers one-by-one.
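To make the problem concrete, here is a small Python sketch. Python's `re` module stands in for the Perl regex engine SpamAssassin uses, and the allow-list is abbreviated purely for illustration. Negating a match against the whole header block only tells you that *no* line matched; what's wanted is a test of each line on its own:

```python
import re

# A few header lines from the message above (folded lines omitted).
HEADERS = (
    "To: xyzzy@redfish-solutions.com\n"
    "Subject: Nabil, 1 searches this week\n"
    "Malthus-Films: 88976dea\n"
    "MIME-Version: 1.0\n"
)

# Abbreviated, illustrative allow-list in the spirit of __L_NON_STD_HEADERS.
# Header names are case-insensitive, hence re.IGNORECASE.
KNOWN = re.compile(r"^(?:To|Subject|MIME-Version):", re.MULTILINE | re.IGNORECASE)


def whole_block_negated(text: str) -> bool:
    """Like `ALL !~ /.../m`: true only if no line at all matches the allow-list."""
    return KNOWN.search(text) is None


def any_line_unknown(text: str) -> bool:
    """The intended check: true if at least one header line is not allow-listed."""
    return any(KNOWN.match(line) is None for line in text.splitlines())


print(whole_block_negated(HEADERS))  # False: "To:" matches, so the rule can't fire
print(any_line_unknown(HEADERS))     # True: "Malthus-Films:" isn't in the list
```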

Thanks,

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 11, 2022, at 9:24 AM, John Hardin <jh...@impsec.org> wrote:
> 
> On Tue, 10 May 2022, Philip Prindeville wrote:
> 
>> Anyone have a rule to detect the following nonsense headers seen in this message I got?
>> 
>> Return-Path: <co...@uakron.edu>
>> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
>> 	by mail (envelope-sender <co...@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
>> 	for <xy...@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
>> To: "xyzzy@redfish-solutions.com" <xy...@redfish-solutions.com>
>> From: "Nabil, Home Depot" <co...@uakron.edu>
>> Message-ID: <35...@uakron.edu>
>> Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
>> Minicomputers-Exhume: sides
>> Subject: Nabil, 1 searches this week
>> Malthus-Films: 88976dea
>> List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
>> Parasitic-Homogeneity: db5da28ba3e69a
>> MIME-Version: 1.0
>> Capitalizations-Grievously: oilers
>> Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"
>> 
>> Obviously, the following bogus header names are present:
>> 
>> Minicomputers-Exhume
>> Malthus-Films
>> Parasitic-Homogeneity
>> Capitalizations-Grievously
> 
> Take a look at __RAND_HEADER and RAND_HEADER_MANY
> 
> 

For my test messages, __RAND_HEADER_MANY isn't firing.

Also, Return-Path: is listed in RFC-2822, and many delivering (terminal) MTAs add it, including Sendmail.

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by John Hardin <jh...@impsec.org>.
On Tue, 10 May 2022, Philip Prindeville wrote:

> Anyone have a rule to detect the following nonsense headers seen in this message I got?
>
> Return-Path: <co...@uakron.edu>
> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
> 	by mail (envelope-sender <co...@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
> 	for <xy...@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
> To: "xyzzy@redfish-solutions.com" <xy...@redfish-solutions.com>
> From: "Nabil, Home Depot" <co...@uakron.edu>
> Message-ID: <35...@uakron.edu>
> Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
> Minicomputers-Exhume: sides
> Subject: Nabil, 1 searches this week
> Malthus-Films: 88976dea
> List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
> Parasitic-Homogeneity: db5da28ba3e69a
> MIME-Version: 1.0
> Capitalizations-Grievously: oilers
> Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"
>
> Obviously, the following bogus header names are present:
>
> Minicomputers-Exhume
> Malthus-Films
> Parasitic-Homogeneity
> Capitalizations-Grievously

Take a look at __RAND_HEADER and RAND_HEADER_MANY


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Of the twenty-two civilizations that have appeared in history,
   nineteen of them collapsed when they reached the moral state the
   United States is in now.                          -- Arnold Toynbee
-----------------------------------------------------------------------
  3 days until the 74th anniversary of Israel's independence

Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2022-05-10 at 18:10:23 UTC-0400 (Tue, 10 May 2022 16:10:23 -0600)
Philip Prindeville <ph...@redfish-solutions.com>
is rumored to have said:

> Anyone have a rule to detect the following nonsense headers seen in 
> this message I got?

No, and complicating your circumstance: RFC6648

Here's the title & abstract:


            Deprecating the "X-" Prefix and Similar Constructs
                         in Application Protocols

Abstract

    Historically, designers and implementers of application protocols
    have often distinguished between standardized and unstandardized
    parameters by prefixing the names of unstandardized parameters with
    the string "X-" or similar constructs.  In practice, that convention
    causes more problems than it solves.  Therefore, this document
    deprecates the convention for newly defined parameters with textual
    (as opposed to numerical) names in application protocols.



-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2022-05-10 at 20:20:14 UTC-0400 (Tue, 10 May 2022 18:20:14 -0600)
Philip Prindeville <ph...@redfish-solutions.com>
is rumored to have said:

>> On May 10, 2022, at 5:57 PM, Martin Gregorie <ma...@gregorie.org> 
>> wrote:
>>
>> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>>>
>>> You're correct that they're different in every message received.
>>>
>> So write a rule that fires on any header name that *doesn't* match
>> anything in the list of legit headers as defined in the relevant 
>> RFCs.
>
>
> See my original message.
>
> I can't think of a single way to match each header, and then test for 
> any of them not matching the pattern...

As documented in the POD in Mail::SpamAssassin::Conf, a header rule 
checking "ALL:raw" actually matches against the pristine header section, 
in which you could check for lines that do not begin with the 'standard' 
headers.

Unfortunately, as noted elsewhere in the thread, this pattern uses 
one-time header names AND there is nothing wrong with using random 
words as header names without a leading 'X-', so it's likely a low-yield 
approach.
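One detail worth checking when testing the raw header section: folded headers such as the `Received:` line in the original message continue on lines that start with a space or tab, and a per-line negative lookahead will report those continuation lines as unknown headers unless it skips them. A Python sketch of the difference (Python's `re` standing in for Perl's; the allow-list is abbreviated for illustration):

```python
import re

# Raw header text, including the folded Received: header from the message above.
RAW = (
    "Return-Path: <co...@uakron.edu>\n"
    "Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])\n"
    "\tby mail (envelope-sender <co...@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309\n"
    "\tfor <xy...@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600\n"
    "Minicomputers-Exhume: sides\n"
)

FLAGS = re.MULTILINE | re.IGNORECASE

# Flags every line not starting with an allow-listed name, continuation lines included.
NAIVE = re.compile(r"^(?!(?:Return-Path|Received):).+$", FLAGS)

# Same, but the extra lookahead (?![ \t]) skips folded continuation lines.
SKIP_FOLDED = re.compile(r"^(?![ \t])(?!(?:Return-Path|Received):).+$", FLAGS)

print(len(NAIVE.findall(RAW)))   # 3: two continuation lines plus the bogus header
print(SKIP_FOLDED.findall(RAW))  # ['Minicomputers-Exhume: sides']
```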




Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 10, 2022, at 5:57 PM, Martin Gregorie <ma...@gregorie.org> wrote:
> 
> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>> 
>> You're correct that they're different in every message received.
>> 
> So write a rule that fires on any header name that *doesn't* match
> anything in the list of legit headers as defined in the relevant RFCs.


See my original message.

I can't think of a single way to match each header, and then test for any of them not matching the pattern...


> 
> Of course you may need to extend that list to include some extras, such
> as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.


That's the easy part.


> 
> Martin
> 
> 


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 11, 2022, at 1:44 AM, Henrik K <he...@hege.li> wrote:
> 
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>> See my original message.
>> 
>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> 
> Simply use regex negative lookahead.
> 
> ALL =~ /^(?!Foo|Bar):/m
> 
> It will hit any line _not_ starting with Foo: or Bar:
> 


Ah, that did it.

Of course, if I get false positives, I'll have to manually search for the header names I forgot to include...



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Henrik K <he...@hege.li>.
On Mon, May 23, 2022 at 10:48:51PM -0600, Philip Prindeville wrote:
> 
> 
> > On May 11, 2022, at 1:53 AM, Henrik K <he...@hege.li> wrote:
> > 
> > On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> >> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> >>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> >>>> See my original message.
> >>>> 
> >>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> >>> 
> >>> Simply use regex negative lookahead.
> >>> 
> >>> ALL =~ /^(?!Foo|Bar):/m
> >>> 
> >>> It will hit any line _not_ starting with Foo: or Bar:
> >> 
> >> Oops I think it was buggy.. more like:
> >> 
> >> ALL =~ /^(?!(?:Foo|Bar):)/m
> > 
> > And for debug logging to log the missing header (to easily inspect what was
> > matched) you need some additional string matching, lookahead itself doesn't
> > save any string
> > 
> > ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> > 
> 
> 
> Ended up using .*$ instead of [^:]+ but that worked too.
> 
> Is it possible to count how many headers didn't match, and set some threshold, like 3 or more unknown headers?

tflags multiple should work

header UNKNOWN_HDR ALL ...
tflags UNKNOWN_HDR multiple maxhits=3
meta UNKNOWN_HDR_TOOMANY UNKNOWN_HDR >= 3
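As we understand `tflags multiple maxhits=3`, the rule is counted once per match, with counting stopping at 3, and the meta rule then compares that count against its threshold. Here is a rough Python model of that logic (not SpamAssassin's actual implementation), using a hypothetical, abbreviated allow-list and headers from the original message:

```python
import re

# Matches one header line whose name is NOT in the (abbreviated, illustrative) allow-list.
# (?![ \t]) skips folded continuation lines; re.IGNORECASE because header names
# are case-insensitive ("Content-type" vs "Content-Type").
UNKNOWN_HDR = re.compile(
    r"^(?![ \t])(?!(?:Return-Path|Received|Date|From|To|Subject|Message-ID|"
    r"MIME-Version|Content-Type|List-Unsubscribe):).+$",
    re.MULTILINE | re.IGNORECASE,
)


def unknown_hdr_hits(headers: str, maxhits: int = 3) -> int:
    """Count matching lines, stopping once maxhits is reached (models maxhits=3)."""
    hits = 0
    for _ in UNKNOWN_HDR.finditer(headers):
        hits += 1
        if hits >= maxhits:
            break
    return hits


def unknown_hdr_toomany(headers: str, threshold: int = 3) -> bool:
    """Models `meta UNKNOWN_HDR_TOOMANY UNKNOWN_HDR >= 3`."""
    return unknown_hdr_hits(headers) >= threshold


SPAMMY = (
    "From: \"Nabil, Home Depot\" <co...@uakron.edu>\n"
    "Minicomputers-Exhume: sides\n"
    "Subject: Nabil, 1 searches this week\n"
    "Malthus-Films: 88976dea\n"
    "Parasitic-Homogeneity: db5da28ba3e69a\n"
    "MIME-Version: 1.0\n"
    "Capitalizations-Grievously: oilers\n"
    "Content-type: multipart/mixed\n"
)

print(unknown_hdr_hits(SPAMMY))     # 3 (four unknown headers, capped at maxhits)
print(unknown_hdr_toomany(SPAMMY))  # True
```

One design point falls out of the model: `maxhits` must be at least the meta rule's threshold, or the count can never reach it.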


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 11, 2022, at 1:53 AM, Henrik K <he...@hege.li> wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


Ended up using .*$ instead of [^:]+ but that worked too.

Is it possible to count how many headers didn't match, and set some threshold, like 3 or more unknown headers?

Thanks,

-Philip


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Henrik K <he...@hege.li>.
On Fri, May 13, 2022 at 12:22:48PM -0600, Philip Prindeville wrote:
>
> How do you look at what a rule is matching?  I've never figured that out...

Debug output:
spamassassin -t -D rules < message.eml 2>&1 | grep 'got hit'


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 11, 2022, at 1:53 AM, Henrik K <he...@hege.li> wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


How do you look at what a rule is matching?  I've never figured that out...

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Henrik K <he...@hege.li>.
On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> > On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > > See my original message.
> > > 
> > > I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> > 
> > Simply use regex negative lookahead.
> > 
> > ALL =~ /^(?!Foo|Bar):/m
> > 
> > It will hit any line _not_ starting with Foo: or Bar:
> 
> Oops I think it was buggy.. more like:
> 
> ALL =~ /^(?!(?:Foo|Bar):)/m

And for debug logging to show the unknown header (to easily inspect what was
matched), you need some additional string matching, since the lookahead itself
doesn't consume any characters:

ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
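A quick Python illustration of why the extra `[^:]+` matters for the debug log (Python's `re` standing in for Perl; the two-name allow-list is just for illustration). The lookahead alone matches an empty string, so there is no text to show in the "got hit" output, whereas consuming characters after it captures the header name:

```python
import re

HEADERS = "Subject: hi\nMalthus-Films: 88976dea\nTo: xyzzy@redfish-solutions.com"
ALLOWED = r"(?:Subject|To):"

lookahead_only = re.compile(rf"^(?!{ALLOWED})", re.MULTILINE)
name_only = re.compile(rf"^(?!{ALLOWED})[^:]+", re.MULTILINE)
whole_line = re.compile(rf"^(?!{ALLOWED}).*$", re.MULTILINE)  # the .*$ variant

print(lookahead_only.findall(HEADERS))  # ['']: a hit, but no text to log
print(name_only.findall(HEADERS))       # ['Malthus-Films']
print(whole_line.findall(HEADERS))      # ['Malthus-Films: 88976dea']
```

Note that `[^:]` also matches newlines, so on a line with no colon (a folded continuation line, for instance) `[^:]+` can run on into the following lines; `.*$` stops at the end of the line.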


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Henrik K <he...@hege.li>.
On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > See my original message.
> > 
> > I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> 
> Simply use regex negative lookahead.
> 
> ALL =~ /^(?!Foo|Bar):/m
> 
> It will hit any line _not_ starting with Foo: or Bar:

Oops, I think it was buggy... more like:

ALL =~ /^(?!(?:Foo|Bar):)/m

Or, if you'd rather put the colon in every alternative:

ALL =~ /^(?!Foo:|Bar:)/m
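The difference between these lookahead variants is easy to check in Python, used here as a stand-in for Perl. One caveat when porting: Python's `^` under `re.MULTILINE` also matches after a trailing newline, where Perl's `/m` does not, so the sample text below has none. The first form needs a colon at the very start of a line, so it never fires; putting the colon inside the lookahead also stops `Foo` from exempting `Foobar:`:

```python
import re

HEADERS = "Foo: 1\nBar: 2\nFoobar: 3\nBaz: 4"  # no trailing newline, per the caveat


def flagged(pattern: str) -> list[str]:
    """Return the lines at whose start `pattern` matches."""
    rx = re.compile(pattern, re.MULTILINE)
    return [HEADERS[m.start():].split("\n", 1)[0] for m in rx.finditer(HEADERS)]


print(flagged(r"^(?!Foo|Bar):"))      # []: needs ':' right at line start
print(flagged(r"^(?!Foo|Bar)"))       # ['Baz: 4']: 'Foobar:' slips through
print(flagged(r"^(?!(?:Foo|Bar):)"))  # ['Foobar: 3', 'Baz: 4']
print(flagged(r"^(?!Foo:|Bar:)"))     # ['Foobar: 3', 'Baz: 4']
```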


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Henrik K <he...@hege.li>.
On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> See my original message.
> 
> I can't think of a single way to match each header, and then test for any of them not matching the pattern...

Simply use regex negative lookahead.

ALL =~ /^(?!Foo|Bar):/m

It will hit any line _not_ starting with Foo: or Bar:


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2022-05-10 at 18:19 -0600, Philip Prindeville wrote:
> I can't think of a single way to match each header, and then test for
> any of them not matching the pattern...
> 
> 
I had in mind a subrule that triggers on valid header names, combined
with a meta rule that inverts the subrule result. At least, that's what
I'd try as a starting point.

Martin



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
> 
> You're correct that they're different in every message received.
> 
So write a rule that fires on any header name that *doesn't* match
anything in the list of legit headers as defined in the relevant RFCs.

Of course you may need to extend that list to include some extras, such
as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.

Martin



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Philip Prindeville <ph...@redfish-solutions.com>.

> On May 10, 2022, at 4:58 PM, Kevin A. McGrail <km...@apache.org> wrote:
> 
> On 5/10/2022 6:10 PM, Philip Prindeville wrote:
>> Anyone have a rule to detect the following nonsense headers seen in this message I got?
> 
> Interesting. Those look more like something that Bayesian learning would be best to handle.
> 
> But, have you built a corpora of spam and ham?  Do a list of headers that appear in ham and spam corpora and xor out the spam ones.  Then write a rule if any of those exist.  They look like they might change a lot and they are randomized to avoid these type of issues so I see your dilemma and a plugin might be needed.
> 
> Regards,
> KAM


You're correct that they're different in every message received.



Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 5/10/2022 6:10 PM, Philip Prindeville wrote:
> Anyone have a rule to detect the following nonsense headers seen in this message I got?

Interesting. Those look more like something that Bayesian learning would 
be best suited to handle.

But have you built corpora of spam and ham?  Make a list of the headers 
that appear in your ham and spam corpora and keep only the ones that 
appear in spam but not in ham.  Then write a rule that fires if any of 
those exist.  They look like they might change a lot, since they are 
randomized to defeat exactly this kind of check, so I see your dilemma; 
a plugin might be needed.
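One way to read that suggestion, sketched in Python with the standard-library `email` parser: collect the header names seen in each corpus and keep the ones that show up only in spam. The tiny inline "corpora" are made up for illustration; a real run would read messages from files.

```python
from email import message_from_string


def header_names(raw_messages):
    """Return the set of lower-cased header field names used across the messages."""
    names = set()
    for raw in raw_messages:
        names.update(name.lower() for name in message_from_string(raw).keys())
    return names


# Made-up miniature corpora, for illustration only.
HAM = [
    "From: a@example.org\nTo: b@example.org\nSubject: lunch\n\nbody\n",
    "From: c@example.org\nList-Unsubscribe: <mailto:u@example.org>\nSubject: news\n\nbody\n",
]
SPAM = [
    "From: x@example.net\nMinicomputers-Exhume: sides\nSubject: offer\n\nbody\n",
    "From: y@example.net\nMalthus-Films: 88976dea\nSubject: offer\n\nbody\n",
]

spam_only = header_names(SPAM) - header_names(HAM)
print(sorted(spam_only))  # ['malthus-films', 'minicomputers-exhume']
```

Because these names are randomized per message, a spam-only list built this way mostly records names that will never be seen again; the ham-side set is arguably more useful as a starting point for an allow-list.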

Regards,
KAM

-- 
Kevin A. McGrail
KMcGrail@Apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


Re: Rule to detect non-standard headers that aren't X- prefixed

Posted by Loren Wilton <lw...@earthlink.net>.
> Minicomputers-Exhume: sides
> Malthus-Films: 88976dea
> Parasitic-Homogeneity: db5da28ba3e69a
> Capitalizations-Grievously: oilers

It looks like the pattern is
        /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}/
or something close to that.
Obviously it can mutate, but generally these are made by a tool, and until a 
new version of the tool comes along, they will be stable.

Try something like
    header  LW_BOGUS_HEADERS ALL =~ /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}\n/
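A Python check of that pattern against the headers from the original message (Python's `re` standing in for Perl). Two details matter. A period in place of the comma, as in `{1.20}`, is not a valid quantifier, so it doesn't repeat anything; Python treats the brace as literal text, and that version matches nothing here. And with a case-insensitive flag such as Perl's `/i`, `[A-Z]` and `[a-z]` would each match either case, so the sketch leaves it off. Anchoring each match to the start of a line is our own addition: without it the pattern can match the tail of a longer legitimate name, such as `Transfer-Encoding: base64` inside `Content-Transfer-Encoding: base64`. Even anchored, a legitimate two-word header with a one-word value could still hit.

```python
import re

HEADERS = (
    "Return-Path: <co...@uakron.edu>\n"
    'To: "xyzzy@redfish-solutions.com" <xy...@redfish-solutions.com>\n'
    "Message-ID: <35...@uakron.edu>\n"
    "Minicomputers-Exhume: sides\n"
    "Subject: Nabil, 1 searches this week\n"
    "Malthus-Films: 88976dea\n"
    "Parasitic-Homogeneity: db5da28ba3e69a\n"
    "MIME-Version: 1.0\n"
    "Capitalizations-Grievously: oilers\n"
    'Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"\n'
)

# With {1.20}: not a valid quantifier, so the brace is read as literal text.
AS_POSTED = re.compile(r"[A-Z][a-z]{1,20}-[A-Z][a-z]{1.20}\:\s{1,10}[\w\d]{3,20}\n")

# Corrected quantifier, case-sensitive, anchored per line; [ \t] keeps it on one line.
BOGUS = re.compile(r"^[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:[ \t]{1,10}\w{3,20}$", re.MULTILINE)

# Corrected quantifier but unanchored, to show the substring problem.
UNANCHORED = re.compile(r"[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:\s{1,10}\w{3,20}\n")

print(AS_POSTED.findall(HEADERS))  # []
print(BOGUS.findall(HEADERS))
# ['Minicomputers-Exhume: sides', 'Malthus-Films: 88976dea',
#  'Parasitic-Homogeneity: db5da28ba3e69a', 'Capitalizations-Grievously: oilers']

cte = "Content-Transfer-Encoding: base64\n"
print(bool(UNANCHORED.search(cte)), bool(BOGUS.search(cte)))  # True False
```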