Posted to users@spamassassin.apache.org by Per Jessen <pe...@computer.org> on 2007/10/09 12:58:22 UTC

header lines being folded into one?

I've got a rule for spotting a dodgy X-Originating-IP:

header   PJ_BOGUS_XORIGIN  X-Originating-IP =~ /\.([3-9][^. ][^. ]+|2[6-9][^. ]+|25[6-9][^. ]*)\./

I was investigating an unusual hit, when I noticed the following:

It will produce a positive hit when an email contains two lines like these:

X-Originating-IP: [17.148.16.66]
X-Originating-IP: 134.32.140.207

whereas there is no match if the email only contains one of those. 
(either one)

It looks to me like the two X-Originating-IP lines are merged into 
one, and my regex is then applied to:

X-Originating-IP: [17.148.16.66]134.32.140.207

Is this normal/correct behaviour?
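
The observation above can be reproduced outside SpamAssassin. What
follows is a minimal Python sketch, not SpamAssassin code. It assumes
the two field bodies are joined with a newline before the rule's regex
is applied (a reply later in this thread says the join includes a
newline), and that Python's re module treats this particular pattern
the same way Perl does. The names BOGUS_XORIGIN and hits are made up
for the illustration.

```python
import re

# Pattern copied from the PJ_BOGUS_XORIGIN rule above.
BOGUS_XORIGIN = re.compile(r"\.([3-9][^. ][^. ]+|2[6-9][^. ]+|25[6-9][^. ]*)\.")

first = "[17.148.16.66]"
second = "134.32.140.207"

# Assumption: repeated header field bodies are joined with a newline
# before the regex is applied.
joined = first + "\n" + second


def hits(value: str) -> bool:
    """Return True if the rule's pattern matches anywhere in value."""
    return BOGUS_XORIGIN.search(value) is not None


print(hits(first))   # False: nothing after ".66]" supplies the closing dot
print(hits(second))  # False: no octet satisfies any of the alternatives
print(hits(joined))  # True: the match runs across the join

match = BOGUS_XORIGIN.search(joined)
print(repr(match.group(0)))  # '.66]\n134.'
```

The [^. ] classes accept any character other than a dot or a space,
including ']' and the newline, so the pattern runs from the last octet
of the first field into the first octet of the second.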



/Per Jessen, Zürich


Re: header lines being folded into one?

Posted by Per Jessen <pe...@computer.org>.
Mark Martinec wrote:

> Per,
> 
>> X-Originating-IP: [17.148.16.66]
>> X-Originating-IP: 134.32.140.207
> ...
>> It looks to me like the two X-Originating-IP lines are merged into
>> one, and my regex is then applied to:
>>
>> X-Originating-IP: [17.148.16.66]134.32.140.207
> 
> True (with a newline in between).
> 
>> Is this normal/correct behaviour?
> 
> Don't know, it seems it has been designed that way purposely,
> but it causes surprises and unexpected matching when the same
> header field name occurs more than once in a message.
> This was already mentioned on the list. A workaround is
> to use a /m flag on almost any regexp in rules, especially
> those which use anchors (^ and $), e.g.:
> 
>   header L_LANIECA_S1 Subject =~ /^(girls|love|screensaver)$/m
> 
> To me it looks like a misfeature.

Yeah, that was my thought too.  At the very least it's
counter-intuitive.  Thanks for the /m hint. 


/Per Jessen, Zürich


Re: header lines being folded into one?

Posted by Mark Martinec <Ma...@ijs.si>.
On Tuesday October 9 2007 20:19:36 Loren Wilton wrote:
> > To me it looks like a misfeature.
>
> I think I would agree that it may be a misfeature in the case of this
> specific header.  In general though it may not be.  Consider the case of
> two separate Subject: headers, often with completely different subjects. 
> There was a time when that was a quite decent spam sign (I haven't checked
> recently to see if it still is).

I wasn't suggesting discarding repeated header fields.

But a rule like (a test for an older virus storm):

  header L_LANIECA_S1 Subject =~ /^(girls|love|screensaver)$/

could be tried individually with each Subject header field body,
instead of with both field bodies concatenated - without noticeable
performance degradation.

As it stands now, the above rule can easily be circumvented
by adding a second Subject line, so that the ^ or $ no longer
matches. It is easy to forget to put the /m flag in rules.
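
The difference between the two strategies can be sketched in Python.
This is only an illustration of the idea, not how SpamAssassin is
implemented: match_joined and match_each are hypothetical helpers, the
sample subjects are invented, and the newline join is an assumption
taken from this thread.

```python
import re
from typing import List

# Pattern copied from the L_LANIECA_S1 rule above (no /m flag).
PATTERN = re.compile(r"^(girls|love|screensaver)$")


def match_joined(bodies: List[str]) -> bool:
    """Apply the regex once to all field bodies joined with newlines
    (assumed to mirror the current behaviour)."""
    return PATTERN.search("\n".join(bodies)) is not None


def match_each(bodies: List[str]) -> bool:
    """Apply the regex to each field body separately (the proposal)."""
    return any(PATTERN.search(body) is not None for body in bodies)


single = ["love"]
evasive = ["love", "quarterly report"]  # second Subject added to evade the rule

print(match_joined(single))   # True
print(match_joined(evasive))  # False: $ no longer lines up with the end of "love"
print(match_each(evasive))    # True: each body is tested on its own
```

With per-field matching the anchors always refer to the start and end
of a single field body, so a rule author does not need to remember /m.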

An incident that made me aware of the problem was when I tried
to detect mail coming with a particular X-*... line - which
mysteriously failed every now and then, until I realized it
fails when there are two such lines in a header. In my case
it was a pure security incident (I won't go into details).

Justin Mason wrote:
> Yeah -- it's long-standing, but surprising, behaviour.
> it should probably be documented in the Conf manpage somewhere...

I think it should be changed for 3.3.0 (and the change well documented).
I would think a change would fix more rules that are unknowingly
broken now than it would affect rules in an undesirable way.

  Mark

Re: header lines being folded into one?

Posted by Loren Wilton <lw...@earthlink.net>.
> To me it looks like a misfeature.

I think I would agree that it may be a misfeature in the case of this 
specific header.  In general though it may not be.  Consider the case of two 
separate Subject: headers, often with completely different subjects.  There
was a time when that was a quite decent spam sign (I haven't checked recently
to see if it still is).

        Loren



Re: header lines being folded into one?

Posted by Mark Martinec <Ma...@ijs.si>.
Per,

> X-Originating-IP: [17.148.16.66]
> X-Originating-IP: 134.32.140.207
...
> It looks to me like the two X-Originating-IP lines are merged into
> one, and my regex is then applied to:
>
> X-Originating-IP: [17.148.16.66]134.32.140.207

True (with a newline in between).

> Is this normal/correct behaviour?

Don't know, it seems it has been designed that way purposely,
but it causes surprises and unexpected matching when the same
header field name occurs more than once in a message.
This was already mentioned on the list. A workaround is
to use a /m flag on almost any regexp in rules, especially
those which use anchors (^ and $), e.g.:

  header L_LANIECA_S1 Subject =~ /^(girls|love|screensaver)$/m

To me it looks like a misfeature.
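
The effect of the /m workaround can be checked with a small Python
sketch. It assumes two Subject bodies joined by a newline (the sample
subjects are invented) and uses re.MULTILINE, which is Python's
counterpart of Perl's /m flag; it does not run SpamAssassin itself.

```python
import re

# Pattern copied from the L_LANIECA_S1 rule above.
PATTERN = r"^(girls|love|screensaver)$"

# Assumption: two Subject field bodies joined with a newline.
joined_subject = "love\nRe: meeting notes"

without_m = re.search(PATTERN, joined_subject)
with_m = re.search(PATTERN, joined_subject, re.MULTILINE)

print(without_m is not None)  # False: $ only matches at the very end
print(with_m is not None)     # True: $ also matches before the newline
print(with_m.group(1))        # love
```

With the flag set, ^ and $ anchor at every line boundary inside the
joined string, so each Subject body is effectively tested on its own.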

  Mark