Posted to users@spamassassin.apache.org by Pedro David Marco <pe...@yahoo.com> on 2018/12/06 17:52:13 UTC

Understanding header ALL

Hi,
I need some wisdom from the SA monks, please...
Can anyone briefly explain how header ALL works?
If I try a rule like this:
header        TESTRULE1         ALL   =~    /.+/ism
Using -D debug mode, I only "see" the first header of the email... shouldn't I see all headers?

It works nicely if I check for something slightly more complex, such as...
header        TESTRULE2         ALL  =~   /From=.*pedro.*  To=.*pedro.*/ism
but I am trying to understand how it works... and why I only see one line in debug mode...
Thx,
--------PedroD




Re: Understanding header ALL

Posted by John Hardin <jh...@impsec.org>.
On Thu, 6 Dec 2018, Pedro David Marco wrote:

> Hi,
> i need some wisdom from SA monks please...
> Can anyone explain briefly how header ALL work?
> if i try a rule like this:
> header        TESTRULE1         ALL   =~    /.+/ism
> Using -D debug mode i only "see"  the first header of the email... shouldn't i see all headers?
>
> it works nice if i check for  something slightly more complex, such as.... 
> header        TESTRULE2         ALL  =~   /From=.*pedro.*  To=.*pedro.*/ism
> but i am trying to understand  how it works... and why i only see one line in Debug mode...
> Thx,

"." apparently doesn't match line breaks (I'm sure that's documented 
somewhere in the RE language spec but I can't be bothered to dig it up 
right now :) ).

There are two ways to do this:

# All headers, one per hit
header   __ALL_HEADERS        ALL =~ /.+/sm
tflags   __ALL_HEADERS        multiple

# All headers together in one hit
header   __ALL_HEADERS_ALL    ALL =~ /(?:.+$)+/sm
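
For anyone who wants to check the regex semantics outside SpamAssassin, here is a minimal Python sketch. It assumes that Python's re.DOTALL and re.MULTILINE behave like Perl's /s and /m for these patterns, and the header block is made up. It only shows what the regex engine matches, not what SpamAssassin reports or logs.

```python
import re

# Made-up header block standing in for the ALL string (an assumption).
HEADERS = (
    "Received: from mail.example.com\n"
    "From: pedro@example.com\n"
    "To: pedro@example.org\n"
    "Subject: test\n"
)

def hits(pattern, flags=0):
    """Return every non-overlapping match of pattern in HEADERS."""
    return [m.group(0) for m in re.finditer(pattern, HEADERS, flags)]

# Without DOTALL (Perl /s), "." stops at each newline: one hit per header line.
per_line = hits(r".+", re.MULTILINE)

# With DOTALL, ".+" runs across newlines and consumes the whole block at once.
whole_block = hits(r".+", re.DOTALL | re.MULTILINE)

# The pattern from the second rule also yields one hit spanning all headers.
grouped = hits(r"(?:.+$)+", re.DOTALL | re.MULTILINE)

print(len(per_line), len(whole_block), len(grouped))
```

In this sketch the s flag makes ".+" swallow the entire block in a single match, so getting one hit per header line requires dropping that flag. Whether SpamAssassin's "multiple" handling behaves the same way is not verified here.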


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   USMC Rules of Gunfighting #6: If you can choose what to bring
   to a gunfight, bring a long gun and a friend with a long gun.
-----------------------------------------------------------------------
  Tomorrow: The 77th anniversary of Pearl Harbor

Re: Understanding header ALL

Posted by Benny Pedersen <me...@junc.eu>.
Pedro David Marco wrote on 2018-12-06 22:29:

> if your rule worked, it would only match FROM or TO... the great
> advantage of the ALL is that i "sees" all headers in one string so we
> can match FROM  'and'   TO at the same time

I know from my own rules that it is sometimes useful to limit the data
to what is wanted :=)

All headers together could be up to 64 KB, if I remember the SMTP specs
well, so I would check this to see if I am missing something. I am too
tired today to test it. My own interest has turned to writing SMTP test
rules for SpamAssassin, inspired by rspamd. I will not disclose them,
since spammers might be listening here to what I do. I stand behind open
source, but I will not help spammers game it.

I had been using rspamd but lost interest in it: it was too complicated
for me to manage, and UCL was and is not well supported on Linux. Why
did they not use XML, where there are plenty of tools to build, edit,
and manage it? Thanks to SpamAssassin, it is not that complicated to get
things working.

Re: Understanding header ALL

Posted by Henrik K <he...@hege.li>.
Why do you need to match them at the same time?  Using meta would be more
effective.  Also this doesn't care what order From and To appear in headers.

header __FOO1 From =~ /pedro/i
header __FOO2 To =~ /pedro/i
meta FOO __FOO1 && __FOO2
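
To illustrate the logic of the meta approach outside SpamAssassin, here is a rough Python analogue. The function name and sample data are made up for illustration; it uses the standard email module to pull out each header and is not how SpamAssassin evaluates rules internally.

```python
import re
from email import message_from_string

def from_and_to_match(raw_headers, needle=r"pedro"):
    """True when both From and To contain the pattern, in any header order."""
    msg = message_from_string(raw_headers)
    pattern = re.compile(needle, re.IGNORECASE)
    # Each header is tested on its own, like __FOO1 and __FOO2 above.
    foo1 = bool(pattern.search(msg.get("From", "")))
    foo2 = bool(pattern.search(msg.get("To", "")))
    # The meta rule is the logical AND of the two sub-results.
    return foo1 and foo2

print(from_and_to_match("To: pedro@example.org\nFrom: pedro@example.com\n\n"))
```

Because each header is looked up by name, the order in which From and To appear does not matter.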

But just to put it out there, if you want to capture and match things from
another header, the proper way would be with positive lookaheads; header
order won't matter then.  This captures "pedro" from From: and requires the
same text in To:.

header FOO ALL =~ /^(?=.*?\nFrom:[^\n]*(pedro))(?=.*?\nTo:[^\n]*\1)/si
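
The lookahead pattern can be tried on its own in Python, as a sketch. It assumes re.S and re.I match Perl's /s and /i here, and the header blocks are made up. It says nothing about how SpamAssassin builds the ALL string.

```python
import re

# Same pattern as the rule above; re.S and re.I stand in for Perl's /s and /i.
PATTERN = re.compile(
    r"^(?=.*?\nFrom:[^\n]*(pedro))(?=.*?\nTo:[^\n]*\1)",
    re.S | re.I,
)

def same_text_in_from_and_to(header_block):
    """True when From: contains 'pedro' and To: contains the captured text."""
    return PATTERN.search(header_block) is not None

from_first = "Received: x\nFrom: pedro@example.com\nTo: pedro@example.org\n"
to_first = "Received: x\nTo: pedro@example.org\nFrom: pedro@example.com\n"
print(same_text_in_from_and_to(from_first), same_text_in_from_and_to(to_first))
```

Both lookaheads start from the beginning of the string, which is why the order of From and To does not matter. One thing the sketch makes visible: the pattern expects a newline before "From:" and "To:", so in plain Python it does not match when either is the very first line of the block. Whether that can happen with SpamAssassin's ALL string is not verified here.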


On Thu, Dec 06, 2018 at 09:29:40PM +0000, Pedro David Marco wrote:
> Thanks Benny,
> 
> if your rule worked, it would only match FROM or TO... the great advantage of
> the ALL is that i "sees" all headers in one string so we can match FROM  'and' 
>  TO at the same time
> 
> ------
> PedroD
> 
> 
> 
> On Thursday, December 6, 2018, 10:23:17 PM GMT+1, Benny Pedersen <me...@junc.eu>
> wrote:
> 
> 
> Pedro David Marco skrev den 2018-12-06 21:25:
> 
> 
> > header        TESTRULE2        ALL  =~  /From=.*pedro.*
> > To=.*pedro.*/ism
> > This is a mistery...  :-?
> 
> 
> header TESTRULE (From|To) =~ /\.*pedro\.*/ism
> 
> dont know if it works, just my silly thinking right now
> 

Re: Understanding header ALL

Posted by Pedro David Marco <pe...@yahoo.com>.
 Thanks Benny,
If your rule worked, it would only match FROM or TO... The great advantage of ALL is that it "sees" all headers in one string, so we can match FROM 'and' TO at the same time.
------PedroD


    On Thursday, December 6, 2018, 10:23:17 PM GMT+1, Benny Pedersen <me...@junc.eu> wrote:  
 
> Pedro David Marco skrev den 2018-12-06 21:25:
>
> > header        TESTRULE2        ALL  =~  /From=.*pedro.*
> > To=.*pedro.*/ism
> > This is a mistery...  :-?
>
> header TESTRULE (From|To) =~ /\.*pedro\.*/ism
>
> dont know if it works, just my silly thinking right now
  

Re: Understanding header ALL

Posted by Benny Pedersen <me...@junc.eu>.
Pedro David Marco wrote on 2018-12-06 21:25:

> header        TESTRULE2         ALL  =~   /From=.*pedro.* 
> To=.*pedro.*/ism
> This is a mistery...  :-?

header TESTRULE (From|To) =~ /\.*pedro\.*/ism

I don't know if it works; just my silly thinking right now

Re: Understanding header ALL

Posted by John Hardin <jh...@impsec.org>.
On Fri, 7 Dec 2018, Bill Cole wrote:

> This is entirely a debug message artifact. In fact, '/.+/' will match the 
> entire header block, however the 'dbg()' function won't print all of that, 
> apparently due to an expansion artifact in Mail::SpamAssassin::Logger

Aha! Thanks for explaining that!

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  7 days until Bill of Rights day

Re: Understanding header ALL

Posted by Pedro David Marco <pe...@yahoo.com>.
 $BillCole++ ;   # :-)
Thanks Bill.. that was my concern and what I was suspecting...
----------Pedro.D
    On Saturday, December 8, 2018, 3:59:12 AM GMT+1, Bill Cole <sa...@billmail.scconsult.com> wrote:  
 
> On 6 Dec 2018, at 15:25, Pedro David Marco wrote:
>
> >  Thanks Bill and John...
> > Your words make sense to me. It seems that ALL means that SA puts all
> > headers into a Perl string (including \n chars) and tries the regex...
> > As John Hardin correctly states,  a dot does not match  the \n  but
> > this is changed with the "s" regex flag.
> > In fact it works like a charm if i try a rule like this:
> >    header        TESTRULE2         ALL  =~
> >  /From=.*pedro.*  To=.*pedro.*/ism
> > This is a mistery...  :-?
>
> No mystery: misunderstanding. I thought you were expecting multiple
> hits, but now I realize that you are just asking about the debug
> message.
>
> This is entirely a debug message artifact. In fact, '/.+/' will match
> the entire header block, however the 'dbg()' function won't print all of
> that, apparently due to an expansion artifact in
> Mail::SpamAssassin::Logger
>
>
> -- 
> Bill Cole
> bill@scconsult.com or billcole@apache.org
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Available For Hire: https://linkedin.com/in/billcole
  

Re: Understanding header ALL

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 6 Dec 2018, at 15:25, Pedro David Marco wrote:

>  Thanks Bill and John...
> Your words make sense to me. It seems that ALL means that SA puts all 
> headers into a Perl string (including \n chars) and tries the regex...
> As John Hardin correctly states,  a dot does not match  the \n  but 
> this is changed with the "s" regex flag.  
> In fact it works like a charm if i try a rule like this:
>    header        TESTRULE2         ALL  =~  
>  /From=.*pedro.*  To=.*pedro.*/ism 
> This is a mistery...  :-?

No mystery: misunderstanding. I thought you were expecting multiple 
hits, but now I realize that you are just asking about the debug 
message.

This is entirely a debug message artifact. In fact, '/.+/' will match 
the entire header block, however the 'dbg()' function won't print all of 
that, apparently due to an expansion artifact in 
Mail::SpamAssassin::Logger


-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole

Re: Understanding header ALL

Posted by Pedro David Marco <pe...@yahoo.com>.
 Thanks Bill and John...
Your words make sense to me. It seems that ALL means that SA puts all headers into a Perl string (including \n chars) and tries the regex...
As John Hardin correctly states, a dot does not match \n, but this is changed by the "s" regex flag.
In fact, it works like a charm if I try a rule like this:
   header        TESTRULE2         ALL  =~   /From=.*pedro.*  To=.*pedro.*/ism 
This is a mystery...  :-?
Thanks to all...
---PedroD



    On Thursday, December 6, 2018, 8:32:46 PM GMT+1, Bill Cole <sa...@billmail.scconsult.com> wrote:  
 
> On 6 Dec 2018, at 13:36, Pedro David Marco wrote:
>
> >  Thanks a lot Bill..
> > i already considered the "multiple" flag and it did not work
> > either...   i mean... the rule works but i only see the first line
> > in Debug mode...
> > ----Pedrod
>
> Having pondered this for a bit and looked at unhelpful docs, I *think* I
> understand what's going on.
>
> You cannot get multiple hits from an ALL rule because the regex is
> matched against the whole block of headers. Once it matches, the test is
> done.
>
> It might make sense to add an "ANY" pseudo-header that tests against
> each header, rather than "ALL" which tests against the whole text of all
> the headers.
>
>
> -- 
> Bill Cole
> bill@scconsult.com or billcole@apache.org
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Available For Hire: https://linkedin.com/in/billcole
  

Re: Understanding header ALL

Posted by RW <rw...@googlemail.com>.
On Fri, 07 Dec 2018 09:14:11 -0500
Bill Cole wrote:

> On 7 Dec 2018, at 8:33, RW wrote:
> 
> > On Thu, 06 Dec 2018 14:32:37 -0500
> > Bill Cole wrote:
> >  
> >> You cannot get multiple hits from an ALL rule because the regex is
> >> matched against the whole block of headers. Once it matches, the
> >> test is done.  
> >
> > Just for the record, that isn't a limitation of "multiple"  
> 
> Right. It's inherent in the logic of the "ALL" pseudo-header: an 
> aggregate of all headers, not an array of discrete headers.

I wouldn't expect it to make any difference. In the body each paragraph
is a separate string (unfortunately), and hit counts are aggregated
across all of them.

Re: Understanding header ALL

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 7 Dec 2018, at 8:33, RW wrote:

> On Thu, 06 Dec 2018 14:32:37 -0500
> Bill Cole wrote:
>
>> You cannot get multiple hits from an ALL rule because the regex is
>> matched against the whole block of headers. Once it matches, the test
>> is done.
>
> Just for the record, that isn't a limitation of "multiple"

Right. It's inherent in the logic of the "ALL" pseudo-header: an 
aggregate of all headers, not an array of discrete headers.


Re: Understanding header ALL

Posted by RW <rw...@googlemail.com>.
On Thu, 06 Dec 2018 14:32:37 -0500
Bill Cole wrote:

> You cannot get multiple hits from an ALL rule because the regex is 
> matched against the whole block of headers. Once it matches, the test
> is done.

Just for the record, that isn't a limitation of "multiple"

header   T_TEST1   Subject =~ /\w+/
tflags   T_TEST1   multiple


$ echo "Subject: Mary had a little lamb" | spamassassin -D 2>&1 | grep -o  'T_TEST1.*'  
T_TEST1 ======> got hit: "Mary"
T_TEST1 ======> got hit: "had"
T_TEST1 ======> got hit: "a"
T_TEST1 ======> got hit: "little"
T_TEST1 ======> got hit: "lamb"
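
The same repeated matching can be reproduced with a plain regex engine. This Python sketch is only an analogue of what "multiple" does with the pattern; the function name is made up and it does not call SpamAssassin.

```python
import re

def all_hits(subject, pattern=r"\w+"):
    """Return every non-overlapping match, one entry per hit."""
    return re.findall(pattern, subject)

print(all_hits("Mary had a little lamb"))
# prints ['Mary', 'had', 'a', 'little', 'lamb']
```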

Re: Understanding header ALL

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 6 Dec 2018, at 13:36, Pedro David Marco wrote:

>  Thanks a lot Bill..
> i already considered the "multiple" flag and it did not work 
> either...   i mean... the rule works but i only see the first line 
> in Debug mode...
> ----Pedrod

Having pondered this for a bit and looked at unhelpful docs, I *think* I 
understand what's going on.

You cannot get multiple hits from an ALL rule because the regex is 
matched against the whole block of headers. Once it matches, the test is 
done.

It might make sense to add an "ANY" pseudo-header that tests against 
each header, rather than "ALL" which tests against the whole text of all 
the headers.
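
A small Python sketch of the difference being described, under stated assumptions: match_all mimics testing one joined string, while match_any is a purely hypothetical illustration of the suggested "ANY" behaviour. Neither function exists in SpamAssassin, and the header list is made up.

```python
import re

def match_all(header_lines, pattern, flags=0):
    """ALL-style: a single search over the headers joined into one string."""
    blob = "".join(line + "\n" for line in header_lines)
    m = re.search(pattern, blob, flags)
    return [m.group(0)] if m else []

def match_any(header_lines, pattern, flags=0):
    """Hypothetical ANY-style: search each header line separately."""
    found = []
    for line in header_lines:
        m = re.search(pattern, line, flags)
        if m:
            found.append(m.group(0))
    return found

headers = ["From: pedro@example.com", "To: pedro@example.org", "Subject: hi"]
print(match_all(headers, r"pedro"), match_any(headers, r"pedro"))
```

The trade-off shows up with patterns that span headers: something like From:.*To: with the s flag can only succeed against the joined string, while the per-header variant can report one hit per matching header.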


-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole

Re: Understanding header ALL

Posted by Pedro David Marco <pe...@yahoo.com>.
 Thanks a lot Bill..
I already considered the "multiple" flag and it did not work either... I mean, the rule works, but I only see the first line in debug mode...
----Pedrod




    On Thursday, December 6, 2018, 7:21:46 PM GMT+1, Bill Cole <sa...@billmail.scconsult.com> wrote:  
 
> On 6 Dec 2018, at 12:52, Pedro David Marco wrote:
>
> > Hi,
> > i need some wisdom from SA monks please...
> > Can anyone explain briefly how header ALL work?
> > if i try a rule like this:
> > header        TESTRULE1         ALL   =~    /.+/ism
> > Using -D debug mode i only "see"  the first header of the email...
> > shouldn't i see all headers?
> >
> > it works nice if i check for  something slightly more complex, such
> > as....
> > header        TESTRULE2         ALL  =~
> >  /From=.*pedro.*  To=.*pedro.*/ism
> > but i am trying to understand  how it works... and why i only see one
> > line in Debug mode...
> > Thx,
> > --------PedroD
>
>
> For a rule to match more than once per message, it needs to have the
> 'multiple' tflag set, e.g.:
>
> tflags  TESTRULE1  multiple maxhits=50
>
> (It's generally wise to set *some* 'maxhits' value on a 'multiple' rule,
> since it can save you from runaway scanning of pathological messages.)
>
> -- 
> Bill Cole
> bill@scconsult.com or billcole@apache.org
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Available For Hire: https://linkedin.com/in/billcole
  

Re: Understanding header ALL

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 6 Dec 2018, at 12:52, Pedro David Marco wrote:

> Hi,
> i need some wisdom from SA monks please...
> Can anyone explain briefly how header ALL work?
> if i try a rule like this:
> header        TESTRULE1         ALL   =~    /.+/ism
> Using -D debug mode i only "see"  the first header of the email... 
> shouldn't i see all headers?
>
> it works nice if i check for  something slightly more complex, such 
> as.... 
> header        TESTRULE2         ALL  =~  
>  /From=.*pedro.*  To=.*pedro.*/ism
> but i am trying to understand  how it works... and why i only see one 
> line in Debug mode...
> Thx,
> --------PedroD


For a rule to match more than once per message, it needs to have the 
'multiple' tflag set, e.g.:

tflags  TESTRULE1  multiple maxhits=50

(It's generally wise to set *some* 'maxhits' value on a 'multiple' rule, 
since it can save you from runaway scanning of pathological messages.)
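
The idea behind capping hits can be sketched in Python. This is only an analogue of what a 'maxhits' limit buys you, not SpamAssassin's implementation; the function name and inputs are made up.

```python
import re
from itertools import islice

def capped_hits(pattern, text, maxhits=50):
    """Collect at most maxhits matches, then stop scanning."""
    matches = re.finditer(pattern, text)  # lazy: matches are found on demand
    return [m.group(0) for m in islice(matches, maxhits)]

# A pathological input with thousands of possible hits is cut off at the cap.
print(len(capped_hits(r"\w+", "spam " * 10000, maxhits=50)))
```

Because the iterator is lazy, the scan stops once the cap is reached instead of walking the rest of the input.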

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole