Posted to users@spamassassin.apache.org by Pedro David Marco <pe...@yahoo.com> on 2020/07/07 10:18:30 UTC

Multiple regex on same URL

I have written a small, simple patch (tested only in SA 3.4.2 so far, sorry) to check up to three regular expressions against the "same" URL. It seems to work well, but... any crazy (with all due respect) volunteers for checks, tests, etc.?
Disclaimer: I am not a super Perl developer, so the code may be ugly for Perl monks :-(  sorry..
Regards,
-----------Pedro.





Re: Multiple regex on same URL

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 07.07.20 10:18, Pedro David Marco wrote:
> I have written a small simple patch (tested in SA 3.4.2 so far, sorry) to
> be able to check up to three regex expressions on the "same" URL.  It
> seems to work well, but...  any crazy (with all respects) volunteer for
> checks..  tests...  etc?

>Disclaimer: I am not a super Perl developer, so the code may be ugly for perl monks :-(  sorry..
>Regards,
>-----------Pedro.

try posting the patch or a link to it. 

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]

Re: Multiple regex on same URL

Posted by John Hardin <jh...@impsec.org>.
On Tue, 7 Jul 2020, Martin Gregorie wrote:

> On Tue, 2020-07-07 at 20:39 +0000, Pedro David Marco wrote:
>>
>>
>>   >On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <
>> hege@hege.li> wrote:
>>
>>> Also newer SpamAssassin already has URIDetail plugin which can also
>>> do what you want:
>>>   uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/
>>> ...
>> if it uses the same key more than once, then uri_detail joins them
>> with "OR", but we need an "AND"
>> -----Pedro
>>
> That should be easy enough to do with a metarule:
>
> uri   __SUBRULE1 /(URL alternateslist1)/
> uri   __SUBRULE2 /(URL alternateslist2)/
> meta  MYMETARULE (__SUBRULE1 && __SUBRULE2)
> score MYMETARULE 6.0

Unfortunately there's no way to enforce them being checked together on the 
*same* URI: uri1 could hit SR1 and uri2 could hit SR2 and the meta would fire, but it 
would be inappropriate.

The (?=...)(?!...) construct is better.
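
To illustrate the difference, here is a minimal sketch in Python, whose `re` module is used only as a stand-in for the Perl regexes SpamAssassin runs. The patterns, the sample URIs and the helper names `meta_style` and `same_uri` are all made up for illustration; they are not taken from any real ruleset. Two positive lookaheads are used to express "AND" on one URI; swapping one of them for `(?!...)` gives the "matches this but not that" variant.

```python
import re

# Hypothetical patterns standing in for the two sub-rules discussed above.
SUBRULE1 = re.compile(r"login")
SUBRULE2 = re.compile(r"\.example\.net/")

# Single-pattern alternative: both conditions are checked against the same string.
COMBINED = re.compile(r"^(?=.*?login)(?=.*?\.example\.net/)")

def meta_style(uris):
    """Mimics the meta rule: each sub-rule may be satisfied by a different URI."""
    return any(SUBRULE1.search(u) for u in uris) and any(SUBRULE2.search(u) for u in uris)

def same_uri(uris):
    """True only if one single URI satisfies both conditions."""
    return any(COMBINED.search(u) for u in uris)

# Two different URIs, each satisfying only one of the conditions.
uris = ["http://a.example.com/login", "http://b.example.net/index.html"]

print(meta_style(uris))  # the meta-style check fires across the two URIs
print(same_uri(uris))    # the lookahead version does not
```

With these made-up inputs the meta-style check fires even though no single URI satisfies both conditions, which is the false positive described above.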


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   We have to realize that people who run the government can and do
   change. Our society and laws must assume that bad people -
   criminals even - will run the government, at least part of the
   time.                                               -- John Gilmore
-----------------------------------------------------------------------
  Today: Robert Heinlein's 113th birthday

Re: Multiple regex on same URL

Posted by Pedro David Marco <pe...@yahoo.com>.
 

   >On Wednesday, July 8, 2020, 12:28:37 AM GMT+2, Martin Gregorie <ma...@gregorie.org> wrote:  
>I didn't spot the requirement that the URIs must match: I read your
>requirement as being that two matches from a group of URLs within a
>defined set or with the same second level domain would do. My mistake.

Probably my fault, Martin.. my "English" leaves much to be desired...

>Might it be easier to define and implement with a decent RDBMS and a
>clever SQL query? 
The simplest way has been to patch the uri_detail plugin so it can combine multiple identical keys with OR or AND on demand... :-)
----Pedro

  

Re: Multiple regex on same URL

Posted by John Hardin <jh...@impsec.org>.
On Tue, 7 Jul 2020, Martin Gregorie wrote:

> On Tue, 2020-07-07 at 22:07 +0000, Pedro David Marco wrote:
>> Thanks Martin, but the meta may be positive if one URL triggers
>> SUBRULE1 and another different URL triggers SUBRULE2...
>>  how can you be sure both SUBRULES are positive in the "same" URL?
>>
> I didn't spot the requirement that the URIs must match: I read your
> requirement as being that two matches from a group of URLs within a
> defined set or with the same second level domain would do. My mistake.
>
> Might it be easier to define and implement with a decent RDBMS and a
> clever SQL query?

Ugh, no.

The (?=...)(?!...) construct is a good way, but if you use * or + you need to be
careful to avoid the possibility of a backtracking DoS - use the "non-greedy"
version. However, that weakness is smaller as we're looking at URIs rather
than the entire message body - there's less to potentially backtrack over.

I suggest the positive match first, then the negative match, as the
positive match will probably occur in only a small percentage of URIs
scanned and will thus generally fail and short-circuit the evaluation of
the (much more likely to hit) negative lookahead match.
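
A rough sketch of that ordering, again using Python's `re` module as a stand-in for Perl and reusing the placeholder tokens `findthis` and `donotfind` from Henrik's example in this thread. The sample URIs and the helper `hits` are invented for illustration. The sketch only shows that both orderings accept and reject the same URIs; it makes no claim about timing, which depends on the regex engine and the input.

```python
import re

# Positive lookahead first, then the negative one, both with non-greedy quantifiers.
POSITIVE_FIRST = re.compile(r"^(?=.*?findthis)(?!.*?donotfind)")
# The opposite ordering, for comparison.
NEGATIVE_FIRST = re.compile(r"^(?!.*?donotfind)(?=.*?findthis)")

def hits(pattern, uri):
    """Return True if the pattern matches the URI."""
    return pattern.search(uri) is not None

# Made-up sample URIs covering the four combinations of the two tokens.
samples = [
    "http://example.com/findthis",
    "http://example.com/findthis/donotfind",
    "http://example.com/other",
    "http://example.com/donotfind",
]

# Each tuple holds (positive-first result, negative-first result) for one URI.
results = [(hits(POSITIVE_FIRST, u), hits(NEGATIVE_FIRST, u)) for u in samples]
print(results)
```

Only the first sample matches under either ordering, so choosing positive-first changes how much work is done on non-matching URIs, not which URIs hit.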


Re: Multiple regex on same URL

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2020-07-07 at 22:07 +0000, Pedro David Marco wrote:
> Thanks Martin, but the meta may be positive if one URL triggers
> SUBRULE1 and another different URL triggers SUBRULE2...
>  how can you be sure both SUBRULES are positive in the "same" URL?
>
I didn't spot the requirement that the URIs must match: I read your
requirement as being that two matches from a group of URLs within a
defined set or with the same second level domain would do. My mistake.

Might it be easier to define and implement with a decent RDBMS and a
clever SQL query? 

Martin



Re: Multiple regex on same URL

Posted by Pedro David Marco <pe...@yahoo.com>.
 

   >On Tuesday, July 7, 2020, 11:56:22 PM GMT+2, Martin Gregorie <ma...@gregorie.org> wrote:  
 
> That should be easy enough to do with a metarule:

>uri  __SUBRULE1 /(URL alternateslist1)/
>uri  __SUBRULE2 /(URL alternateslist2)/
>meta  MYMETARULE (__SUBRULE1 && __SUBRULE2)
>score MYMETARULE 6.0

>...or something like that

>Martin
Thanks Martin, but the meta may be positive if one URL triggers SUBRULE1 and another different URL triggers SUBRULE2...
 how can you be sure both SUBRULES are positive in the "same" URL?
-----Pedro






  

Re: Multiple regex on same URL

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2020-07-07 at 20:39 +0000, Pedro David Marco wrote:
>  
> 
>    >On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <
> hege@hege.li> wrote:  
>  
> > Also newer SpamAssassin already has URIDetail plugin which can also
> > do what you want:
> >   uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/
> > ...
> if it uses the same key more than once, then uri_detail joins them
> with "OR", but we need an "AND" 
> -----Pedro
> 
That should be easy enough to do with a metarule:

uri   __SUBRULE1 /(URL alternateslist1)/
uri   __SUBRULE2 /(URL alternateslist2)/
meta  MYMETARULE (__SUBRULE1 && __SUBRULE2)
score MYMETARULE 6.0

...or something like that

Martin



Re: Multiple regex on same URL

Posted by Pedro David Marco <pe...@yahoo.com>.
 

   >On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <he...@hege.li> wrote:  
 
>Also newer SpamAssassin already has URIDetail plugin which can also do what you want:

>  uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/ ...
if it uses the same key more than once, then uri_detail joins them with "OR", but we need an "AND" 
-----Pedro


  

Re: Multiple regex on same URL

Posted by "@lbutlr" <kr...@kreme.com>.
On 07 Jul 2020, at 07:16, Henrik K <he...@hege.li> wrote:
> On Tue, Jul 07, 2020 at 11:41:01AM +0000, Pedro David Marco wrote:
>> 
>>> On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <he...@hege.li> wrote:
>> 
>> 
>>> What exactly do you mean by checking multiple regex on the "same" URL?  Give
>> an example.  Most likely it's already possible without any changes.
>> 
>> 
>> for example..  checking if a URL matches Regex1 BUT does NOT match Regex2
>> can be done with lookahead/lookbehind but is cpu-expensive and may be too complex
>> to maintain...
> 
> Why would lookahead be expensive?  It's normal regex.  It's probably more
> expensive to run two separate regexes.

Is the ReDoS attack relevant here?

<https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS>
"The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time."



-- 
Once upon a time, a woman was picking up firewood. She came upon a
	poisonous snake frozen in the snow. She took the snake home and
	nurse it back to health. One day the snake bit her on the cheek.
	As she lay dying, she asked the snake, "Why have you done this to
	me?" And the snake answered, "Look, bitch, you knew I was a
	snake."


Re: Multiple regex on same URL

Posted by Henrik K <he...@hege.li>.
On Tue, Jul 07, 2020 at 11:41:01AM +0000, Pedro David Marco wrote:
> 
> >On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <he...@hege.li> wrote:
> 
> 
> >What exactly do you mean by checking multiple regex on the "same" URL?  Give
> an example.  Most likely it's already possible without any changes.
> 
> 
> for example..  checking if a URL matches Regex1 BUT does NOT match Regex2
> can be done with lookahead/lookbehind but is cpu-expensive and may be too complex
> to maintain...

Why would lookahead be expensive?  It's normal regex.  It's probably more
expensive to run two separate regexes.

uri FOO /^(?!.*?donotfind)(?=.*?findthis)/

Also newer SpamAssassin already has URIDetail plugin which can also do what
you want:

  uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/ ...


Re: Multiple regex on same URL

Posted by Pedro David Marco <pe...@yahoo.com>.
 

   >On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <he...@hege.li> wrote:  
 
>What exactly do you mean by checking multiple regex on the "same" URL?  Give an example.  Most likely it's already possible without any changes.

for example..  checking if a URL matches Regex1 BUT does NOT match Regex2 can be done with lookahead/lookbehind but is cpu-expensive and may be too complex to maintain...

----Pedro 


  

Re: Multiple regex on same URL

Posted by Henrik K <he...@hege.li>.
On Tue, Jul 07, 2020 at 10:18:30AM +0000, Pedro David Marco wrote:
> I have written a small simple patch (tested in SA 3.4.2 so far, sorry) to be
> able to check up to three regex expressions on the "same" URL. It seems to work
> well
> but... any crazy (with all respects) volunteer for checks.. tests... etc?
> 
> Disclaimer: I am not a super Perl developer, so the code may be ugly for perl
> monks :-(  sorry..

What exactly do you mean by checking multiple regex on the "same" URL?  Give
an example.  Most likely it's already possible without any changes.