Posted to users@spamassassin.apache.org by Marc Perkel <su...@junkemailfilter.com> on 2014/02/18 18:22:19 UTC

regex help

I'm trying to do something complex and am not sure how it's done. What I'm 
looking for is to combine 2 conditions in a single regular expression so 
that both have to be true for a match. Yes - I know I can make 2 SA 
rules and combine them, but I bet there's a way to do it in one 
expression. For simplicity, here's the challenge.

A chunk of text has to include the word "cat" 5 times and the word "dog" 
4 times to be a match. How do you do that?

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400


Re: regex help

Posted by Joe Quinn <jq...@pccc.com>.
On 2/18/2014 12:22 PM, Marc Perkel wrote:
> Trying to do something complex and not sure how it's done. What I'm 
> looking for is to combine 2 conditions in a single regular expression 
> so that both have to be true for a match. Yes - I know I can make 2 SA 
> rules and combine them but I bet there's a way to do it in one 
> expression. For simplicity here's the challenge.
>
> A chunk of text has to include the word "cat" 5 times and the word 
> "dog" 4 times to be a match. How do you do that?
>
Take the FSM approach. You have thirty states to deal with, specifically 
{found X cats, Y dogs | X <- {0..5}, Y <- {0..4}}. The transitions 
between states are simple: each "cat" moves X up by one and each "dog" 
moves Y up by one, stopping at 5 and 4, and the only accepting state is 
(5, 4). Naively translating to a regex, you get a very, very long pattern.

http://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions

As John alludes to, though it is technically possible, you really don't 
want to do this. There are much more maintainable and composable ways to 
write what you need.
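
For illustration only, here is a minimal Python sketch of that state 
machine, run directly instead of being translated to a regex. The names 
(fsm_accepts, CATS_NEEDED, DOGS_NEEDED, STATE_COUNT) are made up for this 
example, and it assumes whole-word, case-sensitive matches.

```python
import re

CATS_NEEDED = 5
DOGS_NEEDED = 4

# One state per (cats seen, dogs seen) pair: 6 cat counts x 5 dog counts.
STATE_COUNT = (CATS_NEEDED + 1) * (DOGS_NEEDED + 1)


def fsm_accepts(text: str) -> bool:
    """Walk the (cats, dogs) state grid and accept on reaching (5, 4)."""
    cats, dogs = 0, 0  # start state: nothing seen yet
    for word in re.findall(r"\b(?:cat|dog)\b", text):
        if word == "cat":
            cats = min(cats + 1, CATS_NEEDED)  # saturate at the target
        else:
            dogs = min(dogs + 1, DOGS_NEEDED)
        if (cats, dogs) == (CATS_NEEDED, DOGS_NEEDED):
            return True  # the single accepting state
    return False
```

Keeping the counts as plain integers is what makes this readable; the 
equivalent regex has to spell out every path through the grid.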

Re: regex help

Posted by Bowie Bailey <Bo...@BUC.com>.
On 2/18/2014 1:26 PM, Marc Perkel wrote:
> On 2/18/2014 9:32 AM, John Hardin wrote:
>> On Tue, 18 Feb 2014, Marc Perkel wrote:
>>
>>> Trying to do something complex and not sure how it's done. What I'm
>>> looking for is to combine 2 conditions in a single regular expression
>>> so that both have to be true for a match. Yes - I know I can make 2
>>> SA rules and combine them but I bet there's a way to do it in one
>>> expression. For simplicity here's the challenge.
>>>
>>> A chunk of text has to include the word "cat" 5 times and the word
>>> "dog" 4 times to be a match. How do you do that?
>> I assume there must be no restrictions on the order the occurrences
>> appear? That makes it rather difficult, and thus expensive.
>>
>> Two individual simple "tflags multiple maxhits=N" REs and a meta to
>> combine them would be much more efficient than a single RE. Is this
>> just an intellectual exercise, and/or something not limited to the SA
>> environment?
>>
> Yes - no order - it is expensive - don't care. It needs to be a single regex.
>

Try this:

(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}

It will match on a string with at least 5 instances of "cat" and at 
least 4 of "dog".  The "\b" anchors ensure that it will not match words 
like "catapult" or "dogma" -- it will also ignore strings like "catcat" 
or "dogcat".  Remove them if you don't care about partial word matches.

If you want to match only strings with exactly 5 instances of "cat" and 
exactly 4 of "dog", that will be more complicated.
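
A quick way to try the pattern outside SA is a small Python sketch like 
the one here. The helper name has_cats_and_dogs is made up for this 
example, and re.DOTALL is my own addition so that "." also crosses line 
breaks; drop it if your text is a single line.

```python
import re

# The lookahead checks for 5 whole-word "cat" hits without consuming text;
# the rest of the pattern then looks for 4 whole-word "dog" hits.
PATTERN = re.compile(
    r"(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}",
    re.DOTALL,  # assumption: let "." match newlines too
)


def has_cats_and_dogs(text: str) -> bool:
    """Return True if text has at least 5 "cat" and at least 4 "dog"."""
    return PATTERN.search(text) is not None
```

Because the lookahead does not consume anything, the two counts are 
checked independently and the order of the words does not matter.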

-- 
Bowie

Re: regex help

Posted by Marc Perkel <su...@junkemailfilter.com>.
On 2/18/2014 9:32 AM, John Hardin wrote:
> On Tue, 18 Feb 2014, Marc Perkel wrote:
>
>> Trying to do something complex and not sure how it's done. What I'm 
>> looking for is to combine 2 conditions in a single regular expression 
>> so that both have to be true for a match. Yes - I know I can make 2 
>> SA rules and combine them but I bet there's a way to do it in one 
>> expression. For simplicity here's the challenge.
>>
>> A chunk of text has to include the word "cat" 5 times and the word 
>> "dog" 4 times to be a match. How do you do that?
>
> I assume there must be no restrictions on the order the occurrences 
> appear? That makes it rather difficult, and thus expensive.
>
> Two individual simple "tflags multiple maxhits=N" REs and a meta to 
> combine them would be much more efficient than a single RE. Is this 
> just an intellectual exercise, and/or something not limited to the SA 
> environment?
>

Yes - no order - it is expensive - don't care. It needs to be a single regex.



Re: regex help

Posted by John Hardin <jh...@impsec.org>.
On Tue, 18 Feb 2014, Marc Perkel wrote:

> Trying to do something complex and not sure how it's done. What I'm looking 
> for is to combine 2 conditions in a single regular expression so that both 
> have to be true for a match. Yes - I know I can make 2 SA rules and combine 
> them but I bet there's a way to do it in one expression. For simplicity 
> here's the challenge.
>
> A chunk of text has to include the word "cat" 5 times and the word "dog" 4 
> times to be a match. How do you do that?

I assume there must be no restrictions on the order the occurrences 
appear? That makes it rather difficult, and thus expensive.

Two individual simple "tflags multiple maxhits=N" REs and a meta to 
combine them would be much more efficient than a single RE. Is this just 
an intellectual exercise, and/or something not limited to the SA 
environment?
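
To show the idea outside SA, here is a rough Python analogy of two 
capped counts combined by a final check. It is not SA rule syntax, and 
the names capped_hits and meta_cat_dog are made up for this sketch.

```python
import re


def capped_hits(pattern: str, text: str, maxhits: int) -> int:
    """Count matches of pattern, but stop scanning once maxhits is reached."""
    count = 0
    for _ in re.finditer(pattern, text):
        count += 1
        if count >= maxhits:
            break  # no need to look further, like a capped multi-hit rule
    return count


def meta_cat_dog(text: str) -> bool:
    """Combine the two simple counts, the way a meta rule would."""
    cat_hits = capped_hits(r"\bcat\b", text, 5)
    dog_hits = capped_hits(r"\bdog\b", text, 4)
    return cat_hits >= 5 and dog_hits >= 4
```

Each pattern makes one pass over the text with no backtracking across 
the other word, which is where the efficiency gain comes from.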

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79