Posted to users@spamassassin.apache.org by Marc Perkel <su...@junkemailfilter.com> on 2014/02/18 18:22:19 UTC
regex help
Trying to do something complex and not sure how it's done. What I'm
looking for is to combine 2 conditions in a single regular expression so
that both have to be true for a match. Yes - I know I can make 2 SA
rules and combine them but I bet there's a way to do it in one
expression. For simplicity here's the challenge.
A chunk of text has to include the word "cat" 5 times and the word "dog"
4 times to be a match. How do you do that?
--
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
Re: regex help
Posted by Joe Quinn <jq...@pccc.com>.
On 2/18/2014 12:22 PM, Marc Perkel wrote:
> Trying to do something complex and not sure how it's done. What I'm
> looking for is to combine 2 conditions in a single regular expression
> so that both have to be true for a match. Yes - I know I can make 2 SA
> rules and combine them but I bet there's a way to do it in one
> expression. For simplicity here's the challenge.
>
> A chunk of text has to include the word "cat" 5 times and the word
> "dog" 4 times to be a match. How do you do that?
>
Take the FSM approach. You have thirty states to deal with, specifically
{found X cats, Y dogs | X <- {0..5}, Y <- {0..4}}. The transitions
between states should be obvious: each "cat" moves X up by one, each
"dog" moves Y up by one, and anything else leaves the state unchanged.
Naively translating to a regex, you get a very, very long pattern.
http://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions
As John alludes to, though it is technically possible, you really don't
want to do this. There are much more maintainable and composable ways to
write what you need.
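
For illustration, here is a minimal sketch of that state machine written as
ordinary code rather than as a regex. It is Python, not anything SpamAssassin
runs, and the names (fsm_matches, CATS_NEEDED, DOGS_NEEDED) are made up for
this example. The state is the pair (cats seen, dogs seen), with each counter
capped at its target so there are only 6 * 5 = 30 reachable states.

```python
import re

# Targets taken from the original question; the names are hypothetical.
CATS_NEEDED = 5
DOGS_NEEDED = 4


def fsm_matches(text: str) -> bool:
    """Walk the text word by word through the (cats, dogs) state grid."""
    cats, dogs = 0, 0  # start state: nothing found yet
    for word in re.findall(r"\w+", text):
        # Each counter saturates at its target, which keeps the state
        # space at 6 * 5 = 30 states.
        if word == "cat":
            cats = min(cats + 1, CATS_NEEDED)
        elif word == "dog":
            dogs = min(dogs + 1, DOGS_NEEDED)
        if cats == CATS_NEEDED and dogs == DOGS_NEEDED:
            return True  # accepting state reached
    return False
```

Writing the same machine out as one regular expression is what makes the
pattern so long: every path through the grid has to be spelled out.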
Re: regex help
Posted by Bowie Bailey <Bo...@BUC.com>.
On 2/18/2014 1:26 PM, Marc Perkel wrote:
> On 2/18/2014 9:32 AM, John Hardin wrote:
>> On Tue, 18 Feb 2014, Marc Perkel wrote:
>>
>>> Trying to do something complex and not sure how it's done. What I'm
>>> looking for is to combine 2 conditions in a single regular expression
>>> so that both have to be true for a match. Yes - I know I can make 2
>>> SA rules and combine them but I bet there's a way to do it in one
>>> expression. For simplicity here's the challenge.
>>>
>>> A chunk of text has to include the word "cat" 5 times and the word
>>> "dog" 4 times to be a match. How do you do that?
>> I assume there must be no restrictions on the order the occurrences
>> appear? That makes it rather difficult, and thus expensive.
>>
>> Two individual simple "tflags multiple maxhits=N" REs and a meta to
>> combine them would be much more efficient than a single RE. Is this
>> just an intellectual exercise, and/or something not limited to the SA
>> environment?
>>
> Yes, there are no restrictions on order. It is expensive, and I don't
> care. It needs to be a single regex.
>
Try this:
(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}
It will match a string with at least 5 instances of "cat" and at
least 4 of "dog", in any order. The "\b" anchors ensure that it will not
match words like "catapult" or "dogma"; it will also ignore strings like
"catcat" or "dogcat". Remove them if you don't care about partial word
matches. If you want to match a string with exactly 5 instances of "cat"
and exactly 4 of "dog", that will be more complicated.
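
A quick way to sanity-check the pattern outside of SA is a small script.
The sketch below uses Python's re module, which accepts the same lookahead
and "\b" syntax. Two things here are assumptions and not part of the
pattern as posted: the helper name has_cats_and_dogs, and the DOTALL flag,
added so that "." can also cross line breaks in multi-line text.

```python
import re

# The pattern exactly as posted above. re.DOTALL is an assumption added
# for this test so that "." also matches newlines.
PATTERN = re.compile(r"(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}", re.DOTALL)


def has_cats_and_dogs(text: str) -> bool:
    """Return True if the text has at least 5 'cat' and 4 'dog' words."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = [
        "cat dog cat dog cat dog cat dog cat",   # interleaved
        "dog dog dog dog cat cat cat cat cat",   # all dogs first
        "cat cat cat cat cat dog dog dog",       # one dog short
        "catapult " * 5 + "dog " * 4,            # no whole-word "cat"
    ]
    for sample in samples:
        print(has_cats_and_dogs(sample), repr(sample))
```

The lookahead checks the "cat" condition without consuming any text, so the
rest of the pattern is free to look for "dog" from the same starting point.
That is what makes the order of the words irrelevant.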
--
Bowie
Re: regex help
Posted by Marc Perkel <su...@junkemailfilter.com>.
On 2/18/2014 9:32 AM, John Hardin wrote:
> On Tue, 18 Feb 2014, Marc Perkel wrote:
>
>> Trying to do something complex and not sure how it's done. What I'm
>> looking for is to combine 2 conditions in a single regular expression
>> so that both have to be true for a match. Yes - I know I can make 2
>> SA rules and combine them but I bet there's a way to do it in one
>> expression. For simplicity here's the challenge.
>>
>> A chunk of text has to include the word "cat" 5 times and the word
>> "dog" 4 times to be a match. How do you do that?
>
> I assume there must be no restrictions on the order the occurrences
> appear? That makes it rather difficult, and thus expensive.
>
> Two individual simple "tflags multiple maxhits=N" REs and a meta to
> combine them would be much more efficient than a single RE. Is this
> just an intellectual exercise, and/or something not limited to the SA
> environment?
>
Yes, there are no restrictions on order. It is expensive, and I don't
care. It needs to be a single regex.
Re: regex help
Posted by John Hardin <jh...@impsec.org>.
On Tue, 18 Feb 2014, Marc Perkel wrote:
> Trying to do something complex and not sure how it's done. What I'm looking
> for is to combine 2 conditions in a single regular expression so that both
> have to be true for a match. Yes - I know I can make 2 SA rules and combine
> them but I bet there's a way to do it in one expression. For simplicity
> here's the challenge.
>
> A chunk of text has to include the word "cat" 5 times and the word "dog" 4
> times to be a match. How do you do that?
I assume there must be no restrictions on the order the occurrences
appear? That makes it rather difficult, and thus expensive.
Two individual simple "tflags multiple maxhits=N" REs and a meta to
combine them would be much more efficient than a single RE. Is this just
an intellectual exercise, and/or something not limited to the SA
environment?
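
To show the shape of that idea outside of SA, here is a rough Python
analogue. It is not SpamAssassin rule syntax, and the names (count_hits,
combined_rule) are invented for the example. Each simple pattern is counted
on its own, counting stops once the cap is reached (the role "maxhits=N"
plays), and a final boolean combines the two counts the way a meta rule
would.

```python
import re

# Two simple, independent patterns instead of one combined expression.
CAT_RE = re.compile(r"\bcat\b")
DOG_RE = re.compile(r"\bdog\b")


def count_hits(pattern: re.Pattern, text: str, max_hits: int) -> int:
    """Count matches of pattern in text, stopping at max_hits."""
    hits = 0
    for _ in pattern.finditer(text):
        hits += 1
        if hits >= max_hits:
            break  # no need to scan further once the cap is reached
    return hits


def combined_rule(text: str) -> bool:
    """Combine the two counts, much as a meta rule combines sub-rules."""
    return count_hits(CAT_RE, text, 5) >= 5 and count_hits(DOG_RE, text, 4) >= 4
```

Each scan is a single linear pass with no backtracking between the two
conditions, which is why this approach tends to be cheaper than one
combined expression.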
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79