Posted to users@spamassassin.apache.org by Jason Frisvold <xe...@gmail.com> on 2005/12/27 14:30:55 UTC

Re: What does m{} do?

On 12/27/05, Mark R. London <mr...@psfc.mit.edu> wrote:
> What does m{} do, like in the following test?
>
> body DRUG_DOSAGE            m{[\d\.]+ *\$? *(?:[\\/]|per) *d.?o.?s.?e}i

Looks like a case-insensitive match (that's the trailing i). Let's see...

[\d\.]+ matches a digit or a period one or more times
 * (that's space asterisk) matches 0 or more spaces
\$? matches a dollar sign 0 or 1 time
 * (that's space asterisk) matches 0 or more spaces
(?:[\\/]|per) I'm not 100% sure on..  It looks like it matches either
:V or per ...
 * (that's space asterisk) matches 0 or more spaces
d.?o.?s.?e matches d followed by 0 or 1 period, o followed by 0 or 1
period, s followed by 0 or 1 period, and e
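One way to check a breakdown like this is to run the rule body against a few sample strings. The sketch below uses Python's re module as a stand-in for Perl. That is an assumption: every construct in this particular pattern is expected to behave the same in both. The helper name hits() and the sample lines are made up for illustration.

```python
import re

# Body of the DRUG_DOSAGE rule, copied from the m{...}i above.
# re.IGNORECASE plays the role of Perl's trailing /i flag.
DRUG_DOSAGE = re.compile(
    r"[\d\.]+ *\$? *(?:[\\/]|per) *d.?o.?s.?e", re.IGNORECASE
)

def hits(text: str) -> bool:
    """Return True if the rule body matches anywhere in text."""
    return DRUG_DOSAGE.search(text) is not None

# Made-up sample lines, not real spam.
samples = [
    "Only $1.25/dose!",       # price, slash, plain "dose"
    "10 PER DOSE",            # "per" alternative, matched case-insensitively
    "just 3 \\ d-o-s-e",      # one backslash plus one-character obfuscation
    "5mg per dose",           # "mg" sits between the number and "per"
    "take one dose daily",    # no number at all
]
for s in samples:
    print(hits(s), s)
```

"5mg per dose" does not match, because the pattern allows only spaces and an optional $ between the number and the separator.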

Standard Perl regex. Check out these sites:

http://www.intuitive.com/spam-assassin-rule-help.html
http://www.english.uga.edu/humcomp/perl/regex2a.html
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Posted by Matt Kettler <mk...@comcast.net>.
At 09:34 AM 12/27/2005, Mark London wrote:
>rather than simply //, or are they identical?  (There are only a couple of
>tests which use m{} in SpamAssassin).

They are identical, but m{} does have one advantage: you can use / inside
the rule text without having to escape it.

It makes things like http:// much more readable; in a normal /-delimited
rule you'd have to write http:\/\/ instead.

The rules that use m{} likely contain many slashes in the text, so this was
done for readability.
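Python has no regex delimiters, so it cannot show m{} itself, but a small sketch with Python's re module illustrates the point: escaping the slash changes only how the pattern is written, not what it matches. This assumes Python treats \/ as a plain slash the way Perl does; the pattern and the names ESCAPED, BARE and agree() are hypothetical.

```python
import re

# The same pattern body spelled two ways (example pattern, not from a real rule):
#   /http:\/\//   slashes escaped, because / is the delimiter
#   m{http://}    braces are the delimiters, so / can stay bare
ESCAPED = re.compile(r"http:\/\/")
BARE = re.compile(r"http://")

def agree(text: str) -> bool:
    """True if both spellings give the same yes/no answer for text."""
    return bool(ESCAPED.search(text)) == bool(BARE.search(text))

samples = ["visit http://example.com", "no link here", "https://example.com"]
print([bool(BARE.search(s)) for s in samples])  # [True, False, False]
print(all(agree(s) for s in samples))           # True
```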


Posted by Mark London <mr...@psfc.mit.edu>.
Sorry, I wasn't clear about my question, which is: why is m{} used in that test
rather than simply //, or are they identical? (There are only a couple of
tests which use m{} in SpamAssassin.)


Posted by Jason Frisvold <xe...@gmail.com>.
On 12/27/05, Loren Wilton <lw...@earthlink.net> wrote:
> Close, but not quite.
>
> (?:[\\/]|per)
>
> The (?:) is bracketing.  A normal pair of parens would be 'capturing' and
> keep track of what was found within the grouping.  The ?: modifier tells
> Perl to not bother capturing the contents, since it won't be used later.
> This is an efficiency concern.

Ahh, I wasn't aware of that. That does come in handy. Thanks for
that info :)

> The [\\/] is a character set match.  It is looking for either / or \.  The
> other side of the alternation is 'per'.  Thus it is looking for 'per', or a
> slash or backslash as in $1.25/dose.

Heh, font issue. I could have *sworn* that was \V and not \\/. I
had no idea what \V meant and couldn't find a reference to it. *grin*

> d.?o.?s.?e matches d followed by 0 or 1 *any character*, followed by o, etc.
> A bare dot in a regex is a 'match any character except newline' character.
> So this is looking for 'dose', 'd ose', 'd*o*s*e', or any other random form
> of one-character obfuscation.

Typo on my part. I meant any character. Sorry about that :)

>         Loren

Thanks for clearing everything else up. My regex foo is still a little weak.

Posted by Loren Wilton <lw...@earthlink.net>.
> [\d\.]+ matches a digit or a period one or more times
>  * (that's space asterisk) matches 0 or more spaces
> \$? matches a dollar sign 0 or 1 time
>  * (that's space asterisk) matches 0 or more spaces
> (?:[\\/]|per) I'm not 100% sure on..  It looks like it matches either
> :V or per ...
>  * (that's space asterisk) matches 0 or more spaces
> d.?o.?s.?e matches d followed by 0 or 1 period, o followed by 0 or 1
> period, s followed by 0 or 1 period, and e

Close, but not quite.

(?:[\\/]|per)

The (?:) is bracketing.  A normal pair of parens would be 'capturing' and
keep track of what was found within the grouping.  The ?: modifier tells
Perl to not bother capturing the contents, since it won't be used later.
This is an efficiency concern.
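A minimal sketch of the difference, using Python's re as a stand-in for Perl (assumed to handle (?:...) the same way). The pattern is trimmed from the rule and the variable names are illustrative. Both versions match the same text; only the capturing one remembers what the group matched.

```python
import re

# The separator group from the rule, followed by "dose", written two ways.
NON_CAPTURING = re.compile(r"(?:[\\/]|per)dose", re.IGNORECASE)
CAPTURING = re.compile(r"([\\/]|per)dose", re.IGNORECASE)

# .groups counts the capturing groups in each compiled pattern.
print(NON_CAPTURING.groups)  # 0
print(CAPTURING.groups)      # 1

text = "$1.25/dose"
m1 = NON_CAPTURING.search(text)
m2 = CAPTURING.search(text)
print(m1.group(0), m2.group(0))  # /dose /dose  (same overall match)
print(m2.group(1))               # /   (remembered by the capturing group)
print(m1.groups())               # ()  (nothing remembered)
```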

The [\\/] is a character set match.  It is looking for either / or \.  The
other side of the alternation is 'per'.  Thus it is looking for 'per', or a
slash or backslash as in $1.25/dose.
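The alternation can be checked on its own with a short sketch, again with Python's re standing in for Perl; separator() is a hypothetical helper. A single backslash has to be written "\\" inside an ordinary Python string literal.

```python
import re

# The alternation from the rule: one slash or backslash, or the word "per".
SEPARATOR = re.compile(r"(?:[\\/]|per)")

def separator(text: str):
    """Return the separator found at the very start of text, or None."""
    m = SEPARATOR.match(text)
    return m.group(0) if m else None

print(separator("/dose"))     # /
print(separator("\\dose"))    # \   (a single backslash)
print(separator("per dose"))  # per
print(separator("V dose"))    # None: V is not in the character class
```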

d.?o.?s.?e matches d followed by 0 or 1 *any character*, followed by o, etc.
A bare dot in a regex is a 'match any character except newline' character.
So this is looking for 'dose', 'd ose', 'd*o*s*e', or any other random form
of one-character obfuscation.
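The effect of the optional dots can be seen with one more sketch, with Python's re standing in for Perl and re.IGNORECASE for the /i flag; obfuscated_dose() and the samples are made up.

```python
import re

# One optional "any character" between each letter of "dose".
DOSE = re.compile(r"d.?o.?s.?e", re.IGNORECASE)

def obfuscated_dose(text: str) -> bool:
    """Return True if text contains a (possibly obfuscated) 'dose'."""
    return DOSE.search(text) is not None

for s in ["dose", "d ose", "d*o*s*e", "D.O.S.E", "d**ose", "d\nose"]:
    print(repr(s), obfuscated_dose(s))
```

The last two samples do not match: "d**ose" puts two characters between letters, and by default the dot does not match a newline. Python's re.DOTALL, like Perl's /s modifier, would change that.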

        Loren