You are viewing a plain text version of this content. The canonical link for it is here.
Posted to apache-bugdb@apache.org by era eriksson <er...@suespammers.org> on 2000/11/01 08:28:15 UTC

documentation/6777: FAQ question B.9 contains a minor regex error in the example

>Number:         6777
>Category:       documentation
>Synopsis:       FAQ question B.9 contains a minor regex error in the example
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          doc-bug
>Submitter-Id:   apache
>Arrival-Date:   Tue Oct 31 23:30:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator:     era@suespammers.org
>Release:        FAQ $Revision: 1.146 $ ($Date: 2000/09/12 02:29:10 $) -- http://www.apache.org/docs/misc/FAQ.html
>Organization:
apache
>Environment:
http://www.apache.org/docs/misc/FAQ.html
>Description:
<http://httpd.apache.org/docs/misc/FAQ.html#regex> presently reads:

 < What are "regular expressions"?
 < 
 < Regular expressions are a way of describing a pattern - for
 < example, "all the words that begin with the letter A" or "every
 < 10-digit phone number" or even "Every sentence with two commas in
 < it, and no capital letter Q". Regular expressions (aka "regexp"s)
 < are useful in Apache because they let you apply certain attributes
 < against collections of files or resources in very flexible ways -
 < for example, all .gif and .jpg files under any "images" directory
 < could be written as /.*\/images\/.*[jpg|gif]/.

The error here could not be described as "minor" but since it's just
an innocent example of what a regex looks like, it's not very critical.

As any grade school kid can tell you, the [] construct in regex-ese
defines a character class. Thus, the above example matches file names
under /images/ which contain any one of the characters f, g, i, j, p,
or | (having sorted the character list and elided duplicates).

The correct regex which corresponds to the prose description in the
answer is (jpg|gif) assuming you really do use Perl-compatible regular
expressions (too lazy to check -- I'm only just starting out with my
first Apache installation). Possibly you would actually also want to
remove the superfluous leading .* and anchor the search at the end,
and allow for "jpeg" as well as "jpg", yielding the regular expression
/\/images\/.*(jpe?g|gif)$/.

See also Jeffrey Friedl's _Mastering_Regular_Expressions_ (O'Reilly 1997)
<http://www.oreilly.com/catalog/regex/>

BTW, I'm with Friedl on the following terminology issue: "regexp" is
hard to handle in English, so perhaps you might prefer to use "regex".
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
 [In order for any reply to be added to the PR database, you need]
 [to include <ap...@Apache.Org> in the Cc line and make sure the]
 [subject line starts with the report component and number, with ]
 [or without any 'Re:' prefixes (such as "general/1098:" or      ]
 ["Re: general/1098:").  If the subject doesn't match this       ]
 [pattern, your message will be misfiled and ignored.  The       ]
 ["apbugs" address is not added to the Cc line of messages from  ]
 [the database automatically because of the potential for mail   ]
 [loops.  If you do not include this Cc, your reply may be ig-   ]
 [nored unless you are responding to an explicit request from a  ]
 [developer.  Reply only with text; DO NOT SEND ATTACHMENTS!     ]