Posted to cvs@httpd.apache.org by pg...@apache.org on 2007/11/26 18:04:37 UTC
svn commit: r598343 [10/22] - in /httpd/httpd/vendor/pcre/current: ./ doc/ doc/html/ testdata/

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html?rev=598343&r1=598342&r2=598343&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html Mon Nov 26 09:04:19 2007
@@ -21,7 +21,6 @@
 <li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
 <li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
 <li><a name="TOC8" href="#SEC8">AUTHOR</a>
-<li><a name="TOC9" href="#SEC9">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
 <P>
@@ -47,8 +46,8 @@
 This set of functions provides a POSIX-style API to the PCRE regular expression
 package. See the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-documentation for a description of PCRE's native API, which contains much
-additional functionality.
+documentation for a description of PCRE's native API, which contains additional
+functionality.
 </P>
 <P>
 The functions described here are just wrapper functions that ultimately call
@@ -60,10 +59,10 @@
 </P>
 <P>
 I have implemented only those option bits that can be reasonably mapped to PCRE
-native options. In addition, the option REG_EXTENDED is defined with the value
-zero. This has no effect, but since programs that are written to the POSIX
-interface often use it, this makes it easier to slot in PCRE as a replacement
-library. Other POSIX options are not even defined.
+native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
+with the value zero. They have no effect, but since programs that are written
+to the POSIX interface often use them, this makes it easier to slot in PCRE as
+a replacement library. Other POSIX options are not even defined.
 </P>
 <P>
 When PCRE is called via these functions, it is only the API that is POSIX-like
@@ -90,43 +89,22 @@
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
 to a <b>regex_t</b> structure that is used as a base for storing information
-about the compiled regular expression.
+about the compiled expression.
 </P>
 <P>
 The argument <i>cflags</i> is either zero, or contains one or more of the bits
 defined by the following macros:
 <pre>
-  REG_DOTALL
-</pre>
-The PCRE_DOTALL option is set when the regular expression is passed for
-compilation to the native function. Note that REG_DOTALL is not part of the
-POSIX standard.
-<pre>
   REG_ICASE
 </pre>
-The PCRE_CASELESS option is set when the regular expression is passed for
-compilation to the native function.
+The PCRE_CASELESS option is set when the expression is passed for compilation
+to the native function.
 <pre>
   REG_NEWLINE
 </pre>
-The PCRE_MULTILINE option is set when the regular expression is passed for
-compilation to the native function. Note that this does <i>not</i> mimic the
-defined POSIX behaviour for REG_NEWLINE (see the following section).
-<pre>
-  REG_NOSUB
-</pre>
-The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
-for compilation to the native function. In addition, when a pattern that is
-compiled with this flag is passed to <b>regexec()</b> for matching, the
-<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
-are returned.
-<pre>
-  REG_UTF8
-</pre>
-The PCRE_UTF8 option is set when the regular expression is passed for
-compilation to the native function. This causes the pattern itself and all data
-strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
-is not part of the POSIX standard.
+The PCRE_MULTILINE option is set when the expression is passed for compilation
+to the native function. Note that this does <i>not</i> mimic the defined POSIX
+behaviour for REG_NEWLINE (see the following section).
 </P>
 <P>
 In the absence of these flags, no options are passed to the native function.
@@ -194,20 +172,15 @@
 function.
 </P>
 <P>
-If the pattern was compiled with the REG_NOSUB flag, no data about any matched
-strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
-<b>regexec()</b> are ignored.
-</P>
-<P>
-Otherwise,the portion of the string that was matched, and also any captured
-substrings, are returned via the <i>pmatch</i> argument, which points to an
-array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
-members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
-character of each substring and the offset to the first character after the end
-of each substring, respectively. The 0th element of the vector relates to the
-entire portion of <i>string</i> that was matched; subsequent elements relate to
-the capturing subpatterns of the regular expression. Unused entries in the
-array have both structure members set to -1.
+The portion of the string that was matched, and also any captured substrings,
+are returned via the <i>pmatch</i> argument, which points to an array of
+<i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members
+<i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first character of
+each substring and the offset to the first character after the end of each
+substring, respectively. The 0th element of the vector relates to the entire
+portion of <i>string</i> that was matched; subsequent elements relate to the
+capturing subpatterns of the regular expression. Unused entries in the array
+have both structure members set to -1.
 </P>
 <P>
 A successful match yields a zero return; various error codes are defined in the
@@ -230,19 +203,16 @@
 </P>
 <br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel
-<br>
-University Computing Service
+Philip Hazel &#60;ph10@cam.ac.uk&#62;
 <br>
-Cambridge CB2 3QH, England.
+University Computing Service,
 <br>
+Cambridge CB2 3QG, England.
 </P>
-<br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 06 March 2007
-<br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Last updated: 07 September 2004
 <br>
+Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html?rev=598343&r1=598342&r2=598343&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html Mon Nov 26 09:04:19 2007
@@ -17,8 +17,6 @@
 <li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
 <li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
 <li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
-<li><a name="TOC5" href="#SEC5">AUTHOR</a>
-<li><a name="TOC6" href="#SEC6">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
 <P>
@@ -34,9 +32,7 @@
 If you save compiled patterns to a file, you can copy them to a different host
 and run them there. This works even if the new host has the opposite endianness
 to the one on which the patterns were compiled. There may be a small
-performance penalty, but it should be insignificant. However, compiling regular
-expressions with one version of PCRE for use with a different version is not
-guaranteed to work and may cause crashes.
+performance penalty, but it should be insignificant.
 </P>
 <br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
 <P>
@@ -92,17 +88,16 @@
 <br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
 <P>
 Re-using a precompiled pattern is straightforward. Having reloaded it into main
-memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
-the usual way. This should work even on another host, and even if that host has
-the opposite endianness to the one where the pattern was compiled.
+memory, you pass its pointer to <b>pcre_exec()</b> in the usual way. This should
+work even on another host, and even if that host has the opposite endianness to
+the one where the pattern was compiled.
 </P>
 <P>
 However, if you passed a pointer to custom character tables when the pattern
 was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
-now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
-because the value saved with the compiled pattern will obviously be nonsense. A
-field in a <b>pcre_extra()</b> block is used to pass this data, as described in
-the
+now pass a similar pointer to <b>pcre_exec()</b>, because the value saved with
+the compiled pattern will obviously be nonsense. A field in a
+<b>pcre_extra()</b> block is used to pass this data, as described in the
 <a href="pcreapi.html#extradata">section on matching a pattern</a>
 in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
@@ -119,30 +114,20 @@
 <b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
 reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
 <i>flags</i> field to indicate that study data is present. Then pass the
-<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
-usual way.
+<b>pcre_extra</b> block to <b>pcre_exec()</b> in the usual way.
 </P>
 <br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
 <P>
-In general, it is safest to recompile all saved patterns when you update to a
-new PCRE release, though not all updates actually require this. Recompiling is
-definitely needed for release 7.2.
+The layout of the control block that is at the start of the data that makes up
+a compiled pattern was changed for release 5.0. If you have any saved patterns
+that were compiled with previous releases (not a facility that was previously
+advertised), you will have to recompile them for release 5.0. However, from now
+on, it should be possible to make changes in a compabible manner.
 </P>
-<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel
-<br>
-University Computing Service
-<br>
-Cambridge CB2 3QH, England.
-<br>
-</P>
-<br><a name="SEC6" href="#TOC1">REVISION</a><br>
-<P>
-Last updated: 13 June 2007
-<br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Last updated: 10 September 2004
 <br>
+Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcresample.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcresample.html?rev=598343&r1=598342&r2=598343&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcresample.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcresample.html Mon Nov 26 09:04:19 2007
@@ -33,10 +33,9 @@
 an empty string. Comments in the code explain what is going on.
 </P>
 <P>
-The demonstration program is automatically built if you use "./configure;make"
-to build PCRE. Otherwise, if PCRE is installed in the standard include and
-library directories for your system, you should be able to compile the
-demonstration program using this command:
+If PCRE is installed in the standard include and library directories for your
+system, you should be able to compile the demonstration program using this
+command:
 <pre>
   gcc -o pcredemo pcredemo.c -lpcre
 </pre>
@@ -73,25 +72,10 @@
 </pre>
 (for example) to the compile command to get round this problem.
 </P>
-<br><b>
-AUTHOR
-</b><br>
 <P>
-Philip Hazel
-<br>
-University Computing Service
-<br>
-Cambridge CB2 3QH, England.
-<br>
-</P>
-<br><b>
-REVISION
-</b><br>
-<P>
-Last updated: 13 June 2007
-<br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Last updated: 09 September 2004
 <br>
+Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcretest.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcretest.html?rev=598343&r1=598342&r2=598343&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcretest.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcretest.html Mon Nov 26 09:04:19 2007
@@ -18,22 +18,17 @@
 <li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
 <li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
 <li><a name="TOC5" href="#SEC5">DATA LINES</a>
-<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
-<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
-<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
-<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
-<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
-<li><a name="TOC11" href="#SEC11">NON-PRINTING CHARACTERS</a>
-<li><a name="TOC12" href="#SEC12">SAVING AND RELOADING COMPILED PATTERNS</a>
-<li><a name="TOC13" href="#SEC13">SEE ALSO</a>
-<li><a name="TOC14" href="#SEC14">AUTHOR</a>
-<li><a name="TOC15" href="#SEC15">REVISION</a>
+<li><a name="TOC6" href="#SEC6">OUTPUT FROM PCRETEST</a>
+<li><a name="TOC7" href="#SEC7">CALLOUTS</a>
+<li><a name="TOC8" href="#SEC8">SAVING AND RELOADING COMPILED PATTERNS</a>
+<li><a name="TOC9" href="#SEC9">AUTHOR</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
-<b>pcretest [options] [source] [destination]</b>
-<br>
-<br>
+<b>pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]</b>
+<b>[destination]</b>
+</P>
+<P>
 <b>pcretest</b> was written as a test program for the PCRE regular expression
 library itself, but it can also be used for experimenting with regular
 expressions. This document describes the features of the test program; for
@@ -46,34 +41,18 @@
 </P>
 <br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
 <P>
-<b>-b</b>
-Behave as if each regex has the <b>/B</b> (show bytecode) modifier; the internal
-form is output after compilation.
-</P>
-<P>
 <b>-C</b>
 Output the version number of the PCRE library, and all available information
 about the optional features that are included, and then exit.
 </P>
 <P>
 <b>-d</b>
-Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
-form and information about the compiled pattern is output after compilation;
-<b>-d</b> is equivalent to <b>-b -i</b>.
-</P>
-<P>
-<b>-dfa</b>
-Behave as if each data line contains the \D escape sequence; this causes the
-alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
-standard <b>pcre_exec()</b> function (more detail is given below).
-</P>
-<P>
-<b>-help</b>
-Output a brief summary these options and then exit.
+Behave as if each regex had the <b>/D</b> (debug) modifier; the internal
+form is output after compilation.
 </P>
 <P>
 <b>-i</b>
-Behave as if each regex has the <b>/I</b> modifier; information about the
+Behave as if each regex had the <b>/I</b> modifier; information about the
 compiled pattern is given after compilation.
 </P>
 <P>
@@ -85,41 +64,21 @@
 <P>
 <b>-o</b> <i>osize</i>
 Set the number of elements in the output vector that is used when calling
-<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> to be <i>osize</i>. The default value
-is 45, which is enough for 14 capturing subexpressions for <b>pcre_exec()</b> or
-22 different matches for <b>pcre_dfa_exec()</b>. The vector size can be
-changed for individual matching calls by including \O in the data line (see
-below).
+<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
+for 14 capturing subexpressions. The vector size can be changed for individual
+matching calls by including \O in the data line (see below).
 </P>
 <P>
 <b>-p</b>
-Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
-used to call PCRE. None of the other options has any effect when <b>-p</b> is
-set.
-</P>
-<P>
-<b>-q</b>
-Do not output the version number of <b>pcretest</b> at the start of execution.
-</P>
-<P>
-<b>-S</b> <i>size</i>
-On Unix-like systems, set the size of the runtime stack to <i>size</i>
-megabytes.
+Behave as if each regex has <b>/P</b> modifier; the POSIX wrapper API is used
+to call PCRE. None of the other options has any effect when <b>-p</b> is set.
 </P>
 <P>
 <b>-t</b>
 Run each compile, study, and match many times with a timer, and output
 resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
 <b>-t</b>, because you will then get the size output a zillion times, and the
-timing will be distorted. You can control the number of iterations that are
-used for timing by following <b>-t</b> with a number (as a separate item on the
-command line). For example, "-t 1000" would iterate 1000 times. The default is
-to iterate 500000 times.
-</P>
-<P>
-<b>-tm</b>
-This is like <b>-t</b> except that it times only the matching phase, not the
-compile or study phases.
+timing will be distorted.
 </P>
 <br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
 <P>
@@ -136,15 +95,14 @@
 </P>
 <P>
 Each data line is matched separately and independently. If you want to do
-multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
-etc., depending on the newline setting) in a single line of input to encode the
-newline sequences. There is no limit on the length of data lines; the input
-buffer is automatically extended if it is too small.
+multiple-line matches, you have to use the \n escape sequence in a single line
+of input to encode the newline characters. The maximum length of data line is
+30,000 characters.
 </P>
 <P>
 An empty line signals the end of the data lines, at which point a new regular
 expression is read. The regular expressions are given enclosed in any
-non-alphanumeric delimiters other than backslash, for example:
+non-alphanumeric delimiters other than backslash, for example
 <pre>
   /(a|bc)x+yz/
 </pre>
@@ -191,36 +149,13 @@
 The following table shows additional modifiers for setting PCRE options that do
 not correspond to anything in Perl:
 <pre>
-  <b>/A</b>              PCRE_ANCHORED
-  <b>/C</b>              PCRE_AUTO_CALLOUT
-  <b>/E</b>              PCRE_DOLLAR_ENDONLY
-  <b>/f</b>              PCRE_FIRSTLINE
-  <b>/J</b>              PCRE_DUPNAMES
-  <b>/N</b>              PCRE_NO_AUTO_CAPTURE
-  <b>/U</b>              PCRE_UNGREEDY
-  <b>/X</b>              PCRE_EXTRA
-  <b>/&#60;cr&#62;</b>           PCRE_NEWLINE_CR
-  <b>/&#60;lf&#62;</b>           PCRE_NEWLINE_LF
-  <b>/&#60;crlf&#62;</b>         PCRE_NEWLINE_CRLF
-  <b>/&#60;anycrlf&#62;</b>      PCRE_NEWLINE_ANYCRLF
-  <b>/&#60;any&#62;</b>          PCRE_NEWLINE_ANY
-  <b>/&#60;bsr_anycrlf&#62;</b>  PCRE_BSR_ANYCRLF
-  <b>/&#60;bsr_unicode&#62;</b>  PCRE_BSR_UNICODE
-</pre>
-Those specifying line ending sequences are literal strings as shown, but the
-letters can be in either case. This example sets multiline matching with CRLF
-as the line ending sequence:
-<pre>
-  /^abc/m&#60;crlf&#62;
+  <b>/A</b>    PCRE_ANCHORED
+  <b>/C</b>    PCRE_AUTO_CALLOUT
+  <b>/E</b>    PCRE_DOLLAR_ENDONLY
+  <b>/N</b>    PCRE_NO_AUTO_CAPTURE
+  <b>/U</b>    PCRE_UNGREEDY
+  <b>/X</b>    PCRE_EXTRA
 </pre>
-Details of the meanings of these PCRE options are given in the
-<a href="pcreapi.html"><b>pcreapi</b></a>
-documentation.
-</P>
-<br><b>
-Finding all matches in a string
-</b><br>
-<P>
 Searching for all possible matches within each subject string can be requested
 by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
 again to search the remainder of the subject string. The difference between
@@ -238,9 +173,6 @@
 match is retried. This imitates the way Perl handles such cases when using the
 <b>/g</b> modifier or the <b>split()</b> function.
 </P>
-<br><b>
-Other modifiers
-</b><br>
 <P>
 There are yet more modifiers for controlling the way <b>pcretest</b>
 operates.
@@ -252,14 +184,6 @@
 multiple copies of the same substring.
 </P>
 <P>
-The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b>
-output a representation of the compiled byte code after compilation. Normally
-this information contains length and offset values; however, if <b>/Z</b> is
-also present, this data is replaced by spaces. This is a special feature for
-use in the automatic test scripts; it ensures that the same output is generated
-for different internal link sizes.
-</P>
-<P>
 The <b>/L</b> modifier must be followed directly by the name of a locale, for
 example,
 <pre>
@@ -278,8 +202,10 @@
 pattern. If the pattern is studied, the results of that are also output.
 </P>
 <P>
-The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to
-<b>/BI</b>, that is, both the <b>/B</b> and the <b>/I</b> modifiers.
+The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
+It causes the internal form of compiled regular expressions to be output after
+compilation. If the pattern was studied, the information returned is also
+output.
 </P>
 <P>
 The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
@@ -327,20 +253,19 @@
 expressions, you probably don't need any of these. The following escapes are
 recognized:
 <pre>
-  \a         alarm (BEL, \x07)
-  \b         backspace (\x08)
-  \e         escape (\x27)
-  \f         formfeed (\x0c)
-  \n         newline (\x0a)
-  \qdd       set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
-  \r         carriage return (\x0d)
-  \t         tab (\x09)
-  \v         vertical tab (\x0b)
+  \a         alarm (= BEL)
+  \b         backspace
+  \e         escape
+  \f         formfeed
+  \n         newline
+  \r         carriage return
+  \t         tab
+  \v         vertical tab
   \nnn       octal character (up to 3 octal digits)
   \xhh       hexadecimal character (up to 2 hex digits)
   \x{hh...}  hexadecimal character, any number of digits in UTF-8 mode
-  \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b>
+  \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b>
   \Cdd       call pcre_copy_substring() for substring dd after a successful match (number less than 32)
   \Cname     call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
                ated by next non alphanumeric character)
@@ -349,50 +274,33 @@
   \C!n       return 1 instead of 0 when callout number n is reached
   \C!n!m     return 1 instead of 0 when callout number n is reached for the nth time
   \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
-  \D         use the <b>pcre_dfa_exec()</b> match function
-  \F         only shortest match for <b>pcre_dfa_exec()</b>
   \Gdd       call pcre_get_substring() for substring dd after a successful match (number less than 32)
   \Gname     call pcre_get_named_substring() for substring "name" after a successful match (name termin-
                ated by next non-alphanumeric character)
   \L         call pcre_get_substringlist() after a successful match
-  \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
-  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \M         discover the minimum MATCH_LIMIT setting
+  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>
   \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
-  \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
-  \R         pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
+  \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b>
   \S         output details of memory get/free calls during matching
-  \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b>
+  \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b>
   \&#62;dd       start the match at offset dd (any number of digits);
-               this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \&#60;cr&#62;      pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \&#60;lf&#62;      pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \&#60;crlf&#62;    pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \&#60;anycrlf&#62; pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-  \&#60;any&#62;     pass the PCRE_NEWLINE_ANY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
-</pre>
-The escapes that specify line ending sequences are literal strings, exactly as
-shown. No more than one newline setting should be present in any data line.
-</P>
-<P>
-A backslash followed by anything else just escapes the anything else. If
-the very last character is a backslash, it is ignored. This gives a way of
-passing an empty line as data, since a real empty line terminates the data
-input.
+               this sets the <i>startoffset</i> argument for <b>pcre_exec()</b>
+</pre>
+A backslash followed by anything else just escapes the anything else. If the
+very last character is a backslash, it is ignored. This gives a way of passing
+an empty line as data, since a real empty line terminates the data input.
 </P>
 <P>
 If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
-different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
-fields of the <b>pcre_extra</b> data structure, until it finds the minimum
-numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
-<i>match_limit</i> number is a measure of the amount of backtracking that takes
-place, and checking it out can be instructive. For most simple matches, the
-number is quite small, but for patterns with very large numbers of matching
-possibilities, it can become large very quickly with increasing length of
-subject string. The <i>match_limit_recursion</i> number is a measure of how much
-stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
-to complete the match attempt.
+different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data
+structure, until it finds the minimum number that is needed for
+<b>pcre_exec()</b> to complete. This number is a measure of the amount of
+recursion and backtracking that takes place, and checking it out can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string.
 </P>
 <P>
 When \O is used, the value specified may be higher or lower than the size set
@@ -401,51 +309,26 @@
 </P>
 <P>
 If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
-API to be used, the only option-setting sequences that have any effect are \B
-and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
-<b>regexec()</b>.
+API to be used, only \B and \Z have any effect, causing REG_NOTBOL and
+REG_NOTEOL to be passed to <b>regexec()</b> respectively.
 </P>
 <P>
 The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
 of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
 any number of hexadecimal digits inside the braces. The result is from one to
-six bytes, encoded according to the original UTF-8 rules of RFC 2279. This
-allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are
-valid Unicode code points, or indeed valid UTF-8 characters according to the
-later rules in RFC 3629.
-</P>
-<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
-<P>
-By default, <b>pcretest</b> uses the standard PCRE matching function,
-<b>pcre_exec()</b> to match each data line. From release 6.0, PCRE supports an
-alternative matching function, <b>pcre_dfa_test()</b>, which operates in a
-different way, and has some restrictions. The differences between the two
-functions are described in the
-<a href="pcrematching.html"><b>pcrematching</b></a>
-documentation.
-</P>
-<P>
-If a data line contains the \D escape sequence, or if the command line
-contains the <b>-dfa</b> option, the alternative matching function is called.
-This function finds all possible matches at a given point. If, however, the \F
-escape sequence is present in the data line, it stops after the first match is
-found. This is always the shortest possible match.
-</P>
-<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
-<P>
-This section describes the output when the normal matching function,
-<b>pcre_exec()</b>, is being used.
+six bytes, encoded according to the UTF-8 rules.
 </P>
+<br><a name="SEC6" href="#TOC1">OUTPUT FROM PCRETEST</a><br>
 <P>
 When a match succeeds, pcretest outputs the list of captured substrings that
 <b>pcre_exec()</b> returns, starting with number 0 for the string that matched
 the whole pattern. Otherwise, it outputs "No match" or "Partial match"
 when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
 respectively, and otherwise the PCRE negative error number. Here is an example
-of an interactive <b>pcretest</b> run.
+of an interactive pcretest run.
 <pre>
   $ pcretest
-  PCRE version 7.0 30-Nov-2006
+  PCRE version 5.00 07-Sep-2004
 
     re&#62; /^abc(\d+)/
   data&#62; abc123
@@ -456,9 +339,9 @@
 </pre>
 If the strings contain any non-printing characters, they are output as \0x
 escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
-pattern. See below for the definition of non-printing characters. If the
-pattern has the <b>/+</b> modifier, the output for substring 0 is followed by
-the the rest of the subject string, identified by "0+" like this:
+pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
+is followed by the the rest of the subject string, identified by "0+" like
+this:
 <pre>
     re&#62; /cat/+
   data&#62; cataract
@@ -488,67 +371,16 @@
 parentheses after each string for <b>\C</b> and <b>\G</b>.
 </P>
 <P>
-Note that whereas patterns can be continued over several lines (a plain "&#62;"
+Note that while patterns can be continued over several lines (a plain "&#62;"
 prompt is used for continuations), data lines may not. However newlines can be
-included in data by means of the \n escape (or \r, \r\n, etc., depending on
-the newline sequence setting).
+included in data by means of the \n escape.
 </P>
-<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
-<P>
-When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
-means of the \D escape sequence or the <b>-dfa</b> command line option), the
-output consists of a list of all the matches that start at the first point in
-the subject where there is at least one match. For example:
-<pre>
-    re&#62; /(tang|tangerine|tan)/
-  data&#62; yellow tangerine\D
-   0: tangerine
-   1: tang
-   2: tan
-</pre>
-(Using the normal matching function on this data finds only "tang".) The
-longest matching string is always given first (and numbered zero).
-</P>
-<P>
-If <b>/g</b> is present on the pattern, the search for further matches resumes
-at the end of the longest match. For example:
-<pre>
-    re&#62; /(tang|tangerine|tan)/g
-  data&#62; yellow tangerine and tangy sultana\D
-   0: tangerine
-   1: tang
-   2: tan
-   0: tang
-   1: tan
-   0: tan
-</pre>
-Since the matching function does not support substring capture, the escape
-sequences that are concerned with captured substrings are not relevant.
-</P>
-<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
-<P>
-When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
-indicating that the subject partially matched the pattern, you can restart the
-match with additional subject data by means of the \R escape sequence. For
-example:
-<pre>
-    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-  data&#62; 23ja\P\D
-  Partial match: 23ja
-  data&#62; n05\R\D
-   0: n05
-</pre>
-For further information about partial matching, see the
-<a href="pcrepartial.html"><b>pcrepartial</b></a>
-documentation.
-</P>
-<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC7" href="#TOC1">CALLOUTS</a><br>
 <P>
 If the pattern contains any callout requests, <b>pcretest</b>'s callout function
-is called during matching. This works with both matching functions. By default,
-the called function displays the callout number, the start and current
-positions in the text at the callout time, and the next pattern item to be
-tested. For example, the output
+is called during matching. By default, it displays the callout number, the
+start and current positions in the text at the callout time, and the next
+pattern item to be tested. For example, the output
 <pre>
   ---&#62;pqrabcdef
     0    ^  ^     \d
@@ -574,7 +406,7 @@
    0: E*
 </pre>
 The callout function in <b>pcretest</b> returns zero (carry on matching) by
-default, but you can use a \C item in a data line (as described above) to
+default, but you can use an \C item in a data line (as described above) to
 change this.
 </P>
 <P>
@@ -584,19 +416,7 @@
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
 </P>
-<br><a name="SEC11" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
-<P>
-When <b>pcretest</b> is outputting text in the compiled version of a pattern,
-bytes other than 32-126 are always treated as non-printing characters and are
-therefore shown as hex escapes.
-</P>
-<P>
-When <b>pcretest</b> is outputting text that is a matched part of a subject
-string, it behaves in the same way, unless a different locale has been set for
-the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
-function is used to distinguish printing and non-printing characters.
-</P>
-<br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
+<br><a name="SEC8" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
 <P>
 The facilities described in this section are not available when the POSIX
 interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
@@ -658,26 +478,18 @@
 Finally, if you attempt to load a file that is not in the correct format, the
 result is undefined.
 </P>
-<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
-<P>
-<b>pcre</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
-<b>pcrepartial</b>(3), <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
-</P>
-<br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel
+Philip Hazel &#60;ph10@cam.ac.uk&#62;
 <br>
-University Computing Service
-<br>
-Cambridge CB2 3QH, England.
+University Computing Service,
 <br>
+Cambridge CB2 3QG, England.
 </P>
-<br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 September 2007
-<br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Last updated: 10 September 2004
 <br>
+Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Modified: httpd/httpd/vendor/pcre/current/doc/pcre.3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcre.3?rev=598343&r1=598342&r2=598343&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcre.3 (original)
+++ httpd/httpd/vendor/pcre/current/doc/pcre.3 Mon Nov 26 09:04:19 2007
@@ -6,33 +6,15 @@
 .sp
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl, with just a few
-differences. (Certain features that appeared in Python and PCRE before they
-appeared in Perl are also available using the Python syntax.)
-.P
-The current implementation of PCRE (release 7.x) corresponds approximately with
-Perl 5.10, including support for UTF-8 encoded strings and Unicode general
-category properties. However, UTF-8 and Unicode support has to be explicitly
-enabled; it is not the default. The Unicode tables correspond to Unicode
-release 5.0.0.
-.P
-In addition to the Perl-compatible matching function, PCRE contains an
-alternative matching function that matches the same compiled patterns in a
-different way. In certain circumstances, the alternative function has some
-advantages. For a discussion of the two matching algorithms, see the
-.\" HREF
-\fBpcrematching\fP
-.\"
-page.
+differences. The current implementation of PCRE (release 5.x) corresponds
+approximately with Perl 5.8, including support for UTF-8 encoded strings and
+Unicode general category properties. However, this support has to be explicitly
+enabled; it is not the default.
 .P
 PCRE is written in C and released as a C library. A number of people have
-written wrappers and interfaces of various kinds. In particular, Google Inc.
-have provided a comprehensive C++ wrapper. This is now included as part of the
-PCRE distribution. The
-.\" HREF
-\fBpcrecpp\fP
-.\"
-page has details of this interface. Other people's contributions can be found
-in the \fIContrib\fR directory at the primary FTP site, which is:
+written wrappers and interfaces of various kinds. A C++ class is included in
+these contributions, which can be found in the \fIContrib\fR directory at the
+primary FTP site, which is:
 .sp
 .\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
 .\" </a>
@@ -47,11 +29,7 @@
 .\" HREF
 \fBpcrecompat\fR
 .\"
-pages. There is a syntax summary in the
-.\" HREF
-\fBpcresyntax\fR
-.\"
-page.
+pages.
 .P
 Some features of PCRE can be included, excluded, or changed when the library is
 built. The
@@ -65,14 +43,6 @@
 .\"
 page. Documentation about building PCRE for various operating systems can be
 found in the \fBREADME\fP file in the source distribution.
-.P
-The library contains a number of undocumented internal functions and data
-tables that are used by more than one of the exported external functions, but
-which are not intended for use by external callers. Their names all begin with
-"_pcre_", which hopefully will not provoke any name clashes. In some
-environments, it is possible to control which external symbols are exported
-when a shared library is built, and in these cases the undocumented symbols are
-not exported.
 .
 .
 .SH "USER DOCUMENTATION"
@@ -85,28 +55,23 @@
 follows:
 .sp
   pcre              this document
-  pcre-config       show PCRE installation configuration information
-  pcreapi           details of PCRE's native C API
+  pcreapi           details of PCRE's native API
   pcrebuild         options for building PCRE
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
-  pcrecpp           details of the C++ wrapper
   pcregrep          description of the \fBpcregrep\fP command
-  pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
 .\" JOIN
   pcrepattern       syntax and semantics of supported
                       regular expressions
-  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
-  pcreposix         the POSIX-compatible C API
+  pcreposix         the POSIX-compatible API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the sample program
-  pcrestack         discussion of stack usage
   pcretest          description of the \fBpcretest\fP testing command
 .sp
 In addition, in the "man" and HTML formats, there is a short page for each
-C library function, listing its arguments and results.
+library function, listing its arguments and results.
 .
 .
 .SH LIMITATIONS
@@ -124,27 +89,20 @@
 \fBpcrebuild\fP
 .\"
 documentation for details). In these cases the limit is substantially larger.
-However, the speed of execution is slower.
+However, the speed of execution will be slower.
 .P
 All values in repeating quantifiers must be less than 65536.
+The maximum number of capturing subpatterns is 65535.
 .P
-There is no limit to the number of parenthesized subpatterns, but there can be
-no more than 65535 capturing subpatterns.
-.P
-The maximum length of name for a named subpattern is 32 characters, and the
-maximum number of named subpatterns is 10000.
+There is no limit to the number of non-capturing subpatterns, but the maximum
+depth of nesting of all kinds of parenthesized subpattern, including capturing
+subpatterns, assertions, and other types of subpattern, is 200.
 .P
 The maximum length of a subject string is the largest positive number that an
-integer variable can hold. However, when using the traditional matching
-function, PCRE uses recursion to handle subpatterns and indefinite repetition.
-This means that the available stack space may limit the size of a subject
-string that can be processed by certain patterns. For a discussion of stack
-issues, see the
-.\" HREF
-\fBpcrestack\fP
-.\"
-documentation.
-.
+integer variable can hold. However, PCRE uses recursion to handle subpatterns
+and indefinite repetition. This means that the available stack space may limit
+the size of a subject string that can be processed by certain patterns.
+.sp
 .\" HTML <a name="utf8support"></a>
 .
 .
@@ -167,84 +125,51 @@
 .P
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
 library will be a bit bigger, but the additional run time overhead is limited
-to testing the PCRE_UTF8 flag occasionally, so should not be very big.
+to testing the PCRE_UTF8 flag in several places, so should not be very large.
 .P
 If PCRE is built with Unicode character property support (which implies UTF-8
 support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
 The available properties that can be tested are limited to the general
 category properties such as Lu for an upper case letter or Nd for a decimal
-number, the Unicode script names such as Arabic or Han, and the derived
-properties Any and L&. A full list is given in the
+number. A full list is given in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
-documentation. Only the short names for properties are supported. For example,
-\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
-Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
-compatibility with Perl 5.6. PCRE does not support this.
-.
-.\" HTML <a name="utf8strings"></a>
-.
-.SS "Validity of UTF-8 strings"
-.rs
-.sp
-When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
-are (by default) checked for validity on entry to the relevant functions. From
-release 7.3 of PCRE, the check is according to the rules of RFC 3629, which are
-themselves derived from the Unicode specification. Earlier releases of PCRE
-followed the rules of RFC 2279, which allows the full range of 31-bit values (0
-to 0x7FFFFFFF). The current check allows only values in the range U+0 to
-U+10FFFF, excluding U+D800 to U+DFFF.
-.P
-The excluded code points are the "Low Surrogate Area" of Unicode, of which the
-Unicode Standard says this: "The Low Surrogate Area does not contain any
-character assignments, consequently no character code charts or namelists are
-provided for this area. Surrogates are reserved for use with UTF-16 and then
-must be used in pairs." The code points that are encoded by UTF-16 pairs are
-available as independent code points in the UTF-8 encoding. (In other words,
-the whole surrogate thing is a fudge for UTF-16 which unfortunately messes up
-UTF-8.)
-.P
-If an invalid UTF-8 string is passed to PCRE, an error return
-(PCRE_ERROR_BADUTF8) is given. In some situations, you may already know that
-your strings are valid, and therefore want to skip these checks in order to
-improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or
-at run time, PCRE assumes that the pattern or subject it is given
-(respectively) contains only valid UTF-8 codes. In this case, it does not
-diagnose an invalid UTF-8 string.
-.P
-If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
-happens depends on why the string is invalid. If the string conforms to the
-"old" definition of UTF-8 (RFC 2279), it is processed as a string of characters
-in the range 0 to 0x7FFFFFFF. In other words, apart from the initial validity
-test, PCRE (when in UTF-8 mode) handles strings according to the more liberal
-rules of RFC 2279. However, if the string does not even conform to RFC 2279,
-the result is undefined. Your program may crash.
-.P
-If you want to process strings of values in the full range 0 to 0x7FFFFFFF,
-encoded in a UTF-8-like manner as per the old RFC, you can set
-PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this
-situation, you will have to apply your own validity check.
-.
-.SS "General comments about UTF-8 mode"
-.rs
-.sp
-1. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
-UTF-8 character if the value is greater than 127.
+documentation. The PCRE library is increased in size by about 90K when Unicode
+property support is included.
+.P
+The following comments apply when PCRE is running in UTF-8 mode:
 .P
-2. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
-characters for values greater than \e177.
+1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
+are checked for validity on entry to the relevant functions. If an invalid
+UTF-8 string is passed, an error return is given. In some situations, you may
+already know that your strings are valid, and therefore want to skip these
+checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
+at compile time or at run time, PCRE assumes that the pattern or subject it
+is given (respectively) contains only valid UTF-8 codes. In this case, it does
+not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
+PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
+may crash.
+.P
+2. In a pattern, the escape sequence \ex{...}, where the contents of the braces
+is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
+code number is the given hexadecimal number, for example: \ex{1234}. If a
+non-hexadecimal digit appears between the braces, the item is not recognized.
+This escape sequence can be used either as a literal, or within a character
+class.
 .P
-3. Repeat quantifiers apply to complete UTF-8 characters, not to individual
+3. The original hexadecimal escape sequence, \exhh, matches a two-byte UTF-8
+character if the value is greater than 127.
+.P
+4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
 bytes, for example: \ex{100}{3}.
 .P
-4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+5. The dot metacharacter matches one UTF-8 character instead of a single byte.
 .P
-5. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
-but its use can lead to some strange effects. This facility is not available in
-the alternative matching function, \fBpcre_dfa_exec()\fP.
+6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
+but its use can lead to some strange effects.
 .P
-6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
+7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
 test characters of any code value, but the characters that PCRE recognizes as
 digits, spaces, or word characters remain the same set as before, all with
 values less than 256. This remains true even when PCRE includes Unicode
@@ -252,41 +177,28 @@
 cases. If you really want to test for a wider sense of, say, "digit", you
 must use Unicode property tests such as \ep{Nd}.
 .P
-7. Similarly, characters that match the POSIX named character classes are all
+8. Similarly, characters that match the POSIX named character classes are all
 low-valued characters.
 .P
-8. However, the Perl 5.10 horizontal and vertical whitespace matching escapes
-(\eh, \eH, \ev, and \eV) do match all the appropriate Unicode characters.
-.P
 9. Case-insensitive matching applies only to characters whose values are less
 than 128, unless PCRE is built with Unicode property support. Even when Unicode
 property support is available, PCRE still uses its own character tables when
 checking the case of low-valued characters, so as not to degrade performance.
 The Unicode property information is used only for characters with higher
-values. Even when Unicode property support is available, PCRE supports
-case-insensitive matching only when there is a one-to-one mapping between a
-letter's cases. There are a small number of many-to-one mappings in Unicode;
-these are not supported by PCRE.
-.
+values.
 .
 .SH AUTHOR
 .rs
 .sp
-.nf
-Philip Hazel
-University Computing Service
-Cambridge CB2 3QH, England.
-.fi
-.P
-Putting an actual email address here seems to have been a spam magnet, so I've
-taken it away. If you want to email me, use my two initials, followed by the
-two digits 10, at the domain cam.ac.uk.
-.
-.
-.SH REVISION
-.rs
-.sp
-.nf
-Last updated: 09 August 2007
-Copyright (c) 1997-2007 University of Cambridge.
-.fi
+Philip Hazel <ph...@cam.ac.uk>
+.br
+University Computing Service,
+.br
+Cambridge CB2 3QG, England.
+.br
+Phone: +44 1223 334714
+.sp
+.in 0
+Last updated: 09 September 2004
+.br
+Copyright (c) 1997-2004 University of Cambridge.