Posted to cvs@httpd.apache.org by pg...@apache.org on 2007/11/26 17:50:09 UTC
svn commit: r598339 [20/37] - in /httpd/httpd/vendor/pcre/current: ./ doc/ doc/html/ testdata/

Modified: httpd/httpd/vendor/pcre/current/doc/pcresample.3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcresample.3?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcresample.3 (original)
+++ httpd/httpd/vendor/pcre/current/doc/pcresample.3 Mon Nov 26 08:49:53 2007
@@ -1,4 +1,4 @@
-.TH PCRE 3
+.TH PCRESAMPLE 3
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE SAMPLE PROGRAM"
@@ -18,9 +18,10 @@
 string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 .P
-If PCRE is installed in the standard include and library directories for your
-system, you should be able to compile the demonstration program using this
-command:
+The demonstration program is automatically built if you use "./configure;make"
+to build PCRE. Otherwise, if PCRE is installed in the standard include and
+library directories for your system, you should be able to compile the
+demonstration program using this command:
 .sp
   gcc -o pcredemo pcredemo.c -lpcre
 .sp
@@ -59,8 +60,22 @@
   -R/usr/local/lib
 .sp
 (for example) to the compile command to get round this problem.
-.P
-.in 0
-Last updated: 09 September 2004
-.br
-Copyright (c) 1997-2004 University of Cambridge.
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 13 June 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi

Added: httpd/httpd/vendor/pcre/current/doc/pcrestack.3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcrestack.3?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcrestack.3 (added)
+++ httpd/httpd/vendor/pcre/current/doc/pcrestack.3 Mon Nov 26 08:49:53 2007
@@ -0,0 +1,140 @@
+.TH PCRESTACK 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH "PCRE DISCUSSION OF STACK USAGE"
+.rs
+.sp
+When you call \fBpcre_exec()\fP, it makes use of an internal function called
+\fBmatch()\fP. This calls itself recursively at branch points in the pattern,
+in order to remember the state of the match so that it can back up and try a
+different alternative if the first one fails. As matching proceeds deeper and
+deeper into the tree of possibilities, the recursion depth increases.
+.P
+Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
+as a* it may be called several times at the same level, after matching
+different numbers of a's. Furthermore, in a number of cases where the result of
+the recursive call would immediately be passed back as the result of the
+current call (a "tail recursion"), the function is just restarted instead.
+.P
+The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and
+hardly uses recursion at all. The limit on its complexity is the amount of
+workspace it is given. The comments that follow do NOT apply to
+\fBpcre_dfa_exec()\fP; they are relevant only for \fBpcre_exec()\fP.
+.P
+You can set limits on the number of times that \fBmatch()\fP is called, both in
+total and recursively. If the limit is exceeded, an error occurs. For details,
+see the
+.\" HTML <a href="pcreapi.html#extradata">
+.\" </a>
+section on extra data for \fBpcre_exec()\fP
+.\"
+in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation.
+.P
+Each time that \fBmatch()\fP is actually called recursively, it uses memory
+from the process stack. For certain kinds of pattern and data, very large
+amounts of stack may be needed, despite the recognition of "tail recursion".
+You can often reduce the amount of recursion, and therefore the amount of stack
+used, by modifying the pattern that is being matched. Consider, for example,
+this pattern:
+.sp
+  ([^<]|<(?!inet))+
+.sp
+It matches from wherever it starts until it encounters "<inet" or the end of
+the data, and is the kind of pattern that might be used when processing an XML
+file. Each iteration of the outer parentheses matches either one character that
+is not "<" or a "<" that is not followed by "inet". However, each time a
+parenthesis is processed, a recursion occurs, so this formulation uses a stack
+frame for each matched character. For a long string, a lot of stack is
+required. Consider now this rewritten pattern, which matches exactly the same
+strings:
+.sp
+  ([^<]++|<(?!inet))+
+.sp
+This uses very much less stack, because runs of characters that do not contain
+"<" are "swallowed" in one item inside the parentheses. Recursion happens only
+when a "<" character that is not followed by "inet" is encountered (and we
+assume this is relatively rare). A possessive quantifier is used to stop any
+backtracking into the runs of non-"<" characters, but that is not related to
+stack usage.
+.P
+This example shows that one way of avoiding stack problems when matching long
+subject strings is to write repeated parenthesized subpatterns to match more
+than one character whenever possible.
+.P
+In environments where stack memory is constrained, you might want to compile
+PCRE to use heap memory instead of stack for remembering back-up points. This
+makes it run a lot more slowly, however. Details of how to do this are given in
+the
+.\" HREF
+\fBpcrebuild\fP
+.\"
+documentation. When built in this way, instead of using the stack, PCRE obtains
+and frees memory by calling the functions that are pointed to by the
+\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables. By default, these
+point to \fBmalloc()\fP and \fBfree()\fP, but you can replace the pointers to
+cause PCRE to use your own functions. Since the block sizes are always the
+same, and are always freed in reverse order, it may be possible to implement
+customized memory handlers that are more efficient than the standard functions.
+.P
+In Unix-like environments, there is not often a problem with the stack unless
+very long strings are involved, though the default limit on stack size varies
+from system to system. Values from 8Mb to 64Mb are common. You can find your
+default limit by running the command:
+.sp
+  ulimit -s
+.sp
+Unfortunately, the effect of running out of stack is often SIGSEGV, though
+sometimes a more explicit error message is given. You can normally increase the
+limit on stack size by code such as this:
+.sp
+  struct rlimit rlim;
+  getrlimit(RLIMIT_STACK, &rlim);
+  rlim.rlim_cur = 100*1024*1024;
+  setrlimit(RLIMIT_STACK, &rlim);
+.sp
+This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
+attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
+do this before calling \fBpcre_exec()\fP.
+.P
+PCRE has an internal counter that can be used to limit the depth of recursion,
+and thus cause \fBpcre_exec()\fP to give an error code before it runs out of
+stack. By default, the limit is very large, and unlikely ever to operate. It
+can be changed when PCRE is built, and it can also be set when
+\fBpcre_exec()\fP is called. For details of these interfaces, see the
+.\" HREF
+\fBpcrebuild\fP
+.\"
+and
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation.
+.P
+As a very rough rule of thumb, you should reckon on about 500 bytes per
+recursion. Thus, if you want to limit your stack usage to 8Mb, you
+should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
+support around 128000 recursions. The \fBpcretest\fP test program has a command
+line option (\fB-S\fP) that can be used to increase the size of its stack.
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 05 June 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi
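The pcrestack.3 text above observes that, when PCRE is built to use the heap, the blocks it requests are always the same size and are freed in reverse order. The following is a minimal sketch of a handler pair that exploits this by caching freed blocks on a list. The names frame_alloc, frame_free and frame_release_all are hypothetical, the same-size assumption is enforced with an assert, and the code is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Link stored inside a block while it waits on the free list. */
typedef struct free_block {
  struct free_block *next;
} free_block;

static free_block *free_list = NULL;  /* most recently freed block first */
static size_t frame_size = 0;         /* size seen on the first request */

/* Hypothetical replacement for the function behind pcre_stack_malloc. */
void *frame_alloc(size_t size)
{
  if (frame_size == 0) frame_size = size;
  /* Assumption taken from the text: every request has the same size. */
  assert(size == frame_size && size >= sizeof(free_block));
  if (free_list != NULL) {
    /* Reuse the most recently freed block instead of calling malloc(). */
    free_block *block = free_list;
    free_list = block->next;
    return block;
  }
  return malloc(size);
}

/* Hypothetical replacement for the function behind pcre_stack_free. */
void frame_free(void *ptr)
{
  free_block *block = ptr;
  if (block == NULL) return;
  /* Keep the block for reuse rather than returning it to the system. */
  block->next = free_list;
  free_list = block;
}

/* Return every cached block to the system, for example at shutdown. */
void frame_release_all(void)
{
  while (free_list != NULL) {
    free_block *next = free_list->next;
    free(free_list);
    free_list = next;
  }
}
```

In a build that uses the heap, the two functions would presumably be installed by assigning them to the pcre_stack_malloc and pcre_stack_free variables before matching starts. Whether this is faster than malloc() and free() depends on the platform and should be measured.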

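The pcrestack.3 text above gives a four-line getrlimit()/setrlimit() fragment and a rule of thumb of about 500 bytes of stack per recursion. The sketch below wraps both ideas in functions with error checking. The names recursion_limit_for_stack and raise_soft_stack_limit are hypothetical, and the 500-byte figure is only the rough estimate quoted in the text, not a measured value.

```c
#include <assert.h>
#include <sys/resource.h>

/* Rough recursion limit for a stack budget, using the rule of thumb
   from the text (hypothetical helper). */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes,
                                        unsigned long bytes_per_frame)
{
  assert(bytes_per_frame > 0);
  return stack_bytes / bytes_per_frame;
}

/* Try to raise the soft stack limit to at least `wanted` bytes.
   The request is clamped to the hard limit, because an unprivileged
   process cannot exceed it. Returns 0 on success, -1 on failure. */
int raise_soft_stack_limit(rlim_t wanted)
{
  struct rlimit rlim;

  if (getrlimit(RLIMIT_STACK, &rlim) != 0) return -1;

  /* Nothing to do if the soft limit is already large enough. */
  if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted) return 0;

  if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
    wanted = rlim.rlim_max;

  rlim.rlim_cur = wanted;
  return setrlimit(RLIMIT_STACK, &rlim);
}
```

With these assumptions, recursion_limit_for_stack(8UL * 1024 * 1024, 500) gives 16777, in line with the "16000 recursions" quoted for an 8Mb stack. A value of this kind could be placed in the match_limit_recursion field of the pcre_extra block that is passed to pcre_exec(), as described in the pcreapi documentation. As the text says, any change to the stack limit must be made before pcre_exec() is called.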
Added: httpd/httpd/vendor/pcre/current/doc/pcresyntax.3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcresyntax.3?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcresyntax.3 (added)
+++ httpd/httpd/vendor/pcre/current/doc/pcresyntax.3 Mon Nov 26 08:49:53 2007
@@ -0,0 +1,423 @@
+.TH PCRESYNTAX 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
+.rs
+.sp
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+.\" HREF
+\fBpcrepattern\fP
+.\"
+documentation. This document contains just a quick-reference summary of the
+syntax.
+.
+.
+.SH "QUOTING"
+.rs
+.sp
+  \ex         where x is non-alphanumeric is a literal x
+  \eQ...\eE    treat enclosed characters as literal
+.
+.
+.SH "CHARACTERS"
+.rs
+.sp
+  \ea         alarm, that is, the BEL character (hex 07)
+  \ecx        "control-x", where x is any character
+  \ee         escape (hex 1B)
+  \ef         formfeed (hex 0C)
+  \en         newline (hex 0A)
+  \er         carriage return (hex 0D)
+  \et         tab (hex 09)
+  \eddd       character with octal code ddd, or backreference
+  \exhh       character with hex code hh
+  \ex{hhh..}  character with hex code hhh..
+.
+.
+.SH "CHARACTER TYPES"
+.rs
+.sp
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \eC         one byte, even in UTF-8 mode (best avoided)
+  \ed         a decimal digit
+  \eD         a character that is not a decimal digit
+  \eh         a horizontal whitespace character
+  \eH         a character that is not a horizontal whitespace character
+  \ep{\fIxx\fP}     a character with the \fIxx\fP property
+  \eP{\fIxx\fP}     a character without the \fIxx\fP property
+  \eR         a newline sequence
+  \es         a whitespace character
+  \eS         a character that is not a whitespace character
+  \ev         a vertical whitespace character
+  \eV         a character that is not a vertical whitespace character
+  \ew         a "word" character
+  \eW         a "non-word" character
+  \eX         an extended Unicode sequence
+.sp
+In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
+.
+.
+.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.rs
+.sp
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+.sp
+  L          Letter
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+  L&         Ll, Lu, or Lt
+.sp
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+.sp
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+.sp
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+.sp
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+.sp
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+.
+.
+.SH "SCRIPT NAMES FOR \ep AND \eP"
+.rs
+.sp
+Arabic,
+Armenian,
+Balinese,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+.
+.
+.SH "CHARACTER CLASSES"
+.rs
+.sp
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+.sp
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       whitespace
+  upper       upper case letter
+  word        same as \ew
+  xdigit      hexadecimal digit
+.sp
+In PCRE, POSIX character set names recognize only ASCII characters. You can use
+\eQ...\eE inside a character class.
+.
+.
+.SH "QUANTIFIERS"
+.rs
+.sp
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+.
+.
+.SH "ANCHORS AND SIMPLE ASSERTIONS"
+.rs
+.sp
+  \eb          word boundary
+  \eB          not a word boundary
+  ^           start of subject
+               also after internal newline in multiline mode
+  \eA          start of subject
+  $           end of subject
+               also before newline at end of subject
+               also before internal newline in multiline mode
+  \eZ          end of subject
+               also before newline at end of subject
+  \ez          end of subject
+  \eG          first matching position in subject
+.
+.
+.SH "MATCH POINT RESET"
+.rs
+.sp
+  \eK          reset start of match
+.
+.
+.SH "ALTERNATION"
+.rs
+.sp
+  expr|expr|expr...
+.
+.
+.SH "CAPTURING"
+.rs
+.sp
+  (...)          capturing group
+  (?<name>...)   named capturing group (Perl)
+  (?'name'...)   named capturing group (Perl)
+  (?P<name>...)  named capturing group (Python)
+  (?:...)        non-capturing group
+  (?|...)        non-capturing group; reset group numbers for
+                  capturing groups in each alternative
+.
+.
+.SH "ATOMIC GROUPS"
+.rs
+.sp
+  (?>...)        atomic, non-capturing group
+.
+.
+.
+.
+.SH "COMMENT"
+.rs
+.sp
+  (?#....)       comment (not nestable)
+.
+.
+.SH "OPTION SETTING"
+.rs
+.sp
+  (?i)           caseless
+  (?J)           allow duplicate names
+  (?m)           multiline
+  (?s)           single line (dotall)
+  (?U)           default ungreedy (lazy)
+  (?x)           extended (ignore white space)
+  (?-...)        unset option(s)
+.
+.
+.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
+.rs
+.sp
+  (?=...)        positive look ahead
+  (?!...)        negative look ahead
+  (?<=...)       positive look behind
+  (?<!...)       negative look behind
+.sp
+Each top-level branch of a look behind must be of a fixed length.
+.SH "BACKREFERENCES"
+.rs
+.sp
+  \en             reference by number (can be ambiguous)
+  \egn            reference by number
+  \eg{n}          reference by number
+  \eg{-n}         relative reference by number
+  \ek<name>       reference by name (Perl)
+  \ek'name'       reference by name (Perl)
+  \eg{name}       reference by name (Perl)
+  \ek{name}       reference by name (.NET)
+  (?P=name)      reference by name (Python)
+.
+.
+.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
+.rs
+.sp
+  (?R)           recurse whole pattern
+  (?n)           call subpattern by absolute number
+  (?+n)          call subpattern by relative number
+  (?-n)          call subpattern by relative number
+  (?&name)       call subpattern by name (Perl)
+  (?P>name)      call subpattern by name (Python)
+.
+.
+.SH "CONDITIONAL PATTERNS"
+.rs
+.sp
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+.sp
+  (?(n)...       absolute reference condition
+  (?(+n)...      relative reference condition
+  (?(-n)...      relative reference condition
+  (?(<name>)...  named reference condition (Perl)
+  (?('name')...  named reference condition (Perl)
+  (?(name)...    named reference condition (PCRE)
+  (?(R)...       overall recursion condition
+  (?(Rn)...      specific group recursion condition
+  (?(R&name)...  specific recursion condition
+  (?(DEFINE)...  define subpattern for reference
+  (?(assert)...  assertion condition
+.
+.
+.SH "BACKTRACKING CONTROL"
+.rs
+.sp
+The following act as soon as they are reached:
+.sp
+  (*ACCEPT)      force successful match
+  (*FAIL)        force backtrack; synonym (*F)
+.sp
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+.sp
+  (*COMMIT)      overall failure, no advance of starting point
+  (*PRUNE)       advance to next starting character
+  (*SKIP)        advance start to current matching position
+  (*THEN)        local failure, backtrack to next alternation
+.
+.
+.SH "NEWLINE CONVENTIONS"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after a
+(*BSR_...) option.
+.sp
+  (*CR)
+  (*LF)
+  (*CRLF)
+  (*ANYCRLF)
+  (*ANY)
+.
+.
+.SH "WHAT \eR MATCHES"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after a
+(*...) option that sets the newline convention.
+.sp
+  (*BSR_ANYCRLF)
+  (*BSR_UNICODE)
+.
+.
+.SH "CALLOUTS"
+.rs
+.sp
+  (?C)      callout
+  (?Cn)     callout with data n
+.
+.
+.SH "SEE ALSO"
+.rs
+.sp
+\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
+\fBpcrematching\fP(3), \fBpcre\fP(3).
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 21 September 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi

Modified: httpd/httpd/vendor/pcre/current/doc/pcretest.1
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcretest.1?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcretest.1 (original)
+++ httpd/httpd/vendor/pcre/current/doc/pcretest.1 Mon Nov 26 08:49:53 2007
@@ -4,10 +4,8 @@
 .SH SYNOPSIS
 .rs
 .sp
-.B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]"
-.ti +5n
-.B "[destination]"
-.P
+.B pcretest "[options] [source] [destination]"
+.sp
 \fBpcretest\fP was written as a test program for the PCRE regular expression
 library itself, but it can also be used for experimenting with regular
 expressions. This document describes the features of the test program; for
@@ -26,16 +24,29 @@
 .SH OPTIONS
 .rs
 .TP 10
+\fB-b\fP
+Behave as if each regex has the \fB/B\fP (show bytecode) modifier; the internal
+form is output after compilation.
+.TP 10
 \fB-C\fP
 Output the version number of the PCRE library, and all available information
 about the optional features that are included, and then exit.
 .TP 10
 \fB-d\fP
-Behave as if each regex had the \fB/D\fP (debug) modifier; the internal
-form is output after compilation.
+Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
+form and information about the compiled pattern are output after compilation;
+\fB-d\fP is equivalent to \fB-b -i\fP.
+.TP 10
+\fB-dfa\fP
+Behave as if each data line contains the \eD escape sequence; this causes the
+alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
+standard \fBpcre_exec()\fP function (more detail is given below).
+.TP 10
+\fB-help\fP
+Output a brief summary of these options and then exit.
 .TP 10
 \fB-i\fP
-Behave as if each regex had the \fB/I\fP modifier; information about the
+Behave as if each regex has the \fB/I\fP modifier; information about the
 compiled pattern is given after compilation.
 .TP 10
 \fB-m\fP
@@ -45,19 +56,36 @@
 .TP 10
 \fB-o\fP \fIosize\fP
 Set the number of elements in the output vector that is used when calling
-\fBpcre_exec()\fP to be \fIosize\fP. The default value is 45, which is enough
-for 14 capturing subexpressions. The vector size can be changed for individual
-matching calls by including \eO in the data line (see below).
+\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP to be \fIosize\fP. The default value
+is 45, which is enough for 14 capturing subexpressions for \fBpcre_exec()\fP or
+22 different matches for \fBpcre_dfa_exec()\fP. The vector size can be
+changed for individual matching calls by including \eO in the data line (see
+below).
 .TP 10
 \fB-p\fP
-Behave as if each regex has \fB/P\fP modifier; the POSIX wrapper API is used
-to call PCRE. None of the other options has any effect when \fB-p\fP is set.
+Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
+used to call PCRE. None of the other options has any effect when \fB-p\fP is
+set.
+.TP 10
+\fB-q\fP
+Do not output the version number of \fBpcretest\fP at the start of execution.
+.TP 10
+\fB-S\fP \fIsize\fP
+On Unix-like systems, set the size of the runtime stack to \fIsize\fP
+megabytes.
 .TP 10
 \fB-t\fP
 Run each compile, study, and match many times with a timer, and output
 resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
 \fB-t\fP, because you will then get the size output a zillion times, and the
-timing will be distorted.
+timing will be distorted. You can control the number of iterations that are
+used for timing by following \fB-t\fP with a number (as a separate item on the
+command line). For example, "-t 1000" would iterate 1000 times. The default is
+to iterate 500000 times.
+.TP 10
+\fB-tm\fP
+This is like \fB-t\fP except that it times only the matching phase, not the
+compile or study phases.
 .
 .
 .SH DESCRIPTION
@@ -74,13 +102,14 @@
 lines to be matched against the pattern.
 .P
 Each data line is matched separately and independently. If you want to do
-multiple-line matches, you have to use the \en escape sequence in a single line
-of input to encode the newline characters. The maximum length of data line is
-30,000 characters.
+multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
+etc., depending on the newline setting) in a single line of input to encode the
+newline sequences. There is no limit on the length of data lines; the input
+buffer is automatically extended if it is too small.
 .P
 An empty line signals the end of the data lines, at which point a new regular
 expression is read. The regular expressions are given enclosed in any
-non-alphanumeric delimiters other than backslash, for example
+non-alphanumeric delimiters other than backslash, for example:
 .sp
   /(a|bc)x+yz/
 .sp
@@ -128,12 +157,37 @@
 The following table shows additional modifiers for setting PCRE options that do
 not correspond to anything in Perl:
 .sp
-  \fB/A\fP    PCRE_ANCHORED
-  \fB/C\fP    PCRE_AUTO_CALLOUT
-  \fB/E\fP    PCRE_DOLLAR_ENDONLY
-  \fB/N\fP    PCRE_NO_AUTO_CAPTURE
-  \fB/U\fP    PCRE_UNGREEDY
-  \fB/X\fP    PCRE_EXTRA
+  \fB/A\fP              PCRE_ANCHORED
+  \fB/C\fP              PCRE_AUTO_CALLOUT
+  \fB/E\fP              PCRE_DOLLAR_ENDONLY
+  \fB/f\fP              PCRE_FIRSTLINE
+  \fB/J\fP              PCRE_DUPNAMES
+  \fB/N\fP              PCRE_NO_AUTO_CAPTURE
+  \fB/U\fP              PCRE_UNGREEDY
+  \fB/X\fP              PCRE_EXTRA
+  \fB/<cr>\fP           PCRE_NEWLINE_CR
+  \fB/<lf>\fP           PCRE_NEWLINE_LF
+  \fB/<crlf>\fP         PCRE_NEWLINE_CRLF
+  \fB/<anycrlf>\fP      PCRE_NEWLINE_ANYCRLF
+  \fB/<any>\fP          PCRE_NEWLINE_ANY
+  \fB/<bsr_anycrlf>\fP  PCRE_BSR_ANYCRLF
+  \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
+.sp
+Those specifying line ending sequences are literal strings as shown, but the
+letters can be in either case. This example sets multiline matching with CRLF
+as the line ending sequence:
+.sp
+  /^abc/m<crlf>
+.sp
+Details of the meanings of these PCRE options are given in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation.
+.
+.
+.SS "Finding all matches in a string"
+.rs
 .sp
 Searching for all possible matches within each subject string can be requested
 by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
@@ -150,7 +204,11 @@
 If this second match fails, the start offset is advanced by one, and the normal
 match is retried. This imitates the way Perl handles such cases when using the
 \fB/g\fP modifier or the \fBsplit()\fP function.
-.P
+.
+.
+.SS "Other modifiers"
+.rs
+.sp
 There are yet more modifiers for controlling the way \fBpcretest\fP
 operates.
 .P
@@ -159,6 +217,13 @@
 the subject string. This is useful for tests where the subject contains
 multiple copies of the same substring.
 .P
+The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP
+output a representation of the compiled byte code after compilation. Normally
+this information contains length and offset values; however, if \fB/Z\fP is
+also present, this data is replaced by spaces. This is a special feature for
+use in the automatic test scripts; it ensures that the same output is generated
+for different internal link sizes.
+.P
 The \fB/L\fP modifier must be followed directly by the name of a locale, for
 example,
 .sp
@@ -175,10 +240,8 @@
 so on). It does this by calling \fBpcre_fullinfo()\fP after compiling a
 pattern. If the pattern is studied, the results of that are also output.
 .P
-The \fB/D\fP modifier is a PCRE debugging feature, which also assumes \fB/I\fP.
-It causes the internal form of compiled regular expressions to be output after
-compilation. If the pattern was studied, the information returned is also
-output.
+The \fB/D\fP modifier is a PCRE debugging feature, and is equivalent to
+\fB/BI\fP, that is, both the \fB/B\fP and the \fB/I\fP modifiers.
 .P
 The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
 fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
@@ -222,21 +285,28 @@
 expressions, you probably don't need any of these. The following escapes are
 recognized:
 .sp
-  \ea         alarm (= BEL)
-  \eb         backspace
-  \ee         escape
-  \ef         formfeed
-  \en         newline
-  \er         carriage return
-  \et         tab
-  \ev         vertical tab
+  \ea         alarm (BEL, \ex07)
+  \eb         backspace (\ex08)
+  \ee         escape (\ex1b)
+  \ef         formfeed (\ex0c)
+  \en         newline (\ex0a)
+.\" JOIN
+  \eqdd       set the PCRE_MATCH_LIMIT limit to dd
+               (any number of digits)
+  \er         carriage return (\ex0d)
+  \et         tab (\ex09)
+  \ev         vertical tab (\ex0b)
   \ennn       octal character (up to 3 octal digits)
   \exhh       hexadecimal character (up to 2 hex digits)
 .\" JOIN
   \ex{hh...}  hexadecimal character, any number of digits
                in UTF-8 mode
+.\" JOIN
   \eA         pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
   \eB         pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
 .\" JOIN
   \eCdd       call pcre_copy_substring() for substring dd
                after a successful match (number less than 32)
@@ -257,6 +327,8 @@
 .\" JOIN
   \eC*n       pass the number n (may be negative) as callout
                data; this is used as the callout return value
+  \eD         use the \fBpcre_dfa_exec()\fP match function
+  \eF         only shortest match for \fBpcre_dfa_exec()\fP
 .\" JOIN
   \eGdd       call pcre_get_substring() for substring dd
                after a successful match (number less than 32)
@@ -267,59 +339,122 @@
 .\" JOIN
   \eL         call pcre_get_substringlist() after a
                successful match
-  \eM         discover the minimum MATCH_LIMIT setting
+.\" JOIN
+  \eM         discover the minimum MATCH_LIMIT and
+               MATCH_LIMIT_RECURSION settings
+.\" JOIN
   \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
 .\" JOIN
   \eOdd       set the size of the output vector passed to
                \fBpcre_exec()\fP to dd (any number of digits)
+.\" JOIN
   \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \eQdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
+               (any number of digits)
+  \eR         pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
   \eS         output details of memory get/free calls during matching
+.\" JOIN
   \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
 .\" JOIN
   \e?         pass the PCRE_NO_UTF8_CHECK option to
-               \fBpcre_exec()\fP
+               \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
   \e>dd       start the match at offset dd (any number of digits);
+.\" JOIN
                this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \e<cr>      pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \e<lf>      pass the PCRE_NEWLINE_LF option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \e<crlf>    pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \e<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
+.\" JOIN
+  \e<any>     pass the PCRE_NEWLINE_ANY option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP
 .sp
-A backslash followed by anything else just escapes the anything else. If the
-very last character is a backslash, it is ignored. This gives a way of passing
-an empty line as data, since a real empty line terminates the data input.
+The escapes that specify line ending sequences are literal strings, exactly as
+shown. No more than one newline setting should be present in any data line.
+.P
+A backslash followed by anything else just escapes the anything else. If
+the very last character is a backslash, it is ignored. This gives a way of
+passing an empty line as data, since a real empty line terminates the data
+input.
 .P
 If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
-different values in the \fImatch_limit\fP field of the \fBpcre_extra\fP data
-structure, until it finds the minimum number that is needed for
-\fBpcre_exec()\fP to complete. This number is a measure of the amount of
-recursion and backtracking that takes place, and checking it out can be
-instructive. For most simple matches, the number is quite small, but for
-patterns with very large numbers of matching possibilities, it can become large
-very quickly with increasing length of subject string.
+different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
+fields of the \fBpcre_extra\fP data structure, until it finds the minimum
+numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
+\fImatch_limit\fP number is a measure of the amount of backtracking that takes
+place, and checking it out can be instructive. For most simple matches, the
+number is quite small, but for patterns with very large numbers of matching
+possibilities, it can become large very quickly with increasing length of
+subject string. The \fImatch_limit_recursion\fP number is a measure of how much
+stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
+to complete the match attempt.
 .P
 When \eO is used, the value specified may be higher or lower than the size set
 by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to
 the call of \fBpcre_exec()\fP for the line in which it appears.
 .P
 If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
-API to be used, only \eB and \eZ have any effect, causing REG_NOTBOL and
-REG_NOTEOL to be passed to \fBregexec()\fP respectively.
+API to be used, the only option-setting sequences that have any effect are \eB
+and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
+\fBregexec()\fP.
 .P
 The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
 of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
 any number of hexadecimal digits inside the braces. The result is from one to
-six bytes, encoded according to the UTF-8 rules.
+six bytes, encoded according to the original UTF-8 rules of RFC 2279. This
+allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are
+valid Unicode code points, or indeed valid UTF-8 characters according to the
+later rules in RFC 3629.
 .
 .
-.SH "OUTPUT FROM PCRETEST"
+.SH "THE ALTERNATIVE MATCHING FUNCTION"
 .rs
 .sp
+By default, \fBpcretest\fP uses the standard PCRE matching function,
+\fBpcre_exec()\fP, to match each data line. From release 6.0, PCRE supports an
+alternative matching function, \fBpcre_dfa_exec()\fP, which operates in a
+different way, and has some restrictions. The differences between the two
+functions are described in the
+.\" HREF
+\fBpcrematching\fP
+.\"
+documentation.
+.P
+If a data line contains the \eD escape sequence, or if the command line
+contains the \fB-dfa\fP option, the alternative matching function is called.
+This function finds all possible matches at a given point. If, however, the \eF
+escape sequence is present in the data line, it stops after the first match is
+found. This is always the shortest possible match.
+.
+.
+.SH "DEFAULT OUTPUT FROM PCRETEST"
+.rs
+.sp
+This section describes the output when the normal matching function,
+\fBpcre_exec()\fP, is being used.
+.P
 When a match succeeds, pcretest outputs the list of captured substrings that
 \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
 the whole pattern. Otherwise, it outputs "No match" or "Partial match"
 when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
 respectively, and otherwise the PCRE negative error number. Here is an example
-of an interactive pcretest run.
+of an interactive \fBpcretest\fP run.
 .sp
   $ pcretest
-  PCRE version 5.00 07-Sep-2004
+  PCRE version 7.0 30-Nov-2006
 .sp
     re> /^abc(\ed+)/
   data> abc123
@@ -330,9 +465,9 @@
 .sp
 If the strings contain any non-printing characters, they are output as \exhh
 escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
-pattern. If the pattern has the \fB/+\fP modifier, the output for substring 0
-is followed by the the rest of the subject string, identified by "0+" like
-this:
+pattern. See below for the definition of non-printing characters. If the
+pattern has the \fB/+\fP modifier, the output for substring 0 is followed by
+the rest of the subject string, identified by "0+" like this:
 .sp
     re> /cat/+
   data> cataract
@@ -360,18 +495,75 @@
 length (that is, the return from the extraction function) is given in
 parentheses after each string for \fB\eC\fP and \fB\eG\fP.
 .P
-Note that while patterns can be continued over several lines (a plain ">"
+Note that whereas patterns can be continued over several lines (a plain ">"
 prompt is used for continuations), data lines may not. However, newlines can be
-included in data by means of the \en escape.
+included in data by means of the \en escape (or \er, \er\en, etc., depending on
+the newline sequence setting).
+.
+.
+.
+.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
+.rs
+.sp
+When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
+means of the \eD escape sequence or the \fB-dfa\fP command line option), the
+output consists of a list of all the matches that start at the first point in
+the subject where there is at least one match. For example:
+.sp
+    re> /(tang|tangerine|tan)/
+  data> yellow tangerine\eD
+   0: tangerine
+   1: tang
+   2: tan
+.sp
+(Using the normal matching function on this data finds only "tang".) The
+longest matching string is always given first (and numbered zero).
+.P
+If \fB/g\fP is present on the pattern, the search for further matches resumes
+at the end of the longest match. For example:
+.sp
+    re> /(tang|tangerine|tan)/g
+  data> yellow tangerine and tangy sultana\eD
+   0: tangerine
+   1: tang
+   2: tan
+   0: tang
+   1: tan
+   0: tan
+.sp
+Since the matching function does not support substring capture, the escape
+sequences that are concerned with captured substrings are not relevant.
+.
+.
+.SH "RESTARTING AFTER A PARTIAL MATCH"
+.rs
+.sp
+When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
+indicating that the subject partially matched the pattern, you can restart the
+match with additional subject data by means of the \eR escape sequence. For
+example:
+.sp
+    re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
+  data> 23ja\eP\eD
+  Partial match: 23ja
+  data> n05\eR\eD
+   0: n05
+.sp
+For further information about partial matching, see the
+.\" HREF
+\fBpcrepartial\fP
+.\"
+documentation.
 .
 .
 .SH CALLOUTS
 .rs
 .sp
 If the pattern contains any callout requests, \fBpcretest\fP's callout function
-is called during matching. By default, it displays the callout number, the
-start and current positions in the text at the callout time, and the next
-pattern item to be tested. For example, the output
+is called during matching. This works with both matching functions. By default,
+the called function displays the callout number, the start and current
+positions in the text at the callout time, and the next pattern item to be
+tested. For example, the output
 .sp
   --->pqrabcdef
     0    ^  ^     \ed
@@ -396,7 +588,7 @@
    0: E*
 .sp
 The callout function in \fBpcretest\fP returns zero (carry on matching) by
-default, but you can use an \eC item in a data line (as described above) to
+default, but you can use a \eC item in a data line (as described above) to
 change this.
 .P
 Inserting callouts can be helpful when using \fBpcretest\fP to check
@@ -408,6 +600,21 @@
 documentation.
 .
 .
+.
+.SH "NON-PRINTING CHARACTERS"
+.rs
+.sp
+When \fBpcretest\fP is outputting text in the compiled version of a pattern,
+bytes other than 32-126 are always treated as non-printing characters and are
+therefore shown as hex escapes.
+.P
+When \fBpcretest\fP is outputting text that is a matched part of a subject
+string, it behaves in the same way, unless a different locale has been set for
+the pattern (using the \fB/L\fP modifier). In this case, the \fBisprint()\fP
+function is used to distinguish printing and non-printing characters.
+.
+.
+.
 .SH "SAVING AND RELOADING COMPILED PATTERNS"
 .rs
 .sp
@@ -468,16 +675,27 @@
 result is undefined.
 .
 .
+.SH "SEE ALSO"
+.rs
+.sp
+\fBpcre\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3),
+\fBpcrepartial\fP(3), \fBpcrepattern\fP(3), \fBpcreprecompile\fP(3).
+.
+.
 .SH AUTHOR
 .rs
 .sp
-Philip Hazel <ph...@cam.ac.uk>
-.br
-University Computing Service,
-.br
-Cambridge CB2 3QG, England.
-.P
-.in 0
-Last updated: 10 September 2004
-.br
-Copyright (c) 1997-2004 University of Cambridge.
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 11 September 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi

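Note on the \x{hh...} paragraph in the pcretest.3 diff above: it says the value is encoded in one to six bytes under the original UTF-8 rules of RFC 2279, covering 0 to 0x7FFFFFFF. The following is a minimal sketch of that encoding in C, for illustration only. The function name encode_utf8_rfc2279 is hypothetical; it is not part of PCRE or pcretest, and this is not claimed to be how pcretest implements the conversion.

```c
/* Encode a value in the range 0..0x7FFFFFFF using the original UTF-8
   scheme of RFC 2279 (one to six bytes). Returns the number of bytes
   written to buf, or 0 if the value is out of range.
   Hypothetical helper for illustration; not part of the PCRE API. */
static int encode_utf8_rfc2279(unsigned long value, unsigned char buf[6])
{
    /* Largest value representable in 1, 2, ..., 6 bytes. */
    static const unsigned long limit[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
    };
    /* Fixed high bits of the leading byte for 1, 2, ..., 6 bytes. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    int extra;   /* number of continuation bytes */
    int i;

    for (extra = 0; extra < 6; extra++)
        if (value <= limit[extra]) break;
    if (extra == 6) return 0;   /* above 0x7FFFFFFF */

    /* Fill continuation bytes from the end, six bits at a time. */
    for (i = extra; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (value & 0x3F));
        value >>= 6;
    }
    /* Whatever is left fits in the free bits of the leading byte. */
    buf[0] = (unsigned char)(lead[extra] | value);
    return extra + 1;
}
```

With this scheme a value below 0x80 stays a single byte, and only values above 0x1FFFFF need the five-byte and six-byte forms. RFC 3629 later limited UTF-8 to code points up to 0x10FFFF, which is why the man page warns that not every value accepted here is a valid UTF-8 character under the newer rules.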
Modified: httpd/httpd/vendor/pcre/current/doc/pcretest.txt
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcretest.txt?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcretest.txt (original)
+++ httpd/httpd/vendor/pcre/current/doc/pcretest.txt Mon Nov 26 08:49:53 2007
@@ -1,14 +1,13 @@
 PCRETEST(1)                                                        PCRETEST(1)
 
 
-
 NAME
        pcretest - a program for testing Perl-compatible regular expressions.
 
+
 SYNOPSIS
 
-       pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]
-            [destination]
+       pcretest [options] [source] [destination]
 
        pcretest  was written as a test program for the PCRE regular expression
        library itself, but it can also be used for experimenting with  regular
@@ -20,99 +19,126 @@
 
 OPTIONS
 
+       -b        Behave as if each regex has the /B (show bytecode)  modifier;
+                 the internal form is output after compilation.
+
        -C        Output the version number of the PCRE library, and all avail-
-                 able   information  about  the  optional  features  that  are
+                 able  information  about  the  optional  features  that   are
                  included, and then exit.
 
-       -d        Behave as if each regex had  the  /D  (debug)  modifier;  the
-                 internal form is output after compilation.
+       -d        Behave  as  if  each  regex  has the /D (debug) modifier; the
+                 internal form and information about the compiled pattern  are
+                 output after compilation; -d is equivalent to -b -i.
+
+       -dfa      Behave  as if each data line contains the \D escape sequence;
+                 this    causes    the    alternative    matching    function,
+                 pcre_dfa_exec(),   to   be   used  instead  of  the  standard
+                 pcre_exec() function (more detail is given below).
 
-       -i        Behave  as  if  each  regex  had the /I modifier; information
+       -help     Output a brief summary of these options and then exit.
+
+       -i        Behave as if each regex  has  the  /I  modifier;  information
                  about the compiled pattern is given after compilation.
 
-       -m        Output the size of each compiled pattern after  it  has  been
-                 compiled.  This  is  equivalent  to adding /M to each regular
-                 expression.  For  compatibility  with  earlier  versions   of
+       -m        Output  the  size  of each compiled pattern after it has been
+                 compiled. This is equivalent to adding  /M  to  each  regular
+                 expression.   For  compatibility  with  earlier  versions  of
                  pcretest, -s is a synonym for -m.
 
-       -o osize  Set  the number of elements in the output vector that is used
-                 when calling pcre_exec() to be osize. The  default  value  is
-                 45, which is enough for 14 capturing subexpressions. The vec-
-                 tor size can be changed  for  individual  matching  calls  by
-                 including \O in the data line (see below).
-
-       -p        Behave  as  if  each regex has /P modifier; the POSIX wrapper
-                 API is used to call PCRE. None of the other options  has  any
-                 effect when -p is set.
-
-       -t        Run  each  compile, study, and match many times with a timer,
-                 and output resulting time per compile or match (in  millisec-
-                 onds).  Do  not set -m with -t, because you will then get the
-                 size output a zillion times, and  the  timing  will  be  dis-
-                 torted.
+       -o osize  Set the number of elements in the output vector that is  used
+                 when  calling pcre_exec() or pcre_dfa_exec() to be osize. The
+                 default value is 45, which is enough for 14 capturing  subex-
+                 pressions   for  pcre_exec()  or  22  different  matches  for
+                 pcre_dfa_exec(). The vector size can be changed for  individ-
+                 ual  matching  calls  by  including  \O in the data line (see
+                 below).
+
+       -p        Behave as if each regex has the /P modifier; the POSIX  wrap-
+                 per  API  is used to call PCRE. None of the other options has
+                 any effect when -p is set.
+
+       -q        Do not output the version number of pcretest at the start  of
+                 execution.
+
+       -S size   On  Unix-like  systems,  set the size of the runtime stack to
+                 size megabytes.
+
+       -t        Run each compile, study, and match many times with  a  timer,
+                 and  output resulting time per compile or match (in millisec-
+                 onds). Do not set -m with -t, because you will then  get  the
+                 size  output  a  zillion  times,  and the timing will be dis-
+                 torted. You can control the number  of  iterations  that  are
+                 used  for timing by following -t with a number (as a separate
+                 item on the command line). For example, "-t 1000" would iter-
+                 ate 1000 times. The default is to iterate 500000 times.
+
+       -tm       This is like -t except that it times only the matching phase,
+                 not the compile or study phases.
 
 
 DESCRIPTION
 
-       If  pcretest  is  given two filename arguments, it reads from the first
+       If pcretest is given two filename arguments, it reads  from  the  first
        and writes to the second. If it is given only one filename argument, it
-       reads  from  that  file  and writes to stdout. Otherwise, it reads from
-       stdin and writes to stdout, and prompts for each line of  input,  using
+       reads from that file and writes to stdout.  Otherwise,  it  reads  from
+       stdin  and  writes to stdout, and prompts for each line of input, using
        "re>" to prompt for regular expressions, and "data>" to prompt for data
        lines.
 
        The program handles any number of sets of input on a single input file.
-       Each  set starts with a regular expression, and continues with any num-
+       Each set starts with a regular expression, and continues with any  num-
        ber of data lines to be matched against the pattern.
 
-       Each data line is matched separately and independently. If you want  to
-       do  multiple-line  matches, you have to use the \n escape sequence in a
-       single line of input to encode  the  newline  characters.  The  maximum
-       length of data line is 30,000 characters.
-
-       An  empty  line signals the end of the data lines, at which point a new
-       regular expression is read. The regular expressions are given  enclosed
-       in any non-alphanumeric delimiters other than backslash, for example
+       Each  data line is matched separately and independently. If you want to
+       do multi-line matches, you have to use the \n escape sequence (or \r or
+       \r\n, etc., depending on the newline setting) in a single line of input
+       to encode the newline sequences. There is no limit  on  the  length  of
+       data  lines;  the  input  buffer is automatically extended if it is too
+       small.
+
+       An empty line signals the end of the data lines, at which point  a  new
+       regular  expression is read. The regular expressions are given enclosed
+       in any non-alphanumeric delimiters other than backslash, for example:
 
          /(a|bc)x+yz/
 
-       White  space before the initial delimiter is ignored. A regular expres-
-       sion may be continued over several input lines, in which case the  new-
-       line  characters  are included within it. It is possible to include the
+       White space before the initial delimiter is ignored. A regular  expres-
+       sion  may be continued over several input lines, in which case the new-
+       line characters are included within it. It is possible to  include  the
        delimiter within the pattern by escaping it, for example
 
          /abc\/def/
 
-       If you do so, the escape and the delimiter form part  of  the  pattern,
-       but  since delimiters are always non-alphanumeric, this does not affect
-       its interpretation.  If the terminating delimiter is  immediately  fol-
+       If  you  do  so, the escape and the delimiter form part of the pattern,
+       but since delimiters are always non-alphanumeric, this does not  affect
+       its  interpretation.   If the terminating delimiter is immediately fol-
        lowed by a backslash, for example,
 
          /abc/\
 
-       then  a  backslash  is added to the end of the pattern. This is done to
-       provide a way of testing the error condition that arises if  a  pattern
+       then a backslash is added to the end of the pattern. This  is  done  to
+       provide  a  way of testing the error condition that arises if a pattern
        finishes with a backslash, because
 
          /abc\/
 
-       is  interpreted as the first line of a pattern that starts with "abc/",
+       is interpreted as the first line of a pattern that starts with  "abc/",
        causing pcretest to read the next line as a continuation of the regular
        expression.
 
 
 PATTERN MODIFIERS
 
-       A  pattern may be followed by any number of modifiers, which are mostly
-       single characters. Following Perl usage, these are  referred  to  below
-       as,  for  example,  "the /i modifier", even though the delimiter of the
-       pattern need not always be a slash, and no slash is used  when  writing
-       modifiers.  Whitespace  may  appear between the final pattern delimiter
+       A pattern may be followed by any number of modifiers, which are  mostly
+       single  characters.  Following  Perl usage, these are referred to below
+       as, for example, "the /i modifier", even though the  delimiter  of  the
+       pattern  need  not always be a slash, and no slash is used when writing
+       modifiers. Whitespace may appear between the  final  pattern  delimiter
        and the first modifier, and between the modifiers themselves.
 
        The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
-       PCRE_DOTALL,  or  PCRE_EXTENDED  options,  respectively, when pcre_com-
-       pile() is called. These four modifier letters have the same  effect  as
+       PCRE_DOTALL, or PCRE_EXTENDED  options,  respectively,  when  pcre_com-
+       pile()  is  called. These four modifier letters have the same effect as
        they do in Perl. For example:
 
          /caseless/i
@@ -120,12 +146,32 @@
        The following table shows additional modifiers for setting PCRE options
        that do not correspond to anything in Perl:
 
-         /A    PCRE_ANCHORED
-         /C    PCRE_AUTO_CALLOUT
-         /E    PCRE_DOLLAR_ENDONLY
-         /N    PCRE_NO_AUTO_CAPTURE
-         /U    PCRE_UNGREEDY
-         /X    PCRE_EXTRA
+         /A              PCRE_ANCHORED
+         /C              PCRE_AUTO_CALLOUT
+         /E              PCRE_DOLLAR_ENDONLY
+         /f              PCRE_FIRSTLINE
+         /J              PCRE_DUPNAMES
+         /N              PCRE_NO_AUTO_CAPTURE
+         /U              PCRE_UNGREEDY
+         /X              PCRE_EXTRA
+         /<cr>           PCRE_NEWLINE_CR
+         /<lf>           PCRE_NEWLINE_LF
+         /<crlf>         PCRE_NEWLINE_CRLF
+         /<anycrlf>      PCRE_NEWLINE_ANYCRLF
+         /<any>          PCRE_NEWLINE_ANY
+         /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
+         /<bsr_unicode>  PCRE_BSR_UNICODE
+
+       Those  specifying  line  ending sequences are literal strings as shown,
+       but the letters can be in either  case.  This  example  sets  multiline
+       matching with CRLF as the line ending sequence:
+
+         /^abc/m<crlf>
+
+       Details  of the meanings of these PCRE options are given in the pcreapi
+       documentation.
+
+   Finding all matches in a string
 
        Searching for all possible matches within each subject  string  can  be
        requested  by  the  /g  or  /G modifier. After finding a match, PCRE is
@@ -144,6 +190,8 @@
        one, and the normal match is retried. This imitates the way  Perl  han-
        dles such cases when using the /g modifier or the split() function.
 
+   Other modifiers
+
        There are yet more modifiers for controlling the way pcretest operates.
 
        The /+ modifier requests that as well as outputting the substring  that
@@ -151,83 +199,92 @@
        remainder of the subject string. This is useful  for  tests  where  the
        subject contains multiple copies of the same substring.
 
-       The  /L modifier must be followed directly by the name of a locale, for
+       The  /B modifier is a debugging feature. It requests that pcretest out-
+       put a representation of the compiled byte code after compilation.  Nor-
+       mally  this  information contains length and offset values; however, if
+       /Z is also present, this data is replaced by spaces. This is a  special
+       feature for use in the automatic test scripts; it ensures that the same
+       output is generated for different internal link sizes.
+
+       The /L modifier must be followed directly by the name of a locale,  for
        example,
 
          /pattern/Lfr_FR
 
        For this reason, it must be the last modifier. The given locale is set,
-       pcre_maketables()  is called to build a set of character tables for the
-       locale, and this is then passed to pcre_compile()  when  compiling  the
-       regular  expression.  Without  an  /L  modifier,  NULL is passed as the
-       tables pointer; that is, /L applies only to the expression on which  it
+       pcre_maketables() is called to build a set of character tables for  the
+       locale,  and  this  is then passed to pcre_compile() when compiling the
+       regular expression. Without an /L  modifier,  NULL  is  passed  as  the
+       tables  pointer; that is, /L applies only to the expression on which it
        appears.
 
-       The  /I  modifier  requests  that pcretest output information about the
-       compiled pattern (whether it is anchored, has a fixed first  character,
-       and  so  on). It does this by calling pcre_fullinfo() after compiling a
-       pattern. If the pattern is studied, the results of that are  also  out-
+       The /I modifier requests that pcretest  output  information  about  the
+       compiled  pattern (whether it is anchored, has a fixed first character,
+       and so on). It does this by calling pcre_fullinfo() after  compiling  a
+       pattern.  If  the pattern is studied, the results of that are also out-
        put.
 
-       The /D modifier is a PCRE debugging feature, which also assumes /I.  It
-       causes the internal form of compiled regular expressions to  be  output
-       after compilation. If the pattern was studied, the information returned
-       is also output.
+       The /D modifier is a PCRE debugging feature, and is equivalent to  /BI,
+       that is, both the /B and the /I modifiers.
 
        The /F modifier causes pcretest to flip the byte order of the fields in
-       the  compiled  pattern  that  contain  2-byte  and 4-byte numbers. This
-       facility is for testing the feature in PCRE that allows it  to  execute
+       the compiled pattern that  contain  2-byte  and  4-byte  numbers.  This
+       facility  is  for testing the feature in PCRE that allows it to execute
        patterns that were compiled on a host with a different endianness. This
-       feature is not available when the POSIX  interface  to  PCRE  is  being
-       used,  that is, when the /P pattern modifier is specified. See also the
+       feature  is  not  available  when  the POSIX interface to PCRE is being
+       used, that is, when the /P pattern modifier is specified. See also  the
        section about saving and reloading compiled patterns below.
 
-       The /S modifier causes pcre_study() to be called after  the  expression
+       The  /S  modifier causes pcre_study() to be called after the expression
        has been compiled, and the results used when the expression is matched.
 
-       The /M modifier causes the size of memory block used to hold  the  com-
+       The  /M  modifier causes the size of memory block used to hold the com-
        piled pattern to be output.
 
-       The  /P modifier causes pcretest to call PCRE via the POSIX wrapper API
-       rather than its native API. When this  is  done,  all  other  modifiers
-       except  /i,  /m, and /+ are ignored. REG_ICASE is set if /i is present,
-       and REG_NEWLINE is set if /m is present. The  wrapper  functions  force
-       PCRE_DOLLAR_ENDONLY  always, and PCRE_DOTALL unless REG_NEWLINE is set.
-
-       The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8  option
-       set.  This  turns on support for UTF-8 character handling in PCRE, pro-
-       vided that it was compiled with this  support  enabled.  This  modifier
+       The /P modifier causes pcretest to call PCRE via the POSIX wrapper  API
+       rather  than  its  native  API.  When this is done, all other modifiers
+       except /i, /m, and /+ are ignored. REG_ICASE is set if /i  is  present,
+       and  REG_NEWLINE  is  set if /m is present. The wrapper functions force
+       PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is  set.
+
+       The  /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
+       set. This turns on support for UTF-8 character handling in  PCRE,  pro-
+       vided  that  it  was  compiled with this support enabled. This modifier
        also causes any non-printing characters in output strings to be printed
        using the \x{hh...} notation if they are valid UTF-8 sequences.
 
-       If the /? modifier  is  used  with  /8,  it  causes  pcretest  to  call
-       pcre_compile()  with  the  PCRE_NO_UTF8_CHECK  option,  to suppress the
+       If  the  /?  modifier  is  used  with  /8,  it  causes pcretest to call
+       pcre_compile() with the  PCRE_NO_UTF8_CHECK  option,  to  suppress  the
        checking of the string for UTF-8 validity.
 
 
 DATA LINES
 
-       Before each data line is passed to pcre_exec(),  leading  and  trailing
-       whitespace  is  removed,  and it is then scanned for \ escapes. Some of
-       these are pretty esoteric features, intended for checking out  some  of
-       the  more  complicated features of PCRE. If you are just testing "ordi-
-       nary" regular expressions, you probably don't need any  of  these.  The
+       Before  each  data  line is passed to pcre_exec(), leading and trailing
+       whitespace is removed, and it is then scanned for \  escapes.  Some  of
+       these  are  pretty esoteric features, intended for checking out some of
+       the more complicated features of PCRE. If you are just  testing  "ordi-
+       nary"  regular  expressions,  you probably don't need any of these. The
        following escapes are recognized:
 
-         \a         alarm (= BEL)
-         \b         backspace
-         \e         escape
-         \f         formfeed
-         \n         newline
-         \r         carriage return
-         \t         tab
-         \v         vertical tab
+         \a         alarm (BEL, \x07)
+         \b         backspace (\x08)
+         \e         escape (\x1b)
+         \f         formfeed (\x0c)
+         \n         newline (\x0a)
+         \qdd       set the PCRE_MATCH_LIMIT limit to dd
+                      (any number of digits)
+         \r         carriage return (\x0d)
+         \t         tab (\x09)
+         \v         vertical tab (\x0b)
          \nnn       octal character (up to 3 octal digits)
          \xhh       hexadecimal character (up to 2 hex digits)
          \x{hh...}  hexadecimal character, any number of digits
                       in UTF-8 mode
          \A         pass the PCRE_ANCHORED option to pcre_exec()
+                      or pcre_dfa_exec()
          \B         pass the PCRE_NOTBOL option to pcre_exec()
+                      or pcre_dfa_exec()
          \Cdd       call pcre_copy_substring() for substring dd
                       after a successful match (number less than 32)
          \Cname     call pcre_copy_named_substring() for substring
@@ -242,6 +299,8 @@
                       reached for the nth time
          \C*n       pass the number n (may be negative) as callout
                       data; this is used as the callout return value
+         \D         use the pcre_dfa_exec() match function
+         \F         only shortest match for pcre_dfa_exec()
          \Gdd       call pcre_get_substring() for substring dd
                       after a successful match (number less than 32)
          \Gname     call pcre_get_named_substring() for substring
@@ -249,57 +308,105 @@
                       ated by next non-alphanumeric character)
          \L         call pcre_get_substringlist() after a
                       successful match
-         \M         discover the minimum MATCH_LIMIT setting
+         \M         discover the minimum MATCH_LIMIT and
+                      MATCH_LIMIT_RECURSION settings
          \N         pass the PCRE_NOTEMPTY option to pcre_exec()
+                      or pcre_dfa_exec()
          \Odd       set the size of the output vector passed to
                       pcre_exec() to dd (any number of digits)
          \P         pass the PCRE_PARTIAL option to pcre_exec()
+                      or pcre_dfa_exec()
+         \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
+                      (any number of digits)
+         \R         pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
          \S         output details of memory get/free calls during matching
          \Z         pass the PCRE_NOTEOL option to pcre_exec()
+                      or pcre_dfa_exec()
          \?         pass the PCRE_NO_UTF8_CHECK option to
-                      pcre_exec()
+                      pcre_exec() or pcre_dfa_exec()
          \>dd       start the match at offset dd (any number of digits);
                       this sets the startoffset argument for pcre_exec()
-
-       A  backslash  followed by anything else just escapes the anything else.
-       If the very last character is a backslash, it is ignored. This gives  a
-       way  of  passing  an empty line as data, since a real empty line termi-
+                      or pcre_dfa_exec()
+         \<cr>      pass the PCRE_NEWLINE_CR option to pcre_exec()
+                      or pcre_dfa_exec()
+         \<lf>      pass the PCRE_NEWLINE_LF option to pcre_exec()
+                      or pcre_dfa_exec()
+         \<crlf>    pass the PCRE_NEWLINE_CRLF option to pcre_exec()
+                      or pcre_dfa_exec()
+         \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to pcre_exec()
+                      or pcre_dfa_exec()
+         \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()
+                      or pcre_dfa_exec()
+
+       The escapes that specify line ending  sequences  are  literal  strings,
+       exactly as shown. No more than one newline setting should be present in
+       any data line.
+
+       A backslash followed by anything else just escapes the  anything  else.
+       If  the very last character is a backslash, it is ignored. This gives a
+       way of passing an empty line as data, since a real  empty  line  termi-
        nates the data input.
 
-       If \M is present, pcretest calls pcre_exec() several times,  with  dif-
-       ferent  values  in  the match_limit field of the pcre_extra data struc-
-       ture, until it finds the minimum number that is needed for  pcre_exec()
-       to  complete.  This  number is a measure of the amount of recursion and
-       backtracking that takes place, and checking it out can be  instructive.
-       For  most  simple  matches, the number is quite small, but for patterns
-       with very large numbers of matching possibilities, it can become  large
-       very quickly with increasing length of subject string.
+       If  \M  is present, pcretest calls pcre_exec() several times, with dif-
+       ferent values in the match_limit and  match_limit_recursion  fields  of
+       the  pcre_extra  data structure, until it finds the minimum numbers for
+       each parameter that allow pcre_exec() to complete. The match_limit num-
+       ber  is  a  measure of the amount of backtracking that takes place, and
+       checking it out can be instructive. For most simple matches, the number
+       is  quite  small,  but for patterns with very large numbers of matching
+       possibilities, it can become large very quickly with increasing  length
+       of subject string. The match_limit_recursion number is a measure of how
+       much stack (or, if PCRE is compiled with  NO_RECURSE,  how  much  heap)
+       memory is needed to complete the match attempt.
 
        When  \O  is  used, the value specified may be higher or lower than the
        size set by the -O command line option (or defaulted to 45); \O applies
        only to the call of pcre_exec() for the line in which it appears.
 
        If  the /P modifier was present on the pattern, causing the POSIX wrap-
-       per API to be used, only \B and \Z have any effect, causing  REG_NOTBOL
-       and REG_NOTEOL to be passed to regexec() respectively.
+       per API to be used, the only option-setting  sequences  that  have  any
+       effect  are \B and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively,
+       to be passed to regexec().
+
+       The use of \x{hh...} to represent UTF-8 characters is not dependent  on
+       the  use  of  the  /8 modifier on the pattern. It is recognized always.
+       There may be any number of hexadecimal digits inside  the  braces.  The
+       result  is  from  one  to  six bytes, encoded according to the original
+       UTF-8 rules of RFC 2279. This allows for  values  in  the  range  0  to
+       0x7FFFFFFF.  Note  that not all of those are valid Unicode code points,
+       or indeed valid UTF-8 characters according to the later  rules  in  RFC
+       3629.
+
+
+THE ALTERNATIVE MATCHING FUNCTION
+
+       By   default,  pcretest  uses  the  standard  PCRE  matching  function,
+       pcre_exec(), to match each data line. From release 6.0, PCRE supports an
+       alternative  matching  function,  pcre_dfa_exec(),  which operates in a
+       different way, and has some restrictions. The differences  between  the
+       two functions are described in the pcrematching documentation.
+
+       If  a data line contains the \D escape sequence, or if the command line
+       contains the -dfa option, the alternative matching function is  called.
+       This function finds all possible matches at a given point. If, however,
+       the \F escape sequence is present in the data line, it stops after  the
+       first match is found. This is always the shortest possible match.
 
-       The  use of \x{hh...} to represent UTF-8 characters is not dependent on
-       the use of the /8 modifier on the pattern.  It  is  recognized  always.
-       There  may  be  any number of hexadecimal digits inside the braces. The
-       result is from one to six bytes, encoded according to the UTF-8  rules.
 
+DEFAULT OUTPUT FROM PCRETEST
 
-OUTPUT FROM PCRETEST
+       This  section  describes  the output when the normal matching function,
+       pcre_exec(), is being used.
 
        When a match succeeds, pcretest outputs the list of captured substrings
-       that pcre_exec() returns, starting with number 0 for  the  string  that
+       that  pcre_exec()  returns,  starting with number 0 for the string that
        matched the whole pattern. Otherwise, it outputs "No match" or "Partial
-       match" when pcre_exec() returns PCRE_ERROR_NOMATCH  or  PCRE_ERROR_PAR-
-       TIAL,  respectively, and otherwise the PCRE negative error number. Here
+       match"  when  pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR-
+       TIAL, respectively, and otherwise the PCRE negative error number.  Here
        is an example of an interactive pcretest run.
 
          $ pcretest
-         PCRE version 5.00 07-Sep-2004
+         PCRE version 7.0 30-Nov-2006
 
            re> /^abc(\d+)/
          data> abc123
@@ -308,11 +415,12 @@
          data> xyz
          No match
 
-       If the strings contain any non-printing characters, they are output  as
-       \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on
-       the pattern. If the pattern has the /+ modifier, the  output  for  sub-
-       string  0 is followed by the the rest of the subject string, identified
-       by "0+" like this:
+       If  the strings contain any non-printing characters, they are output as
+       \0x escapes, or as \x{...} escapes if the /8 modifier  was  present  on
+       the  pattern.  See below for the definition of non-printing characters.
+       If the pattern has the /+ modifier, the output for substring 0 is  fol-
+       lowed  by  the rest of the subject string, identified by "0+" like
+       this:
 
            re> /cat/+
          data> cataract
@@ -340,17 +448,69 @@
        (that  is,  the return from the extraction function) is given in paren-
        theses after each string for \C and \G.
 
-       Note that while patterns can be continued over several lines  (a  plain
+       Note that whereas patterns can be continued over several lines (a plain
        ">" prompt is used for continuations), data lines may not. However new-
-       lines can be included in data by means of the \n escape.
+       lines can be included in data by means of the \n escape (or  \r,  \r\n,
+       etc., depending on the newline sequence setting).
+
+
+OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
+
+       When  the  alternative  matching function, pcre_dfa_exec(), is used (by
+       means of the \D escape sequence or the -dfa command line  option),  the
+       output  consists  of  a list of all the matches that start at the first
+       point in the subject where there is at least one match. For example:
+
+           re> /(tang|tangerine|tan)/
+         data> yellow tangerine\D
+          0: tangerine
+          1: tang
+          2: tan
+
+       (Using the normal matching function on this data  finds  only  "tang".)
+       The  longest matching string is always given first (and numbered zero).
+
+       If /g is present on the pattern, the search for further matches resumes
+       at the end of the longest match. For example:
+
+           re> /(tang|tangerine|tan)/g
+         data> yellow tangerine and tangy sultana\D
+          0: tangerine
+          1: tang
+          2: tan
+          0: tang
+          1: tan
+          0: tan
+
+       Since  the  matching  function  does not support substring capture, the
+       escape sequences that are concerned with captured  substrings  are  not
+       relevant.
+
+
+RESTARTING AFTER A PARTIAL MATCH
+
+       When the alternative matching function has given the PCRE_ERROR_PARTIAL
+       return, indicating that the subject partially matched the pattern,  you
+       can  restart  the match with additional subject data by means of the \R
+       escape sequence. For example:
+
+           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+         data> 23ja\P\D
+         Partial match: 23ja
+         data> n05\R\D
+          0: n05
+
+       For further information about partial  matching,  see  the  pcrepartial
+       documentation.
 
 
 CALLOUTS
 
-       If the pattern contains any callout requests, pcretest's callout  func-
-       tion  is  called  during  matching. By default, it displays the callout
-       number, the start and current positions in  the  text  at  the  callout
-       time, and the next pattern item to be tested. For example, the output
+       If  the pattern contains any callout requests, pcretest's callout func-
+       tion is called during matching. This works  with  both  matching  func-
+       tions. By default, the called function displays the callout number, the
+       start and current positions in the text at the callout  time,  and  the
+       next pattern item to be tested. For example, the output
 
          --->pqrabcdef
            0    ^  ^     \d
@@ -376,7 +536,7 @@
           0: E*
 
        The callout function in pcretest returns zero (carry  on  matching)  by
-       default, but you can use an \C item in a data line (as described above)
+       default,  but you can use a \C item in a data line (as described above)
        to change this.
 
        Inserting callouts can be helpful when using pcretest to check  compli-
@@ -384,6 +544,18 @@
        the pcrecallout documentation.
 
 
+NON-PRINTING CHARACTERS
+
+       When pcretest is outputting text in the compiled version of a  pattern,
+       bytes  other  than 32-126 are always treated as non-printing characters
+       and are therefore shown as hex escapes.
+
+       When pcretest is outputting text that is a matched part  of  a  subject
+       string,  it behaves in the same way, unless a different locale has been
+       set for the pattern (using the /L modifier). In this case, the isprint()
+       function is used to distinguish printing and non-printing characters.
+
+
 SAVING AND RELOADING COMPILED PATTERNS
 
        The facilities described in this section are  not  available  when  the
@@ -440,11 +612,20 @@
        a file that is not in the correct format, the result is undefined.
 
 
+SEE ALSO
+
+       pcre(3),  pcreapi(3),  pcrecallout(3), pcrematching(3), pcrepartial(3),
+       pcrepattern(3), pcreprecompile(3).
+
+
 AUTHOR
 
-       Philip Hazel <ph...@cam.ac.uk>
-       University Computing Service,
-       Cambridge CB2 3QG, England.
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
 
-Last updated: 10 September 2004
-Copyright (c) 1997-2004 University of Cambridge.
+       Last updated: 11 September 2007
+       Copyright (c) 1997-2007 University of Cambridge.
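
Editorial note on the pcretest.txt changes above: the new \x{hh...} paragraph
says that a value is turned into one to six bytes under the original UTF-8
rules of RFC 2279, covering 0 to 0x7FFFFFFF. The following is a minimal sketch
of that encoding, not the code pcretest itself uses. The function name
encode_utf8_rfc2279 and its signature are hypothetical, chosen only for
illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): encode a value in the range
 * 0..0x7FFFFFFF using the original UTF-8 scheme of RFC 2279, which allows
 * sequences of 1 to 6 bytes. Returns the number of bytes written to buf,
 * or 0 if the value is out of range. buf must have room for 6 bytes. */
size_t encode_utf8_rfc2279(unsigned long value, unsigned char *buf)
{
    /* Largest value representable in 1, 2, ..., 6 bytes. */
    static const unsigned long limits[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
    };
    /* Fixed high bits of the first byte for each sequence length. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    size_t len, i;

    if (value > 0x7FFFFFFFUL)
        return 0;

    /* Find the shortest sequence length that can hold the value. */
    len = 1;
    while (value > limits[len - 1])
        len++;

    /* Fill continuation bytes (10xxxxxx) from the last byte backwards,
     * taking six bits of the value each time. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (value & 0x3F));
        value >>= 6;
    }

    /* The remaining high bits go into the first byte. */
    buf[0] = (unsigned char)(lead[len - 1] | value);
    return len;
}
```

For example, encode_utf8_rfc2279(0x20AC, buf) returns 3 and stores the bytes
E2 82 AC. Values above 0x10FFFF, and the 5- and 6-byte forms in general, are
accepted by this scheme but are not valid under the later rules of RFC 3629,
which is the caveat the paragraph makes.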

Modified: httpd/httpd/vendor/pcre/current/doc/perltest.txt
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/perltest.txt?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/perltest.txt (original)
+++ httpd/httpd/vendor/pcre/current/doc/perltest.txt Mon Nov 26 08:49:53 2007
@@ -29,5 +29,5 @@
 test some features of PCRE. Some of these files also contain malformed regular
 expressions, in order to check that PCRE diagnoses them correctly.
 
-Philip Hazel <ph...@cam.ac.uk>
+Philip Hazel
 September 2004