Posted to cvs@httpd.apache.org by pg...@apache.org on 2007/11/26 17:50:09 UTC
svn commit: r598339 [37/37] - in /httpd/httpd/vendor/pcre/current: ./ doc/ doc/html/ testdata/
Modified: httpd/httpd/vendor/pcre/current/testdata/testoutput3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/testdata/testoutput3?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/testdata/testoutput3 (original)
+++ httpd/httpd/vendor/pcre/current/testdata/testoutput3 Mon Nov 26 08:49:53 2007
@@ -1,5 +1,3 @@
-PCRE version 5.0 13-Sep-2004
-
/^[\w]+/
*** Failers
No match
@@ -95,8 +93,8 @@
No need char
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
- µ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä
- å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
+ ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
+ ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
/^[\xc8-\xc9]/iLfr_FR
École
@@ -111,5 +109,55 @@
No match
école
No match
+
+/\W+/Lfr_FR
+ >>>\xaa<<<
+ 0: >>>
+ >>>\xba<<<
+ 0: >>>
+
+/[\W]+/Lfr_FR
+ >>>\xaa<<<
+ 0: >>>
+ >>>\xba<<<
+ 0: >>>
+
+/[^[:alpha:]]+/Lfr_FR
+ >>>\xaa<<<
+ 0: >>>
+ >>>\xba<<<
+ 0: >>>
+
+/\w+/Lfr_FR
+ >>>\xaa<<<
+ 0: ª
+ >>>\xba<<<
+ 0: º
+
+/[\w]+/Lfr_FR
+ >>>\xaa<<<
+ 0: ª
+ >>>\xba<<<
+ 0: º
+
+/[[:alpha:]]+/Lfr_FR
+ >>>\xaa<<<
+ 0: ª
+ >>>\xba<<<
+ 0: º
+
+/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR
+------------------------------------------------------------------
+ Bra
+ [A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
+ [a-z\xb5\xdf-\xf6\xf8-\xff]
+ [A-Z\xc0-\xd6\xd8-\xde]
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+No options
+No first char
+No need char
/ End of testinput3 /
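The [[:alpha:]] and [[:upper:]] classes that pcretest printed for the fr_FR locale in the testoutput3 hunk above can be cross-checked in a few lines. This is a minimal sketch. It assumes that the fr_FR (ISO-8859-1) locale classifies bytes 0x00-0xFF the same way Unicode classifies code points U+0000-U+00FF. It uses Python's str.isalpha() and str.isupper() rather than the C library locale tables that PCRE actually consults, and the helper name compress_ranges is hypothetical. Only the alpha and upper lines are checked.

```python
def compress_ranges(codes):
    """Render sorted byte values as a PCRE-style class body (hypothetical
    helper): printable ASCII as itself, other bytes as \\xhh, and runs of
    consecutive values as lo-hi."""
    def show(c):
        return chr(c) if 0x21 <= c <= 0x7E else f"\\x{c:02x}"

    parts, i = [], 0
    while i < len(codes):
        j = i
        # extend the run while the next value is consecutive
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        parts.append(show(codes[i]) if i == j else f"{show(codes[i])}-{show(codes[j])}")
        i = j + 1
    return "".join(parts)


# Assumption: Unicode properties of U+0000..U+00FF stand in for fr_FR locale tables.
alpha = [c for c in range(256) if chr(c).isalpha()]
upper = [c for c in range(256) if chr(c).isupper()]

print(f"[{compress_ranges(alpha)}]")
print(f"[{compress_ranges(upper)}]")
```

Under that assumption the alpha set contains ª (\xaa) and º (\xba). That is consistent with the new tests: \w and [[:alpha:]] match those characters, while \W and [^[:alpha:]] stop before them and match only ">>>".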
Modified: httpd/httpd/vendor/pcre/current/testdata/testoutput4
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/testdata/testoutput4?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/testdata/testoutput4 (original)
+++ httpd/httpd/vendor/pcre/current/testdata/testoutput4 Mon Nov 26 08:49:53 2007
@@ -1,5 +1,3 @@
-PCRE version 5.0 13-Sep-2004
-
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
No match
@@ -899,5 +897,45 @@
/^\x{85}$/8i
\x{85}
0: \x{85}
+
+/^ሴ/8
+ ሴ
+ 0: \x{1234}
+
+/^\ሴ/8
+ ሴ
+ 0: \x{1234}
+
+"(?s)(.{1,5})"8
+ abcdefg
+ 0: abcde
+ 1: abcde
+ ab
+ 0: ab
+ 1: ab
+
+/a*\x{100}*\w/8
+ a
+ 0: a
+
+/\S\S/8g
+ A\x{a3}BC
+ 0: A\x{a3}
+ 0: BC
+
+/\S{2}/8g
+ A\x{a3}BC
+ 0: A\x{a3}
+ 0: BC
+
+/\W\W/8g
+ +\x{a3}==
+ 0: +\x{a3}
+ 0: ==
+
+/\W{2}/8g
+ +\x{a3}==
+ 0: +\x{a3}
+ 0: ==
/ End of testinput4 /
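In the testoutput5 diff that follows, the "First char" and "Need char" lines for one-character UTF-8 patterns such as /\x{100}/8DZ report bytes of that character's UTF-8 encoding: 196 and 128 are 0xC4 0x80. These tests compile code points as large as \x{7fffFFFF}, which need the older five- and six-byte UTF-8 forms (RFC 2279). Python's built-in "utf-8" codec stops at U+10FFFF and cannot produce those forms. Here is a minimal sketch of the older encoding. The function name is hypothetical, and it only illustrates the byte values. It is not PCRE's own code.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode cp with the original UTF-8 scheme (RFC 2279), which allows
    sequences of up to six bytes and values up to 0x7FFFFFFF."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"code point {cp:#x} out of range")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte
    # (sequence length, exclusive upper bound, lead-byte marker)
    for nbytes, limit, lead in ((2, 0x800, 0xC0), (3, 0x10000, 0xE0),
                                (4, 0x200000, 0xF0), (5, 0x4000000, 0xF8),
                                (6, 0x80000000, 0xFC)):
        if cp < limit:
            break
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    # whatever high bits remain go into the lead byte
    return bytes([lead | cp] + tail[::-1])


for cp in (0x100, 0x1000, 0x10000, 0x100000, 0x1000000, 0x4000000, 0x7FFFFFFF):
    enc = utf8_encode_rfc2279(cp)
    print(f"\\x{{{cp:x}}}: {enc.hex(' ')}  first={enc[0]} last={enc[-1]}")
```

For these single-character patterns the reported values are the first and last bytes. For example, \x{7fffFFFF} encodes as FD BF BF BF BF BF, which gives First char = 253 and Need char = 191.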
Modified: httpd/httpd/vendor/pcre/current/testdata/testoutput5
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/testdata/testoutput5?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/testdata/testoutput5 (original)
+++ httpd/httpd/vendor/pcre/current/testdata/testoutput5 Mon Nov 26 08:49:53 2007
@@ -1,116 +1,105 @@
-PCRE version 5.0 13-Sep-2004
-
-/\x{100}/8DM
-Memory allocation (code space): 10
+/\x{100}/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{100}
- 6 6 Ket
- 9 End
+ Bra
+ \x{100}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 196
Need char = 128
-/\x{1000}/8DM
-Memory allocation (code space): 11
+/\x{1000}/8DZ
------------------------------------------------------------------
- 0 7 Bra 0
- 3 \x{1000}
- 7 7 Ket
- 10 End
+ Bra
+ \x{1000}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 225
Need char = 128
-/\x{10000}/8DM
-Memory allocation (code space): 12
+/\x{10000}/8DZ
------------------------------------------------------------------
- 0 8 Bra 0
- 3 \x{10000}
- 8 8 Ket
- 11 End
+ Bra
+ \x{10000}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 240
Need char = 128
-/\x{100000}/8DM
-Memory allocation (code space): 12
+/\x{100000}/8DZ
------------------------------------------------------------------
- 0 8 Bra 0
- 3 \x{100000}
- 8 8 Ket
- 11 End
+ Bra
+ \x{100000}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 244
Need char = 128
-/\x{1000000}/8DM
-Memory allocation (code space): 13
+/\x{1000000}/8DZ
------------------------------------------------------------------
- 0 9 Bra 0
- 3 \x{1000000}
- 9 9 Ket
- 12 End
+ Bra
+ \x{1000000}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 249
Need char = 128
-/\x{4000000}/8DM
-Memory allocation (code space): 14
+/\x{4000000}/8DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 \x{4000000}
- 10 10 Ket
- 13 End
+ Bra
+ \x{4000000}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 252
Need char = 128
-/\x{7fffFFFF}/8DM
-Memory allocation (code space): 14
+/\x{7fffFFFF}/8DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 \x{7fffffff}
- 10 10 Ket
- 13 End
+ Bra
+ \x{7fffffff}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 253
Need char = 191
-/[\x{ff}]/8DM
-Memory allocation (code space): 10
+/[\x{ff}]/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{ff}
- 6 6 Ket
- 9 End
+ Bra
+ \x{ff}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 195
Need char = 191
-/[\x{100}]/8DM
-Memory allocation (code space): 47
+/[\x{100}]/8DZ
------------------------------------------------------------------
- 0 11 Bra 0
- 3 [\x{100}]
- 11 11 Ket
- 14 End
+ Bra
+ [\x{100}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -127,36 +116,36 @@
\x{100}a\x{1234}bcd
0: \x{100}a\x{1234}
-/\x80/8D
+/\x80/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{80}
- 6 6 Ket
- 9 End
+ Bra
+ \x{80}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 194
Need char = 128
-/\xff/8D
+/\xff/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{ff}
- 6 6 Ket
- 9 End
+ Bra
+ \x{ff}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 195
Need char = 191
-/\x{0041}\x{2262}\x{0391}\x{002e}/D8
+/\x{0041}\x{2262}\x{0391}\x{002e}/DZ8
------------------------------------------------------------------
- 0 14 Bra 0
- 3 A\x{2262}\x{391}.
- 14 14 Ket
- 17 End
+ Bra
+ A\x{2262}\x{391}.
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -165,12 +154,12 @@
\x{0041}\x{2262}\x{0391}\x{002e}
0: A\x{2262}\x{391}.
-/\x{D55c}\x{ad6d}\x{C5B4}/D8
+/\x{D55c}\x{ad6d}\x{C5B4}/DZ8
------------------------------------------------------------------
- 0 15 Bra 0
- 3 \x{d55c}\x{ad6d}\x{c5b4}
- 15 15 Ket
- 18 End
+ Bra
+ \x{d55c}\x{ad6d}\x{c5b4}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -179,12 +168,12 @@
\x{D55c}\x{ad6d}\x{C5B4}
0: \x{d55c}\x{ad6d}\x{c5b4}
-/\x{65e5}\x{672c}\x{8a9e}/D8
+/\x{65e5}\x{672c}\x{8a9e}/DZ8
------------------------------------------------------------------
- 0 15 Bra 0
- 3 \x{65e5}\x{672c}\x{8a9e}
- 15 15 Ket
- 18 End
+ Bra
+ \x{65e5}\x{672c}\x{8a9e}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -193,74 +182,74 @@
\x{65e5}\x{672c}\x{8a9e}
0: \x{65e5}\x{672c}\x{8a9e}
-/\x{80}/D8
+/\x{80}/DZ8
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{80}
- 6 6 Ket
- 9 End
+ Bra
+ \x{80}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 194
Need char = 128
-/\x{084}/D8
+/\x{084}/DZ8
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{84}
- 6 6 Ket
- 9 End
+ Bra
+ \x{84}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 194
Need char = 132
-/\x{104}/D8
+/\x{104}/DZ8
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{104}
- 6 6 Ket
- 9 End
+ Bra
+ \x{104}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 196
Need char = 132
-/\x{861}/D8
+/\x{861}/DZ8
------------------------------------------------------------------
- 0 7 Bra 0
- 3 \x{861}
- 7 7 Ket
- 10 End
+ Bra
+ \x{861}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 224
Need char = 161
-/\x{212ab}/D8
+/\x{212ab}/DZ8
------------------------------------------------------------------
- 0 8 Bra 0
- 3 \x{212ab}
- 8 8 Ket
- 11 End
+ Bra
+ \x{212ab}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 240
Need char = 171
-/.{3,5}X/D8
+/.{3,5}X/DZ8
------------------------------------------------------------------
- 0 13 Bra 0
- 3 Any{3}
- 7 Any{0,2}
- 11 X
- 13 13 Ket
- 16 End
+ Bra
+ Any{3}
+ Any{0,2}
+ X
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -271,13 +260,13 @@
0: \x{212ab}\x{212ab}\x{212ab}\x{861}X
-/.{3,5}?/D8
+/.{3,5}?/DZ8
------------------------------------------------------------------
- 0 11 Bra 0
- 3 Any{3}
- 7 Any{0,2}?
- 11 11 Ket
- 14 End
+ Bra
+ Any{3}
+ Any{0,2}?
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -287,11 +276,9 @@
\x{212ab}\x{212ab}\x{212ab}\x{861}
0: \x{212ab}\x{212ab}\x{212ab}
-/-- These tests are here rather than in testinput4 because Perl 5.6 has --/
-/-- some problems with UTF-8 support, in the area of \x{..} where the --/
-No match
-/-- value is < 255. It grumbles about invalid UTF-8 strings. --/
-No match
+/-- These tests are here rather than in testinput4 because Perl 5.6 has some
+problems with UTF-8 support, in the area of \x{..} where the value is < 255.
+It grumbles about invalid UTF-8 strings. --/
/^[a\x{c0}]b/8
\x{c0}b
@@ -331,11 +318,9 @@
/(?<=\C)X/8
Failed: \C not allowed in lookbehind assertion at offset 6
-/-- This one is here not because it's different to Perl, but because the --/
-/-- way the captured single-byte is displayed. (In Perl it becomes a --/
-No match
-/-- character, and you can't tell the difference.) --/
-No match
+/-- This one is here not because it's different to Perl, but because the way
+the captured single-byte is displayed. (In Perl it becomes a character, and you
+can't tell the difference.) --/
/X(\C)(.*)/8
X\x{1234}
@@ -347,13 +332,13 @@
1: \x{0a}
2: abc
-/^[ab]/8D
+/^[ab]/8DZ
------------------------------------------------------------------
- 0 37 Bra 0
- 3 ^
- 4 [ab]
- 37 37 Ket
- 40 End
+ Bra
+ ^
+ [ab]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: anchored utf8
@@ -370,13 +355,13 @@
\x{100}
No match
-/^[^ab]/8D
+/^[^ab]/8DZ
------------------------------------------------------------------
- 0 37 Bra 0
- 3 ^
- 4 [\x00-`c-\xff] (neg)
- 37 37 Ket
- 40 End
+ Bra
+ ^
+ [\x00-`c-\xff] (neg)
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: anchored utf8
@@ -393,12 +378,12 @@
aaa
No match
-/[^ab\xC0-\xF0]/8SD
+/[^ab\xC0-\xF0]/8SDZ
------------------------------------------------------------------
- 0 36 Bra 0
- 3 [\x00-`c-\xbf\xf1-\xff] (neg)
- 36 36 Ket
- 39 End
+ Bra
+ [\x00-`c-\xbf\xf1-\xff] (neg)
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -429,13 +414,13 @@
\x{f0}
No match
-/Ā{3,4}/8SD
+/Ā{3,4}/8SDZ
------------------------------------------------------------------
- 0 13 Bra 0
- 3 \x{100}{3}
- 8 \x{100}{,1}
- 13 13 Ket
- 16 End
+ Bra
+ \x{100}{3}
+ \x{100}?
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -446,16 +431,16 @@
\x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100}
-/(\x{100}+|x)/8SD
+/(\x{100}+|x)/8SDZ
------------------------------------------------------------------
- 0 17 Bra 0
- 3 6 Bra 1
- 6 \x{100}+
- 9 5 Alt
- 12 x
- 14 11 Ket
- 17 17 Ket
- 20 End
+ Bra
+ CBra 1
+ \x{100}+
+ Alt
+ x
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
@@ -464,17 +449,17 @@
No need char
Starting byte set: x \xc4
-/(\x{100}*a|x)/8SD
+/(\x{100}*a|x)/8SDZ
------------------------------------------------------------------
- 0 19 Bra 0
- 3 8 Bra 1
- 6 \x{100}*
- 9 a
- 11 5 Alt
- 14 x
- 16 13 Ket
- 19 19 Ket
- 22 End
+ Bra
+ CBra 1
+ \x{100}*+
+ a
+ Alt
+ x
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
@@ -483,17 +468,17 @@
No need char
Starting byte set: a x \xc4
-/(\x{100}{0,2}a|x)/8SD
+/(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------
- 0 21 Bra 0
- 3 10 Bra 1
- 6 \x{100}{,2}
- 11 a
- 13 5 Alt
- 16 x
- 18 15 Ket
- 21 21 Ket
- 24 End
+ Bra
+ CBra 1
+ \x{100}{0,2}
+ a
+ Alt
+ x
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
@@ -502,18 +487,18 @@
No need char
Starting byte set: a x \xc4
-/(\x{100}{1,2}a|x)/8SD
+/(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------
- 0 24 Bra 0
- 3 13 Bra 1
- 6 \x{100}
- 9 \x{100}{,1}
- 14 a
- 16 5 Alt
- 19 x
- 21 18 Ket
- 24 24 Ket
- 27 End
+ Bra
+ CBra 1
+ \x{100}
+ \x{100}{0,1}
+ a
+ Alt
+ x
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
@@ -546,24 +531,24 @@
\x{100}\x{100}abcd
No match
-/\x{100}/8D
+/\x{100}/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{100}
- 6 6 Ket
- 9 End
+ Bra
+ \x{100}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 196
Need char = 128
-/\x{100}*/8D
+/\x{100}*/8DZ
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{100}*
- 6 6 Ket
- 9 End
+ Bra
+ \x{100}*
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -571,13 +556,13 @@
No first char
No need char
-/a\x{100}*/8D
+/a\x{100}*/8DZ
------------------------------------------------------------------
- 0 8 Bra 0
- 3 a
- 5 \x{100}*
- 8 8 Ket
- 11 End
+ Bra
+ a
+ \x{100}*
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -585,13 +570,13 @@
First char = 'a'
No need char
-/ab\x{100}*/8D
+/ab\x{100}*/8DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 ab
- 7 \x{100}*
- 10 10 Ket
- 13 End
+ Bra
+ ab
+ \x{100}*
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -599,13 +584,13 @@
First char = 'a'
Need char = 'b'
-/a\x{100}\x{101}*/8D
+/a\x{100}\x{101}*/8DZ
------------------------------------------------------------------
- 0 11 Bra 0
- 3 a\x{100}
- 8 \x{101}*
- 11 11 Ket
- 14 End
+ Bra
+ a\x{100}
+ \x{101}*
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -613,13 +598,13 @@
First char = 'a'
Need char = 128
-/a\x{100}\x{101}+/8D
+/a\x{100}\x{101}+/8DZ
------------------------------------------------------------------
- 0 11 Bra 0
- 3 a\x{100}
- 8 \x{101}+
- 11 11 Ket
- 14 End
+ Bra
+ a\x{100}
+ \x{101}+
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -627,13 +612,13 @@
First char = 'a'
Need char = 129
-/\x{100}*A/8D
+/\x{100}*A/8DZ
------------------------------------------------------------------
- 0 8 Bra 0
- 3 \x{100}*
- 6 A
- 8 8 Ket
- 11 End
+ Bra
+ \x{100}*+
+ A
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -643,14 +628,16 @@
A
0: A
-/\x{100}*\d(?R)/8D
+/\x{100}*\d(?R)/8DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 \x{100}*
- 6 \d
- 7 0 Recurse
- 10 10 Ket
- 13 End
+ Bra
+ \x{100}*+
+ \d
+ Once
+ Recurse
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -658,37 +645,36 @@
No first char
No need char
-/[^\x{c4}]/D
+/[^\x{c4}]/DZ
------------------------------------------------------------------
- 0 36 Bra 0
- 3 [\x01-35-bd-z|~-\xff] (neg)
- 36 36 Ket
- 39 End
+ Bra
+ [^\xc4]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[^\x{c4}]/8D
+/[^\x{c4}]/8DZ
------------------------------------------------------------------
- 0 36 Bra 0
- 3 [\x00-\xc3\xc5-\xff] (neg)
- 36 36 Ket
- 39 End
+ Bra
+ [\x00-\xc3\xc5-\xff] (neg)
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
No first char
No need char
-/[\x{100}]/8DM
-Memory allocation (code space): 47
+/[\x{100}]/8DZ
------------------------------------------------------------------
- 0 11 Bra 0
- 3 [\x{100}]
- 11 11 Ket
- 14 End
+ Bra
+ [\x{100}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -703,13 +689,12 @@
*** Failers
No match
-/[Z\x{100}]/8DM
-Memory allocation (code space): 47
+/[Z\x{100}]/8DZ
------------------------------------------------------------------
- 0 43 Bra 0
- 3 [Z\x{100}]
- 43 43 Ket
- 46 End
+ Bra
+ [Z\x{100}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -739,24 +724,24 @@
\x{ff}
No match
-/[z-\x{100}]/8D
+/[z-\x{100}]/8DZ
------------------------------------------------------------------
- 0 12 Bra 0
- 3 [z-\x{100}]
- 12 12 Ket
- 15 End
+ Bra
+ [z-\x{100}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
No first char
No need char
-/[z\Qa-d]Ā\E]/8D
+/[z\Qa-d]Ā\E]/8DZ
------------------------------------------------------------------
- 0 43 Bra 0
- 3 [\-\]adz\x{100}]
- 43 43 Ket
- 46 End
+ Bra
+ [\-\]adz\x{100}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -767,12 +752,12 @@
Ā
0: \x{100}
-/[\xFF]/D
+/[\xFF]/DZ
------------------------------------------------------------------
- 0 5 Bra 0
- 3 \xff
- 5 5 Ket
- 8 End
+ Bra
+ \xff
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
@@ -781,12 +766,12 @@
>\xff<
0: \xff
-/[\xff]/D8
+/[\xff]/DZ8
------------------------------------------------------------------
- 0 6 Bra 0
- 3 \x{ff}
- 6 6 Ket
- 9 End
+ Bra
+ \x{ff}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -795,24 +780,24 @@
>\x{ff}<
0: \x{ff}
-/[^\xFF]/D
+/[^\xFF]/DZ
------------------------------------------------------------------
- 0 5 Bra 0
- 3 [^\xff]
- 5 5 Ket
- 8 End
+ Bra
+ [^\xff]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[^\xff]/8D
+/[^\xff]/8DZ
------------------------------------------------------------------
- 0 36 Bra 0
- 3 [\x00-\xfe] (neg)
- 36 36 Ket
- 39 End
+ Bra
+ [\x00-\xfe] (neg)
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -852,12 +837,12 @@
/ÃÃÃxxx/8
Failed: invalid UTF-8 string at offset 1
-/ÃÃÃxxx/8?D
+/ÃÃÃxxx/8?DZ
------------------------------------------------------------------
- 0 15 Bra 0
- 3 \X{c0}\X{c0}\X{c0}xxx
- 15 15 Ket
- 18 End
+ Bra
+ \X{c0}\X{c0}\X{c0}xxx
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8 no_utf8_check
@@ -902,160 +887,186 @@
\xf1\x8f\x80\x80
No match
\xf8\x88\x80\x80\x80
-No match
+Error -10
\xf9\x87\x80\x80\x80
-No match
+Error -10
\xfc\x84\x80\x80\x80\x80
-No match
+Error -10
\xfd\x83\x80\x80\x80\x80
+Error -10
+ \?\xf8\x88\x80\x80\x80
+No match
+ \?\xf9\x87\x80\x80\x80
+No match
+ \?\xfc\x84\x80\x80\x80\x80
+No match
+ \?\xfd\x83\x80\x80\x80\x80
No match
-/\x{100}abc(xyz(?1))/8D
+/\x{100}abc(xyz(?1))/8DZ
------------------------------------------------------------------
- 0 27 Bra 0
- 3 \x{100}abc
- 12 12 Bra 1
- 15 xyz
- 21 12 Recurse
- 24 12 Ket
- 27 27 Ket
- 30 End
+ Bra
+ \x{100}abc
+ CBra 1
+ xyz
+ Once
+ Recurse
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf8
First char = 196
Need char = 'z'
-/[^\x{100}]abc(xyz(?1))/8D
+/[^\x{100}]abc(xyz(?1))/8DZ
------------------------------------------------------------------
- 0 32 Bra 0
- 3 [^\x{100}]
- 11 abc
- 17 12 Bra 1
- 20 xyz
- 26 17 Recurse
- 29 12 Ket
- 32 32 Ket
- 35 End
+ Bra
+ [^\x{100}]
+ abc
+ CBra 1
+ xyz
+ Once
+ Recurse
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf8
No first char
Need char = 'z'
-/[ab\x{100}]abc(xyz(?1))/8D
+/[ab\x{100}]abc(xyz(?1))/8DZ
------------------------------------------------------------------
- 0 64 Bra 0
- 3 [ab\x{100}]
- 43 abc
- 49 12 Bra 1
- 52 xyz
- 58 49 Recurse
- 61 12 Ket
- 64 64 Ket
- 67 End
+ Bra
+ [ab\x{100}]
+ abc
+ CBra 1
+ xyz
+ Once
+ Recurse
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf8
No first char
Need char = 'z'
-/(\x{100}(b(?2)c))?/D8
+/(\x{100}(b(?2)c))?/DZ8
------------------------------------------------------------------
- 0 26 Bra 0
- 3 Brazero
- 4 19 Bra 1
- 7 \x{100}
- 10 10 Bra 2
- 13 b
- 15 10 Recurse
- 18 c
- 20 10 Ket
- 23 19 Ket
- 26 26 Ket
- 29 End
+ Bra
+ Brazero
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 2
Options: utf8
No first char
No need char
-/(\x{100}(b(?2)c)){0,2}/D8
+/(\x{100}(b(?2)c)){0,2}/DZ8
------------------------------------------------------------------
- 0 55 Bra 0
- 3 Brazero
- 4 48 Bra 0
- 7 19 Bra 1
- 10 \x{100}
- 13 10 Bra 2
- 16 b
- 18 13 Recurse
- 21 c
- 23 10 Ket
- 26 19 Ket
- 29 Brazero
- 30 19 Bra 1
- 33 \x{100}
- 36 10 Bra 2
- 39 b
- 41 13 Recurse
- 44 c
- 46 10 Ket
- 49 19 Ket
- 52 48 Ket
- 55 55 Ket
- 58 End
+ Bra
+ Brazero
+ Bra
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Brazero
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 2
Options: utf8
No first char
No need char
-/(\x{100}(b(?1)c))?/D8
+/(\x{100}(b(?1)c))?/DZ8
------------------------------------------------------------------
- 0 26 Bra 0
- 3 Brazero
- 4 19 Bra 1
- 7 \x{100}
- 10 10 Bra 2
- 13 b
- 15 4 Recurse
- 18 c
- 20 10 Ket
- 23 19 Ket
- 26 26 Ket
- 29 End
+ Bra
+ Brazero
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 2
Options: utf8
No first char
No need char
-/(\x{100}(b(?1)c)){0,2}/D8
+/(\x{100}(b(?1)c)){0,2}/DZ8
------------------------------------------------------------------
- 0 55 Bra 0
- 3 Brazero
- 4 48 Bra 0
- 7 19 Bra 1
- 10 \x{100}
- 13 10 Bra 2
- 16 b
- 18 7 Recurse
- 21 c
- 23 10 Ket
- 26 19 Ket
- 29 Brazero
- 30 19 Bra 1
- 33 \x{100}
- 36 10 Bra 2
- 39 b
- 41 7 Recurse
- 44 c
- 46 10 Ket
- 49 19 Ket
- 52 48 Ket
- 55 55 Ket
- 58 End
+ Bra
+ Brazero
+ Bra
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Brazero
+ CBra 1
+ \x{100}
+ CBra 2
+ b
+ Once
+ Recurse
+ Ket
+ c
+ Ket
+ Ket
+ Ket
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 2
Options: utf8
@@ -1072,4 +1083,516 @@
\x{100}X
0: X
+/a\x{1234}b/P8
+ a\x{1234}b
+ 0: a\x{1234}b
+
+/^\ሴ/8DZ
+------------------------------------------------------------------
+ Bra
+ ^
+ \x{1234}
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options: anchored utf8
+No first char
+No need char
+
+/\777/I
+Failed: octal value is greater than \377 (not in UTF-8 mode) at offset 3
+
+/\777/8I
+Capturing subpattern count = 0
+Options: utf8
+First char = 199
+Need char = 191
+ \x{1ff}
+ 0: \x{1ff}
+ \777
+ 0: \x{1ff}
+
+/\x{100}*\d/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*+
+ \d
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}*\s/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*+
+ \s
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}*\w/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*+
+ \w
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}*\D/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*
+ \D
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}*\S/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*
+ \S
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}*\W/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}*
+ \W
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+No first char
+No need char
+
+/\x{100}+\x{200}/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}++
+ \x{200}
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+First char = 196
+Need char = 128
+
+/\x{100}+X/8DZ
+------------------------------------------------------------------
+ Bra
+ \x{100}++
+ X
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+First char = 196
+Need char = 'X'
+
+/X+\x{200}/8DZ
+------------------------------------------------------------------
+ Bra
+ X++
+ \x{200}
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Partial matching not supported
+Options: utf8
+First char = 'X'
+Need char = 128
+
+/()()()()()()()()()()
+ ()()()()()()()()()()
+ ()()()()()()()()()()
+ ()()()()()()()()()()
+ A (x) (?41) B/8x
+ AxxB
+Matched, but too many substrings
+ 0: AxxB
+ 1:
+ 2:
+ 3:
+ 4:
+ 5:
+ 6:
+ 7:
+ 8:
+ 9:
+10:
+11:
+12:
+13:
+14:
+
+/^[\x{100}\E-\Q\E\x{150}]/BZ8
+------------------------------------------------------------------
+ Bra
+ ^
+ [\x{100}-\x{150}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/^[\QĀ\E-\QŐ\E]/BZ8
+------------------------------------------------------------------
+ Bra
+ ^
+ [\x{100}-\x{150}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/^[\QĀ\E-\QŐ\E/BZ8
+Failed: missing terminating ] for character class at offset 15
+
+/^abc./mgx8<any>
+ abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK
+ 0: abc1
+ 0: abc2
+ 0: abc3
+ 0: abc4
+ 0: abc5
+ 0: abc6
+ 0: abc7
+ 0: abc8
+ 0: abc9
+
+/abc.$/mgx8<any>
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9
+ 0: abc1
+ 0: abc2
+ 0: abc3
+ 0: abc4
+ 0: abc5
+ 0: abc6
+ 0: abc7
+ 0: abc8
+ 0: abc9
+
+/^a\Rb/8<bsr_unicode>
+ a\nb
+ 0: a\x{0a}b
+ a\rb
+ 0: a\x{0d}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ a\x0bb
+ 0: a\x{0b}b
+ a\x0cb
+ 0: a\x{0c}b
+ a\x{85}b
+ 0: a\x{85}b
+ a\x{2028}b
+ 0: a\x{2028}b
+ a\x{2029}b
+ 0: a\x{2029}b
+ ** Failers
+No match
+ a\n\rb
+No match
+
+/^a\R*b/8<bsr_unicode>
+ ab
+ 0: ab
+ a\nb
+ 0: a\x{0a}b
+ a\rb
+ 0: a\x{0d}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ a\x0bb
+ 0: a\x{0b}b
+ a\x0c\x{2028}\x{2029}b
+ 0: a\x{0c}\x{2028}\x{2029}b
+ a\x{85}b
+ 0: a\x{85}b
+ a\n\rb
+ 0: a\x{0a}\x{0d}b
+ a\n\r\x{85}\x0cb
+ 0: a\x{0a}\x{0d}\x{85}\x{0c}b
+
+/^a\R+b/8<bsr_unicode>
+ a\nb
+ 0: a\x{0a}b
+ a\rb
+ 0: a\x{0d}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ a\x0bb
+ 0: a\x{0b}b
+ a\x0c\x{2028}\x{2029}b
+ 0: a\x{0c}\x{2028}\x{2029}b
+ a\x{85}b
+ 0: a\x{85}b
+ a\n\rb
+ 0: a\x{0a}\x{0d}b
+ a\n\r\x{85}\x0cb
+ 0: a\x{0a}\x{0d}\x{85}\x{0c}b
+ ** Failers
+No match
+ ab
+No match
+
+/^a\R{1,3}b/8<bsr_unicode>
+ a\nb
+ 0: a\x{0a}b
+ a\n\rb
+ 0: a\x{0a}\x{0d}b
+ a\n\r\x{85}b
+ 0: a\x{0a}\x{0d}\x{85}b
+ a\r\n\r\nb
+ 0: a\x{0d}\x{0a}\x{0d}\x{0a}b
+ a\r\n\r\n\r\nb
+ 0: a\x{0d}\x{0a}\x{0d}\x{0a}\x{0d}\x{0a}b
+ a\n\r\n\rb
+ 0: a\x{0a}\x{0d}\x{0a}\x{0d}b
+ a\n\n\r\nb
+ 0: a\x{0a}\x{0a}\x{0d}\x{0a}b
+ ** Failers
+No match
+ a\n\n\n\rb
+No match
+ a\r
+No match
+
+/\H\h\V\v/8
+ X X\x0a
+ 0: X X\x{0a}
+ X\x09X\x0b
+ 0: X\x{09}X\x{0b}
+ ** Failers
+No match
+ \x{a0} X\x0a
+No match
+
+/\H*\h+\V?\v{3,4}/8
+ \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a
+ 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d}
+ \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a
+ 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d}
+ \x09\x20\x{a0}\x0a\x0b\x0c
+ 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}
+ ** Failers
+No match
+ \x09\x20\x{a0}\x0a\x0b
+No match
+
+/\H\h\V\v/8
+ \x{3001}\x{3000}\x{2030}\x{2028}
+ 0: \x{3001}\x{3000}\x{2030}\x{2028}
+ X\x{180e}X\x{85}
+ 0: X\x{180e}X\x{85}
+ ** Failers
+No match
+ \x{2009} X\x0a
+No match
+
+/\H*\h+\V?\v{3,4}/8
+ \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a
+ 0: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c}\x{0d}
+ \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a
+ 0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028}
+ \x09\x20\x{202f}\x0a\x0b\x0c
+ 0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c}
+ ** Failers
+No match
+ \x09\x{200a}\x{a0}\x{2028}\x0b
+No match
+
+/[\h]/8BZ
+------------------------------------------------------------------
+ Bra
+ [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]
+ Ket
+ End
+------------------------------------------------------------------
+ >\x{1680}
+ 0: \x{1680}
+
+/[\h]{3,}/8BZ
+------------------------------------------------------------------
+ Bra
+ [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]{3,}
+ Ket
+ End
+------------------------------------------------------------------
+ >\x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}<
+ 0: \x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}
+
+/[\v]/8BZ
+------------------------------------------------------------------
+ Bra
+ [\x0a-\x0d\x85\x{2028}-\x{2029}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/[\H]/8BZ
+------------------------------------------------------------------
+ Bra
+ [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{7fffffff}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/[\V]/8BZ
+------------------------------------------------------------------
+ Bra
+ [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{2029}-\x{7fffffff}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/.*$/8<any>
+ \x{1ec5}
+ 0: \x{1ec5}
+
+/-- This tests the stricter UTF-8 check according to RFC 3629. --/
+
+/X/8
+ \x{0}\x{d7ff}\x{e000}\x{10ffff}
+No match
+ \x{d800}
+Error -10
+ \x{d800}\?
+No match
+ \x{da00}
+Error -10
+ \x{da00}\?
+No match
+ \x{dfff}
+Error -10
+ \x{dfff}\?
+No match
+ \x{110000}
+Error -10
+ \x{110000}\?
+No match
+ \x{2000000}
+Error -10
+ \x{2000000}\?
+No match
+ \x{7fffffff}
+Error -10
+ \x{7fffffff}\?
+No match
+
+/a\Rb/I8<bsr_anycrlf>
+Capturing subpattern count = 0
+Options: bsr_anycrlf utf8
+First char = 'a'
+Need char = 'b'
+ a\rb
+ 0: a\x{0d}b
+ a\nb
+ 0: a\x{0a}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ ** Failers
+No match
+ a\x{85}b
+No match
+ a\x0bb
+No match
+
+/a\Rb/I8<bsr_unicode>
+Capturing subpattern count = 0
+Options: bsr_unicode utf8
+First char = 'a'
+Need char = 'b'
+ a\rb
+ 0: a\x{0d}b
+ a\nb
+ 0: a\x{0a}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ a\x{85}b
+ 0: a\x{85}b
+ a\x0bb
+ 0: a\x{0b}b
+ ** Failers
+No match
+ a\x{85}b\<bsr_anycrlf>
+No match
+ a\x0bb\<bsr_anycrlf>
+No match
+
+/a\R?b/I8<bsr_anycrlf>
+Capturing subpattern count = 0
+Options: bsr_anycrlf utf8
+First char = 'a'
+Need char = 'b'
+ a\rb
+ 0: a\x{0d}b
+ a\nb
+ 0: a\x{0a}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ ** Failers
+No match
+ a\x{85}b
+No match
+ a\x0bb
+No match
+
+/a\R?b/I8<bsr_unicode>
+Capturing subpattern count = 0
+Options: bsr_unicode utf8
+First char = 'a'
+Need char = 'b'
+ a\rb
+ 0: a\x{0d}b
+ a\nb
+ 0: a\x{0a}b
+ a\r\nb
+ 0: a\x{0d}\x{0a}b
+ a\x{85}b
+ 0: a\x{85}b
+ a\x0bb
+ 0: a\x{0b}b
+ ** Failers
+No match
+ a\x{85}b\<bsr_anycrlf>
+No match
+ a\x0bb\<bsr_anycrlf>
+No match
+
/ End of testinput5 /
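The expected output above encodes several rules about \h, \v and \R. The following C sketch restates them as plain predicates. It is an illustration under stated assumptions, not PCRE code:

- It works on already-decoded code points held in uint32_t arrays. PCRE itself walks UTF-8 bytes.
- The helper names is_hspace, is_vspace, match_R, count_R and is_rfc3629_code_point are hypothetical.
- The \h and \v sets are copied from the /[\h]/8BZ and /[\v]/8BZ dumps.
- With <bsr_anycrlf>, \R accepts only CR, LF and CR LF. With <bsr_unicode>, it also accepts VT, FF, NEL, LS and PS.
- A CR LF pair always counts as one \R unit. This is inferred from a\r\n\r\n\r\nb matching /^a\R{1,3}b/ while a\n\n\n\rb does not.

The last predicate mirrors the stricter RFC 3629 check, which rejects surrogates and anything above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* \h: the horizontal-space set shown in the /[\h]/8BZ dump. */
static int is_hspace(uint32_t c)
{
  return c == 0x09 || c == 0x20 || c == 0xa0 || c == 0x1680 ||
         c == 0x180e || (c >= 0x2000 && c <= 0x200a) ||
         c == 0x202f || c == 0x205f || c == 0x3000;
}

/* \v: the vertical-space set shown in the /[\v]/8BZ dump. */
static int is_vspace(uint32_t c)
{
  return (c >= 0x0a && c <= 0x0d) || c == 0x85 ||
         c == 0x2028 || c == 0x2029;
}

/* Number of code points one \R consumes at s[pos], or 0 if \R fails there.
   anycrlf != 0 models <bsr_anycrlf>; otherwise <bsr_unicode> is assumed. */
static size_t match_R(const uint32_t *s, size_t len, size_t pos, int anycrlf)
{
  if (pos >= len) return 0;
  if (s[pos] == 0x0d)                      /* CR LF is taken as one unit */
    return (pos + 1 < len && s[pos + 1] == 0x0a) ? 2 : 1;
  if (s[pos] == 0x0a) return 1;
  /* bsr_unicode additionally accepts VT, FF, NEL, LS and PS. */
  if (!anycrlf && is_vspace(s[pos])) return 1;
  return 0;
}

/* How many consecutive \R units start at pos (what \R* would repeat over). */
static size_t count_R(const uint32_t *s, size_t len, size_t pos, int anycrlf)
{
  size_t n = 0, step;
  while ((step = match_R(s, len, pos, anycrlf)) != 0) {
    pos += step;
    n++;
  }
  return n;
}

/* RFC 3629: no UTF-16 surrogates and nothing above U+10FFFF. */
static int is_rfc3629_code_point(uint32_t c)
{
  return c <= 0x10ffff && !(c >= 0xd800 && c <= 0xdfff);
}
```

Under this model, a\n\n\n\rb contains four \R units between a and b. That is one more than /^a\R{1,3}b/ allows, which matches the "No match" recorded above.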
Modified: httpd/httpd/vendor/pcre/current/testdata/testoutput6
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/testdata/testoutput6?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/testdata/testoutput6 (original)
+++ httpd/httpd/vendor/pcre/current/testdata/testoutput6 Mon Nov 26 08:49:53 2007
@@ -1,5 +1,3 @@
-PCRE version 5.0 13-Sep-2004
-
/^\pC\pL\pM\pN\pP\pS\pZ</8
\x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
0: \x{7f}\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
@@ -85,6 +83,8 @@
No match
/^\p{Cn}/8
+ \x{e0000}
+ 0: \x{e0000}
** Failers
No match
\x{09f}
@@ -99,7 +99,7 @@
No match
/^\p{Cs}/8
- \x{dfff}
+ \?\x{dfff}
0: \x{dfff}
** Failers
No match
@@ -113,7 +113,7 @@
No match
Z
No match
- \x{dfff}
+ \x{e000}
No match
/^\p{Lm}/8
@@ -127,12 +127,24 @@
/^\p{Lo}/8
\x{1bb}
0: \x{1bb}
+ \x{3400}
+ 0: \x{3400}
+ \x{3401}
+ 0: \x{3401}
+ \x{4d00}
+ 0: \x{4d00}
+ \x{4db4}
+ 0: \x{4db4}
+ \x{4db5}
+ 0: \x{4db5}
** Failers
No match
a
No match
\x{2b0}
No match
+ \x{4db6}
+No match
/^\p{Lt}/8
\x{1c5}
@@ -536,73 +548,72 @@
WXYZ
No match
-/[\p{L}]/D
+/[\p{L}]/DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 [\p{L}]
- 10 10 Ket
- 13 End
+ Bra
+ [\p{L}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[\p{^L}]/D
+/[\p{^L}]/DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 [\P{L}]
- 10 10 Ket
- 13 End
+ Bra
+ [\P{L}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[\P{L}]/D
+/[\P{L}]/DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 [\P{L}]
- 10 10 Ket
- 13 End
+ Bra
+ [\P{L}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[\P{^L}]/D
+/[\P{^L}]/DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 [\p{L}]
- 10 10 Ket
- 13 End
+ Bra
+ [\p{L}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
-/[abc\p{L}\x{0660}]/8D
+/[abc\p{L}\x{0660}]/8DZ
------------------------------------------------------------------
- 0 45 Bra 0
- 3 [a-c\p{L}\x{660}]
- 45 45 Ket
- 48 End
+ Bra
+ [a-c\p{L}\x{660}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
No first char
No need char
-/[\p{Nd}]/8DM
-Memory allocation (code space): 46
+/[\p{Nd}]/8DZ
------------------------------------------------------------------
- 0 10 Bra 0
- 3 [\p{Nd}]
- 10 10 Ket
- 13 End
+ Bra
+ [\p{Nd}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
@@ -611,13 +622,12 @@
1234
0: 1
-/[\p{Nd}+-]+/8DM
-Memory allocation (code space): 47
+/[\p{Nd}+-]+/8DZ
------------------------------------------------------------------
- 0 43 Bra 0
- 3 [+\-\p{Nd}]+
- 43 43 Ket
- 46 End
+ Bra
+ [+\-\p{Nd}]+
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
@@ -767,48 +777,48 @@
A\x{391}\x{10427}\x{ff3a}\x{1fb8}
0: A\x{391}\x{10427}\x{ff3a}\x{1fb8}
-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iD
+/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
------------------------------------------------------------------
- 0 21 Bra 0
- 3 NC A\x{391}\x{10427}\x{ff3a}\x{1fb0}
- 21 21 Ket
- 24 End
+ Bra
+ NC A\x{391}\x{10427}\x{ff3a}\x{1fb0}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
First char = 'A' (caseless)
No need char
-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8D
+/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
------------------------------------------------------------------
- 0 21 Bra 0
- 3 A\x{391}\x{10427}\x{ff3a}\x{1fb0}
- 21 21 Ket
- 24 End
+ Bra
+ A\x{391}\x{10427}\x{ff3a}\x{1fb0}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 'A'
Need char = 176
-/AB\x{1fb0}/8D
+/AB\x{1fb0}/8DZ
------------------------------------------------------------------
- 0 11 Bra 0
- 3 AB\x{1fb0}
- 11 11 Ket
- 14 End
+ Bra
+ AB\x{1fb0}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 'A'
Need char = 176
-/AB\x{1fb0}/8Di
+/AB\x{1fb0}/8DZi
------------------------------------------------------------------
- 0 11 Bra 0
- 3 NC AB\x{1fb0}
- 11 11 Ket
- 14 End
+ Bra
+ NC AB\x{1fb0}
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
@@ -845,12 +855,12 @@
\x{e0}
0: \x{e0}
-/[\x{105}-\x{109}]/8iD
+/[\x{105}-\x{109}]/8iDZ
------------------------------------------------------------------
- 0 13 Bra 0
- 3 [\x{104}-\x{109}]
- 13 13 Ket
- 16 End
+ Bra
+ [\x{104}-\x{109}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
@@ -869,12 +879,12 @@
\x{10a}
No match
-/[z-\x{100}]/8iD
+/[z-\x{100}]/8iDZ
------------------------------------------------------------------
- 0 20 Bra 0
- 3 [Z\x{39c}\x{178}z-\x{101}]
- 20 20 Ket
- 23 End
+ Bra
+ [Z\x{39c}\x{178}z-\x{101}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
@@ -907,12 +917,12 @@
y
No match
-/[z-\x{100}]/8Di
+/[z-\x{100}]/8DZi
------------------------------------------------------------------
- 0 20 Bra 0
- 3 [Z\x{39c}\x{178}z-\x{101}]
- 20 20 Ket
- 23 End
+ Bra
+ [Z\x{39c}\x{178}z-\x{101}]
+ Ket
+ End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
@@ -1010,4 +1020,506 @@
0: A\x{300}\x{301}B\x{300}C
1: C
+/^\p{Han}+/8
+ \x{2e81}\x{3007}\x{2f804}\x{31a0}
+ 0: \x{2e81}\x{3007}\x{2f804}
+ ** Failers
+No match
+ \x{2e7f}
+No match
+
+/^\P{Katakana}+/8
+ \x{3105}
+ 0: \x{3105}
+ ** Failers
+ 0: ** Failers
+ \x{30ff}
+No match
+
+/^[\p{Arabic}]/8
+ \x{06e9}
+ 0: \x{6e9}
+ \x{060b}
+ 0: \x{60b}
+ ** Failers
+No match
+ X\x{06e9}
+No match
+
+/^[\P{Yi}]/8
+ \x{2f800}
+ 0: \x{2f800}
+ ** Failers
+ 0: *
+ \x{a014}
+No match
+ \x{a4c6}
+No match
+
+/^\p{Any}X/8
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ ** Failers
+No match
+ X
+No match
+
+/^\P{Any}X/8
+ ** Failers
+No match
+ AX
+No match
+
+/^\p{Any}?X/8
+ XYZ
+ 0: X
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ ** Failers
+No match
+ ABXYZ
+No match
+
+/^\P{Any}?X/8
+ XYZ
+ 0: X
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ ABXYZ
+No match
+
+/^\p{Any}+X/8
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ A\x{1234}XYZ
+ 0: A\x{1234}X
+ ** Failers
+No match
+ XYZ
+No match
+
+/^\P{Any}+X/8
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ A\x{1234}XYZ
+No match
+ XYZ
+No match
+
+/^\p{Any}*X/8
+ XYZ
+ 0: X
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ A\x{1234}XYZ
+ 0: A\x{1234}X
+ ** Failers
+No match
+
+/^\P{Any}*X/8
+ XYZ
+ 0: X
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ A\x{1234}XYZ
+No match
+
+/^[\p{Any}]X/8
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ ** Failers
+No match
+ X
+No match
+
+/^[\P{Any}]X/8
+ ** Failers
+No match
+ AX
+No match
+
+/^[\p{Any}]?X/8
+ XYZ
+ 0: X
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ ** Failers
+No match
+ ABXYZ
+No match
+
+/^[\P{Any}]?X/8
+ XYZ
+ 0: X
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ ABXYZ
+No match
+
+/^[\p{Any}]+X/8
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ A\x{1234}XYZ
+ 0: A\x{1234}X
+ ** Failers
+No match
+ XYZ
+No match
+
+/^[\P{Any}]+X/8
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ A\x{1234}XYZ
+No match
+ XYZ
+No match
+
+/^[\p{Any}]*X/8
+ XYZ
+ 0: X
+ AXYZ
+ 0: AX
+ \x{1234}XYZ
+ 0: \x{1234}X
+ A\x{1234}XYZ
+ 0: A\x{1234}X
+ ** Failers
+No match
+
+/^[\P{Any}]*X/8
+ XYZ
+ 0: X
+ ** Failers
+No match
+ AXYZ
+No match
+ \x{1234}XYZ
+No match
+ A\x{1234}XYZ
+No match
+
+/^\p{Any}{3,5}?/8
+ abcdefgh
+ 0: abc
+ \x{1234}\n\r\x{3456}xyz
+ 0: \x{1234}\x{0a}\x{0d}
+
+/^\p{Any}{3,5}/8
+ abcdefgh
+ 0: abcde
+ \x{1234}\n\r\x{3456}xyz
+ 0: \x{1234}\x{0a}\x{0d}\x{3456}x
+
+/^\P{Any}{3,5}?/8
+ ** Failers
+No match
+ abcdefgh
+No match
+ \x{1234}\n\r\x{3456}xyz
+No match
+
+/^\p{L&}X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ \x{1c5}XY
+ 0: \x{1c5}X
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^[\p{L&}]X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ \x{1c5}XY
+ 0: \x{1c5}X
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^\p{L&}+X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ AbcdeXyz
+ 0: AbcdeX
+ \x{1c5}AbXY
+ 0: \x{1c5}AbX
+ abcDEXypqreXlmn
+ 0: abcDEXypqreX
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^[\p{L&}]+X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ AbcdeXyz
+ 0: AbcdeX
+ \x{1c5}AbXY
+ 0: \x{1c5}AbX
+ abcDEXypqreXlmn
+ 0: abcDEXypqreX
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^\p{L&}+?X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ AbcdeXyz
+ 0: AbcdeX
+ \x{1c5}AbXY
+ 0: \x{1c5}AbX
+ abcDEXypqreXlmn
+ 0: abcDEX
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^[\p{L&}]+?X/8
+ AXY
+ 0: AX
+ aXY
+ 0: aX
+ AbcdeXyz
+ 0: AbcdeX
+ \x{1c5}AbXY
+ 0: \x{1c5}AbX
+ abcDEXypqreXlmn
+ 0: abcDEX
+ ** Failers
+No match
+ \x{1bb}XY
+No match
+ \x{2b0}XY
+No match
+ !XY
+No match
+
+/^\P{L&}X/8
+ !XY
+ 0: !X
+ \x{1bb}XY
+ 0: \x{1bb}X
+ \x{2b0}XY
+ 0: \x{2b0}X
+ ** Failers
+No match
+ \x{1c5}XY
+No match
+ AXY
+No match
+
+/^[\P{L&}]X/8
+ !XY
+ 0: !X
+ \x{1bb}XY
+ 0: \x{1bb}X
+ \x{2b0}XY
+ 0: \x{2b0}X
+ ** Failers
+No match
+ \x{1c5}XY
+No match
+ AXY
+No match
+
+/^(\p{Z}[^\p{C}\p{Z}]+)*$/
+ \xa0!
+ 0: \xa0!
+ 1: \xa0!
+
+/^[\pL](abc)(?1)/
+ AabcabcYZ
+ 0: Aabcabc
+ 1: abc
+
+/([\pL]=(abc))*X/
+ L=abcX
+ 0: L=abcX
+ 1: L=abc
+ 2: abc
+
+/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
+will match it only with UCP support, because without that it has no notion
+of case for anything other than the ASCII letters. /
+
+/((?i)[\x{c0}])/8
+ \x{c0}
+ 0: \x{c0}
+ 1: \x{c0}
+ \x{e0}
+ 0: \x{e0}
+ 1: \x{e0}
+
+/(?i:[\x{c0}])/8
+ \x{c0}
+ 0: \x{c0}
+ \x{e0}
+ 0: \x{e0}
+
+/^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
+ \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
+ 0: \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
+
+/The next two are special cases where the lengths of the different cases of the
+same character differ. The first went wrong with heap fram storage; the 2nd
+was broken in all cases./
+
+/^\x{023a}+?(\x{0130}+)/8i
+ \x{023a}\x{2c65}\x{0130}
+ 0: \x{23a}\x{2c65}\x{130}
+ 1: \x{130}
+
+/^\x{023a}+([^X])/8i
+ \x{023a}\x{2c65}X
+ 0: \x{23a}\x{2c65}
+ 1: \x{2c65}
+
+/Check property support in non-UTF-8 mode/
+
+/\p{L}{4}/
+ 123abcdefg
+ 0: abcd
+ 123abc\xc4\xc5zz
+ 0: abc\xc4
+
+/\X{1,3}\d/
+ \x8aBCD
+No match
+
+/\X?\d/
+ \x8aBCD
+No match
+
+/\P{L}?\d/
+ \x8aBCD
+No match
+
+/[\PPP\x8a]{1,}\x80/
+ A\x80
+ 0: A\x80
+
+/(?:[\PPa*]*){8,}/
+
+/[\P{Any}]/BZ
+------------------------------------------------------------------
+ Bra
+ [\P{Any}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/[\P{Any}\E]/BZ
+------------------------------------------------------------------
+ Bra
+ [\P{Any}]
+ Ket
+ End
+------------------------------------------------------------------
+
+/(\P{Yi}+\277)/
+
+/(\P{Yi}+\277)?/
+
+/(?<=\P{Yi}{3}A)X/
+
+/\p{Yi}+(\P{Yi}+)(?1)/
+
+/(\P{Yi}{2}\277)?/
+
+/[\P{Yi}A]/
+
+/[\P{Yi}\P{Yi}\P{Yi}A]/
+
+/[^\P{Yi}A]/
+
+/[^\P{Yi}\P{Yi}\P{Yi}A]/
+
+/(\P{Yi}*\277)*/
+
+/(\P{Yi}*?\277)*/
+
+/(\p{Yi}*+\277)*/
+
+/(\P{Yi}?\277)*/
+
+/(\P{Yi}??\277)*/
+
+/(\p{Yi}?+\277)*/
+
+/(\P{Yi}{0,3}\277)*/
+
+/(\P{Yi}{0,3}?\277)*/
+
+/(\p{Yi}{0,3}+\277)*/
+
/ End of testinput6 /
Modified: httpd/httpd/vendor/pcre/current/ucp.h
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/ucp.h?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/ucp.h (original)
+++ httpd/httpd/vendor/pcre/current/ucp.h Mon Nov 26 08:49:53 2007
@@ -1,8 +1,16 @@
/*************************************************
-* libucp - Unicode Property Table handler *
+* Unicode Property Table handler *
*************************************************/
-/* These are the character categories that are returned by ucp_findchar */
+#ifndef _UCP_H
+#define _UCP_H
+
+/* This file contains definitions of the property values that are returned by
+the function _pcre_ucp_findprop(). New values that are added for new releases
+of Unicode should always be at the end of each enum, for backwards
+compatibility. */
+
+/* These are the general character categories. */
enum {
ucp_C, /* Other */
@@ -14,7 +22,7 @@
ucp_Z /* Separator */
};
-/* These are the detailed character types that are returned by ucp_findchar */
+/* These are the particular character types. */
enum {
ucp_Cc, /* Control */
@@ -49,10 +57,77 @@
ucp_Zs /* Space separator */
};
-/* For use in PCRE we make this function static so that there is no conflict if
-PCRE is linked with an application that makes use of an external version -
-assuming an external version is ever released... */
+/* These are the script identifications. */
+
+enum {
+ ucp_Arabic,
+ ucp_Armenian,
+ ucp_Bengali,
+ ucp_Bopomofo,
+ ucp_Braille,
+ ucp_Buginese,
+ ucp_Buhid,
+ ucp_Canadian_Aboriginal,
+ ucp_Cherokee,
+ ucp_Common,
+ ucp_Coptic,
+ ucp_Cypriot,
+ ucp_Cyrillic,
+ ucp_Deseret,
+ ucp_Devanagari,
+ ucp_Ethiopic,
+ ucp_Georgian,
+ ucp_Glagolitic,
+ ucp_Gothic,
+ ucp_Greek,
+ ucp_Gujarati,
+ ucp_Gurmukhi,
+ ucp_Han,
+ ucp_Hangul,
+ ucp_Hanunoo,
+ ucp_Hebrew,
+ ucp_Hiragana,
+ ucp_Inherited,
+ ucp_Kannada,
+ ucp_Katakana,
+ ucp_Kharoshthi,
+ ucp_Khmer,
+ ucp_Lao,
+ ucp_Latin,
+ ucp_Limbu,
+ ucp_Linear_B,
+ ucp_Malayalam,
+ ucp_Mongolian,
+ ucp_Myanmar,
+ ucp_New_Tai_Lue,
+ ucp_Ogham,
+ ucp_Old_Italic,
+ ucp_Old_Persian,
+ ucp_Oriya,
+ ucp_Osmanya,
+ ucp_Runic,
+ ucp_Shavian,
+ ucp_Sinhala,
+ ucp_Syloti_Nagri,
+ ucp_Syriac,
+ ucp_Tagalog,
+ ucp_Tagbanwa,
+ ucp_Tai_Le,
+ ucp_Tamil,
+ ucp_Telugu,
+ ucp_Thaana,
+ ucp_Thai,
+ ucp_Tibetan,
+ ucp_Tifinagh,
+ ucp_Ugaritic,
+ ucp_Yi,
+ ucp_Balinese, /* New for Unicode 5.0.0 */
+ ucp_Cuneiform, /* New for Unicode 5.0.0 */
+ ucp_Nko, /* New for Unicode 5.0.0 */
+ ucp_Phags_Pa, /* New for Unicode 5.0.0 */
+ ucp_Phoenician /* New for Unicode 5.0.0 */
+};
-static int ucp_findchar(const int, int *, int *);
+#endif
/* End of ucp.h */
Modified: httpd/httpd/vendor/pcre/current/ucpinternal.h
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/ucpinternal.h?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/ucpinternal.h (original)
+++ httpd/httpd/vendor/pcre/current/ucpinternal.h Mon Nov 26 08:49:53 2007
@@ -1,91 +1,92 @@
/*************************************************
-* libucp - Unicode Property Table handler *
+* Unicode Property Table handler *
*************************************************/
-/* Internal header file defining the layout of compact nodes in the tree. */
+#ifndef _UCPINTERNAL_H
+#define _UCPINTERNAL_H
+
+/* Internal header file defining the layout of the bits in each pair of 32-bit
+words that form a data item in the table. */
typedef struct cnode {
- unsigned short int f0;
- unsigned short int f1;
- unsigned short int f2;
+ pcre_uint32 f0;
+ pcre_uint32 f1;
} cnode;
/* Things for the f0 field */
-#define f0_leftexists 0x8000 /* Left child exists */
-#define f0_typemask 0x3f00 /* Type bits */
-#define f0_typeshift 8 /* Type shift */
-#define f0_chhmask 0x00ff /* Character high bits */
-
-/* Things for the f2 field */
-
-#define f2_rightmask 0xf000 /* Mask for right offset bits */
-#define f2_rightshift 12 /* Shift for right offset */
-#define f2_casemask 0x0fff /* Mask for case offset */
-
-/* The tree consists of a vector of structures of type cnode, with the root
-node as the first element. The three short ints (16-bits) are used as follows:
-
-(f0) (1) The 0x8000 bit of f0 is set if a left child exists. The child's node
- is the next node in the vector.
- (2) The 0x4000 bits of f0 is spare.
- (3) The 0x3f00 bits of f0 contain the character type; this is a number
- defined by the enumeration in ucp.h (e.g. ucp_Lu).
- (4) The bottom 8 bits of f0 contain the most significant byte of the
- character's 24-bit codepoint.
-
-(f1) (1) The f1 field contains the two least significant bytes of the
- codepoint.
-
-(f2) (1) The 0xf000 bits of f2 contain zero if there is no right child of this
- node. Otherwise, they contain one plus the exponent of the power of
- two of the offset to the right node (e.g. a value of 3 means 8). The
- units of the offset are node items.
-
- (2) The 0x0fff bits of f2 contain the signed offset from this character to
- its alternate cased value. They are zero if there is no such
- character.
-
-
------------------------------------------------------------------------------
-||.|.| type (6) | ms char (8) || ls char (16) ||....| case offset (12) ||
------------------------------------------------------------------------------
- | | |
- | |-> spare |
- | exponent of right
- |-> left child exists child offset
-
+#define f0_scriptmask 0xff000000 /* Mask for script field */
+#define f0_scriptshift 24 /* Shift for script value */
+#define f0_rangeflag 0x00f00000 /* Flag for a range item */
+#define f0_charmask 0x001fffff /* Mask for code point value */
+
+/* Things for the f1 field */
+
+#define f1_typemask 0xfc000000 /* Mask for char type field */
+#define f1_typeshift 26 /* Shift for the type field */
+#define f1_rangemask 0x0000ffff /* Mask for a range offset */
+#define f1_casemask 0x0000ffff /* Mask for a case offset */
+#define f1_caseneg 0xffff8000 /* Bits for negation */
+
+/* The data consists of a vector of structures of type cnode. The two unsigned
+32-bit integers are used as follows:
+
+(f0) (1) The most significant byte holds the script number. The numbers are
+ defined by the enum in ucp.h.
+
+ (2) The 0x00800000 bit is set if this entry defines a range of characters.
+ It is not set if this entry defines a single character.
+
+ (3) The 0x00600000 bits are spare.
+
+ (4) The 0x001fffff bits contain the code point. No Unicode code point will
+ ever be greater than 0x0010ffff, so this should be OK for ever.
+
+(f1) (1) The 0xfc000000 bits contain the character type number. The numbers are
+ defined by an enum in ucp.h.
+
+ (2) The 0x03ff0000 bits are spare.
+
+ (3) The 0x0000ffff bits contain EITHER the unsigned offset to the top of
+ range if this entry defines a range, OR the *signed* offset to the
+ character's "other case" partner if this entry defines a single
+ character. There is no partner if the value is zero.
+
+-------------------------------------------------------------------------------
+| script (8) |.|.|.| codepoint (21) || type (6) |.|.| spare (8) | offset (16) |
+-------------------------------------------------------------------------------
+ | | | | |
+ | | |-> spare | |-> spare
+ | | |
+ | |-> spare |-> spare
+ |
+ |-> range flag
The upper/lower casing information is set only for characters that come in
-pairs. There are (at present) four non-one-to-one mappings in the Unicode data.
-These are ignored. They are:
-
- 1FBE Greek Prosgegrammeni (lower, with upper -> capital iota)
- 2126 Ohm
- 212A Kelvin
- 212B Angstrom
+pairs. The non-one-to-one mappings in the Unicode data are ignored.
-Certainly for the last three, having an alternate case would seem to be a
-mistake. I don't know any Greek, so cannot comment on the first one.
+When searching the data, proceed as follows:
+(1) Set up for a binary chop search.
-When searching the tree, proceed as follows:
+(2) If the top is not greater than the bottom, the character is not in the
+ table. Its type must therefore be "Cn" ("Undefined").
-(1) Start at the first node.
+(3) Find the middle vector element.
-(2) Extract the character value from f1 and the bottom 8 bits of f0;
+(4) Extract the code point and compare. If equal, we are done.
-(3) Compare with the character being sought. If equal, we are done.
+(5) If the test character is smaller, set the top to the current point, and
+ goto (2).
-(4) If the test character is smaller, inspect the f0_leftexists flag. If it is
- not set, the character is not in the tree. If it is set, move to the next
- node, and go to (2).
+(6) If the current entry defines a range, compute the last character by adding
+ the offset, and see if the test character is within the range. If it is,
+ we are done.
-(5) If the test character is bigger, extract the f2_rightmask bits from f2, and
- shift them right by f2_rightshift. If the result is zero, the character is
- not in the tree. Otherwise, calculate the number of nodes to skip by
- shifting the value 1 left by this number minus one. Go to (2).
+(7) Otherwise, set the bottom to one element past the current point and goto
+ (2).
*/
+#endif /* _UCPINTERNAL_H */
-/* End of internal.h */
+/* End of ucpinternal.h */
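The cnode layout and the binary chop described in ucpinternal.h can be sketched in C as follows. This is an illustration under stated assumptions, not PCRE's _pcre_ucp_findprop():

- The names find_entry, entry_script, entry_type, entry_case_offset and SKETCH_RANGEBIT are hypothetical.
- pcre_uint32 is stood in for by uint32_t.
- The range test uses the single 0x00800000 bit that the header comment documents. The committed f0_rangeflag define is 0x00f00000, which shares bit 0x00100000 with f0_charmask.
- The table is searched with an exclusive top index, following steps (1) to (7) of the comment.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pcre_uint32;   /* assumption: stand-in for PCRE's own typedef */

typedef struct cnode {
  pcre_uint32 f0;
  pcre_uint32 f1;
} cnode;

/* Field masks as in ucpinternal.h, except the range bit (see the text above). */
#define f0_scriptmask    0xff000000u
#define f0_scriptshift   24
#define f0_charmask      0x001fffffu
#define f1_typemask      0xfc000000u
#define f1_typeshift     26
#define f1_rangemask     0x0000ffffu
#define f1_casemask      0x0000ffffu
#define SKETCH_RANGEBIT  0x00800000u   /* hypothetical name for the documented bit */

/* Binary chop over a table sorted by code point. NULL means the character
   is not in the table, so its type is Cn. */
static const cnode *find_entry(const cnode *table, size_t size, pcre_uint32 c)
{
  size_t bot = 0, top = size;                  /* step (1) */
  while (top > bot) {                          /* step (2) */
    size_t mid = bot + (top - bot) / 2;        /* step (3) */
    const cnode *cp = table + mid;
    pcre_uint32 base = cp->f0 & f0_charmask;   /* step (4) */
    if (c == base) return cp;
    if (c < base) { top = mid; continue; }     /* step (5) */
    if ((cp->f0 & SKETCH_RANGEBIT) != 0 &&     /* step (6) */
        c <= base + (cp->f1 & f1_rangemask))
      return cp;
    bot = mid + 1;                             /* step (7) */
  }
  return NULL;
}

/* Script number: an index into the script enum in ucp.h. */
static int entry_script(const cnode *cp)
{
  return (int)((cp->f0 & f0_scriptmask) >> f0_scriptshift);
}

/* Character type number: an index into the particular-type enum in ucp.h. */
static int entry_type(const cnode *cp)
{
  return (int)((cp->f1 & f1_typemask) >> f1_typeshift);
}

/* Signed offset to the other-case partner; 0 means no partner.
   Range entries reuse the low 16 bits for the range length, so return 0. */
static int32_t entry_case_offset(const cnode *cp)
{
  pcre_uint32 off = cp->f1 & f1_casemask;
  if ((cp->f0 & SKETCH_RANGEBIT) != 0) return 0;
  /* Sign-extend the 16-bit field by hand. */
  return (off & 0x8000u) ? (int32_t)off - 0x10000 : (int32_t)off;
}
```

A NULL result from find_entry corresponds to step (2) of the comment. A caller would then report the type as Cn. The sign extension is written as a subtraction to avoid an implementation-defined unsigned-to-signed conversion. The header's f1_caseneg value (0xffff8000) appears intended for the same job.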