Posted to cvs@httpd.apache.org by br...@apache.org on 2002/03/20 07:22:57 UTC

cvs commit: httpd-2.0/srclib/pcre ChangeLog internal.h pcreposix.c

brianp      02/03/19 22:22:57

  Modified:    srclib/pcre ChangeLog internal.h pcreposix.c
  Log:
  PCRE 3.9 merge
  
  Revision  Changes    Path
  1.3       +175 -0    httpd-2.0/srclib/pcre/ChangeLog
  
  Index: ChangeLog
  ===================================================================
  RCS file: /home/cvs/httpd-2.0/srclib/pcre/ChangeLog,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- ChangeLog	20 Aug 2000 03:21:52 -0000	1.2
  +++ ChangeLog	20 Mar 2002 06:22:57 -0000	1.3
  @@ -1,6 +1,181 @@
   ChangeLog for PCRE
   ------------------
   
  +Version 3.9 02-Jan-02
  +---------------------
  +
  +1. A bit of extraneous text had somehow crept into the pcregrep documentation.
  +
  +2. If --disable-static was given, the building process failed when trying to
  +build pcretest and pcregrep. (For some reason it was using libtool to compile
  +them, which is not right, as they aren't part of the library.)
  +
  +
  +Version 3.8 18-Dec-01
  +---------------------
  +
  +1. The experimental UTF-8 code was completely screwed up. It was packing the
  +bytes in the wrong order. How dumb can you get?
  +
  +
  +Version 3.7 29-Oct-01
  +---------------------
  +
  +1. In updating pcretest to check change 1 of version 3.6, I screwed up.
  +This caused pcretest, when used on the test data, to segfault. Unfortunately,
  +this didn't happen under Solaris 8, where I normally test things.
  +
  +2. The Makefile had to be changed to make it work on BSD systems, where 'make'
  +doesn't seem to recognize that ./xxx and xxx are the same file. (This entry
  +isn't in ChangeLog distributed with 3.7 because I forgot when I hastily made
  +this fix an hour or so after the initial 3.7 release.)
  +
  +
  +Version 3.6 23-Oct-01
  +---------------------
  +
  +1. Crashed with /(sens|respons)e and \1ibility/ and "sense and sensibility" if
  +offsets passed as NULL with zero offset count.
  +
  +2. The config.guess and config.sub files had not been updated when I moved to
  +the latest autoconf.
  +
  +
  +Version 3.5 15-Aug-01
  +---------------------
  +
  +1. Added some missing #if !defined NOPOSIX conditionals in pcretest.c that
  +had been forgotten.
  +
  +2. By using declared but undefined structures, we can avoid using "void"
  +definitions in pcre.h while keeping the internal definitions of the structures
  +private.
  +
  +3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a
  +user point of view, this means that both static and shared libraries are built
  +by default, but this can be individually controlled. More of the work of
  +handling the static/shared cases is now inside libtool instead of PCRE's make
  +file.
  +
  +4. The pcretest utility is now installed along with pcregrep because it is
  +useful for users (to test regexs) and by doing this, it automatically gets
  +relinked by libtool. The documentation has been turned into a man page, so
  +there are now .1, .txt, and .html versions in /doc.
  +
  +5. Upgrades to pcregrep:
  +   (i)   Added long-form option names like gnu grep.
  +   (ii)  Added --help to list all options with an explanatory phrase.
  +   (iii) Added -r, --recursive to recurse into sub-directories.
  +   (iv)  Added -f, --file to read patterns from a file.
  +
  +6. pcre_exec() was referring to its "code" argument before testing that
  +argument for NULL (and giving an error if it was NULL).
  +
  +7. Upgraded Makefile.in to allow for compiling in a different directory from
  +the source directory.
  +
  +8. Tiny buglet in pcretest: when pcre_fullinfo() was called to retrieve the
  +options bits, the pointer it was passed was to an int instead of to an unsigned
  +long int. This mattered only on 64-bit systems.
  +
  +9. Fixed typo (3.4/1) in pcre.h again. Sigh. I had changed pcre.h (which is
  +generated) instead of pcre.in, which is its source. Also made the same change
  +in several of the .c files.
  +
  +10. A new release of gcc defines printf() as a macro, which broke pcretest
  +because it had an ifdef in the middle of a string argument for printf(). Fixed
  +by using separate calls to printf().
  +
  +11. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
  +script, to force use of CR or LF instead of \n in the source. On non-Unix
  +systems, the value can be set in config.h.
  +
  +12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
  +absolute limit. Changed the text of the error message to make this clear, and
  +likewise updated the man page.
  +
  +13. The limit of 99 on the number of capturing subpatterns has been removed.
  +The new limit is 65535, which I hope will not be a "real" limit.
  +
  +
  +Version 3.4 22-Aug-00
  +---------------------
  +
  +1. Fixed typo in pcre.h: unsigned const char * changed to const unsigned char *.
  +
  +2. Diagnose condition (?(0) as an error instead of crashing on matching.
  +
  +
  +Version 3.3 01-Aug-00
  +---------------------
  +
  +1. If an octal character was given, but the value was greater than \377, it
  +was not getting masked to the least significant bits, as documented. This could
  +lead to crashes in some systems.
  +
  +2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
  +the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.
  +
  +3. Added the functions pcre_free_substring() and pcre_free_substring_list().
  +These just pass their arguments on to (pcre_free)(), but they are provided
  +because some uses of PCRE bind it to non-C systems that can call its functions,
  +but cannot call free() or pcre_free() directly.
  +
  +4. Add "make test" as a synonym for "make check". Corrected some comments in
  +the Makefile.
  +
  +5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the
  +Makefile.
  +
  +6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
  +command called pgrep for grepping around the active processes.
  +
  +7. Added the beginnings of support for UTF-8 character strings.
  +
  +8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and
  +RANLIB to ./ltconfig so that they are used by libtool. I think these are all
  +the relevant ones. (AR is not passed because ./ltconfig does its own figuring
  +out for the ar command.)
  +
  +
  +Version 3.2 12-May-00
  +---------------------
  +
  +This is purely a bug fixing release.
  +
  +1. If the pattern /((Z)+|A)*/ was matched against ZABCDEFG it matched Z instead
  +of ZA. This was just one example of several cases that could provoke this bug,
  +which was introduced by change 9 of version 2.00. The code for breaking
  +infinite loops after an iteration that matches an empty string wasn't working
  +correctly.
  +
  +2. The pcretest program was not imitating Perl correctly for the pattern /a*/g
  +when matched against abbab (for example). After matching an empty string, it
  +wasn't forcing anchoring when setting PCRE_NOTEMPTY for the next attempt; this
  +caused it to match further down the string than it should.
  +
  +3. The code contained an inclusion of sys/types.h. It isn't clear why this
  +was there because it doesn't seem to be needed, and it causes trouble on some
  +systems, as it is not a Standard C header. It has been removed.
  +
  +4. Made 4 silly changes to the source to avoid stupid compiler warnings that
  +were reported on the Macintosh. The changes were from
  +
  +  while ((c = *(++ptr)) != 0 && c != '\n');
  +to
  +  while ((c = *(++ptr)) != 0 && c != '\n') ;
  +
  +Totally extraordinary, but if that's what it takes...
  +
  +5. PCRE is being used in one environment where neither memmove() nor bcopy() is
  +available. Added HAVE_BCOPY and an autoconf test for it; if neither
  +HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which
  +assumes the way PCRE uses memmove() (always moving upwards).
  +
  +6. PCRE is being used in one environment where strchr() is not available. There
  +was only one use in pcre.c, and writing it out to avoid strchr() probably gives
  +faster code anyway.
  +
   
   Version 3.2 12-May-00
   ---------------------
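
Editorial sketch for item 5 of Version 3.2 above, which describes a built-in
replacement for memmove() that relies on PCRE only ever moving data upwards.
The function name move_up is hypothetical and this is not taken from the PCRE
sources; it only illustrates why copying from the top end backwards is safe
when the destination lies above the source in the same buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memmove(), valid only when dest is at or above
   src within the same buffer (an "upward" move). Copying from the last byte
   backwards means no source byte is overwritten before it has been read. */
void *move_up(unsigned char *dest, const unsigned char *src, size_t n)
{
    assert(dest >= src);          /* only the upward case is handled */
    dest += n;
    src += n;
    while (n-- > 0)
        *(--dest) = *(--src);
    return dest;                  /* back at the original dest */
}
```

A downward move (dest below src) would need the opposite copy direction, which
is presumably why the ChangeLog stresses the "always moving upwards" assumption.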
  
  
  
  1.3       +48 -12    httpd-2.0/srclib/pcre/internal.h
  
  Index: internal.h
  ===================================================================
  RCS file: /home/cvs/httpd-2.0/srclib/pcre/internal.h,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- internal.h	20 Aug 2000 03:21:53 -0000	1.2
  +++ internal.h	20 Mar 2002 06:22:57 -0000	1.3
  @@ -9,7 +9,7 @@
   
   Written by: Philip Hazel <ph...@cam.ac.uk>
   
  -           Copyright (c) 1997-2000 University of Cambridge
  +           Copyright (c) 1997-2001 University of Cambridge
   
   -----------------------------------------------------------------------------
   Permission is granted to anyone to use this software for any purpose on any
  @@ -105,7 +105,7 @@
   
   #define PUBLIC_OPTIONS \
     (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
  -   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
  +   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8)
   
   #define PUBLIC_EXEC_OPTIONS \
     (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
  @@ -123,12 +123,36 @@
   #define FALSE   0
   #define TRUE    1
   
  +/* Escape items that are just an encoding of a particular data value. Note that
  +ESC_N is defined as yet another macro, which is set in config.h to either \n
  +(the default) or \r (which some people want). */
  +
  +#ifndef ESC_E
  +#define ESC_E 27
  +#endif
  +
  +#ifndef ESC_F
  +#define ESC_F '\f'
  +#endif
  +
  +#ifndef ESC_N
  +#define ESC_N NEWLINE
  +#endif
  +
  +#ifndef ESC_R
  +#define ESC_R '\r'
  +#endif
  +
  +#ifndef ESC_T
  +#define ESC_T '\t'
  +#endif
  +
   /* These are escaped items that aren't just an encoding of a particular data
   value such as \n. They must have non-zero values, as check_escape() returns
   their negation. Also, they must appear in the same order as in the opcode
   definitions below, up to ESC_z. The final one must be ESC_REF as subsequent
   values are used for \1, \2, \3, etc. There is a test in the code for an escape
  -greater than ESC_b and less than ESC_X to detect the types that may be
  +greater than ESC_b and less than ESC_Z to detect the types that may be
   repeated. If any new escapes are put in-between that don't consume a character,
   that code will have to change. */
   
  @@ -224,19 +248,26 @@
   
     OP_ONCE,           /* Once matched, don't back up into the subpattern */
     OP_COND,           /* Conditional group */
  -  OP_CREF,           /* Used to hold an extraction string number */
  +  OP_CREF,           /* Used to hold an extraction string number (cond ref) */
   
     OP_BRAZERO,        /* These two must remain together and in this */
     OP_BRAMINZERO,     /* order. */
   
  +  OP_BRANUMBER,      /* Used for extracting brackets whose number is greater
  +                        than can fit into an opcode. */
  +
     OP_BRA             /* This and greater values are used for brackets that
  -                        extract substrings. */
  +                        extract substrings up to a basic limit. After that,
  +                        use is made of OP_BRANUMBER. */
   };
   
  -/* The highest extraction number. This is limited by the number of opcodes
  -left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */
  +/* The highest extraction number before we have to start using additional
  +bytes. (Originally PCRE didn't have support for extraction counts higher than
  +this number.) The value is limited by the number of opcodes left after OP_BRA,
  +i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
  +opcodes. */
   
  -#define EXTRACT_MAX  99
  +#define EXTRACT_BASIC_MAX  150
   
   /* The texts of compile-time error messages are defined as macros here so that
   they can be accessed by the POSIX wrapper and converted into error codes.  Yes,
  @@ -255,13 +286,13 @@
   #define ERR10 "operand of unlimited repeat could match the empty string"
   #define ERR11 "internal error: unexpected repeat"
   #define ERR12 "unrecognized character after (?"
  -#define ERR13 "too many capturing parenthesized sub-patterns"
  +#define ERR13 "unused error"
   #define ERR14 "missing )"
   #define ERR15 "back reference to non-existent subpattern"
   #define ERR16 "erroffset passed as NULL"
   #define ERR17 "unknown option bit(s) set"
   #define ERR18 "missing ) after comment"
  -#define ERR19 "too many sets of parentheses"
  +#define ERR19 "parentheses nested too deeply"
   #define ERR20 "regular expression too large"
   #define ERR21 "failed to get memory"
   #define ERR22 "unmatched parentheses"
  @@ -274,6 +305,10 @@
   #define ERR29 "(?p must be followed by )"
   #define ERR30 "unknown POSIX class name"
   #define ERR31 "POSIX collating elements are not supported"
  +#define ERR32 "this version of PCRE is not compiled with PCRE_UTF8 support"
  +#define ERR33 "characters with values > 255 are not yet supported in classes"
  +#define ERR34 "character value in \\x{...} sequence is too large"
  +#define ERR35 "invalid condition (?(0)"
   
   /* All character handling must be done as unsigned characters. Otherwise there
   are problems with top-bit-set characters and functions such as isspace().
  @@ -292,8 +327,8 @@
     size_t size;
     const unsigned char *tables;
     unsigned long int options;
  -  uschar top_bracket;
  -  uschar top_backref;
  +  unsigned short int top_bracket;
  +  unsigned short int top_backref;
     uschar first_char;
     uschar req_char;
     uschar code[1];
  @@ -330,6 +365,7 @@
     BOOL   offset_overflow;       /* Set if too many extractions */
     BOOL   notbol;                /* NOTBOL flag */
     BOOL   noteol;                /* NOTEOL flag */
  +  BOOL   utf8;                  /* UTF8 flag */
     BOOL   endonly;               /* Dollar not before final \n */
     BOOL   notempty;              /* Empty string match not wanted */
     const uschar *start_pattern;  /* For use when recursing */
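
Editorial sketch for the top_bracket/top_backref change above. With the limit
on capturing subpatterns raised from 99 to 65535 (ChangeLog, Version 3.5,
item 13), a one-byte field can no longer hold the highest bracket number. The
helper names below are hypothetical, and uschar is assumed to be a typedef for
unsigned char, as the "unsigned characters" comment in internal.h suggests.

```c
#include <assert.h>

typedef unsigned char uschar;   /* assumed definition, see the note above */

/* What a one-byte field would keep of a bracket number: the value is
   reduced modulo UCHAR_MAX + 1, so large numbers are silently altered. */
unsigned int stored_in_uschar(unsigned int bracket)
{
    return (uschar)bracket;
}

/* What an unsigned short int field keeps: exact for every value up to
   65535, since the type is guaranteed to have at least 16 bits. */
unsigned int stored_in_ushort(unsigned int bracket)
{
    assert(bracket <= 65535u);
    return (unsigned short int)bracket;
}
```

Numbers up to the old limit of 99 survive in either field; only the wider type
keeps values such as 300 or 65535 intact.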
  
  
  
  1.4       +8 -4      httpd-2.0/srclib/pcre/pcreposix.c
  
  Index: pcreposix.c
  ===================================================================
  RCS file: /home/cvs/httpd-2.0/srclib/pcre/pcreposix.c,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- pcreposix.c	28 Sep 2001 17:15:12 -0000	1.3
  +++ pcreposix.c	20 Mar 2002 06:22:57 -0000	1.4
  @@ -12,7 +12,7 @@
   
   Written by: Philip Hazel <ph...@cam.ac.uk>
   
  -           Copyright (c) 1997-2000 University of Cambridge
  +           Copyright (c) 1997-2001 University of Cambridge
   
   -----------------------------------------------------------------------------
   Permission is granted to anyone to use this software for any purpose on any
  @@ -62,13 +62,13 @@
     REG_BADRPT,  /* "operand of unlimited repeat could match the empty string" */
     REG_ASSERT,  /* "internal error: unexpected repeat" */
     REG_BADPAT,  /* "unrecognized character after (?" */
  -  REG_ESIZE,   /* "too many capturing parenthesized sub-patterns" */
  +  REG_ASSERT,  /* "unused error" */
     REG_EPAREN,  /* "missing )" */
     REG_ESUBREG, /* "back reference to non-existent subpattern" */
     REG_INVARG,  /* "erroffset passed as NULL" */
     REG_INVARG,  /* "unknown option bit(s) set" */
     REG_EPAREN,  /* "missing ) after comment" */
  -  REG_ESIZE,   /* "too many sets of parentheses" */
  +  REG_ESIZE,   /* "parentheses nested too deeply" */
     REG_ESIZE,   /* "regular expression too large" */
     REG_ESPACE,  /* "failed to get memory" */
     REG_EPAREN,  /* "unmatched brackets" */
  @@ -80,7 +80,11 @@
     REG_BADPAT,  /* "assertion expected after (?(" */
     REG_BADPAT,  /* "(?p must be followed by )" */
     REG_ECTYPE,  /* "unknown POSIX class name" */
  -  REG_BADPAT   /* "POSIX collating elements are not supported" */
  +  REG_BADPAT,  /* "POSIX collating elements are not supported" */
  +  REG_INVARG,  /* "this version of PCRE is not compiled with PCRE_UTF8 support" */
  +  REG_BADPAT,  /* "characters with values > 255 are not yet supported in classes" */
  +  REG_BADPAT,  /* "character value in \x{...} sequence is too large" */
  +  REG_BADPAT   /* "invalid condition (?(0)" */
   };
   
   /* Table of texts corresponding to POSIX error codes */
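
Editorial sketch for the table change above. The POSIX codes are listed in the
same order as the ERRnn texts in internal.h, so adding ERR32 to ERR35 means
adding four codes here as well. The tables below are trimmed to four entries,
and the X_ names and code_for_text() are hypothetical stand-ins rather than
the definitions in pcreposix.h or pcreposix.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for POSIX-style error codes. */
enum { X_BADPAT = 1, X_INVARG, X_ESIZE, X_ASSERT };

/* Four of the message texts that appear in the diff. */
static const char *const texts[] = {
    "unused error",
    "parentheses nested too deeply",
    "this version of PCRE is not compiled with PCRE_UTF8 support",
    "invalid condition (?(0)"
};

/* Codes in the same order as texts[]. */
static const int codes[] = { X_ASSERT, X_ESIZE, X_INVARG, X_BADPAT };

/* Refuses to compile if the two tables ever differ in length. */
typedef char tables_in_step[
    (sizeof texts / sizeof texts[0] == sizeof codes / sizeof codes[0]) ? 1 : -1];

/* Map a message text to its code; X_ASSERT if the text is unknown. */
int code_for_text(const char *msg)
{
    size_t i;
    assert(msg != NULL);
    for (i = 0; i < sizeof codes / sizeof codes[0]; i++)
        if (strcmp(msg, texts[i]) == 0)
            return codes[i];
    return X_ASSERT;
}
```

The typedef is one way to catch a table that was extended on one side only;
whether PCRE itself guards against that is not shown in this commit.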
  
  
  

Re: cvs commit: httpd-2.0/srclib/pcre ChangeLog internal.h pcreposix.c

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 12:22 AM 3/20/2002, you wrote:
>brianp      02/03/19 22:22:57
>
>   Modified:    srclib/pcre ChangeLog internal.h pcreposix.c
>   Log:
>   PCRE 3.9 merge
>
>   +7. Added the beginnings of support for UTF-8 character strings.

Thank you for this effort, Brian!

It occurred to me that non-US users [who are unsupported, effectively,
back in 1.3] will really benefit from fixing our internals on filenaming to
actually use pcre utf-8 comparisons.

This means that [with the correct hack in the right place] the regex
<DirectoryMatch > section will work correctly for WinNT!

Bill