You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by kf...@collab.net on 2004/12/01 18:41:01 UTC

Re: svn commit: r12091 - in trunk/subversion: include libsvn_client libsvn_subr

brane@tigris.org writes:
> Log:
> Introduce character classification functions that work correctly with
> (the ASCII subset of) UTF-8 strings regardless of locale settings.
> 
> * subversion/include/svn_ctype.h: New.
> * subversion/libsvn_subr/ctype.c: New.
> * subversion/libsvn_subr/genctype.py: Generates the class table in ctype.c.
> 
> * subversion/libsvn_client/prop_commands.c: Include svn_ctype.h, not ctype.h.
>   (is_valid_prop_name): Use the new svn_ctype macros to test characters.
>    Follow-up to r12029.

This commit is a masterpiece.

A few minor comments below:

> --- (empty file)
> +++ trunk/subversion/include/svn_ctype.h	Mon Nov 29 19:56:04 2004
> @@ -0,0 +1,156 @@
> +/**
> + * @copyright
> + * ====================================================================
> + * Copyright (c) 2000-2004 CollabNet.  All rights reserved.
> + *
> + * This software is licensed as described in the file COPYING, which
> + * you should have received as part of this distribution.  The terms
> + * are also available at http://subversion.tigris.org/license-1.html.
> + * If newer versions of this license are posted there, you may use a
> + * newer version instead, at your option.
> + *
> + * This software consists of voluntary contributions made by many
> + * individuals.  For exact contribution history, see the revision
> + * history and logs, available at http://subversion.tigris.org/.
> + * ====================================================================
> + * @endcopyright
> + *
> + * @file svn_ctype.h
> + * @brief Character classification routines
> + * @since New in 1.2.
> + */
> +
> +
> +#ifndef SVN_CTYPE_H
> +#define SVN_CTYPE_H

At the end of this file, you have

   #ifdef __cplusplus
   }
   #endif /* __cplusplus */

But here at the beginning, you don't have the corresponding opening
form.

> +#include <apr.h>
> +
> +/** Table of flags for character classification. */
> +extern const apr_uint32_t *const svn_ctype_table;
> +
> +
> +/** Check if @a c is in the character class described by @a flags.
> + * The @a flags is a bitwise-or combination of @t SVN_CTYPE_* *
> + * constants.
> + */
> +#define svn_ctype_test(c, flags) \
> +  (0 != (svn_ctype_table[(unsigned char)(c)] & (flags)))

Just curious, why the "0 !="?  Did you want to fold the boolean value
to either 1 or 0, instead of non-zero or 0?

> +/**
> + * @defgroup ctype_basic Basic character classification - ASCII only
> + * @{
> + */
> +
> +/* Basic character classes */
> +#define SVN_CTYPE_CNTRL    0x0001 /**< Control character */
> +#define SVN_CTYPE_SPACE    0x0002 /**< Whitespace */
> +#define SVN_CTYPE_DIGIT    0x0004 /**< Decimal digit */
> +#define SVN_CTYPE_UPPER    0x0008 /**< Uppercase letter */
> +#define SVN_CTYPE_LOWER    0x0010 /**< Lowercase letter */
> +#define SVN_CTYPE_PUNCT    0x0020 /**< Punctuation mark */
> +#define SVN_CTYPE_XALPHA   0x0040 /**< Hexadecimal digits A to F */
> +#define SVN_CTYPE_ASCII    0x0080 /**< ASCII subset*/
> +
> +/* Derived character classes */
> +/** ASCII letter */
> +#define SVN_CTYPE_ALPHA    (SVN_CTYPE_LOWER | SVN_CTYPE_UPPER)
> +/** ASCII letter or decimal digit */
> +#define SVN_CTYPE_ALNUM    (SVN_CTYPE_ALPHA | SVN_CTYPE_DIGIT)
> +/** ASCII hexadecimal digit */
> +#define SVN_CTYPE_XDIGIT   (SVN_CTYPE_DIGIT | SVN_CTYPE_XALPHA)
> +/** Printable ASCII except space */
> +#define SVN_CTYPE_GRAPH    (SVN_CTYPE_PUNCT | SVN_CTYPE_ALNUM)
> +/** All printable ASCII */
> +#define SVN_CTYPE_PRINT    (SVN_CTYPE_GRAPH | SVN_CTYPE_SPACE)
> +
> +
> +/** Check if @c is an ASCII control character. */
> +#define svn_ctype_iscntrl(c)  svn_ctype_test((c), SVN_CTYPE_CNTRL)

Here and throughout this file, you've accidentally left out the "c"
after "@c", thus formatting the word "is" instead.  I won't bother to
list out all the other instances.

Also, you might want to state explicitly in this header that "ASCII"
means 7-bit ASCII, and foreshadow the UTF8 stuff coming later.  I know
this might technically be redundant, but in colloquial use, many
people imagine a beast "ASCII" with two parts, upper and lower (i.e.,
8-bit and 7-bit).  They might wonder why we seem to ignore upper.

> +static const apr_uint32_t ctype_table[256] =
> +  {
> +    /* nul */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* soh */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* stx */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* etx */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* eot */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* enq */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* ack */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* bel */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* bs  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* ht  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
> +    /* nl  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
> +    /* vt  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
> +    /* np  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
> +    /* cr  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,

We're going to remember, when it comes time to take care of that
"#TBD" for XMLNAME and URISAFE, that VT and NP are special, right?

Not at all a complaint about this commit, of course.  I'm sure the
question of what officially is and isn't whitespace in ASCII was
settled at the Council of Elrond long ago, and we can only humbly
reflect those decisions here.  I'm just making a mental note to myself
(well, and to this list :-) ) that for XMLNAME purposes, only three of
these SVN_CTYPE_SPACEs are useable (plus SP below, of course).

> +    /* so  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* si  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* dle */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* dc1 */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* dc2 */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* dc3 */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* dc4 */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* nak */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* syn */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* etb */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* can */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* em  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* sub */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* esc */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* fs  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* gs  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* rs  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* us  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* sp  */ SVN_CTYPE_ASCII | SVN_CTYPE_SPACE,
> +    /*  !  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  "  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  #  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  $  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  %  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  &  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  '  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  (  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  )  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  *  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  +  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ,  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  -  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  .  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  /  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  0  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  1  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  2  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  3  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  4  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  5  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  6  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  7  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  8  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  9  */ SVN_CTYPE_ASCII | SVN_CTYPE_DIGIT,
> +    /*  :  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ;  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  <  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  =  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  >  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ?  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  @  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  A  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  B  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  C  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  D  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  E  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  F  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER | SVN_CTYPE_XALPHA,
> +    /*  G  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  H  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  I  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  J  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  K  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  L  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  M  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  N  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  O  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  P  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  Q  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  R  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  S  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  T  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  U  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  V  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  W  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  X  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  Y  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  Z  */ SVN_CTYPE_ASCII | SVN_CTYPE_UPPER,
> +    /*  [  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  \  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ]  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ^  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  _  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  `  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  a  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  b  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  c  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  d  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  e  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  f  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER | SVN_CTYPE_XALPHA,
> +    /*  g  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  h  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  i  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  j  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  k  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  l  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  m  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  n  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  o  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  p  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  q  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  r  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  s  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  t  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  u  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  v  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  w  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  x  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  y  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  z  */ SVN_CTYPE_ASCII | SVN_CTYPE_LOWER,
> +    /*  {  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  |  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  }  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /*  ~  */ SVN_CTYPE_ASCII | SVN_CTYPE_PUNCT,
> +    /* del */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL,
> +    /* x80 */ SVN_CTYPE_UTF8CONT,
> +    /* x81 */ SVN_CTYPE_UTF8CONT,
> +    /* x82 */ SVN_CTYPE_UTF8CONT,
> +    /* x83 */ SVN_CTYPE_UTF8CONT,
> +    /* x84 */ SVN_CTYPE_UTF8CONT,
> +    /* x85 */ SVN_CTYPE_UTF8CONT,
> +    /* x86 */ SVN_CTYPE_UTF8CONT,
> +    /* x87 */ SVN_CTYPE_UTF8CONT,
> +    /* x88 */ SVN_CTYPE_UTF8CONT,
> +    /* x89 */ SVN_CTYPE_UTF8CONT,
> +    /* x8a */ SVN_CTYPE_UTF8CONT,
> +    /* x8b */ SVN_CTYPE_UTF8CONT,
> +    /* x8c */ SVN_CTYPE_UTF8CONT,
> +    /* x8d */ SVN_CTYPE_UTF8CONT,
> +    /* x8e */ SVN_CTYPE_UTF8CONT,
> +    /* x8f */ SVN_CTYPE_UTF8CONT,
> +    /* x90 */ SVN_CTYPE_UTF8CONT,
> +    /* x91 */ SVN_CTYPE_UTF8CONT,
> +    /* x92 */ SVN_CTYPE_UTF8CONT,
> +    /* x93 */ SVN_CTYPE_UTF8CONT,
> +    /* x94 */ SVN_CTYPE_UTF8CONT,
> +    /* x95 */ SVN_CTYPE_UTF8CONT,
> +    /* x96 */ SVN_CTYPE_UTF8CONT,
> +    /* x97 */ SVN_CTYPE_UTF8CONT,
> +    /* x98 */ SVN_CTYPE_UTF8CONT,
> +    /* x99 */ SVN_CTYPE_UTF8CONT,
> +    /* x9a */ SVN_CTYPE_UTF8CONT,
> +    /* x9b */ SVN_CTYPE_UTF8CONT,
> +    /* x9c */ SVN_CTYPE_UTF8CONT,
> +    /* x9d */ SVN_CTYPE_UTF8CONT,
> +    /* x9e */ SVN_CTYPE_UTF8CONT,
> +    /* x9f */ SVN_CTYPE_UTF8CONT,
> +    /* xa0 */ SVN_CTYPE_UTF8CONT,
> +    /* xa1 */ SVN_CTYPE_UTF8CONT,
> +    /* xa2 */ SVN_CTYPE_UTF8CONT,
> +    /* xa3 */ SVN_CTYPE_UTF8CONT,
> +    /* xa4 */ SVN_CTYPE_UTF8CONT,
> +    /* xa5 */ SVN_CTYPE_UTF8CONT,
> +    /* xa6 */ SVN_CTYPE_UTF8CONT,
> +    /* xa7 */ SVN_CTYPE_UTF8CONT,
> +    /* xa8 */ SVN_CTYPE_UTF8CONT,
> +    /* xa9 */ SVN_CTYPE_UTF8CONT,
> +    /* xaa */ SVN_CTYPE_UTF8CONT,
> +    /* xab */ SVN_CTYPE_UTF8CONT,
> +    /* xac */ SVN_CTYPE_UTF8CONT,
> +    /* xad */ SVN_CTYPE_UTF8CONT,
> +    /* xae */ SVN_CTYPE_UTF8CONT,
> +    /* xaf */ SVN_CTYPE_UTF8CONT,
> +    /* xb0 */ SVN_CTYPE_UTF8CONT,
> +    /* xb1 */ SVN_CTYPE_UTF8CONT,
> +    /* xb2 */ SVN_CTYPE_UTF8CONT,
> +    /* xb3 */ SVN_CTYPE_UTF8CONT,
> +    /* xb4 */ SVN_CTYPE_UTF8CONT,
> +    /* xb5 */ SVN_CTYPE_UTF8CONT,
> +    /* xb6 */ SVN_CTYPE_UTF8CONT,
> +    /* xb7 */ SVN_CTYPE_UTF8CONT,
> +    /* xb8 */ SVN_CTYPE_UTF8CONT,
> +    /* xb9 */ SVN_CTYPE_UTF8CONT,
> +    /* xba */ SVN_CTYPE_UTF8CONT,
> +    /* xbb */ SVN_CTYPE_UTF8CONT,
> +    /* xbc */ SVN_CTYPE_UTF8CONT,
> +    /* xbd */ SVN_CTYPE_UTF8CONT,
> +    /* xbe */ SVN_CTYPE_UTF8CONT,
> +    /* xbf */ SVN_CTYPE_UTF8CONT,
> +    /* xc0 */ 0,
> +    /* xc1 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc2 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc3 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc4 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc5 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc6 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc7 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc8 */ SVN_CTYPE_UTF8LEAD,
> +    /* xc9 */ SVN_CTYPE_UTF8LEAD,
> +    /* xca */ SVN_CTYPE_UTF8LEAD,
> +    /* xcb */ SVN_CTYPE_UTF8LEAD,
> +    /* xcc */ SVN_CTYPE_UTF8LEAD,
> +    /* xcd */ SVN_CTYPE_UTF8LEAD,
> +    /* xce */ SVN_CTYPE_UTF8LEAD,
> +    /* xcf */ SVN_CTYPE_UTF8LEAD,
> +    /* xd0 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd1 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd2 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd3 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd4 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd5 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd6 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd7 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd8 */ SVN_CTYPE_UTF8LEAD,
> +    /* xd9 */ SVN_CTYPE_UTF8LEAD,
> +    /* xda */ SVN_CTYPE_UTF8LEAD,
> +    /* xdb */ SVN_CTYPE_UTF8LEAD,
> +    /* xdc */ SVN_CTYPE_UTF8LEAD,
> +    /* xdd */ SVN_CTYPE_UTF8LEAD,
> +    /* xde */ SVN_CTYPE_UTF8LEAD,
> +    /* xdf */ SVN_CTYPE_UTF8LEAD,
> +    /* xe0 */ 0,
> +    /* xe1 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe2 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe3 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe4 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe5 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe6 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe7 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe8 */ SVN_CTYPE_UTF8LEAD,
> +    /* xe9 */ SVN_CTYPE_UTF8LEAD,
> +    /* xea */ SVN_CTYPE_UTF8LEAD,
> +    /* xeb */ SVN_CTYPE_UTF8LEAD,
> +    /* xec */ SVN_CTYPE_UTF8LEAD,
> +    /* xed */ SVN_CTYPE_UTF8LEAD,
> +    /* xee */ SVN_CTYPE_UTF8LEAD,
> +    /* xef */ SVN_CTYPE_UTF8LEAD,
> +    /* xf0 */ 0,
> +    /* xf1 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf2 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf3 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf4 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf5 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf6 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf7 */ SVN_CTYPE_UTF8LEAD,
> +    /* xf8 */ 0,
> +    /* xf9 */ SVN_CTYPE_UTF8LEAD,
> +    /* xfa */ SVN_CTYPE_UTF8LEAD,
> +    /* xfb */ SVN_CTYPE_UTF8LEAD,
> +    /* xfc */ 0,
> +    /* xfd */ SVN_CTYPE_UTF8LEAD,
> +    /* xfe */ 0,
> +    /* xff */ 0
> +  };
> +
> +const apr_uint32_t *const svn_ctype_table = ctype_table;

By the way, I noticed some nice UTF8 tests in the Putty code at

   $ svn cat svn://ixion.tartarus.org/main/putty/charset/utf8.c

Looks like the license is compatible with ours, if you want to swipe
the tests :-).
 
> --- (empty file)
> +++ trunk/subversion/libsvn_subr/genctype.py	Mon Nov 29 19:56:04 2004
> @@ -0,0 +1,93 @@
> +#! /usr/bin/python
> +"""getctype.py - Generate the svn_ctype character classification table.
> +"""

If the table was autogenerated from this file, then a comment in the C
file ought to say so, so people realize they're dealing with generated
content.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn commit: r12091 - in trunk/subversion: include libsvn_client libsvn_subr

Posted by Branko Čibej <br...@xbc.nu>.
kfogel@collab.net wrote:

>>+
>>+
>>+#ifndef SVN_CTYPE_H
>>+#define SVN_CTYPE_H
>>    
>>
>
>At the end of this file, you have
>
>   #ifdef __cplusplus
>   }
>   #endif /* __cplusplus */
>
>But here at the beginning, you don't have the corresponding opening
>form.
>  
>
Ouch! "Never copy-and-paste at 3 in the morning". Thanks.

>>+#include <apr.h>
>>+
>>+/** Table of flags for character classification. */
>>+extern const apr_uint32_t *const svn_ctype_table;
>>+
>>+
>>+/** Check if @a c is in the character class described by @a flags.
>>+ * The @a flags is a bitwise-or combination of @t SVN_CTYPE_* *
>>+ * constants.
>>+ */
>>+#define svn_ctype_test(c, flags) \
>>+  (0 != (svn_ctype_table[(unsigned char)(c)] & (flags)))
>>    
>>
>
>Just curious, why the "0 !="?  Did you want to fold the boolean value
>to either 1 or 0, instead of non-zero or 0?
>  
>
Basically, it's for clarity, yes.

>>+/** Check if @c is an ASCII control character. */
>>+#define svn_ctype_iscntrl(c)  svn_ctype_test((c), SVN_CTYPE_CNTRL)
>>    
>>
>
>Here and throughout this file, you've accidentally left out the "c"
>after "@c", thus formatting the word "is" instead.
>
Grrrr.... I always do that...

>  I won't bother to
>list out all the other instances.
>
>Also, you might want to state explicitly in this header that "ASCII"
>means 7-bit ASCII, and foreshadow the UTF8 stuff coming later.  I know
>this might technically be redundant, but in colloquial use, many
>people imagine a beast "ASCII" with two parts, upper and lower (i.e.,
>8-bit and 7-bit).  They might wonder why we seem to ignore upper.
>  
>
Well, there is no such thing as 8-bit ASCII... but I'll mention the 
distinction in the @defgroup.

>>+    /* ht  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
>>+    /* nl  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
>>+    /* vt  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
>>+    /* np  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
>>+    /* cr  */ SVN_CTYPE_ASCII | SVN_CTYPE_CNTRL | SVN_CTYPE_SPACE,
>>    
>>
>
>We're going to remember, when it comes time to take care of that
>"#TBD" for XMLNAME and URISAFE, that VT and NP are special, right?
>  
>
Oh, certainly!

>Not at all a complaint about this commit, of course.  I'm sure the
>question of what officially is and isn't whitespace in ASCII was
>settled at the Council of Elrond long ago, and we can only humbly
>reflect those decisions here.
>
No, actually we get to ignore them and define our own classification 
table instead. :-) If people think we shouldn't classify formfeed and 
vertical tab as whitespace, we can do that, too.

The fact that we use ^L in the source files is irrelevant. :-)

>  I'm just making a mental note to myself
>(well, and to this list :-) ) that for XMLNAME purposes, only three of
>these SVN_CTYPE_SPACEs are useable (plus SP below, of course).
>  
>
Yes, that's why I expect to use a separate bit for XMLNAME.

>By the way, I noticed some nice UTF8 tests in the Putty code at
>
>   $ svn cat svn://ixion.tartarus.org/main/putty/charset/utf8.c
>
>Looks like the license is compatible with ours, if you want to swipe
>the tests :-).
>  
>
Hm, looks like those tests are for what we have in utf_validate.c. And 
note that their validation code has a bug wherein they allow lead bytes 
(e.g., C0) that encode a leading zero -- which aren't allowed, and which 
we do check for in utf_validate.c. :-)

> 
>  
>
>>--- (empty file)
>>+++ trunk/subversion/libsvn_subr/genctype.py	Mon Nov 29 19:56:04 2004
>>@@ -0,0 +1,93 @@
>>+#! /usr/bin/python
>>+"""getctype.py - Generate the svn_ctype character classification table.
>>+"""
>>    
>>
>
>If the table was autogenerated from this file, then a comment in the C
>file ought to say so, so people realize they're dealing with generated
>content.
>  
>
Good point, and thanks for the review.

-- Brane



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org