Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ics.uci.edu> on 1998/03/16 21:24:42 UTC

Re: roy: uri clarification

>Hey Roy, I'm guessing based on the regexes in your draft document that we
>really shouldn't be requiring schemes, hosts, pathnames, etc. to fit into
>the "accepted" character set, right?  i.e. your regex uses classes like
>[^/?#] rather than [a-zA-Z0-9+-]... and I'm certain the reason is because
>8-bit character sets get royally shafted by such English-centric
>definitions.

Actually, the main reason was for efficiency (and "be robust in what you
accept"), but I also had raw UTF-8 in the back of my mind.  You never
know when that might happen.
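
To make the point about negated classes concrete, here is a minimal
sketch in Python. The pattern is the one published in Appendix B of
RFC 2396; the draft under discussion may have differed in detail, and
the name split_uri is hypothetical. Because each class only excludes
delimiters, raw non-ASCII characters pass through untouched.

```python
import re

# Pattern as published in RFC 2396, Appendix B (assumed to be close to
# the regex in the draft). Each class excludes delimiters only; nothing
# is restricted to [a-zA-Z0-9+-].
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent come back as None. Every group in the
    pattern is optional, so the match itself always succeeds.
    """
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

if __name__ == "__main__":
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    print(split_uri("http://example.org/caf\u00e9?q=1"))
```

A single pass with no per-character validation is where the efficiency
comes from; checking which characters are legal is left to whoever
needs it.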

>The hand-coded scanner I just wrote should behave like the regex in your
>draft as far as this goes.  In reality the only special characters in the
>url for the purposes of busting it apart are:  : / ? # @ \0

That's right -- the rest of the stuff is for generative grammars, such as
the special characters for creating a URL from the filesystem (as
mod_autoindex does), and to provide common definitions for the individual
scheme specifications (doc reuse).
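
The split between parsing and generating can be sketched as follows.
This is an illustration only, not the scanner discussed above: the
names scan_uri and url_from_filename are hypothetical, and the NUL
terminator matters only to a C scanner, so it does not appear here.
The parser looks at nothing but the delimiters : / ? # @, while the
generator is where the full character classes come into play.

```python
from urllib.parse import quote

def scan_uri(uri):
    """Split a URI using only the delimiters : / ? # @.

    Returns (scheme, userinfo, host, path, query, fragment), with None
    for absent parts. No other character is inspected or rejected.
    """
    scheme = userinfo = host = query = fragment = None
    rest = uri
    if '#' in rest:                      # fragment follows the first '#'
        rest, fragment = rest.split('#', 1)
    if '?' in rest:                      # query follows the first '?'
        rest, query = rest.split('?', 1)
    colon = rest.find(':')
    # A ':' ends the scheme only if no '/' comes before it.
    if colon > 0 and '/' not in rest[:colon]:
        scheme, rest = rest[:colon], rest[colon + 1:]
    if rest.startswith('//'):            # authority runs to the next '/'
        slash = rest.find('/', 2)
        if slash == -1:
            authority, rest = rest[2:], ''
        else:
            authority, rest = rest[2:slash], rest[slash:]
        if '@' in authority:
            userinfo, host = authority.rsplit('@', 1)
        else:
            host = authority
    return scheme, userinfo, host, rest, query, fragment

def url_from_filename(name):
    """Generative side: escape a filesystem name for use as a URL path.

    Here the reserved characters matter, because a literal '#' or '?'
    in a file name would otherwise be read as a delimiter.
    """
    return quote(name, safe='/')

if __name__ == "__main__":
    print(scan_uri("ftp://roy@host.example/a/b?x=1#top"))
    print(url_from_filename("my file#1.txt"))
```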

....Roy