Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1998/03/13 09:32:09 UTC

roy: uri clarification

Hey Roy, I'm guessing based on the regexes in your draft document that we
really shouldn't be requiring schemes, hosts, pathnames, etc. to fit into
the "accepted" character set, right?  i.e. your regex uses classes like
[^/?#] rather than [a-zA-Z0-9+-]... and I'm certain the reason is that
8-bit character sets get royally shafted by such English-centric
definitions.

The hand-coded scanner I just wrote should behave like the regex in your
draft as far as this goes.  In reality the only special characters in the
URL for the purposes of busting it apart are:  : / ? # @ \0
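
For illustration only, here is a minimal Python sketch of that kind of
splitting. It assumes a pattern built from negated classes like [^/?#];
the pattern and the name split_uri are made up for this example and are
not the draft's exact regex or the hand-coded scanner. It works on bytes
so that 8-bit characters pass through untouched.

```python
import re

# Hypothetical pattern for illustration: each component is delimited only by
# the special characters, never restricted to an "accepted" ASCII set.
URI_RE = re.compile(
    rb"^(?:([^:/?#]+):)?"   # scheme: everything before the first ':'
    rb"(?://([^/?#]*))?"    # authority: after '//' up to '/', '?' or '#'
    rb"([^?#]*)"            # path: up to '?' or '#'
    rb"(?:\?([^#]*))?"      # query: after '?' up to '#'
    rb"(?:#(.*))?$",        # fragment: everything after '#'
    re.DOTALL,
)


def split_uri(uri: bytes):
    """Split a URI into (scheme, authority, path, query, fragment)."""
    return URI_RE.match(uri).groups()


# An 8-bit byte (0xE9) in the path is carried through as-is.
print(split_uri(b"http://www.example.com/caf\xe9?x=1#top"))
```

Components that are absent come back as None.  The sketch does not go on
to split the authority at '@', and Python bytes carry their own length,
so \0 plays no terminator role here as it would in a C scanner.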

This has some implications for the printability of error messages, but see
my rant of last week on that one ;)

Dean