Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2001/02/24 10:44:33 UTC

canonical stuff (was: Re: apache 2.0.11 - tag 2.0.12?)

On Fri, Feb 23, 2001 at 02:21:22PM -0600, William A. Rowe, Jr. wrote:
>...
> I have some very major structural hacking to do to wipe out the old canonical
> methods - and a quiet house to do so for the next two days.  I don't want to
> start warping the source as must be done till we have this 'good' tag so other
> folks can start looking for any remaining leaks and holes.

Can we do the canonical stuff in pieces rather than wholesale? IOW, add the
new functions into CVS and review. After that is stable, then start the
conversion process. (Specifically, there were a lot of concerns all around
about how this stuff would be built/operate, so it seems prudent to do that
outline via actual code, agree on it, then to use it)

In a similar vein, when you added all that Unicode stuff, it just kind of
dropped into the code. No big deal as it was all Win32 specific (i.e. it
didn't affect my playground), but it was an awfully big change. Especially
in the semantics. We still haven't refactored the API into two sets of
functions (one for Unicode chars, one for 8-bit native).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: unicode file APIs (was: Re: canonical stuff)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
[Moved strictly to dev@apr.apache.org - since this seems to _not_ be a discussion
of apache, but primarily of an API for other APR users.]

From: "Greg Stein" <gs...@lyra.org>
Sent: Saturday, February 24, 2001 5:27 PM


> On Sat, Feb 24, 2001 at 11:31:49AM -0600, William A. Rowe, Jr. wrote:
> > From: "Greg Stein" <gs...@lyra.org>
> > Sent: Saturday, February 24, 2001 3:44 AM
> >...
> > > In a similar vein, when you added all that Unicode stuff, it just kind of
> > > dropped into the code. No big deal as it was all Win32 specific (i.e. it
> > > didn't affect my playground), but it was an awfully big change. Especially
> > > in the semantics. We still haven't refactored the API into two sets of
> > > functions (one for Unicode chars, one for 8-bit native).
> > 
> > I'm absolutely positively near certain we won't.  Please let me explain.
> >
> > ... lot of stuff about why Unicode filenames are Goodness ...
> 
> I don't disagree with wanting Unicode filenames. I completely disagree with
> APIs that change their semantics based on the platform they are compiled on.

And I'm _arguing_ that the semantics _do_ change, regardless of APR_HAS_UNICODE_FS.

Simply put - Win32 has a restricted set of characters.  Not only is it a restricted
set of characters, but alpha chars map from upper to lower case in very unpredictable
ways.  By unpredictable, I mean that the clib tolower()/toupper() _never_ matches the
mappings that the Win32 filesystem performs.  That's a very nasty side effect that
isn't really very tolerable.  Of course, we also eliminate a number of symbols on
Win32 that simply aren't supported, but are perfectly legal on Unix.

OTOH, spaces are not a problem, as they seem to be for Unix.
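To make the restriction concrete, here is a rough sketch (illustration only; the
character list and function name are mine, not APR's) of a check for path-component
characters that Win32 refuses but Unix treats as perfectly legal:

```c
#include <string.h>

/* Sketch (not APR code): symbols the Win32 filesystem rejects in a
   path component, though Unix happily accepts them. */
static const char win32_reserved[] = "<>:\"|?*";

/* Return 1 if the name contains a character Win32 refuses, 0 otherwise.
   Reserved symbols and control characters are both rejected. */
int has_win32_reserved_char(const char *name)
{
    for (; *name; ++name) {
        if (strchr(win32_reserved, *name) || (unsigned char)*name < 0x20)
            return 1;
    }
    return 0;
}
```

Note that the case-mapping hazard described above is a separate problem: a name can
pass this check and still collide with another once the filesystem applies its own
case folding.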

> If I have an application that I desire to be portable, then I'm going to use
> APR to do it. In my app, I call apr_file_open(some_8bit_name). That should
> work on all platforms. With the current single API, it will break on NT when
> compiled with the Unicode stuff.

What is portable here?  There is nothing portable about high-bit characters.
Other than opaque data, you can't make many assumptions about them without an
API that we haven't defined for APR.  Not that we shouldn't.  Not that it shouldn't
map the characters appropriately for _whatever_ code page the user desires.  But
it simply doesn't parse.  Local code pages are not effective for file naming, for
most applications, unless more information is known about the system.  We don't have
a way to provide that information.

> None of the APIs change their semantics. They exist or they don't, but they
> don't change.
> 
> The answer is to have apr_file_open_u() for opening with Unicode filenames,
> not changing the encoding of the existing apr_file_open. You completely
> break all possibility of writing portable apps when you do that. And APR is
> *about* writing portable apps.

What does apr_file_open_u() do on Unix?  I would expect, nothing.  Unless you have
a utf-8 build of unix (and some exist) this is pretty meaningless.  But what _if_
the user is building apr under a utf-8 powered unix?  Is the filename Bite\x81Me.txt
accepted?  I can't answer the question.  What happens if it is accepted and created?
What does ls Bite* do?  That 0x81 byte alone is a continuation character with no lead
byte.  Does ls show anything worthwhile?
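Whether a byte string is even well-formed UTF-8 can at least be checked mechanically;
a minimal sketch (my function, simplified - it does not reject overlong forms or
surrogates) that refuses a bare continuation byte such as 0x81:

```c
/* Sketch: return 1 if s is structurally valid UTF-8, 0 otherwise.
   Simplified - overlong encodings and surrogates are not rejected. */
int is_wellformed_utf8(const unsigned char *s)
{
    while (*s) {
        int extra;
        if (*s < 0x80)                extra = 0;  /* plain ASCII  */
        else if ((*s & 0xE0) == 0xC0) extra = 1;  /* 2-byte lead  */
        else if ((*s & 0xF0) == 0xE0) extra = 2;  /* 3-byte lead  */
        else if ((*s & 0xF8) == 0xF0) extra = 3;  /* 4-byte lead  */
        else return 0;  /* 0x80-0xBF here is a lead-less continuation */
        ++s;
        while (extra--) {
            if ((*s & 0xC0) != 0x80)
                return 0;            /* truncated or broken sequence */
            ++s;
        }
    }
    return 1;
}
```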

I'm saying stop even looking at Win32 for 10 minutes, and examine the bigger issues
that allow this to become a cross-platform API.  Then we can begin the process of
determining an _appropriate_ api to cover these issues.

There is nothing that says _any_ filesystem accepts high bit characters, except that
some do.  How can we relate this to the user and the coder?  I don't have an answer,
I simply believe that apr_functions_u() is anything but the common denominator.

Bill


Re: unicode file APIs (was: Re: canonical stuff)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "dean gaudet" <dg...@arctic.org>
Sent: Sunday, February 25, 2001 7:42 PM


> i'm a bit of an I18N novice, but doesn't it all just magically work if you
> use UTF-8 encoding everywhere?
>
> UTF-8 deliberately avoids using \0 and / in the encodings.  plain ascii
> works unmodified.  unix filesystems generally support UTF-8 directly
> (because of the \0 and / avoidance).
>
> this allows you to have a single API which understands unicode on all
> platforms -- you don't need to have _u versions which take unicode
> strings.

You have understood exactly what I proposed with APR_HAS_UNICODE_FS.
My only small change is a way to get config directives in with wchar
support.  Since Win32 has no utf-8 editor, I'm working out the patch
to recognize the lead word of a unicode stream and switch on
unicode-to-utf-8 conversion.  Even notepad on Win32 supports unicode files, so
this becomes a no-brainer for administrators.
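The lead-word recognition described here boils down to checking the first two bytes
of the file for a UTF-16 byte-order mark; a sketch (the enum and function name are
mine, not APR's):

```c
#include <stddef.h>

enum stream_enc { ENC_8BIT, ENC_UTF16_LE, ENC_UTF16_BE };

/* Sketch: peek at the first two bytes of a config stream.  FF FE and
   FE FF are the UTF-16 byte-order marks; anything else is treated as
   an ordinary 8-bit (e.g. utf-8) file. */
enum stream_enc detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return ENC_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return ENC_UTF16_BE;
    }
    return ENC_8BIT;
}
```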

> give this page a perusal:  http://www.cl.cam.ac.uk/~mgk25/unicode.html

I especially liked a comment from http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux

  * External file system drivers such as VFAT and WinNT have to convert file name
    character encodings. UTF-8 has to be added to the list of already available
    conversion options, and the mount command has to tell the kernel driver that
    user processes shall see UTF-8 file names. Since VFAT and WinNT use already
    Unicode anyway, UTF-8 has the advantage of guaranteeing a lossless conversion
    here.

My key concept is _lossless_.  All SomeWin32FunctionA() variants are lossy, and
their encoding doesn't correspond to MS's own clib [we can comment on their lack
of brain cells here ... but we won't.]  All SomeWin32FunctionW() variants are
not only lossless, but faster.  Obviously we replace their conversion cycles
from local code page to unicode with our own utf-8 to unicode functions, but that
shouldn't (if I succeeded) add any net CPU cycles.
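The utf-8 to unicode direction is mechanical; a sketch of decoding one sequence
(an illustration only - BMP-only, no error recovery - not the actual converter):

```c
/* Sketch (not APR's converter): decode one UTF-8 sequence of 1-3
   bytes - i.e. BMP only - into a Unicode code point.  Returns the
   number of bytes consumed, or 0 on a malformed sequence. */
int utf8_decode_one(const unsigned char *s, unsigned int *cp)
{
    if (s[0] < 0x80) {                      /* plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned int)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                              && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned int)(s[0] & 0x0F) << 12)
            | ((unsigned int)(s[1] & 0x3F) << 6)
            |  (s[2] & 0x3F);
        return 3;
    }
    return 0;                               /* malformed */
}
```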

Of course they don't correspond to the clib functions [e.g. - consider strlen()]
but we are damned if we do... damned if we don't.  mod_autoindex obviously needs
to see APR_IS_UNICODE_FS and adjust the width accordingly.  We will get there, but
we aren't there yet.

If we support the native narrow characters, we need an effective API to do so
[should we use the current ansi code page or the current oem code page?]  We didn't
have a respectable design, and this change made all those other issues moot.




Re: unicode file APIs (was: Re: canonical stuff)

Posted by Sander van Zoest <sa...@covalent.net>.
On Sun, 25 Feb 2001, dean gaudet wrote:

> > The answer is to have apr_file_open_u() for opening with Unicode filenames,
> > not changing the encoding of the existing apr_file_open. You completely
> > break all possibility of writing portable apps when you do that. And APR is
> > *about* writing portable apps.
> i'm a bit of an I18N novice, but doesn't it all just magically work if you
> use UTF-8 encoding everywhere?
> 
> UTF-8 deliberately avoids using \0 and / in the encodings.  plain ascii
> works unmodified.  unix filesystems generally support UTF-8 directly
> (because of the \0 and / avoidance).
> 
> this allows you to have a single API which understands unicode on all
> platforms -- you don't need to have _u versions which take unicode
> strings.
> 
> give this page a perusal:  http://www.cl.cam.ac.uk/~mgk25/unicode.html

i18n can be kind of a pain when you need to convert data that you do not
know the charset for, or data that you do not control.

Going to a fully ISO-10646 (UTF-8) system would kill all these issues,
but the problem is making that migration and converting everything.  There
isn't much code out there that does all the mappings.

I do think, as wrowe points out, this probably should be handled inside
APR, so this way apache can handle as much as possible in ISO-10646, 
especially if everything it interacts with supports it.

Now the problem comes in when you deal with non-10646 data outside of
the ASCII and latin1 charsets on a 10646-based server. You
need to convert somehow, and if we convert to UTF-8 via iconv then I
do not see an issue.
  
--
Sander van Zoest                                         [sander@covalent.net]
Covalent Technologies, Inc.                           http://www.covalent.net/
(415) 536-5218                                 http://www.vanzoest.com/sander/


Re: unicode file APIs (was: Re: canonical stuff)

Posted by dean gaudet <dg...@arctic.org>.
i'm a bit of an I18N novice, but doesn't it all just magically work if you
use UTF-8 encoding everywhere?

UTF-8 deliberately avoids using \0 and / in the encodings.  plain ascii
works unmodified.  unix filesystems generally support UTF-8 directly
(because of the \0 and / avoidance).
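The \0-and-/ point falls straight out of the encoder: every byte of a multi-byte
UTF-8 sequence has the high bit set, so 0x00 and 0x2F ('/') can only ever appear as
themselves, in plain ASCII.  A sketch (function name mine, BMP only for brevity):

```c
/* Sketch: encode one Unicode code point (BMP only) as UTF-8; returns
   the number of bytes written.  Every byte of a multi-byte sequence
   is >= 0x80, so '\0' (0x00) and '/' (0x2F) never appear in one. */
int utf8_encode_one(unsigned int cp, unsigned char *out)
{
    if (cp < 0x80) {                          /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 3 bytes covers the rest of the BMP */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```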

this allows you to have a single API which understands unicode on all
platforms -- you don't need to have _u versions which take unicode
strings.

give this page a perusal:  http://www.cl.cam.ac.uk/~mgk25/unicode.html

-dean

On Sat, 24 Feb 2001, Greg Stein wrote:

> On Sat, Feb 24, 2001 at 11:31:49AM -0600, William A. Rowe, Jr. wrote:
> > From: "Greg Stein" <gs...@lyra.org>
> > Sent: Saturday, February 24, 2001 3:44 AM
> >...
> > > In a similar vein, when you added all that Unicode stuff, it just kind of
> > > dropped into the code. No big deal as it was all Win32 specific (i.e. it
> > > didn't affect my playground), but it was an awfully big change. Especially
> > > in the semantics. We still haven't refactored the API into two sets of
> > > functions (one for Unicode chars, one for 8-bit native).
> >
> > I'm absolutely positively near certain we won't.  Please let me explain.
> >
> > ... lot of stuff about why Unicode filenames are Goodness ...
>
> I don't disagree with wanting Unicode filenames. I completely disagree with
> APIs that change their semantics based on the platform they are compiled on.
>
> If I have an application that I desire to be portable, then I'm going to use
> APR to do it. In my app, I call apr_file_open(some_8bit_name). That should
> work on all platforms. With the current single API, it will break on NT when
> compiled with the Unicode stuff.
>
> None of the APIs change their semantics. They exist or they don't, but they
> don't change.
>
> The answer is to have apr_file_open_u() for opening with Unicode filenames,
> not changing the encoding of the existing apr_file_open. You completely
> break all possibility of writing portable apps when you do that. And APR is
> *about* writing portable apps.
>
> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/
>




unicode file APIs (was: Re: canonical stuff)

Posted by Greg Stein <gs...@lyra.org>.
On Sat, Feb 24, 2001 at 11:31:49AM -0600, William A. Rowe, Jr. wrote:
> From: "Greg Stein" <gs...@lyra.org>
> Sent: Saturday, February 24, 2001 3:44 AM
>...
> > In a similar vein, when you added all that Unicode stuff, it just kind of
> > dropped into the code. No big deal as it was all Win32 specific (i.e. it
> > didn't affect my playground), but it was an awfully big change. Especially
> > in the semantics. We still haven't refactored the API into two sets of
> > functions (one for Unicode chars, one for 8-bit native).
> 
> I'm absolutely positively near certain we won't.  Please let me explain.
>
> ... lot of stuff about why Unicode filenames are Goodness ...

I don't disagree with wanting Unicode filenames. I completely disagree with
APIs that change their semantics based on the platform they are compiled on.

If I have an application that I desire to be portable, then I'm going to use
APR to do it. In my app, I call apr_file_open(some_8bit_name). That should
work on all platforms. With the current single API, it will break on NT when
compiled with the Unicode stuff.

None of the APIs change their semantics. They exist or they don't, but they
don't change.

The answer is to have apr_file_open_u() for opening with Unicode filenames,
not changing the encoding of the existing apr_file_open. You completely
break all possibility of writing portable apps when you do that. And APR is
*about* writing portable apps.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: canonical stuff (was: Re: apache 2.0.11 - tag 2.0.12?)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "Greg Stein" <gs...@lyra.org>
Sent: Saturday, February 24, 2001 3:44 AM


> On Fri, Feb 23, 2001 at 02:21:22PM -0600, William A. Rowe, Jr. wrote:
> >...
> > I have some very major structural hacking to do to wipe out the old canonical
> > methods - and a quiet house to do so for the next two days.  I don't want to
> > start warping the source as must be done till we have this 'good' tag so other
> > folks can start looking for any remaining leaks and holes.
>
> Can we do the canonical stuff in pieces rather than wholesale? IOW, add the
> new functions into CVS and review. After that is stable, then start the
> conversion process. (specifically, there was a lot of concerns all around
> about how this stuff would be built/operate, so it seems prudent to do that
> outline via actual code, agree on it, then to use it)

Yes yes yes!  Very shortly... let's please get 2_0_12 ready (I see you did :-)

> In a similar vein, when you added all that Unicode stuff, it just kind of
> dropped into the code. No big deal as it was all Win32 specific (i.e. it
> didn't affect my playground), but it was an awfully big change. Especially
> in the semantics. We still haven't refactored the API into two sets of
> functions (one for Unicode chars, one for 8-bit native).

I'm absolutely positively near certain we won't.  Please let me explain.

The underlying 'real' filesystem on WinNT [not on 9x] is Unicode.  There is a
huge body of folks that don't restrict their playgrounds to ASCII - much of
their keyboard isn't.  I wrote a JavaScript playground about two years ago
to experiment with client-side reporting [allowing the client to handle the
cruft of re-sorting reports].  Problem?  http://mall.lnd.com/wrowe/pokémon/
wasn't working for me, where it was on my local file system.  Gave up, of
course, at that time.

What does this have to do with anything?  Win32 is completely bogus in terms
of context.  This patch was required; as soon as the per-user-vhost stuff is
added to the mpm, we begin to see the 128-255 character values start to shift
based on the --user's-- desired codepage and location.  This is entirely bogus
for a server, although it is pretty cool for a shared interactive workstation.

Apache - in terms of canonical, absolute values for filenames - doesn't care to
see shifting cruft like that.  So we end up in a very odd situation.  Either we
dismiss this all and set up a bunch of 'use this codepage' controls to assure
we aren't shifting around - or we simply use the full and unrestricted charset.

Users have asked for Unicode filenames.  Very few members of this list would
care to see the Apache engine provide wchar_t support for all our strings.  That
would be a monster.  Since everything we do is down the 8 bit wire, it makes
next to 0 sense to even attempt it.

How does Unix provide support?  Utf-8, typically.  Yes - you can set up your
local code page and even support multibyte encodings like jis - but why?  If
you are serving international web sites - Utf-8 is the way to go for naming
resources across many languages.

But most importantly, once we convert to unicode, we break the 255 character
limits on file pathnames.  We break through Windows' internal name conversion
that always occurs from the user's current codepage into unicode.  This needs the
benchmarking and comparison, optimization of my quick (?) utf8 converter, and
possibly other refinements, but it is the way to go.
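For what it's worth (my aside, not from the thread): the usual way the wide Win32
calls escape the path-length ceiling is the \\?\ prefix on an absolute path.  A
portable string sketch of forming such a name (the helper name is mine):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: the wide Win32 file calls accept very long paths when an
   absolute path carries the "\\?\" prefix.  This helper merely builds
   the prefixed string; the caller frees the result. */
char *prefix_long_path(const char *abspath)
{
    static const char prefix[] = "\\\\?\\";
    size_t n = strlen(prefix) + strlen(abspath) + 1;
    char *out = malloc(n);
    if (out != NULL) {
        strcpy(out, prefix);
        strcat(out, abspath);
    }
    return out;
}
```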

We aren't finished - getting Unicode environment variables into perl or other
unicode-enabled parsers still needs to be done.  But I don't see a clean way to provide
both high-bit latin characters and Unicode at the same time.

If we layer our encoding onto the filename functions, and eventually allow the
user to specify a file name in any encoding, that's cool.  That could be its own
API, or maybe simply an apr_filesystem_encoding_set/apr_filesystem_encoding_get
[I don't like this solution in a multi-threaded/multiple libraries linking against
a common apr library scenario.]

All we did was transform ambiguous naming into absolute naming as the underlying
API --- where we go from here is the apr community's choice, but IMHO Apache
doesn't need the second API.  All Apache needs now is the code to detect FFFE or
FEFF from the first two bytes of any config file, to decide it's a file saved as
unicode and convert to utf-8 on the fly.

Bill


