Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2002/07/19 18:23:13 UTC

Re: [PATCH] Determining the character encoding used for paths in APR

At 10:29 AM 7/19/2002, Karl Fogel wrote:
>Branko Čibej <br...@xbc.nu> writes:
> > Anyway, the intent here is to tell the user what the APR
> > _implementation_ knows about the path encoding. On most platforms, APR
> > doesn't do anything with the paths, so we know that information can be
> > pulled from the locale. On Windows, we know when we'll return UTF-8,
> > regardless of locale. The enum would only grow if we started doing
> > something radically different in a port to a new system.

On Windows, we will only return UTF-8 on WinNT; on the Win9x family we will
have to return the local code page.

> > We need opinions from lots of Unix people here. Folks, don't be shy! :-)
>
>:-)
>
>+1, and I think I prefer the simplicity of Brane's original
>implementation.
>
>In fact, I wonder if we couldn't do it *entirely* with macros, since
>the function call just compiles down to a single return statement
>anyway.  Why not avoid the function call overhead entirely?

On Win32, we need a function to return based on platform.
But I must be missing something ... where do we intend to pick
up the local code page values from on Win9x and other, more
obscure unices that don't simply follow the locale construct?

I'd really like Jeff Trawick's feedback on this, given his exposure
to the many ebcdic-based boxes.  If we implement this at all,
I'd like to avoid a half-assed approach and return something useful,
such as determine the local charset used for the filesystem across
all platforms.  Otherwise this is YAWH [yet another Win32 hack].

Bill





Re: [PATCH] Determining the character encoding used for paths in APR

Posted by br...@xbc.nu.
Quoting "William A. Rowe, Jr." <wr...@rowe-clan.net>:

> At 10:29 AM 7/19/2002, Karl Fogel wrote:
> >Branko Čibej <br...@xbc.nu> writes:
> > > Anyway, the intent here is to tell the user what the APR
> > > _implementation_ knows about the path encoding. On most platforms, APR
> > > doesn't do anything with the paths, so we know that information can be
> > > pulled from the locale. On Windows, we know when we'll return UTF-8,
> > > regardless of locale. The enum would only grow if we started doing
> > > something radically different in a port to a new system.
> 
> On Windows, we will only return UTF-8 on WinNT; on the Win9x family we will
> have to return the local code page.

Exactly. That's just what the patch does.

> > > We need opinions from lots of Unix people here. Folks, don't be shy! :-)
> >
> >:-)
> >
> >+1, and I think I prefer the simplicity of Brane's original
> >implementation.
> >
> >In fact, I wonder if we couldn't do it *entirely* with macros, since
> >the function call just compiles down to a single return statement
> >anyway.  Why not avoid the function call overhead entirely?
> 
> On Win32, we need a function to return based on platform.
> But I must be missing something ... 

Yes :-)

> where do we intend to pick
> up the local code page values from on Win9x and other, more
> obscure unices that don't simply follow the locale construct?

Let me say again: The intent of this function is _not_ to return the actual
encoding. Its intent is to tell you how to _find_ the encoding, so that you can
use apr_xlate correctly on the paths.

The use case is:
    switch (apr_filepath_encoding())
    {
    case APR_FILEPATH_ENCODING_LOCALE:
        cvt = apr_xlate_open(APR_LOCALE_CHARSET, "foo");
        break;
    case APR_FILEPATH_ENCODING_UTF8:
        cvt = apr_xlate_open("UTF-8", "foo");
        break;
    default:
        /* What to do? Reeling, writhing and fainting in coils
           might be appropriate. */
        break;
    }
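
The same dispatch can be sketched without APR at all, which makes the
mapping from "how to find the encoding" to an actual charset name easier
to see. Everything named here (path_encoding_t, the PATH_ENCODING_*
values, path_source_charset) is a hypothetical stand-in for illustration,
not the patch's real API, and the locale charset is assumed to be supplied
by the caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the enum proposed in the patch. */
typedef enum {
    PATH_ENCODING_UNKNOWN,
    PATH_ENCODING_LOCALE,
    PATH_ENCODING_UTF8
} path_encoding_t;

/* Pick the charset name that paths should be translated from.
 * locale_charset is whatever the caller determined for the current
 * locale; NULL is returned when the encoding is not known. */
const char *path_source_charset(path_encoding_t enc,
                                const char *locale_charset)
{
    switch (enc) {
    case PATH_ENCODING_LOCALE:
        return locale_charset;
    case PATH_ENCODING_UTF8:
        return "UTF-8";
    default:
        return NULL;
    }
}
```

A caller would hand the returned name to its converter and treat NULL as
"do not translate".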





> I'd really like Jeff Trawick's feedback on this, given his exposure
> to the many ebcdic-based boxes.  If we implement this at all,
> I'd like to avoid a half-assed approach and return something useful,
> such as determine the local charset used for the filesystem across
> all platforms.  Otherwise this is YAWH [yet another Win32 hack].

No, it's not.


Um, O.K., I concede there's another way to do this.... We'll need
apr_os_locale_charset() and apr_os_default_charset() in APR anyway, because
apr_xlate needs them. In that case, this function can really return the actual
charset identifier.
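
For the Unix case, returning the actual identifier could look roughly
like the sketch below. It assumes a POSIX system where nl_langinfo(CODESET)
is available and where paths follow the locale; locale_path_charset is a
hypothetical name, not an APR function, and this says nothing about Win9x
or platforms that ignore the locale.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Return the charset name of the user's locale, or NULL if it cannot
 * be determined. Note that setlocale() changes process-wide state, so
 * a library would normally leave this call to the application. */
const char *locale_path_charset(void)
{
    const char *codeset;

    /* Adopt the LC_CTYPE setting from the environment. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;

    codeset = nl_langinfo(CODESET);
    return (codeset != NULL && codeset[0] != '\0') ? codeset : NULL;
}
```

The returned string is owned by the C library and may be overwritten by
later locale calls, so a caller that keeps it should copy it.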