Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2002/07/19 02:16:46 UTC

Re: [PATCH] Determining the character encoding used for paths in APR

Before we consider this patch, there is a deeper question.

Will we support MULTIPLE paths?  If so we should consider a given
filepath as an argument to the function.  I'm thinking unix or NFS mounts
here, not really win32 filesystems since NT is utf-8 and others are local
codepages.

Finally, I would rather return an apr_xlate compatible string, not some
enum that must be extended forever.

Bill

At 05:45 PM 7/18/2002, brane wrote:
>Bill, you may remember us talking about this issue on IRC a couple of 
>days ago. Subversion needs this information, so I cooked up a patch. Could 
>you please have a quick look at it, to see if it's O.K.? I know you're 
>very busy at the moment ....
>
>Thanks.
>
>Index: include/apr_file_info.h
>===================================================================
>RCS file: /home/cvs/apr/include/apr_file_info.h,v
>retrieving revision 1.32
>diff -u -p -r1.32 apr_file_info.h
>--- include/apr_file_info.h     20 Mar 2002 08:54:43 -0000      1.32
>+++ include/apr_file_info.h     18 Jul 2002 22:37:50 -0000
>@@ -314,6 +314,14 @@ APR_DECLARE(apr_status_t) apr_dir_rewind
>  * trailing slash if a directory
>  */
>#define APR_FILEPATH_TRUENAME       0x20
>+
>+
>+typedef enum
>+{
>+    APR_FILEPATH_ENCODING_UNKNOWN, /**< The path encoding is not known. */
>+    APR_FILEPATH_ENCODING_DEFAULT, /**< The locale determines path encoding. */
>+    APR_FILEPATH_ENCODING_UTF8     /**< The path encoding is UTF-8. */
>+} apr_filepath_encoding_e;
>/** @} */
>/**
>  * Extract the rootpath from the given filepath
>@@ -381,6 +389,14 @@ APR_DECLARE(apr_status_t) apr_filepath_g
>  * @deffunc apr_status_t apr_filepath_get(char **defpath, apr_pool_t *p)
>  */
>APR_DECLARE(apr_status_t) apr_filepath_set(const char *path, apr_pool_t *p);
>+
>+
>+/**
>+ * Return the character encoding for paths produced and consumed by APR
>+ * @ingroup apr_filepath
>+ * @deffunc apr_filepath_encoding_e apr_filepath_encoding(void)
>+ */
>+APR_DECLARE(apr_filepath_encoding_e) apr_filepath_encoding(void);
>/** @} */
>Index: file_io/unix/filepath.c
>===================================================================
>RCS file: /home/cvs/apr/file_io/unix/filepath.c,v
>retrieving revision 1.15
>diff -u -p -r1.15 filepath.c
>--- file_io/unix/filepath.c     12 Jun 2002 01:42:35 -0000      1.15
>+++ file_io/unix/filepath.c     18 Jul 2002 22:37:50 -0000
>@@ -329,3 +329,9 @@ APR_DECLARE(apr_status_t) apr_filepath_m
>     *newpath = path;
>     return APR_SUCCESS;
>}
>+
>+
>+APR_DECLARE(apr_filepath_encoding_e) apr_filepath_encoding(void)
>+{
>+    return APR_FILEPATH_ENCODING_DEFAULT;
>+}
>Index: file_io/win32/filepath.c
>===================================================================
>RCS file: /home/cvs/apr/file_io/win32/filepath.c,v
>retrieving revision 1.25
>diff -u -p -r1.25 filepath.c
>--- file_io/win32/filepath.c    10 Jul 2002 06:01:12 -0000      1.25
>+++ file_io/win32/filepath.c    18 Jul 2002 22:37:51 -0000
>@@ -966,3 +966,20 @@ APR_DECLARE(apr_status_t) apr_filepath_m
>     (*newpath)[pathlen] = '\0';
>     return APR_SUCCESS;
>}
>+
>+
>+APR_DECLARE(apr_filepath_encoding_e) apr_filepath_encoding(void)
>+{
>+#ifdef WIN32
>+#if APR_HAS_UNICODE_FS
>+    IF_WIN_OS_IS_UNICODE
>+        return APR_FILEPATH_ENCODING_UTF8;
>+#endif
>+#if APR_HAS_ANSI_FS
>+    ELSE_WIN_OS_IS_ANSI
>+        return APR_FILEPATH_ENCODING_DEFAULT;
>+#endif
>+#else  /* !WIN32 */
>+    return APR_FILEPATH_ENCODING_DEFAULT;
>+#endif /* !WIN32 */
>+}
>
>
>
>--
>Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PATCH] Determining the character encoding used for paths in APR

Posted by br...@xbc.nu.
Quoting "William A. Rowe, Jr." <wr...@rowe-clan.net>:

> At 10:29 AM 7/19/2002, Karl Fogel wrote:
> >Branko Čibej <br...@xbc.nu> writes:
> > > Anyway, the intent here is to tell the user what the APR
> > > _implementation_ knows about the path encoding. On most platforms, APR
> > > doesn't do anything with the paths, so we know that information can be
> > > pulled from the locale. On Windows, we know when we'll return UTF-8,
> > > regardless of locale. The enum would only grow if we started doing
> > > something radically different in a port to a new system.
> 
> On Windows, we will only return utf-8 on WinNT; on the Win9x family we
> will have to return the local code page.

Exactly. That's just what the patch does.

> > > We need opinions from lots of Unix people here. Folks, don't be shy! :-)
> >
> >:-)
> >
> >+1, and I think I prefer the simplicity of Brane's original
> >implementation.
> >
> >In fact, I wonder if we couldn't do it *entirely* with macros, since
> >the function call just compiles down to a single return statement
> >anyway.  Why not avoid the function call overhead entirely?
> 
> On Win32, we need a function to return based on platform.
> But I must be missing something ... 

Yes :-)

> where do we intend to pick
> up the local code page values from on Win9x and other, more
> obscure unices that don't simply follow the locale construct?

Let me say again: The intent of this function is _not_ to return the actual
encoding. Its intent is to tell you how to _find_ the encoding, so that you can
use apr_xlate correctly on the paths.

The use case is:
    switch (apr_filepath_encoding())
    {
    case APR_FILEPATH_ENCODING_LOCALE:
        cvt = apr_xlate_open(APR_LOCALE_CHARSET, "foo");
        break;
    case APR_FILEPATH_ENCODING_UTF8:
        cvt = apr_xlate_open("UTF-8", "foo");
        break;
    default:
        /* What to do? Reeling, writhing and fainting in coils
           might be appropriate. */;
    }
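
A minimal sketch of that use case, reduced to the part that picks a charset
name. It assumes nothing beyond standard C: the enum and the placeholder
string below are stand-ins declared locally so the fragment compiles without
APR, and path_charset_name() is a hypothetical helper, not an APR function.
Note that the posted patch spells the locale value
APR_FILEPATH_ENCODING_DEFAULT, while the use case above writes
APR_FILEPATH_ENCODING_LOCALE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the enum in the patch (hypothetical names, no APR needed). */
typedef enum {
    FILEPATH_ENCODING_UNKNOWN,  /* the path encoding is not known */
    FILEPATH_ENCODING_LOCALE,   /* the locale determines the path encoding */
    FILEPATH_ENCODING_UTF8      /* paths are UTF-8 regardless of locale */
} filepath_encoding_e;

/* Placeholder for whatever APR_LOCALE_CHARSET expands to (assumption). */
#define LOCALE_CHARSET_PLACEHOLDER "<locale charset>"

/* Map the enum to a value the caller could hand to a converter.
 * Returns NULL when the encoding is unknown, so the caller has to
 * decide explicitly what to do instead of falling through. */
static const char *path_charset_name(filepath_encoding_e enc)
{
    switch (enc) {
    case FILEPATH_ENCODING_LOCALE:
        return LOCALE_CHARSET_PLACEHOLDER;
    case FILEPATH_ENCODING_UTF8:
        return "UTF-8";
    default:
        return NULL;
    }
}
```

Each case returns on its own, so there is no fall-through from the locale
case into the UTF-8 case, and the unknown case is visible to the caller as
NULL.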





> I'd really like Jeff Trawick's feedback on this, given his exposure
> to the many ebcdic-based boxes.  If we implement this at all,
> I'd like to avoid a half-assed approach and return something useful,
> such as determining the local charset used for the filesystem across
> all platforms.  Otherwise this is YAWH [yet another Win32 hack].

No, it's not.


Um, O.K., I concede there's another way to do this.... We'll need
apr_os_locale_charset() and apr_os_default_charset() in APR anyway, because
apr_xlate needs them. In that case, this function can really return the actual
charset identifier.
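
For the Unix side, a rough sketch of what "return the actual charset
identifier" could look like, assuming a POSIX system with nl_langinfo().
The function name locale_path_charset() is made up for illustration; it is
not apr_os_locale_charset(), and it only reports what the current locale
claims, which says nothing about what a particular mount really contains.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: report the codeset of the current LC_CTYPE locale.
 * The caller is expected to have called setlocale() already; without that
 * the process runs in the "C" locale and gets that locale's codeset.
 * Returns NULL if the platform reports an empty codeset name. */
static const char *locale_path_charset(void)
{
    const char *codeset = nl_langinfo(CODESET);

    if (codeset == NULL || codeset[0] == '\0')
        return NULL;
    return codeset;
}
```

A caller would typically run setlocale(LC_ALL, "") once at startup and then
pass the returned name to the conversion routine. The returned pointer may
be invalidated by a later setlocale() call, so it should be copied if it
needs to live longer.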

Re: [PATCH] Determining the character encoding used for paths in APR

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 10:29 AM 7/19/2002, Karl Fogel wrote:
>Branko Čibej <br...@xbc.nu> writes:
> > Anyway, the intent here is to tell the user what the APR
> > _implementation_ knows about the path encoding. On most platforms, APR
> > doesn't do anything with the paths, so we know that information can be
> > pulled from the locale. On Windows, we know when we'll return UTF-8,
> > regardless of locale. The enum would only grow if we started doing
> > something radically different in a port to a new system.

On Windows, we will only return utf-8 on WinNT; on the Win9x family we will
have to return the local code page.

> > We need opinions from lots of Unix people here. Folks, don't be shy! :-)
>
>:-)
>
>+1, and I think I prefer the simplicity of Brane's original
>implementation.
>
>In fact, I wonder if we couldn't do it *entirely* with macros, since
>the function call just compiles down to a single return statement
>anyway.  Why not avoid the function call overhead entirely?

On Win32, we need a function to return based on platform.
But I must be missing something ... where do we intend to pick
up the local code page values from on Win9x and other, more
obscure unices that don't simply follow the locale construct?
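
To illustrate the macro-versus-function point, here is a sketch under
simplifying assumptions: the enum is a local stand-in, and the os_is_unicode
flag stands in for whatever runtime OS check the Win32 code performs. None
of these names come from APR.

```c
#include <assert.h>

/* Local stand-in for the enum in the patch (hypothetical names). */
typedef enum { ENC_UNKNOWN, ENC_LOCALE, ENC_UTF8 } enc_e;

/* Where the answer is fixed at build time, a macro is enough:
 * it expands to a constant and costs no function call. */
#define UNIX_FILEPATH_ENCODING() (ENC_LOCALE)

/* Where one binary runs on both NT and Win9x, the answer depends on a
 * check made at run time, so it cannot be a compile-time constant.
 * os_is_unicode is a placeholder for that check's result. */
static enc_e win32_filepath_encoding(int os_is_unicode)
{
    return os_is_unicode ? ENC_UTF8 : ENC_LOCALE;
}
```

A public API that has to cover both situations ends up as a function,
although a platform whose answer is constant could still implement it as a
one-line return.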

I'd really like Jeff Trawick's feedback on this, given his exposure
to the many ebcdic-based boxes.  If we implement this at all,
I'd like to avoid a half-assed approach and return something useful,
such as determining the local charset used for the filesystem across
all platforms.  Otherwise this is YAWH [yet another Win32 hack].

Bill





Re: [PATCH] Determining the character encoding used for paths in APR

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Branko Čibej <br...@xbc.nu> writes:
> Anyway, the intent here is to tell the user what the APR
> _implementation_ knows about the path encoding. On most platforms, APR
> doesn't do anything with the paths, so we know that information can be
> pulled from the locale. On Windows, we know when we'll return UTF-8,
> regardless of locale. The enum would only grow if we started doing
> something radically different in a port to a new system.
> 
> We need opinions from lots of Unix people here. Folks, don't be shy! :-)

:-)

+1, and I think I prefer the simplicity of Brane's original
implementation.

In fact, I wonder if we couldn't do it *entirely* with macros, since
the function call just compiles down to a single return statement
anyway.  Why not avoid the function call overhead entirely?

-K

Re: [PATCH] Determining the character encoding used for paths in APR

Posted by Branko Čibej <br...@xbc.nu>.
William A. Rowe, Jr. wrote:

> Before we consider this patch, there is a deeper question.
>
> Will we support MULTIPLE paths?  If so we should consider a given
> filepath as an argument to the function.  I'm thinking unix or NFS mounts
> here, not really win32 filesystems since NT is utf-8 and others are local
> codepages.


I don't think you _can_ determine the actual encoding for paths on NFS 
mounts. Anyone?
That said, adding a path argument wouldn't hurt -- even though you'd 
have to pass in a pool, too, for completeness. :-)

How about:

apr_status_t apr_filepath_encoding (apr_filepath_encoding_e *encoding,
                                    const char* path, apr_pool_t *pool);


It looks like overkill to me, though ... -0.
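
For what it's worth, a sketch of how that three-argument form might behave
on a platform that cannot tell one mount from another. All types and
constants here are local stand-ins so the fragment compiles without APR; the
real apr_status_t, apr_pool_t and error codes are not used, and the choice
to reject a NULL output pointer is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int status_t;             /* stand-in for apr_status_t */
#define STATUS_SUCCESS 0          /* stand-in for APR_SUCCESS */
#define STATUS_BADARG  1          /* hypothetical error code */
typedef struct pool_t pool_t;     /* opaque stand-in for apr_pool_t */

typedef enum { ENCODING_UNKNOWN, ENCODING_LOCALE, ENCODING_UTF8 } encoding_e;

/* Report the encoding through an out-parameter and return a status.
 * path and pool are accepted for interface completeness but ignored,
 * since this sketch gives the same answer for every path. */
static status_t filepath_encoding(encoding_e *encoding,
                                  const char *path, pool_t *pool)
{
    (void)path;
    (void)pool;

    if (encoding == NULL)
        return STATUS_BADARG;

    *encoding = ENCODING_LOCALE;
    return STATUS_SUCCESS;
}
```

The extra arguments only pay off if some implementation can actually give
different answers for different paths.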

> Finally, I would rather return an apr_xlate compatible string, not some
> enum that must be extended forever.


You'd have to include apr_xlate.h to get APR_LOCALE_CHARSET and 
APR_DEFAULT_CHARSET, but that file is in APR-util now. I don't want to 
move those constants back into APR.

Anyway, the intent here is to tell the user what the APR 
_implementation_ knows about the path encoding. On most platforms, APR 
doesn't do anything with the paths, so we know that information can be 
pulled from the locale. On Windows, we know when we'll return UTF-8, 
regardless of locale. The enum would only grow if we started doing 
something radically different in a port to a new system.


We need opinions from lots of Unix people here. Folks, don't be shy! :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/