Posted to dev@httpd.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2006/01/18 20:17:30 UTC

httpd and locales

Is there any particular reason that httpd never does the
'setlocale(LC_ALL, "");' magic necessary to get libc to respect the
various locale related environment variables?  As far as I can tell,
despite system settings for locale (i.e. /etc/sysconfig/i18n on RHEL)
httpd always runs with a locale of C, which is fine for most things,
but pretty irritating if you have a need to do stuff with multibyte
strings in a module.

Just adding a call to setlocale with a "" locale in httpd's main makes
my particular problem go away, but I'm kind of hesitant to propose
actually doing so since I don't know what kind of fallout there would
be from having httpd all of a sudden start respecting the environment
variables...

-garrett
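A minimal sketch of the startup behaviour Garrett describes. Per the C standard, every C program begins in the "C" locale, and libc only honours LANG/LC_* once the program itself calls `setlocale(LC_ALL, "")`. The helper names below are hypothetical, not httpd code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Query the current LC_CTYPE locale name without changing it:
 * a NULL second argument turns setlocale() into a pure query. */
static const char *current_ctype_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    assert(name != NULL);
    return name;
}

/* Nonzero while the process is still in the default "C" (a.k.a. "POSIX")
 * locale, which every C program starts in regardless of LANG/LC_*. */
static int in_default_locale(void)
{
    const char *name = current_ctype_locale();
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* The call Garrett describes adding to main(): adopt whatever LC_ALL,
 * LC_* and LANG name. Returns the resulting locale name, or NULL if that
 * locale is not installed, in which case the old locale stays in effect. */
static const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

Calling `adopt_environment_locale()` near the top of `main()` is essentially the one-line change proposed; a later `setlocale(LC_ALL, "C")` returns the process to the default.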

Re: httpd and locales

Posted by Joe Orton <jo...@redhat.com>.
On Thu, Jan 19, 2006 at 11:09:13AM -0800, Garrett Rooney wrote:
> On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > * Branko Čibej wrote:
> >
> > > You're confusing the content of the SVN repository and hook scripts
> > > stored on the local filesystem. Paths in the first are always encoded in
> > > UTF-8. The latter naturally have to obey the server's locale.
> >
> > I don't think so. The task was to pass the name of a file stored in the
> > repository to a hook script via the command line. Otherwise I must have
> > misunderstood something quite heavily.
> 
> That is correct, it's an argument to the hook script that happens to
> contain the path of a file in the repository.  Currently all arguments
> are transcoded from utf8 to native before we execute the hook script.

I really don't think that relying on that working properly is a good 
idea.  All it takes is for one rogue PHP script to set the locale to 
some odd locale to be able to print currency symbols properly or 
whatever, and the hook scripts would start behaving really strangely.

As a module author, presuming the locale is undefined is the safest bet, 
and as an administrator, starting the server in the C locale is the
safest bet.

joe
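One way for a module to follow Joe's advice (presume the process-wide locale is undefined) and still handle multibyte strings is the POSIX.1-2008 per-thread locale API, `newlocale()`/`uselocale()`, which neither depends on nor modifies the global locale that a PHP script or another module may have changed. A sketch under that assumption; `mb_char_count_in()` is a hypothetical helper, and which locale names exist is system-specific:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Count the characters in multibyte string `s` as interpreted by the
 * LC_CTYPE of `locale_name`. Only the calling thread's locale is switched
 * (via uselocale) and it is restored before returning, so the global
 * locale that other modules may have set is neither used nor changed.
 * Returns (size_t)-1 if the locale is unavailable or `s` is invalid in it. */
static size_t mb_char_count_in(const char *locale_name, const char *s)
{
    assert(locale_name != NULL && s != NULL);

    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return (size_t)-1;               /* e.g. locale not installed */

    locale_t previous = uselocale(loc);  /* per-thread switch */
    size_t n = mbstowcs(NULL, s, 0);     /* NULL destination: just count */
    uselocale(previous);
    freelocale(loc);
    return n;
}
```

Where an `en_US.UTF-8` locale happens to be installed, `mb_char_count_in("en_US.UTF-8", "caf\xC3\xA9")` should report 4 characters, without the server as a whole ever leaving the C locale.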

Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/19/06, André Malo <nd...@perlig.de> wrote:
> * Branko Čibej wrote:
>
> > You're confusing the content of the SVN repository and hook scripts
> > stored on the local filesystem. Paths in the first are always encoded in
> > UTF-8. The latter naturally have to obey the server's locale.
>
> I don't think so. The task was to pass the name of a file stored in the
> repository to a hook script via the command line. Otherwise I must have
> misunderstood something quite heavily.

That is correct, it's an argument to the hook script that happens to
contain the path of a file in the repository.  Currently all arguments
are transcoded from utf8 to native before we execute the hook script.

-garrett

Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Branko Čibej wrote:

> You're confusing the content of the SVN repository and hook scripts
> stored on the local filesystem. Paths in the first are always encoded in
> UTF-8. The latter naturally have to obey the server's locale.

I don't think so. The task was to pass the name of a file stored in the 
repository to a hook script via the command line. Otherwise I must have 
misunderstood something quite heavily.

nd
-- 
The only things that survive a building collapse (or even a
thermonuclear war) unscathed are cockroaches
and AOL CDs.
                                      -- Bastian Lipp in dcsm

Re: httpd and locales

Posted by Branko Čibej <br...@xbc.nu>.
André Malo wrote:
> * Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
>   
>>> It doesn't belong here, but... I'm wondering why the path isn't passed as
>>> UTF-8. Why is it translated to the locale at all? It's all happening within
>>> the svn file system, so I'd really expect to get utf-8 and would consider
>>> locale translation as a bug.
>>>       
>> Well, I imagine that the assumption is that any hook script is going
>> to be using the actual locale specified in LANG/LC_ALL/etc env
>> variables, so if we don't translate to that locale it'll get rather
>> confused by utf8 data in its command line.  As a general rule svn
>> translates from native -> utf8 on input and from utf8 -> native for
>> output.  Ironically, if the LANG/LC_ALL/etc env vars were being
>> followed by httpd this translation would be a noop, since the system
>> uses a utf8 locale...
>>     
>
> So whether the users of a repository (httpd or svnserve) may use the full
> unicode range for their files depends on the locale of the server? That feels
> just wrong ;-) I don't see how there's any command line confusion...
>   
You're confusing the content of the SVN repository and hook scripts 
stored on the local filesystem. Paths in the first are always encoded in 
UTF-8. The latter naturally have to obey the server's locale.

-- Brane


Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Nicolás Lichtmaier wrote:

> > To take a step back, it might not be unreasonable to have the hook
> > scripts communicate through UTF8 and document it. The hook scripts
> > would have to be careful to make the things they invoke consume/produce
> > UTF8. I'm not sure if we can just change this for compatibility reasons,
> > though.
>
> The users complaining are a minority.

Aha. How do you know that? I think one can get that impression because most
people just use US-ASCII for their filenames and never have a chance to
find out that something doesn't work. But I don't think it's true; see
below.

> The majority out there would be
> very surprised and they would curse the developers when they could no longer
> call svnlook without complex (for them) charset conversions.

I think it's *easy* to add a command line switch to svnlook and all the
other programs that tells them that all input is UTF-8.

In contrast, it's kinda silly that every program out there that happens to
use the bindings or the libraries directly first converts everything from the
locale to UTF-8 and then (as happens with the path to the repository itself)
lets the library convert it back. That's just not logical.

By the way, are the paths passed via STDIN (locking hooks) also translated
to the locale? A quick check says no, they're not. But I may be wrong
here.

> If someone
> has a UTF-8 system he should set the locale.

Subversion claims to be a UTF-8 system by itself. It _does not need_ the
locale as long as it moves data inside the system. There is still the
problem that hooks don't even know the locale so far (as of svn 1.3!)
anyway. So, in effect, a majority of the users have the C locale
inside their hook scripts (and because they're using ASCII filenames, they
don't run into any problems, as said above).

> If other components 
> (Apache) cause undesirable effects then those components are the ones in
> need for a fix, not subversion.

I think this reasoning is too simple. Subversion is typically just one
component of a bigger system. It's not in a position to dictate the
environment (which it doesn't even handle consistently itself).

Please don't get me wrong: IMHO one of the great things about svn *is* its
Unicode ability, and it would be sad if it got kinda lost, or, let's say,
restricted because of another feature (namely hooks).

just my EUR .02
nd
-- 
"Gates's behavior had proven to me that I need not count on him and his
two companions" -- Karl May, "Winnetou III"

Something new in the West: <http://pub.perlig.de/books.html#apache2>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: httpd and locales

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
> To take a step back, it might not be unreasonable to have the hook scripts
> communicate through UTF8 and document it. The hook scripts would have to
> be careful to make the things they invoke consume/produce UTF8. I'm not sure
> if we can just change this for compatibility reasons, though.
>   

The users complaining are a minority. The majority out there would be
very surprised and they would curse the developers when they could no longer
call svnlook without complex (for them) charset conversions. If someone
has a UTF-8 system he should set the locale. If other components
(Apache) cause undesirable effects then those components are the ones in 
need for a fix, not subversion.



Re: httpd and locales

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2006-01-21 23:28:37 +0100, Peter N. Lundblad wrote:
> Do you know a portable way to get "a UTF-8 locale"?  We don't know which
> locales are available on a particular system.

Shouldn't this be the goal of the APR library?

Now a problem is that some systems don't have any UTF-8 locale installed.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sat, 21 Jan 2006, Kevin Puetz wrote:

> Peter N. Lundblad wrote:
>
> > I understood that after my first reply. That seems like a real problem,
> > especially if this is unlikely to change.
> >
> > To take a step back, it might not be unreasonable to have the hook scripts
> > communicate through UTF8 and document it. The hook scripts would have to
> > be careful to make the things they invoke consume/produce UTF8. I'm not sure
> > if we can just change this for compatibility reasons, though.
>
> Since the server controls the environment for the hook scripts, one obvious
> way to do this "compatibly" would be to always invoke the hook scripts with
> a UTF-8 locale. Older scripts were, after all, supposed to follow the
> locale, whatever it might be...
>
Do you know a portable way to get "a UTF-8 locale"?  We don't know which
locales are available on a particular system.

Regards,
//Peter
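Peter's question has no standard answer: the C standard and POSIX (at least the versions contemporary with this thread) only guarantee the "C"/"POSIX" locale, so no name is sure to denote UTF-8, and, as Vincent notes, a system may have no UTF-8 locale installed at all. A best-effort sketch is to probe a list of likely names with `newlocale()` and check each one's codeset via `nl_langinfo_l(CODESET, loc)`. The candidate list is an assumption, not an exhaustive or authoritative set:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate list of commonly seen spellings. None of them is
 * guaranteed to exist on a given system. */
static const char *const utf8_candidates[] = {
    "C.UTF-8", "C.utf8", "en_US.UTF-8", "en_US.utf8", "UTF-8", NULL
};

/* Return the first candidate whose LC_CTYPE codeset is UTF-8, or NULL if
 * none is installed. newlocale() leaves the process-wide locale alone. */
static const char *find_utf8_locale(void)
{
    for (size_t i = 0; utf8_candidates[i] != NULL; i++) {
        locale_t loc = newlocale(LC_CTYPE_MASK, utf8_candidates[i], (locale_t)0);
        if (loc == (locale_t)0)
            continue;                      /* not available here */

        const char *cs = nl_langinfo_l(CODESET, loc);
        assert(cs != NULL);
        int is_utf8 = strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
        freelocale(loc);                   /* cs is not used past this point */
        if (is_utf8)
            return utf8_candidates[i];
    }
    return NULL;
}
```

A NULL result is the case Vincent raises; a server would then have to fall back to some other policy, such as passing UTF-8 unconditionally.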


Re: httpd and locales

Posted by Kevin Puetz <pu...@puetzk.org>.
Peter N. Lundblad wrote:

> I understood that after my first reply. That seems like a real problem,
> especially if this is unlikely to change.
> 
> To take a step back, it might not be unreasonable to have the hook scripts
> communicate through UTF8 and document it. The hook scripts would have to
> be careful to make the things they invoke consume/produce UTF8. I'm not sure
> if we can just change this for compatibility reasons, though.

Since the server controls the environment for the hook scripts, one obvious
way to do this "compatibly" would be to always invoke the hook scripts with
a UTF-8 locale. Older scripts were, after all, supposed to follow the
locale, whatever it might be...
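Kevin's idea amounts to giving each hook an environment whose `LC_ALL` names a UTF-8 locale; `LC_ALL` overrides both `LANG` and the individual `LC_*` variables, so one entry suffices. A sketch of building that entry, assuming a suitable locale name has already been chosen (which is exactly what Peter asks about); `lc_all_entry()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write "LC_ALL=<locale_name>" into buf. Returns buf, or NULL if the
 * entry plus its terminating NUL would not fit in `size` bytes. */
static char *lc_all_entry(char *buf, size_t size, const char *locale_name)
{
    static const char prefix[] = "LC_ALL=";
    const size_t plen = sizeof prefix - 1;
    assert(buf != NULL && locale_name != NULL);

    size_t nlen = strlen(locale_name);
    if (plen + nlen + 1 > size)
        return NULL;
    memcpy(buf, prefix, plen);
    memcpy(buf + plen, locale_name, nlen + 1);  /* includes the NUL */
    return buf;
}
```

`char *hook_env[] = { buf, NULL };` would then be a complete environment for the hook.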



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, André Malo wrote:

> * Peter N. Lundblad wrote:
>
> > On Thu, 19 Jan 2006, Garrett Rooney wrote:
> > > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > > > By the way, at least in the past the locale information wasn't passed
> > > > to the hook scripts at all (I don't know if this was fixed already),
> > > > so the hook scripts could/can not determine the encoding anyway.
> > > > Passing UTF-8 encoded filenames is a good and clear choice then.
> > >
> > > Now that is an interesting point.  I'm not sure if the locale env
> > > variables are passed on or not...  Will have to investigate that.
> >
> > It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> > LANG and LANGUAGE variables to the child process is the solution.
> >
> > For the original question, if you run a server, then consider using an
> > UTF8 locale. Problem solved.
>
> When it's so easy, why not just pass utf-8 unconditionally? What has the
> locale to do with passing stuff around *inside* a unicode system? It's
> really an unnecessary transition.
>
I think this depends on your perspective. I can understand your argument
about being "inside an UTF8 system". Still, I think programs usually
expect to communicate with their environment (stdin, stdout, stderr,
arguments, envars) using the locale encoding. If your hook script runs
some program that produces, say, output on stderr, that will use the
locale encoding. That's currently "C", but that might be considered a bug.

> Further, the original problem (brought up on the httpd list) was that
> httpd doesn't set the locale. It doesn't need it (and actually doesn't want
> it; there are known issues inside httpd with things like atof()). And
> actually I think no server/daemon process should be dependent on the
> locale. I've actually seen weird problems with this (services requiring a
> locale of de_DE@euro [which is iso-8859-15], for example).
>
I understood that after my first reply. That seems like a real problem,
especially if this is unlikely to change.

To take a step back, it might not be unreasonable to have the hook scripts
communicate through UTF8 and document it. The hook scripts would have to
be careful to make the things they invoke consume/produce UTF8. I'm not sure
if we can just change this for compatibility reasons, though.

Regards,
//Peter



Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Peter N. Lundblad wrote:

> On Thu, 19 Jan 2006, Garrett Rooney wrote:
> > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > > By the way, at least in the past the locale information wasn't passed
> > > to the hook scripts at all (I don't know if this was fixed already),
> > > so the hook scripts could/can not determine the encoding anyway.
> > > Passing UTF-8 encoded filenames is a good and clear choice then.
> >
> > Now that is an interesting point.  I'm not sure if the locale env
> > variables are passed on or not...  Will have to investigate that.
>
> It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> LANG and LANGUAGE variables to the child process is the solution.
>
> For the original question, if you run a server, then consider using an
> UTF8 locale. Problem solved.

When it's so easy, why not just pass utf-8 unconditionally? What has the
locale to do with passing stuff around *inside* a unicode system? It's 
really an unnecessary transition.

Well, I don't know all the installations out there, but there might be 
installations which just can't switch their current locale or just want to 
use C for some reason.

Further, the original problem (brought up on the httpd list) was that
httpd doesn't set the locale. It doesn't need it (and actually doesn't want
it; there are known issues inside httpd with things like atof()). And
actually I think no server/daemon process should be dependent on the
locale. I've actually seen weird problems with this (services requiring a
locale of de_DE@euro [which is iso-8859-15], for example).

nd
-- 
my @japh = (sub{q~Just~},sub{q~Another~},sub{q~Perl~},sub{q~Hacker~});
my $japh = q[sub japh { }]; print join       #########################
 [ $japh =~ /{(.)}/] -> [0] => map $_ -> ()  #            André Malo #
=> @japh;                                    # http://pub.perlig.de/ #
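André's atof() example is easy to reproduce: under a locale whose decimal separator is a comma (de_DE, say), `atof("1.5")` stops at the '.' and yields 1.0. A daemon that must parse numbers in a fixed format can pin `LC_NUMERIC` to "C" for just that call using the POSIX.1-2008 per-thread locale functions. A sketch, with `strtod_c()` as a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Parse a double with "C" locale rules ('.' as the decimal point) no matter
 * which LC_NUMERIC is in effect. Only the calling thread's locale is
 * switched, and it is restored before returning. */
static double strtod_c(const char *s, char **end)
{
    assert(s != NULL);
    locale_t c_numeric = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_numeric == (locale_t)0)
        return strtod(s, end);              /* out of memory: best effort */

    locale_t previous = uselocale(c_numeric);
    double value = strtod(s, end);
    uselocale(previous);
    freelocale(c_numeric);
    return value;
}
```

The global locale is untouched, so this stays correct even if some other module has called `setlocale()`.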



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, Garrett Rooney wrote:

> On 1/19/06, Peter N. Lundblad <pe...@famlundblad.se> wrote:
> > On Thu, 19 Jan 2006, Garrett Rooney wrote:
> >
> > > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > >
> > > > By the way, at least in the past the locale information wasn't passed to the
> > > > hook scripts at all (I don't know if this was fixed already), so the hook
> > > > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > > > filenames is a good and clear choice then.
> > >
> > > Now that is an interesting point.  I'm not sure if the locale env
> > > variables are passed on or not...  Will have to investigate that.
> > >
> > It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> > LANG and LANGUAGE variables to the child process is the solution.
> >
> > For the original question, if you run a server, then consider using an
> > UTF8 locale. Problem solved.
>
> That doesn't solve the problem because httpd defaults to the C locale,
> it never makes the necessary setlocale call to cause it to actually
> pay attention to the various env vars, so the system locale is totally
> ignored.  This is why I started the discussion on the httpd dev list,
> to see if there was a reason that was never done.
>
Oh, I didn't get that. Sorry. I understand the problem better now.

Then, what about r17101 which converts the hook's stderr to UTF8?
Does that cause any problems and do we really see any alternatives?

Regards,
//Peter



Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/19/06, Peter N. Lundblad <pe...@famlundblad.se> wrote:
> On Thu, 19 Jan 2006, Garrett Rooney wrote:
>
> > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> >
> > > By the way, at least in the past the locale information wasn't passed to the
> > > hook scripts at all (I don't know if this was fixed already), so the hook
> > > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > > filenames is a good and clear choice then.
> >
> > Now that is an interesting point.  I'm not sure if the locale env
> > variables are passed on or not...  Will have to investigate that.
> >
> It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> LANG and LANGUAGE variables to the child process is the solution.
>
> For the original question, if you run a server, then consider using an
> UTF8 locale. Problem solved.

That doesn't solve the problem because httpd defaults to the C locale,
it never makes the necessary setlocale call to cause it to actually
pay attention to the various env vars, so the system locale is totally
ignored.  This is why I started the discussion on the httpd dev list,
to see if there was a reason that was never done.

-garrett



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, Garrett Rooney wrote:

> On 1/19/06, André Malo <nd...@perlig.de> wrote:
>
> > By the way, at least in the past the locale information wasn't passed to the
> > hook scripts at all (I don't know if this was fixed already), so the hook
> > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > filenames is a good and clear choice then.
>
> Now that is an interesting point.  I'm not sure if the locale env
> variables are passed on or not...  Will have to investigate that.
>
It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
LANG and LANGUAGE variables to the child process is the solution.

For the original question, if you run a server, then consider using an
UTF8 locale. Problem solved.

//Peter
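Peter's suggested fix, passing the LC_*, LANG and LANGUAGE variables through to the hook, amounts to filtering the parent's environment. A sketch in plain C, assuming the caller passes its own `environ`; the result is the kind of NULL-terminated array one could hand to `execve()`, or presumably to the `env` argument of `apr_proc_create()` when hooks are started with APR_PROGRAM (Philip's point in this thread). `locale_only_env()` is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero for the variables Peter lists: LANG, LANGUAGE and any LC_*. */
static int is_locale_var(const char *entry)
{
    return strncmp(entry, "LC_", 3) == 0
        || strncmp(entry, "LANG=", 5) == 0
        || strncmp(entry, "LANGUAGE=", 9) == 0;
}

/* Build a NULL-terminated environment for a child process containing only
 * the locale-related "NAME=value" entries of envp. The strings are shared
 * with envp, not copied; free() only the returned array.
 * Returns NULL if allocation fails. */
static char **locale_only_env(char *const *envp)
{
    assert(envp != NULL);
    size_t count = 0, i, j = 0;
    for (i = 0; envp[i] != NULL; i++)
        if (is_locale_var(envp[i]))
            count++;

    char **child = malloc((count + 1) * sizeof *child);
    if (child == NULL)
        return NULL;
    for (i = 0; envp[i] != NULL; i++)
        if (is_locale_var(envp[i]))
            child[j++] = envp[i];
    child[j] = NULL;
    return child;
}
```

Entries keep their original order, and everything else (PATH, HOME, ...) is dropped, which matches the otherwise empty environment APR_PROGRAM gives hooks today.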



Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/19/06, André Malo <nd...@perlig.de> wrote:

> By the way, at least in the past the locale information wasn't passed to the
> hook scripts at all (I don't know if this was fixed already), so the hook
> scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> filenames is a good and clear choice then.

Now that is an interesting point.  I'm not sure if the locale env
variables are passed on or not...  Will have to investigate that.

-garrett



Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Garrett Rooney <ro...@electricjellyfish.net> wrote:

> > So whether the users of a repository (httpd or svnserve) may use the full
> > unicode range for their files depends on the locale of the server? That feels
> > just wrong ;-) I don't see how there's any command line confusion...
> 
> Well, yes and no.  For all the internals of the repository it doesn't
> matter at all what the locale of the server is, but as soon as you
> need to pass that data as part of the command line of an external
> program like a hook script it does matter.

Only in terms of documentation, IMHO.

> > As long as one references files enclosed in the filesystem no translation
> > should occur at all. It's just unicode (in utf-8 format). The only part of
> > the subversion system which should deal with recoding the filenames of
> > repository-stored paths should be a client.
> 
> I'm really not sure I agree, for an external program on a system
> running in a particular locale I'd be REALLY surprised to get data
> passed in via the command line that shows up in some arbitrary
> encoding, it should really show up in the native encoding IMO.  The
> fact that httpd chooses to ignore the system's locale and thus has a
> native encoding that only allows 7 bit ascii is the real bug here.

I see the hook system as part of the repository internals. If you mix it with
locales (e.g. start svnserve in a latin-1 locale and have paths containing
Japanese characters), you end up with an unusable repository (or let's call it
not fully functional) for no good reason. I mean, in the middle of a
Unicode-aware system there's a transition to a non-aware system and back (if
you happen to look up the path in the repository from inside the hook script).

By the way, at least in the past the locale information wasn't passed to the
hook scripts at all (I don't know if this was fixed already), so the hook
scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
filenames is a good and clear choice then.

nd


Re: httpd and locales

Posted by Philip Martin <ph...@codematters.co.uk>.
Garrett Rooney <ro...@electricjellyfish.net> writes:

> I'm really not sure I agree, for an external program on a system
> running in a particular locale I'd be REALLY surprised to get data
> passed in via the command line that shows up in some arbitrary
> encoding, it should really show up in the native encoding IMO.  The
> fact that httpd chooses to ignore the system's locale and thus has a
> native encoding that only allows 7 bit ascii is the real bug here.

This came up recently, hooks are started using APR_PROGRAM which means
they have an empty environment and so cannot determine the native
encoding.

-- 
Philip Martin


Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
(Moving to dev@subversion.tigris.org, where it's more appropriate...)

On 1/19/06, André Malo <nd...@perlig.de> wrote:
> * Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> > > It doesn't belong here, but... I'm wondering why the path isn't passed as
> > > UTF-8. Why is it translated to the locale at all? It's all happening within
> > > the svn file system, so I'd really expect to get utf-8 and would consider
> > > locale translation as a bug.
> >
> > Well, I imagine that the assumption is that any hook script is going
> > to be using the actual locale specified in LANG/LC_ALL/etc env
> > variables, so if we don't translate to that locale it'll get rather
> > confused by utf8 data in its command line.  As a general rule svn
> > translates from native -> utf8 on input and from utf8 -> native for
> > output.  Ironically, if the LANG/LC_ALL/etc env vars were being
> > followed by httpd this translation would be a noop, since the system
> > uses a utf8 locale...
>
> So whether the users of a repository (httpd or svnserve) may use the full
> unicode range for their files depends on the locale of the server? That feels
> just wrong ;-) I don't see how there's any command line confusion...

Well, yes and no.  For all the internals of the repository it doesn't
matter at all what the locale of the server is, but as soon as you
need to pass that data as part of the command line of an external
program like a hook script it does matter.

> As long as one references files enclosed in the filesystem no translation
> should occur at all. It's just unicode (in utf-8 format). The only part of
> the subversion system which should deal with recoding the filenames of
> repository-stored paths should be a client.

I'm really not sure I agree, for an external program on a system
running in a particular locale I'd be REALLY surprised to get data
passed in via the command line that shows up in some arbitrary
encoding, it should really show up in the native encoding IMO.  The
fact that httpd chooses to ignore the system's locale and thus has a
native encoding that only allows 7 bit ascii is the real bug here.

-garrett



Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Garrett Rooney <ro...@electricjellyfish.net> wrote:

> > It doesn't belong here, but... I'm wondering why the path isn't passed as
> > UTF-8. Why is it translated to the locale at all? It's all happening within
> > the svn file system, so I'd really expect to get utf-8 and would consider
> > locale translation as a bug.
> 
> Well, I imagine that the assumption is that any hook script is going
> to be using the actual locale specified in LANG/LC_ALL/etc env
> variables, so if we don't translate to that locale it'll get rather
> confused by utf8 data in its command line.  As a general rule svn
> translates from native -> utf8 on input and from utf8 -> native for
> output.  Ironically, if the LANG/LC_ALL/etc env vars were being
> followed by httpd this translation would be a noop, since the system
> uses a utf8 locale...

So whether the users of a repository (httpd or svnserve) may use the full
unicode range for their files depends on the locale of the server? That feels
just wrong ;-) I don't see how there's any command line confusion...

As long as one references files enclosed in the filesystem no translation
should occur at all. It's just unicode (in utf-8 format). The only part of
the subversion system which should deal with recoding the filenames of
repository-stored paths should be a client.

But as said, this doesn't belong here.

nd

Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/18/06, André Malo <nd...@perlig.de> wrote:
> * Garrett Rooney wrote:
>
> > The specific problem I'm trying to fix is that mod_dav_svn fails to
> > run a pre-lock hook script when you try to lock a filename with double
> > byte characters.  It never even gets to the point of trying to run the
> > script, it fails trying to build the command line because it can't
> > convert the filename from utf8 to the native encoding because the
> > locale is C and thus the native encoding is 7 bit ascii.  I'm having
> > trouble finding a work around for this that doesn't involve setting
> > the locale, although if there's anything obvious I'm missing I'd love
> > to hear it.
>
> It doesn't belong here, but... I'm wondering why the path isn't passed as
> UTF-8. Why is it translated to the locale at all? It's all happening within
> the svn file system, so I'd really expect to get utf-8 and would consider
> locale translation as a bug.

Well, I imagine that the assumption is that any hook script is going
to be using the actual locale specified in LANG/LC_ALL/etc env
variables, so if we don't translate to that locale it'll get rather
confused by utf8 data in its command line.  As a general rule svn
translates from native -> utf8 on input and from utf8 -> native for
output.  Ironically, if the LANG/LC_ALL/etc env vars were being
followed by httpd this translation would be a noop, since the system
uses a utf8 locale...

-garrett
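Garrett's "noop" remark can be made concrete: for valid input, the UTF-8 to native translation is an identity mapping exactly when the native codeset reported by `nl_langinfo(CODESET)` is UTF-8. A small hypothetical check (codeset spellings vary between systems; "UTF-8" and "utf8" are the common ones):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Nonzero when the current LC_CTYPE codeset is UTF-8, i.e. when a
 * UTF-8 -> native translation would leave the bytes unchanged. */
static int native_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    assert(cs != NULL);
    return strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
}
```

In httpd's default "C" locale this returns 0, which is why the conversion has real work to do and can fail.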

Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Garrett Rooney wrote:

> The specific problem I'm trying to fix is that mod_dav_svn fails to
> run a pre-lock hook script when you try to lock a filename with double
> byte characters.  It never even gets to the point of trying to run the
> script, it fails trying to build the command line because it can't
> convert the filename from utf8 to the native encoding because the
> locale is C and thus the native encoding is 7 bit ascii.  I'm having
> trouble finding a work around for this that doesn't involve setting
> the locale, although if there's anything obvious I'm missing I'd love
> to hear it.

It doesn't belong here, but... I'm wondering why the path isn't passed as 
UTF-8. Why is it translated to the locale at all? It's all happening within 
the svn file system, so I'd really expect to get utf-8 and would consider 
locale translation as a bug.

nd

Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/18/06, Joe Orton <jo...@redhat.com> wrote:
> On Wed, Jan 18, 2006 at 11:17:30AM -0800, Garrett Rooney wrote:
> > Is there any particular reason that httpd never does the
> > 'setlocale(LC_ALL, "");' magic necessary to get libc to respect the
> > various locale related environment variables?  As far as I can tell,
> > despite system settings for locale (i.e. /etc/sysconfig/i18n on RHEL)
> > httpd always runs with a locale of C, which is fine for most things,
> > but pretty irritating if you have a need to do stuff with multibyte
> > strings in a module.
> >
> > Just adding a call to setlocale with a "" locale in httpd's main makes
> > my particular problem go away, but I'm kind of hesitant to propose
> > actually doing so since I don't know what kind of fallout there would
> > be from having httpd all of a sudden start respecting the environment
> > variables...
>
> Ideally the locale shouldn't matter, but in practice it does: notably
> strcasecmp() and the is* functions behave differently.  This can cause
> things to fail in surprising ways, so it's generally to be avoided.
>
> Various modules will do it at startup anyway, so it's hard to avoid
> completely, but it's not something that I'd really advise propagating.

The specific problem I'm trying to fix is that mod_dav_svn fails to
run a pre-lock hook script when you try to lock a filename with double
byte characters.  It never even gets to the point of trying to run the
script, it fails trying to build the command line because it can't
convert the filename from utf8 to the native encoding because the
locale is C and thus the native encoding is 7 bit ascii.  I'm having
trouble finding a work around for this that doesn't involve setting
the locale, although if there's anything obvious I'm missing I'd love
to hear it.

-garrett
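The failure Garrett describes can be reproduced outside httpd. Below is a sketch of a strict UTF-8 to native conversion built on `iconv()` and `nl_langinfo(CODESET)`; it illustrates the mechanism under those assumptions and is not Subversion's actual code. In the C locale the native codeset is 7-bit ASCII, so any non-ASCII path fails before a hook could ever run. `utf8_to_native()` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Convert NUL-terminated UTF-8 `in` to the current locale's codeset,
 * writing a NUL-terminated result to `out`. Returns 0 on success, or -1
 * if no converter exists, `out` is too small, or some character has no
 * exact equivalent in the native codeset. */
static int utf8_to_native(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL && outsize > 0);

    /* In glibc's "C" locale this is "ANSI_X3.4-1968", i.e. 7-bit ASCII. */
    const char *codeset = nl_langinfo(CODESET);
    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* iconv() does not modify the input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep room for the terminating NUL */

    /* (size_t)-1 signals EILSEQ/E2BIG/EINVAL; a positive count means some
     * characters were substituted, which is also treated as failure. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc == 0)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (rc != 0)
        return -1;

    *outp = '\0';
    return 0;
}
```

Plain ASCII names convert unchanged, while a name such as "caf\xC3\xA9.txt" is rejected under the default locale, mirroring the pre-lock failure.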

Re: httpd and locales

Posted by Joe Orton <jo...@redhat.com>.
On Wed, Jan 18, 2006 at 11:17:30AM -0800, Garrett Rooney wrote:
> Is there any particular reason that httpd never does the
> 'setlocale(LC_ALL, "");' magic necessary to get libc to respect the
> various locale related environment variables?  As far as I can tell,
> despite system settings for locale (i.e. /etc/sysconfig/i18n on RHEL)
> httpd always runs with a locale of C, which is fine for most things,
> but pretty irritating if you have a need to do stuff with multibyte
> strings in a module.
> 
> Just adding a call to setlocale with a "" locale in httpd's main makes
> my particular problem go away, but I'm kind of hesitant to propose
> actually doing so since I don't know what kind of fallout there would
> be from having httpd all of a sudden start respecting the environment
> variables...

Ideally the locale shouldn't matter, but in practice it does: notably 
strcasecmp() and the is* functions behave differently.  This can cause 
things to fail in surprising ways, so it's generally to be avoided.

Various modules will do it at startup anyway, so it's hard to avoid 
completely, but it's not something that I'd really advise propagating.

joe