Posted to dev@subversion.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2006/01/19 16:03:08 UTC

Re: httpd and locales

(Moving to dev@subversion.tigris.org, where it's more appropriate...)

On 1/19/06, André Malo <nd...@perlig.de> wrote:
> * Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> > > It doesn't belong here, but... I'm wondering why the path isn't passed as
> > > UTF-8. Why is it translated to the locale at all? It's all happening within
> > > the svn file system, so I'd really expect to get utf-8 and would consider
> > > locale translation as a bug.
> >
> > Well, I imagine that the assumption is that any hook script is going
> > to be using the actual locale specified in LANG/LC_ALL/etc env
> > variables, so if we don't translate to that locale it'll get rather
> > confused by utf8 data in its command line.  As a general rule svn
> > translates from native -> utf8 on input and from utf8 -> native for
> > output.  Ironically, if the LANG/LC_ALL/etc env vars were being
> > followed by httpd this translation would be a noop, since the system
> > uses a utf8 locale...
>
> So whether the users of a repository (httpd or svnserve) may use the full
> unicode range for their files depends on the locale of the server? That feels
> just wrong ;-) I don't see how the command line would get confused...

Well, yes and no.  For all the internals of the repository it doesn't
matter at all what the locale of the server is, but as soon as you
need to pass that data as part of the command line of an external
program like a hook script it does matter.
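
To make the utf8 -> native step concrete, here is a minimal sketch in
Python. It is not Subversion's actual code: the helper name `to_native`
and the sample path are made up for illustration, and the native encoding
is passed in explicitly rather than read from the locale.

```python
def to_native(path_utf8: bytes, native_encoding: str) -> bytes:
    """Recode a UTF-8 repository path into the given native encoding.

    Raises UnicodeEncodeError when the path contains characters that the
    native encoding cannot represent.
    """
    return path_utf8.decode("utf-8").encode(native_encoding)


# Hypothetical repository path containing one non-ASCII character.
path = "trunk/Andr\u00e9.txt".encode("utf-8")

# With a UTF-8 native encoding the translation is a no-op.
assert to_native(path, "utf-8") == path

# With the 7-bit encoding of the C locale the same path cannot be passed on.
try:
    to_native(path, "ascii")
    representable = True
except UnicodeEncodeError:
    representable = False
```

With "ascii" standing in for the C locale, the translation fails on the
first non-ASCII character, which is the situation a hook under httpd ends
up in.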

> As long as one references files enclosed in the filesystem no translation
> should occur at all. It's just unicode (in utf-8 format). The only part of
> the subversion system which should deal with filename recodings of repository
> stored path should be a client.

I'm really not sure I agree.  For an external program on a system
running in a particular locale, I'd be REALLY surprised to get data
passed in via the command line that shows up in some arbitrary
encoding; it should really show up in the native encoding IMO.  The
fact that httpd chooses to ignore the system's locale and thus has a
native encoding that only allows 7-bit ASCII is the real bug here.

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Nicolás Lichtmaier wrote:

> > To take a step back, it might not be unreasonable to have the hook
> > scripts communicate through UTF8 and document it. The hook scripts
> > would have to be careful to make the things they invoke consume/produce
> > UTF8. I'm not sure if we can just change this for compatibility reasons,
> > though.
>
> The users complaining are a minority.

Aha. How do you know that? I think one can get that impression because most
people just use US-ASCII for their filenames and never have a chance to
find out that something doesn't work. But I don't think it's true; see
below.

> The majority out there would be 
> very surprised and would curse the developers when they can no longer
> call svnlook without complex (for them) charset conversions.

I think it's *easy* to add a command line switch to svnlook and all the
other programs which tells them that all input is UTF-8.

In contrast, it's kinda silly that every program out there that happens to
use the bindings or the libraries directly first converts everything from
the locale to utf-8 and then (as happens with the path to the repository
itself) lets the library convert it back. That's just not logical.

By the way - are the paths passed via STDIN (locking hooks) also translated 
to the locale? A quick check shows: No, they're not. But I may be wrong 
here.

> If someone 
> has a UTF-8 system he should set the locale.

Subversion claims to be a UTF-8 system by itself. It _does not need_ the
locale as long as it moves data inside the system. There is still the
problem that hooks don't even know the locale so far (as of svn 1.3!)
anyway. So arguably a majority of the users actually have the C locale
inside their hook scripts (and because they're using ASCII filenames, they
don't run into any problem, as said above).

> If other components 
> (Apache) cause undesirable effects then those components are the ones in
> need for a fix, not subversion.

I think this view is too simple. Subversion is typically just one
component of a bigger system. It's not in a position to dictate the
environment (which it doesn't even handle consistently itself).

Please don't get me wrong: IMHO one of the great things about svn *is* the
unicode ability, and it would be sad if it got kinda lost, or, let's say,
restricted, because of another feature (namely hooks).

just my EUR .02
nd
-- 
"Das Verhalten von Gates hatte mir bewiesen, dass ich auf ihn und seine
beiden Gefährten nicht zu zählen brauchte" -- Karl May, "Winnetou III"

Im Westen was neues: <http://pub.perlig.de/books.html#apache2>



Re: httpd and locales

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
> To take a step back, it might not be unreasonable to have the hook scripts
> communicate through UTF8 and document it. The hook scripts would have to
> be careful to make the things they invoke consume/produce UTF8. I'm not sure
> if we can just change this for compatibility reasons, though.
>   

The users complaining are a minority. The majority out there would be
very surprised and would curse the developers when they can no longer
call svnlook without complex (for them) charset conversions. If someone
has a UTF-8 system he should set the locale. If other components
(Apache) cause undesirable effects then those components are the ones in
need of a fix, not Subversion.



Re: httpd and locales

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2006-01-21 23:28:37 +0100, Peter N. Lundblad wrote:
> Do you know a portable way to get "a UTF-8 locale"?  We don't know which
> locales are available on a particular system.

Shouldn't this be the goal of the APR library?

Now a problem is that some systems don't have any UTF-8 locale installed.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sat, 21 Jan 2006, Kevin Puetz wrote:

> Peter N. Lundblad wrote:
>
> > I understood that after my first reply. That seems like a real problem,
> > especially if this is unlikely to change.
> >
> > To take a step back, it might not be unreasonable to have the hook scripts
> > communicate through UTF8 and document it. The hook scripts would have to
> > be careful to make the things they invoke consume/produce UTF8. I'm not sure
> > if we can just change this for compatibility reasons, though.
>
> Since the server controls the environment for the hook scripts, one obvious
> way to do this "compatibly" would be to always invoke the hook scripts with
> a UTF-8 locale. Older scripts were, after all, supposed to follow the
> locale, whatever it might be...
>
Do you know a portable way to get "a UTF-8 locale"?  We don't know which
locales are available on a particular system.
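
For what it's worth, the best one can probably do is probe a list of
likely names and see which ones setlocale() accepts. A rough sketch in
Python, assuming a hypothetical helper `find_utf8_locale` and a guessed
candidate list (neither comes from Subversion or APR); which names succeed
depends entirely on the system and its C library:

```python
import locale


def find_utf8_locale(candidates=("C.UTF-8", "en_US.UTF-8", "en_US.utf8")):
    """Return the first candidate locale that setlocale() accepts, else None."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                return name
            except locale.Error:
                continue  # this name is not usable on this system
        return None
    finally:
        # Put the process back into the locale it started with.
        locale.setlocale(locale.LC_CTYPE, saved)
```

If nothing on the list is installed the function returns None, which is
exactly the case that has no portable answer.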

Regards,
//Peter


Re: httpd and locales

Posted by Kevin Puetz <pu...@puetzk.org>.
Peter N. Lundblad wrote:

> I understood that after my first reply. That seems like a real problem,
> especially if this is unlikely to change.
> 
> To take a step back, it might not be unreasonable to have the hook scripts
> communicate through UTF8 and document it. The hook scripts would have to
> be careful to make the things they invoke consume/produce UTF8. I'm not sure
> if we can just change this for compatibility reasons, though.

Since the server controls the environment for the hook scripts, one obvious
way to do this "compatibly" would be to always invoke the hook scripts with
a UTF-8 locale. Older scripts were, after all, supposed to follow the
locale, whatever it might be...



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, André Malo wrote:

> * Peter N. Lundblad wrote:
>
> > On Thu, 19 Jan 2006, Garrett Rooney wrote:
> > > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > > > By the way, at least in the past the locale information wasn't passed
> > > > to the hook scripts at all (I don't know if this was fixed already),
> > > > so the hook scripts could/can not determine the encoding anyway.
> > > > Passing UTF-8 encoded filenames is good and clear choice then.
> > >
> > > Now that is an interesting point.  I'm not sure if the locale env
> > > variables are passed on or not...  Will have to investigate that.
> >
> > It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> > LANG and LANGUAGE variables to the child process is the solution.
> >
> > For the original question, if you run a server, then consider using an
> > UTF8 locale. Problem solved.
>
> When it's so easy, why not just pass utf-8 unconditionally? What has the
> locale to do with passing stuff around *inside* a unicode system? It's
> really an unnecessary transition.
>
I think this depends on your perspective. I can understand your argument
about being "inside a UTF8 system". Still, I think programs usually
expect to communicate with their environment (stdin, stdout, stderr,
arguments, envars) using the locale encoding. If your hook script runs
some program that produces, say, output on stderr, that will use the
locale encoding. That's currently "C", but that might be considered a bug.

> Further - the original problem was (brought up on the httpd list), that the
> httpd doesn't set the locale. It doesn't need it (and actually doesn't want
> it, there are known issues inside the httpd with things like atof()). And
> actually I think, no server/daemon process should be dependent on the
> locale. I've actually seen weird problems with this (services requiring a
> locale of de_DE@euro [which is iso-8859-15], for example).
>
I understood that after my first reply. That seems like a real problem,
especially if this is unlikely to change.

To take a step back, it might not be unreasonable to have the hook scripts
communicate through UTF8 and document it. The hook scripts would have to
be careful to make the things they invoke consume/produce UTF8. I'm not sure
if we can just change this for compatibility reasons, though.
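
A hook written to such a documented contract might look roughly like
this. The sketch is in Python and rests on an assumption that is not
current behaviour: that the server passes path arguments as UTF-8 bytes
regardless of locale. The helper name `arg_as_utf8` is hypothetical.

```python
import os


def arg_as_utf8(arg: str) -> str:
    """Reinterpret a command-line argument as UTF-8, ignoring the locale.

    Python decodes sys.argv with the filesystem encoding; os.fsencode()
    recovers the original bytes, which are then decoded as UTF-8.
    """
    return os.fsencode(arg).decode("utf-8")
```

A hook would then call it on each element of sys.argv[1:] before using
the paths, and would have to encode them back to UTF-8 itself for anything
it invokes.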

Regards,
//Peter



Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Peter N. Lundblad wrote:

> On Thu, 19 Jan 2006, Garrett Rooney wrote:
> > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > > By the way, at least in the past the locale information wasn't passed
> > > to the hook scripts at all (I don't know if this was fixed already),
> > > so the hook scripts could/can not determine the encoding anyway.
> > > Passing UTF-8 encoded filenames is good and clear choice then.
> >
> > Now that is an interesting point.  I'm not sure if the locale env
> > variables are passed on or not...  Will have to investigate that.
>
> It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> LANG and LANGUAGE variables to the child process is the solution.
>
> For the original question, if you run a server, then consider using an
> UTF8 locale. Problem solved.

When it's so easy, why not just pass utf-8 unconditionally? What has the
locale to do with passing stuff around *inside* a unicode system? It's 
really an unnecessary transition.

Well, I don't know all the installations out there, but there might be 
installations which just can't switch their current locale or just want to 
use C for some reason.

Further - the original problem was (brought up on the httpd list), that the 
httpd doesn't set the locale. It doesn't need it (and actually doesn't want 
it, there are known issues inside the httpd with things like atof()). And 
actually I think, no server/daemon process should be dependent on the
locale. I've actually seen weird problems with this (services requiring a 
locale of de_DE@euro [which is iso-8859-15], for example).

nd
-- 
my @japh = (sub{q~Just~},sub{q~Another~},sub{q~Perl~},sub{q~Hacker~});
my $japh = q[sub japh { }]; print join       #########################
 [ $japh =~ /{(.)}/] -> [0] => map $_ -> ()  #            André Malo #
=> @japh;                                    # http://pub.perlig.de/ #



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, Garrett Rooney wrote:

> On 1/19/06, Peter N. Lundblad <pe...@famlundblad.se> wrote:
> > On Thu, 19 Jan 2006, Garrett Rooney wrote:
> >
> > > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> > >
> > > > By the way, at least in the past the locale information wasn't passed to the
> > > > hook scripts at all (I don't know if this was fixed already), so the hook
> > > > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > > > filenames is good and clear choice then.
> > >
> > > Now that is an interesting point.  I'm not sure if the locale env
> > > variables are passed on or not...  Will have to investigate that.
> > >
> > It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> > LANG and LANGUAGE variables to the child process is the solution.
> >
> > For the original question, if you run a server, then consider using an
> > UTF8 locale. Problem solved.
>
> That doesn't solve the problem because httpd defaults to the C locale,
> it never makes the necessary setlocale call to cause it to actually
> pay attention to the various env vars, so the system locale is totally
> ignored.  This is why I started the discussion on the httpd dev list,
> to see if there was a reason that was never done.
>
Oh, I didn't get that. Sorry. I understand the problem better now.

Then, what about r17101 which converts the hook's stderr to UTF8?
Does that cause any problems and do we really see any alternatives?

Regards,
//Peter



Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/19/06, Peter N. Lundblad <pe...@famlundblad.se> wrote:
> On Thu, 19 Jan 2006, Garrett Rooney wrote:
>
> > On 1/19/06, André Malo <nd...@perlig.de> wrote:
> >
> > > By the way, at least in the past the locale information wasn't passed to the
> > > hook scripts at all (I don't know if this was fixed already), so the hook
> > > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > > filenames is good and clear choice then.
> >
> > Now that is an interesting point.  I'm not sure if the locale env
> > variables are passed on or not...  Will have to investigate that.
> >
> It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
> LANG and LANGUAGE variables to the child process is the solution.
>
> For the original question, if you run a server, then consider using an
> UTF8 locale. Problem solved.

That doesn't solve the problem because httpd defaults to the C locale;
it never makes the necessary setlocale call to cause it to actually
pay attention to the various env vars, so the system locale is totally
ignored.  This is why I started the discussion on the httpd dev list,
to see if there was a reason that was never done.

-garrett



Re: httpd and locales

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 19 Jan 2006, Garrett Rooney wrote:

> On 1/19/06, André Malo <nd...@perlig.de> wrote:
>
> > By the way, at least in the past the locale information wasn't passed to the
> > hook scripts at all (I don't know if this was fixed already), so the hook
> > scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> > filenames is good and clear choice then.
>
> Now that is an interesting point.  I'm not sure if the locale env
> variables are passed on or not...  Will have to investigate that.
>
It's not fixed. Philip raised this some weeks ago. Just adding the LC_*,
LANG and LANGUAGE variables to the child process is the solution.
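
In outline, that amounts to filtering the parent's environment before
starting the hook. A small sketch in Python (the real change would be in
Subversion's C code around the APR process calls; the function name
`locale_environment` is made up here):

```python
def locale_environment(parent_env):
    """Pick the locale-related variables out of a parent environment."""
    return {
        name: value
        for name, value in parent_env.items()
        # Keep LANG, LANGUAGE and every LC_* variable; drop everything else.
        if name in ("LANG", "LANGUAGE") or name.startswith("LC_")
    }
```

The resulting dict could be handed to something like
subprocess.run(..., env=...) when launching the hook. It only helps if the
parent's variables are meaningful in the first place.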

For the original question, if you run a server, then consider using an
UTF8 locale. Problem solved.

//Peter



Re: httpd and locales

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/19/06, André Malo <nd...@perlig.de> wrote:

> By the way, at least in the past the locale information wasn't passed to the
> hook scripts at all (I don't know if this was fixed already), so the hook
> scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
> filenames is good and clear choice then.

Now that is an interesting point.  I'm not sure if the locale env
variables are passed on or not...  Will have to investigate that.

-garrett



Re: httpd and locales

Posted by André Malo <nd...@perlig.de>.
* Garrett Rooney <ro...@electricjellyfish.net> wrote:

> > So whether the users of a repository (httpd or svnserve) may use the full
> > unicode range for their files depends on the locale of the server? That feels
> > just wrong ;-) I don't see how the command line would get confused...
> 
> Well, yes and no.  For all the internals of the repository it doesn't
> matter at all what the locale of the server is, but as soon as you
> need to pass that data as part of the command line of an external
> program like a hook script it does matter.

Just in terms of documentation. IMHO.

> > As long as one references files enclosed in the filesystem no translation
> > should occur at all. It's just unicode (in utf-8 format). The only part of
> > the subversion system which should deal with filename recodings of repository
> > stored path should be a client.
> 
> I'm really not sure I agree, for an external program on a system
> running in a particular locale I'd be REALLY surprised to get data
> passed in via the command line that shows up in some arbitrary
> encoding, it should really show up in the native encoding IMO.  The
> fact that httpd chooses to ignore the system's locale and thus has a
> native encoding that only allows 7 bit ascii is the real bug here.

I see the hook system as part of the repository internals. If you mix it with
locales (e.g. start svnserve in a latin-1 locale and have paths containing
japanese chars), you end up with an unusable repository (or let's call it
not fully functional) for no good reason. I mean, in the middle of a
unicode-aware system there's a transition to a non-aware system and back (if
you happen to look up the path in the repository from inside the hook script).

By the way, at least in the past the locale information wasn't passed to the
hook scripts at all (I don't know if this was fixed already), so the hook
scripts could/can not determine the encoding anyway. Passing UTF-8 encoded
filenames is a good and clear choice then.

nd


Re: httpd and locales

Posted by Philip Martin <ph...@codematters.co.uk>.
Garrett Rooney <ro...@electricjellyfish.net> writes:

> I'm really not sure I agree, for an external program on a system
> running in a particular locale I'd be REALLY surprised to get data
> passed in via the command line that shows up in some arbitrary
> encoding, it should really show up in the native encoding IMO.  The
> fact that httpd chooses to ignore the system's locale and thus has a
> native encoding that only allows 7 bit ascii is the real bug here.

This came up recently: hooks are started using APR_PROGRAM, which means
they have an empty environment and so cannot determine the native
encoding.

-- 
Philip Martin
