Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2002/07/11 18:55:52 UTC

i18n hell.

So I compiled subversion with --enable-utf8, and suddenly started
getting errors on *every* invocation of 'svn':

apr_error: #22, src_err 0 : <Invalid argument>
  (charset translator procurement failed)

So I traced into our call to apr/i18n/unix/xlate.c:apr_xlate_open().

This function had "UTF-8" passed in already, and the value
APR_LOCALE_CHARSET caused the code to run nl_langinfo(CODESET).  The
return value from nl_langinfo was "ISO8859-1".

Then we call iconv_open() on these two strings:  bam, I get an EINVAL
error.  What's invalid, you ask?

It turns out that my iconv only accepts "ISO-8859-1", not "ISO8859-1":

$ man iconv_open
...
European languages
              ASCII,     ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
...

$ iconv -f UTF-8 -t ISO8859-1 foo
iconv: conversion to ISO8859-1 unsupported
$ iconv -f UTF-8 -t ISO-8859-1 foo
iconv: foo: No such file or directory

This seems way screwed up to me.  The unhyphenated codepage name came
from nl_langinfo(), which is part of my FreeBSD 4.5 libc!  And this
isn't accepted by iconv??
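
A minimal sketch of one possible workaround for the mismatch described
above: if iconv_open() rejects the name reported by nl_langinfo(CODESET)
with EINVAL, retry once with the hyphenated "ISO-8859-N" spelling. The
helper names hyphenate_iso8859() and open_with_fallback() are
hypothetical and are not part of APR or Subversion; the sketch only
rewrites the target charset name and assumes the "ISO8859-" prefix is
the only spelling that needs fixing.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "ISO8859-N" as "ISO-8859-N".
 * Any other name is copied unchanged.  Returns 0 on success,
 * -1 if the output buffer is too small. */
static int hyphenate_iso8859(const char *name, char *out, size_t outlen)
{
    int n;

    assert(name != NULL && out != NULL);
    if (strncmp(name, "ISO8859-", 8) == 0)
        n = snprintf(out, outlen, "ISO-8859-%s", name + 8);
    else
        n = snprintf(out, outlen, "%s", name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical wrapper: try the target name as given, then retry once
 * with the hyphenated spelling if iconv_open() fails with EINVAL. */
static iconv_t open_with_fallback(const char *tocode, const char *fromcode)
{
    char alias[64];
    iconv_t cd = iconv_open(tocode, fromcode);

    if (cd != (iconv_t)-1 || errno != EINVAL)
        return cd;                      /* success, or an unrelated error */
    if (hyphenate_iso8859(tocode, alias, sizeof alias) != 0
        || strcmp(alias, tocode) == 0) {
        errno = EINVAL;                 /* no alternative spelling to try */
        return (iconv_t)-1;
    }
    return iconv_open(alias, fromcode);
}
```

With this in place, open_with_fallback("ISO8859-1", "UTF-8") would be
expected to succeed both on an iconv that accepts the unhyphenated name
and on one that only accepts "ISO-8859-1".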


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: i18n hell.

Posted by Ben Collins-Sussman <su...@collab.net>.
Ben Collins-Sussman <su...@collab.net> writes:

> Marcus Comstedt <ma...@mc.pp.se> writes:
> 
> > What are your $LC-variables set to?  Maybe FreeBSD is being stupid
> > about this and just returning whatever you set without verifying that
> > it's valid?
> 
> Nope, no $LC variables are set.
> 
> I'm gonna google for other FreeBSD users that can't use iconv
> properly...


OK, my iconv problem is solved for now.  

There are two different ports in my tree:

      /usr/ports/converters/iconv
      /usr/ports/converters/libiconv

The former seems to be broken.  Once I discovered and installed
the latter, everything worked:

$ iconv -f UTF-8 -t ISO-8859-1 foo
iconv: foo: No such file or directory
$ iconv -f UTF-8 -t ISO8859-1 foo
iconv: foo: No such file or directory


Re: i18n hell.

Posted by Ben Collins-Sussman <su...@collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:

> What are your $LC-variables set to?  Maybe FreeBSD is being stupid
> about this and just returning whatever you set without verifying that
> it's valid?

Nope, no $LC variables are set.

I'm gonna google for other FreeBSD users that can't use iconv
properly...


Re: i18n hell.

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Ben Collins-Sussman <su...@collab.net> writes:

> So I compiled subversion with --enable-utf8, and suddenly started
> getting errors on *every* invocation of 'svn':
> 
> apr_error: #22, src_err 0 : <Invalid argument>
>   (charset translator procurement failed)
> 
> So I traced into our call to apr/i18n/unix/xlate.c:apr_xlate_open().
> 
> This function had "UTF-8" passed in already, and the value
> APR_LOCALE_CHARSET caused the code to run nl_langinfo(CODESET).  The
> return value from nl_langinfo was "ISO8859-1".
> 
> Then we call iconv_open() on these two strings:  bam, I get an EINVAL
> error.  What's invalid, you ask?
> 
> It turns out that my iconv only accepts "ISO-8859-1", not "ISO8859-1":

What are your $LC-variables set to?  Maybe FreeBSD is being stupid
about this and just returning whatever you set without verifying that
it's valid?


  // Marcus




Re: i18n hell.

Posted by Ulrich Drepper <dr...@redhat.com>.
> This seems way screwed up to me.  The unhyphenated codepage name came
> from nl_langinfo(), which is part of my FreeBSD 4.5 libc!  And this
> isn't accepted by iconv??

The only thing which is screwed up is that libc implementation of
yours.  They cannot even keep internal consistency, let alone
implement aliases for charset names.

So, don't complain here.  There are working implementations (use
glibc).  What you describe is entirely the *BSD libc's fault.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

Re: i18n hell.

Posted by Nuutti Kotivuori <na...@iki.fi>.
Ben Collins-Sussman wrote:
> 
> So I compiled subversion with --enable-utf8, and suddenly started
> getting errors on *every* invocation of 'svn':
> 
> apr_error: #22, src_err 0 : <Invalid argument>
>   (charset translator procurement failed)
> 
> So I traced into our call to apr/i18n/unix/xlate.c:apr_xlate_open().
> 
> This function had "UTF-8" passed in already, and the value
> APR_LOCALE_CHARSET caused the code to run nl_langinfo(CODESET).  The
> return value from nl_langinfo was "ISO8859-1".
> 
> Then we call iconv_open() on these two strings: bam, I get an EINVAL
> error.  What's invalid, you ask?
> 
> It turns out that my iconv only accepts "ISO-8859-1", not
> "ISO8859-1":
> 
> $ man iconv_open
> ...
> European languages
>               ASCII,     ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
> ...
> 
> $ iconv -f UTF-8 -t ISO8859-1 foo
> iconv: conversion to ISO8859-1 unsupported
> $ iconv -f UTF-8 -t ISO-8859-1 foo
> iconv: foo: No such file or directory
> 
> This seems way screwed up to me.  The unhyphenated codepage name
> came from nl_langinfo(), which is part of my FreeBSD 4.5 libc!  And
> this isn't accepted by iconv??

Oof, you've truly stumbled into a nest of trouble.

For a very long time, every GTK program refused to work with
LC_CTYPE=fi_FI. This was because glibc stubbornly decided that
ISO-8859-1 is the charset name - and X11R6 decided that ISO8859-1 is
the charset name. There was a long list of aliases and other things
which affected this both ways. X and glibc each have their _own_
locales, which are used for slightly different things - and these do
conflict at times.

Just today, because of another mess, I checked again how glibc
handles these. The result was that ISO-8859-1 works, but so do
ISO8859-1, ISO88591 and ISO_8859-1. And I was unable to find the magic
piece of code that makes all of these work.

In BSD land I've heard that ISO_8859-1 and ISO8859-1 generally work,
but ISO-8859-1 does not, not anywhere. I don't know how true this is.

Way back, when OSX was just released, I heard that locales did not
really work at all. I have no idea what the status is these days.

Locales and charsets have had troubles on several distributions for a
very long time - and some of those problems are not too easily
solved. And standardization between these seems to be a long way
off. I don't really know what would be a good way to handle all this.

-- Naked

