Posted to users@subversion.apache.org by Stephan Hermann <sh...@sourcecode.de> on 2003/08/06 08:07:48 UTC

issue 1463: locale problems during import and export/checkout

Hi SVN users :)

Karl commented on this issue and asked me to describe the problem here on
this list.

OK, let's go:

Take a file tree with binary and/or text data like this:

/content/bla/fasel/großbritannien.html
/content/fasel/bla/Für alle Zeiten_trailer.mov 

As you can see, the names contain German special characters (ß == &szlig; and ü == &uuml;).

My OS is Debian Linux Woody with all security and bugfix packages applied.
I'm using Berkeley DB 4.1.25 or 4.0.14 as the reference implementation.
I'm also using Apache 2.0.45/2.0.46 as a WebDAV server.

Neon, OpenSSL, etc. are always the latest stable releases.

My system locale is POSIX (export LANG=POSIX). This locale is set by the
installation server of our server farm, so normally it does no harm.

Now do the following:

svnadmin create /data/repos/inbox

svn import /content file:///data/repos/inbox

When you try to import the files into the repository now, the import
aborts with the error: failure during string recode (utf.c:173)
(libsvn_subr).
After that, the DB is completely broken.


When I change the locale to "export LANG=de_DE@euro" and do the import again
everything works fine.
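
A minimal sketch of the import-side failure, assuming glibc's iconv
command-line tool is available and that the recoding step behaves like a
plain charset conversion. The function names are made up for illustration;
this is not Subversion's own code. It converts the ISO-8859-15 bytes of one
of the names above to UTF-8, once declaring the input as ISO-8859-15 and once
as ASCII (the charset of the POSIX locale):

```shell
#!/bin/sh
# Hypothetical helper: convert a file name read from stdin from the charset
# given as $1 to UTF-8.
name_to_utf8() {
    iconv -f "$1" -t UTF-8
}

# "großbritannien.html" as ISO-8859-15 bytes (ß is the single byte 0xDF,
# written here as the octal escape \337).
latin_name() {
    printf 'gro\337britannien.html'
}

if latin_name | name_to_utf8 ISO-8859-15 >/dev/null 2>&1; then
    echo "declared as ISO-8859-15: converts to UTF-8"
fi

if ! latin_name | name_to_utf8 ASCII >/dev/null 2>&1; then
    echo "declared as ASCII: conversion fails"
fi
```

With glibc, the ASCII case should be rejected at the 0xDF byte, which is
consistent with the recode failure seen under LANG=POSIX.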

Now the other way around: import the file tree with a correct locale
setting, then reset the locale to "POSIX" or to another locale that is
not iso-8859-xx, and try to export or check out (I did a checkout in this
test).

When the checkout reaches the first file with a special character (German
umlaut), it aborts with the same error message, at the same line in the
source code.

Why this is a problem:

I have to serve different data repositories for different countries.
Every country has its own locale setting, but I can't change the locale
every time, because all repositories live on an HA cluster.

So, if a German user with a German locale wants to check out a
repository that was imported with e.g. a Russian locale (Cyrillic charset),
the checkout will abort, and adding or importing files will destroy
the Berkeley DB behind the repository.

I can't be sure that all users will use only filenames with 7-bit
characters, so I have to deal with the almighty Windows people, who
use all those "nifty" features like spaces and special chars in filenames
;)
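
One way to find such names before an import is to scan the tree for anything
that is not printable 7-bit ASCII. This is only a sketch: the function names
are hypothetical, and it assumes path names without embedded newlines.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the name given as $1 consists of
# printable 7-bit ASCII characters (space through tilde).
# The body runs in a subshell so the locale change does not leak out.
is_7bit_name() (
    LC_ALL=C   # compare byte by byte, whatever the caller's locale is
    case $1 in
        *[!\ -~]*) exit 1 ;;
    esac
    exit 0
)

# Hypothetical helper: print every path below the directory given as $1
# that is not 7-bit clean.
list_non_7bit() {
    find "$1" | while IFS= read -r path; do
        is_7bit_name "$path" || printf '%s\n' "$path"
    done
}
```

For the tree above, "list_non_7bit /content" would be expected to report
the two names containing ß and ü; the space in the second name is allowed by
the check.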


I hope that this is not a "feature" of Subversion or APR (the function that
does the conversion is in APR, and I think it uses some of glibc's iconv
features).

I hope you can help me find a solution for this problem, because I want
to use Subversion for my project; but if this issue can't be fixed,
I will have to think about other version control systems.

regards,

\sh


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org


Re: issue 1463: locale problems during import and export/checkout

Posted by kf...@collab.net.
Stephan Hermann <sh...@sourcecode.de> writes:
> On Wednesday 06 August 2003 17:23, kfogel@collab.net wrote:
> > Stephan Hermann <sh...@sourcecode.de> writes:
> > > When you are trying to import now the files into the repository, this
> > > action will abort with the failure: failure during string recode
> > > (utf.c:173) (libsvn_subr).
> > > After this action, the DB is completly broken.
> >
> > What does "broken" mean exactly?
> 
> broken means, that there is no repository after the aborted import.
> 
> after checkout the repository db is not broken at all.

Sorry, I'm looking for a *much* more detailed answer...  When you say
there is "no repository", do you mean something like this:

   $ svnadmin create myrepos
   $ cd import-tree
   $ svn import file:///path/to/myrepos ...
    [see error happen]
   $ cd ..
   $ ls myrepos
   ls: myrepos: no such file or directory

:-) ?  (I assume that's not what's happening.)

Or do you just mean that the files you tried to import are not present
in the repository?  That's expected for a failed import.  Subversion
does all or none, no halfway.

If you could simply post a complete transcript, from beginning to end,
that might help.
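
A sketch of how such a transcript could be captured in one go. The helper
name, the RUN variable and the paths are hypothetical, and the
"svn import PATH URL -m MESSAGE" argument order is an assumption that may
not match the client version in use:

```shell
#!/bin/sh
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# Hypothetical helper: run the reproduction steps in order.
#   $1  path of a scratch repository (absolute, not yet existing)
#   $2  directory tree to import
repro() {
    repo=$1
    tree=$2
    $RUN locale                      # record the locale settings in effect
    $RUN svnadmin create "$repo"
    $RUN svn import "$tree" "file://$repo" -m "locale reproduction"
    $RUN svnadmin verify "$repo"     # check the repository after the import
}
```

Running something like "repro /tmp/repro-repos /content 2>&1 | tee transcript.txt"
would keep the commands' output and any error messages in one file that can
be pasted into a reply.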

By the way, as other people in this thread have said: if you try to do
character conversion in a locale that doesn't support the characters
in question, I'm not sure that failure is a "bug".  At least, I'm not
sure what the correct behavior for Subversion would be.  It has no way
to guess at a correct representation for the characters...

Again, I might be misunderstanding what's happening, though.

-Karl


Re: issue 1463: locale problems during import and export/checkout

Posted by Stephan Hermann <sh...@sourcecode.de>.
Hi,


On Wednesday 06 August 2003 17:23, kfogel@collab.net wrote:
> Stephan Hermann <sh...@sourcecode.de> writes:
> > When you are trying to import now the files into the repository, this
> > action will abort with the failure: failure during string recode
> > (utf.c:173) (libsvn_subr).
> > After this action, the DB is completly broken.
>
> What does "broken" mean exactly?

Broken means that there is no repository after the aborted import.

After a checkout, the repository DB is not broken at all.

regards,

\sh



RE: issue 1463: locale problems during import and export/checkout

Posted by Sander Striker <st...@apache.org>.
> From: kfogel@newton.ch.collab.net [mailto:kfogel@newton.ch.collab.net]On
> Behalf Of kfogel@collab.net
> Sent: Wednesday, August 06, 2003 5:24 PM

> Stephan Hermann <sh...@sourcecode.de> writes:
> > When you are trying to import now the files into the repository, this action 
> > will abort with the failure: failure during string recode (utf.c:173) 
> > (libsvn_subr).
> > After this action, the DB is completly broken.
> 
> What does "broken" mean exactly?

Broken in the sense that there is no way to check out the entire repository.
You'd have to do partial checkouts and switch locales in between.

Sander


Re: issue 1463: locale problems during import and export/checkout

Posted by kf...@collab.net.
Stephan Hermann <sh...@sourcecode.de> writes:
> When you are trying to import now the files into the repository, this action 
> will abort with the failure: failure during string recode (utf.c:173) 
> (libsvn_subr).
> After this action, the DB is completly broken.

What does "broken" mean exactly?

-Karl


Re: issue 1463: locale problems during import and export/checkout

Posted by Vincent Lefevre <vi...@vinc17.org>.
On Thu, Aug 07, 2003 at 11:30:18 +0000, Erik Hülsmann wrote:
> # LC_CTYPE=nl_NL.UTF8@euro
> 
> for utf8 encoded characters and the euro?

"@euro" is shorthand for ISO8859-15 in the locale definitions, AFAIK.
But since you use UTF8, you don't need it.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/> - 100%
validated (X)HTML - Acorn Risc PC, Yellow Pig 17, Championnat International
des Jeux Mathématiques et Logiques, TETRHEX, etc.
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


Re: issue 1463: locale problems during import and export/checkout

Posted by Erik Hülsmann <e....@gmx.net>.
>On Thu, Aug 07, 2003 at 10:05:18 +0200, Stephan Hermann wrote:
>> Well, if you tell me the right utf-8 locale for whole europe, i
>> would be fine :)
>> 
>> We have to include not only west europe locales, but also all east europe 
>> locales.
>> We would be on a saftey side, if anyone is using us-ascii for filenames, but 
>> our web/php/windows developers are using all those "nifty" features of ms.
>> And that gives me nightmares.
>
>So, you can choose any UTF-8 locale. The differences between them
>mainly concern the language, but in any case, you'll be able to
>represent any character you can have in a filename, and you'll be
>able to handle filenames without breaking anything (unless there
>are bugs).

Sorry to drop in like this, but how would I set that up?

# LC_CTYPE=nl_NL.UTF8@euro

for UTF-8 encoded characters and the euro? I don't want Dutch error messages
or anything else set to the Dutch language.

Where can I find more about the locale system in Linux/glibc?

bye,

Erik.

PS: Do you know where to find information on building a system *without* locale
support?


Re: issue 1463: locale problems during import and export/checkout

Posted by Vincent Lefevre <vi...@vinc17.org>.
On Thu, Aug 07, 2003 at 10:05:18 +0200, Stephan Hermann wrote:
> Well, if you tell me the right utf-8 locale for whole europe, i
> would be fine :)
> 
> We have to include not only west europe locales, but also all east europe 
> locales.
> We would be on a saftey side, if anyone is using us-ascii for filenames, but 
> our web/php/windows developers are using all those "nifty" features of ms.
> And that gives me nightmares.

So, you can choose any UTF-8 locale. The differences between them
mainly concern the language, but in any case, you'll be able to
represent any character you can have in a filename, and you'll be
able to handle filenames without breaking anything (unless there
are bugs).
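
A minimal sketch of that setup, assuming a system where "locale -a" lists
the installed locales. The function names are made up for illustration.
Only LC_CTYPE (the character encoding) is switched to a UTF-8 locale, while
LC_MESSAGES stays at C so program messages are not translated:

```shell
#!/bin/sh
# Hypothetical helper: read locale names (one per line, as printed by
# "locale -a") on stdin and print the first one that uses UTF-8.
first_utf8_locale() {
    grep -i '\.utf-*8' | head -n 1
}

# Hypothetical helper: use UTF-8 for character handling only.
# If no UTF-8 locale is installed, LC_CTYPE ends up empty and the
# setting from LANG applies instead.
use_utf8_ctype() {
    LC_CTYPE=$(locale -a | first_utf8_locale)
    LC_MESSAGES=C
    export LC_CTYPE LC_MESSAGES
    unset LC_ALL   # LC_ALL, if set, would override both settings above
}
```

After calling use_utf8_ctype, "locale charmap" should report UTF-8 if a
UTF-8 locale is installed.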

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/> - 100%
validated (X)HTML - Acorn Risc PC, Yellow Pig 17, Championnat International
des Jeux Mathématiques et Logiques, TETRHEX, etc.
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


Re: issue 1463: locale problems during import and export/checkout

Posted by Stephan Hermann <sh...@sourcecode.de>.
Hi,

On Wednesday 06 August 2003 17:52, Michael Wood wrote:
> > When you are trying to import now the files into the repository, this
> > action will abort with the failure: failure during string recode
> > (utf.c:173) (libsvn_subr).
> > After this action, the DB is completly broken.
>
> I suspect "svnadmin recover" will fix it.

No. Doesn't work.


> > When I change the locale to "export LANG=de_DE@euro" and do the import
> > again everything works fine.
>
> This is because umlauts etc. are not valid in the POSIX locale.

Right ;)

> But the locale is a client side thing, not a server side thing.  If this
> causes the repository to need a recovery, I think that is a bug.
>
> User A has locale set to de_DE@euro and User B has their locale set to
> some cyrillic locale or something.  These filenames are translated to
> UTF8 internally, so the repository never knows or cares about the
> clients' locales.  If the German user tries to check out filenames with
> Russian special characters in them, then he's going to have trouble, but
> maybe setting the locale to something like de_DE.UTF-8 would work?

Well, if you tell me the right UTF-8 locale for the whole of Europe, I would
be fine :)

We have to cover not only the Western European locales but also all the
Eastern European ones.
We would be on the safe side if everyone used US-ASCII for filenames, but
our web/PHP/Windows developers are using all those "nifty" features of MS.
And that gives me nightmares.

There must be a way to add/import and check out/export filenames with
special chars without breaking anything.

> I am by no means an expert on this sort of thing, though...

Add me ! Me too (c) aol.com

regards,

\sh



Re: issue 1463: locale problems during import and export/checkout

Posted by Michael Wood <mw...@its.uct.ac.za>.
On Wed, Aug 06, 2003 at 10:07:48AM +0200, Stephan Hermann wrote:
> HI SVN Users :)
> 
> as Karl send me a comment on this issue, that I should describe you the 
> problem here on this list.
> 
> Ok, lets go:
> 
> You have to take a file tree with binary and/or text data just like this:
> 
> /content/bla/fasel/großbritannien.html
> /content/fasel/bla/Für alle Zeiten_trailer.mov 
> 
> As you can see, I'm using german umlaute (ß == &szlig; and ü == &uuml;).
> 
> My OS is Debian Linux Woody with all security and bugfix packages applied.
> I'm using Berkeley DB 4.1.25 or 4.0.14 as reference implementation.
> Also I'm using apache 2.0.45/2.0.46 as a webdav server.
> 
> Neon, Openssl, etc. are always the latest stable releases.
> 
> ok, my system locale is POSIX (export LANG=POSIX), this locale is set up from 
> the installation server of our server farm, so normally no harm.
> 
> ok now do the following:
> 
> svnadmin create /data/repos/inbox
> 
> svn import /content file://data/repos/inbox
> 
> When you are trying to import now the files into the repository, this action 
> will abort with the failure: failure during string recode (utf.c:173) 
> (libsvn_subr).
> After this action, the DB is completly broken.

I suspect "svnadmin recover" will fix it.

> When I change the locale to "export LANG=de_DE@euro" and do the import again
> everything works fine.

This is because umlauts etc. are not valid in the POSIX locale.

> Now the other way around, you import the filetree with a correct locale 
> setting, and after this, you reset the locale to "POSIX" or to another locale 
> != iso-8859-xx, and try to export or checkout (i did a checkout in this 
> test).
> 
> If you reach the first file with the special char (german umlaut), the 
> checkout will abort with the same error message I wrote, at the same line in 
> the sourcecode.
> 
> What's the problem with it:
> 
> I have to serve different data repositories for diff. countries.
> All countries have their own locale setting, but I can't change the locale 
> everytime, just because all repositories are laying on a HA cluster.
> 
> So, if I have a german user with a german locale, and he wants to
> checkout a repository which was imported with e.g. a russian locale
> (kyrillic charset), this action will abort and during adding and
> importing files it will destroy the berkeley db behind the repository.
[snip]

But the locale is a client side thing, not a server side thing.  If this
causes the repository to need a recovery, I think that is a bug.

User A has locale set to de_DE@euro and User B has their locale set to
some cyrillic locale or something.  These filenames are translated to
UTF8 internally, so the repository never knows or cares about the
clients' locales.  If the German user tries to check out filenames with
Russian special characters in them, then he's going to have trouble, but
maybe setting the locale to something like de_DE.UTF-8 would work?
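
The checkout side can be sketched the same way, assuming glibc's iconv tool
and treating the step as a plain conversion from UTF-8 to the client's
charset. The helper name and the sample names are made up for illustration,
and KOI8-R stands in for "some cyrillic locale":

```shell
#!/bin/sh
# Hypothetical helper: succeed if the UTF-8 name given as $2 can be
# represented in the charset given as $1.
fits_charset() {
    printf '%s' "$2" | iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# Sample names, written as UTF-8 bytes in octal escapes.
german=$(printf 'gro\303\237britannien.html')               # "großbritannien.html"
russian=$(printf '\321\204\320\260\320\271\320\273.txt')    # "файл.txt"

for charset in ISO-8859-15 KOI8-R UTF-8; do
    for name in "$german" "$russian"; do
        if fits_charset "$charset" "$name"; then
            echo "$charset: ok: $name"
        else
            echo "$charset: cannot represent: $name"
        fi
    done
done
```

Only the UTF-8 case is expected to accept both names, which is why a
locale such as de_DE.UTF-8 looks like the safer choice for mixed
repositories.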

I am by no means an expert on this sort of thing, though...

-- 
Michael Wood <mw...@its.uct.ac.za>
