Posted to dev@subversion.apache.org by Lübbe Onken <l....@rac.de> on 2004/09/21 09:50:10 UTC

Re: Non Ascii chars in paths cause trouble

Hi Folks,

A month ago, I asked a question about non ascii chars in dumpfiles, 
which caused problems when displaying them. Now I'm up the proverbial 
creek without the proverbial paddle, because I'm trying to migrate old 
repositories to a 1.1 server to be prepared when 1.1 is released.

The problem is: "My German characters are scrambled"

Source system:
- Suse Linux 8
- Subversion 0.32.1
- BerkeleyDB 4.0.14
- commits made by several clients (mostly TortoiseSVN) built against svn 
versions ranging from pre 0.32 to 1.1.0.rc3

Target system:
- Suse Linux 9
- Subversion 1.1.0RC3
- BerkeleyDB 4.2.52

On the source system:
Attached is a snippet of the created dumpfile copied from vi. As you can 
see, the Umlaut 'ö'='oe' in the log message "Böse Welt, ob das gut geht" 
is scrambled, but the characters in Node-path: tags/Umlautname_Ä_Ö_Ü 
look 'proper'.

'svnlook log -r40' displays the log message properly
'svnlook changed -r40' fails with the following error:
svn: Invalid argument
svn: failure during string recoding
Checking out the tags fails with an 'in

---SNIP---
Revision-number: 40
Prop-content-length: 129
Content-length: 129

K 7
svn:log
V 28
Böse Welt, ob das gut geht?
K 10
svn:author
V 6
lonken
K 8
svn:date
V 27
2004-09-21T08:47:15.616600Z
PROPS-END

Node-path: tags/Umlautname_Ä_Ö_Ü
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 37
Node-copyfrom-path: trunk
---SNIP---
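
As a side check on the snippet above: the "V 28" in front of the log message is a byte count, so it hints at how the message is encoded. A minimal Python sketch (the helper name value_length is made up for illustration) compares the byte length of the message in UTF-8 and in ISO-8859-1:

```python
# Hypothetical helper, not part of Subversion: byte length of a property
# value in a given encoding, to compare against the "V <n>" line of a dump.
def value_length(text, encoding):
    return len(text.encode(encoding))

log_message = "B\u00f6se Welt, ob das gut geht?"   # \u00f6 is the umlaut o

print(value_length(log_message, "utf-8"))       # 28, matches "V 28"
print(value_length(log_message, "iso-8859-1"))  # 27
```

Only the UTF-8 length matches, so the log message in this dump appears to be stored as UTF-8, which would also explain why it looks scrambled in a non-UTF-8 terminal. The check says nothing about the Node-path line, which carries no length.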

On the target system:
When I scp this dumpfile to the target system and load it, all the 
Umlauts are gone.

'svnlook log -r40 repository/testrepos/'
B?\246se Welt, ob das gut geht?
'svnlook changed -r40 repository/testrepos/'
A   tags/Umlautname_?\196_?\214_?\220/

I'm afraid that something is going terribly wrong here, that the 0.32.1 
dumpfile isn't utf-8 or something like that. How can I migrate my 
repositories?

Cheers & thanks
- Lübbe

--
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Non Ascii chars in paths cause trouble

Posted by Lübbe Onken <l....@rac.de>.
John Szakmeister wrote:

> Wednesday, September 22, 2004, 4:46:35 AM, Lübbe Onken wrote:
> [snip]
>>But:
>>- Why does 'svn log' on windows display ?\195?\182 instead of 'ö'?
> 
> 
> I recently ran into this problem myself.  I had to set APR_ICONV_PATH
> to 'x:\path\to\svn\iconv' to get them to display correctly.  I believe
> TSVN took a different approach to solving this problem (of having to
> set APR_ICONV_PATH), which is why you don't see this issue with it.
Thanks for your hint, but I still wonder, why the following happens.

- Why does 'svn log' on Linux display the log message properly?
- Why does 'svnlook log' on Linux *not* display the log message 
properly? (same bash, same locale...) Is this a bug in svnlook?

I investigated a bit further and found out that 'svn log' and 'svnlook 
log' behave differently depending on the locale settings.

Results with locale set to de_DE@euro:
svnlook log: Doppelte Datei auf dem Server gel?\246scht
svn log:     Doppelte Datei auf dem Server gelöscht

Results with locale set to de_DE.utf-8
svnlook log: Doppelte Datei auf dem Server gelöscht
svn log:     Doppelte Datei auf dem Server gelöscht

I believe that this is a bug in svnlook.

Cheers
- Lübbe



Re: Non Ascii chars in paths cause trouble

Posted by John Szakmeister <jo...@szakmeister.net>.
Wednesday, September 22, 2004, 4:46:35 AM, Lübbe Onken wrote:
[snip]
>> You can dump your repositories, find the paths which are not correctly utf-8
>> encoded, replace the invalid characters with valid utf-8 characters and try
>> loading the dumpfiles. That's all I can advise you to do. It's not much, a
>> lot of work, but it's possible.
> Ouch, I was afraid of that answer. So the only thing left to me is 
> dumping, grepping "Node-path"s for accented characters and replacing them?
> But what about accented characters inside my source? Should they be 
> UTF-8 in the dumpfile as well or are the file contents not touched by svn?

File contents are never touched by svn.  So you don't need to worry
about that.  Right now it's just the node paths.

[snip]
> That makes me hope that the log messages are at least dumped&loaded OK.

They are.  That's one feature that's been working for a long time. :-)

> But:
> - Why does 'svn log' on windows display ?\195?\182 instead of 'ö'?

I recently ran into this problem myself.  I had to set APR_ICONV_PATH
to 'x:\path\to\svn\iconv' to get them to display correctly.  I believe
TSVN took a different approach to solving this problem (of having to
set APR_ICONV_PATH), which is why you don't see this issue with it.

[snip]

-John




Re: Non Ascii chars in paths cause trouble

Posted by Lübbe Onken <l....@rac.de>.
Erik Huelsmann wrote:

> What locale does your vi terminal run in? Does it use iso-8859-1 / -15
> character encoding? (assuming you use a german locale)
locale tells me it's de_DE@euro for my svn user and POSIX for root. I 
tried setting it to de_DE.utf8 before dumping the repository, but that 
didn't make any difference.

> If it does, then the fact that the tags directory name looks like it does
> (with the accented characters) is quite alarming. They should have been
> recoded to utf-8, which looks 'scrambled' on an iso-8859-xx encoded terminal.
:-|

> You can dump your repositories, find the paths which are not correctly utf-8
> encoded, replace the invalid characters with valid utf-8 characters and try
> loading the dumpfiles. That's all I can advise you to do. It's not much, a
> lot of work, but it's possible.
Ouch, I was afraid of that answer. So the only thing left to me is 
dumping, grepping "Node-path"s for accented characters and replacing them?
But what about accented characters inside my source? Should they be 
UTF-8 in the dumpfile as well or are the file contents not touched by svn?

I still have the problem on the target system, that the German 
characters in my commit messages are not displayed properly when 
checking with svnlook. I get two possible (but no good) results 
depending on my locale setting.

svnlook on the server:
======================
svn@raneu:~> locale
LANG=de_DE@euro
...
svn@raneu:~> svnlook log -r41 repository/testrepos/
Doppelte Datei auf dem Server gel?\246scht

svn@raneu:~> export LANG=de_DE.utf-8
svn@raneu:~> svnlook log -r41 repository/testrepos/
Doppelte Datei auf dem Server gelöscht

Browsing the repository on the server using WebSVN also results in the 
?\246, since WebSVN is using svnlook to fetch the status.

svn log on the server:
======================
using 'svn log' on the server results in a proper log message:
svn@raneu:~> svn log file:///svn/repository/testrepos/
------------------------------------------------------------------------
r41 | lonken | 2004-09-22 10:15:29 +0200 (Mit, 22 Sep 2004) | 1 line

Doppelte Datei auf dem Server gelöscht

svn log on windows client:
=========================
D:\Testprojekt\RA_Neu>svn log
------------------------------------------------------------------------
r41 | lonken | 2004-09-22 10:15:29 +0200 (Mi, 22 Sep 2004) | 1 line

Doppelte Datei auf dem Server gel?\195?\182scht

TortoiseSVN on Windows client:
=============================
Revision: 41
Autor: lonken
Datum: 22.09.2004 10:15:29
Meldung:
Doppelte Datei auf dem Server gelöscht
----
Löschen  /tags/Test ob geht/Lib/lizenz.pas

That makes me hope that the log messages are at least dumped&loaded OK.

But:
- Why does 'svn log' on windows display ?\195?\182 instead of 'ö'?
- Why does 'svn log' on Linux display the log message properly?
- Why does 'svnlook log' on Linux *not* display the log message 
properly? (same bash, same locale...) Is this a bug in svnlook?
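
For reference, the numbers in the escaped output are decimal byte values. The following Python sketch only mimics the ?\NNN notation seen above (escape_non_ascii is a made-up helper, not Subversion's actual code) and shows which encoding produces which escape:

```python
def escape_non_ascii(text, encoding):
    """Render every non-ASCII byte of the encoded text as ?\\NNN (decimal)."""
    encoded = text.encode(encoding)
    return "".join(chr(b) if b < 128 else "?\\%d" % b for b in encoded)

print(escape_non_ascii("gel\u00f6scht", "utf-8"))              # gel?\195?\182scht
print(escape_non_ascii("gel\u00f6scht", "iso-8859-1"))         # gel?\246scht
print(escape_non_ascii("\u00c4_\u00d6_\u00dc", "iso-8859-1"))  # ?\196_?\214_?\220
```

So ?\195?\182 corresponds to the two UTF-8 bytes of 'ö', while ?\246 corresponds to its single ISO-8859-1 byte.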

Cheers
- Lübbe



Re: Non Ascii chars in paths cause trouble

Posted by Erik Huelsmann <e....@gmx.net>.
> Hi Folks,

Hi Luebbe,

> A month ago, I asked a question about non ascii chars in dumpfiles, 
> which caused problems when displaying them. Now I'm up the proverbial 
> creek without the proverbial paddle, because I'm trying to migrate old 
> repositories to a 1.1 server to be prepared when 1.1 is released.
> 
> The problem is: "My German characters are scrambled"

> Source system:
> - Suse Linux 8
> - Subversion 0.32.1
> - BerkeleyDB 4.0.14
> - commits made by several clients (mostly TortoiseSVN) built against svn 
> versions ranging from pre 0.32 to 1.1.0.rc3

Ok, 0.32.1 is a *very* good reason to migrate. In those days checks were not
as strict about utf-8 conformance as they are now (even though I believe they
could still be better). You may have been committing non-utf-8 paths and
log messages into your repository.
 
> Target system:
> - Suse Linux 9
> - Subversion 1.1.0RC3
> - BerkeleyDB 4.2.52
> 
> On the source system:
> Attached is a snippet of the created dumpfile copied from vi. As you can 
> see, the Umlaut 'ö'='oe' in the log message "Böse Welt, ob das gut geht" 
> is scrambled, but the characters in Node-path: tags/Umlautname_Ä_Ö_Ü 
> look 'proper'.

What locale does your vi terminal run in? Does it use iso-8859-1 / -15
character encoding? (assuming you use a german locale)

If it does, then the fact that the tags directory name looks like it does
(with the accented characters) is quite alarming. They should have been
recoded to utf-8, which looks 'scrambled' on an iso-8859-xx encoded terminal.
 
> 'svnlook log -r40' displays the log message properly
> 'svnlook changed -r40' fails with the following error:
> svn: Invalid argument
> svn: failure during string recoding
> Checking out the tags fails with an 'in
> 
> ---SNIP---
> Revision-number: 40
> Prop-content-length: 129
> Content-length: 129
> 
> K 7
> svn:log
> V 28
> Böse Welt, ob das gut geht?
> K 10
> svn:author
> V 6
> lonken
> K 8
> svn:date
> V 27
> 2004-09-21T08:47:15.616600Z
> PROPS-END
> 
> Node-path: tags/Umlautname_Ä_Ö_Ü
> Node-kind: dir
> Node-action: add
> Node-copyfrom-rev: 37
> Node-copyfrom-path: trunk
> ---SNIP---
> 
> On the target system:
> When I scp this dumpfile to the target system and load it, all the 
> Umlauts are gone.
> 
> 'svnlook log -r40 repository/testrepos/'
> B?\246se Welt, ob das gut geht?
> 'svnlook changed -r40 repository/testrepos/'
> A   tags/Umlautname_?\196_?\214_?\220/
> 
> I'm afraid that something is going terribly wrong here, that the 0.32.1 
> dumpfile isn't utf-8 or something like that. How can I migrate my 
> repositories?

You can dump your repositories, find the paths which are not correctly utf-8
encoded, replace the invalid characters with valid utf-8 characters and try
loading the dumpfiles. That's all I can advise you to do. It's not much, a
lot of work, but it's possible.
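
The procedure above could be sketched roughly as follows in Python. This is only an illustration under assumptions: it assumes the broken paths are ISO-8859-1 (the fallback parameter), it touches only Node-path and Node-copyfrom-path header lines, and all function names are made up. Record bodies are skipped using Content-length and copied byte for byte, so file contents and properties stay untouched.

```python
import io

# Header lines whose values are repository paths.
PATH_HEADERS = (b"Node-path: ", b"Node-copyfrom-path: ")
CONTENT_LENGTH = b"Content-length: "

def fix_header(line, fallback="iso-8859-1"):
    """Re-encode a path header to UTF-8 if its value is not valid UTF-8."""
    for prefix in PATH_HEADERS:
        if line.startswith(prefix):
            value = line[len(prefix):]
            try:
                value.decode("utf-8")      # already valid: leave untouched
                return line
            except UnicodeDecodeError:
                # Assumption: the bytes are in the fallback encoding.
                return prefix + value.decode(fallback).encode("utf-8")
    return line

def fix_dump(src, dst, fallback="iso-8859-1"):
    """Copy a dump from binary stream src to dst, fixing path headers only."""
    while True:
        line = src.readline()
        if not line:
            return
        if line == b"\n":                  # blank line between records
            dst.write(line)
            continue
        content_length = 0
        while line and line != b"\n":      # header block of one record
            if line.startswith(CONTENT_LENGTH):
                content_length = int(line[len(CONTENT_LENGTH):].strip())
            dst.write(fix_header(line, fallback))
            line = src.readline()
        dst.write(line)                    # blank line that ends the headers
        dst.write(src.read(content_length))  # body, copied unchanged

def fix_dump_bytes(data, fallback="iso-8859-1"):
    """Convenience wrapper for small dumps held in memory."""
    out = io.BytesIO()
    fix_dump(io.BytesIO(data), out, fallback)
    return out.getvalue()
```

For a real repository one would probably call fix_dump with two files opened in binary mode, load the result into a scratch repository first, and compare 'svnlook changed' output before trusting it. A path that happens to be valid UTF-8 by accident would be left as it is, so a manual review of the changed lines is still advisable.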

> Cheers & thanks
> - Lübbe


bye,

Erik.


