Posted to users@subversion.apache.org by Carsten Fuchs <Ca...@T-Online.de> on 2006/03/17 17:15:13 UTC

"Safe data was followed by non-ASCII byte ..." and other UTF-8 trouble

Hello all,

I'm using the svn command-line client version 1.3.0 on a German-language 
Windows 2000 system, and am experiencing several UTF-8-related problems:

a) All svn output that contains special characters (e.g. German umlauts) 
is printed with UTF-8 escape sequences, e.g. if I enter "svn --version", 
the output is

     svn, Version 1.3.0 (r17949)
        ?\195?\188bersetzt Jan 15 2006, 23:18:48

instead of (hand-corrected)

     svn, Version 1.3.0 (r17949)
        Übersetzt Jan 15 2006, 23:18:48

(the first letter in the second line should be a "U" with two dots above 
it).
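
For readers puzzling over the escaped output above: the sequence
?\195?\188 appears to be the two decimal byte values 195 and 188, which
together form the UTF-8 encoding of the lowercase "ü" (U+00FC). Below is
a minimal Python sketch to check this. The helper name
decode_svn_escapes and the assumption that every escaped byte is written
as ?\NNN with three decimal digits are illustrative only, inferred from
the output quoted here, not taken from Subversion itself.

```python
import re

# Assumption: each non-ASCII byte is printed as "?\NNN" with NNN in decimal,
# as in the output quoted above. The helper name is hypothetical.
_ESCAPE = re.compile(rb"\?\\(\d{3})")

def decode_svn_escapes(text: str) -> str:
    """Turn text containing ?\\NNN byte escapes back into readable text."""
    raw = text.encode("ascii")
    # Replace every escape with the single byte it stands for.
    raw = _ESCAPE.sub(lambda m: bytes([int(m.group(1))]), raw)
    # The recovered bytes are assumed to be UTF-8.
    return raw.decode("utf-8")

print(decode_svn_escapes(r"?\195?\188bersetzt Jan 15 2006, 23:18:48"))
```

If the assumption holds, the data svn holds internally is valid UTF-8,
and the problem lies in converting it to the console's encoding.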


b) Whenever a non-ASCII character occurs, either in a file-name, in the 
contents of a file(!), etc., svn aborts with the error message:

      svn: Auf sichere Daten ... folgte ein nicht-ASCII Byte 195, das 
nicht von/nach UTF-8 konvertiert werden konnte

Translated: "svn: Safe data ... was followed by non-ASCII byte 195: 
unable to convert to/from UTF-8"

For example, "svn diff" aborts with this message when one of the changed 
files contains a UTF-8 character!


--> What can I do to fix this? Especially b) is a serious problem for 
me. I've searched the mailing-list archives in this regard and found 
several suggestions to set one of the LANG environment variables, but 
I've not been able to solve the problem in spite of extensive tests.

Any help would be much appreciated!
Thank you very much!

Best regards,
Carsten



-- 
Ca3D - Engine    http://www.Ca3D-Engine.de
Carsten Fuchs    http://www.Ca3D-Engine.de/c_Carsten.php

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: "Safe data was followed by non-ASCII byte ..." and other UTF-8 trouble

Posted by Carsten Fuchs <Ca...@T-Online.de>.
Hi Ryan,

Ryan Schmidt wrote:
>> Carsten Fuchs wrote:
>> Setting LANG=en_US "fixes" the problem, as there are no umlauts in 
>> English. So I currently use that setting as a work-around, but would 
>> still be interested in learning if there is a more generic solution.
> 
> Which server are you using? Is it by chance svnserve?

Yes, svnserve version 1.1.4 on a Debian 3.1 GNU/Linux system.

> If so, is the 
> locale on the server a different one than on the client, keeping in mind 
> the charset? For example, is the de_DE locale on the server using UTF-8 
> while the de_DE locale on the client is using ISO-8859-15?

Yes, the locales are different. On the server system I have
    LANG=POSIX
    LC_CTYPE="POSIX"
    LC_NUMERIC="POSIX"
    LC_TIME="POSIX"
    ...
while the client is a German-language Windows 2000 system with none of 
the LANG* or LC* environment variables set.

> There was 
> this issue I discovered recently which may relate:
> 
> http://svn.haxx.se/users/archive-2006-03/0494.shtml

Oh. And I thought this was a client-only issue.  ;)

> This problem of "valid UTF-8 data followed by invalid sequence" comes up 
> far too often on this list, and I'd like to finally get to the bottom of 
> it and solve it once and for all.

That would be fantastic!  :-)

Best,
Carsten



-- 
Ca3D - Engine    http://www.Ca3D-Engine.de
Carsten Fuchs    http://www.Ca3D-Engine.de/c_Carsten.php


Re: "Safe data was followed by non-ASCII byte ..." and other UTF-8 trouble

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 17, 2006, at 18:41, Carsten Fuchs wrote:

> Carsten Fuchs wrote:
>> b) Whenever a non-ASCII character occurs, either in a file-name,  
>> in the contents of a file(!), etc., svn aborts with the error  
>> message:
>
> I think that this statement is not quite right, as I just observed  
> that as with a), the problem character did not originate from a
> file(name), but from svn output. E.g. the "svn diff" command tried to  
> print "Property changes on: ...", which in German contains the "ä"  
> Umlaut: "Eigenschaftsänderung auf: ...", and so the error message  
> that I get in fact reads: "svn: Auf sichere Daten 'Eigenschafts'  
> folgte ein nicht-ASCII Byte 195, das nicht von/nach UTF-8  
> konvertiert werden konnte"
>
>> [...]
>> --> What can I do to fix this? Especially b) is a serious problem  
>> for me. I've searched the mailing-list archives in this regard and  
>> found several suggestions to set one of the LANG environment  
>> variables, but I've not been able to solve the problem in spite of  
>> extensive tests.
>
> Setting LANG=en_US "fixes" the problem, as there are no umlauts in  
> English. So I currently use that setting as a work-around, but  
> would still be interested in learning if there is a more generic  
> solution.

Which server are you using? Is it by chance svnserve? If so, is the  
locale on the server a different one than on the client, keeping in  
mind the charset? For example, is the de_DE locale on the server  
using UTF-8 while the de_DE locale on the client is using  
ISO-8859-15? There was this issue I discovered recently which may  
relate:

http://svn.haxx.se/users/archive-2006-03/0494.shtml

This problem of "valid UTF-8 data followed by invalid sequence" comes  
up far too often on this list, and I'd like to finally get to the  
bottom of it and solve it once and for all.




Re: "Safe data was followed by non-ASCII byte ..." and other UTF-8 trouble

Posted by Carsten Fuchs <Ca...@T-Online.de>.
Hi again,

Two more notes for completeness:

Carsten Fuchs wrote:
> b) Whenever a non-ASCII character occurs, either in a file-name, in the 
> contents of a file(!), etc., svn aborts with the error message:

I think that this statement is not quite right, as I just observed that 
as with a), the problem character did not originate from a file(name), but 
from svn output. E.g. the "svn diff" command tried to print "Property 
changes on: ...", which in German contains the "ä" Umlaut: 
"Eigenschaftsänderung auf: ...", and so the error message that I get in 
fact reads: "svn: Auf sichere Daten 'Eigenschafts' folgte ein 
nicht-ASCII Byte 195, das nicht von/nach UTF-8 konvertiert werden konnte"
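
The shape of that message can be reproduced outside of svn. The sketch
below is not Subversion's code; it only assumes that the UTF-8 bytes of
the German message are being checked as if they had to be plain ASCII,
which would yield exactly the prefix 'Eigenschafts' and the byte value
195 seen above. The function name first_non_ascii is hypothetical.

```python
def first_non_ascii(data: bytes):
    """Return (safe_prefix, offending_byte), or (text, None) if pure ASCII."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that is not ASCII.
        return data[:err.start].decode("ascii"), data[err.start]

# "ä" is encoded in UTF-8 as the two bytes 195 and 164.
message = "Eigenschaftsänderung auf: ...".encode("utf-8")
prefix, byte = first_non_ascii(message)
print(f"Safe data {prefix!r} was followed by non-ASCII byte {byte}")
```

Byte 195 (0xC3) is the lead byte for both "ä" and "ü" in UTF-8, which
would explain why the same number shows up in a) and b).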

> [...]
> --> What can I do to fix this? Especially b) is a serious problem for 
> me. I've searched the mailing-list archives in this regard and found 
> several suggestions to set one of the LANG environment variables, but 
> I've not been able to solve the problem in spite of extensive tests.

Setting LANG=en_US "fixes" the problem, as there are no umlauts in 
English. So I currently use that setting as a work-around, but would 
still be interested in learning if there is a more generic solution.
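
For anyone who wants to apply the same work-around without changing LANG
globally, here is a rough Python sketch that overrides the variable for
a single svn invocation only. The helper names env_with_lang and run_svn
are made up for illustration, and the sketch assumes an "svn" executable
is on the PATH. It does not address the underlying conversion problem.

```python
import os
import subprocess

def env_with_lang(lang: str = "en_US") -> dict:
    """Copy the current environment and override LANG in the copy only."""
    env = dict(os.environ)
    env["LANG"] = lang
    return env

def run_svn(*args: str) -> subprocess.CompletedProcess:
    """Run svn with LANG overridden for this one child process."""
    # Assumption: "svn" is on PATH; FileNotFoundError is raised otherwise.
    return subprocess.run(["svn", *args], env=env_with_lang(), check=False)

# Example call (not executed here): run_svn("diff")
```

The parent environment is left untouched, so other programs keep their
German messages.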

Best,
Carsten



-- 
Ca3D - Engine    http://www.Ca3D-Engine.de
Carsten Fuchs    http://www.Ca3D-Engine.de/c_Carsten.php
