Posted to users@subversion.apache.org by Carfield Yim <ca...@carfield.com.hk> on 2005/06/21 17:30:34 UTC

Problem of checkout Chinese file in MacOSX

Get the following message:

carfield:~/Documents/workspace/web_site/testdir carfield$ svn update
subversion/libsvn_subr/utf.c:453: (apr_err=22)
svn: Can't convert string from 'UTF-8' to native encoding:
subversion/libsvn_subr/utf.c:451: (apr_err=22)
svn: ?\228?\184?\173?\230?\150?\135.txt


Any solution? The file was committed on Windows XP using Subclipse, so I
suppose it is Big5. Why does it complain about "Can't convert string from
'UTF-8' to native encoding"?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Problem of checkout Chinese file in MacOSX

Posted by Ryan Schmidt <su...@ryandesign.com>.
On 24.06.2005, at 09:09, Ulrich Eckhardt wrote:

>> http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html

>> The default font encoding on some other platforms is ISO-Latin-1  
>> or WinLatin-1. These are subsets of UTF-8 which means that files  
>> or filenames can be turned into UTF-8 by just turning a byte into  
>> a char.
>
> This is utter nonsense, UTF-8 uses up to six bytes for one character. This
> probably rather means that all three mentioned character encodings have ASCII
> as common subset or that the encodings can represent a subset of the
> characters representable in UTF-8.

It's only nonsense insofar as they meant to write "Unicode" where
they wrote "UTF-8". The first 256 code points of Unicode are the same
as the 256 code points of ISO-8859-1. The representation of those
code points in UTF-8 is, of course, different once you go beyond the
ASCII range. I am not sure whether the additional characters present
in Windows Latin 1 are also at those same code points in Unicode, but
I don't think they are. So that may be the second nonsense.
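
The points above can be checked with a short Python sketch. It is only an illustration: it assumes that Python's "latin-1" and "cp1252" codecs are fair stand-ins for ISO-8859-1 and Windows Latin 1, and the variable names are made up for this example.

```python
# Every ISO-8859-1 byte value decodes to the Unicode code point with the same number.
latin1_text = bytes(range(256)).decode("latin-1")
same_code_points = all(ord(ch) == i for i, ch in enumerate(latin1_text))

# The UTF-8 representation differs above ASCII:
# e-acute (U+00E9) is one byte in Latin-1 but two bytes in UTF-8.
e_acute_latin1 = "\u00e9".encode("latin-1")
e_acute_utf8 = "\u00e9".encode("utf-8")

# Windows-1252 byte 0x80 is the euro sign, which lives at U+20AC, not U+0080.
cp1252_0x80 = ord(b"\x80".decode("cp1252"))

print(same_code_points)
print(len(e_acute_latin1), len(e_acute_utf8))
print(f"U+{cp1252_0x80:04X}")
```

So "turning a byte into a char" yields the right Unicode code point for ISO-8859-1, but it does not yield valid UTF-8 bytes, and under these codecs it does not hold for the extra Windows Latin 1 characters either.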



Re: Problem of checkout Chinese file in MacOSX

Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
> http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html
>
>       Font Encoding
>
> The default font encoding in Mac OS X is MacRoman. 
[...]
> The default font  
> encoding on some other platforms is ISO-Latin-1 or WinLatin-1. These are
> subsets of UTF-8 which means that files or filenames can be turned into
> UTF-8 by just turning a byte into a char. 

This is utter nonsense, UTF-8 uses up to six bytes for one character. This 
probably rather means that all three mentioned character encodings have ASCII 
as common subset or that the encodings can represent a subset of the 
characters representable in UTF-8.

> Programs that assume this behavior cause problems in Mac OS X.

Well, that's not unexpected, see above.

> If you do not specify a font encoding explicitly, recognize that:
>
>     *
>       In the conversion from Unicode to MacRoman you may lose information.
>
>     *
>       Filenames are not stored on disk in the default font encoding, but
>       in UTF-8. Usually this isn’t a problem, since most files are
>       handled in Java as |java.io.File|s, though it is good to be aware of.
>
>     *
>       Although filenames are stored on disk as UTF-8, they are stored
>       decomposed. This means certain characters, like e-acute (é) are
>       stored as two characters, “e”, followed by “´” (acute accent). The
>       default HFS+ filesystem of Mac OS X enforces this behavior. SMB
>       enforces composed Unicode characters. UFS and NFS do not specify
>       whether filenames are stored composed or decomposed, so they can
>       do either.

I believe this only relates to the encoding of fonts in Java, which also has its
own internal character encoding. However, it says that the default file name
encoding is UTF-8, which is nice to know (once I finally have my Mac...).
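
The composed versus decomposed distinction in the quoted text can be sketched in Python. This is an approximation only: it uses the standard NFD and NFC normalization forms from the unicodedata module, whereas HFS+ applies its own variant of decomposition, so treat the output as illustrative rather than as exactly what the filesystem stores.

```python
import unicodedata

composed = "\u00e9"  # e-acute as a single code point
# NFD splits it into "e" followed by U+0301 (combining acute accent).
decomposed = unicodedata.normalize("NFD", composed)

decomposed_points = [f"U+{ord(ch):04X}" for ch in decomposed]
decomposed_utf8 = decomposed.encode("utf-8")

# The two strings look identical when rendered but do not compare equal.
equal_raw = composed == decomposed
# Normalizing both sides to the same form makes the comparison reliable.
equal_normalized = unicodedata.normalize("NFC", decomposed) == composed

print(decomposed_points)
print(decomposed_utf8)
print(equal_raw, equal_normalized)
```

A tool that compares file names byte by byte across filesystems would see the composed and decomposed spellings as different names unless it normalizes them first.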

Uli



Re: Problem of checkout Chinese file in MacOSX

Posted by Carfield Yim <ca...@carfield.com.hk>.
>Subversion uses UTF-8 internally to store all filenames. So, now the problem 
>is rather 
>1. how do I get these two chars in Mac OS X
>2. how do I tell Subversion about it
>
>In general, this is a task solved by so-called locales. Typing 'locale' at the 
>console might give you a hint, other than that you need to do something OS X 
>specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the 
>job. However, I'm mostly guessing - I have never used these charactersets let 
>alone on a Mac.
>
>good luck
>
>Uli
>  
>
I am in luck :-) A simple 'export LC_CTYPE=utf-8' works. In fact, UTF-8
and MacRoman are very similar:
http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html


      Font Encoding

The default font encoding in Mac OS X is MacRoman. The default font 
encoding on some other platforms is ISO-Latin-1 or WinLatin-1. These are 
subsets of UTF-8 which means that files or filenames can be turned into 
UTF-8 by just turning a byte into a char. Programs that assume this 
behavior cause problems in Mac OS X.

The simplest way to work around this problem is to specify a font 
encoding explicitly rather than assuming one.

If you do not specify a font encoding explicitly, recognize that:

    *

      In the conversion from Unicode to MacRoman you may lose information.

    *

      Filenames are not stored on disk in the default font encoding, but
      in UTF-8. Usually this isn’t a problem, since most files are
      handled in Java as |java.io.File|s, though it is good to be aware of.

    *

      Although filenames are stored on disk as UTF-8, they are stored
      decomposed. This means certain characters, like e-acute (é) are
      stored as two characters, “e”, followed by “´” (acute accent). The
      default HFS+ filesystem of Mac OS X enforces this behavior. SMB
      enforces composed Unicode characters. UFS and NFS do not specify
      whether filenames are stored composed or decomposed, so they can
      do either.




Re: Problem of checkout Chinese file in MacOSX

Posted by Ben Collins-Sussman <su...@collab.net>.
On Jun 23, 2005, at 3:44 AM, Ulrich Eckhardt wrote:
>
> In general, this is a task solved by so-called locales. Typing 'locale' at the
> console might give you a hint, other than that you need to do something OS X
> specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the
> job. However, I'm mostly guessing - I have never used these charactersets let
> alone on a Mac.


The localization behaviors of Subversion are explained here, if it's  
any help:

    http://svnbook.red-bean.com/en/1.1/ch07s06.html


Re: Problem of checkout Chinese file in MacOSX

Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
Carfield Yim wrote:
> Get the following message:
>
> carfield:~/Documents/workspace/web_site/testdir carfield$ svn update
> subversion/libsvn_subr/utf.c:453: (apr_err=22)
> svn: Can't convert string from 'UTF-8' to native encoding:
> subversion/libsvn_subr/utf.c:451: (apr_err=22)
> svn: ?\228?\184?\173?\230?\150?\135.txt

I decoded this UTF-8 byte sequence (the escapes are decimal byte values) and it
boils down to two CJK code points, U+4E2D and U+6587, followed by '.txt', so
this data is at least plausible.
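
For anyone who wants to repeat the decoding, here is a minimal Python sketch. It assumes the \NNN escapes in the error message are decimal byte values; the helper name decode_utf8 is made up for this example.

```python
# Decimal byte values copied from the escaped name in the svn error message.
escaped_bytes = [228, 184, 173, 230, 150, 135]

def decode_utf8(byte_values):
    """Decode a list of integer byte values as UTF-8 text."""
    return bytes(byte_values).decode("utf-8")

name = decode_utf8(escaped_bytes) + ".txt"

# Report the two leading characters as code points, so the output stays
# pure ASCII even on a terminal that cannot display CJK characters.
code_points = [f"U+{ord(ch):04X}" for ch in name[:2]]
print(code_points)
```

The six bytes form two three-byte UTF-8 sequences, one per character, which is what you would expect for CJK characters.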

> Any solution? The file was committed on Windows XP using Subclipse, so I
> suppose it is Big5. Why does it complain about "Can't convert string from
> 'UTF-8' to native encoding"?

Subversion uses UTF-8 internally to store all filenames. So, now the problem 
is rather 
1. how do I get these two chars in Mac OS X
2. how do I tell Subversion about it

In general, this is a task solved by so-called locales. Typing 'locale' at the 
console might give you a hint, other than that you need to do something OS X 
specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the 
job. However, I'm mostly guessing - I have never used these charactersets let 
alone on a Mac.
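
Why the locale matters can be sketched in Python. This is only an illustration under assumptions: Python's codecs stand in for the native-encoding conversion that Subversion performs, the list of encodings is a guess at what a client might be configured with, and can_encode is a hypothetical helper.

```python
# The file name from the error message, written with escapes.
filename = "\u4e2d\u6587.txt"

def can_encode(name, encoding):
    """Return True if every character of name exists in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Candidate "native" encodings a client locale might imply.
results = {enc: can_encode(filename, enc)
           for enc in ("ascii", "mac_roman", "big5", "utf-8")}

for enc, ok in results.items():
    print(enc, ok)
```

If the locale implies an encoding that lacks these two characters, the conversion from UTF-8 has nothing to map them to, which is consistent with the error shown above. A UTF-8 locale can represent the name.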

good luck

Uli
