Posted to users@subversion.apache.org by Carfield Yim <ca...@carfield.com.hk> on 2005/06/21 17:30:34 UTC
Problem of checkout Chinese file in MacOSX
Get the following message:
carfield:~/Documents/workspace/web_site/testdir carfield$ svn update
subversion/libsvn_subr/utf.c:453: (apr_err=22)
svn: Can't convert string from 'UTF-8' to native encoding:
subversion/libsvn_subr/utf.c:451: (apr_err=22)
svn: ?\228?\184?\173?\230?\150?\135.txt
Any solution? The file was committed on Windows XP using Subclipse, so I
suppose it is Big5. Why does it complain about "Can't convert string from
'UTF-8' to native encoding"?
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Re: Problem of checkout Chinese file in MacOSX
Posted by Ryan Schmidt <su...@ryandesign.com>.
On 24.06.2005, at 09:09, Ulrich Eckhardt wrote:
>> http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html
>> The default font encoding on some other platforms is ISO-Latin-1
>> or WinLatin-1. These are subsets of UTF-8 which means that files
>> or filenames can be turned into UTF-8 by just turning a byte into
>> a char.
>
> This is utter nonsense, UTF-8 uses up to six bytes for one character. This
> probably rather means that all three mentioned character encodings have ASCII
> as common subset or that the encodings can represent a subset of the
> characters representable in UTF-8.
It's only nonsense insofar as they meant to write "Unicode" where
they wrote "UTF-8". The first 256 code points of Unicode are the same
as the 256 code points of ISO-8859-1. The representation of those
codepoints in UTF-8 is, of course, different. I am not sure whether
the additional codepoints present in Windows Latin 1 are also at
those same codepoints in Unicode, but I don't think they are. So that
may be the second nonsense.
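To make the distinction concrete, here is a minimal Python sketch (not part of the original exchange). It assumes Python's built-in "latin-1", "cp1252" and "utf-8" codecs as stand-ins for ISO-8859-1, Windows Latin 1 and UTF-8; the variable names are only illustrative.

```python
# Every ISO-8859-1 byte value equals the Unicode code point it stands for.
latin1_matches_unicode = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

# The UTF-8 representation of such a code point is a different byte sequence.
e_acute = bytes([0xE9]).decode("latin-1")      # U+00E9
e_acute_utf8 = e_acute.encode("utf-8")         # two bytes, not the single byte 0xE9

# Windows Latin 1 (cp1252) assigns printable characters in 0x80-0x9F,
# and those do not sit at the same code points in Unicode.
euro_code_point = ord(bytes([0x80]).decode("cp1252"))   # the euro sign

print(latin1_matches_unicode, e_acute_utf8, hex(euro_code_point))
```

So "turning a byte into a char" yields the right Unicode code point for ISO-8859-1 input, but it neither produces UTF-8 bytes nor handles the extra Windows Latin 1 characters.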
Re: Problem of checkout Chinese file in MacOSX
Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
> http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html
>
> Font Encoding
>
> The default font encoding in Mac OS X is MacRoman.
[...]
> The default font
> encoding on some other platforms is ISO-Latin-1 or WinLatin-1. These are
> subsets of UTF-8 which means that files or filenames can be turned into
> UTF-8 by just turning a byte into a char.
This is utter nonsense, UTF-8 uses up to six bytes for one character. This
probably rather means that all three mentioned character encodings have ASCII
as common subset or that the encodings can represent a subset of the
characters representable in UTF-8.
> Programs that assume this behavior cause problems in Mac OS X.
Well, that's not unexpected, see above.
> If you do not specify a font encoding explicitly, recognize that:
>
> * In the conversion from Unicode to MacRoman you may lose information.
>
> * Filenames are not stored on disk in the default font encoding, but
>   in UTF-8. Usually this isn’t a problem, since most files are
>   handled in Java as |java.io.File|s, though it is good to be aware of.
>
> * Although filenames are stored on disk as UTF-8, they are stored
>   decomposed. This means certain characters, like e-acute (é) are
>   stored as two characters, “e”, followed by “´” (acute accent). The
>   default HFS+ filesystem of Mac OS X enforces this behavior. SMB
>   enforces composed Unicode characters. UFS and NFS do not specify
>   whether filenames are stored composed or decomposed, so they can
>   do either.
I believe this only relates to the encoding of fonts in Java, which also has its
own internal character encoding. However, it says that the default file name
encoding is UTF-8, which is nice to know (once I finally have my Mac...).
Uli
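The composed/decomposed point in the quoted Apple text can be illustrated with a short Python sketch (not from the thread). It assumes that Unicode normalization form NFD is a reasonable approximation of what the quoted text calls "decomposed"; the helper `same_name` is hypothetical.

```python
import unicodedata

composed = "\u00e9"                                   # e-acute as one code point
decomposed = unicodedata.normalize("NFD", composed)   # "e" + combining acute accent

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to the composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two spellings differ as strings and as UTF-8 bytes,
# yet a normalizing comparison treats them as the same name.
print(composed == decomposed, same_name(composed, decomposed))
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
```

A tool that compares filenames byte for byte would therefore see two different names where a user sees one, which is why the storage form is worth being aware of.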
Re: Problem of checkout Chinese file in MacOSX
Posted by Carfield Yim <ca...@carfield.com.hk>.
>Subversion uses UTF-8 internally to store all filenames. So, now the problem
>is rather
>1. how do I get these two chars in Mac OS X
>2. how do I tell Subversion about it
>
>In general, this is a task solved by so-called locales. Typing 'locale' at the
>console might give you a hint, other than that you need to do something OS X
>specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the
>job. However, I'm mostly guessing - I have never used these character sets, let
>alone on a Mac.
>
>good luck
>
>Uli
>
>
I am in luck :-) A simple 'export LC_CTYPE=utf-8' works. In fact, UTF-8
and MacRoman are very similar:
http://developer.apple.com/documentation/Java/Conceptual/Java14Development/04-JavaUIToolkits/JavaUIToolkits.html
Font Encoding
The default font encoding in Mac OS X is MacRoman. The default font
encoding on some other platforms is ISO-Latin-1 or WinLatin-1. These are
subsets of UTF-8 which means that files or filenames can be turned into
UTF-8 by just turning a byte into a char. Programs that assume this
behavior cause problems in Mac OS X.
The simplest way to work around this problem is to specify a font
encoding explicitly rather than assuming one.
If you do not specify a font encoding explicitly, recognize that:
* In the conversion from Unicode to MacRoman you may lose information.

* Filenames are not stored on disk in the default font encoding, but
  in UTF-8. Usually this isn’t a problem, since most files are
  handled in Java as |java.io.File|s, though it is good to be aware of.

* Although filenames are stored on disk as UTF-8, they are stored
  decomposed. This means certain characters, like e-acute (é) are
  stored as two characters, “e”, followed by “´” (acute accent). The
  default HFS+ filesystem of Mac OS X enforces this behavior. SMB
  enforces composed Unicode characters. UFS and NFS do not specify
  whether filenames are stored composed or decomposed, so they can
  do either.
Re: Problem of checkout Chinese file in MacOSX
Posted by Ben Collins-Sussman <su...@collab.net>.
On Jun 23, 2005, at 3:44 AM, Ulrich Eckhardt wrote:
>
> In general, this is a task solved by so-called locales. Typing 'locale' at the
> console might give you a hint, other than that you need to do something OS X
> specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the
> job. However, I'm mostly guessing - I have never used these character sets, let
> alone on a Mac.
The localization behaviors of Subversion are explained here, if it's
any help:
http://svnbook.red-bean.com/en/1.1/ch07s06.html
Re: Problem of checkout Chinese file in MacOSX
Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
Carfield Yim wrote:
> Get the following message:
>
> carfield:~/Documents/workspace/web_site/testdir carfield$ svn update
> subversion/libsvn_subr/utf.c:453: (apr_err=22)
> svn: Can't convert string from 'UTF-8' to native encoding:
> subversion/libsvn_subr/utf.c:451: (apr_err=22)
> svn: ?\228?\184?\173?\230?\150?\135.txt
I decoded this Unicode byte sequence and it boils down to two CJK codepoints
4e2d and 6587, followed by '.txt', so this data is at least plausible.
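The decoding step can be reproduced with a small Python sketch (not part of the original message). It assumes that each `?\DDD` group in the svn output is one byte written as a three-digit decimal number, and that the bytes are UTF-8; the function name `decode_svn_escapes` is hypothetical.

```python
import re

def decode_svn_escapes(escaped: str) -> str:
    """Rebuild text from '?\\DDD' decimal byte escapes, assuming UTF-8 bytes."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\?\\(\d{3})", escaped):
        out += escaped[pos:match.start()].encode("ascii")  # literal ASCII part
        out.append(int(match.group(1)))                    # one escaped byte
        pos = match.end()
    out += escaped[pos:].encode("ascii")
    return out.decode("utf-8")

name = decode_svn_escapes(r"?\228?\184?\173?\230?\150?\135.txt")
code_points = [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 0x7F]
print(name, code_points)
```

The six escaped bytes form two three-byte UTF-8 sequences, one per CJK character, followed by the plain ASCII suffix.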
> Any solution? The file was committed on Windows XP using Subclipse, so I
> suppose it is Big5. Why does it complain about "Can't convert string from
> 'UTF-8' to native encoding"?
Subversion uses UTF-8 internally to store all filenames. So, now the problem
is rather
1. how do I get these two chars in Mac OS X
2. how do I tell Subversion about it
In general, this is a task solved by so-called locales. Typing 'locale' at the
console might give you a hint, other than that you need to do something OS X
specific. It's possible that a simple 'export LC_CTYPE=utf8' already does the
job. However, I'm mostly guessing - I have never used these character sets, let
alone on a Mac.
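As a rough illustration of why the locale matters, the following Python sketch (not from the thread) checks whether the file name from the error message can be written in a few candidate encodings. It only imitates the kind of conversion Subversion attempts and is not Subversion's own code; the helper `can_represent` and the list of encodings are assumptions chosen for the example.

```python
def can_represent(filename: str, encoding: str) -> bool:
    """Return True if every character of filename exists in the given encoding."""
    try:
        filename.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The two CJK characters from the error message, followed by ".txt".
filename = "\u4e2d\u6587.txt"

results = {
    enc: can_represent(filename, enc)
    for enc in ("ascii", "mac_roman", "big5", "utf-8")
}
print(results)
```

A native encoding that lacks the two characters makes the conversion fail, while a UTF-8 locale can hold any name stored in the repository. In Python, `locale.getpreferredencoding(False)` reports the encoding implied by the current locale settings, which may help when checking what 'locale' prints at the console.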
good luck
Uli