Posted to dev@subversion.apache.org by Marcus Comstedt <ma...@mc.pp.se> on 2002/06/01 00:53:34 UTC
Re: charset neutral? pls solve this
"Bill Tutt" <ra...@lyra.org> writes:
> > UCS-2 is really useful --- almost essential --- if you are
> > manipulating Unicode characters. So, if svn is reformatting strings
> > to word-wrap, or is translating between encodings, it really does want
> > to be using UCS-2 for that.
> >
>
> A big +20 to that. UCS-2/UTF-16 is so much easier to process than UTF-8
> sequences.
Actually, Subversion doesn't do any processing that would be easier to
do with UCS-2/UTF-16 as far as I have seen. Mostly the strings are
just passed around, concatenated, and split on ASCII characters
(which have identity encoding in UTF-8). And UCS-2/UTF-16 sucks as
they can't encode the full range of ISO-10646. If you are going to
use wide characters, UCS-4 is the way to go. But for the particular
application of Subversion, I don't see a problem with using UTF-8. If
you find a place in the code where using wide chars would have been
easier, please point it out.
// Marcus
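
The point about splitting on ASCII characters can be illustrated with a small sketch. It is not Subversion code; the path and the helper name `split_utf8_path` are made up for the example. It relies on the UTF-8 property that every byte of a multibyte sequence has its high bit set, so an ASCII separator such as "/" can never appear inside an encoded non-ASCII character.

```python
def split_utf8_path(path: bytes) -> list:
    """Split a UTF-8 encoded path on the ASCII byte '/' without decoding it."""
    return path.split(b"/")


# Hypothetical path containing non-ASCII characters.
name = "trunk/Čibej/näme.txt"
encoded = name.encode("utf-8")

# Split at the byte level, then decode each piece separately.
parts = split_utf8_path(encoded)
decoded = [p.decode("utf-8") for p in parts]

# Every byte belonging to a non-ASCII character is >= 0x80,
# so none of them can be mistaken for '/' (0x2F).
multibyte_bytes = [b for ch in name if ord(ch) > 0x7F for b in ch.encode("utf-8")]
```

Splitting the raw bytes gives the same components as splitting the decoded string, which is why no wide-character representation is needed for this kind of operation.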
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
RE: Re: charset neutral? pls solve this
Posted by Bill Tutt <ra...@lyra.org>.
> From: Marcus Comstedt [mailto:marcus@mc.pp.se]
>
> Branko Čibej <br...@xbc.nu> writes:
>
> > You're mixing apples and oranges. UCS-2 indeed can't encode the whole
> > range. UTF-16 can. They're not the same.
>
> Nope. You are wrong. Sorry. And I'm not mixing them to any further
> extent than to put them both into the "partial ISO-10646 salad".
>
> UTF-16 can encode 65536-2048+1048576 = 1112064 characters.
>
> The whole ISO-10646 range is 2147483648 characters, so UTF-16 only
> covers about 0.05%.
>
> Thus, unfortunately neither the apples nor the oranges are
> sufficient. The bananas and the pears (UCS-4 and UTF-8) are though.
>
>
Your "whole range" for ISO 10646 is still inherently what UTF-16 can
handle.
Re: charset neutral? pls solve this
Posted by Marcus Comstedt <ma...@mc.pp.se>.
Branko Čibej <br...@xbc.nu> writes:
> You're mixing apples and oranges. UCS-2 indeed can't encode the whole
> range. UTF-16 can. They're not the same.
Nope. You are wrong. Sorry. And I'm not mixing them to any further
extent than to put them both into the "partial ISO-10646 salad".
UTF-16 can encode 65536-2048+1048576 = 1112064 characters.
The whole ISO-10646 range is 2147483648 characters, so UTF-16 only
covers about 0.05%.
Thus, unfortunately neither the apples nor the oranges are
sufficient. The bananas and the pears (UCS-4 and UTF-8) are though.
// Marcus
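
The arithmetic in the message above can be checked with a short sketch. The constant names are chosen for this example only; the 2^31 figure is the 31-bit code space that the message takes as the whole ISO-10646 range.

```python
# Code points reachable with a single 16-bit unit, before removing surrogates.
BMP_SIZE = 0x10000                # 65536

# The surrogate range D800-DFFF is reserved and encodes no characters itself.
SURROGATE_COUNT = 0x800           # 2048

# 1024 high surrogates x 1024 low surrogates reach the supplementary planes.
SUPPLEMENTARY = 0x400 * 0x400     # 1048576

utf16_total = BMP_SIZE - SURROGATE_COUNT + SUPPLEMENTARY

# The 31-bit code space used in the message as "the whole ISO-10646 range".
ucs_31bit_total = 2 ** 31

coverage_percent = 100 * utf16_total / ucs_31bit_total

print(utf16_total)                  # 1112064
print(round(coverage_percent, 2))   # 0.05
```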
Re: charset neutral? pls solve this
Posted by Branko Čibej <br...@xbc.nu>.
Marcus Comstedt wrote:
>Actually, Subversion doesn't do any processing that would be easier to
>do with UCS-2/UTF-16 as far as I have seen. Mostly the strings are
>just passed around, concatenated, and split on ASCII characters
>(which have identity encoding in UTF-8). And UCS-2/UTF-16 sucks as
>they can't encode the full range of ISO-10646.
>
You're mixing apples and oranges. UCS-2 indeed can't encode the whole
range. UTF-16 can. They're not the same.
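
The difference between UCS-2 and UTF-16 can be sketched as follows. UCS-2 stops at single 16-bit units, while UTF-16 represents code points above U+FFFF as a pair of surrogate units. The helper `to_surrogate_pair` is a name made up for this example, and the result is compared against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Return the (high, low) UTF-16 surrogate units for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-pair range")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) does not fit in one 16-bit unit.
high, low = to_surrogate_pair(0x1D11E)

# Python's own UTF-16 encoder, big-endian and without a byte order mark.
encoded = chr(0x1D11E).encode("utf-16-be")

print(hex(high), hex(low))   # 0xd834 0xdd1e
print(encoded.hex())         # d834dd1e
```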
> If you are going to
>use wide characters, UCS-4 is the way to go. But for the particular
>application of Subversion, I don't see a problem with using UTF-8. If
>you find a place in the code where using wide chars would have been
>easier, please point it out.
>
>
--
Brane Čibej <br...@xbc.nu> http://www.xbc.nu/brane/