Posted to dev@subversion.apache.org by Marcus Comstedt <ma...@mc.pp.se> on 2002/06/01 00:53:34 UTC

Re: charset neutral? pls solve this

"Bill Tutt" <ra...@lyra.org> writes:

> > UCS-2 is really useful --- almost essential --- if you are
> > manipulating Unicode characters.  So, if svn is reformatting strings
> > to word-wrap, or is translating between encodings, it really does want
> > to be using UCS-2 for that.
> > 
> 
> A big +20 to that. UCS-2/UTF-16 is so much easier to process than UTF-8
> sequences.

Actually, Subversion doesn't do any processing that would be easier to
do with UCS-2/UTF-16 as far as I have seen.  Mostly the strings are
just passed around, concatenated, and split on ASCII characters
(which have identity encoding in UTF-8).  And UCS-2/UTF-16 sucks as
they can't encode the full range of ISO-10646.  If you are going to
use wide characters, UCS-4 is the way to go.  But for the particular
application of Subversion, I don't see a problem with using UTF-8.  If
you find a place in the code where using wide chars would have been
easier, please point it out.
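
A minimal sketch of the point above, assuming a made-up path and "/" as the ASCII separator (neither is taken from the Subversion code): because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, splitting and concatenating on an ASCII byte can be done on the raw bytes without decoding.

```python
# Hypothetical path with non-ASCII components; purely illustrative.
path = "trunk/Čibej/naïve/日本語.txt"
encoded = path.encode("utf-8")

# Split on the ASCII byte 0x2F ("/") without decoding the string.
byte_parts = encoded.split(b"/")

# Each piece is still valid UTF-8, since 0x2F never occurs inside a
# multi-byte sequence (all of whose bytes are >= 0x80).
decoded_parts = [part.decode("utf-8") for part in byte_parts]

# Concatenating the pieces with the separator restores the original bytes.
rejoined = b"/".join(byte_parts)

print(decoded_parts)
print(rejoined == encoded)
```

The same reasoning applies to any other ASCII delimiter, which is why plain byte-string operations are enough for this kind of processing.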


  // Marcus



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: Re: charset neutral? pls solve this

Posted by Bill Tutt <ra...@lyra.org>.

> From: Marcus Comstedt [mailto:marcus@mc.pp.se]
> 
> Branko Čibej <br...@xbc.nu> writes:
> 
> > You're mixing apples and oranges. UCS-2 indeed can't encode the whole
> > range. UTF-16 can. They're not the same.
> 
> Nope.  You are wrong.  Sorry.  And I'm not mixing them to any further
> extent than to put them both into the "partial ISO-10646 salad".
> 
> UTF-16 can encode 65536-2048+1048576 = 1112064 characters.
> 
> The whole ISO-10646 range is 2147483648 characters, so UTF-16 only
> covers about 0.05%.
> 
> Thus, unfortunately neither the apples nor the oranges are
> sufficient.  The bananas and the pears (UCS-4 and UTF-8) are though.
> 
> 

Your "whole range" for ISO 10646 is still inherently what UTF-16 can
handle.

Re: charset neutral? pls solve this

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Branko Čibej <br...@xbc.nu> writes:

> You're mixing apples and oranges. UCS-2 indeed can't encode the whole
> range. UTF-16 can. They're not the same.

Nope.  You are wrong.  Sorry.  And I'm not mixing them to any further
extent than to put them both into the "partial ISO-10646 salad".

UTF-16 can encode 65536-2048+1048576 = 1112064 characters.

The whole ISO-10646 range is 2147483648 characters, so UTF-16 only
covers about 0.05%.

Thus, unfortunately neither the apples nor the oranges are
sufficient.  The bananas and the pears (UCS-4 and UTF-8) are though.
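
The arithmetic above can be checked with a short sketch. The variable names are illustrative only, and the 2147483648 figure is the 31-bit range quoted in this message.

```python
# Code points reachable with a single 16-bit unit, before removing surrogates.
single_unit_values = 2 ** 16          # 65536
# Values D800..DFFF are reserved as surrogates and encode nothing on their own.
surrogate_values = 0xE000 - 0xD800    # 2048
# 1024 high surrogates x 1024 low surrogates.
surrogate_pairs = 1024 * 1024         # 1048576

utf16_total = single_unit_values - surrogate_values + surrogate_pairs

# The 31-bit range cited in the message (an assumption carried over from it).
iso10646_total = 2 ** 31              # 2147483648

coverage_percent = 100 * utf16_total / iso10646_total

print(utf16_total)
print(f"{coverage_percent:.3f}%")
```

The total also equals the code points 0 through 0x10FFFF minus the 2048 surrogates, which is another way to arrive at 1112064.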


  // Marcus




Re: charset neutral? pls solve this

Posted by Branko Čibej <br...@xbc.nu>.
Marcus Comstedt wrote:

>Actually, Subversion doesn't do any processing that would be easier to
>do with UCS-2/UTF-16 as far as I have seen.  Mostly the strings are
>just passed around, concatenated, and split on ASCII characters
>(which have identity encoding in UTF-8).  And UCS-2/UTF-16 sucks as
>they can't encode the full range of ISO-10646.
>
You're mixing apples and oranges. UCS-2 indeed can't encode the whole 
range. UTF-16 can. They're not the same.

>  If you are going to
>use wide characters, UCS-4 is the way to go.  But for the particular
>application of Subversion, I don't see a problem with using UTF-8.  If
>you find a place in the code where using wide chars would have been
>easier, please point it out.
>  
>


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


