Posted to dev@subversion.apache.org by Branko Čibej <br...@xbc.nu> on 2002/06/02 21:09:47 UTC

Re: use of UTF-8

Greg Stein wrote:

>Sheesh. Of course I can see it. And it is a very wrong position, when we can
>so *easily* just say "it is UTF-8" and be done with it. That opens up a
>whole world of simplicity and determinism for the applications that will be
>built on top of Subversion.
Um. I'd rather say it opens up a huge can of very hungry carnivorous 
worms. While it might be true that you can trust the locale settings on 
most machines today (something I'm not at all sure about), you can't 
trust programs. On Windows, for instance, I can set notepad as my 
$EDITOR, then go and save the log message as UTF-8 or two different 
kinds of UTF-16 (big- and little-endian). My locale info says I'm using 
codepage 1250. Converting that text would produce ... interesting? ... 
results.
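A minimal Python sketch of the failure mode described above. It assumes the log message reaches the client as raw bytes and that the locale reports codepage 1250. `naive_convert` trusts the locale blindly. `sniff_and_convert` is a hypothetical mitigation that honours a byte-order mark (BOM) before falling back to the locale. Both function names are illustrative only and are not part of Subversion.

```python
import codecs

# Assumption for illustration: the locale reports Windows codepage 1250.
LOCALE_ENCODING = "cp1250"


def naive_convert(raw: bytes, locale_encoding: str = LOCALE_ENCODING) -> str:
    """Decode the editor's output using the locale encoding, no questions asked."""
    return raw.decode(locale_encoding, errors="replace")


# Byte-order marks an editor such as Notepad may prepend. UTF-32 is not handled.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_and_convert(raw: bytes, locale_encoding: str = LOCALE_ENCODING) -> str:
    """Hypothetical mitigation: trust a BOM if there is one, else the locale."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # No BOM: nothing better to go on than the locale setting.
    return naive_convert(raw, locale_encoding)
```

The BOM check only helps when the editor writes one. A UTF-8 file saved without a BOM still falls through to codepage 1250 and comes out garbled, which is the core of the objection: the locale says nothing reliable about what the editor actually wrote.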


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: use of UTF-8

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Colin Putney <co...@whistler.com> writes:
> I'm wondering if this boils down to a question of what the 1.0
> behaviour will be. I'm pretty convinced that the email-like approach is
> the way to go, but it does require some changes to the existing codebase.
> 
> Is this something that should be part of the I18N work that will be
> done after 1.0? How much of the desire for UTF-8 is really a desire to
> get 1.0 out the door?

I don't think this relates to 1.0 much at all.  (We might make the
discovery that UTF-8 conversion is a good idea, or that it's a bad
idea, at any point between now and 1.0, or some time after 1.0.)




Re: use of UTF-8

Posted by Colin Putney <co...@whistler.com>.
On Monday, June 3, 2002, at 12:18  PM, Karl Fogel wrote:

> Branko Čibej <br...@xbc.nu> writes:
>> Um. I'd rather say it opens up a huge can of very hungry carnivorous
>> worms. While it might be true that you can trust the locale settings
>> on most machines today (something I'm not at all sure about), you
>> can't trust programs. On Windows, for instance, I can set notepad as
>> my $EDITOR, then go and save the log message as UTF-8 or two different
>> kinds of UTF-16 (big- and little-endian). My locale info says I'm
>> using codepage 1250. Converting that text would produce
>> ... interesting? ... results.
>
> I'm still worried about this scenario too, but the reason I'm willing
> to risk it is that we can change Subversion if we discover we were
> wrong.  So let's see how often problems happen in practice.  After
> all, if conversion to UTF-8 *does* corrupt log messages in real life,
> then we can simply say "Well, that was a mistake", and
> backwards-compatibly change the client libraries' behavior.
>
> It would be simple enough to switch to email/mime-like behavior.  Just
> stop converting to UTF-8, and start storing the literal bits of the
> log message, along with a best guess at the encoding in which they
> were written -- i.e., a new revision prop, `svn:log-message-encoding'
> or whatever.  Revisions that don't have that property are assumed to
> be in UTF-8.

I'm wondering if this boils down to a question of what the 1.0 behaviour 
will be. I'm pretty convinced that the email-like approach is the way to go,
but it does require some changes to the existing codebase.

Is this something that should be part of the I18N work that will be done 
after 1.0? How much of the desire for UTF-8 is really a desire to get 
1.0 out the door?


Colin Putney
Whistler.com



Re: use of UTF-8

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Branko Čibej <br...@xbc.nu> writes:
> Um. I'd rather say it opens up a huge can of very hungry carnivorous
> worms. While it might be true that you can trust the locale settings
> on most machines today (something I'm not at all sure about), you
> can't trust programs. On Windows, for instance, I can set notepad as
> my $EDITOR, then go and save the log message as UTF-8 or two different
> kinds of UTF-16 (big- and little-endian). My locale info says I'm
> using codepage 1250. Converting that text would produce
> ... interesting? ... results.

I'm still worried about this scenario too, but the reason I'm willing
to risk it is that we can change Subversion if we discover we were
wrong.  So let's see how often problems happen in practice.  After
all, if conversion to UTF-8 *does* corrupt log messages in real life,
then we can simply say "Well, that was a mistake", and
backwards-compatibly change the client libraries' behavior.

It would be simple enough to switch to email/mime-like behavior.  Just
stop converting to UTF-8, and start storing the literal bits of the
log message, along with a best guess at the encoding in which they
were written -- i.e., a new revision prop, `svn:log-message-encoding'
or whatever.  Revisions that don't have that property are assumed to
be in UTF-8.
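A minimal sketch of the proposed email/MIME-like storage, assuming revision properties can be modelled as a Python dict of byte strings. `svn:log` holds the log message; `svn:log-message-encoding` is the tentative name floated above ("or whatever"), not an existing Subversion property. `store_log_message` and `read_log_message` are hypothetical names used only for illustration.

```python
from typing import Dict, Optional

LOG_PROP = "svn:log"
# Hypothetical property name from the proposal; not a real Subversion property.
ENCODING_PROP = "svn:log-message-encoding"


def store_log_message(revprops: Dict[str, bytes], raw: bytes,
                      guessed_encoding: Optional[str]) -> None:
    """Store the literal bytes, plus the client's best guess at their encoding."""
    revprops[LOG_PROP] = raw  # no conversion: keep the bits exactly as written
    if guessed_encoding is not None:
        revprops[ENCODING_PROP] = guessed_encoding.encode("ascii")


def read_log_message(revprops: Dict[str, bytes]) -> str:
    """Decode using the recorded encoding; a missing property means UTF-8."""
    raw = revprops.get(LOG_PROP, b"")
    recorded = revprops.get(ENCODING_PROP)
    encoding = recorded.decode("ascii") if recorded is not None else "utf-8"
    # The guess may still be wrong; replace undecodable bytes rather than fail.
    return raw.decode(encoding, errors="replace")
```

Because a missing property means UTF-8, revisions written under the current convert-to-UTF-8 scheme stay readable without any change, which is what would make the switch backwards-compatible.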
