Posted to users@subversion.apache.org by Norbert Unterberg <ne...@gmx.net> on 2004/11/21 10:02:49 UTC

Subversion Unicode Support

Hi all,

After trying some things, I wonder what level of UNICODE support 
subversion has or is supposed to have.

What do I need to do to check in UNICODE text files (which encoding is 
supported), so that subversion still treats them as text files, doing 
all sorts of diff and CR/LF conversion?

Adding a UTF-16 file (with/without BOM) adds it as binary (mime-type: 
application/octet-stream). svn diff is not possible. Changing the 
mime-type to text/plain allows a diff, but the output is not displayed 
with the correct encoding.
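
To illustrate why a UTF-16 file is likely to be classified as binary,
here is a minimal Python sketch. It assumes a simple content heuristic
that flags any data containing NUL bytes; the name looks_binary is
hypothetical, and this is not Subversion's actual detection code.

```python
def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: treat any data containing a NUL byte as binary."""
    return b"\x00" in data


text = "Hello\r\n"
utf8 = text.encode("utf-8")       # ASCII characters stay one byte each
utf16 = text.encode("utf-16-le")  # every ASCII character gains a 0x00 byte

print(looks_binary(utf8))
print(looks_binary(utf16))
```

Under this assumption, plain ASCII text encoded as UTF-16 trips the
check on its very first character, with or without a BOM.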

Am I doing something wrong, or does svn not support Unicode text files 
at all?

Norbert


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Subversion Unicode Support

Posted by Norbert Unterberg <ne...@gmx.net>.
Ulrich Eckhardt wrote:

> Subversion is totally ignorant of underlying file content, it treats all files 
> as binary blobs. 

I think this was not the best decision in the design of Subversion. 
After all, Subversion does have some support for text files (it 
supports different CR/LF styles). Subversion would be a better system 
if it treated text files specially: a text file is a sequence of lines 
that has a particular encoding (UTF-8, UTF-16, with/without BOM, ASCII) 
and a particular end-of-line style (CRLF, CR, LF). Then many of these 
strange problems would not exist.

However, I'm saying this without deeper knowledge of the details of 
Subversion and character encodings, and without much thought. Maybe 
there is much more behind this than I can see now.

> If I were you, I 
> would consider a) dropping UTF-16 altogether and b) storing files in UTF-8, 

This would not be easy.
a) The native encoding for WIN32 UNICODE applications is UTF-16, and it 
would require an additional resource handling layer to switch to UTF-8.
b) We also edit resource text files for an embedded target that uses 
UTF-16 encoding.

Changing the implementation of a project just because a tool lacks some 
features would not be a good idea. However, in our current project 
there are only a few UTF-16 files; all the source files are still 
encoded in the good old 8-bit Windows ANSI code page 1252.



Re: Subversion Unicode Support

Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
On Sunday 21 November 2004 11:02, Norbert Unterberg wrote:
> After trying some things, I wonder what level of UNICODE support
> subversion has or is supposed to have.

Subversion is totally ignorant of underlying file content, it treats all files 
as binary blobs. 

> What do I need to do to check in UNICODE text files (which encoding is
> supported), so that subversion still treats them as text files, doing
> all sorts of diff and CR/LF conversion?
>
> Adding a UTF-16 file (with/without BOM) adds it as binary (mime-type:
> application/octet-stream). svn diff is not possible. Changing the
> mime-type to text/plain allows a diff, but the output is not displayed
> with the correct encoding.

Now here we come to the exception to the above rule: there is support for 
treating a file as text, but that is mostly limited to ASCII; maybe it still 
works more or less with other single-byte charsets or even UTF-8. If I were 
you, I would consider a) dropping UTF-16 altogether and b) storing files in 
UTF-8, because UTF-16 combines the worst of two worlds anyway. That's just my 
opinion though, and I'm aware that for those targeting MS Windows it is 
tempting to use the native encoding, but let's not discuss that here. 
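
For the UTF-8 route, a one-off conversion could look roughly like the
following Python sketch. The function name utf16_to_utf8 is
hypothetical, and the fallback to little-endian for files without a BOM
is an assumption (it matches typical Windows output, but check your own
files).

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8 (without a BOM)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and uses it to pick the byte order.
        text = data.decode("utf-16")
    else:
        # Assumption: BOM-less input is little-endian.
        text = data.decode("utf-16-le")
    # Line endings are left untouched because we never open a text-mode file.
    return text.encode("utf-8")


sample = codecs.BOM_UTF16_LE + "Grüße\r\n".encode("utf-16-le")
print(utf16_to_utf8(sample))
```

Working on bytes rather than text-mode files keeps the existing CR/LF
style intact, so end-of-line handling stays a separate decision.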

In order to allow diffing of other files, Subversion will get a pluggable diff 
system, so you can diff files however you want and even extend that to new 
file formats yourself. Until then, on the client side, you still have the 
'original' in the .svn subdir which you can use to diff with your changes. 
Everything else (i.e. diffing against older versions or merging in changes) 
will require repository access.
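
As a rough sketch of such a client-side diff, the following Python
decodes both versions before comparing them line by line. The function
name diff_utf16 and the labels "base" and "working" are hypothetical,
and the default "utf-16" codec assumes both inputs start with a BOM;
pass "utf-16-le" or "utf-16-be" explicitly otherwise.

```python
import difflib


def diff_utf16(old: bytes, new: bytes, encoding: str = "utf-16") -> str:
    """Return a unified diff of two encoded texts, compared as decoded lines."""
    old_lines = old.decode(encoding).splitlines(keepends=True)
    new_lines = new.decode(encoding).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile="base", tofile="working")
    )


base = "first\r\nsecond\r\n".encode("utf-16")
working = "first\r\nchanged\r\n".encode("utf-16")
print(diff_utf16(base, working))
```

In practice the two byte strings would be read in binary mode from the
copy in the .svn subdir and from the working file; where that copy
lives depends on the Subversion version, so the paths are left to the
caller.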

hth

Uli


Re: Subversion Unicode Support

Posted by kf...@collab.net.
Norbert Unterberg <ne...@gmx.net> writes:
> Am I doing something wrong, or does svn not support unicode textfiles
> at all?

Subversion doesn't (yet) know how to treat UTF-16 files as text for
purposes of eol-translation.  It can handle UTF-8, of course.  Sorry.
