Posted to users@subversion.apache.org by Norbert Unterberg <ne...@gmx.net> on 2004/11/21 10:02:49 UTC
Subversion Unicode Support
Hi all,
After trying some things, I wonder what level of UNICODE support
subversion has or is supposed to have.
What do I need to do to check in UNICODE text files (which encoding is
supported), so that subversion still treats them as text files, doing
all sorts of diff and CR/LF conversion?
Adding a UTF-16 file (with/without BOM) adds it as binary (mime-type:
application/octet-stream). svn diff is not possible. Changing the
mime-type to text/plain allows a diff, but the output is not displayed
with the correct encoding.
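
As a rough illustration of why a UTF-16 file tends to be classified as
binary: UTF-16 encodes every ASCII character with an accompanying zero
byte, and byte-oriented tools commonly take NUL bytes as a sign of binary
data. The sketch below only demonstrates that property; the helper name
contains_nul is made up for this example, and this is not Subversion's
actual detection code.

```python
# Illustrative only: shows why UTF-16 text tends to look "binary" to
# byte-oriented tools. This is NOT Subversion's detection logic.

def contains_nul(data: bytes) -> bool:
    """Return True if the byte string contains at least one NUL byte."""
    return b"\x00" in data


sample = "Hello\r\n"

utf8_bytes = sample.encode("utf-8")       # one byte per ASCII character
utf16_bytes = sample.encode("utf-16-le")  # two bytes per character, no BOM

print(contains_nul(utf8_bytes))   # False
print(contains_nul(utf16_bytes))  # True
```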
Am I doing something wrong, or does svn not support unicode textfiles at
all?
Norbert
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Re: Subversion Unicode Support
Posted by Norbert Unterberg <ne...@gmx.net>.
Ulrich Eckhardt wrote:
> Subversion is totally ignorant of underlying file content; it treats all files
> as binary blobs.
I think this was not the best decision in Subversion's design. After
all, Subversion does have some support for text files (it handles
different CR/LF styles). Subversion would be a better system if it
treated text files specially: a text file is a sequence of lines that
has a particular encoding (UTF-8, UTF-16, with/without BOM, ASCII) and
a particular end-of-line style (CRLF, CR, LF). Then many of these
strange problems would not exist.
However, I'm saying this without deeper knowledge of Subversion's
internals or of character encoding details, and without much thought.
Maybe there is much more behind this than I can see right now.
> If I were you, I
> would consider a) dropping UTF-16 altogether and b) storing files in UTF-8,
This would not be easy.
a) The native encoding for WIN32 UNICODE applications is UTF-16, and it
would require an additional resource handling layer to switch to UTF-8.
b) We also edit resource text files for an embedded target that uses
UTF-16 encoding.
Changing the implementation of a project just because a tool lacks some
features would not be a good idea. However, in our current project
there are only a few UTF-16 files; all the source files are still
encoded in the good old 8-bit Windows ANSI code page 1252.
Re: Subversion Unicode Support
Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
On Sunday 21 November 2004 11:02, Norbert Unterberg wrote:
> After trying some things, I wonder what level of UNICODE support
> subversion has or is supposed to have.
Subversion is totally ignorant of underlying file content; it treats all files
as binary blobs.
> What do I need to do to check in UNICODE text files (which encoding is
> supported), so that subversion still treats them as text files, doing
> all sorts of diff and CR/LF conversion?
>
> Adding a UTF-16 file (with/without BOM) adds it as binary (mime-type:
> application/octet-stream). svn diff is not possible. Changing the
> mime-type to text/plain allows a diff, but the output is not displayed
> with the correct encoding.
Now here we come to the exception to the above rule: there is support for
treating a file as text, but that is mostly limited to ASCII; it may still
work more or less with other single-byte charsets or even UTF-8. If I were
you, I would consider a) dropping UTF-16 altogether and b) storing files in
UTF-8, because UTF-16 combines the worst of two worlds anyway. That's just my
opinion though, and I'm aware that for those targeting MS Windows it
is tempting to use the native encoding, but let's not discuss that here.
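
If storing the files as UTF-8 is an option, the conversion itself is
small. A minimal Python sketch follows; the function name utf16_to_utf8
and the little-endian default for files without a BOM are assumptions
made for this example (little-endian is the usual byte order on Windows),
not anything Subversion provides.

```python
import codecs


def utf16_to_utf8(data: bytes, default_encoding: str = "utf-16-le") -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 (without a BOM).

    A leading byte order mark selects the byte order and is dropped.
    Without a BOM, `default_encoding` is used; that default is an
    assumption of this sketch and must match how the file was written.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode(default_encoding)
    return text.encode("utf-8")
```

The line endings are left untouched, so CRLF in the UTF-16 input stays
CRLF in the UTF-8 output.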
In order to allow diffing of other files, Subversion will get a pluggable diff
system, so you can diff files however you want and even extend that to new
file formats yourself. Until then, on the client side, you still have the
'original' in the .svn subdir which you can use to diff with your changes.
Everything else (i.e. diffing against older versions, merging in changes)
will require repository access.
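
Until such a diff system exists, two versions of a UTF-16 file can be
compared by decoding both and diffing the resulting text. The following
is a minimal sketch using Python's standard difflib; the function name
diff_utf16 and the labels "base" and "working" are made up for this
example, and both inputs are assumed to start with a BOM so that the
"utf-16" codec can pick the byte order.

```python
import difflib


def diff_utf16(old: bytes, new: bytes, encoding: str = "utf-16") -> str:
    """Return a unified diff of two UTF-16 encoded texts as one string."""
    # Decode first, so the comparison is made on characters and lines
    # rather than on raw bytes.
    old_lines = old.decode(encoding).splitlines(keepends=True)
    new_lines = new.decode(encoding).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile="base", tofile="working")
    )


if __name__ == "__main__":
    base = "first line\nsecond line\n".encode("utf-16")
    working = "first line\nchanged line\n".encode("utf-16")
    print(diff_utf16(base, working), end="")
```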
hth
Uli
Re: Subversion Unicode Support
Posted by kf...@collab.net.
Norbert Unterberg <ne...@gmx.net> writes:
> Am I doing something wrong, or does svn not support unicode textfiles
> at all?
Subversion doesn't (yet) know how to treat UTF-16 files as text for
purposes of eol-translation. It can handle UTF-8, of course. Sorry.
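
To illustrate why end-of-line translation needs to know the encoding,
here is a small Python sketch. It is not Subversion code; the function
names are invented for the example, and the input is assumed to be
UTF-16 little-endian without a BOM and to use bare LF line endings.
Replacing the single byte 0x0A, as one would for ASCII or UTF-8,
inserts one extra byte and shifts every following 16-bit unit, whereas
decoding first keeps the data valid.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-wise LF -> CRLF, as would be fine for ASCII or UTF-8."""
    return data.replace(b"\n", b"\r\n")


def utf16le_lf_to_crlf(data: bytes) -> bytes:
    """Encoding-aware LF -> CRLF for UTF-16-LE data without a BOM."""
    text = data.decode("utf-16-le")
    return text.replace("\n", "\r\n").encode("utf-16-le")


original = "a\nb".encode("utf-16-le")  # b'a\x00\n\x00b\x00'

broken = naive_lf_to_crlf(original)    # 7 bytes: no longer whole 16-bit units
fixed = utf16le_lf_to_crlf(original)   # still valid UTF-16-LE

print(len(broken) % 2)            # 1
print(fixed.decode("utf-16-le") == "a\r\nb")  # True
```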