Posted to users@subversion.apache.org by engelbert gruber <en...@gmail.com> on 2017/10/30 20:57:29 UTC

eol-style and utf-16

hi

checking in a file with eol-style native on unix : eol = 0x0a
checking it out on windows : 0x0a is replaced by 0x0d 0x0a

when the file is in utf-16 : eol is 0x00 0x0a
and when checked out on windows this becomes : 0x00 0x0d 0x0a

which breaks utf-16 as far as i understand it
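
To make the corruption concrete, here is a minimal Python sketch. It assumes the client does a plain byte-level LF to CRLF replacement with no knowledge of the encoding; the helper name naive_lf_to_crlf is made up for illustration and is not Subversion code.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-level EOL translation that knows nothing about the encoding."""
    return data.replace(b"\n", b"\r\n")


# "a\nb" in big-endian UTF-16: every code unit is two bytes.
encoded = "a\nb".encode("utf-16-be")
print(encoded.hex())      # 0061000a0062

converted = naive_lf_to_crlf(encoded)
print(converted.hex())    # 0061000d0a0062

# A single 0x0d byte was inserted, so the byte count is now odd and the
# two-byte code units after the newline are misaligned.
try:
    converted.decode("utf-16-be")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False
print(decodes_cleanly)    # False
```

A byte-level replacement also hits any 0x0a byte that is half of another character (for example U+010A), so the damage is not limited to real line ends.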

possible fixes:

* get utf-## aware
* add a charsize property
* document it
* recommend a non-native eol-style: LF, CR or CRLF

all the best
  e

Re: eol-style and utf-16

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Tue, 31 Oct 2017 10:11 +0100:
> On Mon, Oct 30, 2017 at 09:12:38PM -0400, Nico Kadel-Garcia wrote:
> > It doesn't do much for otehr UTF difficulties, but it sure avoids the
> > whole inconsistent EOL issues.
> 
> In my opinion the problem under discussion has nothing to do with eol-style.
> Rather, it is that UTF-16 must be treated as binary data in SVN.
> 
> The property svn:mime-type should be set to 'application/octet-stream'
> on UTF-16 files.

"application/octet-stream; charset=utf-16" should work too.  I don't
remember off the top of my head which tools consume the additional
information --- httpd mod_magic perhaps? --- but they exist.  (Sorry, I
don't have time to look up the details right now.)

> And setting svn:eol-style on a binary file is obviously
> not a good idea (unfortunately, these features are not mutually exclusive
> but they should be).
> 
> Adding UTF-16 support is not impossible but difficult because Subversion
> as a system assumes UTF-8 strings and won't work correctly with strings
> that contain embedded NUL bytes, and there are a lot of entry points
> for text data in the system.

I'm not sure which part of the system is not NUL-safe?  UTF-8 text files with
svn:eol-style set and embedded NULs seem to be handled correctly.

I agree that in principle it'd be possible to sniff the charset from the
svn:mime-type property and then <handwave>DTRT for UTF-16 files with
svn:eol-style</handwave>.  This will happen when someone implements it, aka,
patches welcome.
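
As a rough illustration of the handwave above, here is a Python sketch that reads the charset parameter from a MIME type string and translates newlines on decoded text instead of on raw bytes. This is not how Subversion works today; the function names charset_from_mime_type and translate_eol, and the UTF-8 fallback, are assumptions made for the example.

```python
from email.message import Message
from typing import Optional


def charset_from_mime_type(mime_type: str) -> Optional[str]:
    """Return the charset parameter of a MIME type string, or None."""
    msg = Message()
    msg["Content-Type"] = mime_type
    return msg.get_content_charset()


def translate_eol(data: bytes, mime_type: str, eol: str) -> bytes:
    """Rewrite every newline in data to eol, respecting the declared charset."""
    # Assumption: fall back to UTF-8 when no charset is declared.
    charset = charset_from_mime_type(mime_type) or "utf-8"
    text = data.decode(charset)
    # Normalize CRLF and lone CR to LF first, then emit the requested style.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", eol).encode(charset)


data = "a\nb".encode("utf-16-be")
out = translate_eol(data, "text/plain; charset=utf-16-be", "\r\n")
print(out.hex())  # 0061000d000a0062
```

Note that with a bare "utf-16" charset Python re-encodes with a BOM in the machine's native byte order, so a real implementation would have to remember the original byte order and BOM.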

Cheers,

Daniel

Re: eol-style and utf-16

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Oct 30, 2017 at 09:12:38PM -0400, Nico Kadel-Garcia wrote:
> On Mon, Oct 30, 2017 at 4:57 PM, engelbert gruber
> <en...@gmail.com> wrote:
> > hi
> >
> > checking in a file with eol-style native on unix : eol = 0x0a
> > checking it out on windows : 0x0a is replaced by 0x0d 0x0a
> >
> > when the file is in utf-16 : eol ist 0x00 0x0a
> > and when checked out on windows this becomes : 0x00 0x0d 0x0a
> >
> > which breaks utf-16 as far as i understand it
> >
> > possible fixes:
> >
> > * get utf-## aware
> > * add a charsize property
> > * document it
> > * recommend eol-style a nonnative eol-style: LF CR or CRLF
> >
> > all the best
> >   e
> 
> 
> So, easy solution. *Never* use eol-style.

I would not point at svn:eol-style as the root cause here.
This feature works fine with text files.

> It's destructive to any
> working copy that may be accessed via operating systems with distinct
> eol styles.

It works fine unless the operating system is so obscure that it uses
something other than LF, CRLF, or CR as a newline character.

> And its destructiveness is insidious when files are
> edited, locally, with editor that auto-interpret EOL on the fly,
> leading to inconsistent EOL and EOL confusion when creating new files
> in the repo.

If an editor decides to change all the newlines, this creates
a diff where every line in a text file appears as changed,
even if just a single line was modified by the editor's user.
That's a problem svn:eol-style can solve.

If an editor decides to create inconsistent newlines, it has broken
the file. All you can do now is treat it as a binary file because
text content cannot be split into lines anymore. I would put the
blame on the editor here.
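
For anyone who wants to check whether an editor has left a file in that state, a small Python sketch along these lines counts the newline styles in a byte string. The function name count_eol_styles is made up for the example, and it only makes sense for single-byte-compatible encodings such as ASCII or UTF-8.

```python
import re
from collections import Counter


def count_eol_styles(data: bytes) -> Counter:
    """Count CRLF, lone CR and lone LF line endings in data."""
    # CRLF is listed first so it is matched as one unit, not as CR then LF.
    return Counter(m.group() for m in re.finditer(rb"\r\n|\r|\n", data))


counts = count_eol_styles(b"one\r\ntwo\nthree\r\nfour\n")
# More than one key means the file mixes newline styles.
is_consistent = len(counts) <= 1
print(counts[b"\r\n"], counts[b"\n"], is_consistent)  # 2 2 False
```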

> It doesn't do much for otehr UTF difficulties, but it sure avoids the
> whole inconsistent EOL issues.

In my opinion the problem under discussion has nothing to do with eol-style.
Rather, it is that UTF-16 must be treated as binary data in SVN.

The property svn:mime-type should be set to 'application/octet-stream'
on UTF-16 files. And setting svn:eol-style on a binary file is obviously
not a good idea (unfortunately, these features are not mutually exclusive
but they should be).

Adding UTF-16 support is not impossible but difficult because Subversion
as a system assumes UTF-8 strings and won't work correctly with strings
that contain embedded NUL bytes, and there are a lot of entry points
for text data in the system.

Re: eol-style and utf-16

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Mon, Oct 30, 2017 at 4:57 PM, engelbert gruber
<en...@gmail.com> wrote:
> hi
>
> checking in a file with eol-style native on unix : eol = 0x0a
> checking it out on windows : 0x0a is replaced by 0x0d 0x0a
>
> when the file is in utf-16 : eol ist 0x00 0x0a
> and when checked out on windows this becomes : 0x00 0x0d 0x0a
>
> which breaks utf-16 as far as i understand it
>
> possible fixes:
>
> * get utf-## aware
> * add a charsize property
> * document it
> * recommend eol-style a nonnative eol-style: LF CR or CRLF
>
> all the best
>   e


So, easy solution. *Never* use eol-style. It's destructive to any
working copy that may be accessed via operating systems with distinct
eol styles. And its destructiveness is insidious when files are
edited, locally, with editors that auto-interpret EOL on the fly,
leading to inconsistent EOL and EOL confusion when creating new files
in the repo.

It doesn't do much for other UTF difficulties, but it sure avoids the
whole inconsistent EOL issues.

Re: eol-style and utf-16

Posted by engelbert gruber <en...@gmail.com>.
sorry one more

On 30 October 2017 at 21:57, engelbert gruber <en...@gmail.com>
wrote:

> checking in a file with eol-style native on unix : eol = 0x0a
> checking it out on windows : 0x0a is replaced by 0x0d 0x0a
>
> when the file is in utf-16 : eol ist 0x00 0x0a
> and when checked out on windows this becomes : 0x00 0x0d 0x0a
>
> which breaks utf-16 as far as i understand it
>
> possible fixes:
>
> * get utf-## aware
> * add a charsize property
> * document it
> * recommend eol-style a nonnative eol-style: LF CR or CRLF
>
>
* turning off eol-style is an option, but not in general, as subversion
  comes with eol-style native
  (i like to stick with defaults, to ease setting up systems and because
  subversion-maintainers are more knowledgeable than me)

setting svn:mime-type to 'application/octet-stream' shouldn't be necessary if
http://help.collab.net/index.jsp?topic=/faq/svnbinary.html

  Currently, Subversion only looks at the first 1024 bytes of the file; if
  any of the bytes are zero, or if more than 15 percent are not ASCII
  printing characters, then Subversion calls the file binary.

is correct. by this utf-16 will always be binary.
mine was a csv-file, but the problem might be that it was imported from a
CVS-repo
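
The heuristic quoted from the FAQ can be sketched in Python roughly as follows. This is written from the FAQ wording only, not from the Subversion source; the function name looks_binary and the exact set of bytes counted as "ASCII printing characters" (0x20 to 0x7e plus tab, LF and CR) are assumptions.

```python
def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Guess whether data is binary, following the FAQ's description."""
    sample = data[:sample_size]
    if not sample:
        return False
    # Any zero byte in the sample marks the file as binary.
    if 0 in sample:
        return True
    # Assumption: printable ASCII plus tab, LF and CR count as "printing".
    printing = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    non_printing = sum(1 for byte in sample if byte not in printing)
    return non_printing > 0.15 * len(sample)


csv_utf8 = "a,b\n1,2\n".encode("utf-8")
csv_utf16 = "a,b\n1,2\n".encode("utf-16-le")
print(looks_binary(csv_utf8), looks_binary(csv_utf16))  # False True
```

ASCII-range text stored as UTF-16 has a zero byte in every code unit, which is why such a csv-file would be classified as binary by this check.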

cheers
