Posted to users@subversion.apache.org by Brian McCann <Br...@viziant.net> on 2007/08/14 14:20:45 UTC

text and binary files in SVN

Hi, 
 
is there a way to mark file extensions as text rather than binary in svn? ".as" and ".mxml" files seem to be treated as binary...
 
How would this be done?
 
Thanks,
Brian
 

Re: text and binary files in SVN

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2007-08-14 16:43:19 +0200, Erik Huelsmann wrote:
> On 8/14/07, Vincent Lefevre <vi...@vinc17.org> wrote:
> > On 2007-08-14 16:25:10 +0200, Erik Huelsmann wrote:
> > > There's an algorithm to estimate whether files are binary or texty:
> > >
> > > Check the first 1024 bytes to be within the 0x20-0x7F and 0x07-0x0D
> > > regions. If more than 85% of the bytes fall in that region (and none
> > > were 0x00), then the file is probably texty.
> >
> > I wonder if non-occidental users would agree with you.
> 
> They don't have to. This is what currently defines texty and we've
> had no complaints. It's based on what diff considers texty.
      ^^^^^^^^^^^^^

This is not true (see below).

> > And what about UTF-16?
> 
> There's no support for wide characters in the built-in diff routine.
> You can use external diff routines, or provide a patch to support
> it...

There have been some complaints concerning UTF-16 (but the threads also
mention the problem of UTF-8 sometimes being recognized as binary), and
there's even an open issue:

  http://subversion.tigris.org/issues/show_bug.cgi?id=2194

> > One can have compressed XML files with text/xml mime-type. How does
> > Subversion handle that?
> 
> As incorrectly as the mime-type. Clearly a compressed XML file isn't
> text. More appropriate seems application/xml. Or even
> application/x-gzip+xml.

No, this is wrong. For instance, see /etc/mime.types distributed in
Debian:

#  Note: Compression schemes like "gzip", "bzip", and "compress" are not
#  actually "mime-types".  They are "encodings" and hence must _not_ have
#  entries in this file to map their extensions.  The "mime-type" of an
#  encoded file refers to the type of data that has been encoded, not the
#  type of encoding.

Apache behaves the same way: the compression is declared in a separate
header (Content-Encoding). That's HTTP/1.1 (RFC 2616) after all...
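
As a small illustration of that separation (a sketch using Python's standard
mimetypes module, not anything Subversion itself does; the file name
"report.xml.gz" is made up), the media type and the encoding come back as
two independent values:

```python
import mimetypes

# Hypothetical file name: an XML document that has been gzip-compressed.
media_type, encoding = mimetypes.guess_type("report.xml.gz")

# The media type describes the data that was encoded (some XML type,
# depending on the local MIME tables); the compression is reported
# separately as the encoding, much like Content-Type vs. Content-Encoding.
print("type:", media_type)
print("encoding:", encoding)
```

The point is only that "gzip" shows up as the encoding and never as the type.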

> > Also, for instance, is text/rtf more textual than application/x-sh
> > as far as diff is concerned?
> 
> Yes, because application/x-sh doesn't have a text/* mime-type.

But that's wrong: doing a textual diff on sh scripts makes more sense
than doing one on RTF files. Again, there have been several complaints.

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: text and binary files in SVN

Posted by Erik Huelsmann <eh...@gmail.com>.
On 8/14/07, Vincent Lefevre <vi...@vinc17.org> wrote:
> On 2007-08-14 16:25:10 +0200, Erik Huelsmann wrote:
> > There's an algorithm to estimate whether files are binary or texty:
> >
> > Check the first 1024 bytes to be within the 0x20-0x7F and 0x07-0x0D
> > regions. If more than 85% of the bytes fall in that region (and none
> > were 0x00), then the file is probably texty.
>
> I wonder if non-occidental users would agree with you.

They don't have to. This is what currently defines texty and we've
had no complaints. It's based on what diff considers texty.

> And what about UTF-16?

There's no support for wide characters in the built-in diff routine.
You can use external diff routines, or provide a patch to support
it...

> > So, if your files don't have that property, Subversion will consider
> > them binary, until you set a mime-type which makes it look texty
> > (starting with text/)
>
> One can have compressed XML files with text/xml mime-type. How does
> Subversion handle that?

As incorrectly as the mime-type. Clearly a compressed XML file isn't
text. More appropriate seems application/xml. Or even
application/x-gzip+xml.

> Also, for instance, is text/rtf more textual
> than application/x-sh as far as diff is concerned?

Yes, because application/x-sh doesn't have a text/* mime-type.

Bye,

Erik.


Re: text and binary files in SVN

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2007-08-14 16:25:10 +0200, Erik Huelsmann wrote:
> There's an algorithm to estimate whether files are binary or texty:
> 
> Check the first 1024 bytes to be within the 0x20-0x7F and 0x07-0x0D
> regions. If more than 85% of the bytes fall in that region (and none
> were 0x00), then the file is probably texty.

I wonder if non-occidental users would agree with you.
And what about UTF-16?

> So, if your files don't have that property, Subversion will consider
> them binary, until you set a mime-type which makes it look texty
> (starting with text/)

One can have compressed XML files with text/xml mime-type. How does
Subversion handle that? Also, for instance, is text/rtf more textual
than application/x-sh as far as diff is concerned?



Re: text and binary files in SVN

Posted by Erik Huelsmann <eh...@gmail.com>.
On 8/14/07, Brian McCann <Br...@viziant.net> wrote:
>
> How does svn handle ".as" and ".mxml" files already committed to the
> repository? After setting the mime type, will files already in the
> repository be treated as text, or do they remain binary, with only new
> ".as" and ".mxml" files being set as text?

They remain as they were (in this case meaning they'll be interpreted
as binary).
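
For files that are already versioned, the property has to be set on each of
them explicitly. The following is a rough sketch only: it walks a working
copy and builds the "svn propset" commands without running them. The helper
name propset_commands and the extension-to-type mapping (text/plain for
".as", text/xml for ".mxml") are assumptions made for illustration; any
type starting with text/ should have the same effect.

```python
import os

# Assumed mapping for illustration; pick the text/* types that suit you.
MIME_BY_EXTENSION = {".as": "text/plain", ".mxml": "text/xml"}


def propset_commands(root):
    """Return one 'svn propset svn:mime-type ...' argv list per matching file."""
    commands = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Subversion's administrative directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in sorted(filenames):
            extension = os.path.splitext(name)[1].lower()
            mime_type = MIME_BY_EXTENSION.get(extension)
            if mime_type is not None:
                path = os.path.join(dirpath, name)
                commands.append(
                    ["svn", "propset", "svn:mime-type", mime_type, path]
                )
    return commands
```

Each returned list can be printed for review or passed to subprocess.run();
the property changes then still need to be committed.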

HTH,

Erik.


Re: text and binary files in SVN

Posted by Erik Huelsmann <eh...@gmail.com>.
On 8/14/07, Brian McCann <Br...@viziant.net> wrote:
>
> Hi,
>
> is there a way to mark file extensions as text rather than binary in svn?
> ".as" and ".mxml" files seem to be treated as binary...
>
> How would this be done?

I posted about that just days ago; the archive should hold the message by now.

There's an algorithm to estimate whether files are binary or texty:

Check the first 1024 bytes to be within the 0x20-0x7F and 0x07-0x0D
regions. If more than 85% of the bytes fall in that region (and none
were 0x00), then the file is probably texty.

So, if your files don't have that property, Subversion will consider
them binary until you set a mime-type that makes them look texty
(one starting with text/).
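
The heuristic described above could be sketched roughly as follows. This is
a paraphrase of the description in this message, not Subversion's actual
code; the function name looks_texty and the treatment of an empty file as
text are assumptions.

```python
def looks_texty(data: bytes, sample_size: int = 1024,
                threshold: float = 0.85) -> bool:
    """Guess whether data is text, following the heuristic described above."""
    sample = data[:sample_size]
    if not sample:
        # Assumption: an empty file is treated as text.
        return True
    if 0 in sample:
        # Any 0x00 byte marks the sample as binary.
        return False
    # Count bytes in the 0x20-0x7F and 0x07-0x0D regions.
    texty = sum(1 for b in sample if 0x20 <= b <= 0x7F or 0x07 <= b <= 0x0D)
    # "More than 85%" of the sampled bytes must fall in those regions.
    return texty / len(sample) > threshold


print(looks_texty(b"plain ASCII text\n"))
print(looks_texty("text".encode("utf-16")))
print(looks_texty("日本語のテキスト".encode("utf-8")))
```

Under these rules the ASCII sample passes, the UTF-16 sample fails because
it contains 0x00 bytes, and the UTF-8 sample fails because none of its
bytes fall in the listed regions.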

HTH,

Erik.


Re: text and binary files in SVN

Posted by Michael Haggerty <mh...@alum.mit.edu>.
Brian McCann wrote:
> is there a way to mark file extensions as text rather than binary in
> svn? ".as" and ".mxml" files seem to be treated as binary...
>  
> How would this be done?

Aside from the mime type, which affects whether file diffs are shown,
there is another aspect of "textiness", namely svn:eol-style.  This
property specifies whether end-of-line characters are normalized to LF,
CRLF, CR, or the client's native style when the files are checked in and
out.  This property can be set via auto-props or manually using "svn
propset".
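
As an illustration of the auto-props route, the client-side Subversion
config file (usually ~/.subversion/config on Unix) could contain something
like the fragment below. The particular values chosen for ".as" and ".mxml"
are assumptions; adjust them to your project.

```ini
[miscellany]
enable-auto-props = yes

[auto-props]
*.as = svn:eol-style=native;svn:mime-type=text/plain
*.mxml = svn:eol-style=native;svn:mime-type=text/xml
```

Auto-props only take effect on "svn add" and "svn import", so files that are
already versioned still need "svn propset" run on them by hand.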

Michael
