Posted to dev@subversion.apache.org by Bert Huijben <be...@vmoo.com> on 2013/02/03 10:41:29 UTC

RE: svn commit: r1441814 - in /subversion/trunk/subversion: svn/ tests/cmdline/

Maybe we should try to fix this automatic detection then (Does your
setting match the libmagic output for that file?) instead of adding
just a warning when setting the file binary explicitly.

Maybe we should validate our libmagic based decision to make something
binary?
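
One way to picture the validation suggested above is a content check that runs before a file is accepted as binary. The sketch below is only an illustration in Python, not Subversion code: the function name `looks_like_text`, the 8 KiB sample size, and the rule itself (no NUL bytes, and decodable as UTF-8 with an optional BOM) are all assumptions.

```python
import codecs

SAMPLE_SIZE = 8192  # assumed sample size, not taken from Subversion


def looks_like_text(data: bytes) -> bool:
    """Heuristic: treat a byte sample as text if it is NUL-free, valid UTF-8."""
    sample = data[:SAMPLE_SIZE]
    # A UTF-8 byte order mark does not make a file binary; skip it.
    if sample.startswith(codecs.BOM_UTF8):
        sample = sample[len(codecs.BOM_UTF8):]
    if b"\x00" in sample:
        return False
    # Incremental decoding tolerates a multi-byte character that was
    # cut in half by the sample boundary.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A detected binary mime-type would then only be kept when `looks_like_text()` returns False for the file's leading bytes. Files in legacy 8-bit encodings would still fail this particular check, so it is a starting point rather than a complete answer.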

Bert Huijben (Cell phone)
From: Justin Erenkrantz
Sent: 3-2-2013 9:18
To: Bert Huijben
Cc: dev@subversion.apache.org
Subject: Re: svn commit: r1441814 - in /subversion/trunk/subversion:
svn/ tests/cmdline/
As another data point, I have hit this text-as-binary myself just a few
weeks ago when I added a bunch of HTML files to a local repository - so,
it's definitely occurring automatically.  I did not have a chance to dig
into why the magic detection failed so miserably...  -- justin

On Saturday, February 2, 2013, Bert Huijben wrote:

>
>
> > -----Original Message-----
> > From: stsp@apache.org [mailto:stsp@apache.org]
> > Sent: Saturday, 2 February 2013 23:05
> > To: commits@subversion.apache.org
> > Subject: svn commit: r1441814 - in /subversion/trunk/subversion: svn/
> > tests/cmdline/
> >
> > Author: stsp
> > Date: Sat Feb  2 22:04:44 2013
> > New Revision: 1441814
> >
> > URL: http://svn.apache.org/viewvc?rev=1441814&view=rev
> > Log:
> > When a binary mime-type is set on a file that looks like a text file,
> > make the 'svn' client print a warning about potential future problems
> > with operations such as diff, merge, and blame.
> >
> > This is only done during local propset for now, because the file needs
> > to be present on disk to detect its mime-type.
> >
> > See for related discussion:
> > http://mail-archives.apache.org/mod_mbox/subversion-dev/201301.mbox/%3C20130131185725.GA13721%40ted.stsp.name%3E
>
> From my users I hear that another way this property is introduced is via
> conversions from other version management systems. Visual SourceSafe (long
> dead, but still used in a lot of small shops) marks UTF-8 files with a BOM
> as binary when it does an auto detect.
> (Well what would you guess for a system that wasn’t really updated since
> that format became popular)
>
> Most conversion tools just copy the binary flag, and there you have this
> problem on all your historic utf-8 files.
> (Where I worked we had this problem on all .xml files previously stored in
> sourcesafe).
>
>
> I don't see a lot of users accidentally adding invalid properties
> themselves.
>
>         Bert
>
>


Re: svn commit: r1441814 - in /subversion/trunk/subversion: svn/

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Feb 03, 2013 at 07:42:42AM -0500, Justin Erenkrantz wrote:
> On Sun, Feb 3, 2013 at 6:42 AM, Stefan Sperling <st...@elego.de> wrote:
> > Perhaps we should treat mime-types that contain 'charset=UTF-8'
> > as text by default? libmagic is able to detect the charset AFAIK.
> > Our code currently removes charset information from the mime-type
> > returned by libmagic.
> >
> > I don't really like the idea of adding yet another special case
> > to our definition of what a "text" mime-type is. But it might
> > be worth doing this if it impacts a lot of people.
> >
> 
> % file -i -k index.shtml
> index.shtml: application/xml; charset=us-ascii
> 
> Not all of us use UTF-8, but yes the sentiment is valid.  =)  I do wonder
> if printing out this new message saying SVN considers XML files as binary
> is going to increase the number of people realizing that they are tripping
> over this.  -- justin

The new warning isn't printed when the mime-type is set by libmagic.
It's only printed during propset/propedit. It only targets people
who set a binary mime-type via propset/propedit.

To raise greater awareness the code could be moved into the subversion
library and send a special notification to alert users whenever a property
is set on a file in the working copy, even during 'svn add'. But I don't
think that adds much value over the existing '(bin)' notification.

Re: svn commit: r1441814 - in /subversion/trunk/subversion: svn/

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sun, Feb 3, 2013 at 6:42 AM, Stefan Sperling <st...@elego.de> wrote:

> On Sun, Feb 03, 2013 at 06:16:03AM -0500, Justin Erenkrantz wrote:
> > On Sun, Feb 3, 2013 at 6:06 AM, Justin Erenkrantz <justin@erenkrantz.com> wrote:
> >
> > > I thought we treated application/xml as text at one time?  -- justin
> > >
> >
> > Yes, we did...and Ben -1'ed it back in 2004.  See:
> >
> >
> > http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=374218
> >
> > I guess...but...really?  Is the only way for a user to override this to
> > manually set their local auto-props?  *sigh*  -- justin
>
> Perhaps we should treat mime-types that contain 'charset=UTF-8'
> as text by default? libmagic is able to detect the charset AFAIK.
> Our code currently removes charset information from the mime-type
> returned by libmagic.
>
> I don't really like the idea of adding yet another special case
> to our definition of what a "text" mime-type is. But it might
> be worth doing this if it impacts a lot of people.
>

% file -i -k index.shtml
index.shtml: application/xml; charset=us-ascii

Not all of us use UTF-8, but yes the sentiment is valid.  =)  I do wonder
if printing out this new message saying SVN considers XML files as binary
is going to increase the number of people realizing that they are tripping
over this.  -- justin

Re: svn commit: r1441814 - in /subversion/trunk/subversion: svn/

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Feb 03, 2013 at 06:16:03AM -0500, Justin Erenkrantz wrote:
> On Sun, Feb 3, 2013 at 6:06 AM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> 
> > I thought we treated application/xml as text at one time?  -- justin
> >
> 
> Yes, we did...and Ben -1'ed it back in 2004.  See:
> 
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=374218
> 
> I guess...but...really?  Is the only way for a user to override this to
> manually set their local auto-props?  *sigh*  -- justin

Perhaps we should treat mime-types that contain 'charset=UTF-8'
as text by default? libmagic is able to detect the charset AFAIK.
Our code currently removes charset information from the mime-type
returned by libmagic.

I don't really like the idea of adding yet another special case
to our definition of what a "text" mime-type is. But it might
be worth doing this if it impacts a lot of people.
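
The idea above could be sketched as follows, in Python rather than Subversion's C. Everything here is an assumption made for illustration: the function name `is_text_mime_type`, the `text_charsets` parameter, and the choice to keep the charset parameter reported by libmagic instead of stripping it.

```python
def is_text_mime_type(mime_type: str,
                      text_charsets: frozenset = frozenset({"utf-8"})) -> bool:
    """Treat text/* as text, plus any type whose charset is in text_charsets."""
    media_type, _, params = mime_type.partition(";")
    if media_type.strip().lower().startswith("text/"):
        return True
    # Look for a charset parameter, e.g. "application/xml; charset=UTF-8".
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower() in text_charsets
    return False
```

With the default set, `is_text_mime_type("application/xml; charset=UTF-8")` is true while the same type with `charset=us-ascii` is not. Passing a larger set of charsets widens the rule without adding another special case per mime-type.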

Re: svn commit: r1441814 - in /subversion/trunk/subversion: svn/

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sun, Feb 3, 2013 at 6:06 AM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:

> I thought we treated application/xml as text at one time?  -- justin
>

Yes, we did...and Ben -1'ed it back in 2004.  See:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=374218

I guess...but...really?  Is the only way for a user to override this to
manually set their local auto-props?  *sigh*  -- justin

Re: svn commit: r1441814 - in /subversion/trunk/subversion: svn/

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sun, Feb 3, 2013 at 4:41 AM, Bert Huijben <be...@vmoo.com> wrote:

> Maybe we should try to fix this automatic detection then (Does your
> setting match the libmagic output for that file?) instead of adding
> just a warning when setting the file binary explicitly.
>
> Maybe we should validate our libmagic based decision to make something
> binary?
>

Taking a look while here at FOSDEM...libsvn_client set the mime-type as
"application/xml":

% svn diff index.shtml
Index: index.shtml
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/xml
% svn pg svn:mime-type index.shtml
application/xml
% file --mime-type index.shtml
index.shtml: application/xml
% file -v
file-5.11
magic file from /opt/local/share/misc/magic

I thought we treated application/xml as text at one time?  -- justin
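
The `svn diff` refusal above follows from how the mime-type is classified. A minimal Python sketch, assuming the rule is roughly "anything outside `text/` is binary"; the function name `is_binary_mime_type` is hypothetical, and the real check in Subversion's C library may carry additional exceptions:

```python
def is_binary_mime_type(mime_type: str) -> bool:
    """Assumed rule: a mime-type is binary unless its media type is text/*."""
    # Drop parameters such as "; charset=us-ascii" before comparing.
    media_type = mime_type.split(";", 1)[0].strip().lower()
    return not media_type.startswith("text/")


for candidate in ("application/xml", "text/html", "text/xml; charset=UTF-8"):
    print(candidate, "->", "binary" if is_binary_mime_type(candidate) else "text")
```

Under that assumption `application/xml` lands on the binary side even though the file content is plain markup, which matches the transcript.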