You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Konstantin Kolinko <kn...@gmail.com> on 2011/10/14 15:43:03 UTC

Huge diff for one-line change in ASF repository

Hi!

One of committers in Apache Tomcat project is experimenting with Git
<-> Subversion integration.

A result is that in the last few days several commits produced diffs
for entire file. An example:

http://svn.apache.org/viewvc?view=revision&revision=1183340
http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?r1=1183339&r2=1183340

Commit e-mail:
http://markmail.org/message/f4rdxrjvrenc6tg6
http://markmail.org/message/iysr4bsfckegq7vp



How can this happen from subversion point of view?
The file has svn:eol-style=native.


AFAIK such files are stored with LF endings at the server. Is this
line endings convention enforced on the client only?

I am on Windows. Both
svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183339
svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183340

produce files with CRLF endings for me, so I do not see such huge
difference between the files.

But diff command below does produces that huge diff, that is shown by
viewvc and was in the commit email:

svn diff -c 1183340
http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java


Besides the diff I do not have other evidence that the file in
repository has CRs or is otherwise broken.

Best regards,
Konstantin Kolinko

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Konstantin Kolinko wrote on Sat, Oct 15, 2011 at 14:27:43 +0400:
> 2011/10/14 Daniel Shahaf <d....@daniel.shahaf.name>:
> > for f in $CHANGED_FILES; do
> >  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
> >    exit 1
> >  fi
> > done
> >
> 
> Good, impressive.

Don't deal those adjectives so quickly.  I hadn't even tested this thing...

> As others noted, one has to check for specific values of svn:eol-style
> (native, LF),

True

> so "svnlook propget" instead of proplist will be needed.
> I am not sure about && [ "`...`" ]. Maybe && [ -n "`...`" ] to test
> for non-empty strings.

Not necessary in this case, since the `` evaluates to either "" or "0d".
But, yeah, good style says to add the -n explicitly.

> A friendly error message wouldn't hurt either.
> 

And also svnlook and friends should be used by their full paths.  That
and that were LAAEFTR'd.

> Best regards,
> Konstantin Kolinko

Re: Huge diff for one-line change in ASF repository

Posted by Konstantin Kolinko <kn...@gmail.com>.
> Konstantin Kolinko wrote on Fri, Oct 14, 2011 at 17:43:03 +0400:
>> Hi!
>>
>> One of committers in Apache Tomcat project is experimenting with Git
>> <-> Subversion integration.
>>
>> A result is that in the last few days several commits produced diffs
>> for entire file. An example:
>>
>> http://svn.apache.org/viewvc?view=revision&revision=1183340
>> http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?r1=1183339&r2=1183340
>>
>> Commit e-mail:
>> http://markmail.org/message/f4rdxrjvrenc6tg6
>> http://markmail.org/message/iysr4bsfckegq7vp
>>
>>
>>
>> How can this happen from subversion point of view?
>> The file has svn:eol-style=native.
>>
>>
>> AFAIK such files are stored with LF endings at the server. Is this
>> line endings convention enforced on the client only?
>>
>> I am on Windows. Both
>> svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183339
>> svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183340
>>
>> produce files with CRLF endings for me, so I do not see such huge
>> difference between the files.
>>
>> But diff command below does produces that huge diff, that is shown by
>> viewvc and was in the commit email:
>>
>> svn diff -c 1183340
>> http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java
>>
>>
>> Besides the diff I do not have other evidence that the file in
>> repository has CRs or is otherwise broken.
>>

2011/10/14 Daniel Shahaf <d....@daniel.shahaf.name>:
> The file has CR's in r1183340 but not in r1183339.  Yes, since it has
> svn:eol-style set, it is supposed to be stored in LF on the server.

Thank you for confirming. I also found a way to confirm:

http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?p=1183339
vs
http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?p=1183340

The second one has CRLFs, the first one LFs.

I reverted the file to LFs in r1183612
(by submitting a non-changed file with TortoiseSVN)

2011/10/15 Ryan Schmidt <su...@ryandesign.com>:
> It's not just the svn:eol-style conversion that git-svn doesn't do; I hear it
> doesn't do svn:keywords normalizations either.

I cannot confirm the issue with keywords.
Both files returned by http:// above have $Id$ in them.

I guess that Git did not expand $Id$, so it remained as it was in the
original file.


2011/10/14 Johan Corveleyn <jc...@gmail.com>:
> It does cause a lot of problems though. The files show up as Modified
> in "svn status",

Confirming.
It is displayed as modified, but to observe this you have to touch the file.
(Otherwise subversion does not compare the file with its svn-base)

Using 1.7.0, WindowsXP.

2011/10/14 Daniel Shahaf <d....@daniel.shahaf.name>:
> for f in $CHANGED_FILES; do
>  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
>    exit 1
>  fi
> done
>

Good, impressive.
As others noted, one has to check for specific values of svn:eol-style
(native, LF), so "svnlook propget" instead of proplist will be needed.
I am not sure about && [ "`...`" ]. Maybe && [ -n "`...`" ] to test
for non-empty strings.
A friendly error message wouldn't hurt either.

Best regards,
Konstantin Kolinko

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
The file has CR's in r1183340 but not in r1183339.  Yes, since it has
svn:eol-style set, it is supposed to be stored in LF on the server.

Konstantin Kolinko wrote on Fri, Oct 14, 2011 at 17:43:03 +0400:
> Hi!
> 
> One of committers in Apache Tomcat project is experimenting with Git
> <-> Subversion integration.
> 
> A result is that in the last few days several commits produced diffs
> for entire file. An example:
> 
> http://svn.apache.org/viewvc?view=revision&revision=1183340
> http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?r1=1183339&r2=1183340
> 
> Commit e-mail:
> http://markmail.org/message/f4rdxrjvrenc6tg6
> http://markmail.org/message/iysr4bsfckegq7vp
> 
> 
> 
> How can this happen from subversion point of view?
> The file has svn:eol-style=native.
> 
> 
> AFAIK such files are stored with LF endings at the server. Is this
> line endings convention enforced on the client only?
> 
> I am on Windows. Both
> svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183339
> svn cat http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java@1183340
> 
> produce files with CRLF endings for me, so I do not see such huge
> difference between the files.
> 
> But diff command below does produces that huge diff, that is shown by
> viewvc and was in the commit email:
> 
> svn diff -c 1183340
> http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java
> 
> 
> Besides the diff I do not have other evidence that the file in
> repository has CRs or is otherwise broken.
> 
> Best regards,
> Konstantin Kolinko

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ryan Schmidt wrote on Fri, Oct 14, 2011 at 19:37:44 -0500:
> On Oct 14, 2011, at 19:30, Daniel Shahaf wrote:
> > Ryan Schmidt wrote on Fri, Oct 14, 2011 at 19:18:44 -0500:
> >> When svn:eol-style is set to ANY value, the file is stored in the
> >> repository with LF line endings, and converted to the desired EOL
> >> style by the client on checkout or export (and converted by the client
> >> to LF line endings before being committed).
> > 
> > Yeah, I thought so too, but when I try now with 1.7.0 I notice that
> > setting svn:eol-style to CRLF causes the versioned filesystem (read:
> > repository) to contain a file with CRLF set...
> > 
> > And apparently it's the same with 1.6.  I don't have any older builds
> > handy to test.
> 
> Oh dear. I have apparently been repeating misinformation. The book says you're right:
> 
> http://svnbook.red-bean.com/en/1.7/svn.advanced.props.file-portability.html#svn.advanced.props.special.eol-style

Not sure where I got the idea that svn:eol-style files always use LF in
storage.

Probably some comment in the code referred to the repository-normal form
as "contracted keywords and LF linefeeds".  This very question even came
up in Berlin at one point, but I don't really remember what I said there
and whether people corrected me...

Re: Huge diff for one-line change in ASF repository

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Oct 14, 2011, at 19:30, Daniel Shahaf wrote:
> Ryan Schmidt wrote on Fri, Oct 14, 2011 at 19:18:44 -0500:
>> When svn:eol-style is set to ANY value, the file is stored in the
>> repository with LF line endings, and converted to the desired EOL
>> style by the client on checkout or export (and converted by the client
>> to LF line endings before being committed).
> 
> Yeah, I thought so too, but when I try now with 1.7.0 I notice that
> setting svn:eol-style to CRLF causes the versioned filesystem (read:
> repository) to contain a file with CRLF set...
> 
> And apparently it's the same with 1.6.  I don't have any older builds
> handy to test.

Oh dear. I have apparently been repeating misinformation. The book says you're right:

http://svnbook.red-bean.com/en/1.7/svn.advanced.props.file-portability.html#svn.advanced.props.special.eol-style




Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ryan Schmidt wrote on Fri, Oct 14, 2011 at 19:18:44 -0500:
> When svn:eol-style is set to ANY value, the file is stored in the
> repository with LF line endings, and converted to the desired EOL
> style by the client on checkout or export (and converted by the client
> to LF line endings before being committed).

Yeah, I thought so too, but when I try now with 1.7.0 I notice that
setting svn:eol-style to CRLF causes the versioned filesystem (read:
repository) to contain a file with CRLF set...

And apparently it's the same with 1.6.  I don't have any older builds
handy to test.

Re: Huge diff for one-line change in ASF repository

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sat, Oct 15, 2011 at 2:18 AM, Ryan Schmidt
<su...@ryandesign.com> wrote:
> On Oct 14, 2011, at 18:52, Johan Corveleyn wrote:
>> Ah, but git-svn doesn't seem to honor the invariant that files with
>> native eol-style *must* be converted to LF by the client. By not doing
>> this, they break things for every other svn client using the same
>> repository. So this is clearly a bug in git-svn. Of course, to avoid
>> flames, we should probably first escalate the problem to the git-svn
>> people, to get this bug fixed (or maybe it is already fixed?).
>
> It's not just the svn:eol-style conversion that git-svn doesn't do; I hear it doesn't do svn:keywords normalizations either. It's the responsibility of the client to ensure that, for example "$Id: foo bar baz $" gets normalized to "$Id$" before being sent to the repository to be committed. I was told that the git people don't believe in the kind of munging that Subversion does for eol-style and keywords and therefore they don't want to implement it. I say fix the server to issue an error when a client tries to commit data that is not normalized in the required fashion. Force them to comply, and stop distributing software that does not behave correctly.


Almost forgot about this, but now I've finally created an issue for
this in the issue tracker:

http://subversion.tigris.org/issues/show_bug.cgi?id=4065 (server
should enforce LF normalization for svn:eol-style=native files)

I only mentioned LF-normalization for svn:eol-style=native (actually I
forgot the suggestion that keyword-expansion is in the same boat
here). Maybe another issue should be opened for server-side
enforcement related to keyword expansion...

-- 
Johan

Re: Huge diff for one-line change in ASF repository

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Oct 14, 2011, at 18:52, Johan Corveleyn wrote:
> Ah, but git-svn doesn't seem to honor the invariant that files with
> native eol-style *must* be converted to LF by the client. By not doing
> this, they break things for every other svn client using the same
> repository. So this is clearly a bug in git-svn. Of course, to avoid
> flames, we should probably first escalate the problem to the git-svn
> people, to get this bug fixed (or maybe it is already fixed?).

It's not just the svn:eol-style conversion that git-svn doesn't do; I hear it doesn't do svn:keywords normalizations either. It's the responsibility of the client to ensure that, for example "$Id: foo bar baz $" gets normalized to "$Id$" before being sent to the repository to be committed. I was told that the git people don't believe in the kind of munging that Subversion does for eol-style and keywords and therefore they don't want to implement it. I say fix the server to issue an error when a client tries to commit data that is not normalized in the required fashion. Force them to comply, and stop distributing software that does not behave correctly. Until the server is fixed to do this, a hook like the below should be used.


>> for f in $CHANGED_FILES; do
>>  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
>>    exit 1
>>  fi
>> done
> 
> Thanks. Seems I can experiment a bit with that. Though I think the
> problem is specific to eol-style=native (not any eol-style). I thought
> that the "conversion-to-LF" only happens for native, and not for
> svn:eol-style=LF or svn:eol-style=CRLF (I thought in that case the LF,
> or CRLF was saved "as is" in the repository). But I haven't checked
> that (we only use native).

When svn:eol-style is set to ANY value, the file is stored in the repository with LF line endings, and converted to the desired EOL style by the client on checkout or export (and converted by the client to LF line endings before being committed).



Re: Huge diff for one-line change in ASF repository

Posted by Les Mikesell <le...@gmail.com>.
On Sat, Oct 15, 2011 at 5:36 PM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>
> The git model for EOL is safer, to my mind, because it is
> *predictable*.

Yes, but predictably not working isn't very useful.

-- 
   Les Mikesell
     lesmikesell@gmail.com

Re: Huge diff for one-line change in ASF repository

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sun, Oct 16, 2011 at 12:36 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
> On Sat, Oct 15, 2011 at 11:41 AM, Les Mikesell <le...@gmail.com> wrote:
>> On Sat, Oct 15, 2011 at 9:23 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>>>
>>> Perhaps a better approach, especially if you know people will be using
>>> git-svn, is to review the repository for any files that use svn:eol,
>>> and *turn that off*, with a check that the binary EOL is the format
>>> you want. Then put in a pre-commit hook to prevent anyone using it.
>>
>> But a binary EOL is almost never works for cross platfom text.  Which
>> is why systems designed for cross platform work do the conversions.
>> How do git users deal with it?
>
> [ Allow me to put on my lecture hat. ]
>
> You've made me think about this, thank you. I can't speak for all git
> users: I use what tools work, and that's sometimes Subversion and
> sometimes git (for reasons I won't go into here).
>
> This is one of the things I prefer about git. Cross-platform source
> control systems which try to do this EOL handling for you *break*,
> predictably. It used to drive me bugfutz with CVS. By git standards,
> the file being checked out on one machine should match the code being
> checked out on the other machine in every byte. If you need to
> transform it to use it locally, that's a build environment issue,
> *not* a source control one. Treating it as a source control issue
> breaks things and, in my opinion, shoudl be deprecated, the same way
> that the ftp "text" format is also deprecated.
>
> The git model for EOL is safer, to my mind, because it is
> *predictable*. You know what you're going to get when you do a
> checkout. If you need to, you can have a local transformation utility
> for yet another "clone"  that does not ever have to be transmitted
> back and corrupt the central repository. It can push to a staging git
> repository where these discrepancies can be resolved, and never
> corrupt the "mastter" repository.
>
> The Subversion model "you have to set svn:eol" is workable for a few
> isolated cases, but should be used only with caution. It should
> *never* be applied to a source tree: the risk of precisely this kind
> of mismatched working environment is too frequent.  I've loathed
> having to enforce it for default checkins, which I've enountered,
> because it contributes to just the kind of sourcecode madness we just
> saw.

I don't really understand (the need for this lecture). If you don't
want to use the svn:eol-style feature, just disallow it with a
pre-commit hook (and all files will be committed 'as is'). I happen to
use this feature, and I'm quite happy with it.

In my mind it's quite simple:
- There exists a feature called "svn:eol-style=native".

- It's useful for some people (other people may wish to not use it).

- This feature is quite well defined.

- It has limitations (f.i. it's not a good idea to share working
copies accross multiple platforms). These are known.

- The deal is: svn clients must follow the specification, which means
they must transform EOL's to LF when eol-style=native.

- I don't care how a client does this, or what philosophical /
religious arguments there are about this. If a client does not play by
the rules, it violates the spec. That should be blocked by the server
(or we should at least have the ability to block this faulty client
behavior).

-- 
Johan

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
So just forbid svn:eol-style in your repositories and you'll have the
model you want.

Re: Huge diff for one-line change in ASF repository

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Sat, Oct 15, 2011 at 11:41 AM, Les Mikesell <le...@gmail.com> wrote:
> On Sat, Oct 15, 2011 at 9:23 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>>
>> Perhaps a better approach, especially if you know people will be using
>> git-svn, is to review the repository for any files that use svn:eol,
>> and *turn that off*, with a check that the binary EOL is the format
>> you want. Then put in a pre-commit hook to prevent anyone using it.
>
> But a binary EOL is almost never works for cross platfom text.  Which
> is why systems designed for cross platform work do the conversions.
> How do git users deal with it?

[ Allow me to put on my lecture hat. ]

You've made me think about this, thank you. I can't speak for all git
users: I use what tools work, and that's sometimes Subversion and
sometimes git (for reasons I won't go into here).

This is one of the things I prefer about git. Cross-platform source
control systems which try to do this EOL handling for you *break*,
predictably. It used to drive me bugfutz with CVS. By git standards,
the file being checked out on one machine should match the code being
checked out on the other machine in every byte. If you need to
transform it to use it locally, that's a build environment issue,
*not* a source control one. Treating it as a source control issue
breaks things and, in my opinion, shoudl be deprecated, the same way
that the ftp "text" format is also deprecated.

The git model for EOL is safer, to my mind, because it is
*predictable*. You know what you're going to get when you do a
checkout. If you need to, you can have a local transformation utility
for yet another "clone"  that does not ever have to be transmitted
back and corrupt the central repository. It can push to a staging git
repository where these discrepancies can be resolved, and never
corrupt the "mastter" repository.

The Subversion model "you have to set svn:eol" is workable for a few
isolated cases, but should be used only with caution. It should
*never* be applied to a source tree: the risk of precisely this kind
of mismatched working environment is too frequent.  I've loathed
having to enforce it for default checkins, which I've enountered,
because it contributes to just the kind of sourcecode madness we just
saw.

Re: Huge diff for one-line change in ASF repository

Posted by Andreas Krey <a....@gmx.de>.
On Sat, 15 Oct 2011 10:41:57 +0000, Les Mikesell wrote:
...
> But a binary EOL is almost never works for cross platfom text.  Which
> is why systems designed for cross platform work do the conversions.
> How do git users deal with it?

Only relatively lately, by some local setting that actually does the
CR/LF conversion (iirc core.autocrlf). Details are somewhat intricate.
There may also be hooks that would allow expansion/shortening of
stuff at least similar to $Id$.

As for $Id$, the git way is to compile the output of 'git describe
--dirty' or similar into the binaries, which will show the last tag
(something I don't even know how to find out in svn) reachable from
the current commit/revision, the number of commits since, (part of)
the current commit id and whether there is a local modification in
the sandbox.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

Re: Huge diff for one-line change in ASF repository

Posted by Les Mikesell <le...@gmail.com>.
On Sat, Oct 15, 2011 at 9:23 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>
> Perhaps a better approach, especially if you know people will be using
> git-svn, is to review the repository for any files that use svn:eol,
> and *turn that off*, with a check that the binary EOL is the format
> you want. Then put in a pre-commit hook to prevent anyone using it.

But a binary EOL is almost never works for cross platfom text.  Which
is why systems designed for cross platform work do the conversions.
How do git users deal with it?

-- 
   Les Mikesell
      lesmikesell@gmail.com

Re: Huge diff for one-line change in ASF repository

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Sat, Oct 15, 2011 at 8:25 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Sat, Oct 15, 2011 at 01:52:32 +0200:

>> Thanks. Seems I can experiment a bit with that. Though I think the
>> problem is specific to eol-style=native (not any eol-style). I thought
>> that the "conversion-to-LF" only happens for native, and not for
>> svn:eol-style=LF or svn:eol-style=CRLF (I thought in that case the LF,
>> or CRLF was saved "as is" in the repository). But I haven't checked
>> that (we only use native).
>>
>
> You were right; see elsethread.

Perhaps a better approach, especially if you know people will be using
git-svn, is to review the repository for any files that use svn:eol,
and *turn that off*, with a check that the binary EOL is the format
you want. Then put in a pre-commit hook to prevent anyone using it.

I've had to do this to block svn:keywords for environments where the
git-svn clients inability to update $Id$ and $HeadURL$ caused
heartache and conflict for other developers who expected to be able to
run checksums or to establish the provenance of the deployed file,
based on that data. Frankly, I agree with git's approach on this. If
it's causing fragility or confusion, and the EOL processing and
keyword processing make this inevitable in many environments, *turn it
off*.

This is especially true with multi-platform access to the same working
copy, such as CygWin and Windows, or CIFS shared working copies. And
when I seen a $Id$ or "HeadURL$" tag that is inside a local
configuration modified configuration file being pushed elsewhere, the
keyword is a *lie*. It's why I'm very careful to have the local file,
such as "httpd.conf", stored in Subversion as "httpd.conf.in" and
"svn:ignore" set to never commit those locally updated files.

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Sat, Oct 15, 2011 at 01:52:32 +0200:
> On Fri, Oct 14, 2011 at 4:56 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Johan Corveleyn wrote on Fri, Oct 14, 2011 at 16:25:46 +0200:
> >> You can identify the corrupt files easily by taking a look at the
> >> corresponding pristine files (or .svn-base files in .svn/text-base).
> >> They will have CRLF, while "correct" files with eol-style native
> >> should only have LF in the pristine file.
> >>
> >
> > Nice.  I did wonder about that, since 'svn cat' silently transform the EOLs.
> >
> >> This makes me wonder:
> >> - Why can't the server verify that files with native eol-style are
> >> committed to the repository with LF line-termination? That would seem
> >> like a very sane validation.
> >
> > It's possible.  If we do this we'll (a) cause files in existing
> > repositories to be uncommittable by tools that don't convert to LF
> > before committing (this does _not_ include the 'svn' client, and
> > probably doesn't include anything that goes through libsvn_client),
...
> I think it would be a good thing if the server protected against bugs
> in clients by checking this invariant (in the spirit of "don't trust
> anything that comes from the client").
> 
> >> - Currently the client is the sole responsible to do this, but a
> >> broken client can break a lot of things for all other users.
> >
> > Yep.  And along the same lines, the client may want to force EOL
> > canonicalization in code paths that currently just assume that the
> > server will provide files with the correct EOL styles.
> 
> Maybe that's also a possible solution. Though I'm wondering if this

It's a solution to a different problem.  One thing is "clients
protecting themselves from rogue/buggy servers", and another is
"servers protecting themselves from rogue/buggy clients".

> would be sufficient: you can also see the problem just by diffing
> between repository URL's / revisions (if you "svnlook diff -c REV" one
> of those commits, you'll get a full-file diff, although the eol-style
> property wasn't changed; that's not nice). Of course, one could make
> "svnlook diff" etc. compensate for this too, but that really feels as
> some sort of repository corruption which we're papering over.
> 
> >> - Maybe as a workaround: can someone come up with a hook script for
> >> checking this situation in a pre-commit hook?
> >>
> >
> > for f in $CHANGED_FILES; do
> >  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
> >    exit 1
> >  fi
> > done
> 
> Thanks. Seems I can experiment a bit with that. Though I think the
> problem is specific to eol-style=native (not any eol-style). I thought
> that the "conversion-to-LF" only happens for native, and not for
> svn:eol-style=LF or svn:eol-style=CRLF (I thought in that case the LF,
> or CRLF was saved "as is" in the repository). But I haven't checked
> that (we only use native).
> 

You were right; see elsethread.

> -- 
> Johan

Re: Huge diff for one-line change in ASF repository

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Oct 14, 2011 at 4:56 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Fri, Oct 14, 2011 at 16:25:46 +0200:
>> These "modified" files can also cause tree conflicts (which can
>> generate quite some WTF's).
>>
>
> Tree conflicts?  How?  I'd have expected (full-file) text conflicts only.

Ok, maybe tree conflicts are more rare, but they can happen, of the
kind "locally modified, incoming delete". We experienced one of those,
because one of the corrupt ("modified") files was inside a directory
that was deleted upon update (or merge, I don't remember). I can
assure you that this raises quite some eyebrows :-) (what local
modification? I haven't touched that file!)

>> You can identify the corrupt files easily by taking a look at the
>> corresponding pristine files (or .svn-base files in .svn/text-base).
>> They will have CRLF, while "correct" files with eol-style native
>> should only have LF in the pristine file.
>>
>
> Nice.  I did wonder about that, since 'svn cat' silently transform the EOLs.
>
>> This makes me wonder:
>> - Why can't the server verify that files with native eol-style are
>> committed to the repository with LF line-termination? That would seem
>> like a very sane validation.
>
> It's possible.  If we do this we'll (a) cause files in existing
> repositories to be uncommittable by tools that don't convert to LF
> before committing (this does _not_ include the 'svn' client, and
> probably doesn't include anything that goes through libsvn_client),
> (b) probably get some flames along the lines of "The svn devs broke
> git-svn on purpose". :P

Ah, but git-svn doesn't seem to honor the invariant that files with
native eol-style *must* be converted to LF by the client. By not doing
this, they break things for every other svn client using the same
repository. So this is clearly a bug in git-svn. Of course, to avoid
flames, we should probably first escalate the problem to the git-svn
people, to get this bug fixed (or maybe it is already fixed?).

I think it would be a good thing if the server protected against bugs
in clients by checking this invariant (in the spirit of "don't trust
anything that comes from the client").

>> - Currently the client is the sole responsible to do this, but a
>> broken client can break a lot of things for all other users.
>
> Yep.  And along the same lines, the client may want to force EOL
> canonicalization in code paths that currently just assume that the
> server will provide files with the correct EOL styles.

Maybe that's also a possible solution. Though I'm wondering if this
would be sufficient: you can also see the problem just by diffing
between repository URL's / revisions (if you "svnlook diff -c REV" one
of those commits, you'll get a full-file diff, although the eol-style
property wasn't changed; that's not nice). Of course, one could make
"svnlook diff" etc. compensate for this too, but that really feels as
some sort of repository corruption which we're papering over.

>> - Maybe as a workaround: can someone come up with a hook script for
>> checking this situation in a pre-commit hook?
>>
>
> for f in $CHANGED_FILES; do
>  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
>    exit 1
>  fi
> done

Thanks. Seems I can experiment a bit with that. Though I think the
problem is specific to eol-style=native (not any eol-style). I thought
that the "conversion-to-LF" only happens for native, and not for
svn:eol-style=LF or svn:eol-style=CRLF (I thought in that case the LF,
or CRLF was saved "as is" in the repository). But I haven't checked
that (we only use native).

-- 
Johan

Re: Huge diff for one-line change in ASF repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Fri, Oct 14, 2011 at 16:25:46 +0200:
> These "modified" files can also cause tree conflicts (which can
> generate quite some WTF's).
> 

Tree conflicts?  How?  I'd have expected (full-file) text conflicts only.

> You can identify the corrupt files easily by taking a look at the
> corresponding pristine files (or .svn-base files in .svn/text-base).
> They will have CRLF, while "correct" files with eol-style native
> should only have LF in the pristine file.
> 

Nice.  I did wonder about that, since 'svn cat' silently transform the EOLs.

> This makes me wonder:
> - Why can't the server verify that files with native eol-style are
> committed to the repository with LF line-termination? That would seem
> like a very sane validation.

It's possible.  If we do this we'll (a) cause files in existing
repositories to be uncommittable by tools that don't convert to LF
before committing (this does _not_ include the 'svn' client, and
probably doesn't include anything that goes through libsvn_client),
(b) probably get some flames along the lines of "The svn devs broke
git-svn on purpose". :P

> - Currently the client is the sole responsible to do this, but a
> broken client can break a lot of things for all other users.

Yep.  And along the same lines, the client may want to force EOL
canonicalization in code paths that currently just assume that the
server will provide files with the correct EOL styles.

> - Maybe as a workaround: can someone come up with a hook script for
> checking this situation in a pre-commit hook?
> 

for f in $CHANGED_FILES; do
  if svnlook pl -t $txn $repos $f | grep -w svn:eol-style >/dev/null && [ "`svnlook cat -t $txn $repos $f | xxd -ps -c1 | fgrep 0d | head -n1`" ]; then
    exit 1
  fi
done

> -- 
> Johan

Re: Huge diff for one-line change in ASF repository

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Oct 14, 2011 at 3:54 PM, Stefan Sperling <st...@elego.de> wrote:
> On Fri, Oct 14, 2011 at 05:43:03PM +0400, Konstantin Kolinko wrote:
>> Hi!
>>
>> One of committers in Apache Tomcat project is experimenting with Git
>> <-> Subversion integration.
>>
>> A result is that in the last few days several commits produced diffs
>> for entire file. An example:
>>
>> http://svn.apache.org/viewvc?view=revision&revision=1183340
>> http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?r1=1183339&r2=1183340
>>
>> Commit e-mail:
>> http://markmail.org/message/f4rdxrjvrenc6tg6
>> http://markmail.org/message/iysr4bsfckegq7vp
>>
>> How can this happen from subversion point of view?
>> The file has svn:eol-style=native.
>
> Because whichever git integratio is used ignores this setting and
> commits all with lines CRLF anyway?
>
>> AFAIK such files are stored with LF endings at the server. Is this
>> line endings convention enforced on the client only?
>
> Yes. The client changes the file EOLs whenever the local value of the
> svn:eol-style property changes.
>
> Recall that the server uses binary diffs.
> It doesn't know or care what a newline is.

I've seen this problem too at my company (also probably related to the
use of git-svn (or perhaps also by using the svn package of cygwin,
but I'm not sure about that). I've since repaired the files with
"corrupt eols", and didn't have time to investigate further.

It does cause a lot of problems though. The files show up as Modified
in "svn status", and "svn diff" shows all lines as modified (but "svn
diff -x--ignore-eol-style" doesn't of course). If users commit the
"modification", the eol-corruption is repaired. These "modified" files
can also cause tree conflicts (which can generate quite some WTF's).

You can identify the corrupt files easily by taking a look at the
corresponding pristine files (or .svn-base files in .svn/text-base).
They will have CRLF, while "correct" files with eol-style native
should only have LF in the pristine file.

This makes me wonder:
- Why can't the server verify that files with native eol-style are
committed to the repository with LF line-termination? That would seem
like a very sane validation.
- Currently the client is the sole responsible to do this, but a
broken client can break a lot of things for all other users.
- Maybe as a workaround: can someone come up with a hook script for
checking this situation in a pre-commit hook?

-- 
Johan

Re: Huge diff for one-line change in ASF repository

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Oct 14, 2011 at 05:43:03PM +0400, Konstantin Kolinko wrote:
> Hi!
> 
> One of committers in Apache Tomcat project is experimenting with Git
> <-> Subversion integration.
> 
> A result is that in the last few days several commits produced diffs
> for entire file. An example:
> 
> http://svn.apache.org/viewvc?view=revision&revision=1183340
> http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/catalina/startup/ContextConfig.java?r1=1183339&r2=1183340
> 
> Commit e-mail:
> http://markmail.org/message/f4rdxrjvrenc6tg6
> http://markmail.org/message/iysr4bsfckegq7vp
> 
> How can this happen from subversion point of view?
> The file has svn:eol-style=native.

Because whichever git integratio is used ignores this setting and
commits all with lines CRLF anyway?

> AFAIK such files are stored with LF endings at the server. Is this
> line endings convention enforced on the client only?

Yes. The client changes the file EOLs whenever the local value of the
svn:eol-style property changes.

Recall that the server uses binary diffs.
It doesn't know or care what a newline is.