You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2012/12/09 01:02:55 UTC

enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Last week a colleague managed to commit a non-LF-normalized
svn:eol-style=native file in our repository again. As explained in
issue #4065 [1], this causes all kinds of problems.

I suspect there might be a bug in SVNKit, some edge-case where it
doesn't correctly translate the to-be-committed files to
LF-termination. But regardless, I'd like to protect my repository. I
don't fully control the clients that people use, and those clients can
always have bugs, ...

So ... how hard would it be to fix issue #4065, making the server
enforce the right eol-ness for correct operation? It's a genuine
question, I have no idea :-).

[1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server
should enforce LF normalization for svn:eol-style=native files

-- 
Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 10.12.2012 01:24, Stefan Fuhrmann wrote:
> Well, if we really want to implement such a feature,
> we may as well be a bit more clever about it: While
> writing incoming file contents to the protrev file, we
> may determine its NL style as we go and store the
> result. The latter could then be compared to the props
> setting during commit time.

Premature optimization. We haven't even decided we want this in
libsvn_repos, let alone libsvn_fs_fs, which is the only place where
"protorev file" is even a meaningful term.

-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Dec 10, 2012 at 12:26 AM, Branko Čibej <br...@wandisco.com> wrote:

> On 10.12.2012 00:08, Johan Corveleyn wrote:
> > On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> >> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
> >>> 2) Am I the only one who wants to protect his repository against this
> >>> corruption? Judging from [1], I don't think so. It doesn't make sense
> >>> that everyone starts writing this pre-commit hook, for something that
> >>> IMHO is an obvious anti-corruption protection. I think every
> >>> repository out there deserves to be protected against this.
> >>>
> >> FWIW, I suggested a hook because you could implement that in about
> >> 5 minutes, whereas doing a C code change would take at least 10 times
> >> that (and several weeks if you have to wait for it to appear in a 1.7.x
> >> release that you can install at $WORK).  I won't object to C code
> >> verifying the svn:eol-style invariant.
> > Thanks. And your pre-commit hook example is much appreciated.
> >
> > For the moment I get the impression that it's not really doable /
> > desirable to implement this in the repository. At least until now
> > no-one has suggested how it could be done, and I don't know enough
> > myself about the server-side / back-end to figure it out :-).
>
> The first obstacle is that the server does not interpret properties.[1]
> Therefore, you'd have to implement this check at transaction-commit
> time, because there's no earlier point where you're guaranteed to have
> all node properties at their final values. This implies that the time to
> reject a commit would be proportional to the size of the commit (which
> isn't the case when it comes to conflict detection).
>

Well, if we really want to implement such a feature,
we may as well be a bit more clever about it: While
writing incoming file contents to the protrev file, we
may determine its NL style as we go and store the
result. The latter could then be compared to the props
setting during commit time.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*

http://www.wandisco.com/subversion/download
*

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 10.12.2012 01:25, Johan Corveleyn wrote:
> Another known offender is git-svn, because they don't care about
> translating line termination. So if you use git-svn on Windows, you
> might end up committing an eol-style=native file with CRLF's in to the
> repository.

Yuck. We should shout *really* loudly out to the world about broken
git-svn then, if it doesn't honour our wire protocol. This might even
get me to lean towards rejecting these kinds of commits by default.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 10, 2012 at 12:26 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 10.12.2012 00:08, Johan Corveleyn wrote:
>> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>>>> 2) Am I the only one who wants to protect his repository against this
>>>> corruption? Judging from [1], I don't think so. It doesn't make sense
>>>> that everyone starts writing this pre-commit hook, for something that
>>>> IMHO is an obvious anti-corruption protection. I think every
>>>> repository out there deserves to be protected against this.
>>>>
>>> FWIW, I suggested a hook because you could implement that in about
>>> 5 minutes, whereas doing a C code change would take at least 10 times
>>> that (and several weeks if you have to wait for it to appear in a 1.7.x
>>> release that you can install at $WORK).  I won't object to C code
>>> verifying the svn:eol-style invariant.
>> Thanks. And your pre-commit hook example is much appreciated.
>>
>> For the moment I get the impression that it's not really doable /
>> desirable to implement this in the repository. At least until now
>> no-one has suggested how it could be done, and I don't know enough
>> myself about the server-side / back-end to figure it out :-).
>
> The first obstacle is that the server does not interpret properties.[1]
> Therefore, you'd have to implement this check at transaction-commit
> time, because there's no earlier point where you're guaranteed to have
> all node properties at their final values. This implies that the time to
> reject a commit would be proportional to the size of the commit (which
> isn't the case when it comes to conflict detection).
>
> All properties are interpreted by clients. A buggy client will cause
> cause problems for other users, so the best course of action is to
> report the bug (to SvnKit in this case?).

That's actually pretty hard, because I can't reproduce the issue. I
only suspect svnkit because I've only seen it (at our company) with
commits coming from IntelliJ IDEA, which uses svnkit under the hood.

Another known offender is git-svn, because they don't care about
translating line termination. So if you use git-svn on Windows, you
might end up committing an eol-style=native file with CRLF's in to the
repository.

> Also note that you're using a much too strong term when you talk about
> "corrupted files" in this case. There's nothing corrupted about them.

Well, it does mess up all diffs (client-side, server side for
post-commit mails, ...) made against that revision.

Plus it causes these files to be marked as modified in working copies
of other users ... sometimes (when their timestamps mismatch with
their timestamp-in-wc.db, so the content will be checksummed -- if the
timestamps are still ok, it will not show up as modified). This in
turn can cause totally confusing tree conflicts when a parent dir is
moved (confusing because there's a "local edit" that the user knows
nothing about).

But okay, "corrupt" may be exaggerating a bit :-) ...

> -- Brane
>
> [1] Not strictly true since mod_dav_svn will look at svn:mime-type when
> serving the content at its default, unversioned URL; but it doesn't
> actually interpret the value.
>
> --
> Branko Čibej
> Director of Subversion | WANdisco | www.wandisco.com
>



-- 
Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 10, 2012 at 12:47 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Branko Čibej wrote on Mon, Dec 10, 2012 at 11:51:08 +0100:
>> On 10.12.2012 11:31, Daniel Shahaf wrote:
>> > Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100:
>> >> On 10.12.2012 00:08, Johan Corveleyn wrote:
>> >>> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> >>>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>> >>>>> 2) Am I the only one who wants to protect his repository against this
>> >>>>> corruption? Judging from [1], I don't think so. It doesn't make sense
>> >>>>> that everyone starts writing this pre-commit hook, for something that
>> >>>>> IMHO is an obvious anti-corruption protection. I think every
>> >>>>> repository out there deserves to be protected against this.
>> >>>>>
>> >>>> FWIW, I suggested a hook because you could implement that in about
>> >>>> 5 minutes, whereas doing a C code change would take at least 10 times
>> >>>> that (and several weeks if you have to wait for it to appear in a 1.7.x
>> >>>> release that you can install at $WORK).  I won't object to C code
>> >>>> verifying the svn:eol-style invariant.
>> >>> Thanks. And your pre-commit hook example is much appreciated.
>> >>>
>> >>> For the moment I get the impression that it's not really doable /
>> >>> desirable to implement this in the repository. At least until now
>> >>> no-one has suggested how it could be done, and I don't know enough
>> >>> myself about the server-side / back-end to figure it out :-).
>> >> The first obstacle is that the server does not interpret properties.[1]
>> >> Therefore, you'd have to implement this check at transaction-commit
>> >> time, because there's no earlier point where you're guaranteed to have
>> >> all node properties at their final values. This implies that the time to
>> >> reject a commit would be proportional to the size of the commit (which
>> >> isn't the case when it comes to conflict detection).
>> > Can't you do what stefan2 suggested, but in libsvn_repos?  The repos
>> > layer commit editor would remember for each file whether its textdelta
>> > has added a new 0x0D byte, then at close_edit() it would iterate all
>> > files in the commit --- and for each file only compare its svn:eol-style
>> > property to the by-now-precomputed "did it it contain a 0x0D" bit.
>> >
>> > I'm not sure how efficient it would be to parse for 0x0D's in
>> > libsvn_repos, though.  Maybe we should make this optional.
>>
>>
>> It's not enough to just look for \r in the delta stream. Certainly
>> wouldn't help with historically broken files.
>
> I wasn't trying to fix historically broken files.  (That would require
> a fulltext scan --- that's not a cheap computation.)  Do you see another
> problem?

Indeed, historically broken files are not the focus here. They're
broken in the history of the repository anyway, so that's not really
fixable anymore. But if we can prevent new broken files from entering
the repository, that would be great.

>> Maybe that's a good idea, but /why/ put it in libsvn_repos when it's
>> much easier and cleaner to just provide some standard hooks, written in
>> C and distributed with releases, that the admin plug in if she feels
>> like it? Surely that's what hooks are for.
>
> The argument is that a Subversion server should be enforcing
> Subversion's invariants.
>
> That said, I'm not opposed to doing it via standard hooks.  It's a good
> way to introduce the feature in a way that allows more easily changing
> it before it hits the APIs and their strict compatibility rules.

Yes, that would be fine I think. I like Ivan's suggestion of creating
a new standard 'svnhooks' program or something like that, which can
then be part of standard distributions.

-- 
Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Mon, Dec 10, 2012 at 11:51:08 +0100:
> On 10.12.2012 11:31, Daniel Shahaf wrote:
> > Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100:
> >> On 10.12.2012 00:08, Johan Corveleyn wrote:
> >>> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >>>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
> >>>>> 2) Am I the only one who wants to protect his repository against this
> >>>>> corruption? Judging from [1], I don't think so. It doesn't make sense
> >>>>> that everyone starts writing this pre-commit hook, for something that
> >>>>> IMHO is an obvious anti-corruption protection. I think every
> >>>>> repository out there deserves to be protected against this.
> >>>>>
> >>>> FWIW, I suggested a hook because you could implement that in about
> >>>> 5 minutes, whereas doing a C code change would take at least 10 times
> >>>> that (and several weeks if you have to wait for it to appear in a 1.7.x
> >>>> release that you can install at $WORK).  I won't object to C code
> >>>> verifying the svn:eol-style invariant.
> >>> Thanks. And your pre-commit hook example is much appreciated.
> >>>
> >>> For the moment I get the impression that it's not really doable /
> >>> desirable to implement this in the repository. At least until now
> >>> no-one has suggested how it could be done, and I don't know enough
> >>> myself about the server-side / back-end to figure it out :-).
> >> The first obstacle is that the server does not interpret properties.[1]
> >> Therefore, you'd have to implement this check at transaction-commit
> >> time, because there's no earlier point where you're guaranteed to have
> >> all node properties at their final values. This implies that the time to
> >> reject a commit would be proportional to the size of the commit (which
> >> isn't the case when it comes to conflict detection).
> > Can't you do what stefan2 suggested, but in libsvn_repos?  The repos
> > layer commit editor would remember for each file whether its textdelta
> > has added a new 0x0D byte, then at close_edit() it would iterate all
> > files in the commit --- and for each file only compare its svn:eol-style
> > property to the by-now-precomputed "did it it contain a 0x0D" bit.
> >
> > I'm not sure how efficient it would be to parse for 0x0D's in
> > libsvn_repos, though.  Maybe we should make this optional.
> 
> 
> It's not enough to just look for \r in the delta stream. Certainly
> wouldn't help with historically broken files.

I wasn't trying to fix historically broken files.  (That would require
a fulltext scan --- that's not a cheap computation.)  Do you see another
problem?

> Maybe that's a good idea, but /why/ put it in libsvn_repos when it's
> much easier and cleaner to just provide some standard hooks, written in
> C and distributed with releases, that the admin plug in if she feels
> like it? Surely that's what hooks are for.

The argument is that a Subversion server should be enforcing
Subversion's invariants.

That said, I'm not opposed to doing it via standard hooks.  It's a good
way to introduce the feature in a way that allows more easily changing
it before it hits the APIs and their strict compatibility rules.

> 
> 
> -- Brane
> 
> 
> -- 
> Branko Čibej
> Director of Subversion | WANdisco | www.wandisco.com
> 

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Mon, Dec 10, 2012 at 2:51 PM, Branko Čibej <br...@wandisco.com> wrote:
> On 10.12.2012 11:31, Daniel Shahaf wrote:
>> Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100:
>>> On 10.12.2012 00:08, Johan Corveleyn wrote:
>>>> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>>>>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>>>>>> 2) Am I the only one who wants to protect his repository against this
>>>>>> corruption? Judging from [1], I don't think so. It doesn't make sense
>>>>>> that everyone starts writing this pre-commit hook, for something that
>>>>>> IMHO is an obvious anti-corruption protection. I think every
>>>>>> repository out there deserves to be protected against this.
>>>>>>
>>>>> FWIW, I suggested a hook because you could implement that in about
>>>>> 5 minutes, whereas doing a C code change would take at least 10 times
>>>>> that (and several weeks if you have to wait for it to appear in a 1.7.x
>>>>> release that you can install at $WORK).  I won't object to C code
>>>>> verifying the svn:eol-style invariant.
>>>> Thanks. And your pre-commit hook example is much appreciated.
>>>>
>>>> For the moment I get the impression that it's not really doable /
>>>> desirable to implement this in the repository. At least until now
>>>> no-one has suggested how it could be done, and I don't know enough
>>>> myself about the server-side / back-end to figure it out :-).
>>> The first obstacle is that the server does not interpret properties.[1]
>>> Therefore, you'd have to implement this check at transaction-commit
>>> time, because there's no earlier point where you're guaranteed to have
>>> all node properties at their final values. This implies that the time to
>>> reject a commit would be proportional to the size of the commit (which
>>> isn't the case when it comes to conflict detection).
>> Can't you do what stefan2 suggested, but in libsvn_repos?  The repos
>> layer commit editor would remember for each file whether its textdelta
>> has added a new 0x0D byte, then at close_edit() it would iterate all
>> files in the commit --- and for each file only compare its svn:eol-style
>> property to the by-now-precomputed "did it it contain a 0x0D" bit.
>>
>> I'm not sure how efficient it would be to parse for 0x0D's in
>> libsvn_repos, though.  Maybe we should make this optional.
>
>
> It's not enough to just look for \r in the delta stream. Certainly
> wouldn't help with historically broken files.
>
> Moreover, if libsvn_repos started looking at svn:eol-style, that'd only
> make sense if you verified all normalizations, not just "native". And
> why should it just be svn:eol-style, when svn:keywords has potentially
> the same problem? Eventually you start verifying everything that affects
> file contents.
>
> Maybe that's a good idea, but /why/ put it in libsvn_repos when it's
> much easier and cleaner to just provide some standard hooks, written in
> C and distributed with releases, that the admin plug in if she feels
> like it? Surely that's what hooks are for.
>
>
New program 'svnhooks' with set of hooks for standard recommended
polices would be great. svn:eol-style check is good start.


-- 
Ivan Zhakov

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 10.12.2012 11:31, Daniel Shahaf wrote:
> Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100:
>> On 10.12.2012 00:08, Johan Corveleyn wrote:
>>> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>>>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>>>>> 2) Am I the only one who wants to protect his repository against this
>>>>> corruption? Judging from [1], I don't think so. It doesn't make sense
>>>>> that everyone starts writing this pre-commit hook, for something that
>>>>> IMHO is an obvious anti-corruption protection. I think every
>>>>> repository out there deserves to be protected against this.
>>>>>
>>>> FWIW, I suggested a hook because you could implement that in about
>>>> 5 minutes, whereas doing a C code change would take at least 10 times
>>>> that (and several weeks if you have to wait for it to appear in a 1.7.x
>>>> release that you can install at $WORK).  I won't object to C code
>>>> verifying the svn:eol-style invariant.
>>> Thanks. And your pre-commit hook example is much appreciated.
>>>
>>> For the moment I get the impression that it's not really doable /
>>> desirable to implement this in the repository. At least until now
>>> no-one has suggested how it could be done, and I don't know enough
>>> myself about the server-side / back-end to figure it out :-).
>> The first obstacle is that the server does not interpret properties.[1]
>> Therefore, you'd have to implement this check at transaction-commit
>> time, because there's no earlier point where you're guaranteed to have
>> all node properties at their final values. This implies that the time to
>> reject a commit would be proportional to the size of the commit (which
>> isn't the case when it comes to conflict detection).
> Can't you do what stefan2 suggested, but in libsvn_repos?  The repos
> layer commit editor would remember for each file whether its textdelta
> has added a new 0x0D byte, then at close_edit() it would iterate all
> files in the commit --- and for each file only compare its svn:eol-style
> property to the by-now-precomputed "did it it contain a 0x0D" bit.
>
> I'm not sure how efficient it would be to parse for 0x0D's in
> libsvn_repos, though.  Maybe we should make this optional.


It's not enough to just look for \r in the delta stream. Certainly
wouldn't help with historically broken files.

Moreover, if libsvn_repos started looking at svn:eol-style, that'd only
make sense if you verified all normalizations, not just "native". And
why should it just be svn:eol-style, when svn:keywords has potentially
the same problem? Eventually you start verifying everything that affects
file contents.

Maybe that's a good idea, but /why/ put it in libsvn_repos when it's
much easier and cleaner to just provide some standard hooks, written in
C and distributed with releases, that the admin plug in if she feels
like it? Surely that's what hooks are for.


-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100:
> On 10.12.2012 00:08, Johan Corveleyn wrote:
> > On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
> >>> 2) Am I the only one who wants to protect his repository against this
> >>> corruption? Judging from [1], I don't think so. It doesn't make sense
> >>> that everyone starts writing this pre-commit hook, for something that
> >>> IMHO is an obvious anti-corruption protection. I think every
> >>> repository out there deserves to be protected against this.
> >>>
> >> FWIW, I suggested a hook because you could implement that in about
> >> 5 minutes, whereas doing a C code change would take at least 10 times
> >> that (and several weeks if you have to wait for it to appear in a 1.7.x
> >> release that you can install at $WORK).  I won't object to C code
> >> verifying the svn:eol-style invariant.
> > Thanks. And your pre-commit hook example is much appreciated.
> >
> > For the moment I get the impression that it's not really doable /
> > desirable to implement this in the repository. At least until now
> > no-one has suggested how it could be done, and I don't know enough
> > myself about the server-side / back-end to figure it out :-).
> 
> The first obstacle is that the server does not interpret properties.[1]
> Therefore, you'd have to implement this check at transaction-commit
> time, because there's no earlier point where you're guaranteed to have
> all node properties at their final values. This implies that the time to
> reject a commit would be proportional to the size of the commit (which
> isn't the case when it comes to conflict detection).

Can't you do what stefan2 suggested, but in libsvn_repos?  The repos
layer commit editor would remember for each file whether its textdelta
has added a new 0x0D byte, then at close_edit() it would iterate all
files in the commit --- and for each file only compare its svn:eol-style
property to the by-now-precomputed "did it it contain a 0x0D" bit.

I'm not sure how efficient it would be to parse for 0x0D's in
libsvn_repos, though.  Maybe we should make this optional.

> 
> All properties are interpreted by clients. A buggy client will cause
> cause problems for other users, so the best course of action is to
> report the bug (to SvnKit in this case?).
> 
> Also note that you're using a much too strong term when you talk about
> "corrupted files" in this case. There's nothing corrupted about them.
> 

An invariant failed to maintain.  Granted it's not an FS_level
invariant, though.


> 
> -- Brane
> 
> [1] Not strictly true since mod_dav_svn will look at svn:mime-type when
> serving the content at its default, unversioned URL; but it doesn't
> actually interpret the value.
> 
> -- 
> Branko Čibej
> Director of Subversion | WANdisco | www.wandisco.com
> 

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 10.12.2012 00:08, Johan Corveleyn wrote:
> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>>> 2) Am I the only one who wants to protect his repository against this
>>> corruption? Judging from [1], I don't think so. It doesn't make sense
>>> that everyone starts writing this pre-commit hook, for something that
>>> IMHO is an obvious anti-corruption protection. I think every
>>> repository out there deserves to be protected against this.
>>>
>> FWIW, I suggested a hook because you could implement that in about
>> 5 minutes, whereas doing a C code change would take at least 10 times
>> that (and several weeks if you have to wait for it to appear in a 1.7.x
>> release that you can install at $WORK).  I won't object to C code
>> verifying the svn:eol-style invariant.
> Thanks. And your pre-commit hook example is much appreciated.
>
> For the moment I get the impression that it's not really doable /
> desirable to implement this in the repository. At least until now
> no-one has suggested how it could be done, and I don't know enough
> myself about the server-side / back-end to figure it out :-).

The first obstacle is that the server does not interpret properties.[1]
Therefore, you'd have to implement this check at transaction-commit
time, because there's no earlier point where you're guaranteed to have
all node properties at their final values. This implies that the time to
reject a commit would be proportional to the size of the commit (which
isn't the case when it comes to conflict detection).

All properties are interpreted by clients. A buggy client will cause
cause problems for other users, so the best course of action is to
report the bug (to SvnKit in this case?).

Also note that you're using a much too strong term when you talk about
"corrupted files" in this case. There's nothing corrupted about them.


-- Brane

[1] Not strictly true since mod_dav_svn will look at svn:mime-type when
serving the content at its default, unversioned URL; but it doesn't
actually interpret the value.

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 10.12.2012 00:21, Daniel Shahaf wrote:
>> As for my 1st concern, about the slowness of the pre-commit hook
>> validation, it might be possible to make it faster if both 'svnlook
>> pg' and 'svnlook cat' would support multiple targets.
> Or if you wrote your hook script using the Python bindings.  (only one
> svn_fs_open call, intra-process caches, ...)

Or you can write that pre-commit hook in C, after all, using our very
own libraries. It'll take a bit longer than doing it in Python, but much
less than trying to shoehorn something into libsvn_repos that wasn't
designed to be there.

(N.B.: Even if it /was/ in libsvn_repos, it'd just get called at the
same time as the pre-commit hook; so you'd save a fork+exec and a few
open calls, but not that much more.)

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Mon, Dec 10, 2012 at 00:08:26 +0100:
> On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
> >> 2) Am I the only one who wants to protect his repository against this
> >> corruption? Judging from [1], I don't think so. It doesn't make sense
> >> that everyone starts writing this pre-commit hook, for something that
> >> IMHO is an obvious anti-corruption protection. I think every
> >> repository out there deserves to be protected against this.
> >>
> >
> > FWIW, I suggested a hook because you could implement that in about
> > 5 minutes, whereas doing a C code change would take at least 10 times
> > that (and several weeks if you have to wait for it to appear in a 1.7.x
> > release that you can install at $WORK).  I won't object to C code
> > verifying the svn:eol-style invariant.
> 
> Thanks. And your pre-commit hook example is much appreciated.
> 
> For the moment I get the impression that it's not really doable /
> desirable to implement this in the repository. At least until now
> no-one has suggested how it could be done, and I don't know enough
> myself about the server-side / back-end to figure it out :-).
> 

Well, as Brane said the logic should live in libsvn_repos.  I would
guess the code path of interest is the commit editor.  (The property
validations live in svn_repos__validate_prop(), in fs-wrap.c.)

> (side-note: I don't want to rush anything and get it backported to a
> 1.7.x release. For the moment I can manage (I watch the problem from
> an asynchronous post-commit hook, and fix it as soon as it happens,
> which is only a couple of times a year in our repos). But I'm
> interested in a long-term solution to this, so this wart can be
> eliminated :-)).
> 

Well, just extend the wart to also fix it as soon as it happens:

% svn cat $FILE | dos2unix | svnmucc -mm put /dev/stdin $FILE
(BTW, there's a race condition in this line --- it's not guaranteed that
'cat' and 'put' open the RA session to the same base revision)

> As for my 1st concern, about the slowness of the pre-commit hook
> validation, it might be possible to make it faster if both 'svnlook
> pg' and 'svnlook cat' would support multiple targets.

Or if you wrote your hook script using the Python bindings.  (only one
svn_fs_open call, intra-process caches, ...)

> 
> -- 
> Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
>> 2) Am I the only one who wants to protect his repository against this
>> corruption? Judging from [1], I don't think so. It doesn't make sense
>> that everyone starts writing this pre-commit hook, for something that
>> IMHO is an obvious anti-corruption protection. I think every
>> repository out there deserves to be protected against this.
>>
>
> FWIW, I suggested a hook because you could implement that in about
> 5 minutes, whereas doing a C code change would take at least 10 times
> that (and several weeks if you have to wait for it to appear in a 1.7.x
> release that you can install at $WORK).  I won't object to C code
> verifying the svn:eol-style invariant.

Thanks. And your pre-commit hook example is much appreciated.

For the moment I get the impression that it's not really doable /
desirable to implement this in the repository. At least until now
no-one has suggested how it could be done, and I don't know enough
myself about the server-side / back-end to figure it out :-).

(side-note: I don't want to rush anything and get it backported to a
1.7.x release. For the moment I can manage (I watch the problem from
an asynchronous post-commit hook, and fix it as soon as it happens,
which is only a couple of times a year in our repos). But I'm
interested in a long-term solution to this, so this wart can be
eliminated :-)).

As for my 1st concern, about the slowness of the pre-commit hook
validation, it might be possible to make it faster if both 'svnlook
pg' and 'svnlook cat' would support multiple targets.

-- 
Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100:
> 2) Am I the only one who wants to protect his repository against this
> corruption? Judging from [1], I don't think so. It doesn't make sense
> that everyone starts writing this pre-commit hook, for something that
> IMHO is an obvious anti-corruption protection. I think every
> repository out there deserves to be protected against this.
> 

FWIW, I suggested a hook because you could implement that in about
5 minutes, whereas doing a C code change would take at least 10 times
that (and several weeks if you have to wait for it to appear in a 1.7.x
release that you can install at $WORK).  I won't object to C code
verifying the svn:eol-style invariant.

> [1] http://svn.haxx.se/users/archive-2011-10/0287.shtml
> 
> -- 
> Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sun, Dec 9, 2012 at 11:42 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 09.12.2012 09:14, Daniel Shahaf wrote:
>> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100:
>>> Last week a colleague managed to commit a non-LF-normalized
>>> svn:eol-style=native file in our repository again. As explained in
>>> issue #4065 [1], this causes all kinds of problems.
>>>
>>> I suspect there might be a bug in SVNKit, some edge-case where it
>>> doesn't correctly translate the to-be-committed files to
>>> LF-termination. But regardless, I'd like to protect my repository. I
>>> don't fully control the clients that people use, and those clients can
>>> always have bugs, ...
>>>
>>> So ... how hard would it be to fix issue #4065, making the server
>>> enforce the right eol-ness for correct operation? It's a genuine
>>> question, I have no idea :-).
>>>
>> for i in $(svnlook changed -t $TXN $REPOS); do
>>   if propget = native || propget = LF ; then
>>     svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d
>>   fi
>> done
>>
>> And writing that made me realise... "contains CR" is such a simple
>> condition, that you don't need the fulltext for it --- you would be able
>> to implement it by looking at the "literal new text" segments within
>> svndiff streams directly.  However, I'm not quite sure whether this
>> observation is the key to a simple implementation or to an
>> unnecessarily-complicated one.
>>
>>> [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server
>>> should enforce LF normalization for svn:eol-style=native files
>
> The server does not look at the contents of files, or the value of
> properties. It does not even know that properties /have/ semantics. The
> only reasonable place to do this would be in a pre-commit hook. Anything
> else would require a major change in the design of the server and/or
> libsvn_repos.
>
> If the client encounters non-normalized files with svn:eol-style=native
> and does /not/ properly normalize them regardless, that's a client bug.
> Though of course the file would immediately show up as modified as soon
> as it was updated.

Bah, that just sounds like the client (any client) should paper over
these "corrupt files". I don't think that's a good solution.

These files are present in the repository in a corrupt form, not
following the expected invariant that they should have LF line
endings. If you 'svnlook cat' such a file, it will have CRLF line
endings, while normal svn:eol-style=native files will always have LF
line endings when 'svnlook cat'ed.

Any "client" that diffs that change ('svnlook diff', 'svn diff -c $REV
$REPOS', ...), will see a full file diff. They would all have to
normalize that "corrupt file" on the fly to give a good diff like it
should have been if it weren't corrupted.

Note that the Apache repository itself already contains such
corruptions (see [1], the user thread started by Konstantin Kolinko,
which prompted me to file the issue). Any large repository will have
them, because of the diversity of svn clients and bugs they may
contain. These buggy clients ruin the experience for everyone else.
I'd like there to be some better protection against this problem,
centrally.

A pre-commit hook is an option, and I'll probably have to go that
route if there is no other way. But I see two problems with it:

1) It's dog slow. Doing an "svnlook pg" and then potentially "svnlook
cat" for every changed file can easily make the pre-commit hook take
longer than the standard client-side timeout of 30 seconds (for neon,
dunno about the serf timeout). Just imagine doing a catch-up merge
touching 1000 files. Parsing both the properties and the content from
one "svnlook diff" call might be a workaround, but yuck.

2) Am I the only one who wants to protect his repository against this
corruption? Judging from [1], I don't think so. It doesn't make sense
that everyone starts writing this pre-commit hook, for something that
IMHO is an obvious anti-corruption protection. I think every
repository out there deserves to be protected against this.

[1] http://svn.haxx.se/users/archive-2011-10/0287.shtml

-- 
Johan

Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Branko Čibej <br...@wandisco.com>.
On 09.12.2012 09:14, Daniel Shahaf wrote:
> Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100:
>> Last week a colleague managed to commit a non-LF-normalized
>> svn:eol-style=native file in our repository again. As explained in
>> issue #4065 [1], this causes all kinds of problems.
>>
>> I suspect there might be a bug in SVNKit, some edge-case where it
>> doesn't correctly translate the to-be-committed files to
>> LF-termination. But regardless, I'd like to protect my repository. I
>> don't fully control the clients that people use, and those clients can
>> always have bugs, ...
>>
>> So ... how hard would it be to fix issue #4065, making the server
>> enforce the right eol-ness for correct operation? It's a genuine
>> question, I have no idea :-).
>>
> for i in $(svnlook changed -t $TXN $REPOS); do
>   if propget = native || propget = LF ; then 
>     svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d
>   fi
> done
>
> And writing that made me realise... "contains CR" is such a simple
> condition, that you don't need the fulltext for it --- you would be able
> to implement it by looking at the "literal new text" segments within
> svndiff streams directly.  However, I'm not quite sure whether this
> observation is the key to a simple implementation or to an
> unnecessarily-complicated one.
>
>> [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server
>> should enforce LF normalization for svn:eol-style=native files

The server does not look at the contents of files, or the value of
properties. It does not even know that properties /have/ semantics. The
only reasonable place to do this would be in a pre-commit hook. Anything
else would require a major change in the design of the server and/or
libsvn_repos.

If the client encounters non-normalized files with svn:eol-style=native
and does /not/ properly normalize them regardless, that's a client bug.
Though of course the file would immediately show up as modified as soon
as it was updated.

-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100:
> Last week a colleague managed to commit a non-LF-normalized
> svn:eol-style=native file in our repository again. As explained in
> issue #4065 [1], this causes all kinds of problems.
> 
> I suspect there might be a bug in SVNKit, some edge-case where it
> doesn't correctly translate the to-be-committed files to
> LF-termination. But regardless, I'd like to protect my repository. I
> don't fully control the clients that people use, and those clients can
> always have bugs, ...
> 
> So ... how hard would it be to fix issue #4065, making the server
> enforce the right eol-ness for correct operation? It's a genuine
> question, I have no idea :-).
> 

for i in $(svnlook changed -t $TXN $REPOS); do
  if propget = native || propget = LF ; then 
    svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d
  fi
done

And writing that made me realise... "contains CR" is such a simple
condition, that you don't need the fulltext for it --- you would be able
to implement it by looking at the "literal new text" segments within
svndiff streams directly.  However, I'm not quite sure whether this
observation is the key to a simple implementation or to an
unnecessarily-complicated one.

> [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server
> should enforce LF normalization for svn:eol-style=native files
> 
> -- 
> Johan