Posted to dev@subversion.apache.org by Greg Hudson <gh...@mit.edu> on 2001/12/14 00:44:48 UTC
Newlines, preserving data, and multiple access paths
I have some new thoughts on newline translation, after talking to some
MIT friends about it.
1. We can avoid irrevocably destroying data if we make sure all
newline translations we do are reversible. A newline translation
is reversible if there are no CRs or LFs in the file which aren't
source-format newlines.
This means we can go back to Ben's proposal, and as long as we add
this safety, we don't have to worry about destroying anyone's
engine designs. If the engine design was made on Windows and
happens to only contain CRLFs, they will get translated to LF on
checkin, but translating LF back to CRLF will restore the file. If
the engine design contains CRLFs mixed with LFs and CRs, we can
error out, or decide that the file must be binary after all.
(If we want to go a little overboard on safety, we could make the
client library set a property on each commit saying what newline
translation was done, if any. Then it would be easy to retrieve
the exact contents of the committed file by reversing the
translation. I don't think this is necessary, though.)
2. Unfortunately, as I noted in one of my many other messages today,
*none* of the schemes presented so far will robustly handle tools
which access the repository through DAV or libsvn_fs, if the tools
run on varying platforms and aren't forgiving about newlines. In
order to do that, we have to actually add the concept of a text
file to the FS layer.
Here is what I propose:
* For now, we implement Ben's scheme, with the proviso that we never
do a non-reversible newline translation. (This totally messes up
Karl's poll because it didn't include Ben's scheme.) The
repository gets a global format of LF.
* Tools which use DAV or libsvn_fs must be able to handle LF line
separators. All Unix tools will be okay. Most Windows tools will
probably also be okay because they know they're getting data over
the net where not everyone uses the same newline style. (And most
Mac tools will probably be okay because MacOS X is already
schizophrenic about newlines.)
* If the above turns out to be a problem, we can talk about changing
the concept of the FS layer.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Newlines, preserving data, and multiple access paths
Posted by Bruce Atherton <br...@callenish.com>.
At 11:36 AM 12/14/2001 -0600, Ben Collins-Sussman wrote:
>Bruce's system seems a tad more complicated to implement, since it
>seems to require some kind of auto-detection of EOL style when a
>text-base is first received from the server.
As I mentioned in part two of my proposal, you could choose to store a
second property on a file that indicated what line ending it has in the
repository, but that would still be a property that only the client would
use. It isn't required, but it may be more efficient.
> And it also needs to
>'remember' that a transform happened previously; either that, or
>re-run the detection heuristic on text-base each time the working file
>is committed.
I was thinking of "remember", perhaps in the entries file? Except that
integrates it a little more into the client than may be desirable. In the
abstract, I was thinking more like a set of transforms (line endings,
keywords, whatever) that could be plugged in to a client or not depending
on user preference, and that would provide perhaps three callbacks
(transform_stream, reverse_transform_stream, requires_transform). In the
concrete, of course, that probably all goes out the window.
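A rough sketch of that pluggable-transform idea, in Python for brevity. Only the three callback names come from the paragraph above; the signatures, the CrlfTransform class, the run_pipeline helper and the "eol-conversion" property name are assumptions made up for illustration, not anything in the Subversion client.

```python
import io
from typing import BinaryIO, Protocol


class Transform(Protocol):
    """Hypothetical plug-in interface built around the three callbacks."""

    def requires_transform(self, props: dict) -> bool: ...
    def transform_stream(self, src: BinaryIO, dst: BinaryIO) -> None: ...
    def reverse_transform_stream(self, src: BinaryIO, dst: BinaryIO) -> None: ...


class CrlfTransform:
    """Repository LF <-> working-copy CRLF (reads the whole stream for simplicity)."""

    def requires_transform(self, props: dict) -> bool:
        # "eol-conversion" is an invented property name standing in for the on/off switch.
        return props.get("eol-conversion") == "on"

    def transform_stream(self, src: BinaryIO, dst: BinaryIO) -> None:
        dst.write(src.read().replace(b"\n", b"\r\n"))

    def reverse_transform_stream(self, src: BinaryIO, dst: BinaryIO) -> None:
        dst.write(src.read().replace(b"\r\n", b"\n"))


def run_pipeline(transforms, props: dict, data: bytes, reverse: bool = False) -> bytes:
    """Apply every transform that asks to run; undo them in the opposite order."""
    active = [t for t in transforms if t.requires_transform(props)]
    if reverse:
        active.reverse()
    for t in active:
        src, dst = io.BytesIO(data), io.BytesIO()
        if reverse:
            t.reverse_transform_stream(src, dst)
        else:
            t.transform_stream(src, dst)
        data = dst.getvalue()
    return data
```

On checkout the client would call run_pipeline(plugins, props, data), and on commit the same call with reverse=True; a file whose properties don't ask for the transform passes through untouched.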
>Please correct me if I'm wrong. My brain is spinning, and I'm so
>tired of reading/thinking about this issue.
Me too. I'd given up on posting anything more on the topic, but thought
these clarifications might be helpful.
Re: Newlines, preserving data, and multiple access paths
Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2001-12-14 at 15:48, William Uther wrote:
> --On Friday, 14 December 2001 1:16 PM -0500 Greg Hudson <gh...@MIT.EDU>
> wrote:
> > If newline-style is LF, CR, or CRLF, translate <native newline style>
> > -> <requested newline style>. If we notice any CRs or LFs which aren't
> > part of a native-style newline and aren't part of a requested-style
> > newline, abort the commit. If the commit succeeds, apply the <native
> > newline style> -> <requested newline style> translation to the working
> > copy as well, so that it matches what we would get from a checkout of
> > the new rev.
>
> I don't think this preserves reversibility. If a file contains BOTH
> <native-style newline> and <requested-style newline> then you need to
> abort. If you translate just <native-style newline> then you can't undo
> the transformation - you don't know which newlines need to be untransformed.
This particular transform (for files marked CRLF, CR, or LF) is not
reversible. See where I said:
We probably don't have to worry so much about data safety for
these files since a particular, odd behavior has been specified for
them.
However, let's add a possible variation to my proposal, for those who
are still uncomfortable with data-destroying transformations applied to
such files:
Variation 5: If the file is marked CRLF, CR, or LF, we translate
<native-style newline> to <requested-style newline> during commit, and
abort the commit if we notice any kind of mixing of newline styles.
(Can also combine with variation 1.)
Re: Newlines, preserving data, and multiple access paths
Posted by Colin Putney <cp...@whistler.com>.
William Uther wrote:
> --On Friday, 14 December 2001 1:16 PM -0500 Greg Hudson
> <gh...@MIT.EDU> wrote:
>
>> If newline-style is LF, CR, or CRLF, translate <native newline style>
>> -> <requested newline style>. If we notice any CRs or LFs which aren't
>> part of a native-style newline and aren't part of a requested-style
>> newline, abort the commit. If the commit succeeds, apply the <native
>> newline style> -> <requested newline style> translation to the working
>> copy as well, so that it matches what we would get from a checkout of
>> the new rev.
>
> I don't think this preserves reversibility. If a file contains BOTH
> <native-style newline> and <requested-style newline> then you need to
> abort. If you translate just <native-style newline> then you can't
> undo the transformation - you don't know which newlines need to be
> untransformed.
>
> Stated simply: You should only translate when the newline style is
> entirely consistent. Anything else removes the inconsistency and hence
> loses information.
True, this scheme doesn't preserve reversibility. But in this case
that's OK, because the newline-style decrees what the newline style must
be. If there are native-style newlines mixed in with the requested-style
newlines, this is probably the result of corruption by some
native-newline-obsessive user tool. So the non-reversible transform will
actually undo the corruption.
For example, take the file foo.dsp, which has a newline-style of CRLF. It's
stored in the repository with CRLF newlines and on checkout, no
transformation is done. If Linus checks out the file and edits it in an
old version of emacs, any lines he adds will be terminated with a bare
LF. Since this is his native style of newline, the transformation Greg
described will undo this damage.
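To make the foo.dsp example concrete, here is a small sketch of the repair for the one case described: requested style CRLF, native style LF. The function name is invented, and this is only an illustration of the idea, not client code.

```python
import re

# An LF that is not already preceded by a CR, i.e. a bare native newline.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def normalize_to_crlf(data: bytes) -> bytes:
    """Turn bare LFs into CRLF and leave existing CRLF pairs alone."""
    return _BARE_LF.sub(b"\r\n", data)


# Two original CRLF lines with one emacs-added line in between.
edited = b"line one\r\nadded in emacs\nline three\r\n"
repaired = normalize_to_crlf(edited)
```

The same call applied to a binary file that merely happens to contain a bare LF would change it in a way that cannot be undone, which is the unlikely corruption case discussed further down.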
If the newline-style is set to a specific newline-style (i.e., CR, LF, or
CRLF), then we know that (1) the file is text, not binary, and (2), any
other style of newline present is corruption.
A file should not be marked with a specific newline style unless (1)
user does so explicitly, or (2) it matches some heuristic when it's
added, *and* the file contents conform to that newline style.
So the only real possibility for corruption is if some user tool creates
a binary file that matches a heuristic for a specific newline style. In
our running example, William creates a vector graphics file called
foo.dsp and adds it. By chance, this file happens to have CRLFs
scattered through it, but no bare CRs, LFs, '\0' characters or other
harbingers of binary files. On the commit, svn will notice the
extension, set the newline-style to CRLF and send it to the repository.
William may get an error if he tries to commit a change that introduces
a bare CR or LF, but he won't corrupt the file.
Linus can corrupt the file if he makes a change that introduces a bare
LF, which will get transformed into CRLF on commit. Alternatively,
Madeleine (was that her name?) can introduce a bare CR and commit, which
will also corrupt the file.
That's a pretty long string of unlikely coincidences though, while the
opposite case, where this transformation *fixes* corruption, is quite
common.
Colin
Re: Newlines, preserving data, and multiple access paths
Posted by Karl Fogel <kf...@newton.ch.collab.net>.
William Uther <wi...@cs.cmu.edu> writes:
> > If newline-style is LF, CR, or CRLF, translate <native newline style>
> > -> <requested newline style>. If we notice any CRs or LFs which aren't
> > part of a native-style newline and aren't part of a requested-style
> > newline, abort the commit. If the commit succeeds, apply the <native
> > newline style> -> <requested newline style> translation to the working
> > copy as well, so that it matches what we would get from a checkout of
> > the new rev.
>
> I don't think this preserves reversibility. If a file contains BOTH
> <native-style newline> and <requested-style newline> then you need to
> abort. If you translate just <native-style newline> then you can't
> undo the transformation - you don't know which newlines need to be
> untransformed.
>
> Stated simply: You should only translate when the newline style is
> entirely consistent. Anything else removes the inconsistency and
> hence loses information.
I think that's what Greg H is saying, he just said it differently. He
didn't mean to tolerate files with mixed line endings, just that if
the modified file happens to *match* the specified ending (i.e., any
conversion that took place was entirely undone by some user tool),
that shouldn't be cause for aborting the commit.
-Karl
Re: Newlines, preserving data, and multiple access paths
Posted by William Uther <wi...@cs.cmu.edu>.
--On Friday, 14 December 2001 1:16 PM -0500 Greg Hudson <gh...@MIT.EDU>
wrote:
> If newline-style is LF, CR, or CRLF, translate <native newline style>
> -> <requested newline style>. If we notice any CRs or LFs which aren't
> part of a native-style newline and aren't part of a requested-style
> newline, abort the commit. If the commit succeeds, apply the <native
> newline style> -> <requested newline style> translation to the working
> copy as well, so that it matches what we would get from a checkout of
> the new rev.
I don't think this preserves reversibility. If a file contains BOTH
<native-style newline> and <requested-style newline> then you need to
abort. If you translate just <native-style newline> then you can't undo
the transformation - you don't know which newlines need to be untransformed.
Stated simply: You should only translate when the newline style is entirely
consistent. Anything else removes the inconsistency and hence loses
information.
later,
\x/ill :-}
Re: Newlines, preserving data, and multiple access paths
Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:
> No... it just means that if the mod times force a contents check, you
> have to translate the text-base contents as you compare them against the
> normal contents. That's "a teeny tiny bit slower," not, "a lot slower."
It's worse than that -- it invalidates our size check. Right now,
differing file sizes *guarantee* that a modification was made. With
eol conversion, there can be different sizes with no local mods.
So we'd lose one early return from text_modified_p().
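A toy illustration of the point, assuming (as Greg does) that the text-base is a verbatim LF copy of the repository contents. This is not the real text_modified_p(); the function name, the in-memory byte strings and the "native" parameter are simplifications invented for the example.

```python
from typing import Optional


def text_modified(working: bytes, text_base: bytes, native: Optional[bytes]) -> bool:
    """native is None when no EOL conversion applies to the file;
    otherwise it holds the working copy's newline bytes (e.g. b"\\r\\n")."""
    if native is None:
        # Without conversion, differing sizes guarantee a modification,
        # so this early return is safe.
        if len(working) != len(text_base):
            return True
        return working != text_base
    # With conversion the sizes may differ with no local mods, so the
    # early return is gone: translate the text-base, then compare.
    return working != text_base.replace(b"\n", native)
```

For an unmodified two-line file checked out on Windows, the working file is two bytes longer than the text-base, yet text_modified(working, text_base, b"\r\n") is False.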
Re: Newlines, preserving data, and multiple access paths
Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2001-12-14 at 17:44, Karl Fogel wrote:
> +1 on Greg Hudson's latest proposal -- and I think we're now ready to
> Actually Do It. :-)
I hope so. For a while I was afraid we had hit our first failure to
achieve livable consensus. My apologies for not realizing the
reversibility thing until two days and several thousand lines of
misguided debate had already gone by.
> My assumption is that "in the working copy" means both text-base and
> working file, for the sake of an efficient is-modified-p test, and
> since the repository file is just an automatic transform off the
> text-base anyway.
Actually, I was assuming that text-base would be a verbatim copy of the
repository contents. But that's kind of an implementation detail; let's
leave that up to Ben (assuming he's doing the implementation).
> Otherwise, then the is-modified-p check has to be tweaked in a way
> that will make modifiedness checks a lot slower in some cases.
No... it just means that if the mod times force a contents check, you
have to translate the text-base contents as you compare them against the
normal contents. That's "a teeny tiny bit slower," not, "a lot slower."
> The second sentence of the above paragraph isn't about allowing
> mixed-style files. It's saying that if the entire file is native
> format, allow that (and transform when necessary), OR if the entire
> file is in the requested style, then allow that too. The latter
> situation could happen if someone used a LF-style tool under Windows,
> for example, so that when an LF-style file got saved, the whole thing
> would be LF-style now, not native style. No reason to disallow this.
>
> Right?
See my last message, as well as Colin Putney's argument. In summary,
that's not actually what I meant, but I don't really care either way.
Re: Newlines, preserving data, and multiple access paths
Posted by Karl Fogel <kf...@newton.ch.collab.net>.
+1 on Greg Hudson's latest proposal -- and I think we're now ready to
Actually Do It. :-)
It's clear that no solution is perfect, because doing eol conversion
raises some inherently unresolvable questions. But my sense from
recent discussions is that everyone here can live with the choices
this proposal makes; also, it follows the Principle Of Least Surprise,
and is highly unlikely to damage data unfixably.
Below I'll quote his proposal, with a few annotations reflecting my
understanding of certain points, just to make sure.
If you have a *violent* objection, please post; otherwise, please do
not. We're looking for liveable consensus now, not further
refinements that would help some border cases and harm others. :-)
> Alright, I'll make a proposal which is like yours but (in my opinion) a
> little clearer. First, let's look at the different use cases:
>
> 1. The most common case--text files which want native line endings.
> These should be stored in the repository using LF line endings, and in
> the working dir using native line endings.
>
> 2. Binary files. These files we don't want to touch at all.
>
> 3. Text files which, for one reason or another, want a specific line
> ending format regardless of platform. These should be stored in the
> repository and in the working directory using the specified line
> ending. We probably don't have to worry so much about data safety for
> these files since a particular, odd behavior has been specified for
> them.
>
> There are, of course, a hundred different ways we could arrange the
> metadata. I propose an "svn:newline-style" property with the possible
> values "none", "native", "LF", "CR", and "CRLF". The values mean:
>
> none: Use case 2. Don't do any newline translation.
>
> native: Use case 1. Store with LF in repository, and with native line
> endings in the working copy.
My assumption is that "in the working copy" means both text-base and
working file, for the sake of an efficient is-modified-p test, and
since the repository file is just an automatic transform off the
text-base anyway.
If that's what you meant, then incoming svndiff has to be applied to a
deconverted tmp file, which then becomes the new text-base. No
problem.
Otherwise, then the is-modified-p check has to be tweaked in a way
that will make modifiedness checks a lot slower in some cases.
> LF, CR, CRLF: Use case 3. Store with specified format in the
> repository and in the working copy.
>
> On commit, we apply the following rules to transform the data committed
> to the server:
>
> If newline-style is none, do nothing.
>
> If newline-style is native, translate <native newline style> -> LF. If
> we notice any CRs or LFs which aren't part of a native-style newline,
> abort the commit.
>
> If newline-style is LF, CR, or CRLF, translate <native newline style>
> -> <requested newline style>. If we notice any CRs or LFs which aren't
> part of a native-style newline and aren't part of a requested-style
> newline, abort the commit. If the commit succeeds, apply the <native
> newline style> -> <requested newline style> translation to the working
> copy as well, so that it matches what we would get from a checkout of
> the new rev.
The second sentence of the above paragraph isn't about allowing
mixed-style files. It's saying that if the entire file is native
format, allow that (and transform when necessary), OR if the entire
file is in the requested style, then allow that too. The latter
situation could happen if someone used a LF-style tool under Windows,
for example, so that when an LF-style file got saved, the whole thing
would be LF-style now, not native style. No reason to disallow this.
Right?
> On checkout, we translate LF -> <native newline style> if newline-style
> is native; otherwise, we leave the file alone.
Yup.
> For now, let's say the default value of svn:newline-style is none. In
> the future, we'll want to think about things like how to enable
> newline-translation over the whole repository except for files which
> don't appear to be text.
Agree. Let's wait and let real-life use cases drive how we do mass
enablings.
I don't see any need for any of the "Variations" right now. Let's see
how the above works first.
-Karl
> I think that's a complete proposal. Some possible variations:
>
> Variation 1: If newline-style is native, on commit, translate <first
> newline style seen> -> LF. If we see any CRs or LFs which don't match
> the first newline style seen, abort the commit.
>
> Variation 2: If newline-style is native, before commit, examine the
> file to see if it uses only the native newline style. If it doesn't,
> set the newline-style property to "none" and commit with no translation.
>
> Variation 3: Combine variations 1 and 2; if newline-style is native,
> then before commit, examine the file to see if it uses a single
> consistent newline style. If it does, translate <that newline style> ->
> LF; if not, commit with newline-style set to "none" and no translation.
>
> Variation 4: If newline-style is native, then on commit, we edit a
> property "svn:newline-conversion" to something like "CRLF LF" to show
> what conversion we did. This enables mechanical reversal of the
> translation if the file is later determined to be binary. (Particularly
> useful with variations 1 or 3 where the transform might not be obvious
> from the platform where the file was checked in.)
Re: Newlines, preserving data, and multiple access paths
Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2001-12-14 at 12:36, Ben Collins-Sussman wrote:
> Sorry, what is a 'source-format newline'?
I guess I played a little fast and loose with terminology there.
Let's say we're transforming CRLF to LF. CRLF is what I meant by
"source-format newline". If there are any CRs or LFs in the file which
aren't part of a CRLF pair, then the transform is not reversible.
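Spelled out as a sketch (Python, with invented helper names), the check is: strip out every source-format newline and see whether any CR or LF is left over.

```python
NEWLINES = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}


def is_reversible(data: bytes, source: str) -> bool:
    """True if every CR or LF in data belongs to a source-format newline."""
    remainder = data.replace(NEWLINES[source], b"")
    return b"\r" not in remainder and b"\n" not in remainder


def translate(data: bytes, source: str, target: str) -> bytes:
    """Translate source-format newlines to target, refusing unsafe input."""
    if not is_reversible(data, source):
        raise ValueError("stray CR or LF: translation would not be reversible")
    return data.replace(NEWLINES[source], NEWLINES[target])
```

When the check passes, translate(translate(data, "CRLF", "LF"), "LF", "CRLF") gives back the original bytes, which is the whole point of the safety rule.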
> So, given that we're implementing a transform-on-commit system, the
> only clarification left is how metadata fits in.
Alright, I'll make a proposal which is like yours but (in my opinion) a
little clearer. First, let's look at the different use cases:
1. The most common case--text files which want native line endings.
These should be stored in the repository using LF line endings, and in
the working dir using native line endings.
2. Binary files. These files we don't want to touch at all.
3. Text files which, for one reason or another, want a specific line
ending format regardless of platform. These should be stored in the
repository and in the working directory using the specified line
ending. We probably don't have to worry so much about data safety for
these files since a particular, odd behavior has been specified for
them.
There are, of course, a hundred different ways we could arrange the
metadata. I propose an "svn:newline-style" property with the possible
values "none", "native", "LF", "CR", and "CRLF". The values mean:
none: Use case 2. Don't do any newline translation.
native: Use case 1. Store with LF in repository, and with native line
endings in the working copy.
LF, CR, CRLF: Use case 3. Store with specified format in the
repository and in the working copy.
On commit, we apply the following rules to transform the data committed
to the server:
If newline-style is none, do nothing.
If newline-style is native, translate <native newline style> -> LF. If
we notice any CRs or LFs which aren't part of a native-style newline,
abort the commit.
If newline-style is LF, CR, or CRLF, translate <native newline style>
-> <requested newline style>. If we notice any CRs or LFs which aren't
part of a native-style newline and aren't part of a requested-style
newline, abort the commit. If the commit succeeds, apply the <native
newline style> -> <requested newline style> translation to the working
copy as well, so that it matches what we would get from a checkout of
the new rev.
On checkout, we translate LF -> <native newline style> if newline-style
is native; otherwise, we leave the file alone.
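The commit and checkout rules above could be sketched roughly as follows. This is Python rather than the client's C, the function and exception names are invented, and the platform's native style is passed in explicitly as an assumption. The code follows the per-newline wording of the rules: each CRLF, CR or LF is checked on its own, with a CR followed by an LF always read as one CRLF.

```python
import re

NEWLINES = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


class CommitAborted(Exception):
    """Raised when a CR or LF fits neither the native nor the requested style."""


def transform_for_commit(data: bytes, style: str, native: str) -> bytes:
    """style is the svn:newline-style value; native is 'LF', 'CR' or 'CRLF'."""
    if style == "none":
        return data
    target = "LF" if style == "native" else style
    allowed = {NEWLINES[native]}
    if style != "native":
        # For LF/CR/CRLF files, requested-style newlines are tolerated too.
        allowed.add(NEWLINES[target])

    def convert(match):
        if match.group(0) not in allowed:
            raise CommitAborted("unexpected newline sequence in file")
        return NEWLINES[target]

    return _ANY_NEWLINE.sub(convert, data)


def transform_for_checkout(data: bytes, style: str, native: str) -> bytes:
    """Only 'native' files are translated on checkout (LF -> native)."""
    if style != "native":
        return data
    return data.replace(b"\n", NEWLINES[native])
```

For the LF/CR/CRLF case, the working copy would be overwritten with the function's result after a successful commit, so that it matches a fresh checkout of the new revision.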
For now, let's say the default value of svn:newline-style is none. In
the future, we'll want to think about things like how to enable
newline-translation over the whole repository except for files which
don't appear to be text.
I think that's a complete proposal. Some possible variations:
Variation 1: If newline-style is native, on commit, translate <first
newline style seen> -> LF. If we see any CRs or LFs which don't match
the first newline style seen, abort the commit.
Variation 2: If newline-style is native, before commit, examine the
file to see if it uses only the native newline style. If it doesn't,
set the newline-style property to "none" and commit with no translation.
Variation 3: Combine variations 1 and 2; if newline-style is native,
then before commit, examine the file to see if it uses a single
consistent newline style. If it does, translate <that newline style> ->
LF; if not, commit with newline-style set to "none" and no translation.
Variation 4: If newline-style is native, then on commit, we edit a
property "svn:newline-conversion" to something like "CRLF LF" to show
what conversion we did. This enables mechanical reversal of the
translation if the file is later determined to be binary. (Particularly
useful with variations 1 or 3 where the transform might not be obvious
from the platform where the file was checked in.)
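As an illustration, variations 3 and 4 combined might look something like this. The function name and the returned property dictionary are invented for the sketch; only the property names and the "CRLF LF" value format come from the text above. Skipping svn:newline-conversion when the file is already LF is a choice made here, not something the proposal specifies.

```python
import re

NEWLINES = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}
_NAMES = {value: name for name, value in NEWLINES.items()}
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def plan_native_commit(data: bytes):
    """Return (bytes to commit, properties to set) for a 'native' file."""
    styles = {_NAMES[m.group(0)] for m in _ANY_NEWLINE.finditer(data)}
    if len(styles) > 1:
        # Mixed styles: fall back to no translation at all (variation 2/3).
        return data, {"svn:newline-style": "none"}
    props = {"svn:newline-style": "native"}
    if not styles or styles == {"LF"}:
        return data, props  # nothing to translate
    (seen,) = styles
    # Record what was done so it can be reversed mechanically (variation 4).
    props["svn:newline-conversion"] = f"{seen} LF"
    return data.replace(NEWLINES[seen], b"\n"), props
```

A consistently CRLF file comes back LF-translated with svn:newline-conversion set to "CRLF LF", while a file mixing CRLF and LF comes back byte-for-byte unchanged with svn:newline-style set to "none".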
Re: Newlines, preserving data, and multiple access paths
Posted by Ben Collins-Sussman <su...@collab.net>.
Greg Hudson <gh...@mit.edu> writes:
> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.
Sorry, what is a 'source-format newline'?
> Here is what I propose:
>
> * For now, we implement Ben's scheme, with the proviso that we never
> do a non-reversible newline translation. (This totally messes up
> Karl's poll because it didn't include Ben's scheme.) The
> repository gets a global format of LF.
OK, so you're advocating (like everyone else now) that it's okay to do
a 'reverse transform' when committing, provided our transforms are
Safe. That's a great turn of events! This was the huge Sticking
Point that differentiated your proposal from mine & Bruce's. I feel
like a major hurdle has been crossed.
So, given that we're implementing a transform-on-commit system, the
only clarification left is how metadata fits in. My & Bruce's systems
had slightly different notions of how metadata should work in
determining system behavior.
* In my system, an EOL property defined how a file should look in
the repository. The client was responsible for making sure that
this style was always committed to the repository. If this
property was non-existent, the client assumes it has a value of
'LF'. Then there was a -second- property that enabled one to
switch EOL conversion on/off per file. The absence of this second
property can imply EOL is either on or off by default; I don't
care which.
* In Bruce's system, he had only one property - namely, the on/off
switch. If the property was 'on', then a committed file would be
reverse-transformed on commit, assuming that a transform had
originally happened on checkout.
Bruce's system seems a tad more complicated to implement, since it
seems to require some kind of auto-detection of EOL style when a
text-base is first received from the server. And it also needs to
'remember' that a transform happened previously; either that, or
re-run the detection heuristic on text-base each time the working file
is committed.
Please correct me if I'm wrong. My brain is spinning, and I'm so
tired of reading/thinking about this issue. I just want to code
already. :-)
Re: Newlines, preserving data, and multiple access paths
Posted by Karl Fogel <kf...@newton.ch.collab.net>.
This is a real issue, but we have enough to do already. Let's please
put this kind of thing off till post-1.0.
That'll be a wonderful day, the day we have problems because so many
non-Subversion tools are accessing Subversion repositories. That day
is not today. :-)
-K
Mark Benedetto King <bk...@answerfriend.com> writes:
> > 2. Unfortunately, as I noted in one of my many other messages today,
> > *none* of the schemes presented so far will robustly handle tools
> > which access the repository through DAV or libsvn_fs, if the tools
> > run on varying platforms and aren't forgiving about newlines. In
> > order to do that, we have to actually add the concept of a text
> > file to the FS layer.
> >
>
> I proposed a solution on IRC to handle this case. It seems to me that
> what we want here is something like a "view", i.e., a WC-specific set
> of properties. What if we embed the WC-desired CR/NL/CRNL semantics
> *in* the request URL?
>
> http://svn.collab.net/CR/repos/svn/trunk
> http://svn.collab.net/NL/repos/svn/trunk
> http://svn.collab.net/CRNL/repos/svn/trunk
>
> And then let an apache module sort out the rewriting?
>
> Alternatively, we could do:
>
> http://svn.collab.net/repos/svn/trunk?record=CR
> http://svn.collab.net/repos/svn/trunk?record=NL
> http://svn.collab.net/repos/svn/trunk?record=CRNL
>
> And let mod_dav sort out the rewriting. I'm not sure
> if all DAV tools can include a query-string, though.
>
> Another alternative would be to use an SVN branch that
> had alternate default properties:
>
> http://svn.collab.net/repos/CR/svn/trunk
> http://svn.collab.net/repos/NL/svn/trunk
> http://svn.collab.net/repos/CRNL/svn/trunk
>
> This would require server-side implementation of the
> separator semantics (which goes against the current
> proposals, but does clean up this mess). Also, these
> branches would probably need to be read-only.
>
>
> Comments?
>
> --ben
Re: Newlines, preserving data, and multiple access paths
Posted by Mark Benedetto King <bk...@answerfriend.com>.
On Thu, Dec 13, 2001 at 07:44:48PM -0500, Greg Hudson wrote:
> I have some new thoughts on newline translation, after talking to some
> MIT friends about it.
>
> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.
>
> This means we can go back to Ben's proposal, and as long as we add
> this safety, we don't have to worry about destroying anyone's
> engine designs. If the engine design was made on Windows and
> happens to only contain CRLFs, they will get translated to LF on
> checkin, but translating LF back to CRLF will restore the file. If
> the engine design contains CRLFs mixed with LFs and CRs, we can
> error out, or decide that the file must be binary after all.
>
> (If we want to go a little overboard on safety, we could make the
> client library set a property on each commit saying what newline
> translation was done, if any. Then it would be easy to retrieve
> the exact contents of the committed file by reversing the
> translation. I don't think this is necessary, though.)
I totally agree. Another way to look at this is that if a file
has mixed separators, it's a binary file. Of course, that means
an O(n) scan for determination of "binaryness", but maybe we should
do that anyway (the current heuristic only looks at the first bit
of the file).
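For what it's worth, the O(n) scan is a single pass that can stop as soon as a second separator style turns up. A sketch in Python, with an invented function name; the only subtle part is a CR that falls at the end of one chunk with its LF at the start of the next.

```python
import io


def has_mixed_separators(stream, chunk_size: int = 65536) -> bool:
    """Scan a binary stream once; True if more than one of LF/CR/CRLF occurs."""
    seen = set()
    pending_cr = False  # a CR was read and we don't yet know if an LF follows
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:
            if pending_cr:
                pending_cr = False
                if byte == 0x0A:
                    seen.add("CRLF")
                    continue
                seen.add("CR")
            if byte == 0x0D:
                pending_cr = True
            elif byte == 0x0A:
                seen.add("LF")
        if len(seen) > 1:
            return True  # stop early once the file is known to be mixed
    if pending_cr:
        seen.add("CR")  # file ended on a lone CR
    return len(seen) > 1
```

Whether "mixed" should really mean "binary" is the policy question; the scan itself only reports what it saw.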
>
> 2. Unfortunately, as I noted in one of my many other messages today,
> *none* of the schemes presented so far will robustly handle tools
> which access the repository through DAV or libsvn_fs, if the tools
> run on varying platforms and aren't forgiving about newlines. In
> order to do that, we have to actually add the concept of a text
> file to the FS layer.
>
I proposed a solution on IRC to handle this case. It seems to me that
what we want here is something like a "view", i.e., a WC-specific set
of properties. What if we embed the WC-desired CR/NL/CRNL semantics
*in* the request URL?
http://svn.collab.net/CR/repos/svn/trunk
http://svn.collab.net/NL/repos/svn/trunk
http://svn.collab.net/CRNL/repos/svn/trunk
And then let an apache module sort out the rewriting?
Alternatively, we could do:
http://svn.collab.net/repos/svn/trunk?record=CR
http://svn.collab.net/repos/svn/trunk?record=NL
http://svn.collab.net/repos/svn/trunk?record=CRNL
And let mod_dav sort out the rewriting. I'm not sure
if all DAV tools can include a query-string, though.
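As a rough sketch of the query-string variant (Python for illustration only; the real rewriting would live in the server module): the record parameter and its CR/NL/CRNL values come from the URLs above, while requested_separator, render, the NL default and the assumption that content is stored with LF separators are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Values of the "record" parameter as used in the example URLs.
RECORD_SEPARATORS = {"CR": b"\r", "NL": b"\n", "CRNL": b"\r\n"}


def requested_separator(url, default="NL"):
    """Return the record separator a request URL asks for."""
    query = parse_qs(urlsplit(url).query)
    record = query.get("record", [default])[0]
    if record not in RECORD_SEPARATORS:
        raise ValueError("unknown record separator: %r" % record)
    return RECORD_SEPARATORS[record]


def render(stored, url):
    """Rewrite LF-stored content with the separator named in the URL."""
    return stored.replace(b"\n", requested_separator(url))
```

For example, render(b"a\nb\n", "http://svn.collab.net/repos/svn/trunk?record=CRNL") would hand back the content with CRLF separators, and a URL without the parameter would get the stored bytes unchanged.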
Another alternative would be to use an SVN branch that
had alternate default properties:
http://svn.collab.net/repos/CR/svn/trunk
http://svn.collab.net/repos/NL/svn/trunk
http://svn.collab.net/repos/CRNL/svn/trunk
This would require server-side implementation of the
separator semantics (which goes against the current
proposals, but does clean up this mess). Also, these
branches would probably need to be read-only.
Comments?
--ben
Re: Yet another line-end proposal (YALEP?)
Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Philip Martin <ph...@codematters.co.uk> writes:
> - There is a native-line-end property that can be set on a file. I am
> not sure if this is a separate property from the text/binary thing
> as I am not sure what the text/binary thing does at present!
Philip, to answer your implied question:
A file needs to know if it is text vs binary so that the client can
use `diff' and `patch' to merge repository changes into a locally
modified file. For binary files, the client won't even try to merge,
it just gives you both copies and lets you figure out what to do (we
plan support for pluggable merge tools, but that's not done yet).
Thus, text vs binary would be relevant even if we weren't supporting
newline conversion nor keyword substitution.
-Karl
Yet another line-end proposal (YALEP?)
Posted by Philip Martin <ph...@codematters.co.uk>.
Greg Hudson <gh...@mit.edu> writes:
> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.
This is the property I have been trying to use.
Background (feel free to skip this paragraph; it explains why I want this,
not how it works): I worked with ClearCase on a C++ project in a mixed
Unix/NT environment earlier this year. ClearCase does line-ending
conversion on a per-view basis for text files (a view is a working-
copy in subversion terms). We used views that were mounted on both
Unix and NT boxes, i.e. one view was simultaneously mounted on both
machines. This made cross platform development much easier as changes
could be built on both platforms without requiring them to be either
checked-in, or manually copied between views. However it caused real
problems if line-ending conversion was enabled. A view set up to use
NT line-endings, say, was hard to use from a Unix box, since every line
had already changed. Merging, which is normally a ClearCase strong
point, was disrupted if a set of line-end changes got erroneously
committed. We ended up abandoning the line-end conversion, and using
pre-commit triggers to produce the line-endings we wanted (.ds[pw]'s
had CRLF, source code plain LF). This initially disposes me to
support no line-end conversion, and for the default to be off if it is
present. However given that lots of people want it, take a deep breath
and here goes.
Proposal
========
Rules:
- The text-base always duplicates the repository.
- Any sort of line-ending can appear in any file in the repository.
- There is a native-line-end property that can be set on a file. I am
not sure if this is a separate property from the text/binary thing
as I am not sure what the text/binary thing does at present!
Rules when native-line-end is not set:
- If the property is not set, no line-end conversion occurs. The
working-copy duplicates the repository. Files get committed exactly
as they appear in the working-copy, just your straight binary file.
Rules when native-line-end is set:
- At check-out/update/revert convert all line-endings in the working
copy to whatever the platform requires. Store the platform line-end
property in the .svn/entries file (or wherever) to allow checkout
with Unix client and check-in with non-Unix client or vice
versa. The .svn/entries property is "none" or "LF" or "CRLF" etc.,
i.e. an explicit line-ending and not just "native".
- At check-out/update/revert there is a -no-convert option to disable
line-end conversion, overriding the native-line-end property. This
also changes the line-end property in the .svn/entries file.
- At commit, check the .svn/entries file to determine the
line-end property. When generating the diff between the
working-copy and the text-base, ignore any line-end difference that
is explained by the line-ending conversion. If the
introduced line-endings are incompatible with the .svn/entries
line-end property, display an error.
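The check-out side of these rules might look something like the sketch below (Python, purely illustrative). The names check_out and platform_line_end are invented, and recording "none" when -no-convert is given is my reading of the rule rather than something the proposal spells out.

```python
import os
import re

_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")
LINE_ENDS = {"LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}


def platform_line_end():
    """Name the running platform's line-ending explicitly, never as "native"."""
    native = os.linesep.encode("ascii")
    for name, sequence in LINE_ENDS.items():
        if sequence == native:
            return name
    raise ValueError("unrecognised platform line-ending")


def check_out(text_base, native_line_end, no_convert=False, line_end=None):
    """Return (working_copy, entries_line_end) for one file.

    native_line_end says whether the property is set on the file;
    line_end overrides the platform default (handy for testing).
    """
    if not native_line_end or no_convert:
        # No conversion: the working-copy duplicates the text-base.
        return text_base, "none"
    line_end = line_end or platform_line_end()
    sequence = LINE_ENDS[line_end]
    # Every kind of line-ending in the text-base becomes the platform's.
    working_copy = _ANY_NEWLINE.sub(lambda match: sequence, text_base)
    return working_copy, line_end
```

The second element of the result is what would be stored in .svn/entries, so a later commit from a different client still knows which conversion produced the working-copy.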
Diff Algorithm:
The diff algorithm is basically as follows: do the line-ending
conversion specified in the .svn/entries file on the text-base to
generate the pristine working-copy. Diff the pristine working-copy
and the actual working copy. Within the diff, undo the line-ending
conversion on the diff for those parts that represent the
text-base. Within the diff, verify that all line-endings for those
parts that represent the working-copy are consistent with the
.svn/entries property. This diff is now suitable to send to the
repository.
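To make the steps concrete, here is a line-based sketch in Python. It uses difflib from the standard library as a stand-in and is not vdelta; commit_diff and split_lines are names invented for the example, and each hunk is simply a (removed text-base lines, added working-copy lines) pair.

```python
import difflib
import re

_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def split_lines(data):
    """Split bytes into lines, keeping whatever terminator each line has."""
    lines, start = [], 0
    for match in _ANY_NEWLINE.finditer(data):
        lines.append(data[start:match.end()])
        start = match.end()
    if start < len(data):
        lines.append(data[start:])  # final line without a terminator
    return lines


def commit_diff(text_base, working_copy, line_end):
    """Return hunks as (removed_text_base_lines, added_working_copy_lines)."""
    base_lines = split_lines(text_base)
    # Step 1: convert the text-base to get the pristine working-copy.
    pristine = [_ANY_NEWLINE.sub(lambda m: line_end, line) for line in base_lines]
    work_lines = split_lines(working_copy)
    # Step 2: diff the pristine working-copy against the actual one.
    matcher = difflib.SequenceMatcher(None, pristine, work_lines, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        added = work_lines[j1:j2]
        # Step 4: new lines must use the line-ending recorded in .svn/entries.
        for line in added:
            terminator = _ANY_NEWLINE.search(line)
            if terminator and terminator.group() != line_end:
                raise ValueError("line-ending incompatible with .svn/entries")
        # Step 3: undo the conversion by quoting the original text-base lines.
        hunks.append((base_lines[i1:i2], added))
    return hunks
```

Lines that differ from the text-base only by the conversion compare equal in step 2 and so never appear in the result, which is what keeps the diff small.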
Advantages
==========
- On-the-wire and repository diffs are small.
- The working copy file gets committed exactly and does not change.[1]
- Any working-copy file that gets committed can always be retrieved
exactly.
- If an erroneously converted working-copy gets committed, the
corruption does not in general get back into the repository.
[1] Any automatic conversion system has to allow the conversion
enabling property to be unset. When this property change is committed,
the working copy needs to be changed to match the repository. This
applies whatever scheme we use. Perhaps it should occur when the user
does the propset rather than waiting until the commit?
Disadvantages
=============
- More complicated diff algorithm; I'm not even sure the vdelta
algorithm can be made to operate this way.
- Something I haven't thought of...
Examples
========
Scenario 1: text file with native-line-end property
----------
check-out:  text-base       CRLF working-copy
            abc\n           abc\r\n
            def\n           def\r\n
            ghi\n           ghi\r\n
edit:       text-base       CRLF working-copy
            abc\n           abc\r\n
            def\n           XXX\r\n
            ghi\n           ghi\r\n
diff:       CRLF working-copy
            -def\n
            +XXX\r\n
commit:     text-base       CRLF working-copy
            abc\n           abc\r\n
            XXX\r\n         XXX\r\n
            ghi\n           ghi\r\n
Note that the working-copy does not need to change at commit, and
remains what would appear if the user checked-out on this platform.
check-out:  text-base       LF working-copy
            abc\n           abc\n
            XXX\r\n         XXX\n
            ghi\n           ghi\n
edit:       text-base       LF working-copy
            abc\n           abc\n
            XXX\r\n         YYY\n
            ghi\n           ghi\n
diff:       LF working-copy
            -XXX\r\n
            +YYY\n
commit:     text-base       LF working-copy
            abc\n           abc\n
            YYY\n           YYY\n
            ghi\n           ghi\n
Note that once again the working-copy does not need to change at
commit.
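The first commit step in this scenario can be replayed with a few lines of Python. This is only an illustration: apply_hunks and the (start, removed, added) hunk shape are invented for the example, and it assumes hunks do not overlap.

```python
def apply_hunks(base_lines, hunks):
    """Apply (start, removed, added) hunks to a list of text-base lines."""
    result, position = [], 0
    for start, removed, added in sorted(hunks, key=lambda hunk: hunk[0]):
        if base_lines[start:start + len(removed)] != removed:
            raise ValueError("hunk does not match the text-base")
        result.extend(base_lines[position:start])  # untouched lines
        result.extend(added)  # new lines, exactly as in the working-copy
        position = start + len(removed)
    result.extend(base_lines[position:])
    return result


# First commit of Scenario 1: the CRLF client replaces "def" with "XXX".
text_base = [b"abc\n", b"def\n", b"ghi\n"]
hunks = [(1, [b"def\n"], [b"XXX\r\n"])]
new_text_base = apply_hunks(text_base, hunks)
```

The new text-base keeps LF on the untouched lines and takes CRLF on the changed one, which is why the rules allow any sort of line-ending to appear in the repository.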
Scenario 2: binary file with erroneous native-line-end property
----------
add:        text-base       LF working-copy
The .svn/entries line-end indicates LF, the platform native.
edit:       text-base       LF working-copy
                            some\n
                            binary\r\n
                            data
diff:       LF working-copy
            +some\n
            +binary\r\n
            +data
Note that the diff contains line-end changes that are incompatible
with the native-line-end property. This might trigger the error, or
it may be delayed until the commit. The commit fails unless the user
removes the native-line-end property.
commit:     text-base       LF working-copy
            some\n          some\n
            binary\r\n      binary\r\n
            data            data
Note that this can only be committed without line-end conversion.
Scenario 3: binary file with erroneous native-line-end property
----------
add:        text-base       LF working-copy
edit:       text-base       LF working-copy
                            more\n
                            binary\n
                            stuff
commit:     text-base       LF working-copy
            more\n          more\n
            binary\n        binary\n
            stuff           stuff
Here the binary does not have a conflicting line-ending, so the
commit succeeds.
check-out:  text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuff
Here the working-copy is corrupt. If the user recognises this the
native-line-end property can be changed and committed. This, as in any
other scheme, has to update the working-copy. Then the user has the
correct binary file. If the user does not have commit access, they
can use the -no-convert option to get a valid working-copy.
check-out:  text-base       CRLF working-copy
-no-convert more\n          more\n
            binary\n        binary\n
            stuff           stuff
If the corruption is unnoticed, and the user continues, the amount of
corruption in the repository is "stable", i.e. the working copy
corruption will not get propagated into the repository, as follows:
check-out:  text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuff
Note that the working-copy is corrupt.
edit:       text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuffadded
diff:       CRLF working-copy
            -stuff
            +stuffadded
commit:     text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuffadded      stuffadded
Of course the resulting binary may be useless, but any scheme that
does automatic line-end conversion can produce temporary corruption,
and if this is not noticed problems will inevitably occur.
Hmm, 3:45am, time for bed said Zebedee
--
Philip