Posted to dev@subversion.apache.org by Philip Martin <ph...@codematters.co.uk> on 2001/12/14 03:52:59 UTC

Yet another line-end proposal (YALEP?)

Greg Hudson <gh...@mit.edu> writes:

> 1. We can avoid irrevocably destroying data if we make sure all
>    newline translations we do are reversible.  A newline translation
>    is reversible if there are no CRs or LFs in the file which aren't
>    source-format newlines.

This is the property I have been trying to use.
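
To make that property concrete, here is a rough Python sketch (not
Subversion code; the names is_reversible and convert are made up for
illustration). It treats a file as reversible for a given source
line-ending when every CR or LF byte in it belongs to one of those
line-endings, which is my reading of Greg's definition.

```python
def is_reversible(data: bytes, source_eol: bytes) -> bool:
    """True if every CR or LF in data is part of a source_eol sequence."""
    # Remove the source-format newlines; any CR/LF left over is a stray one.
    leftover = data.replace(source_eol, b"")
    return b"\r" not in leftover and b"\n" not in leftover


def convert(data: bytes, source_eol: bytes, target_eol: bytes) -> bytes:
    """Naive translation from one explicit line-ending to another."""
    return data.replace(source_eol, target_eol)


clean = b"abc\ndef\nghi\n"      # LF file, no stray CRs
mixed = b"a\nb\r\n"             # CRLF file containing a stray LF

# A reversible file survives the round trip unchanged.
round_trip_clean = convert(convert(clean, b"\n", b"\r\n"), b"\r\n", b"\n")

# A non-reversible file does not: the stray LF comes back as CRLF.
round_trip_mixed = convert(convert(mixed, b"\r\n", b"\n"), b"\n", b"\r\n")
```

Here round_trip_clean equals the original, while round_trip_mixed
does not, which is exactly the data loss the property guards against.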

Background (feel free to skip this para; it explains why I want this,
not how it works): I worked with ClearCase on a C++ project in a mixed
Unix/NT environment earlier this year.  ClearCase does line-ending
conversion on a per-view basis for text files (a view is a working-
copy in subversion terms). We used views that were mounted on both
Unix and NT boxes, i.e. one view was simultaneously mounted on both
machines. This made cross platform development much easier as changes
could be built on both platforms without requiring them to be either
checked-in, or manually copied between views. However it caused real
problems if line-ending conversion was enabled. A view set up to use
NT line-endings, say, was hard to use from a Unix box, since every line
has already changed. Merging, which is normally a ClearCase strong
point, was disrupted if a set of line-end changes got erroneously
committed. We ended up abandoning the line-end conversion, and using
pre-commit triggers to produce the line-endings we wanted (.ds[pw]'s
had CRLF, source code plain LF).  This initially disposes me to
support no line-end conversion, and for the default to be off if it is
present. However given that lots of people want it, take a deep breath
and here goes.

Proposal
========

 Rules:

 - The text-base always duplicates the repository.

 - Any sort of line-ending can appear in any file in the repository.

 - There is a native-line-end property that can be set on a file. I am
   not sure if this is a separate property from the text/binary thing
   as I am not sure what the text/binary thing does at present!

 Rules when native-line-end is not set:

 - If the property is not set, no line-end conversion occurs. The
   working-copy duplicates the repository. Files get committed exactly
   as they appear in the working-copy, just your straight binary file.

 Rules when native-line-end is set:

 - At check-out/update/revert convert all line-endings in the working
   copy to whatever the platform requires. Store the platform line-end
   property in the .svn/entries file (or wherever) to allow checkout
   with Unix client and check-in with non-Unix client or vice
   versa. The .svn/entries property is "none" or "LF" or "CRLF" etc.,
   i.e. an explicit line-ending and not just "native".

 - At check-out/update/revert there is a -no-convert option to disable
   line-end conversion, overriding the native-line-end property. This
   also changes the line-end property in the .svn/entries file.

 - At commit check the .svn/entries file to determine the
   line-end property. When generating the diff between the
   working-copy and the text-base if a line-end difference is
   explained by the line-ending conversion ignore it. If the
   introduced line-endings are incompatible with the .svn/entries
   line-end property display an error.
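
 The check-out/update/revert rule might look roughly like the Python
 sketch below. This is only an illustration: the function name
 check_out, its parameters and the EOL_BYTES table are hypothetical,
 and treating a lone CR as a line-ending is my assumption, not part
 of the rules above.

```python
import re

# Explicit line-end values that may be recorded in .svn/entries.
# "none" (no conversion) is handled separately below.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def check_out(text_base, native_line_end, platform, no_convert=False):
    """Return (working_copy, recorded_line_end) for a single file.

    text_base       -- bytes exactly as stored in the repository
    native_line_end -- True if the native-line-end property is set
    platform        -- "LF" or "CRLF", whatever the client platform uses
    no_convert      -- models the -no-convert option
    """
    if not native_line_end or no_convert:
        # The working copy duplicates the text-base; record "none".
        return text_base, "none"
    eol = EOL_BYTES[platform]
    # Assumption: CRLF, lone CR and lone LF all count as line-endings.
    working = re.sub(rb"\r\n|\r|\n", lambda match: eol, text_base)
    # Record the explicit line-ending, not just "native".
    return working, platform


base = b"abc\nXXX\r\nghi\n"
unix_copy, unix_recorded = check_out(base, True, "LF")
nt_copy, nt_recorded = check_out(base, True, "CRLF")
raw_copy, raw_recorded = check_out(base, True, "CRLF", no_convert=True)
```

 The recorded value is what a later commit would consult, so a file
 checked out on one platform and committed from another still knows
 which conversion was applied.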

 Diff Algorithm:

 The diff algorithm is basically as follows: do the line-ending
 conversion specified in the .svn/entries file on the text-base to
 generate the pristine working-copy. Diff the pristine working-copy
 and the actual working copy. Within the diff, undo the line-ending
 conversion on the diff for those parts that represent the
 text-base. Within the diff, verify that all line-endings for those
 parts that represent the working-copy are consistent with the
 .svn/entries property.  This diff is now suitable to send to the
 repository.
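
 A line-based Python sketch of those steps, for illustration only. It
 uses difflib rather than vdelta, and the hunk format (start line in
 the text-base, removed lines, added lines) as well as the names
 commit_diff, convert and line_ending are invented here. It also
 assumes the conversion maps each line-ending to exactly one
 line-ending, so line N of the pristine working-copy corresponds to
 line N of the text-base.

```python
import difflib
import re

EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def convert(data, eol):
    # Assumption: CRLF, lone CR and lone LF all count as line-endings.
    return re.sub(rb"\r\n|\r|\n", lambda match: eol, data)


def line_ending(line):
    """Return the trailing run of CR/LF bytes of a line (possibly empty)."""
    return line[len(line.rstrip(b"\r\n")):]


def commit_diff(text_base, working, recorded):
    """Return hunks (base_start, removed_lines, added_lines) to send."""
    eol = None if recorded == "none" else EOL_BYTES[recorded]
    base_lines = text_base.splitlines(keepends=True)
    # Step 1: generate the pristine working-copy from the text-base.
    pristine = text_base if eol is None else convert(text_base, eol)
    pristine_lines = pristine.splitlines(keepends=True)
    work_lines = working.splitlines(keepends=True)
    # Step 2: diff the pristine working-copy against the actual one.
    matcher = difflib.SequenceMatcher(None, pristine_lines, work_lines,
                                      autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        added = work_lines[j1:j2]
        # Step 4: added lines must agree with the recorded line-end.
        if eol is not None:
            for line in added:
                if line_ending(line) not in (b"", eol):
                    raise ValueError("line-ending incompatible with "
                                     + recorded + ": " + repr(line))
        # Step 3: undo the conversion by taking the removed lines from
        # the unconverted text-base instead of the pristine copy.
        hunks.append((i1, base_lines[i1:i2], added))
    return hunks


# First half of Scenario 1 below: CRLF working-copy, LF text-base.
hunks = commit_diff(b"abc\ndef\nghi\n", b"abc\r\nXXX\r\nghi\r\n", "CRLF")
```

 For the first edit in Scenario 1 below, this yields a single hunk
 that removes "def\n" and adds "XXX\r\n"; the unchanged lines produce
 nothing even though their line-endings differ from the text-base.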


Advantages
==========

 - On the wire and in the repository, diffs are small.
 - The working copy file gets committed exactly and does not change.[1]
 - Any working-copy file that gets committed can always be retrieved
   exactly.
 - If an erroneously converted working-copy gets committed the
   corruption does not in general get back into the repository.

[1] Any automatic conversion system has to allow the conversion
enabling property to be unset. When this property change is committed
the working copy needs to be changed to match the repository. This
applies whatever scheme we use. Perhaps it should occur when the user
does the propset rather than waiting until the commit?


Disadvantages
=============

 - More complicated diff algorithm; I'm not even sure the vdelta
   algorithm can be made to operate this way.
 - Something I haven't thought of...


Examples
========

 Scenario 1: text file with native-line-end property
 ----------

 check-out:   text-base    CRLF working-copy
               abc\n        abc\r\n
               def\n        def\r\n
               ghi\n        ghi\r\n

 edit:        text-base    CRLF working-copy
               abc\n        abc\r\n
               def\n        XXX\r\n
               ghi\n        ghi\r\n

 diff:                     CRLF working-copy
                           -def\n
                           +XXX\r\n


 commit:      text-base    CRLF working-copy
               abc\n        abc\r\n
               XXX\r\n      XXX\r\n
               ghi\n        ghi\r\n

 Note that the working-copy does not need to change at commit, and
 remains what would appear if the user checked-out on this platform.


 check-out:   text-base    LF working-copy
               abc\n        abc\n
               XXX\r\n      XXX\n
               ghi\n        ghi\n

 edit:        text-base    LF working-copy
               abc\n        abc\n
               XXX\r\n      YYY\n
               ghi\n        ghi\n

 diff:                     LF working-copy
                           -XXX\r\n
                           +YYY\n

 commit:      text-base    LF working-copy
               abc\n        abc\n
               YYY\n        YYY\n
               ghi\n        ghi\n

 Note that once again the working-copy does not need to change at
 commit.


 Scenario 2: binary file with erroneous native-line-end property
 ----------

 add:         text-base    LF working-copy

 The .svn/entries line-end indicates LF, the platform native line-ending.

 edit:        text-base    LF working-copy
                            some\n
                            binary\r\n
                            data

 diff:                     LF working-copy
                           +some\n
                           +binary\r\n
                           +data

 Note that the diff contains line-end changes that are incompatible
 with the native-line-end property. This might trigger the error, or
 it may be delayed until the commit. The commit fails unless the user
 removes the native-line-end property.

 commit:      text-base    LF working-copy
               some\n       some\n
               binary\r\n   binary\r\n
               data         data

 Note that this can only be committed without line-end conversion.


 Scenario 3: binary file with erroneous native-line-end property
 ----------

 add:         text-base    LF working-copy

 edit:        text-base    LF working-copy
                            more\n
                            binary\n
                            stuff

 commit:      text-base    LF working-copy
               more\n       more\n
               binary\n     binary\n
               stuff        stuff

 Here the binary does not have a conflicting line-ending, so the
 commit succeeds.

 check-out:   text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuff

 Here the working-copy is corrupt. If the user recognises this the
 native-line-end property can be changed and committed. This, as in any
 other scheme, has to update the working-copy. Then the user has the
 correct binary file. If the user does not have commit access, they
 can use the -no-convert option to get a valid working-copy.

 check-out:   text-base    CRLF working-copy
 -no-convert   more\n       more\n
               binary\n     binary\n
               stuff        stuff

 If the corruption is unnoticed, and the user continues, the amount of
 corruption in the repository is "stable", i.e. the working copy
 corruption will not get propagated into the repository, as follows:

 check-out:   text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuff

 Note the working-copy is corrupt.

 edit:        text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuffadded

 diff:                     CRLF working-copy
                           -stuff
                           +stuffadded

 commit:      text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuffadded   stuffadded

 Of course the resulting binary may be useless, but any scheme that
 does automatic line-end conversion can produce temporary corruption,
 and if this is not noticed problems will inevitably occur.

Hmm, 3:45am, time for bed said Zebedee

-- 
Philip

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Yet another line-end proposal (YALEP?)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Philip Martin <ph...@codematters.co.uk> writes:
>  - There is a native-line-end property that can be set on a file. I am
>    not sure if this is a separate property from the text/binary thing
>    as I am not sure what the text/binary thing does at present!

Philip, to answer your implied question:

A file needs to know if it is text vs binary so that the client can
use `diff' and `patch' to merge repository changes into a locally
modified file.  For binary files, the client won't even try to merge,
it just gives you both copies and lets you figure out what to do (we
plan support for pluggable merge tools, but that's not done yet).

Thus, text vs binary would be relevant even if we weren't supporting
newline conversion or keyword substitution.

-Karl
