You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2018/05/10 11:32:35 UTC

Re: Shelve & checkpoint - next steps

Julian Foad wrote on 2018-04-16:
> Next steps for shelve & checkpoint.
> 
> * change the storage so we can shelve (large) binary files
> 
> * API abstraction to access shelf data
> 
> * store and retrieve base revisions
> 
> * more complete testing -- we should have a way of testing all possible 
> kinds of change
> 
> 
> === Storage for binary files ===
> 
> I made the shelve function walk the WC itself (r1829291), so we can 
> intercept binary files at that point and do something other than diff 
> with them.
> 
>  From r1829295, for binary files it uses git diff binary literal format. 
> This works for a stop-gap, but is inefficient for large files.

Soon I will change it at that interception point, like this. It will 
store a 'binary' file by copying the working version into a directory 
structure that parallels the WC directory structure, inside 
'.svn/shelves/<shelf-name-encoded>-<version>.d/', instead of storing a 
(git binary literal) diff in the patch file.

The file's properties can continue to go into the patch file as a 
property-diff, for the time being.

That should be fast enough for use with very large files.

(And what does being classed as a 'binary' file mean, semantically? It 
means when a modified binary file is shelved and later re-applied to the 
WC, the modifications will not be merged, not even in the 'patch' sense, 
but instead the file will be copied as a whole, in the same way that 
'svn update' and 'svn merge' handle a binary file.)

I have implemented the basics of this. I haven't finished the part the 
detects a conflict when unshelving. When that's done I'll commit.


> Ideally shelves will be able to share the WC pristine store for storing 
> whole file contents. [...]
> 
> 
> === API abstraction ===
> 
> We need libsvn_client APIs to be able to access shelves in the same way 
> as "regular" WC data: export|diff|cat|propget|... for data stored in any 
> shelf. The result of any such API operating on a shelf should be 
> analogous to how the same function would operate on the WC if we first 
> unshelved the change.

Why do we need generic APIs to support these kinds of functions?, we 
might ask. It's not because the user necessarily needs all these 
operations, but to make programming sane. It should be possible to write 
a conceptually simple high-level operation such as "copy all the changes 
found in this WC subtree to this shelf" by setting up the source and 
destination objects and then invoking a common "copy a tree of changes" 
routine, not by writing a new deep implementation of all the guts 
specifically for this source-and-destination pair.

I have started working on such APIs in the 'tree-api' branch (recently 
resurrected from my years-old 'tree-read-api' branch).

> A possible starting point, currently implemented on the 
> 'shelve-checkpoint' branch, is to modify svn_opt_revision_t and the 
> revision-number parsing to accept a shelf name as another kind of 
> revision specifier. This (and the other revprop functions) works so far:
> 
>    $ svn propget -r foo --revprop svn:log
>    This is the log message of shelf 'foo'.
> 
> 
> === Store and retrieve base revisions ===
> 
> Storing the revision number metadata is easy. Svn diff format has always 
> written the base revision of each file in the diff header. The recent 
> 'svn info --viewspec' prototype now provides a way to write a complete 
> description of the revisions and 'shape' (depth and switch settings) of 
> a WC.
> 
> Reading it out is a SMOP. Doing something with it -- that is, doing a 
> 3-way-merge instead of a 'patch' operation -- is conceptually a SMOP but 
> probably more involved.
> 
> Snapshotting the actual content of the base is much more involved if we 
> intend to keep this snapshot attached to each the shelf even though the 
> user runs 'update'. In order to decide whether it is important to do so, 
> I suggest we implement making use of just the revision number metadata 
> and test its performance -- accepting that either repository access or 
> fallback to plain patching would be needed in cases where 'update' has 
> been done.
> 
> 
> Glad to hear any thoughts.
> 
> - Julian


Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote on 2018-05-24:
> I'm thinking to make a changelist behave less like just a temporary tag 
> on a WC path, and more like an object that contains some changes and a 
> log message and metadata about its base revisions, and that can be 
> committed or moved to a 'shelved' state. (See 
> https://cwiki.apache.org/confluence/display/SVN/Shelving+and+Checkpointing+Dev#ShelvingandCheckpointingDev-IntegrateShelvingwithChangelists 
> ).

The model of changelists that I would like to pursue, and its relationship to shelving, is very much like IntelliJ IDEA provides. See the overview and comparison between shelving, stashing, changelists and branches, here:

https://www.jetbrains.com/help/idea/work-on-several-features-simultaneously.html

- Julian

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Nathan Hartman wrote:
> Julian Foad wrote:
>     Since http://svn.apache.org/r1831908 the "svn status" command can
>     operate directly on a shelf.
[...]

> Very nice! When I get some free cycles -- I'm assuming I have to build a 
> client myself to try it? :-)

Yes. It would be good to hear back from you about how it feels to have 
this ability to 'svn st' a shelf.

>     An obvious CLI enhancement would be translate a new "--shelf=SHELF"
>     to "--cl=svn:shelf:SHELF" and translate "--- Changelist
>     'svn:shelf:SHELF'" to "--- Shelf 'SHELF'".
> 
> I don't really see how changelists are related to shelves. [...]

I'm thinking to make a changelist behave less like just a temporary tag 
on a WC path, and more like an object that contains some changes and a 
log message and metadata about its base revisions, and that can be 
committed or moved to a 'shelved' state. (See 
https://cwiki.apache.org/confluence/display/SVN/Shelving+and+Checkpointing+Dev#ShelvingandCheckpointingDev-IntegrateShelvingwithChangelists 
). This is more like how Perforce changelists work ( 
https://www.perforce.com/perforce/doc.current/manuals/cmdref/index.html#CmdRef/p4_shelve.html 
).

- Julian

Re: Shelve & checkpoint - next steps

Posted by Nathan Hartman <ha...@gmail.com>.
On Mon, May 21, 2018 at 12:01 PM Julian Foad <ju...@apache.org> wrote:

> Since http://svn.apache.org/r1831908 the "svn status" command can operate
> directly on a shelf.
>
> $ svn shelves -q
> foo
> $ svn st --cl svn:shelf:foo
> --- Changelist 'svn:shelf:foo':
> A       D1
> A       new-file
> MM      config.txt
> D       hello.txt
> A       D1/D2
>
> The main WC state for those paths may be unmodified, or modified in a
> completely different way:
>
> $ svn st
> A  +    D1
> M       config.txt
>
> This, to me, is the beginning of the more exciting side of shelving, when
> it is no longer doing just what a simple add-on script could do, but is
> becoming more deeply integrated into the work flows and commands that users
> are already familiar with.


Very nice! When I get some free cycles -- I'm assuming I have to build a
client myself to try it? :-)

An obvious CLI enhancement would be translate a new "--shelf=SHELF" to
> "--cl=svn:shelf:SHELF" and translate "--- Changelist 'svn:shelf:SHELF'" to
> "--- Shelf 'SHELF'".


I don't really see how changelists are related to shelves. I thought
changelists were a feature that should have been more descriptively named
"tags" or "filters" or "groups" or "attributes" or something. But I
probably misunderstand. But in any event I like this idea better:

An alternative is to extend the 'revision specifier' dimension instead, as
> I mentioned before, like this (made-up example, not implemented):
>

> $ svn diff --summarize -r foo
> --- Shelf 'foo':
> A       D1
> A       new-file
> MM      config.txt
> D       hello.txt
> A       D1/D2


This makes sense to me because a shelf is like a potential future revision
(or one in the making). And this syntax is so much cleaner and shorter.
Plus this lends itself to things like:

$ svn diff -r 141:foo

to get the difference between an arbitrary revision and the contents of a
shelf. Whether that would be a nightmare to implement is another story
altogether, as is what to do when shelves become a series of checkpoints
and you want differences between arbitrary items in the series...

Anyway, my 2 cents for now.

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Since http://svn.apache.org/r1831908 the "svn status" command can operate directly on a shelf.

$ svn shelves -q
foo
$ svn st --cl svn:shelf:foo 
--- Changelist 'svn:shelf:foo':
A       D1
A       new-file
MM      config.txt
D       hello.txt
A       D1/D2

The main WC state for those paths may be unmodified, or modified in a completely different way:

$ svn st
A  +    D1
M       config.txt

This, to me, is the beginning of the more exciting side of shelving, when it is no longer doing just what a simple add-on script could do, but is becoming more deeply integrated into the work flows and commands that users are already familiar with.

Limitations of this initial experiment include: it only accesses the latest version of a shelf, and it doesn't support extended status such as that shown by --verbose and --show-updates.

An obvious CLI enhancement would be translate a new "--shelf=SHELF" to "--cl=svn:shelf:SHELF" and translate "--- Changelist 'svn:shelf:SHELF'" to "--- Shelf 'SHELF'".

More interesting to me is to evaluate how well this new meaning fits the 'changelist' dimension of the existing UI. It's not the name 'changelist' that matters at this point, it's how it relates to other dimensions like 'revision' -- is it orthogonal, for example.

An alternative is to extend the 'revision specifier' dimension instead, as I mentioned before, like this (made-up example, not implemented):

$ svn diff --summarize -r foo
--- Shelf 'foo':
A       D1
A       new-file
MM      config.txt
D       hello.txt
A       D1/D2

At first sight, the 'revision' dimension looks more appropriate: "don't look at a committed change, look at a shelved change attached to the local WC instead".

The 'changelist' interface usage in existing Subversion means "select a subset of possible files" and that doesn't sound right at first. But note I have in mind to make the changelist concept more like a shelf (in particular, having a log message, and supporting directories), and shelving/unshelving will be moving the complete changelist  (its name, its changes, its log msg etc) to/from the 'shelf' area. So that might still fit; I'm keeping that option open to explore it a bit more.

This particular interface, like all of shelving so far, is new and experimental. Constructive feedback is greatly appreciated.

- Julian

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote on 2018-05-14:
> I propose now switching completely from 'patch' storage to storing whole 
> files, and storing both the base and the working version of each shelved 
> file.

Done in http://svn.apache.org/r1831884 . Some details follow.

>      - 'mkdir', 'rmdir'

Done.

>      - copy and move

TODO.

>      - properties with 'binary' content values
>      - files not recognized as 'binary' but that don't diff/patch nicely

Done. (It uses the same code path as in 'update' and 'merge' so whatever exactly it does should be good enough.)

>      - 3-way merge

Done.

>      - conflict handling like in 'svn update' and 'svn merge'

Partly done, mostly TODO. Tree conflicts are not detected.

>      - ... run 'diff' and pipe the result into 'patch' ..., for now

When I tried that, I ran into a bug in our 'diff' code when using it that way, and moved on to directly invoking 3-way merge at a low level (per file).

- Julian

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Here is a more comprehensive outline of next development steps for 
shelve & checkpoint.

I propose now switching completely from 'patch' storage to storing whole 
files, and storing both the base and the working version of each shelved 
file. This will switch the speed-and-space characteristics, particularly 
noticeable when dealing with large files, from small and slow to fast 
and big.

To start with, we may well store a new copy of both base and working of 
every shelved file, in each version of the shelf. We can evaluate again 
after that and see what better compromise to make, adding more 
complexity to make a better trade-off, such as de-duplicating when the 
same base and/or working version appears in multiple shelf-versions. In 
the meantime, if need be, the presentation layer could do something to 
help the user avoid storing too many versions.


Functional improvements, in approximate priority order:

   * overcome remaining 'patch' format limitations [1]

     - 'mkdir', 'rmdir'
     - copy and move
     - properties with 'binary' content values
     - files not recognized as 'binary' but that don't diff/patch nicely

   * proper merge

     - 3-way merge
     - conflict handling like in 'svn update' and 'svn merge'

   * make existing read-only 'svn' commands work on a shelf:

     - svn status
     - svn export/cat/propget/proplist (for shelf base & working tree)
     - svn diff

Further functional improvements:

   * integrate 'shelf' and 'changelist' concepts [2]:

     - shelving converts a changelist into a 'shelf',
     - unshelving converts a shelf into a changelist;

   * let 'unshelve' update to the correct base or warn if base mismatch


To overcome remaining 'patch' format limitations:

   * develop the whole-file storage format to support (for binary
     files, at first):

     - base as well as working version,
     - Modify / Add / Delete / Replace status,
     - properties,

   * switch to whole-file storage (base and working versions) for all
     files, not just 'binary' files:

     - when unshelving a text-mod in a non-binary file, and for the
       properties of a 'binary' file, run 'diff' and pipe the result
       into 'patch', so the end result is the same as if we had
       continued storing a patch file, for now

   * remove the now-unused support for storing a patch file

   * develop the storage format to support:
     - 'mkdir' and 'rmdir' (with directory properties)

   * develop the storage format to support:
     - copy and move info

   * send property changes through the existing property merge APIs
     so that binary-valued properties (and probably other oddities)
     will be supported


To implement proper merging:

   * reconstruct the delta on demand as svn_diff_tree_processor_t
     (or store base + delta, reconstruct working version on demand);

   * make a merge-into-WC routine that can merge a given diff (via
     svn_diff_tree_processor_t) into the WC, using 3-way text merge
     for text files and simple theirs-or-mine for binary files, and
     can raise conflicts just like 'update' and 'merge' do;

   * let 'unshelve' feed a text file's diff into that 'merge-into-WC';

   * update the conflict handling to be able to refer to a shelf as
     the merge source;

   * might work better or be easier if we store a reference to the
     WC base layout (revision numbers, etc.).


To implement existing read-only 'svn' commands:

   * implement a way to specify a shelf instead of a regular WC
     operation in the CLI and libsvn_client API; for example:

     - a special revision specifier like '-r SHELFNAME'
       with a new value for svn_opt_revision_t.kind
     - a special changelist specifier like '--cl SHELFNAME'
     - a special extra target argument like '^s/SHELFNAME'
     (special rev-spec is prototyped on 'shelve-checkpoint' branch)

   * for 'svn status':

     - implement an API analogous to svn_wc_walk_status() that
       returns svn_wc_status3_t objects;
     - call it from the existing 'svn status' code;

   * for 'svn export/cat/propget/proplist':

     - implement some kind of API for reading tree data from shelf;
     - augment libsvn_client export/cat/prop routines to use this
       new tree API when a shelf is specified;

   * for 'svn diff':

     - implement an svn_diff_tree_processor_t driver API on the shelf;
     - decouple the existing diff output routines from their driver
       (see the diff_driver_info_t parameter to get_diff_processor());


Take advantage of the new interfaces and routines developed here, to 
simplify the rest of svn. This could include:

   * improve the 'status' API to be a better fit for both shelves and
     the main WC storage;

   * implement the new API for reading WC base and working trees, on
     top of the existing WC access APIs, and re-implement existing
     libsvn_client export/cat/prop routines on top of that;

   * rewrite existing 'update' and 'merge' to use the new
     merge-into-WC routine;


- Julian

[1] 
https://cwiki.apache.org/confluence/display/SVN/Shelving+and+Checkpointing+Dev#ShelvingandCheckpointingDev-Shelving.2

[2] 
https://cwiki.apache.org/confluence/display/SVN/Shelving+and+Checkpointing+Dev#ShelvingandCheckpointingDev-IntegrateShelvingwithChangelists

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote:
> What works now?
> 
>    * files recognized as 'binary' (by our svn:mime-type rules) are 
> handled by storing the whole working file; other files still use diff 
> and patch
> 
>    * unshelving a 'binary' file reports a conflict if the WC file is 
> already locally modified (it doesn't check the base version is the same 
> as it was);
> 
>    * unshelving a 'binary' file copies it into the WC (and adds it to 
> version control if not already)
> 
>    * properties of a 'binary' file are still stored in the patch file 
> and handled by patch application

One bit of the existing functionality isn't updated to handle the 
storage of 'binary' files: 'svn shelf-diff' still just copies the 
shelf's patch file to standard output, and doesn't do anything with the 
new 'binary' files storage. For human-readable usage that's no big deal; 
as a data export facility it would be a problem. As it's low priority 
for a UI feature I don't propose to hack something in there at this stage.

Instead I would rather re-implement this export feature once we have put 
in place a better architecture. It should be structured like:

   * fetch the shelf changes, by providing an implementation of our 
existing svn_diff_tree_processor_t API around the storage,

   * plug that into our existing diff output formatting code

Two issues need solving first. Patch file semantics don't match 
svn_diff_tree_processor_t: we need complete 'before' and 'after' files, 
so we need to move away from patch file storage, which is better for 
other reasons too. Then our existing diff output code is not quite as 
decoupled as it needs to be. I made a start -- see get_diff_processor() 
in libsvn_client/diff.c -- but a bit of coupling still remains in the 
"diff_driver_info_t" parameter.

- Julian

Re: Shelve & checkpoint - next steps

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote:
> Julian Foad wrote on 2018-04-16:
>> === Storage for binary files ===
> [...] 
> store a 'binary' file by copying the working version into a directory 
> structure that parallels the WC directory structure, [...]

Done in http://svn.apache.org/r1831344 .

Performance, by a little manual testing, seems in line with what I'd 
expect for doing OS-level file copying.

What works now?

   * files recognized as 'binary' (by our svn:mime-type rules) are 
handled by storing the whole working file; other files still use diff 
and patch

   * unshelving a 'binary' file reports a conflict if the WC file is 
already locally modified (it doesn't check the base version is the same 
as it was);

   * unshelving a 'binary' file copies it into the WC (and adds it to 
version control if not already)

   * properties of a 'binary' file are still stored in the patch file 
and handled by patch application

- Julian