You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2017/08/25 12:35:37 UTC

Shelving / Checkpointing thoughts

First of all: thanks for working on shelving and checkpointing,
Julian. These could become very important and big features. Daunting
to take them on, but it's good to see someone having a go at it.

I've tried to read through all the docs and recent mail threads. Below
are a couple of thoughts.

TL;DR: I'd suggest first going for a very good Shelving
implementation, and not rushing for Checkpointing. Shelving can
already solve some checkpointing use-cases (see below), and
Checkpointing as a full-blown feature needs very careful thought (even
if limited, it's risky to paint ourselves into a corner), and will
quickly raise expectations of "local commits" or "local branching"
(which still seem far away).


== Shelving ==

Looks great so far. Of course a lot of challenges remain for all the
cases which are not yet (correctly) covered by 'svn diff' and 'svn
patch' (property changes, tree operations, binary files, unresolved
conflicts, etc.). Attaching a log message to a shelf is key, and the
association with changelists looks like a good approach.

- Shelves should (eventually) support directories as "versioned
items". Changelists currently don't support directories.

- Suggestion: 'svn shelve --keep', to create a shelf (patch) in the
"shelf storage" but not revert it. That would enable some crude way of
checkpointing your work, through simple patches (which can be applied
later by fuzzy patching, or ...):

    work on feature A
    svn shelve --keep --name "feature A"
    continue work on feature A
    turns out badly, lets go back
    svn revert; svn unshelve "feature A"


== Checkpointing ==

I think only "option 3" looks viable / interesting in the long run
(option 1, storing patches, looks a lot like simple shelving, so not
much more value imho). Or even a completely different approach which
is implemented in working copy storage (option 4? I haven't thought
this through, but I'm afraid of the disk space requirements, and init
I/O cost, of option 3 for large multi-GB working copies).

It's very hard for me to not think of checkpoints as local branches of
some sort. And my users will immediately want to use them in that way.
In all honesty, I think we should aim for powerful local branches (and
think of an architecture / design / ui for that), and then think about
how we can perhaps start with something simpler and more limited as a
first step, but which goes in that direction. I.e. a more holistic
design around "local branches", "local commits", "checkpointing".
What's the big picture?

- After reading the "Terminology" section of
https://wiki.apache.org/subversion/SavePoints, I agree with that
document that "Savepoints" might be a better name. But don't want to
bikeshed over it ...

- In a prior mail-thread between you and Nathan Hartman the "rebasing"
problem was mentioned. In [1] you said:

> Performing an 'update' with a checkpoint series is a bigger ask than it
> might at first seem. In effect, it requires rebasing the series of
> checkpoints on the new base, which gets ugly because of the need to
> handle conflicts (which is ugly enough already in the existing
> single-depth WC).

However, that seems to only sane way to go for me. Rebasing the
checkpoints one by one, and resolving conflicts along the way. Don't
know how you'd do that though, if the checkpoints are revisions 1, 2,
3 in a local repository (with immutable history etc). This is really
something where the "local repository" technique breaks down IMHO. In
contract with DVCS's, in SVN history is immutable. But mutability is
quite important for local branches / commits.

In that sense, a series of patches is more flexible: you can still
apply them with fuzz even if applying them to a different BASE state,
and often that will "just work" (and conflicts "just" need to be
resolved).

In that light, Nathan's latest post in that thread ([2]) was also
interesting, where he suggested to store the BASE together with the
WORK for every checkpoint. I'm not sure if that's the way to go (maybe
you only need a "description of BASE", not the pristines etc), but it
made me realize: don't expect checkpoints to work for "undo 'svn
update'" if you can't restore the original BASE tree (down to the
exact mixed-revisionness).

[1] https://svn.haxx.se/dev/archive-2017-07/0302.shtml
[2] https://svn.haxx.se/dev/archive-2017-08/0064.shtml

-- 
Johan

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Daniel Shahaf wrote:
> Julian Foad wrote on Fri, 25 Aug 2017 21:42 +0100:
>> Let's clarify. We can mean two possible things when we say 'a series of
>> patches':
>>
>>   1. "patch versions": a series of successively better patches, all
>> attempting the same logical thing, all from the same base, and only one
>> of which is applied at any time;
>>
>>   2. a series of patches, each providing a different logical change,
>> where each patch is based on the result of applying the previous one.
>> ("quilt" is a tool for managing path series of this kind. My 'option 3'
>> (local repository) design for checkpointing could also be used in this
>> way, in a primitive way, but would not support revising earlier patches
>> in the series which is a key strength of what "quilt" can do.)
>>
>> I am talking about definition 1 ("patch versions").
>> [...]
>>
>> I propose that we should not attempt to provide any special support for
>> definition 2 within this "shelving" feature; users can manage that
>> themselves by simply remembering which feature names depend on which
>> other ones, or by including some other numbering system within the names.
> 
> Could you explain how you see this working?  In use-case #2, later patches
> in the series typically depend on earlier patches.  I don't see how that use-case
> can be addressed if the patches are all implemented against deltas against
> the same base (for example, because if patch #2 in a 5-patch series is edited,
> all later patches would have to be regenerated).

This kind of shelving (with simple checkpointing add-ons) would *not*
help with use case #2: in particular it would not provide any support
for editing an earlier patch and rebasing the later patches. All I meant
is that you would still be free to use (manually) any naming/numbering
system you wish to help you remember how the patches that you shelve are
related.

- Julian


Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote:
> Stefan Kueng wrote:
>> Another reason to store whole files: patches don't work with non-text
>> files. Which is a major problem especially on Windows where files often
>> are encoded in utf16.
> 
> Making 'svn diff' and 'svn patch' support non-text files is already a
> dependency of shelving-by-patch-files.

See the "Existing Issues to Overcome" section,
https://docs.google.com/document/d/1PVgw0BdPF7v67oxIK7B_Yjmr3p28ojabP5N1PfZTsHk/edit#heading=h.4sj99cgksge9

> At present, 'svn diff --git' supports non-text files by writing into the
> patch file (and encoded into ASCII), the whole of the old file and the
> whole of the new file. So that already does what you are asking: it
> 'stores whole files'. And 'svn patch' handles this as input. Just like
> 'svn merge', it chooses either one side change or the other, and if both
> sides changed that is a conflict. (I haven't tested it, I assume it
> works like that.)

Which implies that binary file support might "just work" if I simply add
the "--git" option in to the "diff" call in the prototype. I plan to try
that.

And before anyone raises their hand to say this design will be too slow
for use cases involving very large binary files, yes, I know, it almost
certainly will be. This is called a prototype for good reason.

- Julian


> Storing whole files will not magically make merging UTF-16 text changes
> work. For that, the merge will need to be taught how to process UTF-16
> text. And if we do that, then 'svn diff' and 'svn patch' could process
> UTF-16 as text too. That would be a great enhancement to Subversion, but
> is an orthogonal issue.


Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Stefan Kueng wrote:
> Another reason to store whole files: patches don't work with non-text
> files. Which is a major problem especially on Windows where files often
> are encoded in utf16.

Making 'svn diff' and 'svn patch' support non-text files is already a
dependency of shelving-by-patch-files.

At present, 'svn diff --git' supports non-text files by writing into the
patch file (and encoded into ASCII), the whole of the old file and the
whole of the new file. So that already does what you are asking: it
'stores whole files'. And 'svn patch' handles this as input. Just like
'svn merge', it chooses either one side change or the other, and if both
sides changed that is a conflict. (I haven't tested it, I assume it
works like that.)

Storing whole files will not magically make merging UTF-16 text changes
work. For that, the merge will need to be taught how to process UTF-16
text. And if we do that, then 'svn diff' and 'svn patch' could process
UTF-16 as text too. That would be a great enhancement to Subversion, but
is an orthogonal issue.

- Julian

Re: Shelving / Checkpointing thoughts

Posted by Stefan Kueng <to...@gmail.com>.

On 27.08.2017 02:09, Branko Čibej wrote:
> On 27.08.2017 02:04, Daniel Shahaf wrote:
>> Julian Foad wrote on Fri, 25 Aug 2017 21:42 +0100:
>>> Let's clarify. We can mean two possible things when we say 'a series of
>>> patches':
>>>
>>>    1. "patch versions": a series of successively better patches, all
>>> attempting the same logical thing, all from the same base, and only one
>>> of which is applied at any time;
>>>
>>>    2. a series of patches, each providing a different logical change,
>>> where each patch is based on the result of applying the previous one.
>>> ("quilt" is a tool for managing path series of this kind. My 'option 3'
>>> (local repository) design for checkpointing could also be used in this
>>> way, in a primitive way, but would not support revising earlier patches
>>> in the series which is a key strength of what "quilt" can do.)
>>>
>>> I am talking about definition 1 ("patch versions").
>>>
>>> I propose patches in a series of patch versions be named "featureA-1",
>>> "featureA-2", ... (This is what I do already, manually, in my own work.)
>>>
>>> I propose that we should not attempt to provide any special support for
>>> definition 2 within this "shelving" feature; users can manage that
>>> themselves by simply remembering which feature names depend on which
>>> other ones, or by including some other numbering system within the names.
>> Could you explain how you see this working?  In use-case #2, later patches
>> in the series typically depend on earlier patches.  I don't see how that use-case
>> can be addressed if the patches are all implemented against deltas against
>> the same base (for example, because if patch #2 in a 5-patch series is edited,
>> all later patches would have to be regenerated).
>>
>> If that's not clear enough I'd be happy to elaborate.
>>
>> Cheers,
>>
>> Daniel
>>
>> P.S. I think brane's suggestion would make the "edit patch #2 in a 5-patch series"
>> use-case a lot easier: diff3 is exactly the right tool for rebasing patch #3 on top
>> of the edited #2.
> 
> Yes, that's one of the reasons I proposed storing whole files instead of
> patches. Rebasing and reordering would become much simpler (although not
> exactly "simple" if we take tree restructuring into account).

Another reason to store whole files: patches don't work with non-text 
files. Which is a major problem especially on Windows where files often 
are encoded in utf16.

Stefan


Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
>
> Apache products aren't allowed to depend on GPL'd code for mandatory /
> core features.
>

Even GPL2 with the linking exception?


> What is the advantage of linking to libgit2 over simply telling users to
> use git-svn with 'git stash' and local branches?
>

None, assuming you've learned the Git command line. I guess git-svn
semantics, too.

- Paul

Re: Shelving / Checkpointing thoughts

Posted by Branko Čibej <br...@apache.org>.
On 06.06.2018 16:53, Paul Hammant wrote:
> [going back in time to August 29th]
>
>
>
>     Apache products aren't allowed to depend on GPL'd code for mandatory /
>     core features.
>
>
> This is academic now as Julian is making great progress with his
> shelve tech that's built in to Subversion and idiomatic, but there are
> MIT licensed versions of Git:
>
>     https://github.com/go-gitea/git/blob/master/LICENSE (MIT)
>
> You could link that in instead of LibGit2 (GPL) for storage and
> dissemination of patch sets.
>
> Sure, a Go run-time library is not as bytespace/ram/CPU efficient as C
> or Rust, but it's closer by a mile than Java or Python (for example).


The problem with any kind of git-based shelving solution is that it
doesn't admit the existence of directories. We'll want to support those
one day.

Also, no, we can't have core functionality written in Go or Rust; it has
to be C because that's what brings us cross-platform portability.

-- Brane

Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
[going back in time to August 29th]



> Apache products aren't allowed to depend on GPL'd code for mandatory /
> core features.
>
>
This is academic now as Julian is making great progress with his shelve
tech that's built in to Subversion and idiomatic, but there are MIT
licensed versions of Git:

    https://github.com/go-gitea/git/blob/master/LICENSE (MIT)

You could link that in instead of LibGit2 (GPL) for storage and
dissemination of patch sets.

Sure, a Go run-time library is not as bytespace/ram/CPU efficient as C or
Rust, but it's closer by a mile than Java or Python (for example).

- Paul

Re: Shelving / Checkpointing thoughts

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Paul Hammant wrote on Sun, Aug 27, 2017 at 07:04:03 -0400:
> Delegating to libgit2 to invisibly handle: shelves, local-branches and
> pull-requests could yield a workable and flexible implementation.

Apache products aren't allowed to depend on GPL'd code for mandatory /
core features.

What is the advantage of linking to libgit2 over simply telling users to
use git-svn with 'git stash' and local branches?

Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
Fully expecting this to have a popularity of -5 on a 0 - 10 scale, I
thought I'd share that I'm progressing with a "subversion shelve" tech on
GitHub, that uses Git for the Shelve/unshelve (making a single 'bundle'
blob) :- https://github.com/paul-hammant/svn-shelve

Pull Requests welcome :)

Id thought Python would be easy to work with here, but I keep getting
caught out when Py.Test chokes on me changing directories in order to do
operations, so I'm wanting some form of try/finally fu for that and maybe
also pushd/popd <https://gist.github.com/Tatsh/7131812>.

It would be great if Subversion could do the command naming trick of Git.
svn-shelve could be handled by the svn command itself, just the same as Git
does - I previously created git-slim and git-size
<https://paulhammant.com/2016/08/15/introducing-git-size-command-and-visualization/>
.

Regards,

- Paul

Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
Here's a prototype in Python2 that makes a git repo in a 'shelve' folder.
Two commits - one with the starting position, and one with the ending
position. The 'svn info' for the resource is copied in too (same file name
with a '.info' suffix).

import sh
import os
from stat import S_IWUSR, S_IREAD


import fileinput

sh.rm("-rf", "maven-gpg-plugin")
sh.rm("-rf", "shelve")

sh.svn("co", "https://svn.apache.org/repos/asf/maven/plugins/trunk/maven-gpg-plugin/")

# Change two files.
for line in fileinput.input("maven-gpg-plugin/pom.xml", inplace=True):
     print "%d: %s" % (fileinput.filelineno(), line),
for line in fileinput.input("maven-gpg-plugin/src/main/java/org/apache/maven/plugin/gpg/SigningBundle.java",
inplace=True):
     print "%d: %s" % (fileinput.filelineno(), line),

# While files changed?
files = sh.svn("st", "maven-gpg-plugin")

sh.mkdir("shelve")
sh.git("init", "shelve")

# Copy originals in.
for line in files.splitlines():
     file_name = line[1:].strip()
     info = sh.svn("info", file_name)
     for iLine in info:
          if iLine.startswith("Checksum:"):
               checksum = iLine.split(" ")[1].strip()
               dir = checksum[0:2]
               sh.mkdir("-p", "shelve/" + "/".join(file_name.split('/')[:-1]))
               sh.cp("maven-gpg-plugin/.svn/pristine/" + dir + "/" +
checksum + ".svn-base", "shelve/" + file_name)
               os.chmod("shelve/" + file_name, S_IWUSR | S_IREAD) #
make writable
     f = open("shelve/" + file_name + ".info", 'w')
     f.writelines(info)
     f.close()

# Do a commit
sh.cd("shelve")
sh.git("add", ".")
sh.git("commit", "-m", "start")
sh.cd("..")

# Copy changed versions in.
for line in files.splitlines():
     file_name = line[1:].strip()
     sh.cp(file_name, "shelve/" + file_name)

# Do a commit
sh.cd("shelve")
sh.git("add", ".")
sh.git("commit", "-m", "finish")
sh.cd("..")

Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
Perhaps easy to prototype in Python, too.

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Paul Hammant wrote:
> Delegating to libgit2 to invisibly handle: shelves, local-branches and
> pull-requests could yield a workable and flexible implementation.
> [...]

It will definitely be worth me looking into the possibilities. Thanks
for the suggestion.

- Julian

Re: Shelving / Checkpointing thoughts

Posted by Paul Hammant <pa...@hammant.org>.
Delegating to libgit2 to invisibly handle: shelves, local-branches and
pull-requests could yield a workable and flexible implementation.

Pull-requests flow back to the central repo, of course) and there
interactive and automatic workflows include: code review, classic CI,
linting and other badge-style data-points.  Those workflows of course are
the purview of other pieces of software that are triggered by Subversion.

- Paul

Re: Shelving / Checkpointing thoughts

Posted by Branko Čibej <br...@apache.org>.
On 27.08.2017 06:00, Daniel Shahaf wrote:
> Branko Čibej wrote on Sun, 27 Aug 2017 02:09 +0200:
>> On 27.08.2017 02:04, Daniel Shahaf wrote:
>>> P.S. I think brane's suggestion would make the "edit patch #2 in a 5-patch series"
>>> use-case a lot easier: diff3 is exactly the right tool for rebasing patch #3 on top
>>> of the edited #2.
>> Yes, that's one of the reasons I proposed storing whole files instead of
>> patches. Rebasing and reordering would become much simpler (although not
>> exactly "simple" if we take tree restructuring into account).
> The part with tree restructuring would "simply" require a working diff3
> implementation for tree changes.  Isn't that exactly what the tree
> conflicts work is supposed to produce?

I was wondering about that, yes. Not sure how one would combine
(un)shelving and conflict resoiltion, but clearly it must be possible
somehow.

-- Brane

Re: Shelving / Checkpointing thoughts

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Sun, 27 Aug 2017 02:09 +0200:
> On 27.08.2017 02:04, Daniel Shahaf wrote:
> > P.S. I think brane's suggestion would make the "edit patch #2 in a 5-patch series"
> > use-case a lot easier: diff3 is exactly the right tool for rebasing patch #3 on top
> > of the edited #2.
> 
> Yes, that's one of the reasons I proposed storing whole files instead of
> patches. Rebasing and reordering would become much simpler (although not
> exactly "simple" if we take tree restructuring into account).

The part with tree restructuring would "simply" require a working diff3
implementation for tree changes.  Isn't that exactly what the tree
conflicts work is supposed to produce?

Re: Shelving / Checkpointing thoughts

Posted by Branko Čibej <br...@apache.org>.
On 27.08.2017 02:04, Daniel Shahaf wrote:
> Julian Foad wrote on Fri, 25 Aug 2017 21:42 +0100:
>> Let's clarify. We can mean two possible things when we say 'a series of
>> patches':
>>
>>   1. "patch versions": a series of successively better patches, all
>> attempting the same logical thing, all from the same base, and only one
>> of which is applied at any time;
>>
>>   2. a series of patches, each providing a different logical change,
>> where each patch is based on the result of applying the previous one.
>> ("quilt" is a tool for managing path series of this kind. My 'option 3'
>> (local repository) design for checkpointing could also be used in this
>> way, in a primitive way, but would not support revising earlier patches
>> in the series which is a key strength of what "quilt" can do.)
>>
>> I am talking about definition 1 ("patch versions").
>>
>> I propose patches in a series of patch versions be named "featureA-1",
>> "featureA-2", ... (This is what I do already, manually, in my own work.)
>>
>> I propose that we should not attempt to provide any special support for
>> definition 2 within this "shelving" feature; users can manage that
>> themselves by simply remembering which feature names depend on which
>> other ones, or by including some other numbering system within the names.
> Could you explain how you see this working?  In use-case #2, later patches
> in the series typically depend on earlier patches.  I don't see how that use-case
> can be addressed if the patches are all implemented against deltas against
> the same base (for example, because if patch #2 in a 5-patch series is edited,
> all later patches would have to be regenerated).
>
> If that's not clear enough I'd be happy to elaborate.
>
> Cheers,
>
> Daniel
>
> P.S. I think brane's suggestion would make the "edit patch #2 in a 5-patch series"
> use-case a lot easier: diff3 is exactly the right tool for rebasing patch #3 on top
> of the edited #2.

Yes, that's one of the reasons I proposed storing whole files instead of
patches. Rebasing and reordering would become much simpler (although not
exactly "simple" if we take tree restructuring into account).

-- Brane

Re: Shelving / Checkpointing thoughts

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Fri, 25 Aug 2017 21:42 +0100:
> Let's clarify. We can mean two possible things when we say 'a series of
> patches':
> 
>   1. "patch versions": a series of successively better patches, all
> attempting the same logical thing, all from the same base, and only one
> of which is applied at any time;
> 
>   2. a series of patches, each providing a different logical change,
> where each patch is based on the result of applying the previous one.
> ("quilt" is a tool for managing path series of this kind. My 'option 3'
> (local repository) design for checkpointing could also be used in this
> way, in a primitive way, but would not support revising earlier patches
> in the series which is a key strength of what "quilt" can do.)
> 
> I am talking about definition 1 ("patch versions").
> 
> I propose patches in a series of patch versions be named "featureA-1",
> "featureA-2", ... (This is what I do already, manually, in my own work.)
> 
> I propose that we should not attempt to provide any special support for
> definition 2 within this "shelving" feature; users can manage that
> themselves by simply remembering which feature names depend on which
> other ones, or by including some other numbering system within the names.

Could you explain how you see this working?  In use-case #2, later patches
in the series typically depend on earlier patches.  I don't see how that use-case
can be addressed if the patches are all implemented against deltas against
the same base (for example, because if patch #2 in a 5-patch series is edited,
all later patches would have to be regenerated).

If that's not clear enough I'd be happy to elaborate.

Cheers,

Daniel

P.S. I think brane's suggestion would make the "edit patch #2 in a 5-patch series"
use-case a lot easier: diff3 is exactly the right tool for rebasing patch #3 on top
of the edited #2.

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Forgot the conclusion...

Julian Foad wrote:
> Johan Corveleyn wrote:
>> Still only one shelf per WC (*the* shelf)? Grouping them through
>> naming ("savepoint-1", "savepoint-2" are two shelved patches belonging
>> to the same series, but "featureA" (which was reverted) is separate
>> because it doesn't have the same prefix)?

Absolutely yes! I thought I had previously indicated so.

>> Or do we need multiple shelves with some name too?

Absolutely not. Not for any reason that I can see.

- Julian

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Johan Corveleyn wrote:
>>   * "Shelving" or "to shelve" means putting something on a shelf. There
>> is one "shelf" per WC.
>>
>>   * The thing we put on the shelf is called a "patch" or a "shelved
>> change", and is analogous to a book or a paper placed on the shelf. A
>> numbered version of a patch can be called a "checkpoint".
>>
>>   * A series of checkpoint patches is a series of "patch versions" or a
>> "checkpoint series". I think this is simpler than introducing a new term.
> 
> Ah yes, of course. Sorry, no need to invent a new term.
> 
> Just wondering then, when we create a "series of patches" that belong
> together, that have some ordering, how do we organize that?

Let's clarify. We can mean two possible things when we say 'a series of
patches':

  1. "patch versions": a series of successively better patches, all
attempting the same logical thing, all from the same base, and only one
of which is applied at any time;

  2. a series of patches, each providing a different logical change,
where each patch is based on the result of applying the previous one.
("quilt" is a tool for managing path series of this kind. My 'option 3'
(local repository) design for checkpointing could also be used in this
way, in a primitive way, but would not support revising earlier patches
in the series which is a key strength of what "quilt" can do.)

I am talking about definition 1 ("patch versions").

I propose patches in a series of patch versions be named "featureA-1",
"featureA-2", ... (This is what I do already, manually, in my own work.)

I propose that we should not attempt to provide any special support for
definition 2 within this "shelving" feature; users can manage that
themselves by simply remembering which feature names depend on which
other ones, or by including some other numbering system within the names.

> Still only one shelf per WC (*the* shelf)? Grouping them through
> naming ("savepoint-1", "savepoint-2" are two shelved patches belonging
> to the same series, but "featureA" (which was reverted) is separate
> because it doesn't have the same prefix)? Or do we need multiple
> shelves with some name too?
> 
> Just one more thought: in the namespace of shelved changes, we might
> want to reserve "svn:" or some such prefix, for internal use, to give
> us possibilities for features built upon the shelving infrastructure.

Good thought.

- Julian


Re: Shelving / Checkpointing thoughts

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Aug 25, 2017 at 4:59 PM, Julian Foad <ju...@apache.org> wrote:
> Johan Corveleyn wrote:
>> On Fri, Aug 25, 2017 at 3:33 PM, Julian Foad <ju...@apache.org> wrote:
> [...]>> The Checkpoint feature could add the copy-and-modify facility
> for the
>>> log message.
>>
>> Yes, maybe we'll need to have some grouping structure / namespacing in
>> the shelves for this. A "rack" or something :-). The rack carries a
>> name ("savepoints", "feature A"); a single shelf in a rack is just
>> 'svn shelve --rack "feature A"'; If I add more shelves to a rack, they
>> get numbered. [...]
> I think the terminology works best, and most in line with other tools
> (p4, hg, bzr) like this:
>
>   * "Shelving" or "to shelve" means putting something on a shelf. There
> is one "shelf" per WC.
>
>   * The thing we put on the shelf is called a "patch" or a "shelved
> change", and is analogous to a book or a paper placed on the shelf. A
> numbered version of a patch can be called a "checkpoint".
>
>   * A series of checkpoint patches is a series of "patch versions" or a
> "checkpoint series". I think this is simpler than introducing a new term.

Ah yes, of course. Sorry, no need to invent a new term.

Just wondering then, when we create a "series of patches" that belong
together, that have some ordering, how do we organize that?

Still only one shelf per WC (*the* shelf)? Grouping them through
naming ("savepoint-1", "savepoint-2" are two shelved patches belonging
to the same series, but "featureA" (which was reverted) is separate
because it doesn't have the same prefix)? Or do we need multiple
shelves with some name too?

Just one more thought: in the namespace of shelved changes, we might
want to reserve "svn:" or some such prefix, for internal use, to give
us possibilities for features built upon the shelving infrastructure.

-- 
Johan

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Johan Corveleyn wrote:
> On Fri, Aug 25, 2017 at 3:33 PM, Julian Foad <ju...@apache.org> wrote:
[...]>> The Checkpoint feature could add the copy-and-modify facility
for the
>> log message.
> 
> Yes, maybe we'll need to have some grouping structure / namespacing in
> the shelves for this. A "rack" or something :-). The rack carries a
> name ("savepoints", "feature A"); a single shelf in a rack is just
> 'svn shelve --rack "feature A"'; If I add more shelves to a rack, they
> get numbered. [...]
I think the terminology works best, and most in line with other tools
(p4, hg, bzr) like this:

  * "Shelving" or "to shelve" means putting something on a shelf. There
is one "shelf" per WC.

  * The thing we put on the shelf is called a "patch" or a "shelved
change", and is analogous to a book or a paper placed on the shelf. A
numbered version of a patch can be called a "checkpoint".

  * A series of checkpoint patches is a series of "patch versions" or a
"checkpoint series". I think this is simpler than introducing a new term.

- Julian

Re: Shelving / Checkpointing thoughts

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Aug 25, 2017 at 3:33 PM, Julian Foad <ju...@apache.org> wrote:
...
>> == Shelving ==
>>
>> Looks great so far. Of course a lot of challenges remain for all the
>> cases which are not yet (correctly) covered by 'svn diff' and 'svn
>> patch' (property changes, tree operations, binary files, unresolved
>> conflicts, etc.). Attaching a log message to a shelf is key, and the
>> association with changelists looks like a good approach.
>>
>> - Shelves should (eventually) support directories as "versioned
>> items". Changelists currently don't support directories.
>
> I agree with all this too. A log message is supported in the prototype
> (but not yet integrated with changelists, as discussed in another thread).

We'll need to have the discussion about what changelists should do
with directories though. Not sure what prior discussions were already
held about this, when the changelist feature was designed ...

>> - Suggestion: 'svn shelve --keep', to create a shelf (patch) in the
>> "shelf storage" but not revert it. That would enable some crude way of
>> checkpointing your work, through simple patches (which can be applied
>> later by fuzzy patching, or ...):
>>
>>     work on feature A
>>     svn shelve --keep --name "feature A"
>>     continue work on feature A
>>     turns out badly, lets go back
>>     svn revert; svn unshelve "feature A"
>
> Yes; implemented now in r1806168.

Great!

> In fact this usage aligns very well with how I use patch files myself. I
> often save a series of patches to checkpoint my development of a
> feature, with names like 'foo-1.patch' ... 'foo-N.patch'.
>
> One thing I like to do in that case is to copy the log message from the
> top of the previous patch into the top of the new patch and update it.
> If Subversion could make that step easier, that would help.
>
> At present the prototype 'svn shelve' accepts a log message with '-m'
> or'-F' but doesn't provide an easy way to invoke an editor, assuming you
> don't want a log message if you don't specify one. One way to solve that
> would be by adopting the same convention as 'commit': pop up an editor
> by default and allow quitting with an empty message if desired. Another
> way could be a command-line option if we think no-log-message use cases
> should be the default. But that's a UI detail that we can leave till later.
>
> The Checkpoint feature could add the copy-and-modify facility for the
> log message.

Yes, maybe we'll need to have some grouping structure / namespacing in
the shelves for this. A "rack" or something :-). The rack carries a
name ("savepoints", "feature A"); a single shelf in a rack is just
'svn shelve --rack "feature A"'; If I add more shelves to a rack, they
get numbered.

svn shelve --keep-local --rack savepoints -> shelf "savepoints-0" (or
just "savepoints"?)
svn shelve --keep-local --rack savepoints -> shelf "savepoints-1"
svn revert
svn unshelve savepoints-0

Just thinking out loud here ...

>
>> == Checkpointing ==
>>
>> I think only "option 3" looks viable / interesting in the long run
>> (option 1, storing patches, looks a lot like simple shelving, so not
>> much more value imho). Or even a completely different approach which
>> is implemented in working copy storage (option 4? I haven't thought
>> this through, but I'm afraid of the disk space requirements, and init
>> I/O cost, of option 3 for large multi-GB working copies).
>>
>> It's very hard for me to not think of checkpoints as local branches of
>> some sort. And my users will immediately want to use them in that way.
>> In all honesty, I think we should aim for powerful local branches (and
>> think of an architecture / design / ui for that), and then think about
>> how we can perhaps start with something simpler and more limited as a
>> first step, but which goes in that direction. I.e. a more holistic
>> design around "local branches", "local commits", "checkpointing".
>> What's the big picture?
>
> Good question. While I am continuing to think about ways in which some
> larger scope towards "local branching" could play out, I don't currently
> see that ever working well for Subversion, at least not the current
> generation 1.x.
>
> Given your comments about shelving being almost enough, I now wonder if
> we should design simple checkpointing as a simple extension to shelving,
> whereby we:
>
>   * automatically increment the patch suffix number,
>     if the latest shelved patch was made with '--keep-local';
>
>   * also transfer the log message transfer from the previous numbered
>     patch and offer it for editing;
>
>   * offer a UI shortcut to revert to the previous version in the series;
>
> and any other similar things.
>
> WDYT?

Yes! I think that would be enough for a good release 1.11, with nice
"shelving" and basic checkpointing (actually mainly UI lipstick with
options / conventions on the "shelve / unshelve" commands) on top of
it. Leaves the door open for more fundamental things in the future ...

>> - After reading the "Terminology" section of
>> https://wiki.apache.org/subversion/SavePoints, I agree with that
>> document that "Savepoints" might be a better name. But don't want to
>> bikeshed over it ...
>
> Could be. (Also, 'checkpoint' is similar to 'checkout' which makes
> command-line completion not work well, and the possible abbreviation
> 'cp' clashes with 'copy'.)
>
>> - In a prior mail-thread between you and Nathan Hartman the "rebasing"
>> problem was mentioned. In [1] you said:
>>
>>> Performing an 'update' with a checkpoint series is a bigger ask than it
>>> might at first seem. In effect, it requires rebasing the series of
>>> checkpoints on the new base, which gets ugly because of the need to
>>> handle conflicts (which is ugly enough already in the existing
>>> single-depth WC).
>>
>> However, that seems to only sane way to go for me. Rebasing the
>> checkpoints one by one, and resolving conflicts along the way. Don't
>> know how you'd do that though, if the checkpoints are revisions 1, 2,
>> 3 in a local repository (with immutable history etc). This is really
>> something where the "local repository" technique breaks down IMHO. In
>> contract with DVCS's, in SVN history is immutable. But mutability is
>> quite important for local branches / commits.
>
> Yes, I have realized that it is going to be tricky to make storage in a
> local repository effectively mutable. There are various options, the
> most plausible being to leave old data in place and keep committing the
> modified versions of changes, tracking them as needed; but it's not simple.
>
>> In that sense, a series of patches is more flexible: you can still
>> apply them with fuzz even if applying them to a different BASE state,
>> and often that will "just work" (and conflicts "just" need to be
>> resolved).
>
> I don't buy that as an easier solution to updating a patch series. Any
> conflicts would be handled worse this way.

Okay, updating the patches when the "new base" comes in might indeed
be better. OTOH: if the user has discarded / forgotten about his
checkpoints, any "early conflict resolution of checkpoints / patches"
might be a waste of time. It's a tradeoff I guess (perhaps updating
them should be done "during rebasing", but the user can postpone
conflicts ...). And what if rebasing checkpoint-2 has a conflict, but
that conflict would not occur when rebasing checkpoint-3 (i.e. a
change was made locally which eliminated the conflict)? Hmmm ...

-- 
Johan

Re: Shelving / Checkpointing thoughts

Posted by Julian Foad <ju...@apache.org>.
Johan, thank you very much for these considered thoughts!

Johan Corveleyn wrote:
> First of all: thanks for working on shelving and checkpointing,
> Julian. These could become very important and big features. Daunting
> to take them on, but it's good to see someone having a go at it.
> 
> I've tried to read through all the docs and recent mail threads. Below
> are a couple of thoughts.
> 
> TL;DR: I'd suggest first going for a very good Shelving
> implementation, and not rushing for Checkpointing. Shelving can
> already solve some checkpointing use-cases (see below), and
> Checkpointing as a full-blown feature needs very careful thought (even
> if limited, it's risky to paint ourselves into a corner), and will
> quickly raise expectations of "local commits" or "local branching"
> (which still seem far away).

Agreed.

> == Shelving ==
> 
> Looks great so far. Of course a lot of challenges remain for all the
> cases which are not yet (correctly) covered by 'svn diff' and 'svn
> patch' (property changes, tree operations, binary files, unresolved
> conflicts, etc.). Attaching a log message to a shelf is key, and the
> association with changelists looks like a good approach.
> 
> - Shelves should (eventually) support directories as "versioned
> items". Changelists currently don't support directories.

I agree with all this too. A log message is supported in the prototype
(but not yet integrated with changelists, as discussed in another thread).

> - Suggestion: 'svn shelve --keep', to create a shelf (patch) in the
> "shelf storage" but not revert it. That would enable some crude way of
> checkpointing your work, through simple patches (which can be applied
> later by fuzzy patching, or ...):
> 
>     work on feature A
>     svn shelve --keep --name "feature A"
>     continue work on feature A
>     turns out badly, lets go back
>     svn revert; svn unshelve "feature A"

Yes; implemented now in r1806168.

In fact this usage aligns very well with how I use patch files myself. I
often save a series of patches to checkpoint my development of a
feature, with names like 'foo-1.patch' ... 'foo-N.patch'.

One thing I like to do in that case is to copy the log message from the
top of the previous patch into the top of the new patch and update it.
If Subversion could make that step easier, that would help.

At present the prototype 'svn shelve' accepts a log message with '-m'
or'-F' but doesn't provide an easy way to invoke an editor, assuming you
don't want a log message if you don't specify one. One way to solve that
would be by adopting the same convention as 'commit': pop up an editor
by default and allow quitting with an empty message if desired. Another
way could be a command-line option if we think no-log-message use cases
should be the default. But that's a UI detail that we can leave till later.

The Checkpoint feature could add the copy-and-modify facility for the
log message.


> == Checkpointing ==
> 
> I think only "option 3" looks viable / interesting in the long run
> (option 1, storing patches, looks a lot like simple shelving, so not
> much more value imho). Or even a completely different approach which
> is implemented in working copy storage (option 4? I haven't thought
> this through, but I'm afraid of the disk space requirements, and init
> I/O cost, of option 3 for large multi-GB working copies).
> 
> It's very hard for me to not think of checkpoints as local branches of
> some sort. And my users will immediately want to use them in that way.
> In all honesty, I think we should aim for powerful local branches (and
> think of an architecture / design / ui for that), and then think about
> how we can perhaps start with something simpler and more limited as a
> first step, but which goes in that direction. I.e. a more holistic
> design around "local branches", "local commits", "checkpointing".
> What's the big picture?

Good question. While I am continuing to think about ways in which some
larger scope towards "local branching" could play out, I don't currently
see that ever working well for Subversion, at least not the current
generation 1.x.

Given your comments about shelving being almost enough, I now wonder if
we should design simple checkpointing as a simple extension to shelving,
whereby we:

  * automatically increment the patch suffix number,
    if the latest shelved patch was made with '--keep-local';

  * also transfer the log message transfer from the previous numbered
    patch and offer it for editing;

  * offer a UI shortcut to revert to the previous version in the series;

and any other similar things.

WDYT?


> - After reading the "Terminology" section of
> https://wiki.apache.org/subversion/SavePoints, I agree with that
> document that "Savepoints" might be a better name. But don't want to
> bikeshed over it ...

Could be. (Also, 'checkpoint' is similar to 'checkout' which makes
command-line completion not work well, and the possible abbreviation
'cp' clashes with 'copy'.)

> - In a prior mail-thread between you and Nathan Hartman the "rebasing"
> problem was mentioned. In [1] you said:
> 
>> Performing an 'update' with a checkpoint series is a bigger ask than it
>> might at first seem. In effect, it requires rebasing the series of
>> checkpoints on the new base, which gets ugly because of the need to
>> handle conflicts (which is ugly enough already in the existing
>> single-depth WC).
> 
> However, that seems to only sane way to go for me. Rebasing the
> checkpoints one by one, and resolving conflicts along the way. Don't
> know how you'd do that though, if the checkpoints are revisions 1, 2,
> 3 in a local repository (with immutable history etc). This is really
> something where the "local repository" technique breaks down IMHO. In
> contract with DVCS's, in SVN history is immutable. But mutability is
> quite important for local branches / commits.

Yes, I have realized that it is going to be tricky to make storage in a
local repository effectively mutable. There are various options, the
most plausible being to leave old data in place and keep committing the
modified versions of changes, tracking them as needed; but it's not simple.

> In that sense, a series of patches is more flexible: you can still
> apply them with fuzz even if applying them to a different BASE state,
> and often that will "just work" (and conflicts "just" need to be
> resolved).

I don't buy that as an easier solution to updating a patch series. Any
conflicts would be handled worse this way.

> In that light, Nathan's latest post in that thread ([2]) was also
> interesting, where he suggested to store the BASE together with the
> WORK for every checkpoint. I'm not sure if that's the way to go (maybe
> you only need a "description of BASE", not the pristines etc), but it
> made me realize: don't expect checkpoints to work for "undo 'svn
> update'" if you can't restore the original BASE tree (down to the
> exact mixed-revisionness).

Yes... I haven't digested and replied to that yet but I intend to.

> [1] https://svn.haxx.se/dev/archive-2017-07/0302.shtml
> [2] https://svn.haxx.se/dev/archive-2017-08/0064.shtml

Very productive. Please don't hold back on further thoughts!

- Julian