You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Jacob Refstrup <ja...@hp.com> on 2004/09/02 16:12:21 UTC

Merge tracking - a proposal for impl. AS

This is a proposal for how to implement merge tracking
with Ancestor Sets (AS) with improved "blame" capability
(being able to blame who merged and who originated the
code).

Repository - "includes" & "excludes" sets:
------------------------------------------

With each node (file or directory) store two immutable 
sparse sets recording merge information for a revision 
if a merge was performed for that node. 

The "includes" set is used to track merges that added one
or more csets; the "excludes" set is used to track merges
that removed one or more csets. 

These sets only need to be present for nodes that were 
merged in that revision. Furthermore, a child node 
(except those with copy history) inherit their parent's 
merge sets and hence a child that has the same includes/
excludes sets won't need to store it for that revision.

We won't store complete "includes/excludes" sets per
node. Rather we would only store the newly included/excluded
versions in the sets. The complete sets can be calculated by 
examining previous revisions that affected a given node. The 
calculation of these sets can be cached on both client 
and server side.

Furthermore we often don't need the complete sets. Often
we will only require a partial set that represents a range
of version numbers - e.g. the sets for r1000 to r1149. This
would be the case if we had branched at r1000 and was merging
at r1149.

Calculating complete sets:
--------------------------

To calculate the complete sets for a given node the following
algorithm is used (sets are modelled as variable length bit
fields). Note we are construction the complete set for the 
range [s_rev..e_rev] (both inclusive):

1. If the node is not a copy (branch) and has a parent we 
   compute the parents sets first.   
2. We start with the parents sets (if appropriate). If no 
   parent we initialize the sets for this node to empty.
	node.include = 0; /* empty set */
	node.exclude = 0; /* empty set */
3. We set c_rev = e_rev.
4. If c_rev didn't change the node then 
	(stop if c_rev == s_rev)
	c_rev--;
	go to step 4.
5. If c_rev has includes and excludes sets for this node we
   update as follows:
	node.include |= node.rev[c_rev].include & ~node.exclude;
	node.exclude |= node.rev[c_rev].exclude & ~node.include;
6. Since c_rev has changed the node:
	node.include |= (1 << c_rev); 
7. Go to step 3.

NB. We don't store these sets in the repository - we would only
cache them.

Caching of sets:
----------------
The calculated complete "includes" / "excludes" sets are completely
cachable as their are based on immutable repository information.

Merging:
--------
In this case merging implies that that either the source or
target is a (indirect) copy of the other. Otherwise it doesn't make
sense talking about merge tracking. The revision at which the copy 
(branch) was made is called the start revision (s_rev). The revision
of the working copy where the merge is taking place is called e_rev.

For each node that is being merged we calculate the complete 
"includes" / "excludes" sets [s_rev..e_rev] for both the merge source 
and target.

The following pseudo C code computes what needs to be merged (the sets
are modelled as variable length bitfields):

  merge.includes = 0;
  merge.excludes = 0;
  merge.includes_prompt = 0;
  merge.excludes_prompt = 0;
  for (r = s_rev, r <= e_rev; r++) {
    if (source.excludes & (1 << r)) {
      if (target.includes & (1 << r))
        merge.excludes_prompt |= (1 << r);
      else
        merge.excludes |= (1 << r);
    }
    if (source.includes & (1 << r)) {
      if (target.excludes & (1 << r))
        merge.includes_prompt |= (1 << r);
     else	   
        merge.includes |= (1 << r);
    }
  }

The resulting sets indicates which revisions needs to be merged -
either included or excluded. The prompt sets represents csets
that the source and target are in "conflict" over the inclusion/
exclusion and where it makes sense to prompt the user.

Assuming that the user finally chooses to merge

merge_final.includes
merge_final.excludes

then that will be stored in the WC ready for the commit:

wc.includes &= ~merge_final.excludes;
wc.includes |=  merge_final.includes;
wc.excludes &= ~merge_final.includes;
wc.excludes |=  merge_final.excludes;

Blame:
------
In order to be able to accurately "blame" for a given node the 
revisions that have merge info need to be treated specially. The
author that did the merge is in essence the integrator and could
be blamed/praised as such.

To track down who wrote the code "blame" would have to then look
at the contributing revisions (from the merge info). Exactly how
to best do this is not clear at this point in time.

Query:
------
Being able to query if a cset (rX) is included would be done by
determining the changed nodes of rX and then determining the
includes/excludes set of those files (using rX..rWC where rWC is
the revision of the WC). If all files are included then rX is
included. 

Pros:
-----
Minimizes merge info storage by storing only 
- on revisions that have merges
- on nodes who's merge info is different from parent
Allows multiple merges in one commit
- mixed includes and excludes
- multiple merge sources and targets

Cons:
-----
Have to calculate includes/excludes sets per node
- offset by enabling caching results on both client & server

Regards,
- Jacob Refstrup




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merge tracking - a proposal for impl. AS

Posted by kf...@collab.net.
Jacob Refstrup <ja...@hp.com> writes:
> This is a proposal for how to implement merge tracking
> with Ancestor Sets (AS) with improved "blame" capability
> (being able to blame who merged and who originated the
> code).
> 
> [...]

Thanks for the thoughtful post, Jacob.

I can't give you much of a technical response right now, as I -- and
at least a few other developers I know -- are deliberately avoiding
raising the merge tracking topic at the moment.  We're trying to get
SVN 1.1.0 out the door, and at the same time address a few other,
smaller-but-still-important issues, in particular reserved checkouts.

Obviously, on a volunteer project, I can't presume to speak for all
the developers.  But my guess is that others have similar priorities
right now.  However, I've saved your proposal.  Even though it's only
a high level overview, it seemed well-thought-out and will help when
we *do* get to merge tracking.  

Merge tracking is something many of us are thinking about in the
background.  I think that when we finally do raise the topic, we'll
probably start with a goal-oriented approach: define exactly what is
meant by "merge tracking", specify exactly what problems it will
solve.  Only then can we talk about implementation.  Your proposal
mixes these two stages a bit.  Because merge tracking is such a broad
topic and means different (though overlapping) things to different
people, there is a high likelihood of misunderstanding and people
talking at cross-purposes, unless we first clearly define the problem
to be solved.

However, I'm *not* proposing that we start that process now :-).  We
need to get 1.1.0 out the door, fix a number of bugs, finish reserved
checkouts (a.k.a. "locking"), and take care of a few other things
first, IMHO.  Merge tracking's day will come, don't worry.  If it's
any comfort, some of us are even starting to budget time for it in the
medium- to long-term future.  But the stuff that needs to be done in
the short-term is extremely compelling, and can't be delayed.

Best,
-Karl Fogel

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org