You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Hyrum K. Wright" <hy...@mail.utexas.edu> on 2007/09/10 15:09:26 UTC

Walking Merge History

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

A couple of weeks ago, after reading Karl's excellent description[1] of
our delta editor, the idea occurred to me that we could use a similar
construct to walk through an object's ancestry, finding merging and
merged revisions, and allowing a consumer (such as the get_file_revs()
interface) to construct appropriate delta's and the like.  Done right,
this could open up a lot of possibilities, such as merge history graphs.
Done wrong, we could be in a much worse situation than we are now.

Having looked at the problem for a while, and realizing that I'm just
not smart enough to go it alone, I'd like to get some opinions on the
best way to design the interface.  So far, I've got something like:

typedef struct svn_repos__ancestry_callbacks_t
{
  /* An ancestor was found in PATH at REV. */
  svn_error_t *(*found_ancestor)(void *walk_baton,
                                 const char *path,
                                 svn_revnum_t rev,
                                 svn_boolean_t *halt,
                                 apr_pool_t *pool);

} svn_repos__ancestry_callbacks_t;

/* Walk the ancestry of PATH, from END to START, calling CALLBACKS as
   needed. */
svn_error_t *
svn_repos__walk_ancestry(const char *end_path,
                         svn_fs_t *fs,
                         svn_revnum_t start,
                         svn_revnum_t end,
                         svn_boolean_t include_merges,
                         svn_boolean_t stop_on_copy,
                         const svn_repos__ancestry_callbacks_t
                                                         *callbacks,
                         void *callbacks_baton,
                         svn_repos_authz_func_t authz_read_func,
                         void *authz_read_baton,
                         apr_pool_t *pool);

This is great at traversing one line of history, but what about multiple
lines?  Unlike the delta editor where a node is either a directory or a
file, in the case of ancestry, a revision can be both a merging
revision, as well as a revision that somebody made edits to.  Should
there be multiple callbacks?  Which order should they come in?  Is there
a way to send multiple lines of history streamily?

Some other questions:
 * Should we have separate callbacks for flagging merging and "end of
   branch" revisions, or just add flags on the existing callback?

 * Should we implement this walker in as public API, so that the client
   libraries can use it directly?  What would be the best way to do so?
   (If we did do that, we'd need to transmit deltas, similar to
   get_file_revs().)

In some ways, I feel that my current implementation of 'blame -g' is
pretty hacky, and doesn't have a very sound theoretical backing.  I know
that there are some bugs in there, and actually implementing an ancestry
walker would go a long way toward improving correctness.

Thoughts?

- -Hyrum

[1] http://www.red-bean.com/kfogel/beautiful-code/bc-chapter-02.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG5V4mCwOubk4kUXwRAvroAKCaJHCOVsks0Nz3E9AvkQ7K4gILVwCfTfGa
GjWce2emffeZQVl7fkcSrgY=
=FLb5
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by Karl Fogel <kf...@red-bean.com>.
"C. Michael Pilato" <cm...@collab.net> writes:
> Actually, we have precedent already for a different stop mechanism:
> the error SVN_ERR_CEASE_INVOCATION.  Yes, it's an error creation from a pool
> and all that, but I seriously doubt we're talking about noticeable cost.

Well, that's just an error code -- whether the object itself is
pool-allocated or otherwise constructed is an independent question.
Looks like we should re-use that error code, though!

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by "C. Michael Pilato" <cm...@collab.net>.
Karl Fogel wrote:
> "Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>> Sounds good.  The plan will be: call found_ancestor() on any interesting
>> ancestor, and then potentially call any of the other callbacks to
>> indicate that something else (branch, copy, merge) happened.
>>
>> Of course, a consumer can leave any of the callbacks as NULL, and the
>> walker will ignore them.
> 
> Yup.  I guess each callback has to have its own '*stop' flag, but
> there's no way to avoid that, I think.

Actually, we have precedent already for a different stop mechanism:
the error SVN_ERR_CEASE_INVOCATION.  Yes, it's an error creation from a pool
and all that, but I seriously doubt we're talking about noticeable cost.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: Walking Merge History

Posted by Karl Fogel <kf...@red-bean.com>.
"Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
> Sounds good.  The plan will be: call found_ancestor() on any interesting
> ancestor, and then potentially call any of the other callbacks to
> indicate that something else (branch, copy, merge) happened.
>
> Of course, a consumer can leave any of the callbacks as NULL, and the
> walker will ignore them.

Yup.  I guess each callback has to have its own '*stop' flag, but
there's no way to avoid that, I think.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Karl Fogel wrote:
> "Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>> To restate your question: could we add a fourth callback ("this node was
>> copied from somewhere else, here's where"), and let the callback
>> determine if the walker should halt or not?  Is this right?
> 
> Yep, that's what I'm proposing.
> 
> In general, if we've got a callback table anyway, then I think the way
> to indicate "something happened" is to add a callback for "something".

Sounds good.  The plan will be: call found_ancestor() on any interesting
ancestor, and then potentially call any of the other callbacks to
indicate that something else (branch, copy, merge) happened.

Of course, a consumer can leave any of the callbacks as NULL, and the
walker will ignore them.

- -Hyrum
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG6fxnCwOubk4kUXwRAuV5AJ9177GR8kRkRT/eRn2yO0RQXOFo9QCeMiZ8
aZ+r7HE9e5PEWID3lAMRNjc=
=7oeO
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by Karl Fogel <kf...@red-bean.com>.
"Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
> To restate your question: could we add a fourth callback ("this node was
> copied from somewhere else, here's where"), and let the callback
> determine if the walker should halt or not?  Is this right?

Yep, that's what I'm proposing.

In general, if we've got a callback table anyway, then I think the way
to indicate "something happened" is to add a callback for "something".

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Karl Fogel wrote:
> "Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>>> I'm not sure what you meant by "transmit deltas" in this context.
>>> You're already implementing a callback invocation, analogous to
>>> 'svn_file_rev_handler_t' -- isn't that enough?  Maybe I just don't
>>> understand, though.
>> Sorry I was unclear.  Basically, *if* we choose to expose this API
>> through the RA layer, it would be something like get_file_revs(), only
>> in reverse order, and capable of traversing merge history.  Such an
>> interface would probably be well served in sending deltas, akin to
>> get_file_revs().
> 
> Hmm, but what are the deltas to transmit?  These callbacks only return
> metadata anyway, not (say) arbitrarily-sized file contents.

True.  An RA API could be built on top of this which does return the
file contents, though.  But, now that I think of it, that's immaterial
to this discussion, anyway.

>>> A thought: you don't need the 'stop_on_copy' parameter if you have the
>>> boolean '*halt' in the callback's API already, right?
>> Actually, I think we still do.  A given invocation of the
>> found_revision() callback won't know if it is the last revision before a
>> copy.   It could find out, but that would largely duplicate work the
>> ancestry_walker would have already done.
> 
> Oh, the copy itself is not an event that triggers a callback?
> (Perhaps it should be?  And then that callback could set the *stop
> flag?)

The three callback triggers I'm planning now are:
 * An "interesting" revision (i.e., one in which the node changed).  A
   typical, non-branching copy is such a revision.  This will get called
   on every revision which the walker finds interesting.
 * A merging revision, which gets called after the first callback if a
   revision is determined to have a merge in it.
 * The end (beginning) of a merging line of history, IOW, the point the
   branch occurred.

To restate your question: could we add a fourth callback ("this node was
copied from somewhere else, here's where"), and let the callback
determine if the walker should halt or not?  Is this right?

- -Hyrum
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG6ezSCwOubk4kUXwRAmsIAJ9JPrO+zNORiLJUsaYgFRYCyvZ28QCfRiEb
J1Q/Y87x/n+yuR7fHhMQEKY=
=sivl
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by Karl Fogel <kf...@red-bean.com>.
"Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>> I'm not sure what you meant by "transmit deltas" in this context.
>> You're already implementing a callback invocation, analogous to
>> 'svn_file_rev_handler_t' -- isn't that enough?  Maybe I just don't
>> understand, though.
>
> Sorry I was unclear.  Basically, *if* we choose to expose this API
> through the RA layer, it would be something like get_file_revs(), only
> in reverse order, and capable of traversing merge history.  Such an
> interface would probably be well served in sending deltas, akin to
> get_file_revs().

Hmm, but what are the deltas to transmit?  These callbacks only return
metadata anyway, not (say) arbitrarily-sized file contents.

>> A thought: you don't need the 'stop_on_copy' parameter if you have the
>> boolean '*halt' in the callback's API already, right?
>
> Actually, I think we still do.  A given invocation of the
> found_revision() callback won't know if it is the last revision before a
> copy.   It could find out, but that would largely duplicate work the
> ancestry_walker would have already done.

Oh, the copy itself is not an event that triggers a callback?
(Perhaps it should be?  And then that callback could set the *stop
flag?)

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Karl Fogel wrote:
> First: I think this idea is the way to go!
> 
> "Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>> Some other questions:
>>  * Should we have separate callbacks for flagging merging and "end of
>>    branch" revisions, or just add flags on the existing callback?
> 
> Separate callbacks, IMHO.

Yeah, I've put some more thought into it, and that's the way I'm going
to go.

>>  * Should we implement this walker in as public API, so that the client
>>    libraries can use it directly?  What would be the best way to do so?
>>    (If we did do that, we'd need to transmit deltas, similar to
>>    get_file_revs().)
> 
> Keep it private for now, with a note that it's fine to make it public
> if/when other libraries start needing it.  Better to develop the API
> internally, and make it public only after it's been tuned, anyway.

Sounds good.

> I'm not sure what you meant by "transmit deltas" in this context.
> You're already implementing a callback invocation, analogous to
> 'svn_file_rev_handler_t' -- isn't that enough?  Maybe I just don't
> understand, though.

Sorry I was unclear.  Basically, *if* we choose to expose this API
through the RA layer, it would be something like get_file_revs(), only
in reverse order, and capable of traversing merge history.  Such an
interface would probably be well served in sending deltas, akin to
get_file_revs().

> A thought: you don't need the 'stop_on_copy' parameter if you have the
> boolean '*halt' in the callback's API already, right?

Actually, I think we still do.  A given invocation of the
found_revision() callback won't know if it is the last revision before a
copy.   It could find out, but that would largely duplicate work the
ancestry_walker would have already done.

>> This is great at traversing one line of history, but what about multiple
>> lines?  Unlike the delta editor where a node is either a directory or a
>> file, in the case of ancestry, a revision can be both a merging
>> revision, as well as a revision that somebody made edits to.  Should
>> there be multiple callbacks?  Which order should they come in?  Is there
>> a way to send multiple lines of history streamily?
> 
> A suggestion: state explicitly that the order of callback invocations
> is undefined, because you want maximum freedom to tweak the walk
> driver to be as efficient as possible, at least for now.  Or maybe
> there can be a partial ordering, but just feel free to document
> yourself as much leeway as needed, guided by driver implementation
> constraints.

There will at least be some kind of ordering, but I think we can make it
pretty flexible.  The docs are sparse and need to be filled out.  I'll
do that once I have a better idea of how the interface will work.

>> In some ways, I feel that my current implementation of 'blame -g' is
>> pretty hacky, and doesn't have a very sound theoretical backing.  I know
>> that there are some bugs in there, and actually implementing an ancestry
>> walker would go a long way toward improving correctness.
> 
> I agree, these kinds of abstractions almost always make the code
> cleaner and (sometimes) more correct.
> 
>> Well, having heard nothing but a deafening silence, I'm going to create
>> a branch and commit some of the work I've already done to it.  I don't
>> have much, but it may give people a little bit better of an idea of what
>> I'm proposing.
> 
> Sorry, just catching up on mail, hence the delay.

No problem.  Working on a branch is probably best anyway, since I'll
initially be breaking stuff, but I still want version control, and the
ability for people to see changes as they happen.

Thanks for the suggestions!

- -Hyrum
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG6awsCwOubk4kUXwRAh5PAJ9p0YHzHGWLQ59HLUzNnGiRaT8TTACgyXgG
2H7Dom/E6jkeF4/m0649Sf0=
=Dv1G
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by Karl Fogel <kf...@red-bean.com>.
First: I think this idea is the way to go!

"Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
> Some other questions:
>  * Should we have separate callbacks for flagging merging and "end of
>    branch" revisions, or just add flags on the existing callback?

Separate callbacks, IMHO.

>  * Should we implement this walker in as public API, so that the client
>    libraries can use it directly?  What would be the best way to do so?
>    (If we did do that, we'd need to transmit deltas, similar to
>    get_file_revs().)

Keep it private for now, with a note that it's fine to make it public
if/when other libraries start needing it.  Better to develop the API
internally, and make it public only after it's been tuned, anyway.

I'm not sure what you meant by "transmit deltas" in this context.
You're already implementing a callback invocation, analogous to
'svn_file_rev_handler_t' -- isn't that enough?  Maybe I just don't
understand, though.

A thought: you don't need the 'stop_on_copy' parameter if you have the
boolean '*halt' in the callback's API already, right?

> This is great at traversing one line of history, but what about multiple
> lines?  Unlike the delta editor where a node is either a directory or a
> file, in the case of ancestry, a revision can be both a merging
> revision, as well as a revision that somebody made edits to.  Should
> there be multiple callbacks?  Which order should they come in?  Is there
> a way to send multiple lines of history streamily?

A suggestion: state explicitly that the order of callback invocations
is undefined, because you want maximum freedom to tweak the walk
driver to be as efficient as possible, at least for now.  Or maybe
there can be a partial ordering, but just feel free to document
yourself as much leeway as needed, guided by driver implementation
constraints.

> In some ways, I feel that my current implementation of 'blame -g' is
> pretty hacky, and doesn't have a very sound theoretical backing.  I know
> that there are some bugs in there, and actually implementing an ancestry
> walker would go a long way toward improving correctness.

I agree, these kinds of abstractions almost always make the code
cleaner and (sometimes) more correct.

> Well, having heard nothing but a deafening silence, I'm going to create
> a branch and commit some of the work I've already done to it.  I don't
> have much, but it may give people a little bit better of an idea of what
> I'm proposing.

Sorry, just catching up on mail, hence the delay.

> -Hyrum
> 
> [1] http://www.red-bean.com/kfogel/beautiful-code/bc-chapter-02.html

Heh, glad you liked it :-).

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Walking Merge History

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hyrum K. Wright wrote:
> A couple of weeks ago, after reading Karl's excellent description[1] of
> our delta editor, the idea occurred to me that we could use a similar
> construct to walk through an object's ancestry, finding merging and
> merged revisions, and allowing a consumer (such as the get_file_revs()
> interface) to construct appropriate delta's and the like.  Done right,
> this could open up a lot of possibilities, such as merge history graphs.
> Done wrong, we could be in a much worse situation than we are now.
> 
> Having looked at the problem for a while, and realizing that I'm just
> not smart enough to go it alone, I'd like to get some opinions on the
> best way to design the interface.  So far, I've got something like:
> 
> typedef struct svn_repos__ancestry_callbacks_t
> {
>   /* An ancestor was found in PATH at REV. */
>   svn_error_t *(*found_ancestor)(void *walk_baton,
>                                  const char *path,
>                                  svn_revnum_t rev,
>                                  svn_boolean_t *halt,
>                                  apr_pool_t *pool);
> 
> } svn_repos__ancestry_callbacks_t;
> 
> /* Walk the ancestry of PATH, from END to START, calling CALLBACKS as
>    needed. */
> svn_error_t *
> svn_repos__walk_ancestry(const char *end_path,
>                          svn_fs_t *fs,
>                          svn_revnum_t start,
>                          svn_revnum_t end,
>                          svn_boolean_t include_merges,
>                          svn_boolean_t stop_on_copy,
>                          const svn_repos__ancestry_callbacks_t
>                                                          *callbacks,
>                          void *callbacks_baton,
>                          svn_repos_authz_func_t authz_read_func,
>                          void *authz_read_baton,
>                          apr_pool_t *pool);
> 
> This is great at traversing one line of history, but what about multiple
> lines?  Unlike the delta editor where a node is either a directory or a
> file, in the case of ancestry, a revision can be both a merging
> revision, as well as a revision that somebody made edits to.  Should
> there be multiple callbacks?  Which order should they come in?  Is there
> a way to send multiple lines of history streamily?
> 
> Some other questions:
>  * Should we have separate callbacks for flagging merging and "end of
>    branch" revisions, or just add flags on the existing callback?
> 
>  * Should we implement this walker in as public API, so that the client
>    libraries can use it directly?  What would be the best way to do so?
>    (If we did do that, we'd need to transmit deltas, similar to
>    get_file_revs().)
> 
> In some ways, I feel that my current implementation of 'blame -g' is
> pretty hacky, and doesn't have a very sound theoretical backing.  I know
> that there are some bugs in there, and actually implementing an ancestry
> walker would go a long way toward improving correctness.
> 
> Thoughts?

Well, having heard nothing but a deafening silence, I'm going to create
a branch and commit some of the work I've already done to it.  I don't
have much, but it may give people a little bit better of an idea of what
I'm proposing.

> -Hyrum
> 
> [1] http://www.red-bean.com/kfogel/beautiful-code/bc-chapter-02.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG6DFsCwOubk4kUXwRAmZwAJ0UumLQ8ALbCAX6cYLjJi0P8PfijQCg7qRT
1wNuvfv5nkTmKIoBd9RgEHE=
=BJ/w
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org