Posted to dev@subversion.apache.org by Daniel Rall <dl...@collab.net> on 2007/05/02 00:35:05 UTC

Re: svn commit: r24875 - trunk/www/merge-tracking

On Tue, 01 May 2007, hwright@tigris.org wrote:
...
> --- trunk/www/merge-tracking/design.html	(original)
> +++ trunk/www/merge-tracking/design.html	Tue May  1 12:11:20 2007
...
> +<p>The introduction of merge tracking changes that paradigm.  Log messages
> +for independent revisions are still linearly related as before, but log
> +messages for merging revisions now have children.  These children are log
> +messages for the revisions which have been merged, and they may in turn
> +also have children.</p>
> +
> +<p>The result is a tree structure which the repository layer builds as it
> +collects log message information.  This tree structure then gets serialized
> +and marshaled back to the client, which can then rebuild the tree if needed.
> +Additionally, less information needs to be explicitly given, as the tree
> +structure itself implies revision relationships.
> +</p>
> +
> +<p>We currently use the <code>svn_log_message_receiver_t</code> interface
> +to return log messages between application layers.  To enable a tree
> +structure, we add another parameter, <code>child_count</code>.  When
> +<code>child_count</code> is zero, the node is a leaf node; when
> +<code>child_count</code> is greater than zero, the node is an interior node,
> +with the given number of children.  These children may also have children and
> +indicate such by their own <code>child_count</code> parameters.  Consumers of
> +this API can be aware of the number of children and rebuild the tree, or pass
> +the values farther up the application stack.  In effect, this method implements
> +a preorder traversal of the log message tree.</p>

When requested, I'd expect the API to return a "log info" data
structure which has an "apr_array_header_t *children;" field (with
elements of type "log info", fleshed out).  With such a structure, the
nelts field of the children's container could be used in lieu of a new
child_count field.
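
To make the two representations concrete, here is a minimal sketch in C
of a nested "log info" node and a function that flattens it into the
(revision, child_count) sequence the design text describes.  The names
log_info_t, log_item_t and flatten are hypothetical, not Subversion
types, and a plain C array stands in for the "apr_array_header_t
*children" field so the sketch compiles without APR.

```c
#include <stddef.h>

/* Hypothetical nested "log info" node.  The proposal above uses an
   apr_array_header_t *children; a plain array plus nelts stands in
   for it here. */
typedef struct log_info_t {
    long revision;
    const struct log_info_t *children;  /* array of children, or NULL */
    int nelts;                          /* number of children */
} log_info_t;

/* Hypothetical flat item: one log message reduced to
   (revision, child_count). */
typedef struct {
    long revision;
    int child_count;
} log_item_t;

/* Write the subtree rooted at NODE into OUT in preorder: the node
   first, then each child's subtree in order.  Returns the number of
   items written; stops early if OUT (capacity CAP) fills up. */
size_t flatten(const log_info_t *node, log_item_t *out, size_t cap)
{
    size_t used;
    int i;

    if (cap == 0)
        return 0;

    out[0].revision = node->revision;
    out[0].child_count = node->nelts;
    used = 1;

    for (i = 0; i < node->nelts; i++)
        used += flatten(&node->children[i], out + used, cap - used);

    return used;
}
```

The nelts value of the nested form becomes the child_count of the flat
form, so either representation can be derived from the other; what
differs is whether a parent can be handed over before its children
have been collected.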

Re: [merge tracking] Behavior of 'log' operations

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Daniel Rall wrote:
> On Wed, 02 May 2007, Hyrum K. Wright wrote:
> 
>> Daniel Rall wrote:
> ...
>>> Wouldn't this require opening multiple RA sessions to the
>>> repository?  (Meaning multiple TCP sockets, in the usual case.)
>> Nope.  We send multiple log messages serially down one RA session now,
>> don't we?
> 
> Yes.  I'd understood your description to mean that you'd be making
> multiple RA 'log' calls, which apparently was not your intent.
> 
>> This scheme requires that the children be sent in-band, right after the
>> parents.  That wouldn't require any connections beyond the one
>> that we are currently using.
>>
>> Maybe an example here might help:
>> Assume a simplified (x, y) tuple to represent the log message for
>> revision x with child_count y.  Using the last example in the functional
>> spec, the series of messages would be:
>> (24, 2) (14, 0) (12, 2) (10, 0) (9, 0)
>>
>> This unambiguously defines the tree, which can be rebuilt by clients
>> that need it.  Our client doesn't, and can just spit the messages out in
>> the order it gets them.
> 
> That example cleared things up, thanks Hyrum.  It wasn't obvious to me
> that you intended to send down the child log info (which was my
> concern).
> 
> You're thinking that we can implement this scheme using an optional
> child_count parameter, the presence of which changes the handling of
> subsequent log info?
> 
>>>> If it is a case of convenience, we could provide a receiver function for
>>>> clients that builds the log message tree.
> 
> *nod*
> 
>>> The command-line client will always need the data in tree form when
>>> invoked with the --merge-sensitive option (which I'm assuming will be
>>> a new boolean on the RA log API).  In terms of network I/O, it'd be
>>> less efficient to require O(N) RA calls than a single call which
>>> fetches all the data.  Yes, the data will come down in a potentially
>>> "jerky" fashion (for deeply nested trees, which won't be the usual
>>> case), but will remain streamy for log info with no nesting.  I'm not
>>> sure what this means in terms of CPU usage and disk I/O on the
>>> repository-side; we may need to profile to determine the best
>>> strategy.
>> Why will the command line client always need the data in tree form?  The
>> end result is output to the terminal, which is serialized, and basically
>> a preorder traversal of the message tree.  If we serialize the messages
>> through the ra session in the same way, why do we need to construct and
>> then traverse the tree at the client?  We should be able to just spit
>> out the messages as they come, with appropriate merge tracking
>> information pulled from the baton.
> ...
> 
> We are still constructing a tree representation on the repository side
> -- you're suggesting a different representation (pre-order traversal)
> for marshalling the data, which does sound good.

Yeah, sorry if I was unclear about that before.  It is still logically a
tree, but the transmission format uses the above representation.  I'll
clarify the spec.

-Hyrum


Re: [merge tracking] Behavior of 'log' operations

Posted by Daniel Rall <dl...@collab.net>.
On Wed, 02 May 2007, Hyrum K. Wright wrote:

> Daniel Rall wrote:
...
> > Wouldn't this require opening multiple RA sessions to the
> > repository?  (Meaning multiple TCP sockets, in the usual case.)
> 
> Nope.  We send multiple log messages serially down one RA session now,
> don't we?

Yes.  I'd understood your description to mean that you'd be making
multiple RA 'log' calls, which apparently was not your intent.

> This scheme requires that the children be sent in-band, right after the
> parents.  That wouldn't require any connections beyond the one
> that we are currently using.
> 
> Maybe an example here might help:
> Assume a simplified (x, y) tuple to represent the log message for
> revision x with child_count y.  Using the last example in the functional
> spec, the series of messages would be:
> (24, 2) (14, 0) (12, 2) (10, 0) (9, 0)
> 
> This unambiguously defines the tree, which can be rebuilt by clients
> that need it.  Our client doesn't, and can just spit the messages out in
> the order it gets them.

That example cleared things up, thanks Hyrum.  It wasn't obvious to me
that you intended to send down the child log info (which was my
concern).

You're thinking that we can implement this scheme using an optional
child_count parameter, the presence of which changes the handling of
subsequent log info?

> >> If it is a case of convenience, we could provide a receiver function for
> >> clients that builds the log message tree.

*nod*

> > The command-line client will always need the data in tree form when
> > invoked with the --merge-sensitive option (which I'm assuming will be
> > a new boolean on the RA log API).  In terms of network I/O, it'd be
> > less efficient to require O(N) RA calls than a single call which
> > fetches all the data.  Yes, the data will come down in a potentially
> > "jerky" fashion (for deeply nested trees, which won't be the usual
> > case), but will remain streamy for log info with no nesting.  I'm not
> > sure what this means in terms of CPU usage and disk I/O on the
> > repository-side; we may need to profile to determine the best
> > strategy.
> 
> Why will the command line client always need the data in tree form?  The
> end result is output to the terminal, which is serialized, and basically
> a preorder traversal of the message tree.  If we serialize the messages
> through the ra session in the same way, why do we need to construct and
> then traverse the tree at the client?  We should be able to just spit
> out the messages as they come, with appropriate merge tracking
> information pulled from the baton.
...

We are still constructing a tree representation on the repository side
-- you're suggesting a different representation (pre-order traversal)
for marshalling the data, which does sound good.

Re: [merge tracking] Behavior of 'log' operations

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Daniel Rall wrote:
> On Wed, 02 May 2007, Hyrum K. Wright wrote:
> 
>> Daniel Rall wrote:
>>> On Tue, 01 May 2007, hwright@tigris.org wrote:
>>> ...
>>>> --- trunk/www/merge-tracking/design.html	(original)
>>>> +++ trunk/www/merge-tracking/design.html	Tue May  1 12:11:20 2007
>>> ...
>>>> +<p>The introduction of merge tracking changes that paradigm.  Log messages
>>>> +for independent revisions are still linearly related as before, but log
>>>> +messages for merging revisions now have children.  These children are log
>>>> +messages for the revisions which have been merged, and they may in turn
>>>> +also have children.</p>
>>>> +
>>>> +<p>The result is a tree structure which the repository layer builds as it
>>>> +collects log message information.  This tree structure then gets serialized
>>>> +and marshaled back to the client, which can then rebuild the tree if needed.
>>>> +Additionally, less information needs to be explicitly given, as the tree
>>>> +structure itself implies revision relationships.
>>>> +</p>
>>>> +
>>>> +<p>We currently use the <code>svn_log_message_receiver_t</code> interface
>>>> +to return log messages between application layers.  To enable a tree
>>>> +structure, we add another parameter, <code>child_count</code>.  When
>>>> +<code>child_count</code> is zero, the node is a leaf node; when
>>>> +<code>child_count</code> is greater than zero, the node is an interior node,
>>>> +with the given number of children.  These children may also have children and
>>>> +indicate such by their own <code>child_count</code> parameters.  Consumers of
>>>> +this API can be aware of the number of children and rebuild the tree, or pass
>>>> +the values farther up the application stack.  In effect, this method implements
>>>> +a preorder traversal of the log message tree.</p>
>>> When requested, I'd expect the API to return a "log info" data
>>> structure which has an "apr_array_header_t *children;" field (with
>>> elements of type "log info", fleshed out).  With such a structure, the
>>> nelts field of the children's container could be used in lieu of a new
>>> child_count field.
>> Would this affect the streamy-ness of the API?  We would have to wait
>> until all the children have been fetched (a potentially long operation)
>> before returning any of them.  Using the child_count scheme, it seems
>> like a client could reconstruct the tree, if it needed the data in tree
>> form.
>>
>> In the case of our command line client, we don't need a tree.  We
>> can output the messages as they are received, keeping track of the
>> "Result of merge" values in a stack in the receiver baton.
> 
> Wouldn't this require opening multiple RA sessions to the
> repository?  (Meaning multiple TCP sockets, in the usual case.)

Nope.  We send multiple log messages serially down one RA session now,
don't we?

This scheme requires that the children be sent in-band, right after the
parents.  That wouldn't require any connections beyond the one
that we are currently using.

Maybe an example here might help:
Assume a simplified (x, y) tuple to represent the log message for
revision x with child_count y.  Using the last example in the functional
spec, the series of messages would be:
(24, 2) (14, 0) (12, 2) (10, 0) (9, 0)

This unambiguously defines the tree, which can be rebuilt by clients
that need it.  Our client doesn't, and can just spit the messages out in
the order it gets them.
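
A rough sketch of such a streaming receiver in C follows, assuming the
baton keeps a small stack of merging revisions whose children are still
arriving.  The names frame_t, receiver_baton_t and receive_log, the
MAX_DEPTH limit, and the printed format are all made up for
illustration; none of this is the real svn_log_message_receiver_t
signature.

```c
#include <stdio.h>

#define MAX_DEPTH 64  /* arbitrary nesting limit for this sketch */

/* One merging revision whose children are still being received. */
typedef struct {
    long revision;
    int remaining;    /* children not yet seen */
} frame_t;

/* Hypothetical receiver baton: just the stack. */
typedef struct {
    frame_t stack[MAX_DEPTH];
    int depth;
} receiver_baton_t;

/* Handle one (revision, child_count) message as it arrives.  Prints
   the message immediately and returns the revision that merged it,
   or -1 for a top-level message.  Nesting deeper than MAX_DEPTH is
   not handled in this sketch. */
long receive_log(receiver_baton_t *b, long revision, int child_count)
{
    long merged_by = -1;

    /* The innermost open frame, if any, is this message's parent. */
    if (b->depth > 0) {
        frame_t *top = &b->stack[b->depth - 1];
        merged_by = top->revision;
        top->remaining--;
    }

    printf("%*sr%ld", 2 * b->depth, "", revision);
    if (merged_by >= 0)
        printf(" (result of merge: r%ld)", merged_by);
    printf("\n");

    /* A merging revision opens a new frame for its children. */
    if (child_count > 0 && b->depth < MAX_DEPTH) {
        b->stack[b->depth].revision = revision;
        b->stack[b->depth].remaining = child_count;
        b->depth++;
    }

    /* Close every frame whose children have all arrived. */
    while (b->depth > 0 && b->stack[b->depth - 1].remaining == 0)
        b->depth--;

    return merged_by;
}
```

Fed the five tuples above in order, the stack is empty again after
(9, 0), and no message is held back waiting for the rest of the tree.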

>> If it is a case of convenience, we could provide a receiver function for
>> clients that builds the log message tree.
> 
> The command-line client will always need the data in tree form when
> invoked with the --merge-sensitive option (which I'm assuming will be
> a new boolean on the RA log API).  In terms of network I/O, it'd be
> less efficient to require O(N) RA calls than a single call which
> fetches all the data.  Yes, the data will come down in a potentially
> "jerky" fashion (for deeply nested trees, which won't be the usual
> case), but will remain streamy for log info with no nesting.  I'm not
> sure what this means in terms of CPU usage and disk I/O on the
> repository-side; we may need to profile to determine the best
> strategy.

Why will the command line client always need the data in tree form?  The
end result is output to the terminal, which is serialized, and basically
a preorder traversal of the message tree.  If we serialize the messages
through the ra session in the same way, why do we need to construct and
then traverse the tree at the client?  We should be able to just spit
out the messages as they come, with appropriate merge tracking
information pulled from the baton.

> In the case where we know we'll want the extra log data, we should
> request it up front.
> 
> In the case where we don't need the data, we won't request it.  The
> func spec doesn't say anything about showing merge tracking info for
> this case -- I assume we want to stick with that?

Yes.  The server should only return merge tracking data when requested.

Am I making much sense?

-Hyrum


[merge tracking] Behavior of 'log' operations

Posted by Daniel Rall <dl...@collab.net>.
On Wed, 02 May 2007, Hyrum K. Wright wrote:

> Daniel Rall wrote:
> > On Tue, 01 May 2007, hwright@tigris.org wrote:
> > ...
> >> --- trunk/www/merge-tracking/design.html	(original)
> >> +++ trunk/www/merge-tracking/design.html	Tue May  1 12:11:20 2007
> > ...
> >> +<p>The introduction of merge tracking changes that paradigm.  Log messages
> >> +for independent revisions are still linearly related as before, but log
> >> +messages for merging revisions now have children.  These children are log
> >> +messages for the revisions which have been merged, and they may in turn
> >> +also have children.</p>
> >> +
> >> +<p>The result is a tree structure which the repository layer builds as it
> >> +collects log message information.  This tree structure then gets serialized
> >> +and marshaled back to the client, which can then rebuild the tree if needed.
> >> +Additionally, less information needs to be explicitly given, as the tree
> >> +structure itself implies revision relationships.
> >> +</p>
> >> +
> >> +<p>We currently use the <code>svn_log_message_receiver_t</code> interface
> >> +to return log messages between application layers.  To enable a tree
> >> +structure, we add another parameter, <code>child_count</code>.  When
> >> +<code>child_count</code> is zero, the node is a leaf node; when
> >> +<code>child_count</code> is greater than zero, the node is an interior node,
> >> +with the given number of children.  These children may also have children and
> >> +indicate such by their own <code>child_count</code> parameters.  Consumers of
> >> +this API can be aware of the number of children and rebuild the tree, or pass
> >> +the values farther up the application stack.  In effect, this method implements
> >> +a preorder traversal of the log message tree.</p>
> > 
> > When requested, I'd expect the API to return a "log info" data
> > structure which has an "apr_array_header_t *children;" field (with
> > elements of type "log info", fleshed out).  With such a structure, the
> > nelts field of the children's container could be used in lieu of a new
> > child_count field.
> 
> Would this affect the streamy-ness of the API?  We would have to wait
> until all the children have been fetched (a potentially long operation)
> before returning any of them.  Using the child_count scheme, it seems
> like a client could reconstruct the tree, if it needed the data in tree
> form.
>
> In the case of our command line client, we don't need a tree.  We
> can output the messages as they are received, keeping track of the
> "Result of merge" values in a stack in the receiver baton.

Wouldn't this require opening multiple RA sessions to the
repository?  (Meaning multiple TCP sockets, in the usual case.)

> If it is a case of convenience, we could provide a receiver function for
> clients that builds the log message tree.

The command-line client will always need the data in tree form when
invoked with the --merge-sensitive option (which I'm assuming will be
a new boolean on the RA log API).  In terms of network I/O, it'd be
less efficient to require O(N) RA calls than a single call which
fetches all the data.  Yes, the data will come down in a potentially
"jerky" fashion (for deeply nested trees, which won't be the usual
case), but will remain streamy for log info with no nesting.  I'm not
sure what this means in terms of CPU usage and disk I/O on the
repository-side; we may need to profile to determine the best
strategy.

In the case where we know we'll want the extra log data, we should
request it up front.

In the case where we don't need the data, we won't request it.  The
func spec doesn't say anything about showing merge tracking info for
this case -- I assume we want to stick with that?

Re: svn commit: r24875 - trunk/www/merge-tracking

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Daniel Rall wrote:
> On Tue, 01 May 2007, hwright@tigris.org wrote:
> ...
>> --- trunk/www/merge-tracking/design.html	(original)
>> +++ trunk/www/merge-tracking/design.html	Tue May  1 12:11:20 2007
> ...
>> +<p>The introduction of merge tracking changes that paradigm.  Log messages
>> +for independent revisions are still linearly related as before, but log
>> +messages for merging revisions now have children.  These children are log
>> +messages for the revisions which have been merged, and they may in turn
>> +also have children.</p>
>> +
>> +<p>The result is a tree structure which the repository layer builds as it
>> +collects log message information.  This tree structure then gets serialized
>> +and marshaled back to the client, which can then rebuild the tree if needed.
>> +Additionally, less information needs to be explicitly given, as the tree
>> +structure itself implies revision relationships.
>> +</p>
>> +
>> +<p>We currently use the <code>svn_log_message_receiver_t</code> interface
>> +to return log messages between application layers.  To enable a tree
>> +structure, we add another parameter, <code>child_count</code>.  When
>> +<code>child_count</code> is zero, the node is a leaf node; when
>> +<code>child_count</code> is greater than zero, the node is an interior node,
>> +with the given number of children.  These children may also have children and
>> +indicate such by their own <code>child_count</code> parameters.  Consumers of
>> +this API can be aware of the number of children and rebuild the tree, or pass
>> +the values farther up the application stack.  In effect, this method implements
>> +a preorder traversal of the log message tree.</p>
> 
> When requested, I'd expect the API to return a "log info" data
> structure which has an "apr_array_header_t *children;" field (with
> elements of type "log info", fleshed out).  With such a structure, the
> nelts field of the children's container could be used in lieu of a new
> child_count field.

Would this affect the streamy-ness of the API?  We would have to wait
until all the children have been fetched (a potentially long operation)
before returning any of them.  Using the child_count scheme, it seems
like a client could reconstruct the tree, if it needed the data in tree
form.

In the case of our command line client, we don't need a tree.  We
can output the messages as they are received, keeping track of the
"Result of merge" values in a stack in the receiver baton.

If it is a case of convenience, we could provide a receiver function for
clients that builds the log message tree.
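
One possible shape for such a tree-building receiver, sketched in C
under assumptions: log_node_t, tree_baton_t, tree_receiver and
MAX_DEPTH are hypothetical names, plain malloc stands in for APR
pools, and each message is reduced to (revision, child_count).

```c
#include <stdlib.h>

#define MAX_DEPTH 64  /* arbitrary nesting limit for this sketch */

/* Hypothetical tree node built on the client side. */
typedef struct log_node_t {
    long revision;
    int child_count;              /* announced by the sender */
    int nchildren;                /* children attached so far */
    struct log_node_t **children;
} log_node_t;

/* Hypothetical baton: nodes still waiting for children. */
typedef struct {
    log_node_t *open[MAX_DEPTH];
    int depth;
    log_node_t *root;             /* top-level node being built */
} tree_baton_t;

void free_tree(log_node_t *node)
{
    int i;
    if (!node)
        return;
    for (i = 0; i < node->nchildren; i++)
        free_tree(node->children[i]);
    free(node->children);
    free(node);
}

/* Feed one message.  Returns 0 on success, -1 on bad input, nesting
   overflow or allocation failure.  *DONE is set to a finished
   top-level tree once its last descendant has arrived (the caller
   then owns it), and to NULL otherwise. */
int tree_receiver(tree_baton_t *b, long revision, int child_count,
                  log_node_t **done)
{
    log_node_t *node;

    *done = NULL;
    if (child_count < 0)
        return -1;

    node = malloc(sizeof *node);
    if (!node)
        return -1;
    node->revision = revision;
    node->child_count = child_count;
    node->nchildren = 0;
    node->children = NULL;
    if (child_count > 0) {
        node->children = calloc((size_t)child_count,
                                sizeof *node->children);
        if (!node->children) {
            free(node);
            return -1;
        }
    }

    /* Attach to the innermost open node, or start a new tree. */
    if (b->depth > 0) {
        log_node_t *parent = b->open[b->depth - 1];
        parent->children[parent->nchildren++] = node;
    } else {
        b->root = node;
    }

    if (child_count > 0) {
        if (b->depth == MAX_DEPTH)
            return -1;
        b->open[b->depth++] = node;
    }

    /* Close every node that has received all of its children. */
    while (b->depth > 0
           && b->open[b->depth - 1]->nchildren
              == b->open[b->depth - 1]->child_count)
        b->depth--;

    if (b->depth == 0) {
        *done = b->root;
        b->root = NULL;
    }
    return 0;
}
```

A client that wants the tree would install something like this as its
receiver and act whenever *done comes back non-NULL; a client that
only prints never needs to allocate the nodes at all.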

-Hyrum