Posted to dev@subversion.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2005/11/15 23:40:21 UTC

[PATCH] expose svn_repos_replay via the RA API, Take 2

Ok, so I've got a second pass at my svn_ra_replay patch, and I figured
it was about time to show it to people.  For those who didn't see the
first mail about this, the idea is to expose the svn_repos_replay API
via the RA layers, so you can reliably export the complete repository
history, which is what you'd really want to do to provide support for
mirroring repositories.  Without this new API, there isn't enough
information exposed by the RA layers to get an exact copy.

As compared to the last version of the patch, this adds authorization
support to svn_repos_replay2, updates the ra_svn protocol document,
and corrects behavior when copying from a previous revision of a file.

If someone could take a good hard look at the authz support I'd
appreciate it.  I think I've managed to get it right, and it seems to
work, but it's kind of a pain, so the more eyes on it the better.

Still to do on this is ra_dav support (can you tell I'm putting that
part off?) and polishing up the tool I've been using to actually test
it, plus integrating my ad-hoc test framework into the svn unit
tests.

Thanks,

-garrett
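For the mirroring use case described in this post, a tool built on such an API would replay each revision in order, starting after the last one it has already copied. The following is a minimal, self-contained model of that loop under stated assumptions. `replay_fn`, `mirror_revisions` and the `recorder` test consumer are hypothetical names, and the callback only stands in for whatever the real RA call ends up looking like. It is not the signature proposed in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: replays one committed revision into a
   consumer (in Subversion the consumer would be a delta editor that
   recreates the change in the mirror).  Returns 0 on success. */
typedef int (*replay_fn)(long revision, void *consumer);

/* Replays every revision after last_mirrored, up to and including
   head, in ascending order.  Returns the newest revision replayed
   successfully, so a later run can resume from there. */
static long mirror_revisions(long last_mirrored, long head,
                             replay_fn replay, void *consumer)
{
    assert(replay != NULL);
    for (long rev = last_mirrored + 1; rev <= head; rev++) {
        if (replay(rev, consumer) != 0)
            return rev - 1;   /* stop at the first failure */
    }
    return head;
}

/* Example consumer: records the revisions it receives and simulates
   a failure at revision fail_at (use -1 for no failure). */
struct recorder {
    long seen[16];
    size_t count;
    long fail_at;
};

static int record_replay(long revision, void *consumer)
{
    struct recorder *r = consumer;
    if (revision == r->fail_at)
        return 1;
    if (r->count < sizeof r->seen / sizeof r->seen[0])
        r->seen[r->count++] = revision;
    return 0;
}
```

For example, with `last_mirrored` = 3 and `head` = 6 the loop replays r4, r5 and r6. If r5 fails, it returns 4, so the next run can resume at r5.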

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Julian Foad <ju...@btopenworld.com>.

Garrett Rooney wrote:
> On 11/16/05, Julian Foad <ju...@btopenworld.com> wrote:
[...]
>>>The authz support bungles the case
>>>of a directory copied from a location you aren't allowed to see, it
>>>needs to recurse into the directory adding its contents, since you
>>>can't expect the caller to have access to the contents.
>>
>>Er... I don't know about this stuff either, but it strikes me that if the
>>caller is not authorised to see data in a particular area then you should not
>>send that data to the caller.  (Perhaps abort if you discover that is the
>>case.)  Maybe I'm completely misunderstanding, in which case ignore me.
[...]
> Clearly, if a file is copied from the private section to the public
> one people are now allowed to see the file's contents, as of the
> revision that it's copied.   [... so you have to send full text ...]

Duh!  Thanks for the explanation.  (And for the answers I've snipped.)

> Ironically, this logic will probably prove useful for implementing the
> "replay only a particular subtree" kind of thing [...]

Cool.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/16/05, Julian Foad <ju...@btopenworld.com> wrote:

> Again, the present implementation of "diff" uses it, and we want that mode to
> be available, but I would like to have a "diff" mode that just says "tree dir1
> was renamed to dir2" and doesn't report all the files inside it.  (Only if the
> rename/move is completely within the reported tree.)

I have no objection in concept to the idea that diff should be able to
provide such information, but after a little time looking at the
reporter code I must say I have to agree with ghudson in that I'm not
sure what the correct behavior would be for many cases when you're
diffing non-consecutive revisions, and for any of the potential
behaviors I can think of I'm not sure how to implement it without it
being potentially very slow.

> (I hope "update" already does a proper move/rename of a directory on the WC
> side without re-downloading all of the stuff inside it.  Again, only if the
> move/rename is completely within the tree being updated.  Does it?)

I'm not positive, but I don't think it currently does.

> So, it would make sense to me to give the present "diff" API those two
> capabilities.  (It sounds like they'd both need to be optional.)
>
> Even if you don't consider doing that for your "replay" purposes, is this
> something we should aim to do anyway? - does it make sense?

Like I said, I have no objection to providing this info, I just don't
know how exactly it should work in many cases nor how we could
implement it efficiently.

> Should, in fact, the above (existing) functions be provided at a _higher level_
> than your repos_replay?  It seems that perhaps they could all be implemented in
> terms of it?  I'm completely unfamiliar with this area so I'm probably missing
> the fact that those three plus your repos_replay are already all going to be
> implemented on top of a common lower-level API, or something - in other words,
> that you are just adding an API wrapper at this level to functionality that is
> already available lower down.  (A quick brush-off response is fine; I'm not
> ready to learn the details.)

While it's possible that they could be implemented in terms of some
theoretical common API, at this point the two implementations have
little to nothing in common, and making them use the same code would
be a rather large project, and one that would come after tackling the
"how should it work" and "how can it be fast" parts of the problem.

> You wrote elsewhere in the thread:
> > The authz support bungles the case
> > of a directory copied from a location you aren't allowed to see, it
> > needs to recurse into the directory adding its contents, since you
> > can't expect the caller to have access to the contents.
>
> Er... I don't know about this stuff either, but it strikes me that if the
> caller is not authorised to see data in a particular area then you should not
> send that data to the caller.  (Perhaps abort if you discover that is the
> case.)  Maybe I'm completely misunderstanding, in which case ignore me.

Consider the case where you have two directories, one that is private
and nobody can see its contents, and one that is public and everyone
can see it.

Clearly, if a file is copied from the private section to the public
one people are now allowed to see the file's contents, as of the
revision that it's copied.  Unfortunately you can't just have replay
emit a "copy" operation in this case for two reasons.  First, they
aren't allowed to see the area the copy came from, so you can't tell
them that path existed at all.  Second, they don't have the source file
contents, so you have to force the text delta to be relative to an
empty file instead of relative to the source file.

The current code handles this case just fine.  It doesn't work right
if you copy a directory full of stuff from the private section to the
public one though.  To do that correctly it will need to recurse
through the directory's contents and issue the appropriate mkdirs and
adds, again giving the client only the current version, dropping
entirely the fact that there was a source version that they are not
allowed to know about.

Ironically, this logic will probably prove useful for implementing the
"replay only a particular subtree" kind of thing that clkao wants for
SVK, because the question of "how do I deal with stuff I'm not allowed
to see" has a very similar answer to the question "how do I deal with
stuff I don't want to replay because it's outside my area of
interest".

-garrett
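The copy rule described in this post can be sketched as follows, assuming a hypothetical operation log in place of a real delta editor. `replay_copy`, the `entry` and `op_log` types, and the `public_only` policy are all made up for illustration. The directory contents are passed as a pre-flattened, parent-first list rather than walked recursively, and only the readability of the copy source is checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum node_kind { NODE_FILE, NODE_DIR };

/* One node under a copied directory, path relative to the copy
   source; entries are listed parent-first. */
struct entry { const char *relpath; enum node_kind kind; };

/* Hypothetical stand-in for an authz check on a repository path. */
typedef int (*readable_fn)(const char *path);

/* Records editor-like operations as strings, for illustration only. */
#define MAX_OPS 32
#define OP_LEN 128
struct op_log { char ops[MAX_OPS][OP_LEN]; size_t count; };

static char *next_op(struct op_log *out)
{
    assert(out->count < MAX_OPS);
    return out->ops[out->count++];
}

/* Reports the copy of src_path@src_rev to dst_path.  A readable
   source is reported as one add carrying copy history.  An unreadable
   source must not be mentioned at all: every node becomes a plain add,
   and file text is sent as a delta against an empty file. */
static void replay_copy(struct op_log *out, readable_fn readable,
                        const char *dst_path, const char *src_path,
                        long src_rev, enum node_kind kind,
                        const struct entry *entries, size_t n_entries)
{
    if (readable(src_path)) {
        snprintf(next_op(out), OP_LEN, "%s %s (from %s@%ld)",
                 kind == NODE_DIR ? "add-dir" : "add-file",
                 dst_path, src_path, src_rev);
        return;
    }
    if (kind == NODE_FILE) {
        snprintf(next_op(out), OP_LEN, "add-file %s (delta vs empty)",
                 dst_path);
        return;
    }
    snprintf(next_op(out), OP_LEN, "add-dir %s", dst_path);
    for (size_t i = 0; i < n_entries; i++) {
        if (entries[i].kind == NODE_DIR)
            snprintf(next_op(out), OP_LEN, "add-dir %s/%s",
                     dst_path, entries[i].relpath);
        else
            snprintf(next_op(out), OP_LEN, "add-file %s/%s (delta vs empty)",
                     dst_path, entries[i].relpath);
    }
}

/* Example policy: everything under /private is unreadable. */
static int public_only(const char *path)
{
    return strncmp(path, "/private", 8) != 0;
}
```

Copying a `/private` directory containing `a.c`, `sub/` and `sub/b.c` into `/public` produces four plain adds and never mentions the source path. The same copy from a readable source produces a single add carrying copy history.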

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Julian Foad <ju...@btopenworld.com>.

Garrett Rooney wrote:
> On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
> 
>>If I'm understanding the big picture here, the essential problem is
>>that none of the existing svn_ra.h functions give you a complete
>>description of a revision's effects on the tree.  If I've got
>>something confused, please let me know.
>>
>>- svn_ra_get_commit_editor2 carries everything --- prop changes, text
>>changes, rename/copy history --- but it's going in the wrong
>>direction: to the server.  We want data coming from the server.
> 
> Correct.
> 
>>- svn_ra_do_update is going in the right direction, server->client,
>>but doesn't carry copy information, because the WC doesn't care and it
>>costs something to recover that information.
>>- svn_ra_do_diff2 goes in the right direction, but if I'm reading
>>right, it doesn't provide history.
> 
> Both update and diff have two problems.  First, they fail to carry
> along copyfrom info, so you have to get that info from log, if you get

While "update" doesn't care about copy-from info, I'd think that "diff", in 
general, should care about it.  OK, the present implementation of a user-level 
"diff" command doesn't use it, but in order to write "proper" (tree-aware) 
diffs I would think we want that information.

> it from log you then find that you don't actually have enough
> information to derive everything you need, because you can't tell the
> difference between a copy and a copy with additional textual
> modifications.  Second, if I do a copy of a directory the update and
> diff APIs will happily tell me that every single file under those
> directories has been changed.  It really hasn't, but from the point of
> view of the users of diff or update it might as well have, since you
> need that info to generate a diff or update a working copy.

Again, the present implementation of "diff" uses it, and we want that mode to 
be available, but I would like to have a "diff" mode that just says "tree dir1 
was renamed to dir2" and doesn't report all the files inside it.  (Only if the 
rename/move is completely within the reported tree.)

(I hope "update" already does a proper move/rename of a directory on the WC 
side without re-downloading all of the stuff inside it.  Again, only if the 
move/rename is completely within the tree being updated.  Does it?)

So, it would make sense to me to give the present "diff" API those two 
capabilities.  (It sounds like they'd both need to be optional.)

Even if you don't consider doing that for your "replay" purposes, is this
something we should aim to do anyway? - does it make sense?

>>So your new replay method is a fourth function that sends deltas from
>>client to repository, but that actually gives full information.
> 
> Close, I want all the info, and I want it in a way that actually
> corresponds to the underlying change to the repository filesystem, not
the perceived change from the outside.

Hmm... I can imagine that even if both of those additional behaviours were 
available from the "diff" API, it might still not correspond as closely as you 
would like to the underlying repository change.

Still, something to think about.

Should, in fact, the above (existing) functions be provided at a _higher level_ 
than your repos_replay?  It seems that perhaps they could all be implemented in 
terms of it?  I'm completely unfamiliar with this area so I'm probably missing 
the fact that those three plus your repos_replay are already all going to be 
implemented on top of a common lower-level API, or something - in other words, 
that you are just adding an API wrapper at this level to functionality that is 
already available lower down.  (A quick brush-off response is fine; I'm not 
ready to learn the details.)

You wrote elsewhere in the thread:
> The authz support bungles the case
> of a directory copied from a location you aren't allowed to see, it
> needs to recurse into the directory adding its contents, since you
> can't expect the caller to have access to the contents.

Er... I don't know about this stuff either, but it strikes me that if the 
caller is not authorised to see data in a particular area then you should not 
send that data to the caller.  (Perhaps abort if you discover that is the 
case.)  Maybe I'm completely misunderstanding, in which case ignore me.

- Julian

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by "C. Michael Pilato" <cm...@collab.net>.

"C. Michael Pilato" <cm...@collab.net> writes:

> Chia-liang Kao <cl...@clkao.org> writes:
> 
> > Before someone gets to sort all these out, I am +1 on exporting the
> > existing and useful repos_replay.  This is particularly useful for
> > SVN::Mirror, as it's trying very hard to reverse engineer the actual
> > changes made in a certain revision, and reduce the traffic over ra.
> 
> If I'm not mistaken, it was actually an exploration of SVN::Mirror
> (the complexity of the work it does to try to derive this information,
> plus the fact that in some cases it just flatly gets it wrong due to
> the nonexistence of suitable Subversion APIs) that led Garrett to this
> proposal.

Eek.  That could be borderline inflammatory.  Having explored
SVN::Mirror a bit, I really believe it is actually doing the very best
it can possibly do given the interfaces it has to work with.  Just
wanted to clarify that.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by "C. Michael Pilato" <cm...@collab.net>.

Chia-liang Kao <cl...@clkao.org> writes:

> Before someone gets to sort all these out, I am +1 on exporting the
> existing and useful repos_replay.  This is particularly useful for
> SVN::Mirror, as it's trying very hard to reverse engineer the actual
> changes made in a certain revision, and reduce the traffic over ra.

If I'm not mistaken, it was actually an exploration of SVN::Mirror
(the complexity of the work it does to try to derive this information,
plus the fact that in some cases it just flatly gets it wrong due to
the nonexistence of suitable Subversion APIs) that led Garrett to this
proposal.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Chia-liang Kao <cl...@clkao.org>.

Garrett Rooney <rooneg <at> electricjellyfish.net> writes:
> > That's exactly where I was going.  The information Garrett wants
> > should certainly be available, and of course we need to avoid
> > transmitting or recovering information when it's expensive if we're
> > just going to throw it away at the other end, but it doesn't follow
> > that we should make the same data look like four or five different
> > sorts of requests.
> 
> It's certainly possible to extend diff/update to do this, but the main
> reason I haven't gone down that road is that replaying a single
> revision isn't very complicated, it gets somewhat more complex when
> you take into account authz stuff, but still, it's not all that bad. 
> On the other hand, the reporter code is very complicated, and I'm not
> sure that adding another mode of operation there, and increasing that
> complexity, is worth the gain of not adding a new RA API.  Sure, what
> I'm doing does add some complexity to the svn_repos_replay API, but it
> seems that in the end, it still results in less total complexity.

Of course I'd like to see a dir_delta that is capable of giving copy_from
information and we just export that function with ra, rather than the
tailor-made apis for client-level commands.

But it's not there.  And the logic for creating copy_from across many
revisions is complicated: what if it's copied from a revision that is
included in the range of the delta?  How about a delta on two paths?
Do we map the copy_from information?

Before someone gets to sort all these out, I am +1 on exporting the
existing and useful repos_replay.  This is particularly useful for
SVN::Mirror, as it's trying very hard to reverse engineer the actual
changes made in a certain revision, and reduce the traffic over ra.

Cheers,
CLK

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
> On 11/15/05, Branko Čibej <br...@xbc.nu> wrote:
> > The real question here is, why do we need a new API at all? Can't we
> > simply add copyfrom info to ra_update? And wouldn't doing that magically
> > make svn_repos_update and svn_repos_replay identical (for a certain
> > combination of parameters)? Making the API smaller is a good thing.
>
> That's exactly where I was going.  The information Garrett wants
> should certainly be available, and of course we need to avoid
> transmitting or recovering information when it's expensive if we're
> just going to throw it away at the other end, but it doesn't follow
> that we should make the same data look like four or five different
> sorts of requests.

It's certainly possible to extend diff/update to do this, but the main
reason I haven't gone down that road is that replaying a single
revision isn't very complicated, it gets somewhat more complex when
you take into account authz stuff, but still, it's not all that bad. 
On the other hand, the reporter code is very complicated, and I'm not
sure that adding another mode of operation there, and increasing that
complexity, is worth the gain of not adding a new RA API.  Sure, what
I'm doing does add some complexity to the svn_repos_replay API, but it
seems that in the end, it still results in less total complexity.

> (I think it's odd the way URL vs. URL comparisons use a reporter,
> which is allegedly for describing working copies, to specify the
> 'from' tree, but I guess that's just expressing the simple case in
> terms of the harder case, which makes sense.)
>
> So we could have a function that takes an editor and a 'to' URL, and
> provides a reporter (like diff2), but also takes a bitmask indicating
> what sorts of information that editor is interested in: text deltas;
> property deltas; copyfrom info; ... anything else?  The goal would be
> to re-implement all the existing calls sending deltas from repo to
> client in terms of this one call.

This is certainly a possible approach.  I'll take some time tomorrow
and look at what would be required to add it to the reporter code.  It
may turn out to be easier than I expect, and it would give us the
advantage of being able to replay the difference between arbitrary
revisions (i.e. skipping over revisions you don't care about), which
would be an interesting ability to have.

I still suspect that it will be rather complicated to retrofit this
ability into the reporter code, and making that part of the codebase
more complex does not seem like a good idea to me.

-garrett

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Jim Blandy <ji...@red-bean.com>.

On 11/15/05, Branko Čibej <br...@xbc.nu> wrote:
> The real question here is, why do we need a new API at all? Can't we
> simply add copyfrom info to ra_update? And wouldn't doing that magically
> make svn_repos_update and svn_repos_replay identical (for a certain
> combination of parameters)? Making the API smaller is a good thing.

That's exactly where I was going.  The information Garrett wants
should certainly be available, and of course we need to avoid
transmitting or recovering information when it's expensive if we're
just going to throw it away at the other end, but it doesn't follow
that we should make the same data look like four or five different
sorts of requests.

(I think it's odd the way URL vs. URL comparisons use a reporter,
which is allegedly for describing working copies, to specify the
'from' tree, but I guess that's just expressing the simple case in
terms of the harder case, which makes sense.)

So we could have a function that takes an editor and a 'to' URL, and
provides a reporter (like diff2), but also takes a bitmask indicating
what sorts of information that editor is interested in: text deltas;
property deltas; copyfrom info; ... anything else?  The goal would be
to re-implement all the existing calls sending deltas from repo to
client in terms of this one call.
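Jim's bitmask idea can be pictured roughly as follows. Everything here (`delta_info`, `WANT_UPDATE`, `WANT_REPLAY`, `make_add_file`) is hypothetical: no such call existed in svn_ra.h, and which flags each existing command would pass is an assumption. The sketch shows only how one "add file" notification would change with the flags. In particular, dropping copy history also forces the text delta to be sent against an empty file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for a single "send me the deltas" call: the
   caller names the kinds of information its editor wants and the
   server leaves out the rest. */
enum delta_info {
    DELTA_TEXT     = 1 << 0,   /* file content deltas */
    DELTA_PROPS    = 1 << 1,   /* property changes */
    DELTA_COPYFROM = 1 << 2    /* copy history on added nodes */
};

/* Assumed requests for two existing consumers (illustrative only). */
#define WANT_UPDATE (DELTA_TEXT | DELTA_PROPS)
#define WANT_REPLAY (DELTA_TEXT | DELTA_PROPS | DELTA_COPYFROM)

/* What the server would send when a file is added. */
struct add_file_msg {
    const char *path;
    const char *copyfrom_path;  /* NULL when no history is sent */
    long copyfrom_rev;          /* -1 when no history is sent */
    int send_text;
    int send_props;
    int delta_vs_copy_source;   /* 0: text delta is against an empty file */
};

/* Builds the add-file message for one node, honoring the flags. */
static struct add_file_msg make_add_file(unsigned want, const char *path,
                                         const char *copyfrom_path,
                                         long copyfrom_rev)
{
    struct add_file_msg m = { path, NULL, -1, 0, 0, 0 };
    if ((want & DELTA_COPYFROM) && copyfrom_path != NULL) {
        m.copyfrom_path = copyfrom_path;
        m.copyfrom_rev = copyfrom_rev;
    }
    m.send_text = (want & DELTA_TEXT) != 0;
    m.send_props = (want & DELTA_PROPS) != 0;
    /* Without copy history the receiver has no base to apply a delta
       to, so text must be sent relative to an empty file. */
    m.delta_vs_copy_source = m.send_text && m.copyfrom_path != NULL;
    return m;
}
```

Under this scheme an update-style request for a copied file gets full text and no history, while a replay-style request gets the copy source plus a delta against it.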

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/16/05, Branko Čibej <br...@xbc.nu> wrote:

> What about the DAV report format itself? Do we need a new report, or can
> we reuse an existing one?

I'm not actually sure yet.  I need to learn more about how the DAV
report format works before I can answer that question.  I made a first
attempt at this last week, but got scared off by libsvn_ra_dav and
mod_dav_svn, then decided that it made more sense to work on solving
the "how does authz fit into this picture" problem first, before
worrying about how this will map onto our DAV infrastructure.

Hopefully I'll be able to finish up the last part of the authz stuff
today or tomorrow, then I'll be able to tackle the DAV problem.

-garrett

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Branko Čibej <br...@xbc.nu>.

Garrett Rooney wrote:
> On 11/16/05, Branko Čibej <br...@xbc.nu> wrote:
>
>   
>> I still don't see what the difference is, semantically, between "replay
>> revision BAR" and "update from / at BAR-1 to / at BAR". I understand
>> that the exact datasets that are sent back today are different, but I
>> still think that replay can be seen as simply a special case of update.
>>     
>
> I agree that it seems tempting to say "why can't we express this all
> in terms of one API", but as clkao and ghudson point out, there are
> complicated issues that would need to be resolved with regard to what
> the behavior would be in many common cases.  The fact that diff and
> update need to work for multiple revision ranges introduces a number
> of problems that a simple replay API just doesn't have to care about.
>
>   
>> The cost of actually changing update in such a way that it can be used
>> to replace replay might be large, perhaps too large to bother at this
>> point. But in the long run, I think it would make sense to at least
>> investigate the possibility.
>>     
>
> Now that I've spent some time looking at the reporter code and
> thinking about how I would add support for replay-like output, I'm
> pretty much totally convinced that it will be both complex to
> implement even the simple cases and difficult to determine what the
> correct behavior should be in the nontrivial cases, which will, of
> course, be even harder to implement.
>   
Fair enough.

What about the DAV report format itself? Do we need a new report, or can 
we reuse an existing one?

-- Brane


Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/16/05, Branko Čibej <br...@xbc.nu> wrote:

> I still don't see what the difference is, semantically, between "replay
> revision BAR" and "update from / at BAR-1 to / at BAR". I understand
> that the exact datasets that are sent back today are different, but I
> still think that replay can be seen as simply a special case of update.

I agree that it seems tempting to say "why can't we express this all
in terms of one API", but as clkao and ghudson point out, there are
complicated issues that would need to be resolved with regard to what
the behavior would be in many common cases.  The fact that diff and
update need to work for multiple revision ranges introduces a number
of problems that a simple replay API just doesn't have to care about.

> The cost of actually changing update in such a way that it can be used
> to replace replay might be large, perhaps too large to bother at this
> point. But in the long run, I think it would make sense to at least
> investigate the possibility.

Now that I've spent some time looking at the reporter code and
thinking about how I would add support for replay-like output, I'm
pretty much totally convinced that it will be both complex to
implement even the simple cases and difficult to determine what the
correct behavior should be in the nontrivial cases, which will, of
course, be even harder to implement.

-garrett

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Branko Čibej <br...@xbc.nu>.

C. Michael Pilato wrote:
> Branko Čibej <br...@xbc.nu> writes:
>
>   
>> Garrett Rooney wrote:
>>     
>>> On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
>>>
>>>       
>>>> I guess I'm surprised that we need three ways of doing essentially the
>>>> same thing: comparing revisions in the repository and transmitting
>>>> their differences.
>>>>
>>>>         
>>> It's a combination of providing the right info and avoiding sending
>>> info that isn't required.  Personally, it seems like the fact that we
>>> already have this kind of API underneath (we use it for svnadmin dump,
>>> for example) indicates that there is a use case for exposing this sort
>>> of API, as opposed to requiring people to do backflips to make the
>>> existing APIs bend to their needs.
>>>
>>>       
>> The real question here is, why do we need a new API at all? Can't we
>> simply add copyfrom info to ra_update? And wouldn't doing that
>> magically make svn_repos_update and svn_repos_replay identical (for a
>> certain combination of parameters)? Making the API smaller is a good
>> thing.
>>     
>
> In short, no.  
>
> In medium, of course we could but the maintenance wouldn't justify it.
>
> In long, the update algorithm is entirely different than the replay
> algorithm.  
>
> Update is all about calculating the simplest delta between two
> arbitrary trees, more-or-less ignorant of history, and assuming that
> the editor implementor has no knowledge of the repository other than
> the "source" tree -- it exists to drive the update of a working copy.
> The arbitrariness of the trees allows for them to be rooted anywhere
> (not just '/' ... not even at history-related places!).  The ra_update
> UI is reporter-based, intentionally empowering it to build just such
> an arbitrary tree.
>
> The replay algorithm is rooted at '/', assumes the editor has full
> knowledge of the repository, and attempts to replay the normalized
> actions of a single revision regardless of the efficiency of those
> actions.  There is no need for a reporter in this interface, as the
> source tree must conform to very strict standards (single revision,
> rooted at '/', etc.)
>
> I suppose you could shoehorn in something, but I frankly don't see the
> point.
>   
I still don't see what the difference is, semantically, between "replay 
revision BAR" and "update from / at BAR-1 to / at BAR". I understand 
that the exact datasets that are sent back today are different, but I 
still think that replay can be seen as simply a special case of update.

The cost of actually changing update in such a way that it can be used 
to replace replay might be large, perhaps too large to bother at this 
point. But in the long run, I think it would make sense to at least 
investigate the possibility.

-- Brane


Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by "C. Michael Pilato" <cm...@collab.net>.

Branko Čibej <br...@xbc.nu> writes:

> Garrett Rooney wrote:
> > On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
> >
> >> I guess I'm surprised that we need three ways of doing essentially the
> >> same thing: comparing revisions in the repository and transmitting
> >> their differences.
> >>
> >
> > It's a combination of providing the right info and avoiding sending
> > info that isn't required.  Personally, it seems like the fact that we
> > already have this kind of API underneath (we use it for svnadmin dump,
> > for example) indicates that there is a use case for exposing this sort
> > of API, as opposed to requiring people to do backflips to make the
> > existing APIs bend to their needs.
> >
> The real question here is, why do we need a new API at all? Can't we
> simply add copyfrom info to ra_update? And wouldn't doing that
> magically make svn_repos_update and svn_repos_replay identical (for a
> certain combination of parameters)? Making the API smaller is a good
> thing.

In short, no.  

In medium, of course we could but the maintenance wouldn't justify it.

In long, the update algorithm is entirely different than the replay
algorithm.  

Update is all about calculating the simplest delta between two
arbitrary trees, more-or-less ignorant of history, and assuming that
the editor implementor has no knowledge of the repository other than
the "source" tree -- it exists to drive the update of a working copy.
The arbitrariness of the trees allows for them to be rooted anywhere
(not just '/' ... not even at history-related places!).  The ra_update
UI is reporter-based, intentionally empowering it to build just such
an arbitrary tree.

The replay algorithm is rooted at '/', assumes the editor has full
knowledge of the repository, and attempts to replay the normalized
actions of a single revision regardless of the efficiency of those
actions.  There is no need for a reporter in this interface, as the
source tree must conform to very strict standards (single revision,
rooted at '/', etc.)

I suppose you could shoehorn in something, but I frankly don't see the
point.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
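Pilato's distinction is largely about the shape of the source tree each call accepts. The following sketch assumes a hypothetical, simplified reporter-style description of that tree as (path, revision) pairs. It tests whether a request fits the narrow shape replay requires, which is Branko's "update from / at BAR-1 to / at BAR" case. `source_entry` and `replay_can_serve` are made up for illustration. Real reporters can also describe switched, missing or partially checked-out paths, and this sketch ignores them. Passing the check only means the trees match; the ordinary update path would still not send copy history.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One reporter-style statement about the client's source tree: the
   path (relative to the request anchor) is at revision rev.  The entry
   with path "" describes the anchor itself and must come first. */
struct source_entry { const char *path; long rev; };

/* Hypothetical check: could a request to move this source tree to
   target_rev be served by replaying target_rev alone?  Replay's source
   must be rooted at '/' and be a single revision, target_rev - 1. */
static int replay_can_serve(const char *anchor,
                            const struct source_entry *src, size_t n,
                            long target_rev)
{
    if (strcmp(anchor, "/") != 0)
        return 0;                        /* not rooted at '/' */
    if (n == 0 || src[0].path[0] != '\0')
        return 0;                        /* no description of the root */
    for (size_t i = 0; i < n; i++) {
        if (src[i].rev != target_rev - 1)
            return 0;                    /* mixed or wrong revisions */
    }
    return 1;
}
```

A single-revision tree at r9 rooted at '/' qualifies for replaying r10. The same tree with one subdirectory reported at r7, or anchored at /trunk, does not.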

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Jim Blandy <ji...@red-bean.com>.

On 11/16/05, Greg Hudson <gh...@mit.edu> wrote:
> On Wed, 2005-11-16 at 17:02 -0800, Jim Blandy wrote:
> > We must be disconnected.  How can it possibly not be well-defined to
> > compose deltas?
>
> Because our deltas aren't expressed wholly in terms of the previously
> existing contents of the tree.  If we have:
>
>   r1: add foo (no history)
>   r2: delete foo
>   r3: add foo (copy history foo@1)
>   r4: add bar (copy history foo@1)
>
> Then what is the composition of r1-r4?

By 'r1-r4' you mean all four revisions, right?

If 'copy with history' could refer to a file added in the same delta,
the composition would be:
- add foo (no history)
- add bar (copied from our new foo)

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Greg Hudson <gh...@MIT.EDU>.

On Wed, 2005-11-16 at 17:02 -0800, Jim Blandy wrote:
> We must be disconnected.  How can it possibly not be well-defined to
> compose deltas?

Because our deltas aren't expressed wholly in terms of the previously
existing contents of the tree.  If we have:

  r1: add foo (no history)
  r2: delete foo
  r3: add foo (copy history foo@1)
  r4: add bar (copy history foo@1)

Then what is the composition of r1-r4?

I believe most modern SCMs, particularly ones that have to deal with
distributed operation, restrict deltas to referring to the previously
existing contents of the tree.  That means you can't express the
resurrection of a file, but it makes composition well-defined.  That
said, I believe they also tend not to do much delta composition, and
instead just transmit whole batches of deltas around.

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Jim Blandy <ji...@red-bean.com>.

On 11/16/05, Greg Hudson <gh...@mit.edu> wrote:
> I don't quite understand what you're getting at here.  We could
> implement update in terms of this new svn_repos_replay API, but it would
> mean having to download the entire Subversion repository in order to
> check out the head revision, and that would be too inefficient.
>
> Our schema is structured so that it's efficient to compare two trees in
> such a way as to update a working copy, but I'm not sure if it's
> efficient (or even necessarily well-defined) to combine multiple revs
> into what looks like a single commit.

We must be disconnected.  How can it possibly not be well-defined to
compose deltas?

(I don't mean to soak up people's time further here.  The burden is on
me to show that something cool can be done, and I don't have time at
the moment to pull that together.)

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Greg Hudson <gh...@MIT.EDU>.

On Wed, 2005-11-16 at 14:58 -0800, Jim Blandy wrote:
> But fundamentally, the data in the repository is the only thing that's
> real.  If you have an API that can accurately and selectively describe
> what's in the repository, then *by definition* you can do update and
> other hairy things in terms of it.  The algorithm can't possibly
> matter, because it's driven by the same data in the end.

I don't quite understand what you're getting at here.  We could
implement update in terms of this new svn_repos_replay API, but it would
mean having to download the entire Subversion repository in order to
check out the head revision, and that would be too inefficient.

Our schema is structured so that it's efficient to compare two trees in
such a way as to update a working copy, but I'm not sure if it's
efficient (or even necessarily well-defined) to combine multiple revs
into what looks like a single commit.

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Jim Blandy <ji...@red-bean.com>.

On 11/16/05, Greg Hudson <gh...@mit.edu> wrote:
> While I'm all for having a simple API, I think that asking Garrett to
> undertake two research projects for the sake of a simpler API falls
> clearly on the "best is the enemy of the good" side of the line.

I'm sympathetic to this argument.  Repository replication is an
important goal, and it sounds like the "simplification" would be a
substantial increase in the amount of work; I don't want to wear out
Garrett's inspiration.

What's more, I don't actually have in mind clear answers to the
mixed-version WC questions.  I have the feeling that there should be a
"you're already doing something much harder, just express this as a
special case" kind of argument, but inklings are cheap, and I'm
unfamiliar with the struggles you folks all went through getting that
to work.

But fundamentally, the data in the repository is the only thing that's
real.  If you have an API that can accurately and selectively describe
what's in the repository, then *by definition* you can do update and
other hairy things in terms of it.  The algorithm can't possibly
matter, because it's driven by the same data in the end.

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Greg Hudson <gh...@MIT.EDU>.

On Wed, 2005-11-16 at 03:09 +0100, Branko Čibej wrote:
> The real question here is, why do we need a new API at all? Can't we 
> simply add copyfrom info to ra_update?

I think the flaw here is in the word "simply".

I believe that simply defining what copyfrom information means for a
diff across multiple revisions (to say nothing of a diff starting from a
complicated working copy) is a research project.

Once you've done that, figuring out how to efficiently compute that
information given our current repository schema is also, I believe, a
research project.

Giving copyfrom information for a single revision, on the other hand, is
easy.  The meaning is clear (you give exactly what information was
specified at commit time for the revision), and the algorithm is clear
(you use the revision's changes table).

While I'm all for having a simple API, I think that asking Garrett to
undertake two research projects for the sake of a simpler API falls
clearly on the "best is the enemy of the good" side of the line.  And
I'm not even sure the "simpler" API would be best; if the semantics of
copyfrom information over multiple revisions are sufficiently dodgy, the
resulting API might have a single function with highly obscure semantics
for some calling cases.

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Branko Čibej <br...@xbc.nu>.

Garrett Rooney wrote:
> On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
>   
>> I guess I'm surprised that we need three ways of doing essentially the
>> same thing: comparing revisions in the repository and transmitting
>> their differences.
>>     
>
> It's a combination of providing the right info and avoiding sending
> info that isn't required.  Personally, it seems like the fact that we
> already have this kind of API underneath (we use it for svnadmin dump,
> for example) indicates that there is a use case for exposing this sort
> of API, as opposed to requiring people to do backflips to make the
> existing APIs bend to their needs.
>   
The real question here is, why do we need a new API at all? Can't we 
simply add copyfrom info to ra_update? And wouldn't doing that magically 
make svn_repos_update and svn_repos_replay identical (for a certain 
combination of parameters)? Making the API smaller is a good thing.

-- Brane


Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/15/05, Jim Blandy <ji...@red-bean.com> wrote:
> If I'm understanding the big picture here, the essential problem is
> that none of the existing svn_ra.h functions give you a complete
> description of a revision's effects on the tree.  If I've got
> something confused, please let me know.
>
> - svn_ra_get_commit_editor2 carries everything --- prop changes, text
> changes, rename/copy history --- but it's going in the wrong
> direction: to the server.  We want data coming from the server.

Correct.

> - svn_ra_do_update is going in the right direction, server->client,
> but doesn't carry copy information, because the WC doesn't care and it
> costs something to recover that information.
> - svn_ra_do_diff2 goes in the right direction, but if I'm reading
> right, it doesn't provide history.

Both update and diff have two problems.  First, they fail to carry
along copyfrom info, so you have to get that info from log, and once
you get it from log you find that you don't actually have enough
information to derive everything you need, because you can't tell the
difference between a plain copy and a copy with additional textual
modifications.  Second, if I copy a directory, the update and diff
APIs will happily tell me that every single file under that directory
has been changed.  It really hasn't, but from the point of view of the
users of diff or update it might as well have been, since you need
that info to generate a diff or update a working copy.

> So your new replay method is a fourth function that sends deltas from
> client to repository, but that actually gives full information.

Close, I want all the info, and I want it in a way that actually
corresponds to the underlying change to the repository filesystem, not
the perceived change from the outside.

> I guess I'm surprised that we need three ways of doing essentially the
> same thing: comparing revisions in the repository and transmitting
> their differences.

It's a combination of providing the right info and avoiding sending
info that isn't required.  Personally, it seems like the fact that we
already have this kind of API underneath (we use it for svnadmin dump,
for example) indicates that there is a use case for exposing this sort
of API, as opposed to requiring people to do backflips to make the
existing APIs bend to their needs.

-garrett

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Jim Blandy <ji...@red-bean.com>.

If I'm understanding the big picture here, the essential problem is
that none of the existing svn_ra.h functions give you a complete
description of a revision's effects on the tree.  If I've got
something confused, please let me know.

- svn_ra_get_commit_editor2 carries everything --- prop changes, text
changes, rename/copy history --- but it's going in the wrong
direction: to the server.  We want data coming from the server.
- svn_ra_do_update is going in the right direction, server->client,
but doesn't carry copy information, because the WC doesn't care and it
costs something to recover that information.
- svn_ra_do_diff2 goes in the right direction, but if I'm reading
right, it doesn't provide history.

So your new replay method is a fourth function that sends deltas from
client to repository, but that actually gives full information.

I guess I'm surprised that we need three ways of doing essentially the
same thing: comparing revisions in the repository and transmitting
their differences.

Re: [PATCH] expose svn_repos_replay via the RA API, Take 2

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On 11/15/05, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> If someone could take a good hard look at the authz support I'd
> appreciate it.  I think I've managed to get it right, and it seems to
> work, but it's kind of a pain, so the more eyes on it the better.

Predictably, there is at least one problem in the code that occurred
to me on the drive home from work.  The authz support bungles the case
of a directory copied from a location you aren't allowed to see: it
needs to recurse into the directory adding its contents, since you
can't expect the caller to have access to the contents.

-garrett
