Posted to dev@subversion.apache.org by Stefan Sperling <st...@elego.de> on 2011/02/23 16:32:17 UTC

initial thoughts on issue #3818

I filed a new issue today (issue #3818, "fix handling of externals in
wc-ng" http://subversion.tigris.org/issues/show_bug.cgi?id=3818).

I had a brief chat with sbutler at the elego office before filing
this issue. Below are basic ideas we've had for approaching the
problem. Feedback appreciated!

The basic problem we have in the current design is as follows:

Given a local_abspath, there's an ambiguity about which wcroots
are associated with it when externals are involved.
E.g. given the path was foo/bar/baz, where bar has an svn:externals
property that configures an external to be downloaded into the
folder 'baz'. When we try to find a wcroot for this path, there are
two possible outcomes.  One is the wcroot of the relpath 'foo',
the other is the wcroot of the external itself ('baz').

     (svn:externals ^/branches/foo baz)
         |
 .../foo/bar/baz
 o----------------wcroot1
             o----wcroot2

The ambiguity arises because the local_abspath of wcroot2 is also
a local relpath within wcroot1. So, ideally, we should decouple the
concept of a wcroot from the path. We could tie it to a wc_id instead.
This would allow us to use a single wc.db to manage several wcroots,
one for the parent working copy, and more for any externals within
this working copy.

We can look at this set of wcroots as a tree with the root node being
the wcroot of the parent working copy and any wcroots for externals
within it being children of the root node. This way we can also
easily represent externals nested within externals (children of
children). I'm thinking about using this tree abstraction in a new API
for wcroots in libsvn_wc. It would allow usual tree operations (insert,
delete, iterate nodes etc.) to manage wcroots. Every node apart from
the root node represents an external.
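
A minimal sketch of what such a tree could look like, written in Python
for brevity (libsvn_wc itself is C). The names WcRootNode and WcRootTree
and all method names are hypothetical; nothing here is an existing
Subversion API. It only illustrates the insert/delete/iterate operations
mentioned above, with parent links so that the relationship can be
walked in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class WcRootNode:
    """One wcroot; a hypothetical stand-in for svn_wc__db_wcroot_t."""
    wc_id: int
    local_relpath: str  # root of this WC, relative to its parent wcroot
    parent: Optional["WcRootNode"] = None
    children: List["WcRootNode"] = field(default_factory=list)


class WcRootTree:
    """Root node is the parent WC; every other node is an external."""

    def __init__(self, root_wc_id: int) -> None:
        self.root = WcRootNode(root_wc_id, "")
        self._by_id: Dict[int, WcRootNode] = {root_wc_id: self.root}

    def get(self, wc_id: int) -> WcRootNode:
        return self._by_id[wc_id]

    def insert(self, parent_wc_id: int, wc_id: int,
               local_relpath: str) -> WcRootNode:
        if wc_id in self._by_id:
            raise ValueError("wc_id %d already present" % wc_id)
        parent = self._by_id[parent_wc_id]
        node = WcRootNode(wc_id, local_relpath, parent)
        parent.children.append(node)
        self._by_id[wc_id] = node
        return node

    def delete(self, wc_id: int) -> None:
        node = self._by_id[wc_id]
        if node.parent is None:
            raise ValueError("cannot delete the root wcroot")
        # Nested externals go away together with their parent external.
        for child in list(node.children):
            self.delete(child.wc_id)
        node.parent.children.remove(node)
        del self._by_id[wc_id]

    def iterate(self) -> Iterator[WcRootNode]:
        """Depth-first walk, parents before their externals."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))
```

For the example above, WcRootTree(1) followed by insert(1, 2, "bar/baz")
would describe wcroot1 with the external 'baz' as its only child.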

This model could later be extended to support multiple trees of wcroots
when we start managing more than one working copy within a single wc.db.
(Steve also suggested to use this feature later for storing different
versions of the conflicting trees involved in tree conflicts.)

To support this tree abstraction within wc.db we'd need a way of
identifying a local_relpath as the root of an external, and obtaining
the wc_id of this external. Any children of this local_relpath would
carry this wc_id. A nested external would work the same way. It changes
the wc_id again for a subtree of the external.

(Most of?) our existing queries already make use of the wc_id, so they
work within the parent or within an external. The calling code can
decide which working copy to operate on by passing the appropriate wc_id
(or maybe a wcroot obtained from the tree abstraction API).
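
To make the ambiguity concrete, here is a rough Python sketch (again
not real libsvn_wc code). The WCROOTS table and the functions
candidates() and resolve() are invented for illustration; paths are
taken relative to wcroot1. A path alone yields more than one
(wc_id, local_relpath) pair, and only a wc_id supplied by the caller
picks one of them.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: wc_id -> location of that wcroot, relative to
# the parent working copy's root ("" is the parent WC itself).
WCROOTS: Dict[int, str] = {1: "", 2: "bar/baz"}


def candidates(relpath: str,
               wcroots: Dict[int, str]) -> List[Tuple[int, str]]:
    """Return every (wc_id, local_relpath) pair that could mean relpath."""
    result = []
    for wc_id, root in wcroots.items():
        if root == "":
            # The parent WC can name any path below it.
            result.append((wc_id, relpath))
        elif relpath == root:
            # The path is the root of this external.
            result.append((wc_id, ""))
        elif relpath.startswith(root + "/"):
            # The path lies inside this external.
            result.append((wc_id, relpath[len(root) + 1:]))
    return result


def resolve(relpath: str, wc_id: int, wcroots: Dict[int, str]) -> str:
    """Pick the interpretation of relpath that belongs to wc_id."""
    for cand_id, local_relpath in candidates(relpath, wcroots):
        if cand_id == wc_id:
            return local_relpath
    raise KeyError("%r is not inside working copy %d" % (relpath, wc_id))
```

Here candidates("bar/baz", WCROOTS) yields two pairs, one per wcroot,
while candidates("bar", WCROOTS) yields only the parent's.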

Does anyone see any serious issues with this approach?  Thanks!

Re: initial thoughts on issue #3818

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 23, 2011 at 05:50:57PM +0000, Philip Martin wrote:
> Stefan Sperling <st...@elego.de> writes:
> 
> > Given a local_abspath, there's an ambiguity about which wcroots
> > are associated with it when externals are involved.
> > E.g. given the path was foo/bar/baz, where bar has an svn:externals
> > property that configures an external to be downloaded into the
> > folder 'baz'. When we try to find a wcroot for this path, there are
> > two possible outcomes.  One is the wcroot of the relpath 'foo',
> > the other is the wcroot of the external itself ('baz').
> >
> >      (svn:externals ^/branches/foo baz)
> >          |
> >  .../foo/bar/baz
> >  o----------------wcroot1
> >              o----wcroot2
> >
> > The ambiguity arises because the local_abspath of wcroot2 is also
> > a local relpath within wcroot1. So, ideally, we should decouple the
> > concept of a wcroot from the path. We could tie it to a wc_id instead.
> 
> It?
> 
> We could tie a wcroot to a wc_id instead.

The above. A wc_id maps to a unique wcroot and vice-versa.

> We could tie a path to a wc_id instead.      This one?

No, because if externals are in use, a path can map to more than one wc_id
(the ID of the parent WC, or the ID of the external WC).

> So you are planning to move the admin data for svn:externals into the
> wc.db in the parent/root working copy?  And use a distinct wc_id for
> each external?  It sounds plausible, and I think having svn:externals in
> a single wc.db is the way forward.  It's a bit like "switch and change
> the wc_id as well".

Yes. And that allows us to tell apart switched paths from externals
(see my other reply).

> I don't really see how it solves the original problem: for a given path
> which wc_id applies?

You can never know the answer to this question if you're just given a path.
That's why we need either a wcroot or a wc_id to go along with a path to
resolve ambiguity.

> I suppose having only one database makes it easier
> to solve.  Having identified a wc_id are you planning to pass a
> wc_id/relpath pair around?

Or a wcroot/relpath pair. Or something similar that encodes enough
information (suggestions welcome).

> I'm not really sure exactly what you are proposing.  How does a wcroot
> differ from a wc_id?

The wcroot (svn_wc__db_wcroot_t to be precise) is an object that contains
the wc_id among other things. So we could have an API that returns the
wcroot given a wc_id, i.e. once you know the ID you can get at the root
and then at more information about the WC.

> Does wcroot refer to svn_wc__db_wcroot_t?  Or to some more general
> "directory containing a .svn"?

To a svn_wc__db_wcroot_t.

A .svn directory contains a wc.db which in this new model would manage
data for several working copies. Every one of those working copies has
a wcroot object associated with it.

The working copies can be nested in case there are externals involved
(i.e. a tree structure of nested wcroots represents a working copy
which contains externals).

Initially (in 1.7) we would only support the nested case to support
svn:externals. Later we can extend this design to cover more use cases,
e.g. allowing users to use a shared meta data area (e.g. ~/.svn) for
all or a subset of their working copies.

Is it more clear now?

And is there anything I'm overlooking that might not work for some reason?

Re: initial thoughts on issue #3818

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> Given a local_abspath, there's an ambiguity about which wcroots
> are associated with it when externals are involved.
> E.g. given the path was foo/bar/baz, where bar has an svn:externals
> property that configures an external to be downloaded into the
> folder 'baz'. When we try to find a wcroot for this path, there are
> two possible outcomes.  One is the wcroot of the relpath 'foo',
> the other is the wcroot of the external itself ('baz').
>
>      (svn:externals ^/branches/foo baz)
>          |
>  .../foo/bar/baz
>  o----------------wcroot1
>              o----wcroot2
>
> The ambiguity arises because the local_abspath of wcroot2 is also
> a local relpath within wcroot1. So, ideally, we should decouple the
> concept of a wcroot from the path. We could tie it to a wc_id instead.

It?

We could tie a wcroot to a wc_id instead.
We could tie a path to a wc_id instead.      This one?

> This would allow us to use a single wc.db to manage several wcroots,
> one for the parent working copy, and more for any externals within
> this working copy.
>
> We can look at this set of wcroots as a tree with the root node being
> the wcroot of the parent working copy and any wcroots for externals
> within it being children of the root node. This way we can also
> easily represent externals nested within externals (children of
> children). I'm thinking about using this tree abstraction in a new API
> for wcroots in libsvn_wc. It would allow usual tree operations (insert,
> delete, iterate nodes etc.) to manage wcroots. Every node apart from
> the root node represents an external.
>
> This model could later be extended to support multiple trees of wcroots
> when we start managing more than one working copy within a single wc.db.
> (Steve also suggested to use this feature later for storing different
> versions of the conflicting trees involved in tree conflicts.)
>
> To support this tree abstraction within wc.db we'd need a way of
> identifying a local_relpath as the root of an external, and obtaining
> the wc_id of this external. Any children of this local_relpath would
> carry this wc_id. A nested external would work the same way. It changes
> the wc_id again for a subtree of the external.
>
> (Most of?) our existing queries already make use of the wc_id, so they
> work within the parent or within an external. The calling code can
> decide which working copy to operate on by passing the appropriate wc_id
> (or maybe a wcroot obtained from the tree abstraction API).
>
> Does anyone see any serious issues with this approach?  Thanks!

So you are planning to move the admin data for svn:externals into the
wc.db in the parent/root working copy?  And use a distinct wc_id for
each external?  It sounds plausible, and I think having svn:externals in
a single wc.db is the way forward.  It's a bit like "switch and change
the wc_id as well".

I don't really see how it solves the original problem: for a given path
which wc_id applies?  I suppose having only one database makes it easier
to solve.  Having identified a wc_id are you planning to pass a
wc_id/relpath pair around?

I'm not really sure exactly what you are proposing.  How does a wcroot
differ from a wc_id?

Does wcroot refer to svn_wc__db_wcroot_t?  Or to some more general
"directory containing a .svn"?

-- 
Philip

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 23, 2011 at 04:13:21PM +0000, Julian Foad wrote:
> We need a handle that we can pass around that references the whole
> nesting of WCs (where "WC" is defined as the scope of a single wcroot).
> That could be a new thing that we can invent, but I wonder if instead it
> would be reasonable to extend the functionality of the existing "wcroot"
> object so that it knows about its place within a hierarchy of parent and
> child wcroots.

Hmmm...
What you are saying above sounds just like the hierarchy of wcroots
we are proposing, doesn't it?

I'm not planning on inventing a new data type for wcroot, unless
that's really needed. It's likely that the existing types and APIs
could be used directly, or with small modifications, to support this idea.

Note that I would like the relationship to be tracked both ways.
I.e. given a wcroot of the parent WC you can discover all its child
working copies (i.e. all the externals within it). And given a wcroot
of an external it should be easy to find the wcroot for its parent WC.

We should design the data structures and APIs in a way that doesn't assume
too much about the way the code will want to interpret and manipulate
information about externals (to avoid problems such as we have with copyfrom,
where it was discovered much later that having a one-way relationship
really hurts for some use cases, and it's really hard to fix retroactively).

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Julian Foad <ju...@wandisco.com>.
On Wed, 2011-02-23, Julian Foad wrote:
> Stefan Sperling wrote:
> >      (svn:externals ^/branches/foo baz)
> >          |
> >  .../foo/bar/baz
> >  o----------------wcroot1
> >              o----wcroot2
> > 
> > The ambiguity arises because the local_abspath of wcroot2 is also
> > a local relpath within wcroot1.
> 
> 'baz' is not a versioned node within wcroot1 so I don't see any
> ambiguity.  I thought the intended interpretation was that each external
> tree is a separate WC, so, working with your example, I would expect
> each abspath to have an unambiguous interpretation as follows:
> 
> abspath .../foo         => wcroot1, relpath ".../foo"
> abspath .../foo/bar     => wcroot1, relpath ".../foo/bar"
> abspath .../foo/bar/baz => wcroot2, relpath ""
> 
> What is wrong with that approach?  [...]

Sorry, I spoke hastily.  I thought you meant ambiguity was the core of
the problem, but reading through your description in the issue, it looks
like the problem is more the difficulty of navigating "up" from an
external WC to its parent WC.  When we pass around a "wcroot" object,
that provides no straightforward way to navigate up to a parent WC, nor
even to know that the WC is "external".

We need a handle that we can pass around that references the whole
nesting of WCs (where "WC" is defined as the scope of a single wcroot).
That could be a new thing that we can invent, but I wonder if instead it
would be reasonable to extend the functionality of the existing "wcroot"
object so that it knows about its place within a hierarchy of parent and
child wcroots.

Just thinking out loud.

- Julian



Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Branko Čibej <br...@e-reka.si>.
On 24.02.2011 11:45, Johan Corveleyn wrote:
> - Maybe intra-repos externals would benefit from a new name, e.g.
> "internals" ;-)?

Symbolic links. :) Except that the name is already taken.

-- Brane

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Feb 23, 2011 at 5:08 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
> On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
>> (I've appended the issue subject to the subject line.)
>>
>> Stefan Sperling wrote:
>>> I filed a new issue today (issue #3818, "fix handling of externals in
>>> wc-ng" http://subversion.tigris.org/issues/show_bug.cgi?id=3818).
>>>
>>> I had a brief chat with sbutler at the elego office before filing
>>> this issue. Below are basic ideas we've had for approaching the
>>> problem. Feedback appreciated!
>>>
>>> The basic problem we have in the current design is as follows:
>>>
>>> Given a local_abspath, there's an ambiguity about which wcroots
>>> are associated with it when externals are involved.
>>> E.g. given the path was foo/bar/baz, where bar has an svn:externals
>>> property that configures an external to be downloaded into the
>>> folder 'baz'. When we try to find a wcroot for this path, there are
>>> two possible outcomes.  One is the wcroot of the relpath 'foo',
>>> the other is the wcroot of the external itself ('baz').
>>>
>>>      (svn:externals ^/branches/foo baz)
>>>          |
>>>  .../foo/bar/baz
>>>  o----------------wcroot1
>>>              o----wcroot2
>>>
>>> The ambiguity arises because the local_abspath of wcroot2 is also
>>> a local relpath within wcroot1.
>>
>> 'baz' is not a versioned node within wcroot1 so I don't see any
>> ambiguity.  I thought the intended interpretation was that each external
>> tree is a separate WC, so, working with your example, I would expect
>> each abspath to have an unambiguous interpretation as follows:
>>
>> abspath .../foo         => wcroot1, relpath ".../foo"
>> abspath .../foo/bar     => wcroot1, relpath ".../foo/bar"
>> abspath .../foo/bar/baz => wcroot2, relpath ""
>>
>> What is wrong with that approach?  (Let's assume 'baz' is a directory.
>> For file externals, this doesn't work so neatly, since a file can't
>> currently be a WC root, so 'wcroot2' would have to be interpreted
>> differently or faked or something.)
>>
>> - Julian
>>
>>
>>>  So, ideally, we should decouple the
>>> concept of a wcroot from the path. We could tie it to a wc_id instead.
>>> This would allow us to use a single wc.db to manage several wcroots,
>>> one for the parent working copy, and more for any externals within
>>> this working copy.
>
> I've always thought we could handle it a different way: the externals
> are just part of the same WC as the directories which contain them,
> it's just the repos_id and/or repos_relpath which happen to point to a
> disjoint location.  In other words, externals are nothing more than
> switched paths, save for the fact that their existence is communicable
> to other clients via the properties (whereas "pure" switched paths are
> single-client-only).

Interesting discussion. I thought (directory) externals were already
implemented this way, i.e. the client automatically performs a switch
upon processing the externals.

Because if you interrupt an "svn update" after changing external
definitions, the external parts that were not already updated appear
as switched (and you can fix the half-hearted situation by running
"svn switch" on the not-yet-processed externals to switch them to the
correct external position). This is the same for both intra-repos
externals and foreign-repos externals (and the workaround also works in
both cases (except that you need --relocate if the change of externals
changed the repos location)).

This is described in this issue:
http://subversion.tigris.org/issues/show_bug.cgi?id=3751 (Interrupting
an update after change of externals leaves working copy in
half-switched state)

FWIW:
- I'd really like for commit (and anything else) to recurse into
intra-repos externals by default (or if not by default, at least
support it as an option).

- I agree the situation may be different for foreign-repos externals.

- Maybe the concepts of intra-repos externals vs. foreign-repos
externals should be fleshed out more, and be handled differently. A
lot of things that make sense for intra-repos externals don't make
sense for foreign-repos externals and vice versa (like recursing, and
reusing the same connections, or also oft requested: configuring an
external to point to the same revision as its parent).

- Maybe intra-repos externals would benefit from a new name, e.g.
"internals" ;-)?

Just my 0.02c ...
-- 
Johan

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Feb 23, 2011 at 15:16, Stefan Sperling <st...@elego.de> wrote:
> On Wed, Feb 23, 2011 at 08:52:18PM +0100, Bert Huijben wrote:
>> I think just seeing every node below a workingcopy as a node in the parent
>> working copy will make things harder instead of simpler. So I would suggest
>> moving the redefining of normal externals to 1.8. (But I still think we have
>> to fix some file external issues for 1.7).
>
> I understand your concerns, but I'm afraid fixing file externals isn't
> the only problem we have :(
>
> As explained in issue #3818, flagging a tree conflict on an external
> is broken, which is a regression from 1.6.x.
> Trying to fix that, it's easy to see that other things like status on
> conflicted externals are also broken. Basically, every caller of
> svn_wc__db_wcroot_parse_local_abspath() has potential problems with
> opening the wrong wc.db that need to be worked around.

I believe the core of this issue is the "anchor/target" problem that
was discovered and designed back in the early Subversion days.

When you refer to "foo/bar/baz", are you asking about baz itself? or
are you asking about the child in bar?

In this particular scenario, conflict data is recorded in the parent,
so the query is not just a path, but a child within a path.

As Philip seemed to imply, introducing a wc_id or wcroot isn't going
to solve the problem. You're still given an abspath. I believe the
missing piece is the intentions around that path.

> I'd rather try to fix externals properly than adding workarounds.
> It seems as if there was an assumption that externals would just
> continue to work as they did in 1.6. But at least in this one case,
> we already know that this isn't true. How likely is it that there
> are more problems lurking that our test suite isn't currently exposing?

Honestly, I would recommend getting these things working as the main
priority. We can continue to revisit later. The new datastore can
represent a lot more concepts than before (elsethread, there are
discussions about switch vs externals and multi-repos... the new db
can have every item from a different repos; we just don't have the UI
model to present that to the user; but we could!)

>...
> So, yes, it might be less effort to add workarounds for known-broken
> cases than changing the way externals are handled in a more fundamental way.
> Would you rather go the route of adding workarounds for now?
> I could try that, but it doesn't feel very reassuring because I'd like
> 1.7 to be a very good quality release.

Agreed, but I'm seeing this as a query issue rather than a modeling
issue. You're right that an abspath is not enough. But I think you're
looking in the wrong place (wc_id or wcroot_t) to solve the problem.

In the anchor/target design, you have several possible outcomes:

1) anchor=foo/bar, target="": we're asking about the bar subdir
2) anchor=foo, target="bar": we're asking about the file bar
3) anchor=foo, target="bar": we're asking about the subdir bar from
the parent foo

In the old adm_access model, these different types of queries were
managed by where the adm_access pointed to. In (3), you're getting the
"stub" information. In (1), the access_t pointed to the primary
information in the subdir. In (2), you're getting file information
from the parent dir's access_t.
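
As a rough illustration of the point that the same path needs an
intention attached, here is a small Python sketch. The Intent enum and
split_anchor_target() are made-up names, not anything in Subversion.
Cases (2) and (3) yield the same anchor/target pair and differ only in
the kind of node found, so the sketch folds them into one intent.

```python
import posixpath
from enum import Enum
from typing import Tuple


class Intent(Enum):
    NODE_ITSELF = 1      # case (1): ask the directory about itself
    CHILD_OF_PARENT = 2  # cases (2) and (3): ask the parent about a child


def split_anchor_target(path: str, intent: Intent) -> Tuple[str, str]:
    """Turn a path plus the caller's intent into an (anchor, target) pair."""
    if intent is Intent.NODE_ITSELF:
        # The whole path is the anchor; there is no target below it.
        return path, ""
    # Otherwise the parent directory is the anchor, the basename the target.
    return posixpath.split(path)
```

With this, "foo/bar" becomes ("foo/bar", "") for the first intent and
("foo", "bar") for the second.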

> I also don't think that fixing file externals in the current 'design'
> (or lack of it) will be easy. So we might as well try a new design.
> We can still consider adding workarounds when the proper solution
> takes too long to mature to make the release in time.
> I'm happy to try this out on a branch first to avoid making things
> on trunk any worse than they are now.

Does the above help to focus your thoughts?

In particular, where exactly are the symptoms? Are they actually
widespread? Or can we solve some targeted cases and move along?

Cheers,
-g

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 23, 2011 at 08:52:18PM +0100, Bert Huijben wrote:
> I think just seeing every node below a workingcopy as a node in the parent
> working copy will make things harder instead of simpler. So I would suggest
> moving the redefining of normal externals to 1.8. (But I still think we have
> to fix some file external issues for 1.7).

I understand your concerns, but I'm afraid fixing file externals isn't
the only problem we have :(

As explained in issue #3818, flagging a tree conflict on an external
is broken, which is a regression from 1.6.x.
Trying to fix that, it's easy to see that other things like status on
conflicted externals are also broken. Basically, every caller of
svn_wc__db_wcroot_parse_local_abspath() has potential problems with
opening the wrong wc.db that need to be worked around.

I'd rather try to fix externals properly than adding workarounds.
It seems as if there was an assumption that externals would just
continue to work as they did in 1.6. But at least in this one case,
we already know that this isn't true. How likely is it that there
are more problems lurking that our test suite isn't currently exposing?

So, yes, it might be less effort to add workarounds for known-broken
cases than changing the way externals are handled in a more fundamental way.
Would you rather go the route of adding workarounds for now?
I could try that, but it doesn't feel very reassuring because I'd like
1.7 to be a very good quality release.

I also don't think that fixing file externals in the current 'design'
(or lack of it) will be easy. So we might as well try a new design.
We can still consider adding workarounds when the proper solution
takes too long to mature to make the release in time.
I'm happy to try this out on a branch first to avoid making things
on trunk any worse than they are now.

RE: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Hyrum K Wright [mailto:hyrum@hyrumwright.org]
> Sent: woensdag 23 februari 2011 19:47
> To: Julian Foad; b@ted.stsp.name; Hyrum K Wright;
> dev@subversion.apache.org
> Cc: Stefan Sperling
> Subject: Re: initial thoughts on issue #3818: fix handling of externals in wc-ng
> 
> On Wed, Feb 23, 2011 at 12:26 PM, Stefan Sperling <st...@elego.de> wrote:
> > On Wed, Feb 23, 2011 at 04:26:52PM +0000, Julian Foad wrote:
> >> Hyrum K Wright wrote:
> >> > On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
> >> > > Stefan Sperling wrote:
> >> > >>  So, ideally, we should decouple the
> >> > >> concept of a wcroot from the path. We could tie it to a wc_id instead.
> >> > >> This would allow us to use a single wc.db to manage several wcroots,
> >> > >> one for the parent working copy, and more for any externals within
> >> > >> this working copy.
> >> >
> >> > I've always thought we could handle it a different way: the externals
> >> > are just part of the same WC as the directories which contain them,
> >> > it's just the repos_id and/or repos_relpath which happen to point to a
> >> > disjoint location.  In other words, externals are nothing more than
> >> > switched paths, save for the fact that their existence is communicable
> >> > to other clients via the properties (whereas "pure" switched paths are
> >> > single-client-only).
> >> >
> >> > Since all the plumbing for switched paths already exists, we should
> >> > just be able to reuse it for the externals cases.  In fact, as
> >> > currently implemented, switch can do some *really* interesting things,
> >> > which Philip could probably better illuminate than me.
> >> >
> >> > In short, I don't think the answer is a set of wcroots but rather one
> >> > wcroot with a set of nodes (possibly pointing to various repos).
> >>
> >> +1 to that.  In fact, I wrote about that idea a while back:
> >>
> >>   Subject: [RFC] 'External' and 'Switched': common ground
> >>   From: Julian Foad <julian.foad_at_wandisco.com>
> >>   Date: Fri, 20 Aug 2010 18:54:09 +0100
> >>   <http://svn.haxx.se/dev/archive-2010-08/0529.shtml>
> >>
> >> There are a few intentional differences in the way external directories
> >> (and files) behave, compared with switched nodes, but it looks far more
> >> sensible to implement them that way.
> >
> > We need to preserve 1.6 semantics of how operations affect externals.
> 
> Why?  People have been wanting to commit across externals and working
> copies for a *long* time.  Putting them all in the same WC gives us
> this functionality for free (as well as other currently-not-atomic
> operations).  Preserving these same limiting 1.6 semantics is
> something of a hard sell.

Because externals can come from different repositories and it would be very
caller unfriendly to return nodes from different repositories as if they are
from the same repository.

That way every loop through the direct children of a directory would have
to verify the repository of every node. I think we should try to make the db
queries simpler (and faster) instead of heavier.
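
A rough Python/sqlite3 sketch of the difference. The one-table schema
below is a drastic simplification invented for this example and is not
the real wc.db layout; make_db() and children() are hypothetical
helpers. With a separate wc_id per external, a children query stays
within one repository. With the external stored as an ordinary node of
the parent WC, the same query returns rows from a foreign repository
and the caller has to inspect repos_id on every row.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory table of (wc_id, relpath, parent, repos_id)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE nodes (wc_id INTEGER, local_relpath TEXT,"
        " parent_relpath TEXT, repos_id INTEGER)")
    db.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    return db


def children(db, wc_id, parent_relpath):
    """Direct children of parent_relpath within one working copy."""
    cur = db.execute(
        "SELECT local_relpath, repos_id FROM nodes"
        " WHERE wc_id = ? AND parent_relpath = ?"
        " ORDER BY local_relpath",
        (wc_id, parent_relpath))
    return cur.fetchall()


# Model A: the external 'baz' has its own wc_id (2) and its own root row.
separate = make_db([
    (1, "bar", "", 1),
    (1, "bar/README", "bar", 1),
    (2, "", None, 2),
    (2, "main.c", "", 2),
])

# Model B: the external is just another node of the parent WC (wc_id 1),
# distinguished only by pointing at repository 2.
merged = make_db([
    (1, "bar", "", 1),
    (1, "bar/README", "bar", 1),
    (1, "bar/baz", "bar", 2),
    (1, "bar/baz/main.c", "bar/baz", 2),
])
```

Calling children(separate, 1, "bar") returns only the README row, while
children(merged, 1, "bar") also returns 'bar/baz' with repos_id 2.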

And we can't change the current behavior of externals on the libsvn_client
api or we would break all third party applications. (I think we can safely
invent a new type of externals).


I think just seeing every node below a workingcopy as a node in the parent
working copy will make things harder instead of simpler. So I would suggest
moving the redefining of normal externals to 1.8. (But I still think we have
to fix some file external issues for 1.7).

Whether or not externals are stored in the same db+pristine store as the
parent working copy is a different topic and that is very easy compared to
redefining what a 'child node' is. If somebody wants to do that for 1.7 he
has my +1.

	Bert


Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 23, 2011 at 12:46:44PM -0600, Hyrum K Wright wrote:
> On Wed, Feb 23, 2011 at 12:26 PM, Stefan Sperling <st...@elego.de> wrote:
> > We need to preserve 1.6 semantics of how operations affect externals.
> 
> Why?  People have been wanting to commit across externals and working
> copies for a *long* time.  Putting them all in the same WC gives us
> this functionality for free (as well as other currently-not-atomic
> operations).  Preserving these same limiting 1.6 semantics is
> something of a hard sell.

I would argue that we need to keep the existing behaviour working,
preferably as the default. If people want different behaviour, e.g. commit
across externals, it's possible to make that happen regardless of which
design we use to represent externals inside libsvn_wc.

I think you're missing an important difference between switched paths
and externals, just like we did with file externals.

Externals can come from a different repository, while switched paths cannot.

At the UI level, a switch can be used as a shortcut to avoid a long checkout,
or to quickly commit a set of local changes to a new temporary branch in
case they aren't yet suitable for commit to the branch the working copy
originally came from. A switch is temporary and local to the working copy.

This is very different to the 'modules' concept externals implement.
Participants in introductory SVN courses I give are usually much more
interested in svn:externals than in svn switch. They immediately want
to use externals for very different use cases (e.g. pulling meta-project
components together from different repositories, or even for managing
variants of their products). An external is used as a permanent reference
to possibly foreign repositories.

I think it's a good idea to keep these concepts entirely separate.

> > If we cannot easily tell the difference between a switched path
> > and an external, that is bad design, IMHO, and will lead to problems
> > down the road just like we had with file externals in 1.6.
> > Let's not repeat that mistake. Externals are special and need their own,
> > unique, representation in our design.
> 
> I submit that this may not be true.  They have given us so many
> headaches precisely because we claim they are special.  Treating
> externals as switched nodes will eliminate a number of these special
> cases.

It's entirely possible that people might want Subversion to treat external
references from foreign repositories differently from switched paths (which
by definition come from the same repository as the rest of the working copy).

The most obvious example is that they might want to auto-commit to
switched paths, but not to foreign repositories referenced by externals.
For instance, say I don't have commit access to the repository the
external came from. Should svn try to commit to it, and fail every time,
with no way to turn this off? I would say "no".

> (And while I think this is a good discussion to have, I'm hopeful it
> doesn't lead to further delays in shipping 1.7.)

I think it's very good to have this discussion now, whether or not
it delays the release. It should probably have happened sooner :)
We cannot ship 1.7 with a broken externals implementation anyway.

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Hyrum K Wright <hy...@hyrumwright.org>.
On Wed, Feb 23, 2011 at 12:26 PM, Stefan Sperling <st...@elego.de> wrote:
> On Wed, Feb 23, 2011 at 04:26:52PM +0000, Julian Foad wrote:
>> Hyrum K Wright wrote:
>> > On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
>> > > Stefan Sperling wrote:
>> > >>  So, ideally, we should decouple the
>> > >> concept of a wcroot from the path. We could tie it to a wc_id instead.
>> > >> This would allow us to use a single wc.db to manage several wcroots,
>> > >> one for the parent working copy, and more for any externals within
>> > >> this working copy.
>> >
>> > I've always thought we could handle it a different way: the externals
>> > are just part of the same WC as the directories which contain them,
>> > it's just the repos_id and/or repos_relpath which happen to point to a
>> > disjoint location.  In other words, externals are nothing more than
>> > switched paths, save for the fact that their existence is communicable
>> > to other clients via the properties (whereas "pure" switched paths are
>> > single-client-only).
>> >
>> > Since all the plumbing for switched paths already exists, we should
>> > just be able to reuse it for the externals cases.  In fact, as
>> > currently implemented, switch can do some *really* interesting things,
>> > which Philip could probably better illuminate than me.
>> >
>> > In short, I don't think the answer is a set of wcroots but rather one
>> > wcroot with a set of nodes (possibly pointing to various repos).
>>
>> +1 to that.  In fact, I wrote about that idea a while back:
>>
>>   Subject: [RFC] 'External' and 'Switched': common ground
>>   From: Julian Foad <julian.foad_at_wandisco.com>
>>   Date: Fri, 20 Aug 2010 18:54:09 +0100
>>   <http://svn.haxx.se/dev/archive-2010-08/0529.shtml>
>>
>> There are a few intentional differences in the way external directories
>> (and files) behave, compared with switched nodes, but it looks far more
>> sensible to implement them that way.
>
> We need to preserve 1.6 semantics of how operations affect externals.

Why?  People have been wanting to commit across externals and working
copies for a *long* time.  Putting them all in the same WC gives us
this functionality for free (as well as other currently-not-atomic
operations).  Preserving these same limiting 1.6 semantics is
something of a hard sell.

> When treating externals just like switched paths, how will you make sure
> that wc.db queries run within the context of one WC don't touch things
> within an external WC?
>
> Filtering on the wc_id makes this very easy.
> Having to filter on a repos URL instead, how do you avoid filtering out
> switched paths which are in fact part of the parent WC and not an external?
>
> Or do you want to filter on the values of svn:externals within rows
> returned by the query? Again, that is much harder than just filtering
> on the wc_id, and requires parsing the properties to get at information
> that might even be needed within SQL queries.
>
> If we cannot easily tell the difference between a switched path
> and an external, that is bad design, IMHO, and will lead to problems
> down the road just like we had with file externals in 1.6.
> Let's not repeat that mistake. Externals are special and need their own,
> unique, representation in our design.

I submit that this may not be true.  They have given us so many
headaches precisely because we claim they are special.  Treating
externals as switched nodes will eliminate a number of these special
cases.

(And while I think this is a good discussion to have, I'm hopeful it
doesn't lead to further delays in shipping 1.7.)

-Hyrum

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 23, 2011 at 04:26:52PM +0000, Julian Foad wrote:
> Hyrum K Wright wrote:
> > On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
> > > Stefan Sperling wrote:
> > >>  So, ideally, we should decouple the
> > >> concept of a wcroot from the path. We could tie it to a wc_id instead.
> > >> This would allow us to use a single wc.db to manage several wcroots,
> > >> one for the parent working copy, and more for any externals within
> > >> this working copy.
> > 
> > I've always thought we could handle it a different way: the externals
> > are just part of the same WC as the directories which contain them,
> > it's just the repos_id and/or repos_relpath which happen to point to a
> > disjoint location.  In other words, externals are nothing more than
> > switched paths, save for the fact that their existence is communicable
> > to other clients via the properties (whereas "pure" switched paths are
> > single-client-only).
> > 
> > Since all the plumbing for switched paths already exists, we should
> > just be able to reuse it for the externals cases.  In fact, as
> > currently implemented, switch can do some *really* interesting things,
> > which Philip could probably better illuminate than me.
> > 
> > In short, I don't think the answer is a set of wcroots but rather one
> > wcroot with a set of nodes (possibly pointing to various repos).
> 
> +1 to that.  In fact, I wrote about that idea a while back:
> 
>   Subject: [RFC] 'External' and 'Switched': common ground
>   From: Julian Foad <julian.foad_at_wandisco.com>
>   Date: Fri, 20 Aug 2010 18:54:09 +0100
>   <http://svn.haxx.se/dev/archive-2010-08/0529.shtml>
> 
> There are a few intentional differences in the way external directories
> (and files) behave, compared with switched nodes, but it looks far more
> sensible to implement them that way.

We need to preserve 1.6 semantics of how operations affect externals.
When treating externals just like switched paths, how will you make sure
that wc.db queries run within the context of one WC don't touch things
within an external WC?

Filtering on the wc_id makes this very easy.
Having to filter on a repos URL instead, how do you avoid filtering out
switched paths which are in fact part of the parent WC and not an external?

Or do you want to filter on the values of svn:externals within rows
returned by the query? Again, that is much harder than just filtering
on the wc_id, and requires parsing the properties to get at information
that might even be needed within SQL queries.

If we cannot easily tell the difference between a switched path
and an external, that is bad design, IMHO, and will lead to problems
down the road just like we had with file externals in 1.6.
Let's not repeat that mistake. Externals are special and need their own,
unique, representation in our design.
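
The filtering argument can be sketched with a toy database. This is not the
real wc.db schema: the nodes table below is a simplified, hypothetical
stand-in with only wc_id, local_relpath and repos_id columns, and the row
values are invented. It assumes the external at bar/baz comes from the same
repository as the parent (as in the ^/branches/foo example), so repos_id
cannot separate it from a switched path, while wc_id can.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory toy table; NOT the real wc.db schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE nodes ("
        " wc_id INTEGER NOT NULL,"
        " local_relpath TEXT NOT NULL,"
        " repos_id INTEGER NOT NULL,"
        " PRIMARY KEY (wc_id, local_relpath))"
    )
    db.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [
            (1, "", 1),               # parent wcroot
            (1, "bar", 1),
            (1, "bar/switched", 1),   # switched path, still in the parent WC
            (2, "bar/baz", 1),        # external root, same repository
            (2, "bar/baz/lib.c", 1),  # child of the external
        ],
    )
    return db


def paths_in_wc(db: sqlite3.Connection, wc_id: int) -> list:
    """All paths belonging to one working copy, selected by wc_id."""
    rows = db.execute(
        "SELECT local_relpath FROM nodes WHERE wc_id = ?"
        " ORDER BY local_relpath",
        (wc_id,),
    )
    return [r[0] for r in rows]


def paths_in_repos(db: sqlite3.Connection, repos_id: int) -> list:
    """All paths from one repository; mixes parent, switched and external."""
    rows = db.execute(
        "SELECT local_relpath FROM nodes WHERE repos_id = ?"
        " ORDER BY local_relpath",
        (repos_id,),
    )
    return [r[0] for r in rows]
```

With these rows, selecting on wc_id 1 returns the parent's nodes including
the switched path, while selecting on repos_id 1 returns every row,
external included.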

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Julian Foad <ju...@wandisco.com>.
Hyrum K Wright wrote:
> On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
> > Stefan Sperling wrote:
> >>  So, ideally, we should decouple the
> >> concept of a wcroot from the path. We could tie it to a wc_id instead.
> >> This would allow us to use a single wc.db to manage several wcroots,
> >> one for the parent working copy, and more for any externals within
> >> this working copy.
> 
> I've always thought we could handle it a different way: the externals
> are just part of the same WC as the directories which contain them,
> it's just the repos_id and/or repos_relpath which happen to point to a
> disjoint location.  In other words, externals are nothing more than
> switched paths, save for the fact that their existence is communicable
> to other clients via the properties (whereas "pure" switched paths are
> single-client-only).
> 
> Since all the plumbing for switched paths already exists, we should
> just be able to reuse it for the externals cases.  In fact, as
> currently implemented, switch can do some *really* interesting things,
> which Philip could probably better illuminate than me.
> 
> In short, I don't think the answer is a set of wcroots but rather one
> wcroot with a set of nodes (possibly pointing to various repos).

+1 to that.  In fact, I wrote about that idea a while back:

  Subject: [RFC] 'External' and 'Switched': common ground
  From: Julian Foad <julian.foad_at_wandisco.com>
  Date: Fri, 20 Aug 2010 18:54:09 +0100
  <http://svn.haxx.se/dev/archive-2010-08/0529.shtml>

There are a few intentional differences in the way external directories
(and files) behave, compared with switched nodes, but it looks far more
sensible to implement them that way.

- Julian



Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Hyrum K Wright <hy...@hyrumwright.org>.
On Wed, Feb 23, 2011 at 9:53 AM, Julian Foad <ju...@wandisco.com> wrote:
> (I've appended the issue subject to the subject line.)
>
> Stefan Sperling wrote:
>> I filed a new issue today (issue #3818, "fix handling of externals in
>> wc-ng" http://subversion.tigris.org/issues/show_bug.cgi?id=3818).
>>
>> I had a brief chat with sbutler at the elego office before filing
>> this issue. Below are basic ideas we've had for approaching the
>> problem. Feedback appreciated!
>>
>> The basic problem we have in the current design is as follows:
>>
>> Given a local_abspath, there's an ambiguity about which wcroots
>> are associated with it when externals are involved.
>> E.g. given the path was foo/bar/baz, where bar has an svn:externals
>> property that configures an external to be downloaded into the
>> folder 'baz'. When we try to find a wcroot for this path, there are
>> two possible outcomes.  One is the wcroot of the relpath 'foo',
>> the other is the wcroot of the external itself ('baz').
>>
>>      (svn:externals ^/branches/foo baz)
>>          |
>>  .../foo/bar/baz
>>  o----------------wcroot1
>>              o----wcroot2
>>
>> The ambiguity arises because the local_abspath of wcroot2 is also
>> a local relpath within wcroot1.
>
> 'baz' is not a versioned node within wcroot1 so I don't see any
> ambiguity.  I thought the intended interpretation was that each external
> tree is a separate WC, so, working with your example, I would expect
> each abspath to have an unambiguous interpretation as follows:
>
> abspath .../foo         => wcroot1, relpath ".../foo"
> abspath .../foo/bar     => wcroot1, relpath ".../foo/bar"
> abspath .../foo/bar/baz => wcroot2, relpath ""
>
> What is wrong with that approach?  (Let's assume 'baz' is a directory.
> For file externals, this doesn't work so neatly, since a file can't
> currently be a WC root, so 'wcroot2' would have to be interpreted
> differently or faked or something.)
>
> - Julian
>
>
>>  So, ideally, we should decouple the
>> concept of a wcroot from the path. We could tie it to a wc_id instead.
>> This would allow us to use a single wc.db to manage several wcroots,
>> one for the parent working copy, and more for any externals within
>> this working copy.

I've always thought we could handle it a different way: the externals
are just part of the same WC as the directories which contain them,
it's just the repos_id and/or repos_relpath which happen to point to a
disjoint location.  In other words, externals are nothing more than
switched paths, save for the fact that their existence is communicable
to other clients via the properties (whereas "pure" switched paths are
single-client-only).

Since all the plumbing for switched paths already exists, we should
just be able to reuse it for the externals cases.  In fact, as
currently implemented, switch can do some *really* interesting things,
which Philip could probably better illuminate than me.

In short, I don't think the answer is a set of wcroots but rather one
wcroot with a set of nodes (possibly pointing to various repos).
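
A rough sketch of what "one wcroot with a set of nodes" could look like:
every node records a repos_id and repos_relpath, and a node counts as
disjoint when those do not follow from its parent's. The Node type and the
is_disjoint helper are hypothetical and only illustrate the idea; they are
not taken from libsvn_wc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    local_relpath: str
    repos_id: int
    repos_relpath: str


def is_disjoint(node: Node, parent: Node) -> bool:
    """True if the node's repository location does not follow its parent's."""
    name = node.local_relpath.rsplit("/", 1)[-1]
    if parent.repos_relpath:
        expected = parent.repos_relpath + "/" + name
    else:
        expected = name
    return node.repos_id != parent.repos_id or node.repos_relpath != expected


parent = Node("bar", 1, "trunk/bar")
plain = Node("bar/qux", 1, "trunk/bar/qux")
switched = Node("bar/sw", 1, "branches/x/sw")
external = Node("bar/baz", 2, "lib")  # assumed to come from another repository
```

Under this model a switched path and an external both show up as disjoint
nodes; the check by itself does not say which of the two a node is.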

-Hyrum

>> We can look at this set of wcroots as a tree with the root node being
>> the wcroot of the parent working copy and any wcroots for externals
>> within it being children of the root node. This way we can also
>> easily represent externals nested within externals (children of
>> children). I'm thinking about using this tree abstraction in a new API
>> for wcroots in libsvn_wc. It would allow usual tree operations (insert,
>> delete, iterate nodes etc.) to manage wcroots. Every node apart from
>> the root node represents an external.
>>
>> This model could later be extended to support multiple trees of wcroots
>> when we start managing more than one working copy within a single wc.db.
>> (Steve also suggested to use this feature later for storing different
>> versions of the conflicting trees involved in tree conflicts.)
>>
>> To support this tree abstraction within wc.db we'd need a way of
>> identifying a local_relpath as the root of an external, and obtaining
>> the wc_id of this external. Any children of this local_relpath would
>> carry this wc_id. A nested external would work the same way. It changes
>> the wc_id again for a subtree of the external.
>>
>> (Most of?) our existing queries already make use of the wc_id, so they
>> work within the parent or within an external. The calling code can
>> decide which working copy to operate on by passing the appropriate wc_id
>> (or maybe a wcroot obtained from the tree abstraction API).
>>
>> Does anyone see any serious issues with this approach?  Thanks!
>
>
>

Re: initial thoughts on issue #3818: fix handling of externals in wc-ng

Posted by Julian Foad <ju...@wandisco.com>.
(I've appended the issue subject to the subject line.)

Stefan Sperling wrote:
> I filed a new issue today (issue #3818, "fix handling of externals in
> wc-ng" http://subversion.tigris.org/issues/show_bug.cgi?id=3818).
> 
> I had a brief chat with sbutler at the elego office before filing
> this issue. Below are basic ideas we've had for approaching the
> problem. Feedback appreciated!
> 
> The basic problem we have in the current design is as follows:
> 
> Given a local_abspath, there's an ambiguity about which wcroots
> are associated with it when externals are involved.
> E.g. given the path was foo/bar/baz, where bar has an svn:externals
> property that configures an external to be downloaded into the
> folder 'baz'. When we try to find a wcroot for this path, there are
> two possible outcomes.  One is the wcroot of the relpath 'foo',
> the other is the wcroot of the external itself ('baz').
> 
>      (svn:externals ^/branches/foo baz)
>          |
>  .../foo/bar/baz
>  o----------------wcroot1
>              o----wcroot2
> 
> The ambiguity arises because the local_abspath of wcroot2 is also
> a local relpath within wcroot1.

'baz' is not a versioned node within wcroot1 so I don't see any
ambiguity.  I thought the intended interpretation was that each external
tree is a separate WC, so, working with your example, I would expect
each abspath to have an unambiguous interpretation as follows:

abspath .../foo         => wcroot1, relpath ".../foo"
abspath .../foo/bar     => wcroot1, relpath ".../foo/bar"
abspath .../foo/bar/baz => wcroot2, relpath ""

What is wrong with that approach?  (Let's assume 'baz' is a directory.
For file externals, this doesn't work so neatly, since a file can't
currently be a WC root, so 'wcroot2' would have to be interpreted
differently or faked or something.)
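
The lookup described above can be sketched as "pick the deepest wcroot
that contains the path". The function name, the /wc/... paths, and the
choice to root wcroot1 at /wc/foo are assumptions made for this example; it
covers directory externals only.

```python
import posixpath
from typing import Iterable, Tuple


def find_wcroot(abspath: str, wcroots: Iterable[str]) -> Tuple[str, str]:
    """Return (wcroot_abspath, relpath) for the deepest wcroot holding abspath."""
    best = None
    for root in wcroots:
        prefix = root.rstrip("/") + "/"
        if abspath == root or abspath.startswith(prefix):
            # Prefer the longest (deepest) matching root.
            if best is None or len(root) > len(best):
                best = root
    if best is None:
        raise LookupError("no wcroot contains " + abspath)
    relpath = posixpath.relpath(abspath, best)
    return best, "" if relpath == "." else relpath


wcroots = ["/wc/foo", "/wc/foo/bar/baz"]  # wcroot1 and wcroot2
in_parent = find_wcroot("/wc/foo/bar", wcroots)
at_external = find_wcroot("/wc/foo/bar/baz", wcroots)
below_external = find_wcroot("/wc/foo/bar/baz/x.c", wcroots)
```

Because the deepest root wins, .../foo/bar/baz and everything below it
resolve to wcroot2, and everything else resolves to wcroot1.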

- Julian


>  So, ideally, we should decouple the
> concept of a wcroot from the path. We could tie it to a wc_id instead.
> This would allow us to use a single wc.db to manage several wcroots,
> one for the parent working copy, and more for any externals within
> this working copy.
> 
> We can look at this set of wcroots as a tree with the root node being
> the wcroot of the parent working copy and any wcroots for externals
> within it being children of the root node. This way we can also
> easily represent externals nested within externals (children of
> children). I'm thinking about using this tree abstraction in a new API
> for wcroots in libsvn_wc. It would allow usual tree operations (insert,
> delete, iterate nodes etc.) to manage wcroots. Every node apart from
> the root node represents an external.
> 
> This model could later be extended to support multiple trees of wcroots
> when we start managing more than one working copy within a single wc.db.
> (Steve also suggested to use this feature later for storing different
> versions of the conflicting trees involved in tree conflicts.)
> 
> To support this tree abstraction within wc.db we'd need a way of
> identifying a local_relpath as the root of an external, and obtaining
> the wc_id of this external. Any children of this local_relpath would
> carry this wc_id. A nested external would work the same way. It changes
> the wc_id again for a subtree of the external.
> 
> (Most of?) our existing queries already make use of the wc_id, so they
> work within the parent or within an external. The calling code can
> decide which working copy to operate on by passing the appropriate wc_id
> (or maybe a wcroot obtained from the tree abstraction API).
> 
> Does anyone see any serious issues with this approach?  Thanks!
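
The tree of wcroots proposed in the quoted text could be sketched roughly
as follows. This is only an illustration under stated assumptions: the
WcRoot class, its method names, and the wc_id_for helper are hypothetical
and do not correspond to an existing libsvn_wc API. Local relpaths are
taken relative to the parent working copy's root.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class WcRoot:
    wc_id: int
    local_relpath: str  # relative to the root of the parent working copy
    children: List["WcRoot"] = field(default_factory=list)

    def insert(self, child: "WcRoot") -> "WcRoot":
        """Register an external directly below this wcroot."""
        self.children.append(child)
        return child

    def delete(self, wc_id: int) -> bool:
        """Remove the wcroot with this wc_id (and its subtree), if present."""
        for i, child in enumerate(self.children):
            if child.wc_id == wc_id:
                del self.children[i]
                return True
            if child.delete(wc_id):
                return True
        return False

    def walk(self) -> Iterator["WcRoot"]:
        """Iterate over this wcroot and all nested externals, parents first."""
        yield self
        for child in self.children:
            yield from child.walk()


def wc_id_for(root: WcRoot, local_relpath: str) -> int:
    """Return the wc_id of the deepest wcroot containing local_relpath."""
    matches = [
        r for r in root.walk()
        if r.local_relpath == ""
        or local_relpath == r.local_relpath
        or local_relpath.startswith(r.local_relpath + "/")
    ]
    return max(matches, key=lambda r: len(r.local_relpath)).wc_id


root = WcRoot(1, "")
ext = root.insert(WcRoot(2, "bar/baz"))
ext.insert(WcRoot(3, "bar/baz/vendor"))  # external nested in an external
```

Any path below bar/baz then carries wc_id 2, and the nested external
changes the wc_id again for its own subtree, as described in the proposal.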