Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2008/09/11 20:38:38 UTC

Re: WC-NG: the trees BASE, WORKING and ACTUAL [was: svn commit: r33021 - branches/explore-wc/subversion/libsvn_wc]

On Thu, 2008-09-11 at 11:57 -0700, Greg Stein wrote:
> In short, I'm not seeing any reason to modify the definitions. Mike
> explained them rather well, and I don't see any problems with those
> definitions.

Huh.

> On Thu, Sep 11, 2008 at 10:47 AM, Julian Foad
> <ju...@btopenworld.com> wrote:
> >...
> > Getting warmer. How about:
> >
> >  * ACTUAL is the tree on the local disk, ignoring Subversion
> >    administrative directories, and regarding every node as having
> >    no Subversion properties.
> 
> Eh? That's just what is on disk. I'm not sure that is a relevant tree.
> 
> What is on disk is only relevant in the context of a working copy.
> More specifically, how it relates/differs from WORKING.
> 
> Files/dirs present, but not in WORKING: unversioned nodes
> Files/dirs in WORKING, but not present: missing nodes

Maybe you don't have the same definition of "a tree" as I do. I am
assuming we mean the sort of tree that is described by a Subversion
delta editor. A tree of nodes; each node is either a file or a dir; each
node has properties; each dir has 0 or more child nodes; each file has
content which is a blob of 0 or more bytes.
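
To make that notion of a tree concrete, here is a minimal sketch in Python. The names FileNode, DirNode and count_nodes are assumptions made for illustration only; they are not part of any Subversion API, and symlinks are left out because the definition above mentions only files and dirs.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class FileNode:
    """A file: properties plus content, a blob of 0 or more bytes."""
    content: bytes = b""
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DirNode:
    """A directory: properties plus 0 or more named child nodes."""
    properties: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[FileNode, DirNode]


def count_nodes(node: Node) -> int:
    """Count this node and, for a directory, everything below it."""
    if isinstance(node, FileNode):
        return 1
    return 1 + sum(count_nodes(child) for child in node.children.values())


# A small example tree: a root dir holding one file and one subdir.
root = DirNode(children={
    "README": FileNode(content=b"hello\n"),
    "src": DirNode(children={"main.c": FileNode()}),
})
```

Nothing in this model says where the data lives, which is the point: the same shape could describe BASE, WORKING or ACTUAL.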

When you say, "Files/dirs present, but not in WORKING: unversioned
nodes", what about them? They are part of the ACTUAL tree? Yes, I say.

When you say, "Files/dirs in WORKING, but not present: missing nodes",
what about them? They are part of the ACTUAL tree? Yes, I say.

And files/dirs that are in WORKING and present on disk as nodes of the
correct type? Yes, I say. How about you?

And files/dirs that are on disk where WORKING says there's a node of the
other type? Yes, I say. How about you?

And the properties of each node are?


> >...
> >> BASE + Subversion-managed changes = WORKING.
> >> WORKING + non-Subversion-managed changes = ACTUAL.
> 
> Yup. Note that WORKING *may* include text-mod flags. If somebody does
> an "svn edit", then a flag will get recorded saying "looks like this
> file was modified" (or is likely to have been). But WORKING is purely
> an admin thing. You have to look at ACTUAL to find *real* text mods.

You're now talking about WORKING including "flags". This is not
impossible: I've wondered whether these "trees" need to be augmented by
bits of metadata like this. So, are you saying that the term "WORKING
tree" defines a set of state recorded in the implementation, rather
than defining an abstract tree concept?

> > I was also questioning the intent of defining WORKING as a tree that has
> > the BASE file contents. That seems silly: what useful concept does that
> 
> It doesn't "have them" ... it is just that most of the WORKING tree's
> contents == BASE's contents, simply because they haven't been
> modified. There isn't any real "container" or "superset" or anything.

OK, I think we agree there.

> > represent? It seems to me that it represents an implementation artifact:
> > the set of modifications that Subversion records explicitly in its
> > meta-data rather than the modifications that Subversion scans for
> > dynamically. That's not a distinction of much interest to the higher
> > layers of software.
> 
> WORKING is entirely an admin thing. To find the complete set of
> modifications, you also have to look at the ACTUAL files which
> correspond to WORKING files. (you don't have to examine the entire
> ACTUAL tree! ... sometimes unversioned files are irrelevant)

Ahh... I was envisaging these "kinds of tree" as concepts that would be
visible through the API. That there would be ways to ask through the
API, "what are the differences between our WORKING tree and our ACTUAL
tree?" (so I can remind the user that they need to issue some Subversion
tree-rearrangement commands), or "what is the value of svn:mime-type on
the WORKING version of file 'foo'?" (so I can display it in an
appropriate editor).

That's where I want there to be a clear external concept of "I'm asking
about the user's working version" versus "Now I'm asking the same
question about the pristine version". Whether we need to expose three
trees, to be able to distinguish not only the pristine version but also
between the working version as told to Subversion, and the nodes on disk
as modified outside Subversion, I'm not 100% sure, but it seems
reasonable that we do need to distinguish these.

Clearly we're not yet seeing eye to eye. I hope we're getting a bit
closer.

- Julian



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Posted by Stefan Sperling <st...@elego.de>.
On Thu, Sep 11, 2008 at 03:27:58PM -0700, Greg Stein wrote:
> On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <ju...@btopenworld.com> wrote:
> > And files/dirs that are on disk where WORKING says there's a node of the
> > other type? Yes, I say. How about you?
> 
> The node is in both WORKING and ACTUAL, but there is now a problem. I
> don't know that we have a name for this kind of change. This isn't
> really unversioned or missing... something else.

I'd put it like this:

  The node in WORKING is obstructed by an incompatible node in ACTUAL.

So we could use "obstructed", or "incompatible", for example.
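
The terms used so far in this thread (unversioned, missing, obstructed) can be sketched as a comparison of node kinds at one path. This is only an illustration: the Tree mapping, the classify function and the "present" label are assumptions, not names from the WC library.

```python
from typing import Dict, Optional

# Map of path -> node kind ("file", "dir" or "symlink"). A path that is
# absent from the map does not exist in that tree.
Tree = Dict[str, str]


def classify(path: str, working: Tree, actual: Tree) -> Optional[str]:
    """Describe how PATH in ACTUAL relates to PATH in WORKING."""
    working_kind = working.get(path)
    actual_kind = actual.get(path)
    if working_kind is None and actual_kind is None:
        return None            # the path is in neither tree
    if working_kind is None:
        return "unversioned"   # on disk, but not in WORKING
    if actual_kind is None:
        return "missing"       # in WORKING, but not on disk
    if working_kind != actual_kind:
        return "obstructed"    # both have a node, of different kinds
    return "present"           # both have a node of the same kind


working = {"foo": "dir", "foo/bar": "file", "foo/gone": "file"}
actual = {"foo": "dir", "foo/bar": "dir", "foo/new.txt": "file"}
```

With these sample trees, "foo/bar" is the obstructed case: WORKING says file, the disk has a directory.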

Stefan



Posted by Greg Stein <gs...@gmail.com>.
On Tue, Sep 16, 2008 at 8:42 AM, Julian Foad <ju...@btopenworld.com> wrote:
>...
>> The design is currently along the lines of (2), and the user of the
>> API will never have to redirect. You use one of the two tree APIs
>> based on what you're looking for.
>
> I'll take another look when we've got something more concrete to look
> at.

Haven't you looked at libsvn_wc/wc_db.h yet? It's quite concrete :-P
I'm experimenting with some client code using that API now. I expect
the underlying datastore API to look much like that, unless I run into
severe problems writing stuff on top of it.

Cheers,
-g



Posted by Julian Foad <ju...@btopenworld.com>.
On Fri, 2008-09-12 at 09:36 -0700, Greg Stein wrote:
> Hey Julian,
> 
> Thanks for all the thinking about this, but I'm just not seeing it as
> being all that complicated. Near the end of this note, you point out
> more or less what I'm thinking:
> 
> * one BASE tree and its API
> * one API to access the WORKING/ACTUAL "tree" in a blended form

Two trees should be fine as long as the APIs are common wherever
possible. For example, I don't want the "diff BASE against REPOS" API to
be completely different from the "diff WORKING/ACTUAL against REPOS"
API. I was looking to make it simpler by defining independent concepts,
not more complicated!
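
As a rough sketch of what "common APIs" could mean here: one diff function that accepts any tree exposing the same small interface, whether it stands for BASE or for the blended WORKING/ACTUAL view. ReadableTree, DictTree and diff_against_repos are hypothetical names invented for this example, and trees are reduced to a flat path-to-content mapping to keep it short.

```python
from typing import Dict, Mapping, Protocol


class ReadableTree(Protocol):
    """The one interface every kind of tree is assumed to offer."""

    def files(self) -> Mapping[str, bytes]: ...


class DictTree:
    """A trivial in-memory tree, standing in for any tree kind."""

    def __init__(self, files: Mapping[str, bytes]):
        self._files = dict(files)

    def files(self) -> Mapping[str, bytes]:
        return self._files


def diff_against_repos(local: ReadableTree,
                       repos: ReadableTree) -> Dict[str, str]:
    """Summarize how LOCAL differs from REPOS, path by path."""
    local_files, repos_files = local.files(), repos.files()
    changes = {}
    for path in sorted(set(local_files) | set(repos_files)):
        if path not in repos_files:
            changes[path] = "added"
        elif path not in local_files:
            changes[path] = "deleted"
        elif local_files[path] != repos_files[path]:
            changes[path] = "modified"
    return changes


repos = DictTree({"a.txt": b"one\n", "b.txt": b"two\n"})
base = DictTree({"a.txt": b"one\n", "b.txt": b"two\n"})
working_actual = DictTree({"a.txt": b"ONE\n", "c.txt": b"three\n"})
```

The same call serves both comparisons: diff_against_repos(base, repos) and diff_against_repos(working_actual, repos).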

> The WORKING and ACTUAL trees are conceptually different, but are
> generally used together. I've detailed the potential differences in
> wc-ng-design.
> 
> I also tend to disagree with the notion of trying to work with these
> trees independently. All three are tied to a specific path in the
> local filesystem. Given PATH, you will have an associated BASE tree, a
> WORKING tree, and at PATH on the disk, the ACTUAL tree. I don't see a
> need to work with them independently because that will simply never
> happen (nor need to, afaik).

My example didn't strike a chord with you? OK, we'll see how it goes.

> And note that we generally shouldn't try to construct trees (and
> especially not their contents!) in memory since that is unbounded.
> Yes, I know we do, but we should avoid it whenever possible.

Yes - I pointed that out and the relevance of my example didn't depend
on that.

> The design is currently along the lines of (2), and the user of the
> API will never have to redirect. You use one of the two tree APIs
> based on what you're looking for.

I'll take another look when we've got something more concrete to look
at.

Thanks,
- Julian





Posted by Greg Stein <gs...@gmail.com>.
Hey Julian,

Thanks for all the thinking about this, but I'm just not seeing it as
being all that complicated. Near the end of this note, you point out
more or less what I'm thinking:

* one BASE tree and its API
* one API to access the WORKING/ACTUAL "tree" in a blended form

The WORKING and ACTUAL trees are conceptually different, but are
generally used together. I've detailed the potential differences in
wc-ng-design.

I also tend to disagree with the notion of trying to work with these
trees independently. All three are tied to a specific path in the
local filesystem. Given PATH, you will have an associated BASE tree, a
WORKING tree, and at PATH on the disk, the ACTUAL tree. I don't see a
need to work with them independently because that will simply never
happen (nor need to, afaik).

And note that we generally shouldn't try to construct trees (and
especially not their contents!) in memory since that is unbounded.
Yes, I know we do, but we should avoid it whenever possible.

The design is currently along the lines of (2), and the user of the
API will never have to redirect. You use one of the two tree APIs
based on what you're looking for.

Cheers,
-g

On Fri, Sep 12, 2008 at 3:38 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Hi Greg.
>
> Here's a bunch more theoretical waffle from me on the subject! Enjoy :-)
>
>
> STAND-ALONE TREES, OR TREES LINKED INTO A WC
>
> One big distinction I now see is this:
>
> When we define the meaning of one kind of Tree (say WORKING), I prefer
> to define it as a stand-alone entity which can answer questions about
> itself. However, I did say "WORKING gets its file content from ACTUAL",
> which is contrary to that. The alternative design is that the concept of
> "WORKING tree" has meaning only when it is embedded in a WC which links
> it to a corresponding BASE tree and a corresponding ACTUAL tree. In that
> case, it can answer questions that involve getting data from its
> corresponding BASE or ACTUAL tree.
>
> I then went on to suggest as an option that ACTUAL could present the
> properties from WORKING. But it would be a bad idea to have each tree
> depend on the other like this, because it would introduce a cyclic
> dependency between those two trees. That doesn't sound too bad when you
> first think about reading from an existing tree, but when you think
> about preparing some modifications, or especially building a new tree
> from scratch, it would get really hairy.
>
> If we define the trees as stand-alone concepts that can exist with or
> without being linked in to a WC, it becomes relatively easy to build a
> new tree in memory, such as from the "dry run" result of a merge. All
> the tree manipulation functions can be used, and we don't have to link
> this dry-run tree into the WC in order to create and examine it. This
> could remove a whole bunch of complexity that is currently in wc-1.0 to
> handle dry runs of certain client-layer operations such as "merge" which
> the WC would otherwise not need to know so much about.
>
>
> On Thu, 2008-09-11 at 15:27 -0700, Greg Stein wrote:
>> On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <ju...@btopenworld.com> wrote:
>> >...
>> > Maybe you don't have the same definition of "a tree" as I do. I am
>> > assuming we mean the sort of tree that is described by a Subversion
>> > delta editor. A tree of nodes; each node is either a file or a dir; each
>> > node has properties; each dir has 0 or more child nodes; each file has
>> > content which is a blob of 0 or more bytes.
>>
>> Sure...
>>
>> > When you say, "Files/dirs present, but not in WORKING: unversioned
>> > nodes", what about them? They are part of the ACTUAL tree? Yes, I say.
>>
>> ACTUAL, yes.
>>
>> > When you say, "Files/dirs in WORKING, but not present: missing nodes",
>> > what about them? They are part of the ACTUAL tree? Yes, I say.
>>
>> No. Those nodes are *missing* from ACTUAL. They should be there since
>> WORKING says they should be. Thus, they are missing.
>
> I agree. Sorry, I made a careless copy-n-paste-o mistake here. So, we
> agree that such files/dirs are NOT part of the ACTUAL tree.
>
>> > And files/dirs that are in WORKING and present on disk as nodes of the
>> > correct type? Yes, I say. How about you?
>>
>> In both WORKING and ACTUAL, yes.
>>
>> > And files/dirs that are on disk where WORKING says there's a node of the
>> > other type? Yes, I say. How about you?
>>
>> The node is in both WORKING and ACTUAL, but there is now a problem. I
>> don't know that we have a name for this kind of change. This isn't
>> really unversioned or missing... something else.
>
> "Obstructed" is a word that we use.
>
> But, in terms of defining the meaning of the tree kind "ACTUAL", I don't
> see a problem with saying that the ACTUAL tree contains a directory at
> path "foo/bar", while the corresponding WORKING tree contains a file at
> path "foo/bar" (or a directory, or a symlink, or nothing).
>
> When we want to describe the CHANGE of node kind, that's when we're
> considering the relationship between two kinds of tree rooted at the
> same path. As far as I'm concerned, I am presently concentrating on the
> definition of one kind of tree. We'll come to expressing relationships
> between different kinds later.
>
>> Also: note that we should be talking about symlinks, too. They are
>> moving to a first-order node type in the new WC library.
>
> OK. Good.
>
>> > And the properties of each node are?
>>
>> Whatever WORKING says about the properties. ACTUAL cannot represent them.
>
> Ah... Two different levels of abstraction, perhaps.
>
> Your first answer, "Whatever WORKING says", is the answer to "from a
> high level point of view, what are the properties of the working node at
> PATH?" Indeed, from this point of view, ACTUAL does not have the
> answer[1], and so the desired answer needs to be fetched from WORKING
> instead. The question is, which layer redirects and fetches the answer
> from WORKING instead: the caller, or the ACTUAL tree?
>
> Let design 1 be: the caller redirects its question. The caller has to
> know that ACTUAL does not have the working properties. The model could
> be that the ACTUAL tree says "You ask me? I tell you there are no
> properties." The caller knows to ignore this answer and go elsewhere if
> it really wants to find the WORKING properties.[2]
>
> Let design 2 be: the ACTUAL tree redirects to the WORKING tree. The
> model of the ACTUAL tree is that it knows the properties, even though
> under the hood it has to go to the corresponding WORKING tree to find
> them. This model is very different because the trees are not
> independent. Whenever we ask a question about an ACTUAL tree there has
> to be a corresponding WORKING tree linked to it, or provided by the
> caller.
>
> Let's implement the "svn add" subcommand, in pseudo-Python, assuming
> design (1).
>
>  # Add the disk node at PATH to the tree TREE, recursively.
>  # Make a full in-memory representation, including file contents.
>  #   (That's not a good example of how to implement for real.)
>  # Give every node in the tree no properties.
>
>  import os
>
>  def build_actual_tree(tree, path):
>    if os.path.isfile(path):
>      with open(path, 'rb') as f:
>        tree.add_file(path = path,
>                      content = f.read(),
>                      properties = {})
>    elif os.path.isdir(path):
>      tree.add_dir(path = path,
>                   properties = {})
>      # os.listdir() yields bare names, so rebuild each child's path.
>      for child_name in os.listdir(path):
>        build_actual_tree(tree, os.path.join(path, child_name))
>
>  # Take an unversioned "actual" tree NEW_ACTUAL_SUBTREE, and
>  # schedule it for addition in the working copy WC.
>  # Assume NEW_ACTUAL_SUBTREE has no properties, and set the
>  # "working" properties to ones calculated by the auto-props
>  # mechanism.
>
>  def add_unversioned_tree(wc, new_actual_subtree):
>    new_working_subtree = new_actual_subtree.deep_copy()
>    for node in new_working_subtree:
>      assert node.properties == {}
>      node.properties = generate_auto_props(node)
>    new_base_subtree = SvnTreeCreateEmpty()
>    wc.add_subtree(new_base_subtree,
>                   new_working_subtree,
>                   new_actual_subtree)
>
>  # Make the unversioned disk tree at TARGET_PATH become versioned
>  # in the working copy WC which must already include TARGET_PATH's
>  # parent dir as a versioned directory.
>
>  def svn_client_add(wc, target_path):
>    new_actual_subtree = SvnTreeCreateEmpty()
>    build_actual_tree(new_actual_subtree, target_path)
>    add_unversioned_tree(wc, new_actual_subtree)
>
>
> The point I hope it demonstrates is that we can construct and manipulate
> an ACTUAL tree model by itself, and only later link it to a WORKING tree
> and a BASE tree within a WC.
>
> I should repeat the experiment with design (2) and contrast them, but I
> haven't time.
>
>
>> Unversioned nodes (things in ACTUAL, but not WORKING) will (obviously)
>> have no properties.
>
>> >> >> BASE + Subversion-managed changes = WORKING.
>> >> >> WORKING + non-Subversion-managed changes = ACTUAL.
>> >>
>> >> Yup. Note that WORKING *may* include text-mod flags. If somebody does
>> >> an "svn edit", then a flag will get recorded saying "looks like this
>> >> file was modified" (or is likely to have been). But WORKING is purely
>> >> an admin thing. You have to look at ACTUAL to find *real* text mods.
>> >
>> > You're now talking about WORKING including "flags". This is not
>> > impossible: I've wondered whether these "trees" need to be augmented by
>> > bits of metadata like this. So, are you saying that the term "WORKING
>> > tree" defines a set of state recorded in the implementation, rather
>> > than defining an abstract tree concept?
>>
>> Not sure what you mean by "recorded in the implementation". The
>> WORKING tree has a set of flags (and other state) that records its
>> delta from BASE. Simple as that.
>
> Right, in the implementation of the WC library with its metadata store.
> But in the MODEL of the working tree, i.e. what the API user sees when
> asking questions about it, the tree consists of only files and
> directories and properties. Well, and some other metadata about it (its
> relationship to the repository etc.), but it should not expose the flags
> that record its delta from BASE. In other words, I would expect an API
> like
>
>  svn_wc_get_property 'MIME type' of path 'A/foo' in WORKING tree ()
>
> to respond with
>
>  "text/plain"
>
> not with
>
>  "same as in BASE"
>
> Basically, I am talking purely from a black-box perspective as a user of
> the WC library, whereas I think you are talking about what goes on
> inside the WC library.
>
>
> [...]
>> > Whether we need to expose three
>> > trees, to be able to distinguish not only the pristine version but also
>> > between the working version as told to Subversion, and the nodes on disk
>> > as modified outside Subversion, I'm not 100% sure, but it seems
>> > reasonable that we do need to distinguish these.
>>
>> Yes. The BASE is very distinct, and has separate APIs to operate on
>> it. The WORKING/ACTUAL is a much more grey boundary, and the API
>> doesn't try to expose them as two entirely separate trees.
>
> I was trying to say we should expose three trees (BASE, WORKING, ACTUAL)
> separately, but you're saying we should expose two (BASE,
> WORKING/ACTUAL). You may well be right. In that case, we need an
> out-of-band (out-of-tree) mechanism for describing the differences
> between WORKING and ACTUAL.
>
>> Definitely seems that it would be a Good Thing to enumerate how these
>> two trees can differ. It is a finite list. I'll update the doc with
>> that.
>>
>> Cheers,
>> -g
>
> [1] assuming that the definitions of WORKING and ACTUAL are, as we have
> mostly been assuming, fairly closely tied to what data is stored where
> in the implementation. For example, saying that ACTUAL does not directly
> "have" the properties because they are not operating-system artifacts.
>
> [2] Or the model could be that the ACTUAL tree says "You ask me for
> properties? Don't be daft. ERROR!"
>
> - Julian
>
>
>



Posted by Julian Foad <ju...@btopenworld.com>.
Hi Greg.

Here's a bunch more theoretical waffle from me on the subject! Enjoy :-)


STAND-ALONE TREES, OR TREES LINKED INTO A WC

One big distinction I now see is this:

When we define the meaning of one kind of Tree (say WORKING), I prefer
to define it as a stand-alone entity which can answer questions about
itself. However, I did say "WORKING gets its file content from ACTUAL",
which is contrary to that. The alternative design is that the concept of
"WORKING tree" has meaning only when it is embedded in a WC which links
it to a corresponding BASE tree and a corresponding ACTUAL tree. In that
case, it can answer questions that involve getting data from its
corresponding BASE or ACTUAL tree.

I then went on to suggest as an option that ACTUAL could present the
properties from WORKING. But it would be a bad idea to have each tree
depend on the other like this, because it would introduce a cyclic
dependency between those two trees. That doesn't sound too bad when you
first think about reading from an existing tree, but when you think
about preparing some modifications, or especially building a new tree
from scratch, it would get really hairy.

If we define the trees as stand-alone concepts that can exist with or
without being linked in to a WC, it becomes relatively easy to build a
new tree in memory, such as from the "dry run" result of a merge. All
the tree manipulation functions can be used, and we don't have to link
this dry-run tree into the WC in order to create and examine it. This
could remove a whole bunch of complexity that is currently in wc-1.0 to
handle dry runs of certain client-layer operations such as "merge" which
the WC would otherwise not need to know so much about.


On Thu, 2008-09-11 at 15:27 -0700, Greg Stein wrote:
> On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <ju...@btopenworld.com> wrote:
> >...
> > Maybe you don't have the same definition of "a tree" as I do. I am
> > assuming we mean the sort of tree that is described by a Subversion
> > delta editor. A tree of nodes; each node is either a file or a dir; each
> > node has properties; each dir has 0 or more child nodes; each file has
> > content which is a blob of 0 or more bytes.
> 
> Sure...
> 
> > When you say, "Files/dirs present, but not in WORKING: unversioned
> > nodes", what about them? They are part of the ACTUAL tree? Yes, I say.
> 
> ACTUAL, yes.
> 
> > When you say, "Files/dirs in WORKING, but not present: missing nodes",
> > what about them? They are part of the ACTUAL tree? Yes, I say.
> 
> No. Those nodes are *missing* from ACTUAL. They should be there since
> WORKING says they should be. Thus, they are missing.

I agree. Sorry, I made a careless copy-n-paste-o mistake here. So, we
agree that such files/dirs are NOT part of the ACTUAL tree.

> > And files/dirs that are in WORKING and present on disk as nodes of the
> > correct type? Yes, I say. How about you?
> 
> In both WORKING and ACTUAL, yes.
> 
> > And files/dirs that are on disk where WORKING says there's a node of the
> > other type? Yes, I say. How about you?
> 
> The node is in both WORKING and ACTUAL, but there is now a problem. I
> don't know that we have a name for this kind of change. This isn't
> really unversioned or missing... something else.

"Obstructed" is a word that we use.

But, in terms of defining the meaning of the tree kind "ACTUAL", I don't
see a problem with saying that the ACTUAL tree contains a directory at
path "foo/bar", while the corresponding WORKING tree contains a file at
path "foo/bar" (or a directory, or a symlink, or nothing).

When we want to describe the CHANGE of node kind, that's when we're
considering the relationship between two kinds of tree rooted at the
same path. As far as I'm concerned, I am presently concentrating on the
definition of one kind of tree. We'll come to expressing relationships
between different kinds later.

> Also: note that we should be talking about symlinks, too. They are
> moving to a first-order node type in the new WC library.

OK. Good.

> > And the properties of each node are?
> 
> Whatever WORKING says about the properties. ACTUAL cannot represent them.

Ah... Two different levels of abstraction, perhaps.

Your first answer, "Whatever WORKING says", is the answer to "from a
high level point of view, what are the properties of the working node at
PATH?" Indeed, from this point of view, ACTUAL does not have the
answer[1], and so the desired answer needs to be fetched from WORKING
instead. The question is, which layer redirects and fetches the answer
from WORKING instead: the caller, or the ACTUAL tree?

Let design 1 be: the caller redirects its question. The caller has to
know that ACTUAL does not have the working properties. The model could
be that the ACTUAL tree says "You ask me? I tell you there are no
properties." The caller knows to ignore this answer and go elsewhere if
it really wants to find the WORKING properties.[2]

Let design 2 be: the ACTUAL tree redirects to the WORKING tree. The
model of the ACTUAL tree is that it knows the properties, even though
under the hood it has to go to the corresponding WORKING tree to find
them. This model is very different because the trees are not
independent. Whenever we ask a question about an ACTUAL tree there has
to be a corresponding WORKING tree linked to it, or provided by the
caller.
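
The difference between the two designs can be sketched for a single property lookup. All class and method names below (WorkingTree, ActualTreeDesign1, ActualTreeDesign2, get_properties) are assumptions made for this illustration and do not come from the WC library.

```python
from typing import Dict

Props = Dict[str, str]


class WorkingTree:
    """Holds the properties that Subversion has recorded per path."""

    def __init__(self, props_by_path: Dict[str, Props]):
        self._props = props_by_path

    def get_properties(self, path: str) -> Props:
        return dict(self._props.get(path, {}))


class ActualTreeDesign1:
    """Design 1: a stand-alone tree that reports no properties."""

    def get_properties(self, path: str) -> Props:
        return {}


class ActualTreeDesign2:
    """Design 2: a tree linked to WORKING, to which it redirects."""

    def __init__(self, working: WorkingTree):
        self._working = working

    def get_properties(self, path: str) -> Props:
        return self._working.get_properties(path)


working = WorkingTree({"A/foo": {"svn:mime-type": "text/plain"}})

# Design 1: the caller knows ACTUAL has no properties and asks WORKING.
actual1 = ActualTreeDesign1()
props_design1 = working.get_properties("A/foo")

# Design 2: the caller asks ACTUAL, which redirects under the hood.
actual2 = ActualTreeDesign2(working)
props_design2 = actual2.get_properties("A/foo")
```

Note that ActualTreeDesign1 can be constructed with no arguments, while ActualTreeDesign2 cannot exist without a WORKING tree to point at. That is the dependency discussed above.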

Let's implement the "svn add" subcommand, in pseudo-Python, assuming
design (1).

  # Add the disk node at PATH to the tree TREE, recursively.
  # Make a full in-memory representation, including file contents.
  #   (That's not a good example of how to implement for real.)
  # Give every node in the tree no properties.

  import os

  def build_actual_tree(tree, path):
    if os.path.isfile(path):
      with open(path, 'rb') as f:
        tree.add_file(path = path,
                      content = f.read(),
                      properties = {})
    elif os.path.isdir(path):
      tree.add_dir(path = path,
                   properties = {})
      # os.listdir() yields bare names, so rebuild each child's path.
      for child_name in os.listdir(path):
        build_actual_tree(tree, os.path.join(path, child_name))

  # Take an unversioned "actual" tree NEW_ACTUAL_SUBTREE, and
  # schedule it for addition in the working copy WC.
  # Assume NEW_ACTUAL_SUBTREE has no properties, and set the
  # "working" properties to ones calculated by the auto-props
  # mechanism.

  def add_unversioned_tree(wc, new_actual_subtree):
    new_working_subtree = new_actual_subtree.deep_copy()
    for node in new_working_subtree:
      assert node.properties == {}
      node.properties = generate_auto_props(node)
    new_base_subtree = SvnTreeCreateEmpty()
    wc.add_subtree(new_base_subtree,
                   new_working_subtree,
                   new_actual_subtree)

  # Make the unversioned disk tree at TARGET_PATH become versioned
  # in the working copy WC which must already include TARGET_PATH's
  # parent dir as a versioned directory.

  def svn_client_add(wc, target_path):
    new_actual_subtree = SvnTreeCreateEmpty()
    build_actual_tree(new_actual_subtree, target_path)
    add_unversioned_tree(wc, new_actual_subtree)


The point I hope it demonstrates is that we can construct and manipulate
an ACTUAL tree model by itself, and only later link it to a WORKING tree
and a BASE tree within a WC.

I should repeat the experiment with design (2) and contrast them, but I
haven't time.
      

> Unversioned nodes (things in ACTUAL, but not WORKING) will (obviously)
> have no properties.

> >> >> BASE + Subversion-managed changes = WORKING.
> >> >> WORKING + non-Subversion-managed changes = ACTUAL.
> >>
> >> Yup. Note that WORKING *may* include text-mod flags. If somebody does
> >> an "svn edit", then a flag will get recorded saying "looks like this
> >> file was modified" (or is likely to have been). But WORKING is purely
> >> an admin thing. You have to look at ACTUAL to find *real* text mods.
> >
> > You're now talking about WORKING including "flags". This is not
> > impossible: I've wondered whether these "trees" need to be augmented by
> > bits of metadata like this. So, are you saying that the term "WORKING
> > tree" defines a set of state recorded in the implementation, rather
> > than defining an abstract tree concept?
> 
> Not sure what you mean by "recorded in the implementation". The
> WORKING tree has a set of flags (and other state) that records its
> delta from BASE. Simple as that.

Right, in the implementation of the WC library with its metadata store.
But in the MODEL of the working tree, i.e. what the API user sees when
asking questions about it, the tree consists of only files and
directories and properties. Well, and some other metadata about it (its
relationship to the repository etc.), but it should not expose the flags
that record its delta from BASE. In other words, I would expect an API
like

  svn_wc_get_property 'MIME type' of path 'A/foo' in WORKING tree ()

to respond with

  "text/plain"

not with

  "same as in BASE"

Basically, I am talking purely from a black-box perspective as a user of
the WC library, whereas I think you are talking about what goes on
inside the WC library.
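
A minimal sketch of that black-box view, assuming the library stores WORKING as a set of recorded changes on top of BASE. WorkingTreeView, its constructor arguments and get_property are hypothetical names for this example, and svn:eol-style is used only as sample data.

```python
from typing import Dict, Optional

# A recorded change maps a property name to its new value, or to None
# when the property has been deleted in WORKING.
Changes = Dict[str, Optional[str]]


class WorkingTreeView:
    """Answers property questions about WORKING with real values."""

    def __init__(self,
                 base_props: Dict[str, Dict[str, str]],
                 recorded_changes: Dict[str, Changes]):
        self._base = base_props
        self._changes = recorded_changes

    def get_property(self, path: str, name: str) -> Optional[str]:
        changes = self._changes.get(path, {})
        if name in changes:
            return changes[name]
        # No recorded change: resolve "same as in BASE" internally,
        # so the caller only ever sees the resulting value.
        return self._base.get(path, {}).get(name)


view = WorkingTreeView(
    base_props={"A/foo": {"svn:mime-type": "text/plain"}},
    recorded_changes={"A/foo": {"svn:eol-style": "native"}},
)
```

The caller asking for svn:mime-type on 'A/foo' gets "text/plain" and never learns whether that value came from BASE or from a recorded change.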


[...]
> > Whether we need to expose three
> > trees, to be able to distinguish not only the pristine version but also
> > between the working version as told to Subversion, and the nodes on disk
> > as modified outside Subversion, I'm not 100% sure, but it seems
> > reasonable that we do need to distinguish these.
> 
> Yes. The BASE is very distinct, and has separate APIs to operate on
> it. The WORKING/ACTUAL is a much more grey boundary, and the API
> doesn't try to expose them as two entirely separate trees.

I was trying to say we should expose three trees (BASE, WORKING, ACTUAL)
separately, but you're saying we should expose two (BASE,
WORKING/ACTUAL). You may well be right. In that case, we need an
out-of-band (out-of-tree) mechanism for describing the differences
between WORKING and ACTUAL.

> Definitely seems that it would be a Good Thing to enumerate how these
> two trees can differ. It is a finite list. I'll update the doc with
> that.
> 
> Cheers,
> -g

[1] assuming that the definitions of WORKING and ACTUAL are, as we have
mostly been assuming, fairly closely tied to what data is stored where
in the implementation. For example, saying that ACTUAL does not directly
"have" the properties because they are not operating-system artifacts.

[2] Or the model could be that the ACTUAL tree says "You ask me for
properties? Don't be daft. ERROR!"

- Julian



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: WC-NG: the trees BASE, WORKING and ACTUAL [was: svn commit: r33021 - branches/explore-wc/subversion/libsvn_wc]

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Sep 11, 2008 at 1:38 PM, Julian Foad <ju...@btopenworld.com> wrote:
>...
> Maybe you don't have the same definition of "a tree" as I do. I am
> assuming we mean the sort of tree that is described by a Subversion
> delta editor. A tree of nodes; each node is either a file or a dir; each
> node has properties; each dir has 0 or more child nodes; each file has
> content which is a blob of 0 or more bytes.

Sure...

> When you say, "Files/dirs present, but not in WORKING: unversioned
> nodes", what about them? They are part of the ACTUAL tree? Yes, I say.

ACTUAL, yes.

> When you say, "Files/dirs in WORKING, but not present: missing nodes",
> what about them? They are part of the ACTUAL tree? Yes, I say.

No. Those nodes are *missing* from ACTUAL. They should be there since
WORKING says they should be. Thus, they are missing.

> And files/dirs that are in WORKING and present on disk as nodes of the
> correct type? Yes, I say. How about you?

In both WORKING and ACTUAL, yes.

> And files/dirs that are on disk where WORKING says there's a node of the
> other type? Yes, I say. How about you?

The node is in both WORKING and ACTUAL, but there is now a problem. I
don't know that we have a name for this kind of change. This isn't
really unversioned or missing... something else.

Also: note that we should be talking about symlinks, too. They are
moving to a first-order node type in the new WC library.

> And the properties of each node are?

Whatever WORKING says about the properties. ACTUAL cannot represent them.

Unversioned nodes (things in ACTUAL, but not WORKING) will (obviously)
have no properties.
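
The cases above can be sketched as a simple comparison of the two
trees. This is only an illustration under assumptions: each tree is
reduced to a map from path to node kind, the function name is made up,
and "kind-mismatch" is a placeholder label for the case that does not
have a name yet.

```python
from enum import Enum


class Kind(Enum):
    FILE = "file"
    DIR = "dir"
    SYMLINK = "symlink"


def classify_nodes(working, actual):
    """Compare WORKING and ACTUAL, each given as {path: Kind}.

    Returns {path: label}. The labels follow the cases discussed above;
    "kind-mismatch" is a placeholder, not an established term.
    """
    labels = {}
    for path in sorted(working.keys() | actual.keys()):
        w_kind = working.get(path)
        a_kind = actual.get(path)
        if w_kind is None:
            labels[path] = "unversioned"    # in ACTUAL, not in WORKING
        elif a_kind is None:
            labels[path] = "missing"        # in WORKING, absent from ACTUAL
        elif w_kind is not a_kind:
            labels[path] = "kind-mismatch"  # in both, but of different kinds
        else:
            labels[path] = "present"        # in both, kinds agree
    return labels


working = {"A": Kind.DIR, "A/foo": Kind.FILE, "A/bar": Kind.FILE,
           "A/link": Kind.SYMLINK}
actual = {"A": Kind.DIR, "A/foo": Kind.FILE, "A/link": Kind.FILE,
          "A/scratch.txt": Kind.FILE}
labels = classify_nodes(working, actual)
```

Properties do not appear here at all, since only WORKING can say
anything about them.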

>> >> BASE + Subversion-managed changes = WORKING.
>> >> WORKING + non-Subversion-managed changes = ACTUAL.
>>
>> Yup. Note that WORKING *may* include text-mod flags. If somebody does
>> an "svn edit", then a flag will get recorded saying "looks like this
>> file was modified" (or is likely to have been). But WORKING is purely
>> an admin thing. You have to look at ACTUAL to find *real* text mods.
>
> You're now talking about WORKING including "flags". This is not
> impossible: I've wondered whether these "trees" need to be augmented by
> bits of metadata like this. So, are you saying that the term "WORKING
> tree" defines a set of state recorded in the implementation, rather
> than defining an abstract tree concept?

Not sure what you mean by "recorded in the implementation". The
WORKING tree has a set of flags (and other state) that records its
delta from BASE. Simple as that.

>...
>> > represent? It seems to me that it represents an implementation artifact:
>> > the set of modifications that Subversion records explicitly in its
>> > meta-data rather than the modifications that Subversion scans for
>> > dynamically. That's not a distinction of much interest to the higher
>> > layers of software.
>>
>> WORKING is entirely an admin thing. To find the complete set of
>> modifications, you also have to look at the ACTUAL files which
>> correspond to WORKING files. (you don't have to examine the entire
>> ACTUAL tree! ... sometimes unversioned files are irrelevant)
>
> Ahh... I was envisaging these "kinds of tree" as concepts that would be
> visible through the API. That there would be ways to ask through the
> API, "what are the differences between our WORKING tree and our ACTUAL
> tree?" (so I can remind the user that they need to issue some Subversion
> tree-rearrangement commands), or "what is the value of svn:mime-type on
> the WORKING version of file 'foo'?" (so I can display it in an
> appropriate editor).
>
> That's where I want there to be a clear external concept of "I'm asking
> about the user's working version" versus "Now I'm asking the same
> question about the pristine version".

Right. And the API embodies that. There are functions with "_base_" in
the name. Those operate *only* on the BASE tree (aka "pristine").

There are other APIs that operate on the WORKING/ACTUAL tree. The line
here gets a bit fuzzier. The ACTUAL tree can only differ from WORKING
in very limited ways. And the idea of a "modified file in the WORKING
tree" comes from the ACTUAL tree: that is where the contents are
located, where they get modified, and where the state information is
recorded.
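
As a rough sketch of why the answer has to come from ACTUAL: whatever
WORKING has recorded, a real text mod is only known once the on-disk
contents have been compared with the pristine text. The helper names
below are hypothetical, and comparing a SHA-1 of the whole file is an
assumption made for illustration, not a description of how the library
detects modifications.

```python
import hashlib
from pathlib import Path


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of a blob of bytes."""
    return hashlib.sha1(data).hexdigest()


def is_text_modified(actual_path: Path, pristine_sha1: str) -> bool:
    """True if the on-disk (ACTUAL) contents differ from the pristine text.

    pristine_sha1 stands in for whatever the BASE tree records about the
    file's contents; only the file on disk can say what is there now.
    """
    return sha1_of(actual_path.read_bytes()) != pristine_sha1
```

A caller would pass the path of the versioned file in the working copy
together with the checksum held for its BASE text.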

> Whether we need to expose three
> trees, to be able to distinguish not only the pristine version but also
> between the working version as told to Subversion, and the nodes on disk
> as modified outside Subversion, I'm not 100% sure, but it seems
> reasonable that we do need to distinguish these.

Yes. The BASE is very distinct, and has separate APIs to operate on
it. The WORKING/ACTUAL is a much more grey boundary, and the API
doesn't try to expose them as two entirely separate trees.

Definitely seems that it would be a Good Thing to enumerate how these
two trees can differ. It is a finite list. I'll update the doc with
that.

Cheers,
-g
