Posted to dev@subversion.apache.org by Greg Stein <gs...@lyra.org> on 2001/01/01 00:09:01 UTC

Re: (FS) operational question

On Sun, Dec 31, 2000 at 06:28:48PM -0500, Greg Hudson wrote:
> Greg Stein wrote:
> >> If we do lazy updating on the client, then we get fragmentation. If
> >> we want to update them all, then the server response grows larger.
> 
> (We're talking about the version resources here, not the version in
> the entries file, yes?)

The version resource URLs, yes.

[ quick aside: a version resource is a specific instance of a
  version-controlled resource (VCR) for a given version. e.g. "revision 6 of
  foo.c". we have one VCR ("foo.c") and multiple version resources. The
  version resource URL is handed out by the server to tell the client where
  to fetch the version resource. ]

Assume that the revision number is embedded in the version resource URL.
Since the client cannot rebuild the URL when a revision-change occurs, it
must receive those from the server.

1) we receive new version resource URLs for every file/dir in the WC each
   time a revision is created. obviously, the server response can now grow
   to be quite large.

or

2) we only update version resource URLs for the things that change. since we
   must update version resource URLs and revision numbers in tandem, this
   means that we do not update revision numbers for the things that don't
   change. the side effect is that we now get a scattering of revision
   numbers in the WC, generating a large client-state report during an
   "update" process.

> What's the downside of fragmentation?

The size of the report that details the client-state during an update. We
have to tell the server "here is our state" and the server responds with the
things that will need to be updated.

If the WC is all one revision, then we tell the server "I'm at revision 67"
and that's it. If the client is fragmented, then we say "the root is 67,
root/foo is 68, root/foo/bar.c is 73, root/foo/baz.c is 71, ..."
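
A minimal sketch of why fragmentation inflates that report, in Python. The `entries` dict and the `client_state_report` helper are hypothetical names made up for illustration; this is not the real libsvn_wc report format, only a model in which a path is reported when its revision differs from its nearest reported ancestor.

```python
def client_state_report(entries):
    """Return the {path: revision} pairs a client would have to send.

    `entries` maps working-copy paths to revisions; "" is the WC root.
    A path is listed only if its revision differs from the one it would
    inherit from its nearest reported ancestor.
    """
    reported = {}
    for path in sorted(entries):          # ancestors sort before descendants
        parent = path
        inherited = None
        while parent and inherited is None:
            parent = parent.rpartition("/")[0]
            inherited = reported.get(parent)
        if entries[path] != inherited:
            reported[path] = entries[path]
    return reported


uniform = {"": 67, "foo": 67, "foo/bar.c": 67, "foo/baz.c": 67}
fragmented = {"": 67, "foo": 68, "foo/bar.c": 73, "foo/baz.c": 71}
```

With these inputs the uniform WC collapses to a single entry for the root, while the fragmented one needs an entry for every path.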

The simple summary: if a revision number is embedded within the version
resource URL, then we end up with large network requests or responses. Pick
one :-)

My proposed solution is to use the ID within the version resource URL. That
would eliminate the need to update them (keeping the server response small),
yet we can still update revision numbers within the WC (keeping the request
size small).

The cost of this change is that mod_dav_svn must be able to open nodes by
ID (used during a fetch). During a commit, I'd open a tree for the latest
revision, open the node in question, and validate that the provided ID
matches the ID of what I just opened (if not, the client is not up to date).

The nice side effect (which I just realized) is that the client can make
changes against v67 of a file and commit it, even though the repository is
at v1000. If that file hasn't changed, then mod_dav_svn / FS would consider
it up to date since the ID matched.

In the revision-in-the-URL approach, the client would say "I am changing
v67" and the server would need to open *two* nodes (v67 and v1000) and check
whether their IDs match.
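
The two commit-time checks can be compared with a toy model. Everything here (`TinyFS`, `open_path`, the string IDs) is a hypothetical stand-in written for illustration, not the actual FS or mod_dav_svn API; the `opens` counter exists only to make the one-open versus two-open difference visible.

```python
class TinyFS:
    """Toy stand-in for the FS: (rev, path) -> node ID."""

    def __init__(self):
        self.ids = {}       # (rev, path) -> node ID
        self.youngest = 0   # latest revision recorded
        self.opens = 0      # how many nodes have been opened

    def commit(self, rev, tree):
        """Record revision `rev`; `tree` maps path -> node ID."""
        for path, node_id in tree.items():
            self.ids[(rev, path)] = node_id
        self.youngest = rev

    def open_path(self, rev, path):
        """Open a node by (rev, path) and return its ID."""
        self.opens += 1
        return self.ids[(rev, path)]


def up_to_date_by_id(fs, path, client_id):
    # ID in the URL: open the node in the latest tree, compare with the
    # ID the client handed back.
    return fs.open_path(fs.youngest, path) == client_id


def up_to_date_by_rev(fs, path, client_rev):
    # Revision in the URL: open the node at both revisions, compare IDs.
    return fs.open_path(client_rev, path) == fs.open_path(fs.youngest, path)
```

If foo.c keeps the same ID from revision 67 to revision 1000, both checks accept a change made against v67, but the ID-based check opens one node where the revision-based check opens two.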

> > It's that last sentence I'm not believing... Why does the server
> > response have to get bigger?  After an update, the client knows that
> > *every* entity within the update's "purview" is now at the new
> > revision number.  But only entities that actually changed in the
> > update need space in the server's response.
> 
> > You don't need to update those URLs; libsvn_wc can do it, you don't
> > worry about it.
> 
> libsvn_wc cannot do it; a version resource URL is opaque and cannot be
> operated on by the client.

Correct. Based on the assumption:

> (If we assume Subversion conventions for
> those URLs, breaking interoperability, then we don't need to store
> version URLs in the first place.  Although... our other bit of
> non-interoperability is in "update" as well; we could declare that
> particular operation to be non-interoperable.)

The use of DAV is to promote *future* interoperability. When that future
arrives is anybody's guess. But I'm thinking it will be sooner rather than
later. I've already had an inquiry from somebody asking how "thick" the
server is because they would like to make their server SVN-compatible (e.g.
our client would operate against their server, too). Granted, they could
also build in specific SVN behaviors, but they would much prefer to stick to
plain old DAV wherever possible.

Having a server that is as close as possible to the DeltaV spec is also
goodness because of the growth in DAV clients.


All that said: I know people are concerned about whether the use of DAV is
impacting SVN's design. In this case, it is to a *very* limited extent. As I
mentioned: I think the only new API needed is to open a node by id. That
function already exists within the FS, but is currently private. I asked Jim
about exposing it once before, but he said the benefit was small (open by
path was nearly as fast as open by ID). There was also something related to
clones, but that doesn't apply in our case: we *don't* want to see a clone.

The extra API doesn't change the model.

Karl also questioned whether the insertion of the ID into the URL locks us
into a particular server model. Absolutely not. That is *exactly* why the
URLs are provided *only* by the server. The client doesn't know the ID is in
there, and it makes no assumptions that imply the ID is in there. If we
change the model on the server, we simply return different version resource
URLs. Simple as that, and the client is none the wiser.
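
A small sketch of that opacity, assuming made-up URL layouts: the `/repos/ver/...` paths and the `Server` and `Client` classes below are hypothetical and are not what mod_dav_svn actually emits. The point is only that the client stores and returns the string verbatim, so the server can switch schemes without any client change.

```python
class Server:
    """Toy server; `scheme` selects how version resource URLs are built."""

    def __init__(self, scheme):
        self.scheme = scheme          # "id" or "rev"

    def version_url(self, path, rev, node_id):
        if self.scheme == "id":
            return "/repos/ver/" + node_id
        return "/repos/ver/%d/%s" % (rev, path)


class Client:
    """Keeps whatever URL the server hands out and sends it back as-is."""

    def __init__(self):
        self.cookies = {}

    def remember(self, path, url):
        self.cookies[path] = url      # never parsed, never rebuilt

    def url_for_commit(self, path):
        return self.cookies[path]
```

The same `Client` works against `Server("id")` and `Server("rev")`; only the server knows which fields are inside the URL.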

In summary: the ID-based version resource URL requires exposing an API on
the server, and it does not lock us into a particular server model. It
minimizes the request and response sizes for our network operations (we
always say the network is where the time goes, and trade off other stuff
against it; well, why make it longer than necessary? :-). Using the ID also
gives us a better mapping against the DeltaV model, which promotes future
interop.

Shorter summary: we have many benefits, few costs.

My initial query was whether this would work within the FS. Nobody is
considering that, only whether it is "right" or not. Can I stop explaining
why it is right, and get back to whether the use of the ID is feasible for
the FS?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: (FS) operational question

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Dec 31, 2000 at 07:50:25PM -0500, Greg Hudson wrote:
> > If the revision number is to be updated for an entry, then we must
> > also update the version resource URL. If not... We In Big Heap-um
> > Trouble.
> 
> Sorry to continue distracting, but why?  We know that the version
> resource URL is still for the same node-revision as the updated entry,
> by virtue of the fact that the server didn't report an update for that
> file or directory.  What trouble do we get into by updating the entry
> but holding onto a non-canonical version resource URL?

We use it during the commit process. With the old revision in there, it will
appear out of date.

Hmm... but if the server is doing the double-open-node test, then it will
see there isn't a problem.

It seems you're probably right... at commit time, we should be okay, despite
not having the "right" URL. Given that all other ops are read-only, and that
it would read the correct node, then those may be okay also.

Something might turn up, but it seems fine on the surface. However, there
are still modelling issues within DeltaV with the embedded-revision
semantic. The root of the issue is that you check in a change to foo.c, yet
you get new version resources for *other* things. From a modelling
standpoint, the ideal is the embedded-ID semantic -- it matches DeltaV, we
always have the canonical URL (in the update case above), and we end up with
the minimized sizes for the request/response.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: (FS) operational question

Posted by Karl Fogel <kf...@galois.collab.net>.
Greg Stein <gs...@lyra.org> writes:
> Another point about why 1-to-1 mapping of URL to resource is handy: caching
> proxies. Even though v67 and v73 might be the same, the proxy will still
> need to fetch it.

Hmmm... (see below)...

> > The client isn't really using that information; it doesn't get to
> > actually "know" the node revision number.  It's just acting as an
> > information store for the server.
> 
> Agreed.
> 
> > Although we do open ourselves up to
> > the possibility of poorly written clients which irresponsibly dissect
> > the supposedly-opaque version resource URL.
> 
> We'll cover our butts quite a bit because other clients will want to use
> libsvn_client (or even libsvn_ra_dav). But your point is still valid.
> 
> Even so... I happen to have no sympathy for those guys :-)  F'em. :-)

Heh.  I'm beginning to think using the node revision in the Version
Resource URL isn't so icko after all.

Greg S, your call, you understand the issues best here.  Now that I'm
no longer confused between Version Resource URL and entries file
revision numbers, I feel less worried about what goes on in the VR
URL. :-)

-K

Re: (FS) operational question

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Jan 02, 2001 at 12:34:48AM -0500, Greg Hudson wrote:
> >    A. If we store a revision number in the opaque Version Resource
> >       URL, then it can become out of date with the revision as stored
> >       in the entries file.
> [...]
> > However, it turns out (A) is okay, albeit still not ideal, because
> > that out-of-dateness isn't actually going to hurt anything.
> 
> It might affect interoperability with other DeltaV clients in the long
> term.  Although Subversion thinks in terms of "one revision for the
> entire tree, with allowed variances in the working copy and lots of
> optimizations to minimize the cost of going between revisions," DeltaV
> thinks in terms of independently versioned resources--more like RCS.
> Correct me if I'm wrong, Greg.  (As one of the Gregs, I get to say
> "Greg" without being ambiguous.)

You're right... Greg :-).

Actually, Subversion's concept of "one revision for the entire tree" has a
correspondence in DeltaV: they're called "baselines". DeltaV clients that
understand baselines will be able to work very well against an SVN server.
Baseline "7" maps directly to revision 7 of our tree.

[ there is a way to fetch a baseline by name; we'll assign revision numbers
  as names; if we tag/label a revision, that will actually be another name
  for a baseline; note that when a client asks for -r7, we'll consult the
  baseline to get it. ]
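
The naming scheme in that aside can be modelled as a plain name table. The `Baselines` class is a hypothetical illustration, not a DeltaV or Subversion API: revision numbers are registered as names, and a tag or label is just one more name pointing at the same baseline.

```python
class Baselines:
    """Toy name table: baseline name -> revision of the tree."""

    def __init__(self):
        self.by_name = {}

    def add_revision(self, rev):
        # The revision number itself serves as the baseline's name.
        self.by_name[str(rev)] = rev

    def label(self, name, rev):
        # A tag/label is another name for an existing baseline.
        self.by_name[name] = self.by_name[str(rev)]

    def lookup(self, name):
        return self.by_name[name]
```

A request for -r7 would then be a lookup of the name "7", and a lookup of a label attached to revision 7 lands on the same baseline.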

[ it is interesting to note that (it appears) nothing in the DAV code
  (client or server) really cares about revision numbers or their sequential
  nature. The revisions could be "alpha" and "beta" for all it cares.
  post-V1, I want to look into this some more to see just how independent we
  are; if we can manage to make the (entire) client independent, then it
  could work against any DeltaV server that does baselines. ]

> So, in Subversion, when we change file A but not file B, it's natural
> to assume that the canonical most recent revision of file B has
> increased.  But in DeltaV, that is not natural.

Well-stated.

Geoff Clemm has expressed a concern that the "creation" of the extra version
for file B could create an interop problem. I'm not entirely sure that I
agree, but Greg's "natural-ness" is a good way to describe my unease with
the rev of file B.

Another point about why 1-to-1 mapping of URL to resource is handy: caching
proxies. Even though v67 and v73 might be the same, the proxy will still
need to fetch it.
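
A sketch of the proxy effect, under the assumption that the proxy caches purely by URL. The `CachingProxy` class and the `/ver/...` URLs are hypothetical, chosen only to contrast a revision-based layout with an ID-based one.

```python
class CachingProxy:
    """Toy HTTP cache keyed purely by URL."""

    def __init__(self, origin):
        self.origin = origin          # callable: url -> body
        self.cache = {}
        self.origin_fetches = 0

    def get(self, url):
        if url not in self.cache:
            self.origin_fetches += 1
            self.cache[url] = self.origin(url)
        return self.cache[url]


# foo.c did not change between v67 and v73, so all three URLs below
# name the same bytes.
ORIGIN = {
    "/ver/67/foo.c": "int main() {}",
    "/ver/73/foo.c": "int main() {}",
    "/ver/id-a": "int main() {}",
}

by_rev = CachingProxy(ORIGIN.__getitem__)
by_rev.get("/ver/67/foo.c")
by_rev.get("/ver/73/foo.c")   # same bytes, different URL: origin is hit again

by_id = CachingProxy(ORIGIN.__getitem__)
by_id.get("/ver/id-a")
by_id.get("/ver/id-a")        # same URL: served from the cache
```

With revision-based URLs the proxy goes to the origin twice for identical content; with one URL per resource it goes once.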

> >    B. But if we store a hard node revision instead... Well, maybe we
> >       can't think of a bad effect up front, but I'm sure we all agree
> >       there's something deeply icky about storing that stuff anywhere
> >       on the client side.  It's internal to the fs and the fs's
> >       immediate callers.  Using it on the client side seems guaranteed
> >       to lead to badness down the road.
> 
> The client isn't really using that information; it doesn't get to
> actually "know" the node revision number.  It's just acting as an
> information store for the server.

Agreed.

> Although we do open ourselves up to
> the possibility of poorly written clients which irresponsibly dissect
> the supposedly-opaque version resource URL.

We'll cover our butts quite a bit because other clients will want to use
libsvn_client (or even libsvn_ra_dav). But your point is still valid.

Even so... I happen to have no sympathy for those guys :-)  F'em. :-)

> > You are fluent in both, but most Subversion developers are only
> > fluent in the first (with the exception of Greg Hudson, maybe).
> 
> I'm not really fluent, just cocky. :)

hehe... Well, you're doing quite well from where I'm sitting. And if you get
out of line, then I'll just smack ya down :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: (FS) operational question

Posted by Karl Fogel <kf...@galois.collab.net>.
Greg Stein <gs...@lyra.org> writes:
> >    B. But if we store a hard node revision instead... Well, maybe we
> >       can't think of a bad effect up front, but I'm sure we all agree
> >       there's something deeply icky about storing that stuff anywhere
> >       on the client side.  It's internal to the fs and the fs's
> >       immediate callers.  Using it on the client side seems guaranteed
> >       to lead to badness down the road.
> 
> This part of "we" doesn't agree it is a problem. The FS exposes a concept of
> unique nodes. These are identified with a thing called an "ID". If the
> identification mechanism ever changes (e.g. the IDs are no longer
> persistent), then we simply update how the version resource URL is
> constructed. Let's say the ID concept is removed; fine, we switch to
> rev/path (or whatever the heck is introduced in its place).
> 
> But the true point is that it doesn't matter what is in that URL, as long as
> the client treats it opaquely. We have different optimizations and tradeoffs
> that can be made based on its construction, but the system still works.

Okay, gotcha.

> The question is still based on the original subject "(FS) operational
> question" -- can we introduce an extra API so that we can access a node more
> directly? Right now, we have: (rev, path) -> ID -> node. The modelling will
> be much more ideal if we can use the ID.

Yah.  Jim will be back on the 3rd, he's best qualified to answer that.

-K

Re: (FS) operational question

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jan 01, 2001 at 08:57:14PM -0600, Karl Fogel wrote:
>...
> What's really happening is that we're running into the long-feared
> Subversion/DAV mismatch.  Let's face it, given Subversion's primitives
> -- revisions and paths -- one wouldn't normally come up with a network
> protocol that looks anything like the one we're using.

We also have a notion of an immutable revision of a file. A new revision is
created during each commit. We do not update other files (since they do not
participate in the commit).

DeltaV has an analogous concept called a "version resource." URLs are used
to find these resources. The ideal situation is to have a 1-to-1 mapping
between URLs and the resources (technically, you could have multiple URLs
identify a given resource, but that poses problems with the DAV:version
property).

>...
> Now we have the first instance where a part of the DAV world really
> threatened to impinge on Subversion.  I had thought that whatever DAV
> needed it could *construct* based on the visible working copy
> metadata, but I was wrong: some of the DAV data is deliberately
> opaque, essentially just a cookie the server hands to you to identify
> which resource you have.  The cookie contains information, but not
> information anyone but the server is technically allowed to decode.
> Thus, the cookie cannot be updated by the client, and will get
> out-of-date if the client bumps local revision numbers even on
> entities that were not mentioned explicitly in the network traffic.
> Is this a fair summary of the situation?

Correct. A "cookie" is a great way to describe the version resource URL (and
a few other URLs the server returns).

> BUT: we're saved this time by the fact that this out-of-dateness turns
> out not to hurt anything (is that a correct understanding of the
> conversation you and Greg H just had?).

So far, it does appear that way. The mapping is not ideal, however, and the
"creation" of URLs for resources which haven't really changed could pose
interop issues in the future.

> I just hope nothing more
> serious bites us in the future...

I'm hoping to avoid that, thus my suggestion.

> So, the problems were:
> 
>    A. If we store a revision number in the opaque Version Resource
>       URL, then it can become out of date with the revision as stored
>       in the entries file.

Depends: we can also update the URL to track the changing revision number.
But that has the size issue that I've mentioned. Letting it fall "out of
date" appears to be okay.

>    B. But if we store a hard node revision instead... Well, maybe we
>       can't think of a bad effect up front, but I'm sure we all agree
>       there's something deeply icky about storing that stuff anywhere
>       on the client side.  It's internal to the fs and the fs's
>       immediate callers.  Using it on the client side seems guaranteed
>       to lead to badness down the road.

This part of "we" doesn't agree it is a problem. The FS exposes a concept of
unique nodes. These are identified with a thing called an "ID". If the
identification mechanism ever changes (e.g. the IDs are no longer
persistent), then we simply update how the version resource URL is
constructed. Let's say the ID concept is removed; fine, we switch to
rev/path (or whatever the heck is introduced in its place).

But the true point is that it doesn't matter what is in that URL, as long as
the client treats it opaquely. We have different optimizations and tradeoffs
that can be made based on its construction, but the system still works.

The question is still based on the original subject "(FS) operational
question" -- can we introduce an extra API so that we can access a node more
directly? Right now, we have: (rev, path) -> ID -> node. The modelling will
be much more ideal if we can use the ID.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: (FS) operational question

Posted by Karl Fogel <kf...@galois.collab.net>.
Greg Stein <gs...@lyra.org> writes:
> Sigh...
> 
> THAT is why I want to remove the revision number from the URL!

Sorry if I'm being dense.

A big part of the problem here is that two different languages are
being spoken: Subversion and DAV/DeltaV.  You are fluent in both, but
most Subversion developers are only fluent in the first (with the
exception of Greg Hudson, maybe).  Your quick summary of what a
version resource is helped a lot, though, thanks.

What's really happening is that we're running into the long-feared
Subversion/DAV mismatch.  Let's face it, given Subversion's primitives
-- revisions and paths -- one wouldn't normally come up with a network
protocol that looks anything like the one we're using.  So far, you
have heroically borne the entire burden, carefully bending DAV to work
with what Subversion offers it (I'm not being sarcastic at all here,
by the way, don't know if that's apparent by email).

Now we have the first instance where a part of the DAV world really
threatened to impinge on Subversion.  I had thought that whatever DAV
needed it could *construct* based on the visible working copy
metadata, but I was wrong: some of the DAV data is deliberately
opaque, essentially just a cookie the server hands to you to identify
which resource you have.  The cookie contains information, but not
information anyone but the server is technically allowed to decode.
Thus, the cookie cannot be updated by the client, and will get
out-of-date if the client bumps local revision numbers even on
entities that were not mentioned explicitly in the network traffic.
Is this a fair summary of the situation?

BUT: we're saved this time by the fact that this out-of-dateness turns
out not to hurt anything (is that a correct understanding of the
conversation you and Greg H just had?).  I just hope nothing more
serious bites us in the future...

So, the problems were:

   A. If we store a revision number in the opaque Version Resource
      URL, then it can become out of date with the revision as stored
      in the entries file.

   B. But if we store a hard node revision instead... Well, maybe we
      can't think of a bad effect up front, but I'm sure we all agree
      there's something deeply icky about storing that stuff anywhere
      on the client side.  It's internal to the fs and the fs's
      immediate callers.  Using it on the client side seems guaranteed
      to lead to badness down the road.

However, it turns out (A) is okay, albeit still not ideal, because
that out-of-dateness isn't actually going to hurt anything.

Is this the planet you were looking for?

-K

Re: (FS) operational question

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jan 01, 2001 at 08:08:24PM -0600, Karl Fogel wrote:
> Greg Stein <gs...@lyra.org> writes:
>...
> > If the revision number is to be updated for an entry, then we must also
> > update the version resource URL. If not... We In Big Heap-um Trouble.
> > Therefore, we must respond with all of the new version resource URLs when an
> > update occurs.
> 
> We're already in trouble, because we're recording the same information
> twice: the revision number in the entries file is apparently not
> sufficient, so we're recording it in a version resource URL too??
> 
> This is what "network layer implementation creeping into working copy
> library" means. :-)

Sigh...

THAT is why I want to remove the revision number from the URL!

-g

-- 
Greg Stein, http://www.lyra.org/

Re: (FS) operational question

Posted by Karl Fogel <kf...@galois.collab.net>.
Greg Stein <gs...@lyra.org> writes:
> > But the client-state report is not driven by the version URL; it's
> > driven by the version numbers in the wc's entries files.
> Agreed. And when fragmentation occurs, this report is larger.

But fragmentation of the versions in the entries files won't occur
(beyond what's strictly necessary), because the version numbers of
unchanged entities can be updated along with the changed ones.

To do this, svn_wc_get_update_editor() needs to take a revision arg and
a target arg, so it can know what entities were "included" in the
update.  It already takes the former, and can easily take the latter.
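
A rough model of that bump, in Python. The `bump_revisions` helper and the `entries` dict are hypothetical and do not reflect the real signature or behaviour of svn_wc_get_update_editor(); they only show how a revision plus a target is enough to update every entry within the update's scope without any per-entry data from the server.

```python
def bump_revisions(entries, target, new_rev):
    """Set every entry at or below `target` to `new_rev`.

    `entries` maps working-copy paths to revisions ("" is the WC root).
    Returns the number of entries that were within the update's scope.
    """
    prefix = target + "/" if target else ""
    bumped = 0
    for path in entries:
        if path == target or path.startswith(prefix):
            entries[path] = new_rev
            bumped += 1
    return bumped
```

Entries outside the target (for example a sibling named "foobar" when the target is "foo") keep their old revision.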

> If the revision number is to be updated for an entry, then we must also
> update the version resource URL. If not... We In Big Heap-um Trouble.
> Therefore, we must respond with all of the new version resource URLs when an
> update occurs.

We're already in trouble, because we're recording the same information
twice: the revision number in the entries file is apparently not
sufficient, so we're recording it in a version resource URL too??

This is what "network layer implementation creeping into working copy
library" means. :-)

Re: (FS) operational question

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Dec 31, 2000 at 07:31:37PM -0500, Greg Hudson wrote:
> > 2) we only update version resource URLs for the things that change. since we
> >    must update version resource URLs and revision numbers in tandem, this
> >    means that we do not update revision numbers for the things that don't
> >    change. the side effect is that we now get a scattering of revision
> >    numbers in the WC, generating a large client-state report during an
> >    "update" process.
> 
> But the client-state report is not driven by the version URL; it's
> driven by the version numbers in the wc's entries files.

Agreed. And when fragmentation occurs, this report is larger.

> (And we
> *were* planning to update all of those, since libsvn_wc can do that
> without a large report from the server, and since we have to make a
> pass over the working directory anyway.  Although I have some concerns
> about file churn and backup volume, personally.)

If the revision number is to be updated for an entry, then we must also
update the version resource URL. If not... We In Big Heap-um Trouble.
Therefore, we must respond with all of the new version resource URLs when an
update occurs.

> > My initial query was whether this would work within the FS. Nobody
> > is considering that, only whether it is "right" or not.
> 
> Well, Jim may be the only person qualified to answer that question at
> the moment, and I think he's away right now.

Grr...


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/