Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2002/04/29 20:09:15 UTC

dump format refinement

A lot of private discussion seems to indicate that everyone really
prefers the second dump format much more than the first.  Here's a
refinement of that format, done by example.


Refinement of proposal #2:  (after discussion with gstein)
=========================

Start with all RFC822-style headers at the top.  The final header will
be a 'Content-length:', followed by the content.  Thus our record
boundaries can be inferred.

The content section will have two implicit parts: a property hash, and
the fulltext.  The division between these two sections will be implied
by the "END" tag at the end of the prophash.  In the case of a
directory node or a revision, only the prophash will be present in the
content.

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

---------------------------

Revision-number: 1422
Content-length: 74

K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
END

Node-path: /bar/baz
Node-revision: 1422
Node-kind: dir
Node-action: added
Content-checksum:  oj3eu729
Content-length: 29

K 10
svn:ignore
V 4
TAGS
END

Node-path: /bar/baz/bop
Node-revision: 1422
Node-kind: file
Node-action: added
Content-checksum:  bzz35te7
Content-length: 124

K 12
svn:keywords
V 15
LastChangedDate
K 14
svn:executable
V 2
on
END
Here is the text of the newly added 'bop' file.
Whee.

Node-path: /bar/foo.c
Node-revision: 1422
Node-kind: file
Node-action: changed
Content-checksum:  Ae73te7et
Content-length: 105

END
Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.
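
As a rough illustration of how a reader might find the split between
the prophash and the fulltext, here is a minimal sketch in C.  It
assumes the content block is already in memory and that every key and
value is followed by a newline, as in the example above.  The names
parse_prophash, read_len_line and prop_fn are made up for this sketch
and are not part of any Subversion API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives one key/value pair (not NUL-terminated). */
typedef void (*prop_fn)(const char *key, size_t klen,
                        const char *val, size_t vlen, void *baton);

/* Read a "K <n>\n" or "V <n>\n" line at *PP.  On success store n in *N,
   advance *PP past the newline and return 0; return -1 if malformed. */
static int read_len_line(const char **pp, const char *end, char tag, size_t *n)
{
  const char *p = *pp;
  size_t v = 0;

  if (end - p < 4 || p[0] != tag || p[1] != ' ')
    return -1;
  p += 2;
  if (*p < '0' || *p > '9')
    return -1;
  while (p < end && *p >= '0' && *p <= '9')
    v = v * 10 + (size_t)(*p++ - '0');
  if (p >= end || *p != '\n')
    return -1;
  *pp = p + 1;
  *n = v;
  return 0;
}

/* Parse the prophash at the start of CONTENT (LEN bytes).  Call FN, if
   non-NULL, for each pair.  Return the offset of the first byte after
   "END\n", which is where the fulltext begins, or -1 if malformed. */
long parse_prophash(const char *content, size_t len, prop_fn fn, void *baton)
{
  const char *p = content;
  const char *end = content + len;

  for (;;)
    {
      const char *key, *val;
      size_t klen, vlen;

      if (end - p >= 4 && memcmp(p, "END\n", 4) == 0)
        return (long)(p + 4 - content);

      if (read_len_line(&p, end, 'K', &klen) != 0)
        return -1;
      if ((size_t)(end - p) <= klen || p[klen] != '\n')
        return -1;
      key = p;
      p += klen + 1;

      if (read_len_line(&p, end, 'V', &vlen) != 0)
        return -1;
      if ((size_t)(end - p) <= vlen || p[vlen] != '\n')
        return -1;
      val = p;
      p += vlen + 1;

      if (fn != NULL)
        fn(key, klen, val, vlen, baton);
    }
}
```

Because keys and values are read by their declared lengths, a value
that happens to contain the text "END" does not confuse the reader.  A
declared length that disagrees with the data is reported as malformed.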


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: dump format refinement

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Apr 29, 2002 at 03:09:15PM -0500, Ben Collins-Sussman wrote:
>...
> Node-path: /bar/baz
> Node-revision: 1422
> Node-kind: dir
> Node-action: added
> Content-checksum:  oj3eu729
> Content-length: 29

That should be Content-MD5 (assuming the checksum will be MD5), which is a
standard (HTTP) header.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------

Re: dump format refinement

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Apr 29, 2002 at 04:07:59PM -0500, Ben Collins-Sussman wrote:
> Marcus Comstedt <ma...@mc.pp.se> writes:
> > Ben Collins-Sussman <su...@collab.net> writes:
> > > Indeed, it would be tricky.  I wonder how httpd deals with this when
> > > you GET a file....?
> > 
> > If it's really a file, then it would probably just stat it to get the
> > length.
> 
> I bet mod_dav_svn *does* streamily suck the whole file out of
> libsvn_fs into a tmpfile, and then probably hands that tmpfile to
> httpd.

Nope. We stream the file right into Apache's output filter stack. And as you
found, there is an FS API for fetching the length :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------

Re: dump format refinement

Posted by Ben Collins-Sussman <su...@collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:

> Ben Collins-Sussman <su...@collab.net> writes:
> 
> > Indeed, it would be tricky.  I wonder how httpd deals with this when
> > you GET a file....?
> 
> If it's really a file, then it would probably just stat it to get the
> length.

I bet mod_dav_svn *does* streamily suck the whole file out of
libsvn_fs into a tmpfile, and then probably hands that tmpfile to
httpd.


---------------------------------------------------------------------

Re: dump format refinement

Posted by Ben Collins-Sussman <su...@collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:

> Ben Collins-Sussman <su...@collab.net> writes:
> 
> > Indeed, it would be tricky.  I wonder how httpd deals with this when
> > you GET a file....?
> 
> If it's really a file, then it would probably just stat it to get the
> length. 

*doink*

I should read svn_fs.h more closely.  We already have a function for this:

/* Set *LENGTH_P to the length of the file PATH in ROOT, in bytes.  Do
   any necessary temporary allocation in POOL.  */
svn_error_t *svn_fs_file_length (apr_off_t *length_p,
                                 svn_fs_root_t *root,
                                 const char *path,
                                 apr_pool_t *pool);
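
With the length known up front, the record header can be written
first and the fulltext streamed after it, with no tmpfile.  The
following is only a sketch under that assumption: write_node_record
is a made-up helper that uses plain stdio instead of the real
libsvn_fs streams, and the caller is assumed to have obtained
text_len from something like svn_fs_file_length.

```c
#include <stdio.h>

/* Hypothetical helper, not a Subversion API.  Write one node record to
   OUT.  PROPS (PROPS_LEN bytes) is an already-serialized prophash that
   ends in "END\n".  TEXT_LEN is the fulltext length, known before any
   of TEXT is read.  Returns 0 on success, -1 on an I/O error or if
   TEXT turns out to be shorter than TEXT_LEN. */
int write_node_record(FILE *out, const char *path,
                      const char *props, size_t props_len,
                      FILE *text, unsigned long text_len)
{
  char buf[4096];
  unsigned long left = text_len;

  /* The header can go out first because the total length is known. */
  if (fprintf(out, "Node-path: %s\nContent-length: %lu\n\n",
              path, (unsigned long)props_len + text_len) < 0)
    return -1;
  if (fwrite(props, 1, props_len, out) != props_len)
    return -1;

  /* Stream the fulltext in fixed-size pieces; nothing is spooled. */
  while (left > 0)
    {
      size_t want = left < sizeof buf ? (size_t)left : sizeof buf;
      size_t got = fread(buf, 1, want, text);

      if (got == 0)
        return -1;
      if (fwrite(buf, 1, got, out) != got)
        return -1;
      left -= (unsigned long)got;
    }
  return 0;
}
```

Only the Node-path and Content-length headers are written here to
keep the sketch short; the other headers would be emitted the same way.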


---------------------------------------------------------------------

Re: dump format refinement

Posted by Daniel Stenberg <da...@haxx.se>.
On 29 Apr 2002, Marcus Comstedt wrote:

> > Indeed, it would be tricky.  I wonder how httpd deals with this when you
> > GET a file....?
>
> If it's really a file, then it would probably just stat it to get the
> length.  For dynamic content, there is a mechanism in HTTP/1.1 that allows
> you to transfer the content as an unspecified number of chunks, each
> prefixed with its own content-length.  Don't know if any httpd actually
> implements it though.

Chunked transfer-encoding is mandatory for HTTP 1.1, so every server that
claims 1.1 should support it... All the major ones do, afaik.

Details in RFC2616 section 3.6.
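
For reference, a single chunk on the wire is the size in hex, CRLF,
the data, and another CRLF; the body ends with a zero-size chunk.  A
minimal sketch of the framing, with made-up names (encode_chunk,
CHUNK_TERMINATOR) and ignoring chunk extensions and trailers:

```c
#include <stdio.h>
#include <string.h>

/* Frame DATA (LEN bytes) as one HTTP/1.1 chunk in BUF (CAP bytes):
   hex size, CRLF, data, CRLF.  Returns the number of bytes written,
   or 0 if BUF is too small or LEN is 0 (a zero-size chunk is reserved
   for the terminator below). */
size_t encode_chunk(char *buf, size_t cap, const void *data, size_t len)
{
  int hdr;
  size_t room;

  if (len == 0)
    return 0;
  hdr = snprintf(buf, cap, "%lx\r\n", (unsigned long)len);
  if (hdr < 0 || (size_t)hdr >= cap)
    return 0;
  room = cap - (size_t)hdr;
  if (room < 2 || room - 2 < len)
    return 0;
  memcpy(buf + hdr, data, len);
  memcpy(buf + hdr + len, "\r\n", 2);
  return (size_t)hdr + len + 2;
}

/* The last chunk has size 0; with no trailers it is followed directly
   by the blank line that ends the body. */
const char CHUNK_TERMINATOR[] = "0\r\n\r\n";
```

The sender never needs the total length, which is exactly the
property a length-prefixed record does not have.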

-- 
      Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


---------------------------------------------------------------------

Re: dump format refinement

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Ben Collins-Sussman <su...@collab.net> writes:

> Indeed, it would be tricky.  I wonder how httpd deals with this when
> you GET a file....?

If it's really a file, then it would probably just stat it to get the
length.  For dynamic content, there is a mechanism in HTTP/1.1 that
allows you to transfer the content as an unspecified number of chunks,
each prefixed with its own content-length.  Don't know if any httpd
actually implements it though.


  // Marcus



---------------------------------------------------------------------

Re: dump format refinement

Posted by Ben Collins-Sussman <su...@collab.net>.
Mark Benedetto King <bk...@answerfriend.com> writes:

> On Mon, Apr 29, 2002 at 03:09:15PM -0500, Ben Collins-Sussman wrote:
> > Start with all RFC822-style headers at the top.  The final header will
> > be a 'Content-length:', followed by the content.  Thus our record
> > boundaries can be inferred.
> 
> Wouldn't this make it difficult to stream? 

Indeed, it would be tricky.  I wonder how httpd deals with this when
you GET a file....?

I suppose a file's contents could be streamily read from the fs,
written to a tmpfile, then stat the tmpfile for a size, then dump the
tmpfile into the dumpfile.  Maybe gstein has a suggestion.


---------------------------------------------------------------------

Re: dump format refinement

Posted by Mark Benedetto King <bk...@answerfriend.com>.
On Mon, Apr 29, 2002 at 03:09:15PM -0500, Ben Collins-Sussman wrote:
> Start with all RFC822-style headers at the top.  The final header will
> be a 'Content-length:', followed by the content.  Thus our record
> boundaries can be inferred.

Wouldn't this make it difficult to stream? 

--ben


---------------------------------------------------------------------

Re: dump format refinement

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Apr 30, 2002 at 07:24:27AM -0500, Ben Collins-Sussman wrote:
>...
> Your proposal is clear:  but the idea of a single node having multiple
> immediate ancestors is a "future" concept that doesn't yet exist in
> svn_fs.h.  
> 
> If we *do* add this concept to our filesystem someday, then I say it's
> appropriate to add this field to our dumpformat.  That's why we're
> going to version our dumpformat after all.  Version 1 of our format
> represents svn 1.0 fs concepts, and will be forward-compatible.  I
> think that the prev-node-path field should be saved for a future
> version of our dumpformat, once our fs actually has this feature.

You wouldn't even need to bump the version number. Adding a header is
forward-compatible. Either it is present (future code), or it isn't (old
code, or future code but not applicable).

Things like copy-from information, and Branko's prev-node-path, can be
omitted from the "headers" in a compatible fashion.
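
To make the compatibility point concrete, here is a sketch of a
header reader that simply skips names it does not recognize, so a
header added later would not break it.  The struct and function names
(node_headers, parse_headers) are invented for this example, and only
a few of the headers from the proposal are handled.

```c
#include <stdlib.h>
#include <string.h>

struct node_headers {
  char path[256];
  char kind[16];
  char action[16];
  long content_length;          /* -1 until Content-length is seen */
};

/* Return nonzero if the NLEN-byte header name NAME equals WANT. */
static int name_is(const char *name, size_t nlen, const char *want)
{
  return strlen(want) == nlen && memcmp(name, want, nlen) == 0;
}

/* Copy VLEN bytes of VALUE into DST (CAP bytes), truncating if needed. */
static void copy_value(char *dst, size_t cap, const char *value, size_t vlen)
{
  if (vlen >= cap)
    vlen = cap - 1;
  memcpy(dst, value, vlen);
  dst[vlen] = '\0';
}

/* Parse "Name: value" lines from TEXT up to the first empty line or
   the end of the string.  Unknown names are skipped, not rejected.
   Returns the number of skipped headers, or -1 on a line without ':'. */
int parse_headers(const char *text, struct node_headers *h)
{
  int unknown = 0;

  memset(h, 0, sizeof *h);
  h->content_length = -1;

  while (*text != '\0' && *text != '\n')
    {
      const char *eol = strchr(text, '\n');
      const char *colon, *val;
      size_t nlen, vlen;

      if (eol == NULL)
        eol = text + strlen(text);
      colon = memchr(text, ':', (size_t)(eol - text));
      if (colon == NULL)
        return -1;
      nlen = (size_t)(colon - text);
      val = colon + 1;
      while (val < eol && *val == ' ')
        val++;
      vlen = (size_t)(eol - val);

      if (name_is(text, nlen, "Node-path"))
        copy_value(h->path, sizeof h->path, val, vlen);
      else if (name_is(text, nlen, "Node-kind"))
        copy_value(h->kind, sizeof h->kind, val, vlen);
      else if (name_is(text, nlen, "Node-action"))
        copy_value(h->action, sizeof h->action, val, vlen);
      else if (name_is(text, nlen, "Content-length"))
        {
          char num[32];
          copy_value(num, sizeof num, val, vlen);
          h->content_length = strtol(num, NULL, 10);
        }
      else
        unknown++;                /* e.g. a header from a newer writer */

      text = (*eol == '\n') ? eol + 1 : eol;
    }
  return unknown;
}
```

An old reader built like this keeps working when handed a dump that
carries an extra header; it just does not act on it.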

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------

Re: dump format refinement

Posted by Ben Collins-Sussman <su...@collab.net>.
Branko Čibej <br...@xbc.nu> writes:

> This proposal looks fine, except for one detail: there's no way to
> represent revision history as a DAG. Sure, we can't actually create
> forks and joins in the FS now, but our format should support them.
> 
> I suggest adding another header, Prev-node-path, that tells us what
> the path to this node was in the previous revision. If the path didn't
> change, it can be omitted. If the node forked, you get several new
> nodes with the same Prev-node-path. If it joined, the new node gets
> more than one Prev-node-path header (so it's not unique).

Your proposal is clear:  but the idea of a single node having multiple
immediate ancestors is a "future" concept that doesn't yet exist in
svn_fs.h.  

If we *do* add this concept to our filesystem someday, then I say it's
appropriate to add this field to our dumpformat.  That's why we're
going to version our dumpformat after all.  Version 1 of our format
represents svn 1.0 fs concepts, and will be forward-compatible.  I
think that the prev-node-path field should be saved for a future
version of our dumpformat, once our fs actually has this feature.

---------------------------------------------------------------------

Re: dump format refinement

Posted by Branko Čibej <br...@xbc.nu>.
This proposal looks fine, except for one detail: there's no way to 
represent revision history as a DAG. Sure, we can't actually create 
forks and joins in the FS now, but our format should support them.

I suggest adding another header, Prev-node-path, that tells us what the 
path to this node was in the previous revision. If the path didn't 
change, it can be omitted. If the node forked, you get several new nodes 
with the same Prev-node-path. If it joined, the new node gets more than 
one Prev-node-path header (so it's not unique).

Note that Prev-node-path is separate from copy history.

Hope this was clear enough ...




-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



