Posted to dev@subversion.apache.org by Bert Huijben <be...@vmoo.com> on 2010/08/18 20:05:50 UTC

RE: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data


> -----Original Message-----
> From: philip@apache.org [mailto:philip@apache.org]
> Sent: woensdag 18 augustus 2010 12:17
> To: commits@subversion.apache.org
> Subject: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data
> 
> Author: philip
> Date: Wed Aug 18 19:16:59 2010
> New Revision: 986865
> 
> URL: http://svn.apache.org/viewvc?rev=986865&view=rev
> Log:
> Some initial thoughts about NODE_DATA from the Sheffield meeting.
> 
> * notes/wc-ng/node-data: New.
> 
> Added:
>     subversion/trunk/notes/wc-ng/node-data   (with props)
> 
> Added: subversion/trunk/notes/wc-ng/node-data
> URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/node-data?rev=986865&view=auto
> ==============================================================================
> --- subversion/trunk/notes/wc-ng/node-data (added)
> +++ subversion/trunk/notes/wc-ng/node-data Wed Aug 18 19:16:59 2010
> @@ -0,0 +1,255 @@
> +NODE_DATA Design (Sheffield 2010-08-18)
> +=======================================
> +
> +Essentially it replaces BASE_NODE and WORKING_NODE by combining all
> +the existing columns with a new op_depth column where op_depth == 0 is
> +the old BASE_NODE and op_depth != 0 is the old WORKING_NODE.	

I don't think it *replaces* BASE_NODE and WORKING_NODE. It will contain data for both, but it doesn't replace these tables.

Maybe it can replace WORKING_NODE, but BASE_NODE has more information than the columns you list here. Things like the repos_relpath and copyfrom_* are only defined on BASE and/or an operation root.

That is why they are still modeled to stay on BASE_NODE and WORKING_NODE. And last time I looked at the design, translated_size and last_mod_time (used for optimizing away comparisons from things like 'svn status') were still on BASE and WORKING as they are only relevant for nodes that are in the wc.

> +
> +  Category     Columns
> +
> +  indexing:    wc_id, local_relpath, parent_relpath, op_depth
> +  status:      presence
> +  node-rev:    repos_id, repos_relpath, revnum
> +  content:     kind, properties, depth, target, checksum
> +  last-change: changed_rev, changed_date, changed_author
> +  wc-cache:    translated_size, last_mod_time
> +  misc:        dav_cache, file_external
> +
> +When op_depth == 0 the node-rev columns represent the checked-out
> +repository node, otherwise they represent the copyfrom node.
> +
> +Presence has the same six values as BASE_NODE/WORKING_NODE: normal,
> +incomplete, absent, excluded, not-present, base-deleted.  There are
> +some presence/op_depth constraints, e.g. base-deleted is not valid for
> +op_depth 0 and absent is not valid for op_depth != 0.
> +
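The two presence/op_depth constraints quoted above can be sketched as a small check. This is only an illustration: the function name presence_allowed is hypothetical, and the notes say "e.g.", so there may be further constraints that this sketch does not cover.

```python
# The six presence values listed in the notes.
PRESENCE_VALUES = {
    "normal", "incomplete", "absent",
    "excluded", "not-present", "base-deleted",
}


def presence_allowed(op_depth, presence):
    """Return True if presence is allowed at op_depth.

    Hypothetical helper: it encodes only the two constraints that the
    notes give as examples, nothing more.
    """
    if presence not in PRESENCE_VALUES:
        raise ValueError("unknown presence: %r" % (presence,))
    if op_depth == 0:
        # base-deleted is not valid for the old BASE_NODE layer.
        return presence != "base-deleted"
    # absent is not valid for the old WORKING_NODE layers.
    return presence != "absent"
```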

<snip>

	Bert

Re: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@vmoo.com> writes:

> How would this handle deleted nodes in one layer (then some overlays) and
> then calling _read_children(). I think that would become a union/select over
> multiple layers? We already had some performance issues there in the past
> and I hope this only makes this query easier. (SELECT DISTINCT name where
> parent_relpath=? or something)
>
> Before this new idea I expected that we didn't have to query the NODE_DATA
> if you were just querying _read_info() for kind and status. So for those two
> most common fields I didn't expect any slowdown over the current model. 
> With moving everything in one table we will need the sqlite index for
> optimization in a few more cases to keep the same speed. (I think SQLite can
> handle this for us as one of the nice features of using a real database, but
> nevertheless, I think we should try to verify this before moving everything
> into one table)

I'm not an SQL expert, much less an SQLite expert, however BASE_NODE
is still available by adding op_depth=0 to the query.  WORKING_NODE is
a bit more complicated as one needs to get the biggest op_depth>0, so
select op_depth>0, order by op_depth descending and limit to 1.  Obviously we
will have to include op_depth in the SQLite index.
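As a rough sketch of the two lookups described above, using Python's sqlite3 module against an in-memory database. The table name node_data, the reduced column set, the sample rows and the helper names read_base/read_working are all assumptions made for illustration; they are not the actual wc-ng schema or API. The primary key doubles as the index that includes op_depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical NODE_DATA table; the primary key gives
    -- SQLite an index that includes op_depth.
    CREATE TABLE node_data (
        wc_id          INTEGER NOT NULL,
        local_relpath  TEXT NOT NULL,
        op_depth       INTEGER NOT NULL,
        parent_relpath TEXT,
        presence       TEXT NOT NULL,
        kind           TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    );
    -- Made-up sample data: A/f has a base row plus two working layers.
    INSERT INTO node_data VALUES (1, 'A/f', 0, 'A', 'normal', 'file');
    INSERT INTO node_data VALUES (1, 'A/f', 1, 'A', 'base-deleted', 'file');
    INSERT INTO node_data VALUES (1, 'A/f', 2, 'A', 'normal', 'file');
    INSERT INTO node_data VALUES (1, 'A/g', 0, 'A', 'normal', 'file');
""")


def read_base(conn, wc_id, relpath):
    """Old BASE_NODE row: the row with op_depth = 0, or None."""
    return conn.execute(
        "SELECT op_depth, presence FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth = 0",
        (wc_id, relpath)).fetchone()


def read_working(conn, wc_id, relpath):
    """Old WORKING_NODE row: the greatest op_depth > 0, or None."""
    return conn.execute(
        "SELECT op_depth, presence FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth > 0"
        " ORDER BY op_depth DESC LIMIT 1",
        (wc_id, relpath)).fetchone()
```

With the sample rows, read_working returns the op_depth 2 row for A/f and None for A/g, which has no working layer.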

In cases such as _read_info where both BASE_NODE and WORKING_NODE are
required we can ask for the biggest op_depth first and if this turns
out to be zero then we find out that there is no WORKING_NODE and get
the BASE_NODE with one query.  For unmodified nodes this might be
faster than separate BASE/WORKING.
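The single-query idea might look roughly like this. Again the table, the sample data and the function name read_info are hypothetical stand-ins, not the real _read_info implementation: the query asks for the greatest op_depth, and a result of zero means the row returned is already the base row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical NODE_DATA table with made-up rows.
    CREATE TABLE node_data (
        wc_id          INTEGER NOT NULL,
        local_relpath  TEXT NOT NULL,
        op_depth       INTEGER NOT NULL,
        parent_relpath TEXT,
        presence       TEXT NOT NULL,
        kind           TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    );
    INSERT INTO node_data VALUES (1, 'A/f', 0, 'A', 'normal', 'file');
    INSERT INTO node_data VALUES (1, 'A/f', 2, 'A', 'normal', 'file');
    INSERT INTO node_data VALUES (1, 'A/g', 0, 'A', 'normal', 'file');
""")


def read_info(conn, wc_id, relpath):
    """Fetch the topmost layer for a path with one query.

    If the greatest op_depth is 0 there is no working layer and the
    row we already hold is the base row.
    """
    row = conn.execute(
        "SELECT op_depth, presence, kind FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ?"
        " ORDER BY op_depth DESC LIMIT 1",
        (wc_id, relpath)).fetchone()
    if row is None:
        return None  # path is not in the working copy at all
    op_depth, presence, kind = row
    return {"has_working": op_depth > 0,
            "presence": presence,
            "kind": kind}
```

For an unmodified node such as A/g in the sample data, this one query is the whole lookup.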

I'm not sure how _read_children would be affected.  SELECT DISTINCT
probably allows us to count them, but I don't know how to construct
the query to return the greatest op_depth for each name.
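One possible way to get the greatest op_depth for each name is to join the table against a grouped subquery that computes MAX(op_depth) per child. The sketch below assumes a reduced, hypothetical node_data table and a made-up read_children helper; whether base-deleted children should then be filtered out is left to the caller, and the performance of this query on a real working copy is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical NODE_DATA table with made-up rows.
    CREATE TABLE node_data (
        wc_id          INTEGER NOT NULL,
        local_relpath  TEXT NOT NULL,
        op_depth       INTEGER NOT NULL,
        parent_relpath TEXT,
        presence       TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    );
    INSERT INTO node_data VALUES (1, 'A/f', 0, 'A', 'normal');
    INSERT INTO node_data VALUES (1, 'A/f', 1, 'A', 'base-deleted');
    INSERT INTO node_data VALUES (1, 'A/g', 0, 'A', 'normal');
    INSERT INTO node_data VALUES (1, 'A/h', 2, 'A', 'normal');
    INSERT INTO node_data VALUES (1, 'B/x', 0, 'B', 'normal');
""")


def read_children(conn, wc_id, parent_relpath):
    """Return (local_relpath, op_depth, presence) of the topmost
    layer for every child of parent_relpath."""
    return conn.execute(
        "SELECT n.local_relpath, n.op_depth, n.presence"
        " FROM node_data AS n"
        " JOIN (SELECT local_relpath, MAX(op_depth) AS top"
        "       FROM node_data"
        "       WHERE wc_id = ? AND parent_relpath = ?"
        "       GROUP BY local_relpath) AS m"
        "   ON n.local_relpath = m.local_relpath"
        "  AND n.op_depth = m.top"
        " WHERE n.wc_id = ?"
        " ORDER BY n.local_relpath",
        (wc_id, parent_relpath, wc_id)).fetchall()
```

With the sample rows, the children of A come back as A/f at op_depth 1 (base-deleted), A/g at op_depth 0 and A/h at op_depth 2, one row per name.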

-- 
Philip

RE: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data

Posted by Bert Huijben <be...@vmoo.com>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: donderdag 19 augustus 2010 3:39
> To: Greg Stein
> Cc: Bert Huijben; dev@subversion.apache.org; philip@apache.org
> Subject: Re: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data
> 
> Greg Stein <gs...@gmail.com> writes:
> 
> > But that said, there is an argument for combining all three conceptual
> > tables into one. Is that what you guys were suggesting?
> 
> Yes.  The tables are so similar.  For example, base_node's
> repos_id/repos_relpath/revnum and the working_node's
> copyfrom_id/copyfrom_relpath/copyfrom_revnum are both a sort of
> "repos-node-rev".  For op_depth 0 the repos-node-rev is always set,
> there is pristine content and it's the same node in the repository and
> wc.  For other op_depth the repos-node-rev is optional, copies have
> it, adds don't; but when it exists it means much the same: the node
> has pristine content.  op_depth tells us whether the repos-node-rev is
> a copy or a base and that's exactly what op_depth is for.
> 
> Now op_depth 0 can be split out into a separate base_node table, our
> current model, but during our meeting we were wondering if that is
> necessary or useful.
> 
> We do have to have all fields at all op_depth, and some are not always
> valid.  dav_cache only applies to op_depth 0, translated_size only
> applies to the greatest op_depth for any local_relpath, etc.; but
> most of the fields are common.  Even dav_cache might apply to higher
> levels, perhaps it could be useful for the copyfrom?

How would this handle deleted nodes in one layer (then some overlays) and
then calling _read_children(). I think that would become a union/select over
multiple layers? We already had some performance issues there in the past
and I hope this only makes this query easier. (SELECT DISTINCT name where
parent_relpath=? or something)

Before this new idea I expected that we didn't have to query the NODE_DATA
if you were just querying _read_info() for kind and status. So for those two
most common fields I didn't expect any slowdown over the current model. 
With moving everything in one table we will need the sqlite index for
optimization in a few more cases to keep the same speed. (I think SQLite can
handle this for us as one of the nice features of using a real database, but
nevertheless, I think we should try to verify this before moving everything
into one table)

	Bert

Re: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data

Posted by Philip Martin <ph...@wandisco.com>.
Greg Stein <gs...@gmail.com> writes:

> But that said, there is an argument for combining all three conceptual
> tables into one. Is that what you guys were suggesting?

Yes.  The tables are so similar.  For example, base_node's
repos_id/repos_relpath/revnum and the working_node's
copyfrom_id/copyfrom_relpath/copyfrom_revnum are both a sort of
"repos-node-rev".  For op_depth 0 the repos-node-rev is always set,
there is pristine content and it's the same node in the repository and
wc.  For other op_depth the repos-node-rev is optional, copies have
it, adds don't; but when it exists it means much the same: the node
has pristine content.  op_depth tells us whether the repos-node-rev is
a copy or a base and that's exactly what op_depth is for.
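To make the reading of the node-rev columns concrete, here is a minimal sketch of how op_depth could select the interpretation. The function name describe_node_rev and its return strings are invented for illustration, and None stands in for an unset repos_relpath.

```python
def describe_node_rev(op_depth, repos_relpath):
    """Classify a row by how its repos-node-rev should be read.

    Hypothetical helper; repos_relpath is None when the
    repos-node-rev is not set.
    """
    if op_depth == 0:
        # Base layer: the repos-node-rev is always set and names the
        # checked-out repository node.
        if repos_relpath is None:
            raise ValueError("op_depth 0 requires a repos-node-rev")
        return "base"
    if repos_relpath is None:
        # Working layer without pristine content: a plain add.
        return "add"
    # Working layer with a repos-node-rev: it is the copyfrom node.
    return "copy"
```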

Now op_depth 0 can be split out into a separate base_node table, our
current model, but during our meeting we were wondering if that is
necessary or useful.

We do have to have all fields at all op_depth, and some are not always
valid.  dav_cache only applies to op_depth 0, translated_size only
applies to the greatest op_depth for any local_relpath, etc.; but
most of the fields are common.  Even dav_cache might apply to higher
levels, perhaps it could be useful for the copyfrom?

-- 
Philip

Re: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Aug 18, 2010 at 16:05, Bert Huijben <be...@vmoo.com> wrote:
>> -----Original Message-----
>> From: philip@apache.org [mailto:philip@apache.org]
>> Sent: woensdag 18 augustus 2010 12:17
>> To: commits@subversion.apache.org
>> Subject: svn commit: r986865 - /subversion/trunk/notes/wc-ng/node-data
>>
>> Author: philip
>> Date: Wed Aug 18 19:16:59 2010
>> New Revision: 986865
>>
>> URL: http://svn.apache.org/viewvc?rev=986865&view=rev
>> Log:
>> Some initial thoughts about NODE_DATA from the Sheffield meeting.
>>
>> * notes/wc-ng/node-data: New.
>>
>> Added:
>>     subversion/trunk/notes/wc-ng/node-data   (with props)
>>
>> Added: subversion/trunk/notes/wc-ng/node-data
>> URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/node-data?rev=986865&view=auto
>> ==============================================================================
>> --- subversion/trunk/notes/wc-ng/node-data (added)
>> +++ subversion/trunk/notes/wc-ng/node-data Wed Aug 18 19:16:59 2010
>> @@ -0,0 +1,255 @@
>> +NODE_DATA Design (Sheffield 2010-08-18)
>> +=======================================
>> +
>> +Essentially it replaces BASE_NODE and WORKING_NODE by combining all
>> +the existing columns with a new op_depth column where op_depth == 0 is
>> +the old BASE_NODE and op_depth != 0 is the old WORKING_NODE.
>
> I don't think it *replaces* BASE_NODE and WORKING_NODE. It will contain data for both, but it doesn't replace these tables.
>
> Maybe it can replace WORKING_NODE, but BASE_NODE has more information than the columns you list here. Things like the repos_relpath and copyfrom_* are only defined on BASE and/or an operation root.
>
> That is why they are still modeled to stay on BASE_NODE and WORKING_NODE. And last time I looked at the design, translated_size and last_mod_time (used for optimizing away comparisons from things like 'svn status') were still on BASE and WORKING as they are only relevant for nodes that are in the wc.
>

Right. dav_cache is another. copyfrom_* is arguable, as moving those
into NODE_DATA would better support a copy of a mixed-rev working copy
(and in the future, mixed-repos).

But that said, there is an argument for combining all three conceptual
tables into one. Is that what you guys were suggesting?

Cheers,
-g