You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2011/09/02 16:30:20 UTC

SQL indices a WC format bump and 1.7

The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7

   SELECT local_relpath, op_depth, presence, kind
   FROM nodes
   WHERE wc_id = ?1 AND parent_relpath = ?2
   GROUP BY local_relpath
   ORDER BY op_depth DESC

performs poorly and doesn't scale well with working copy size.  This
causes recursive functions that use svn_wc__internal_walk_children to be
slow, things like "svn info --depth infinity".  I fixed this on trunk by
changing the query and some C code, see r1164426.

On IRC Bert pointed out that we can fix the problem by introducing a new
index:

   NODES(wc_id, parent_relpath, local_relpath, op_depth)

This improves things dramatically.  If we add this new index we can
revert r1164426, the index provides a slightly larger performance gain
than the query/C code change.

Bert also suggests changing our other indices by adding wc_id and/or
local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
UNIQUE indices are better?

I think that the I_ROOT and I_LOCAL_ABSPATH indices are unnecessary
given that columns they index are defined as UNIQUE. Can anyone confirm
that we don't need indices on UNIQUE columns?

It's possible that we don't need I_EXTERNALS_PARENT as none of the
queries look like they will use it.  Perhaps we should drop it?

So how should we fix the 1.7 performance problem?

- Use r1164426, my non-schema change fix.

- Create a new WC format 30 with the new index.

- Create a new WC format 30 with all the schema changes in the patch
  below.

Changing the WC format would involve auto-upgrading format 29 working
copies.  We need to decide whether we want the minimal format 30 change
in 1.7 before we develop this feature on trunk.


Patch to change indices below.  This is not a complete patch, it doesn't
bump the format or auto-upgrade.  It does pass the regression tests and
improve the performance of recursive info and propset.

Index: subversion/libsvn_wc/wc-metadata.sql
===================================================================
--- subversion/libsvn_wc/wc-metadata.sql	(revision 1164459)
+++ subversion/libsvn_wc/wc-metadata.sql	(working copy)
@@ -58,7 +58,6 @@
 /* Note: a repository (identified by its UUID) may appear at multiple URLs.
    For example, http://example.com/repos/ and https://example.com/repos/.  */
 CREATE INDEX I_UUID ON REPOSITORY (uuid);
-CREATE INDEX I_ROOT ON REPOSITORY (root);
 
 
 /* ------------------------------------------------------------------------- */
@@ -71,7 +70,6 @@
   local_abspath  TEXT UNIQUE
   );
 
-CREATE UNIQUE INDEX I_LOCAL_ABSPATH ON WCROOT (local_abspath);
 
 
 /* ------------------------------------------------------------------------- */
@@ -178,8 +176,10 @@
   PRIMARY KEY (wc_id, local_relpath)
   );
 
-CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
-CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
+CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
+                                                    local_relpath);
+CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
+                                                        local_relpath);
 
 
 /* ------------------------------------------------------------------------- */
@@ -478,7 +478,10 @@
 
   );
 
-CREATE INDEX I_NODES_PARENT ON NODES (wc_id, parent_relpath, op_depth);
+CREATE UNIQUE INDEX I_NODES_PARENT1 ON NODES (wc_id, parent_relpath,
+                                              op_depth, local_relpath);
+CREATE UNIQUE INDEX I_NODES_PARENT2 ON NODES (wc_id, parent_relpath,
+                                              local_relpath, op_depth);
 
 /* Many queries have to filter the nodes table to pick only that version
    of each node with the highest (most "current") op_depth.  This view
@@ -567,7 +570,8 @@
   PRIMARY KEY (wc_id, local_relpath)
 );
 
-CREATE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath);
+CREATE UNIQUE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath,
+                                                     local_relpath);
 CREATE UNIQUE INDEX I_EXTERNALS_DEFINED ON EXTERNALS (wc_id,
                                                       def_local_relpath,
                                                       local_relpath);
@@ -804,8 +808,10 @@
   PRIMARY KEY (wc_id, local_relpath)
   );
 
-CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
-CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
+CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
+                                                    local_relpath);
+CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
+                                                        local_relpath);
 
 INSERT INTO ACTUAL_NODE SELECT
   wc_id, local_relpath, parent_relpath, properties, conflict_old,

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: SQL indices a WC format bump and 1.7

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@qqmail.nl> writes:

> Another option (good or bad) would be to just update the code to create
> format 29 working copies to create different indexes.
>
> Our use of SQL would ensure that we would get the same result with the old
> or new indexes, but working copies created with newer clients would be
> faster than those that use the original format 29.
>
> A future format 30 bump could then upgrade slow 29 working copies to the
> same format 30.
> (This may or may not be possible using our compatibility guarantees)

I suppose it is possible, but it would make it more difficult to respond
to reports that 1.7 is slow.  We would have to determine if the working
copy had the correct indices, either by asking for it to be reproduced
in a new working copy or by querying the wc.db file.


-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

RE: SQL indices a WC format bump and 1.7

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: vrijdag 2 september 2011 16:30
> To: dev@subversion.apache.org
> Subject: SQL indices a WC format bump and 1.7
> 
> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
> 
>    SELECT local_relpath, op_depth, presence, kind
>    FROM nodes
>    WHERE wc_id = ?1 AND parent_relpath = ?2
>    GROUP BY local_relpath
>    ORDER BY op_depth DESC
> 
> performs poorly and doesn't scale well with working copy size.  This
> causes recursive functions that use svn_wc__internal_walk_children to be
> slow, things like "svn info --depth infinity".  I fixed this on trunk by
> changing the query and some C code, see r1164426.
> 
> On IRC Bert pointed out that we can fix the problem by introducing a new
> index:
> 
>    NODES(wc_id, parent_relpath, local_relpath, op_depth)
> 
> This improves things dramatically.  If we add this new index we can
> revert r1164426, the index provides a slightly larger performance gain
> than the query/C code change.
> 
> Bert also suggests changing our other indices by adding wc_id and/or
> local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
> UNIQUE indices are better?
> 
> I think that the I_ROOT and I_LOCAL_ABSPATH indices are unnecessary
> given that columns they index are defined as UNIQUE. Can anyone confirm
> that we don't need indices on UNIQUE columns?
> 
> It's possible that we don't need I_EXTERNALS_PARENT as none of the
> queries look like they will use it.  Perhaps we should drop it?
> 
> So how should we fix the 1.7 performance problem?
> 
> - Use r1164426, my non-schema change fix.
> 
> - Create a new WC format 30 with the new index.
> 
> - Create a new WC format 30 with all the schema changes in the patch
>   below.
> 
> Changing the WC format would involve auto-upgrading format 29 working
> copies.  We need to decide whether we want the minimal format 30 change
> in 1.7 before we develop this feature on trunk.

Another option (good or bad) would be to just update the code to create
format 29 working copies to create different indexes.

Our use of SQL would ensure that we would get the same result with the old
or new indexes, but working copies created with newer clients would be
faster than those that use the original format 29.

A future format 30 bump could then upgrade slow 29 working copies to the
same format 30.
(This may or may not be possible using our compatibility guarantees)


	Bert




Re: SQL indices a WC format bump and 1.7

Posted by Mark Mielke <ma...@mark.mielke.cc>.
On 10/02/2011 03:26 PM, Branko Čibej wrote:
> On 02.09.2011 16:30, Philip Martin wrote:
>> Bert also suggests changing our other indices by adding wc_id and/or
>> local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
>> UNIQUE indices are better?
> Just imagine, if the UNIQUE constraint did not imply an index, every
> INSERT or UPDATE would have to scan the whole table in order to verify
> the constraint. That would be ... less than efficient.
>

Just in case it is useful to consider:

Sophisticated database engines generally have a "planning" phase and a 
"execution" phase. The planning phase takes the query and attempts to 
determine the most efficient plan to execute your query. Part of this 
planning effort involves determining whether they key you are looking up 
has high selectivity or low selectivity. How many tuples in the table 
will have a matching key?

For non-unique indexes, the database engine either needs to guess or it 
needs to check against some statistical analysis results done on the 
table to see whether the key looks like it will have high selectivity or 
low selectivity. Either it could guess wrong, or it could increase the 
planning time.

For unique indexes, it can assume that there will be only 0 or 1 results.

Therefore, if you have a key which is unique, you really should define 
it as such.

I don't know if SQLlite is sophisticated enough for the above to matter 
or not, though. For example, it might assume low selectivity and it 
makes no difference.

The statement about adding key fields to make the key be unique confuses 
me a bit, though. Adding fields to the key will generally make the index 
larger and the lookups slower. In some databases that are able to do 
lookups using only the index and return results from this index - 
including all necessary data for the query (matching fields to start, 
and returning fields at the end) can be a speedup, but I would normally 
assume this was not true until proven that it was true.

Anyways - I'm not familiar with the exact scenario you are talking 
about. Just wishing to help...

-- 
Mark Mielke<ma...@mielke.cc>


Re: SQL indices a WC format bump and 1.7

Posted by Branko Čibej <br...@xbc.nu>.
On 02.09.2011 16:30, Philip Martin wrote:
> Bert also suggests changing our other indices by adding wc_id and/or
> local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
> UNIQUE indices are better?

Just imagine, if the UNIQUE constraint did not imply an index, every
INSERT or UPDATE would have to scan the whole table in order to verify
the constraint. That would be ... less than efficient.

-- Brane


Re: SQL indices a WC format bump and 1.7

Posted by Mark Mielke <ma...@mark.mielke.cc>.
On 09/02/2011 12:02 PM, Mark Phippard wrote:
> On Fri, Sep 2, 2011 at 11:42 AM, Philip Martin
> <ph...@wandisco.com>  wrote:
>> Yes, that's why I think we can delete them.  However we rarely write to
>> these tables so the overhead of having the superfluous index is
>> negligible.  There is no real need to remove them from 1.7 if people
>> would prefer to make a more minimal change for 1.7.
> I do not know about SQLite, but in other databases I have worked with,
> the database engine is intelligent enough to not create anything
> superfluous in this sort of scenario where the index you want to
> create already exists.

My experience is the opposite. Small database systems auto create 
indexes for you. Big database systems treat the DBA as God when it comes 
to what indexes or should not exist - even indexes that would seem very 
desirable to create.

The index can frequently be as big, or sometimes bigger - than the table 
itself. For updates to have to update both indexes multiples the number 
of random writes that need to be done.

Minimize indexes wherever possible. Do not create them if they do not 
need to exist. Do not include columns in the index unless they can be 
demonstrated to provide value in expected user cases.

The sample index that somebody suggested seemed to include both the 
GROUP BY column and the ORDER BY column in the index. I would test 
without these before assuming they provide value in the general case...

-- 
Mark Mielke<ma...@mielke.cc>


RE: SQL indices a WC format bump and 1.7

Posted by Bob Archer <Bo...@amsi.com>.
> Mark Phippard wrote on Fri, Sep 02, 2011 at 12:02:02 -0400:
> > On Fri, Sep 2, 2011 at 11:42 AM, Philip Martin
> > <ph...@wandisco.com> wrote:
> > > Hyrum K Wright <hy...@wandisco.com> writes:
> > >
> > >>   sqlite> select * from sqlite_master where type = 'index' and
> > >> tbl_name = 'WCROOT';
> > >>   index|sqlite_autoindex_WCROOT_1|WCROOT|8|
> > >>   index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX
> > >> I_LOCAL_ABSPATH ON WCROOT (local_abspath)
> > >>
> > >> would both indicate there are two indices on the WCROOT table,
> > >> though we only define one.  I believe one of these indices is due
> > >> to the UNIQUEness of the local_abspath column.
> > >
> > > Yes, that's why I think we can delete them.  However we rarely write
> > > to these tables so the overhead of having the superfluous index is
> > > negligible.  There is no real need to remove them from 1.7 if people
> > > would prefer to make a more minimal change for 1.7.
> >
> > I do not know about SQLite, but in other databases I have worked with,
> > the database engine is intelligent enough to not create anything
> > superfluous in this sort of scenario where the index you want to
> > create already exists.

Most? Hmm.. the ones I use happily do what you tell them tool. For SQL Server I have a tool that checks for duplicate indicies for this reason.

Also, the ones I've worked with always use an index to enforce a unique constraint so making your own is redundant. 

Also, someone mentioned keeping it cause you didn't add to this table much. However, I would disagree. Also, the data storage size would be doubled for no good reason.

BOb


Re: SQL indices a WC format bump and 1.7

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Mark Phippard wrote on Fri, Sep 02, 2011 at 12:02:02 -0400:
> On Fri, Sep 2, 2011 at 11:42 AM, Philip Martin
> <ph...@wandisco.com> wrote:
> > Hyrum K Wright <hy...@wandisco.com> writes:
> >
> >>   sqlite> select * from sqlite_master where type = 'index' and
> >> tbl_name = 'WCROOT';
> >>   index|sqlite_autoindex_WCROOT_1|WCROOT|8|
> >>   index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX I_LOCAL_ABSPATH
> >> ON WCROOT (local_abspath)
> >>
> >> would both indicate there are two indices on the WCROOT table, though
> >> we only define one.  I believe one of these indices is due to the
> >> UNIQUEness of the local_abspath column.
> >
> > Yes, that's why I think we can delete them.  However we rarely write to
> > these tables so the overhead of having the superfluous index is
> > negligible.  There is no real need to remove them from 1.7 if people
> > would prefer to make a more minimal change for 1.7.
> 
> I do not know about SQLite, but in other databases I have worked with,
> the database engine is intelligent enough to not create anything
> superfluous in this sort of scenario where the index you want to
> create already exists.
> 

Apparently that isn't the case for SQLite, then.  I also remember
playing around with EXPLAIN and explictly seeing two indices being
updated.

Re: SQL indices a WC format bump and 1.7

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Sep 2, 2011 at 11:42 AM, Philip Martin
<ph...@wandisco.com> wrote:
> Hyrum K Wright <hy...@wandisco.com> writes:
>
>>   sqlite> select * from sqlite_master where type = 'index' and
>> tbl_name = 'WCROOT';
>>   index|sqlite_autoindex_WCROOT_1|WCROOT|8|
>>   index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX I_LOCAL_ABSPATH
>> ON WCROOT (local_abspath)
>>
>> would both indicate there are two indices on the WCROOT table, though
>> we only define one.  I believe one of these indices is due to the
>> UNIQUEness of the local_abspath column.
>
> Yes, that's why I think we can delete them.  However we rarely write to
> these tables so the overhead of having the superfluous index is
> negligible.  There is no real need to remove them from 1.7 if people
> would prefer to make a more minimal change for 1.7.

I do not know about SQLite, but in other databases I have worked with,
the database engine is intelligent enough to not create anything
superfluous in this sort of scenario where the index you want to
create already exists.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: SQL indices a WC format bump and 1.7

Posted by Philip Martin <ph...@wandisco.com>.
Hyrum K Wright <hy...@wandisco.com> writes:

>   sqlite> select * from sqlite_master where type = 'index' and
> tbl_name = 'WCROOT';
>   index|sqlite_autoindex_WCROOT_1|WCROOT|8|
>   index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX I_LOCAL_ABSPATH
> ON WCROOT (local_abspath)
>
> would both indicate there are two indices on the WCROOT table, though
> we only define one.  I believe one of these indices is due to the
> UNIQUEness of the local_abspath column.

Yes, that's why I think we can delete them.  However we rarely write to
these tables so the overhead of having the superfluous index is
negligible.  There is no real need to remove them from 1.7 if people
would prefer to make a more minimal change for 1.7.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: SQL indices a WC format bump and 1.7

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Fri, Sep 2, 2011 at 9:30 AM, Philip Martin
<ph...@wandisco.com> wrote:
> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>
>   SELECT local_relpath, op_depth, presence, kind
>   FROM nodes
>   WHERE wc_id = ?1 AND parent_relpath = ?2
>   GROUP BY local_relpath
>   ORDER BY op_depth DESC
>
> performs poorly and doesn't scale well with working copy size.  This
> causes recursive functions that use svn_wc__internal_walk_children to be
> slow, things like "svn info --depth infinity".  I fixed this on trunk by
> changing the query and some C code, see r1164426.
>
> On IRC Bert pointed out that we can fix the problem by introducing a new
> index:
>
>   NODES(wc_id, parent_relpath, local_relpath, op_depth)
>
> This improves things dramatically.  If we add this new index we can
> revert r1164426, the index provides a slightly larger performance gain
> than the query/C code change.
>
> Bert also suggests changing our other indices by adding wc_id and/or
> local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
> UNIQUE indices are better?
>
> I think that the I_ROOT and I_LOCAL_ABSPATH indices are unnecessary
> given that columns they index are defined as UNIQUE. Can anyone confirm
> that we don't need indices on UNIQUE columns?

I believe that UNIQUE columns are auto indexed.  For example:
  sqlite> PRAGMA INDEX_LIST('wcroot');
  0|I_LOCAL_ABSPATH|1
  1|sqlite_autoindex_WCROOT_1|1

and

  sqlite> select * from sqlite_master where type = 'index' and
tbl_name = 'WCROOT';
  index|sqlite_autoindex_WCROOT_1|WCROOT|8|
  index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX I_LOCAL_ABSPATH
ON WCROOT (local_abspath)

would both indicate there are two indices on the WCROOT table, though
we only define one.  I believe one of these indices is due to the
UNIQUEness of the local_abspath column.

Also, from http://www.sqlite.org/faq.html#q7 (in describing the
sqlite_master table):
  For automatically created indices (used to implement the PRIMARY KEY
or UNIQUE constraints) the sql field is NULL.

>From this documentation, it would seem that our indices are redundant.

> It's possible that we don't need I_EXTERNALS_PARENT as none of the
> queries look like they will use it.  Perhaps we should drop it?
>
> So how should we fix the 1.7 performance problem?
>
> - Use r1164426, my non-schema change fix.
>
> - Create a new WC format 30 with the new index.
>
> - Create a new WC format 30 with all the schema changes in the patch
>  below.
>
> Changing the WC format would involve auto-upgrading format 29 working
> copies.  We need to decide whether we want the minimal format 30 change
> in 1.7 before we develop this feature on trunk.
>
>
> Patch to change indices below.  This is not a complete patch, it doesn't
> bump the format or auto-upgrade.  It does pass the regression tests and
> improve the performance of recursive info and propset.
>
> Index: subversion/libsvn_wc/wc-metadata.sql
> ===================================================================
> --- subversion/libsvn_wc/wc-metadata.sql        (revision 1164459)
> +++ subversion/libsvn_wc/wc-metadata.sql        (working copy)
> @@ -58,7 +58,6 @@
>  /* Note: a repository (identified by its UUID) may appear at multiple URLs.
>    For example, http://example.com/repos/ and https://example.com/repos/.  */
>  CREATE INDEX I_UUID ON REPOSITORY (uuid);
> -CREATE INDEX I_ROOT ON REPOSITORY (root);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -71,7 +70,6 @@
>   local_abspath  TEXT UNIQUE
>   );
>
> -CREATE UNIQUE INDEX I_LOCAL_ABSPATH ON WCROOT (local_abspath);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -178,8 +176,10 @@
>   PRIMARY KEY (wc_id, local_relpath)
>   );
>
> -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
> -CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
> +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
> +                                                    local_relpath);
> +CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
> +                                                        local_relpath);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -478,7 +478,10 @@
>
>   );
>
> -CREATE INDEX I_NODES_PARENT ON NODES (wc_id, parent_relpath, op_depth);
> +CREATE UNIQUE INDEX I_NODES_PARENT1 ON NODES (wc_id, parent_relpath,
> +                                              op_depth, local_relpath);
> +CREATE UNIQUE INDEX I_NODES_PARENT2 ON NODES (wc_id, parent_relpath,
> +                                              local_relpath, op_depth);
>
>  /* Many queries have to filter the nodes table to pick only that version
>    of each node with the highest (most "current") op_depth.  This view
> @@ -567,7 +570,8 @@
>   PRIMARY KEY (wc_id, local_relpath)
>  );
>
> -CREATE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath);
> +CREATE UNIQUE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath,
> +                                                     local_relpath);
>  CREATE UNIQUE INDEX I_EXTERNALS_DEFINED ON EXTERNALS (wc_id,
>                                                       def_local_relpath,
>                                                       local_relpath);
> @@ -804,8 +808,10 @@
>   PRIMARY KEY (wc_id, local_relpath)
>   );
>
> -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
> -CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
> +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
> +                                                    local_relpath);
> +CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
> +                                                        local_relpath);
>
>  INSERT INTO ACTUAL_NODE SELECT
>   wc_id, local_relpath, parent_relpath, properties, conflict_old,
>
> --
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com
>



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: SQL indices a WC format bump and 1.7

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Sep 2, 2011 at 18:32, Peter Samuelson <pe...@p12n.org> wrote:
>
> [Greg Stein]
>> Since we are selecting only "current nodes" for each local_relpath,
>> then I presume we don't need the apr_hash_get() in there, correct?
>
> Yeah, I confirmed with him on IRC that the wc_db.c change (chiefly
> deleting the extraneous apr_hash_get() call) is a cleanup that could be
> applied separately.  I could go either way.  It is a micro-optimization
> on a set of changes whose original purpose was optimization.

Oh! Whoops. My diff was backwards. I thought the apr_hash_get() was
being *added* ... but that's not the case.

Sure, this little optimization is just fine.

Cheers,
-g

Re: SQL indices a WC format bump and 1.7

Posted by Peter Samuelson <pe...@p12n.org>.
[Greg Stein]
> Since we are selecting only "current nodes" for each local_relpath,
> then I presume we don't need the apr_hash_get() in there, correct?

Yeah, I confirmed with him on IRC that the wc_db.c change (chiefly
deleting the extraneous apr_hash_get() call) is a cleanup that could be
applied separately.  I could go either way.  It is a micro-optimization
on a set of changes whose original purpose was optimization.

-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/

Re: SQL indices a WC format bump and 1.7

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Sep 2, 2011 at 12:35, Philip Martin <ph...@wandisco.com> wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>>
>>    SELECT local_relpath, op_depth, presence, kind
>>    FROM nodes
>>    WHERE wc_id = ?1 AND parent_relpath = ?2
>>    GROUP BY local_relpath
>>    ORDER BY op_depth DESC
>>
>> performs poorly and doesn't scale well with working copy size.
>
> After some more discussion in IRC it seems that the above SQL is
> dubious: the selected columns may not all come from the same row, and we
> may not get the highest op_depth.  Using NODES_CURRENT would fix the
> problem as does my C solution, and the combination of the two is now on
> trunk.  So I'll be proposing r1164426 and r1164614 to fix the 1.7
> performance problem without any schema changes.

There are some differences in wc_db.c after your two changes:

$ svn diff -r HEAD wc_db.c -x -p
Index: wc_db.c
===================================================================
--- wc_db.c	(revision 1164712)
+++ wc_db.c	(working copy)
@@ -7005,8 +7005,7 @@ struct read_children_info_baton_t
   apr_pool_t *result_pool;
 };

-/* What we really want to store about a node.  This relies on the
-   offset of svn_wc__db_info_t being zero. */
+/* What we really want to store about a node */
 struct read_children_info_item_t
 {
   struct svn_wc__db_info_t info;
@@ -7443,6 +7442,7 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
   const char *dir_relpath;
   svn_sqlite__stmt_t *stmt;
   svn_boolean_t have_row;
+  apr_int64_t op_depth;

   SVN_ERR_ASSERT(svn_dirent_is_absolute(dir_abspath));

@@ -7462,10 +7462,13 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
       struct svn_wc__db_walker_info_t *child;
       const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL);
       const char *name = svn_relpath_basename(child_relpath, NULL);
-      apr_int64_t op_depth = svn_sqlite__column_int(stmt, 1);
       svn_error_t *err;

-      child = apr_palloc(result_pool, sizeof(*child));
+      child = apr_hash_get(*nodes, name, APR_HASH_KEY_STRING);
+      if (child == NULL)
+        child = apr_palloc(result_pool, sizeof(*child));
+
+      op_depth = svn_sqlite__column_int(stmt, 1);
       child->status = svn_sqlite__column_token(stmt, 2, presence_map);
       if (op_depth > 0)
         {


Since we are selecting only "current nodes" for each local_relpath,
then I presume we don't need the apr_hash_get() in there, correct?

(and I already know that we can eliminate the apparent diffs around
op_depth with no harm)

I'd really like to put the code back to pre-r1164426 if at all
possible (makes it clearer what we're voting for in STATUS).

Cheers,
-g

Re: SQL indices a WC format bump and 1.7

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>
>    SELECT local_relpath, op_depth, presence, kind
>    FROM nodes
>    WHERE wc_id = ?1 AND parent_relpath = ?2
>    GROUP BY local_relpath
>    ORDER BY op_depth DESC
>
> performs poorly and doesn't scale well with working copy size.

After some more discussion in IRC it seems that the above SQL is
dubious: the selected columns may not all come from the same row, and we
may not get the highest op_depth.  Using NODES_CURRENT would fix the
problem as does my C solution, and the combination of the two is now on
trunk.  So I'll be proposing r1164426 and r1164614 to fix the 1.7
performance problem without any schema changes.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: SQL indices a WC format bump and 1.7

Posted by Branko Čibej <br...@xbc.nu>.
On 03.10.2011 11:03, Philip Martin wrote:
> Branko Čibej <br...@xbc.nu> writes:
>
>> On 02.09.2011 22:56, Greg Stein wrote:
>>> On Fri, Sep 2, 2011 at 13:18, Mark Phippard <ma...@gmail.com> wrote:
>>>> Pardon my ignorance, but in a scenario like this where we want to just
>>>> change some of the indexes, aren't we able to just bump the WC format
>>>> on the fly automatically?  IOW, can't we just make a format 30 with
>>>> all these index changes and have it automatically upgrade any format
>>>> 29 WC it comes across?
>>> We cannot bump the format during the 1.7.x series because 1.7.0 would
>>> not understand format N+1 produced by (say) 1.7.2.
>>>
>>> One thing that we could do: have 1.7.x "understand" all formats from
>>> 29 through 39. We only make changes that are both forward and backward
>>> compatible (e.g. presence/absence of an index does not affect 1.7.x
>>> from using the wc.db). For 1.8.0, we autobump to format 40 with our
>>> various changes.
>>>
>> Wouldn't it just be easier to look at the actual database to see if the
>> indexes are there. Adding an index does not change the database schema,
>> only query performance (hopefully for the better). Therefore it doesn't
>> require a format bump.
> The SQL that would have made use of the index is probably broken (I'm
> not enough of an SQLite expert to say for sure).

Oh, that query is broken, certainly. One should never use GROUP BY and
expect to get consistent values in the grouped fields.

Just pointing out that adding (or even removing) an index does not
require a wc-db version bump.

-- Brane


Re: SQL indices a WC format bump and 1.7

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@xbc.nu> writes:

> On 02.09.2011 22:56, Greg Stein wrote:
>> On Fri, Sep 2, 2011 at 13:18, Mark Phippard <ma...@gmail.com> wrote:
>>> Pardon my ignorance, but in a scenario like this where we want to just
>>> change some of the indexes, aren't we able to just bump the WC format
>>> on the fly automatically?  IOW, can't we just make a format 30 with
>>> all these index changes and have it automatically upgrade any format
>>> 29 WC it comes across?
>> We cannot bump the format during the 1.7.x series because 1.7.0 would
>> not understand format N+1 produced by (say) 1.7.2.
>>
>> One thing that we could do: have 1.7.x "understand" all formats from
>> 29 through 39. We only make changes that are both forward and backward
>> compatible (e.g. presence/absence of an index does not affect 1.7.x
>> from using the wc.db). For 1.8.0, we autobump to format 40 with our
>> various changes.
>>
>
> Wouldn't it just be easier to look at the actual database to see if the
> indexes are there. Adding an index does not change the database schema,
> only query performance (hopefully for the better). Therefore it doesn't
> require a format bump.

The SQL that would have made use of the index is probably broken (I'm
not enough of an SQLite expert to say for sure).  There is a better
solution on trunk--it changed the SQL and does some processing in C and
so does not require an index.  The change has been proposed for backport
to 1.7 but not yet accepted.  It's possible to fix the 1.7 performance
problem by manually adding the index to a 1.7 wc, but that leaves the
broken (?) SQL.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: SQL indices a WC format bump and 1.7

Posted by Branko Čibej <br...@xbc.nu>.
On 02.09.2011 22:56, Greg Stein wrote:
> On Fri, Sep 2, 2011 at 13:18, Mark Phippard <ma...@gmail.com> wrote:
>> Pardon my ignorance, but in a scenario like this where we want to just
>> change some of the indexes, aren't we able to just bump the WC format
>> on the fly automatically?  IOW, can't we just make a format 30 with
>> all these index changes and have it automatically upgrade any format
>> 29 WC it comes across?
> We cannot bump the format during the 1.7.x series because 1.7.0 would
> not understand format N+1 produced by (say) 1.7.2.
>
> One thing that we could do: have 1.7.x "understand" all formats from
> 29 through 39. We only make changes that are both forward and backward
> compatible (e.g. presence/absence of an index does not affect 1.7.x
> from using the wc.db). For 1.8.0, we autobump to format 40 with our
> various changes.
>

Wouldn't it just be easier to look at the actual database to see if the
indexes are there. Adding an index does not change the database schema,
only query performance (hopefully for the better). Therefore it doesn't
require a format bump.

-- Brane

Re: SQL indices a WC format bump and 1.7

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Sep 2, 2011 at 13:18, Mark Phippard <ma...@gmail.com> wrote:
> Pardon my ignorance, but in a scenario like this where we want to just
> change some of the indexes, aren't we able to just bump the WC format
> on the fly automatically?  IOW, can't we just make a format 30 with
> all these index changes and have it automatically upgrade any format
> 29 WC it comes across?

We cannot bump the format during the 1.7.x series because 1.7.0 would
not understand format N+1 produced by (say) 1.7.2.

One thing that we could do: have 1.7.x "understand" all formats from
29 through 39. We only make changes that are both forward and backward
compatible (e.g. presence/absence of an index does not affect 1.7.x
from using the wc.db). For 1.8.0, we autobump to format 40 with our
various changes.

Cheers,
-g

Re: SQL indices a WC format bump and 1.7

Posted by Philip Martin <ph...@wandisco.com>.
Mark Phippard <ma...@gmail.com> writes:

> Pardon my ignorance, but in a scenario like this where we want to just
> change some of the indexes, aren't we able to just bump the WC format
> on the fly automatically?  IOW, can't we just make a format 30 with
> all these index changes and have it automatically upgrade any format
> 29 WC it comes across?

Yes, that is what I was proposing.  The question is how many of the
index changes should go in 1.7.  (It's less important now that I think
we should fix the problem by rewriting the query.)

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: SQL indices a WC format bump and 1.7

Posted by Mark Phippard <ma...@gmail.com>.
Pardon my ignorance, but in a scenario like this where we want to just
change some of the indexes, aren't we able to just bump the WC format
on the fly automatically?  IOW, can't we just make a format 30 with
all these index changes and have it automatically upgrade any format
29 WC it comes across?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/