You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@subversion.apache.org by da...@apache.org on 2012/03/04 21:14:01 UTC

svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Author: danielsh
Date: Sun Mar  4 20:14:01 2012
New Revision: 1296868

URL: http://svn.apache.org/viewvc?rev=1296868&view=rev
Log:
* subversion/libsvn_fs_fs/rep-cache-db.sql
  (I_HASH): New index.

Suggested by: Trent Nelson <tr...@snakebite.org>

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Modified: subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql?rev=1296868&r1=1296867&r2=1296868&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql Sun Mar  4 20:14:01 2012
@@ -33,6 +33,11 @@ CREATE TABLE rep_cache (
   expanded_size INTEGER NOT NULL
   );
 
+/* There isn't an implicit index on a TEXT column, so create an explicit one. */
+CREATE INDEX I_HASH on REP_CACHE (hash);
+
+/* The index didn't exist until 1.7.5; therefore, some USER_VERSION=1
+   rep-cache.db files out there DO NOT contain an I_HASH index. */
 PRAGMA USER_VERSION = 1;

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Daniel Shahaf <da...@elego.de>.

Trent Nelson wrote on Wed, Mar 14, 2012 at 15:05:20 -0700:
> On 3/14/12 3:40 PM, "Daniel Shahaf" <da...@elego.de> wrote:
> 
> >Philip: I recalled last year's discussions about implied indexes, but
> >between Trent's reported observations on IRC and a back-of-the-envelope
> >test with sqlite3(1) I was led to believe that an implied index does not
> >get created for in this case (due to the TEXT column, as my comment
> >says).
> >
> >I'm more than happy to revert this on trunk (if it hasn't been already)
> >assuming it's indeed superfluous.
> >
> >Trent -- have you looked into things from your end yet?  Can you confirm
> >or deny the hypothesis that the explicit INDEX was necessary in your
> >environment?
> 
> Try as I might, I can't reproduce the original behavior that set us off
> down this superfluous index route.  +1 to revert; I was wrong.
> 

Done.

Thanks to Philip for spotting the bug.

Daniel

> 	Trent.
>

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Trent Nelson <tr...@snakebite.org>.

On 3/14/12 3:40 PM, "Daniel Shahaf" <da...@elego.de> wrote:

>Philip: I recalled last year's discussions about implied indexes, but
>between Trent's reported observations on IRC and a back-of-the-envelope
>test with sqlite3(1) I was led to believe that an implied index does not
>get created for in this case (due to the TEXT column, as my comment
>says).
>
>I'm more than happy to revert this on trunk (if it hasn't been already)
>assuming it's indeed superfluous.
>
>Trent -- have you looked into things from your end yet?  Can you confirm
>or deny the hypothesis that the explicit INDEX was necessary in your
>environment?

Try as I might, I can't reproduce the original behavior that set us off
down this superfluous index route.  +1 to revert; I was wrong.

	Trent.

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Daniel Shahaf <da...@elego.de>.

Philip: I recalled last year's discussions about implied indexes, but
between Trent's reported observations on IRC and a back-of-the-envelope
test with sqlite3(1) I was led to believe that an implied index does not
get created for in this case (due to the TEXT column, as my comment
says).

I'm more than happy to revert this on trunk (if it hasn't been already)
assuming it's indeed superfluous.

Trent -- have you looked into things from your end yet?  Can you confirm
or deny the hypothesis that the explicit INDEX was necessary in your
environment?

Trent Nelson wrote on Tue, Mar 06, 2012 at 12:11:57 -0800:
> 
> 
> On 3/6/12 5:35 PM, "Philip Martin" <ph...@wandisco.com> wrote:
> 
> >Philip Martin <ph...@wandisco.com> writes:
> >
> >> It may be TEXT but it is also PRIMARY KEY and according to the SQLite
> >> docs:
> >>
> >> http://sqlite.org/lang_createtable.html
> >>
> >>    INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY
> >>    constraints are implemented by creating an index in the database (in
> >>    the same way as a "CREATE UNIQUE INDEX" statement would). Such an
> >>    index is used like any other index in the database to optimize
> >>    queries. As a result, there often no advantage (but significant
> >>    overhead) in creating an index on a set of columns that are already
> >>    collectively subject to a UNIQUE or PRIMARY KEY constraint.
> >
> >If I create a repository using 1.7 and look at the rep-cache.db I see:
> >
> >$ sqlite3 rep-cache.db "select * from sqlite_master" | grep index
> >index|sqlite_autoindex_rep_cache_1|rep_cache|4|
> >
> >An index has been created automatically and so adding another index can
> >only slow things down.
> 
> Yeah this is interesting; you've provided a wealth of information contrary
> to everything I said to Daniel yesterday.  I did some SQLite index tuning
> a month ago and I could have sworn having a TEXT field as primary key
> inhibited index creation -- but, as you've demonstrated with your explain
> plan and sqlite_master query, that's clearly not the case.
> 
> To rewind things a little: I was manually repairing my asf mirror
> yesterday that happened to sync earlier last week at the wrong time and
> picked up a dodgy revision.  I manually deleted the affected rows from
> rep-cache.db and noticed that all my select queries seemed to be taking an
> inordinate amount of time to complete (5s+ at least).  I explain plan the
> problematic query, saw no indexes, created one manually, and wallah,
> problem solved, all queries returned instantly and explain plan showed
> index usage.
> 
> I've got a bunch of zfs snapshots of the repo I can have a play around
> with tomorrow to see if I can replicate.
> 
> Plausible theories off the top of my head:
> 
> a) There's something different with my env and indexes were not being
> created automatically.  (I do remember having a lot of trouble getting the
> asf repo to load from dumps and then complete from subsequent syncs.)
> b) There's something else going on.
> 
> (Plausible theories?  Yes.  Good theories?  Debatable ;-)
> 
> 
> 	Trent.
>

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Philip Martin <ph...@wandisco.com>.

Philip Martin <ph...@wandisco.com> writes:

> If I create a repository using 1.7 and look at the rep-cache.db I see:
>
> $ sqlite3 rep-cache.db "select * from sqlite_master" | grep index
> index|sqlite_autoindex_rep_cache_1|rep_cache|4|

I'm using SQLite 3.7.8 on Linux.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Trent Nelson <tr...@snakebite.org>.

On 3/6/12 5:35 PM, "Philip Martin" <ph...@wandisco.com> wrote:

>Philip Martin <ph...@wandisco.com> writes:
>
>> It may be TEXT but it is also PRIMARY KEY and according to the SQLite
>> docs:
>>
>> http://sqlite.org/lang_createtable.html
>>
>>    INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY
>>    constraints are implemented by creating an index in the database (in
>>    the same way as a "CREATE UNIQUE INDEX" statement would). Such an
>>    index is used like any other index in the database to optimize
>>    queries. As a result, there often no advantage (but significant
>>    overhead) in creating an index on a set of columns that are already
>>    collectively subject to a UNIQUE or PRIMARY KEY constraint.
>
>If I create a repository using 1.7 and look at the rep-cache.db I see:
>
>$ sqlite3 rep-cache.db "select * from sqlite_master" | grep index
>index|sqlite_autoindex_rep_cache_1|rep_cache|4|
>
>An index has been created automatically and so adding another index can
>only slow things down.

Yeah this is interesting; you've provided a wealth of information contrary
to everything I said to Daniel yesterday.  I did some SQLite index tuning
a month ago and I could have sworn having a TEXT field as primary key
inhibited index creation -- but, as you've demonstrated with your explain
plan and sqlite_master query, that's clearly not the case.

To rewind things a little: I was manually repairing my asf mirror
yesterday that happened to sync earlier last week at the wrong time and
picked up a dodgy revision.  I manually deleted the affected rows from
rep-cache.db and noticed that all my select queries seemed to be taking an
inordinate amount of time to complete (5s+ at least).  I explain plan the
problematic query, saw no indexes, created one manually, and wallah,
problem solved, all queries returned instantly and explain plan showed
index usage.

I've got a bunch of zfs snapshots of the repo I can have a play around
with tomorrow to see if I can replicate.

Plausible theories off the top of my head:

a) There's something different with my env and indexes were not being
created automatically.  (I do remember having a lot of trouble getting the
asf repo to load from dumps and then complete from subsequent syncs.)
b) There's something else going on.

(Plausible theories?  Yes.  Good theories?  Debatable ;-)

	Trent.

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Philip Martin <ph...@wandisco.com>.

Philip Martin <ph...@wandisco.com> writes:

> It may be TEXT but it is also PRIMARY KEY and according to the SQLite
> docs:
>
> http://sqlite.org/lang_createtable.html
>
>    INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY
>    constraints are implemented by creating an index in the database (in
>    the same way as a "CREATE UNIQUE INDEX" statement would). Such an
>    index is used like any other index in the database to optimize
>    queries. As a result, there often no advantage (but significant
>    overhead) in creating an index on a set of columns that are already
>    collectively subject to a UNIQUE or PRIMARY KEY constraint.

If I create a repository using 1.7 and look at the rep-cache.db I see:

$ sqlite3 rep-cache.db "select * from sqlite_master" | grep index
index|sqlite_autoindex_rep_cache_1|rep_cache|4|

An index has been created automatically and so adding another index can
only slow things down.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: svn commit: r1296868 - /subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql

Posted by Philip Martin <ph...@wandisco.com>.

danielsh@apache.org writes:

> Author: danielsh
> Date: Sun Mar  4 20:14:01 2012
> New Revision: 1296868
>
> URL: http://svn.apache.org/viewvc?rev=1296868&view=rev
> Log:
> * subversion/libsvn_fs_fs/rep-cache-db.sql
>   (I_HASH): New index.
>
> Suggested by: Trent Nelson <tr...@snakebite.org>
>
> Modified:
>     subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql
>
> Modified: subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql?rev=1296868&r1=1296867&r2=1296868&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/rep-cache-db.sql Sun Mar  4 20:14:01 2012
> @@ -33,6 +33,11 @@ CREATE TABLE rep_cache (
>    expanded_size INTEGER NOT NULL
>    );
>  
> +/* There isn't an implicit index on a TEXT column, so create an explicit one. */
> +CREATE INDEX I_HASH on REP_CACHE (hash);

It may be TEXT but it is also PRIMARY KEY and according to the SQLite
docs:

http://sqlite.org/lang_createtable.html

   INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY
   constraints are implemented by creating an index in the database (in
   the same way as a "CREATE UNIQUE INDEX" statement would). Such an
   index is used like any other index in the database to optimize
   queries. As a result, there often no advantage (but significant
   overhead) in creating an index on a set of columns that are already
   collectively subject to a UNIQUE or PRIMARY KEY constraint.

It appears creating the index is unnecessary and may be actively bad.

This reminds me that shortly before 1.7 I was asking about the exact
opposite of this change: whether removing existing indices from PRIMARY
KEYs, and making indices UNIQUE where possible, would be a good idea.

-- 
Philip