Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2010/02/18 18:21:55 UTC

Is sqlite fast enough?

How fast are sqlite queries compared to the old-wc entry caching?
Operations like update benefit from faster locking, but in old-wc
status does no locking at all.  Also, the multiple entries files give
the data a tree-like organisation that is not modelled quite so well
by an SQL table.  Is sqlite going to be at least as fast as old-wc at
running status when the OS cache is hot?  I've been vaguely assuming
that it would be OK, but this week I felt it was time to investigate.
What would we do in 3 months' time if wc-ng turns out to be an order of
magnitude slower than old-wc?

I need a big working copy to make status take a significant amount of
time, so I used a shell script[1] to create about 100,000 files in
about 900 directories.  Then I wrote a python script[2] to create and
populate a single sqlite database with rows representing a similar set
of files and directories.  Finally I wrote a simple C client[3] to
query the sqlite database in a manner that vaguely approximates status.

Initially, results were not encouraging.  On my machine 'svn status'
using 1.6 takes about 1.1 seconds, while the sqlite client took nearly
60 seconds to query the database, most of that in sqlite3_step.
That sort of slowdown would be disastrous.  After some thought I
realised I needed another index:

 create unique index i_local on base_node (wc_id, local_relpath)

With that added the sqlite client completes the queries in about 0.35
of a second, comfortably faster than the 1.6 status.  Please note that
the sqlite client is only reading the base_node table, it's not doing
stat() on any working files or reading working_node or actual_node.
Also the base_node table in my test has fewer columns than wc-ng will
use.  As a final test I patched[4] my 1.6 build to remove most of the
querying of the working files.  With this change 'svn status' goes
down to about 0.55 of a second, still slower than the sqlite client.
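The fix above hinges on the query planner: without an index covering (wc_id, local_relpath), every per-node lookup is a full table scan.  A small self-contained sketch (toy schema, not the real wc-ng one) shows the plan change via EXPLAIN QUERY PLAN:

```python
# Toy demonstration of why the i_local index matters: the same lookup
# goes from a full table scan to an index search once the index exists.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table base_node (
                  wc_id integer not null,
                  local_relpath text not null,
                  parent_relpath text,
                  kind text not null)""")

query = "select kind from base_node where wc_id = ?1 and local_relpath = ?2"

# Without the index: sqlite has to scan the whole table for every lookup.
plan_before = conn.execute("explain query plan " + query, (1, "foo0")).fetchall()
print(plan_before)   # plan detail mentions a SCAN of base_node

conn.execute("create unique index i_local on base_node (wc_id, local_relpath)")

# With the index: each lookup becomes an index search.
plan_after = conn.execute("explain query plan " + query, (1, "foo0")).fetchall()
print(plan_after)    # plan detail mentions SEARCH ... USING INDEX i_local
```

The exact wording of the plan rows varies between sqlite versions, but the before/after difference (SCAN versus SEARCH ... USING INDEX) is stable.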

My major concern that sqlite might be an order of magnitude slower
than old-wc is gone.  Obviously the locking operations are likely to
be faster and the read-only operations look good too, although the
margin is not that great.


[1] script to create a large working copy

#!/bin/sh

svnadmin=svnadmin
svn=svn
url=file://`pwd`/repo

rm -rf repo wc empty
$svnadmin create repo
$svn mkdir -mm $url/trunk
touch empty
for i in `seq  0 99` ; do $svn import -mm empty $url/trunk/foo$i; done
rm empty
for i in `seq  0 29` ; do $svn cp -mm -r101 $url/trunk $url/trunk/zag$i; done
for i in `seq  0 29` ; do $svn cp -mm -r131 $url/trunk $url/trunk/zig$i; done
$svn co $url/trunk wc

[2] script to create a sqlite database

#!/usr/bin/python

import os, sqlite3

try: os.remove('wcx.db')
except: pass

c = sqlite3.connect('wcx.db')
c.execute("""create table repository (
               id integer primary key autoincrement,
               root text unique not null,
               uuid text not null)""")
c.execute("""create index i_uuid on repository (uuid)""")
c.execute("""create index i_root on repository (root)""")
c.execute("""create table wcroot (
               id integer primary key autoincrement,
               local_abspath text unique)""")
c.execute("""create unique index i_local_abspath on wcroot (local_abspath)""")
c.execute("""create table base_node (
               wc_id integer not null references wcroot (id),
               local_relpath text not null,
               repos_id integer references repository (id),
               repos_relpath text,
               parent_relpath text,
               presence text not null,
               kind text not null)""")
c.execute("""create unique index i_local on base_node (wc_id, local_relpath)""")
c.execute("""create index i_parent on base_node (wc_id, parent_relpath)""")
c.execute("""insert into repository (root, uuid) values (
               "http://example.com/repo",
               "f738be9e-409d-481f-b246-1fb6a969aba2")""")
c.execute("""insert into wcroot(local_abspath) values ("/wc")""")

c.execute("""insert into base_node (
               wc_id,
               local_relpath,
               repos_id,
               repos_relpath,
               parent_relpath,
               presence,
               kind)
             values (
               1,
               "",
               1,
               "trunk",
               NULL,
               "normal",
               "dir")""")

for i in range(100):
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "foo"""+str(i)+"""",
                   1,
                   "trunk/foo"""+str(i)+"""",
                   "",
                   "normal",
                   "file")""")
    if i >= 30:
        continue
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zag"""+str(i)+"""",
                   1,
                   "trunk/zag"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")
    c.execute("""insert into base_node (
                   wc_id,
                   local_relpath,
                   repos_id,
                   repos_relpath,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zig"""+str(i)+"""",
                   1,
                   "trunk/zig"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")

    for j in range(100):
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zag"""+str(i)+"/foo"+str(j)+"""",
                       1,
                       "trunk/zag"""+str(i)+"/foo"+str(j)+"""",
                       "zag"""+str(i)+"""",
                       "normal",
                       "file")""")
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/foo"+str(j)+"""",
                       1,
                       "trunk/zig"""+str(i)+"/foo"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "file")""")
        if j >= 30:
            continue
        c.execute("""insert into base_node (
                       wc_id,
                       local_relpath,
                       repos_id,
                       repos_relpath,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/zag"+str(j)+"""",
                       1,
                       "trunk/zig"""+str(i)+"/zag"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "dir")""")
        for k in range(100):
            c.execute("""insert into base_node (
                           wc_id,
                           local_relpath,
                           repos_id,
                           repos_relpath,
                           parent_relpath,
                           presence,
                           kind)
                         values (
                           1,
                           "zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           1,
                           "trunk/zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           "zig"""+str(i)+"/zag"+str(j)+"""",
                           "normal",
                           "file")""")

c.commit()

[3] client to query the database

#include "svn_pools.h"
#include "svn_sqlite.h"
#include <stdio.h>
#include <stdlib.h>  /* EXIT_SUCCESS */

static svn_error_t *
status_query(svn_sqlite__db_t *sdb,
	     const char *local_relpath,
	     svn_boolean_t display,
	     apr_pool_t *pool)
{
  svn_sqlite__stmt_t *stmt;
  svn_boolean_t have_row;
  const char *kind;
  apr_pool_t *subpool;
  apr_array_header_t *subdirs = apr_array_make(pool, 10, sizeof(const char *));
  int i;

  SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, 0));
  SVN_ERR(svn_sqlite__bindf(stmt, "is", 1, local_relpath));
  SVN_ERR(svn_sqlite__step(&have_row, stmt));
  if (!have_row)
    {
      SVN_ERR(svn_sqlite__reset(stmt));
      return SVN_NO_ERROR;
    }
  kind = svn_sqlite__column_text(stmt, 0, pool);
  if (display)
    printf("%s %s\n", local_relpath, kind);
  SVN_ERR(svn_sqlite__reset(stmt));
  
  if (!strcmp(kind, "dir"))
    {
      SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, 1));
      SVN_ERR(svn_sqlite__bindf(stmt, "is", 1, local_relpath));
      SVN_ERR(svn_sqlite__step(&have_row, stmt));
      while (have_row)
	{
	  const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL);
	  kind = svn_sqlite__column_text(stmt, 1, NULL);
          if (!strcmp(kind, "dir"))
	    APR_ARRAY_PUSH(subdirs, const char *)
	      = apr_pstrdup(pool, child_relpath);
	  else if (display)
            printf("%s %s\n", child_relpath, kind);
	  SVN_ERR(svn_sqlite__step(&have_row, stmt));
	}
      SVN_ERR(svn_sqlite__reset(stmt));
    }

  subpool = svn_pool_create(pool);
  for (i = 0; i < subdirs->nelts; ++i)
    {
      svn_pool_clear(subpool);
      SVN_ERR(status_query(sdb, APR_ARRAY_IDX(subdirs, i, const char*), display,
			   subpool));
    }
  svn_pool_destroy(subpool);

  return SVN_NO_ERROR;
}

int main()
{
  apr_pool_t *pool;  
  svn_sqlite__db_t *sdb;
  const char * const statements[] = {
    "select kind from base_node" \
    "   where wc_id = ?1 and local_relpath = ?2;",
    "select local_relpath, kind from base_node" \
    "   where wc_id = ?1 and parent_relpath = ?2;",
    NULL
  };

  apr_initialize();
  pool = svn_pool_create(NULL);
  SVN_INT_ERR(svn_sqlite__open(&sdb, "wcx.db", svn_sqlite__mode_rwcreate,
			       statements, 0, NULL, pool, pool));
  SVN_INT_ERR(status_query(sdb, "", FALSE, pool));

  return EXIT_SUCCESS;
}

[4] patch for 1.6

Index: subversion/libsvn_wc/status.c
===================================================================
--- subversion/libsvn_wc/status.c	(revision 905665)
+++ subversion/libsvn_wc/status.c	(working copy)
@@ -272,6 +272,7 @@
 
   svn_lock_t *repos_lock = NULL;
 
+#if 0
   /* Check for a repository lock. */
   if (repos_locks)
     {
@@ -302,6 +303,7 @@
   SVN_ERR(svn_wc__get_tree_conflict(&tree_conflict, path, adm_access, pool));
 
   if (! entry)
+#endif
     {
       /* return a fairly blank structure. */
       stat = apr_pcalloc(pool, sizeof(*stat));
@@ -316,25 +318,7 @@
       stat->tree_conflict = tree_conflict;
       stat->file_external = FALSE;
 
-      /* If this path has no entry, but IS present on disk, it's
-         unversioned.  If this file is being explicitly ignored (due
-         to matching an ignore-pattern), the text_status is set to
-         svn_wc_status_ignored.  Otherwise the text_status is set to
-         svn_wc_status_unversioned. */
-      if (path_kind != svn_node_none)
-        {
-          if (is_ignored)
-            stat->text_status = svn_wc_status_ignored;
-          else
-            stat->text_status = svn_wc_status_unversioned;
-        }
-
-      /* If this path has no entry, is NOT present on disk, and IS a
-         tree conflict victim, count it as missing. */
-      if ((path_kind == svn_node_none) && tree_conflict)
-        stat->text_status = svn_wc_status_missing;
-
-      stat->repos_lock = repos_lock;
+      stat->repos_lock = NULL;
       stat->url = NULL;
       stat->ood_last_cmt_rev = SVN_INVALID_REVNUM;
       stat->ood_last_cmt_date = 0;

-- 
Philip

RE: Is sqlite fast enough?

Posted by Bert Huijben <be...@vmoo.com>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: Thursday, February 18, 2010 19:22
> To: dev@subversion.apache.org
> Subject: Is sqlite fast enough?
> 
> How fast are sqlite queries compared to the old-wc entry caching?
> Operations like update benefit from faster locking, but in old-wc
> status does no locking at all.  Also the multiple entries files give
> the data a tree-like organisation that is not modelled quite so well
> by an SQL table.  Is sqlite going to be at least as fast as old-wc at
> running status when the OS cache is hot?

We assume it is at least as fast, and for several operations it is faster.

In the old, nearly unmaintainable WC-1.0 code we didn't use one entry
cache, but in several common cases three separate ones that weren't kept
in sync.

Issues like that made the code less and less maintainable.

Running the entire testsuite, we were about 50% slower than 1.6 around
September 2009, even though we performed more tests and included entry
dumping in the tests.  Since then we have doubled the property handling
(both in-db and in-files).

All this with several of the entry caches removed, no single database
yet, and absolutely no performance tuning.

Just moving to the pristine store and moving the properties will bring us
to around the same performance as 1.6 on our testsuite, and that is before
we have looked at the gains from not reading all information at once every
time.

To get one svn_wc_entry_t instance, we have to read BASE, ACTUAL and
WORKING for every node in the same directory, and in most cases also for
the parents of the current directory, just to fill in the entry's values.
(And that is not counting the information in the parent stubs that we
also need.)
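The three-reads-per-node problem described above is exactly what a single database can collapse into one query.  A hedged sketch with an invented miniature schema (not the real wc-ng one), combining base/working/actual state in a single outer join:

```python
# Toy illustration: once BASE, WORKING and ACTUAL live in one database,
# per-node state can come back from a single outer-join query instead of
# three separate file reads.  Schema and rows are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  create table base_node    (local_relpath text primary key, kind text);
  create table working_node (local_relpath text primary key, kind text);
  create table actual_node  (local_relpath text primary key, changelist text);
  insert into base_node values ('foo', 'file');
  insert into actual_node values ('foo', 'my-changes');
""")

# One round trip fetches everything the old code gathered from three files;
# missing WORKING state simply comes back as NULL.
row = conn.execute("""
  select b.kind, w.kind, a.changelist
    from base_node b
    left join working_node w on w.local_relpath = b.local_relpath
    left join actual_node  a on a.local_relpath = b.local_relpath
   where b.local_relpath = ?""", ("foo",)).fetchone()
print(row)  # ('file', None, 'my-changes')
```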

We generally feel that we will be OK, and most likely better off, with
WC-NG.  Besides that, the entries were far from maintainable by now.
Yes, we could add a flag here and there, but every flag made Subversion
slower and less maintainable.  And it doesn't fix the real issues, such
as being able to update under a conflict, being able to revert all
operations, handling case-insensitive filesystems, etc.
Most of these issues weren't fixable without redesigning the storage layer.

> I've been vaguely assuming that it would be OK but this week I felt it
> was time to investigate.  What would we do in 3 months' time if wc-ng
> turns out to be an order of magnitude slower than old-wc?
The current state is 50% slower, and I don't see any reason why it would
get any slower when we move to a single DB or a more stable pristine
store that needs fewer file operations.

We still use far too many svn_wc_entry_t operations and I don't see
those disappearing within just a few weeks.  I would be very surprised
if we get to a single database before the second half of April.  (Feel
free to prove me wrong on this ;-)

If it is slower than the older version we have code that we can tune
further, and I'm 100% sure that we will get a faster Subversion working
copy on Windows.  (Just losing the lock slowdown is such a difference
that it can't get slower for large working copies.)

WC-1.0 was beyond the point where we could tune it.  I'm sure we can get
the WC-NG database a few times faster than it is now with better tuning
and profiling.  But it doesn't seem necessary to do that to see huge
performance gains over 1.6.  (And I really expect that tuning there is
not the right place to look, because processing the actual files in the
working copy takes the largest share of the processing time.)

> I need a big working copy to make status take a significant amount of
> time, so I used a shell script[1] to create about 100,000 files in
> about 900 directories.  Then I wrote a python script[2] to create and
> populate a single sqlite database with rows representing a similar set
> of files and directories.  Finally I wrote a simple C client[3] to
> query the sqlite database in a manner that vaguely approximates status.

How did you use transactions?
See the sqlite documentation.  Using transactions the right way changes
performance characteristics gigantically.  We can't use most of these
optimizations before we get to a single database.  (Transactions help
on writing, of course, but the read performance of operations that
quickly follow each other increases considerably too.)
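To make the transaction point concrete, here is a minimal sketch (toy schema, made-up row count) of batching many inserts into one explicit transaction; in autocommit mode each insert would pay its own journal flush instead:

```python
# Batching inserts into one explicit transaction, as suggested above.
# Table name and row count are invented for the example.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wcx.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("create table base_node (local_relpath text)")

rows = [("foo%d" % i,) for i in range(1000)]

# One transaction for the whole batch: the journal is flushed once at
# commit instead of once per insert statement.
conn.execute("begin")
conn.executemany("insert into base_node (local_relpath) values (?)", rows)
conn.execute("commit")

count = conn.execute("select count(*) from base_node").fetchone()[0]
print(count)  # 1000
```

The speedup from batching is most visible on a real on-disk database, where each autocommitted statement forces its own fsync.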

> 
> Initially results were not encouraging.  On my machine 'svn status'
> using 1.6 takes about 1.1 seconds and the sqlite client took nearly 60
> seconds to query the database, and most of that was in sqlite3_step.
> That sort of slowdown would be disastrous.  After some thought I
> realised I needed another index:
> 
>  create unique index i_local on base_node (wc_id, local_relpath)

I don't think we have more than a few basic indexes yet.  (Or
transactions around the performance-critical code blocks, for that
matter.)  We postponed that work to future performance tuning.

> With that added the sqlite client completes the queries in about 0.35
> of a second, comfortably faster than the 1.6 status.  Please note that
> the sqlite client is only reading the base_node table, it's not doing
> stat() on any working files or reading working_node or actual_node.
> Also the base_node table in my test has fewer columns than wc-ng will
> use.  As a final test I patched[4] my 1.6 build to remove most of the
> querying of the working files.  With this change 'svn status' goes
> down to about 0.55 of a second, still slower than the sqlite client.
> 
> My major concern that sqlite might be an order of magnitude slower
> than old-wc is gone.  Obviously the locking operations are likely to
> be faster and the read-only operations look good too, although the
> margin is not that great.

In a typical svn status call some files in the working copy are changed
and need a compare against the file in the pristine store, especially
if the files use keywords and/or eol settings.  Most of svn status's
disk I/O is in these scans, not in accessing the entry/node/tree state.

So even if it were slower for these simple reads, I don't think you
would see it.  The better caching (and read-ahead characteristics) of a
single file, compared to the hundreds or thousands of small
(< sector-size) files we had in WC-1.0, should hide any difference in
real-world usage.
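The fast path alluded to above can be sketched as follows: a status walk only needs content I/O for files whose cheap metadata (size, timestamp) no longer matches the recorded values.  This is an illustrative sketch, not Subversion's actual detranslation logic:

```python
# Cheap modification check before any expensive pristine compare:
# if size and mtime still match the recorded values, treat the file as
# unmodified and skip reading its content.  Recorded values here come
# straight from a freshly created temp file.
import os
import tempfile

def probably_modified(path, recorded_size, recorded_mtime):
    st = os.stat(path)
    # Only when size or mtime differ is a full content compare needed.
    return st.st_size != recorded_size or int(st.st_mtime) != int(recorded_mtime)

fd, path = tempfile.mkstemp()
os.write(fd, b"hello\n")
os.close(fd)

st = os.stat(path)
print(probably_modified(path, st.st_size, st.st_mtime))      # False: no I/O needed
print(probably_modified(path, st.st_size + 1, st.st_mtime))  # True: fall back to compare
```

Files with keywords or eol translation defeat a plain byte-for-byte shortcut, which is why those scans dominate the disk I/O Bert describes.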


Real performance measurements will require a lot more work in
libsvn_wc.  We are nowhere near the point where we can calculate the
final speed.  But almost every change we make now makes WC-NG faster,
while we spent the first half of 2009 making WC-NG slower by removing
entry-cache specifics.

	Bert


Re: Is sqlite fast enough?

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Feb 22, 2010 at 12:50 PM, Matthew Bentham <mj...@artvps.com> wrote:
> On 22/02/2010 11:42, Matthew Bentham wrote:
>>
>> On 22/02/2010 11:13, Philip Martin wrote:
>>>
>>> Matthew Bentham<mj...@artvps.com>   writes:
>>>
>>>> For me on CYGWIN_NT-6.0-WOW64 brahe 1.7.1(0.218/5/3) 2009-12-07 11:48
>>>> i686 Cygwin
>>>
>>> Thanks for testing!
>>>
>>>> $ svn --version
>>>> svn, version 1.6.9 (r901367)
>>>>
>>>> Create the test repo using the shell script, repeat "$ time svn
>>>> status" a few times:
>>>> real    0m37.303s
>>>> real    0m15.754s
>>>> real    0m15.832s
>>>
>>> I know Subversion is slow on Windows but that is extreme; it's about
>>> the same as my Linux machine when the cache is cold and it has to wait
>>> for real disk IO; once in memory it's an order of magnitude faster.  I
>>> suspect the cygwin layer might be contributing to that (15 seconds of
>>> CPU).  Would it be possible for you to try "svn status" with a
>>> non-cygwin client?
>>>
>>
>> Sure:
>>
>> /cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe
>> --version
>> svn, version 1.6.5 (r38866)
>>     compiled Aug 21 2009, 21:38:11
>>
>> time /cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe
>> status
>>
>> real    0m8.569s
>> real    0m8.599s
>> real    0m8.611s
>>
>> Quite a bit faster :-)  Not as fast as your 1.1s on Debian though :-(
>> The machine is a 2.5Ghz Core2 Quad running Vista 64.
>>
>> Matthew
>>
>
> MMm, those latter tests were also done within a cygwin bash shell so that I
> could use "time", but I've just tried in a Windows cmd shell using my
> wristwatch and got the same times.
>

Just adding my .02 € to this oldish thread ...

First, great effort for trying to get a more objective, measured idea
of what kind of performance improvement might be expected ... so
thanks for that.

Second, when I'm looking at client-side svn performance, I always test
with a cold cache on the client. IMHO, that's the situation that's
most similar to "real life" usage. A user will almost never run "svn
status" two times consecutively. Same for update, merge, ... all the
actions that require a lot of I/O on the client.

At least not in our company. Most users only update once a day, so
that's always straight from disk. I think the same applies to "status"
(user does lots of stuff in IDE, surfs the 'net a little, consults
some documentation, compiles, tests; oh now it's time to commit some
of this, so let's get a status). So in my book those numbers where the
svn client can find everything in disk cache are not relevant at all.

Just to add some more real-life info (I've done a lot of tests lately
with our 1.6 FSFS repo and 1.6 (windows (SlikSVN)) clients on
different hardware, in an effort to estimate where the bottlenecks
are, where we should invest first, ...):

"Real-life" working copy with 3581 dirs with 35260 files in them.
Performing svn status (clean WC, nothing modified, freshly checked out).

1) "old" desktop pc - Windows XP - 7200 rpm HDD - Pentium D 2.8 GHz - 4Gb RAM
- With cold cache (after reboot, and waiting couple of mins)
$ time svn status
real    2m26.156s
real    2m0.421s

- With hot cache (second time svn status after the first one)
real    0m4.156s
real    0m4.512s


2) "new" desktop pc - Windows Vista - 10k rpm HDD - Core 2 Duo 3.0 GHz
- 4 Gb RAM
- With "cold" cache [1]
real    0m42.822s
real    0m38.844s
real    0m30.030s
real    0m17.066s
real    0m15.272s

- With hot cache
real    0m8.908s
real    0m4.774s
real    0m4.712s
real    0m4.773s
real    0m4.758s


3) SSD desktop pc - Windows Vista - SSD Intel X25-M - Core 2 Duo 3.0
GHz - 4 Gb RAM
- With "cold" cache [1]
real    0m14.788s
real    0m7.378s
real    0m6.583s
real    0m6.456s

- With hot cache
real    0m4.758s
real    0m4.758s
real    0m4.743s
real    0m4.738s


[1] I don't really understand why these numbers go down (status gets
faster) even with reboots in between. Then again, I don't know enough
about Windows' caching strategy. Maybe someone else can comment on
this, but my supposition is that there is some built-in indexing
functionality that makes Windows pull files into RAM right after boot,
because it knows those files are read a lot and so have a high chance
of being read again ...

As you can see, the numbers for "hot cache" are almost always the
same, regardless of the storage system (the one exception I'll
attribute to me not having waited long enough after reboot, maybe some
windows services were still starting). That's just not realistic,
that's certainly not what our users feel ... Going down from 2 minutes
(7200 rpm HDD) to 17-ish seconds (10k rpm) to 7-ish seconds (SSD)
seems more correct to me.

Being optimistic I'm hoping that WC-NG will bring sub-5-seconds
statuses for this working copy to the masses, i.e. to all those
developers stuck with laptops and desktops with 7k rpm HDDs :).

Thanks for all the continued efforts!

Johan

Re: Is sqlite fast enough?

Posted by Matthew Bentham <mj...@artvps.com>.
On 22/02/2010 11:42, Matthew Bentham wrote:
> On 22/02/2010 11:13, Philip Martin wrote:
>> Matthew Bentham<mj...@artvps.com>   writes:
>>
>>> For me on CYGWIN_NT-6.0-WOW64 brahe 1.7.1(0.218/5/3) 2009-12-07 11:48
>>> i686 Cygwin
>>
>> Thanks for testing!
>>
>>> $ svn --version
>>> svn, version 1.6.9 (r901367)
>>>
>>> Create the test repo using the shell script, repeat "$ time svn
>>> status" a few times:
>>> real    0m37.303s
>>> real    0m15.754s
>>> real    0m15.832s
>>
>> I know Subversion is slow on Windows but that is extreme; it's about
>> the same as my Linux machine when the cache is cold and it has to wait
>> for real disk IO; once in memory it's an order of magnitude faster.  I
>> suspect the cygwin layer might be contributing to that (15 seconds of
>> CPU).  Would it be possible for you to try "svn status" with a
>> non-cygwin client?
>>
>
> Sure:
>
> /cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe --version
> svn, version 1.6.5 (r38866)
>      compiled Aug 21 2009, 21:38:11
>
> time /cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe
> status
>
> real    0m8.569s
> real    0m8.599s
> real    0m8.611s
>
> Quite a bit faster :-)  Not as fast as your 1.1s on Debian though :-(
> The machine is a 2.5Ghz Core2 Quad running Vista 64.
>
> Matthew
>

Mmm, those latter tests were also done within a cygwin bash shell so 
that I could use "time", but I've just tried in a Windows cmd shell 
using my wristwatch and got the same times.

M

Re: Is sqlite fast enough?

Posted by Matthew Bentham <mj...@artvps.com>.
On 22/02/2010 11:13, Philip Martin wrote:
> Matthew Bentham<mj...@artvps.com>  writes:
>
>> For me on CYGWIN_NT-6.0-WOW64 brahe 1.7.1(0.218/5/3) 2009-12-07 11:48
>> i686 Cygwin
>
> Thanks for testing!
>
>> $ svn --version
>> svn, version 1.6.9 (r901367)
>>
>> Create the test repo using the shell script, repeat "$ time svn
>> status" a few times:
>> real    0m37.303s
>> real    0m15.754s
>> real    0m15.832s
>
> I know Subversion is slow on Windows but that is extreme; it's about
> the same as my Linux machine when the cache is cold and it has to wait
> for real disk IO; once in memory it's an order of magnitude faster.  I
> suspect the cygwin layer might be contributing to that (15 seconds of
> CPU).  Would it be possible for you to try "svn status" with a
> non-cygwin client?
>

Sure:

/cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe --version
svn, version 1.6.5 (r38866)
    compiled Aug 21 2009, 21:38:11

time /cygdrive/c/Program\ Files\ \(x86\)/CollabNet\ Subversion/svn.exe 
status

real    0m8.569s
real    0m8.599s
real    0m8.611s

Quite a bit faster :-)  Not as fast as your 1.1s on Debian though :-(
The machine is a 2.5Ghz Core2 Quad running Vista 64.

Matthew

Re: Is sqlite fast enough?

Posted by Philip Martin <ph...@wandisco.com>.
Matthew Bentham <mj...@artvps.com> writes:

> For me on CYGWIN_NT-6.0-WOW64 brahe 1.7.1(0.218/5/3) 2009-12-07 11:48
> i686 Cygwin

Thanks for testing!

> $ svn --version
> svn, version 1.6.9 (r901367)
>
> Create the test repo using the shell script, repeat "$ time svn
> status" a few times:
> real    0m37.303s
> real    0m15.754s
> real    0m15.832s

I know Subversion is slow on Windows but that is extreme; it's about
the same as my Linux machine when the cache is cold and it has to wait
for real disk IO; once in memory it's an order of magnitude faster.  I
suspect the cygwin layer might be contributing to that (15 seconds of
CPU).  Would it be possible for you to try "svn status" with a
non-cygwin client?

-- 
Philip

Re: Is sqlite fast enough?

Posted by Matthew Bentham <mj...@artvps.com>.
On 18/02/2010 21:58, kmradke@rockwellcollins.com wrote:
> Greg Stein<gs...@gmail.com>  wrote on 02/18/2010 03:31:19 PM:
>> Awesome work! We've just been assuming/hoping it would be fast enough,
>> and would resolve any problems "later". It is good to see we're in the
>> right ballpark.
>>
>> On Thu, Feb 18, 2010 at 13:21, Philip Martin <philip.martin@wandisco.com> wrote:
>>> How fast are sqlite queries compared to the old-wc entry caching?
>>> Operations like update benefit from faster locking, but in old-wc
>>> status does no locking at all.  Also the multiple entries files give
>>> the data a tree-like organisation that is not modelled quite so well
>>> by an SQL table.  Is sqlite going to be at least as fast as old-wc at
>>> running status when the OS cache is hot?  I've been vaguely assuming
>>> that it would be OK but this week I felt it was time to investigate.
>>> What would we do in 3 months time if wc-ng turns out to be an order of
>>> magnitude slower than old-wc?
>>> ...
>
> What platform were these tests executed on?  We need to make sure
> windows platforms are just as zippy.
>
> Kevin R.

For me on CYGWIN_NT-6.0-WOW64 brahe 1.7.1(0.218/5/3) 2009-12-07 11:48 
i686 Cygwin

$ svn --version
svn, version 1.6.9 (r901367)

Create the test repo using the shell script, repeat "$ time svn status" 
a few times:
real    0m37.303s
real    0m15.754s
real    0m15.832s

Create "wcx.db" using the python script, repeat "time 
simplesqliteclient.exe" a few times:

real    0m0.107s
real    0m0.100s
real    0m0.093s

I'm not really in a position to patch svn 1.6 to stop it from doing the 
actual file accesses, so I can't do the last test that Philip performed.

Great results!

Matthew

Re: Is sqlite fast enough?

Posted by km...@rockwellcollins.com.
Mark Phippard <ma...@gmail.com> wrote on 02/19/2010 09:32:23 AM:

> On Fri, Feb 19, 2010 at 6:43 AM,  <km...@rockwellcollins.com> wrote:
> 
> > Good to hear.  I know the TortoiseSVN devs are not using the dev svn
> > branch
> > currently because of the previous huge performance issues they saw in
> > the past.  Sounds like it might be time to get them to try again.  I
> know they monitor this list.  (Unfortunately I'm not set up to do
> windows builds either, or I would have just tested instead of asking...)
> 
> I would not recommend TortoiseSVN or other tools move to trunk until
> the WC metadata is consolidated into a single .svn folder.  Until that
> happens, the performance is not there and you still have to deal with
> the ongoing churn.

I was only thinking of the TortoiseSVN devs building against the svn dev
libs (like they used to in the past).  However, it sounds like you feel
the svn dev libs are churning enough that things should wait until at
least the .svn folders are combined before getting much wider testing.
Fair enough, and exactly the info I was looking to get.

Maybe I can dream and hope that my repeated curiosity of these
things will force me to setup a windows build environment...

Kevin R.

Re: Is sqlite fast enough?

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Feb 19, 2010 at 6:43 AM,  <km...@rockwellcollins.com> wrote:

> Good to hear.  I know the TortoiseSVN devs are not using the dev svn
> branch
> currently because of the previous huge performance issues they saw in
> the past.  Sounds like it might be time to get them to try again.  I
> know they monitor this list.  (Unfortunately I'm not set up to do windows
> builds either, or I would have just tested instead of asking...)

I would not recommend TortoiseSVN or other tools move to trunk until
the WC metadata is consolidated into a single .svn folder.  Until that
happens, the performance is not there and you still have to deal with
the ongoing churn.


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

RE: Is sqlite fast enough?

Posted by km...@rockwellcollins.com.
> > > What platform were these test executed on?  We need to make sure
> > > windows platforms are just as zippy.
> > 
> > I tested on Debian stable.  I don't currently build on Windows, but
> > all the code I used is in my first email so you should be able to
> > reproduce it if you want.  I do have a VM running XP so perhaps I
> > should learn how to do a Windows build.
> > 
> > Much in the same way that we have been assuming that sqlite will
> > perform acceptably, we have been assuming that it will perform
> > acceptably on Windows.  Subversion has always had less performance on
> > Windows because of the way it interacts with the filesystem and I
> > think the assumption is that Windows might see a bigger performance
> > gain than Unix.  Running the sort of tests I ran would be a good thing
> > to do.
> 
> I can confirm that I see a much bigger performance improvement on Windows
> than the other developers are reporting on other operating systems. This is
> mostly related to using fewer files for the same operations. (Creating files
> on a journaling filesystem is much slower than on systems without a journal.
> And Windows does full journaling on creating empty or small files).
> 
> Write/read performance inside a single file (read: database) is much more
> comparable between operating systems as most filesystems and operating
> systems optimize for this case. (NTFS isn't particularly optimized for
> creating thousands of empty files and then deleting them a few seconds
> later, but it is a fast filesystem for e.g. large databases and huge media
> files).
> 
> Another slowdown commonly experienced with Subversion 1.0-1.6 on Windows is
> that some files are locked by virusscanners/indexers/other subversion
> clients monitoring the same files. The Subversion libraries then have to
> perform some wait loop.
> Switching to a central database file that isn't deleted/replaced after every
> operation allows better concurrent access for these tools. So you should
> also see fewer delays here.

Good to hear.  I know the TortoiseSVN devs are not using the dev svn
branch currently because of the previous huge performance issues they
saw in the past.  Sounds like it might be time to get them to try again.
I know they monitor this list.  (Unfortunately I'm not set up to do
windows builds either, or I would have just tested instead of asking...)

Kevin R.

RE: Is sqlite fast enough?

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: Friday, 19 February 2010 12:03
> To: kmradke@rockwellcollins.com
> Cc: dev@subversion.apache.org
> Subject: Re: Is sqlite fast enough?
> 
> kmradke@rockwellcollins.com writes:
> 
> > What platform were these tests executed on?  We need to make sure
> > windows platforms are just as zippy.
> 
> I tested on Debian stable.  I don't currently build on Windows, but
> all the code I used is in my first email so you should be able to
> reproduce it if you want.  I do have a VM running XP so perhaps I
> should learn how to do a Windows build.
> 
> Much in the same way that we have been assuming that sqlite will
> perform acceptably, we have been assuming that it will perform
> acceptably on Windows.  Subversion has always had less performance on
> Windows because of the way it interacts with the filesystem and I
> think the assumption is that Windows might see a bigger performance
> gain than Unix.  Running the sort of tests I ran would be a good thing
> to do.

I can confirm that I see a much bigger performance improvement on Windows
than the other developers are reporting on other operating systems. This is
mostly related to using fewer files for the same operations. (Creating files
on a journaling filesystem is much slower than on systems without a journal.
And Windows does full journaling on creating empty or small files). 

Write/read performance inside a single file (read: database) is much more
comparable between operating systems as most filesystems and operating
systems optimize for this case. (NTFS isn't particularly optimized for
creating thousands of empty files and then deleting them a few seconds
later, but it is a fast filesystem for e.g. large databases and huge media
files).

Another slowdown commonly experienced with Subversion 1.0-1.6 on Windows is
that some files are locked by virusscanners/indexers/other subversion
clients monitoring the same files. The Subversion libraries then have to
perform some wait loop. 
Switching to a central database file that isn't deleted/replaced after every
operation allows better concurrent access for these tools. So you should
also see fewer delays here.
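Bert's many-small-files point is easy to demonstrate on any platform; a rough sketch in Python (file counts and payload size are arbitrary, and absolute numbers depend entirely on the filesystem and its journaling behaviour):

```python
import os
import tempfile
import time

def create_small_files(n, payload=b"x" * 64):
    """Create n tiny files, roughly the old-wc per-entry write pattern."""
    with tempfile.TemporaryDirectory() as d:
        start = time.perf_counter()
        for i in range(n):
            with open(os.path.join(d, "entry-%d" % i), "wb") as f:
                f.write(payload)
        return time.perf_counter() - start

def write_one_file(n, payload=b"x" * 64):
    """Write the same bytes as n records into one file, wc-ng style."""
    with tempfile.TemporaryDirectory() as d:
        start = time.perf_counter()
        with open(os.path.join(d, "wc.db"), "wb") as f:
            for _ in range(n):
                f.write(payload)
        return time.perf_counter() - start

# Compare e.g. create_small_files(1000) against write_one_file(1000);
# on NTFS the gap should be much larger than on ext3/ext4.
```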

	Bert

Re: Is sqlite fast enough?

Posted by Philip Martin <ph...@wandisco.com>.
kmradke@rockwellcollins.com writes:

> What platform were these tests executed on?  We need to make sure
> windows platforms are just as zippy.

I tested on Debian stable.  I don't currently build on Windows, but
all the code I used is in my first email so you should be able to
reproduce it if you want.  I do have a VM running XP so perhaps I
should learn how to do a Windows build.

Much in the same way that we have been assuming that sqlite will
perform acceptably, we have been assuming that it will perform
acceptably on Windows.  Subversion has always had less performance on
Windows because of the way it interacts with the filesystem and I
think the assumption is that Windows might see a bigger performance
gain than Unix.  Running the sort of tests I ran would be a good thing
to do.
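For anyone wanting to repeat the experiment without building the C
client, the shape of the test can be sketched with Python's bundled
sqlite3 module (a minimal stand-in: the real base_node schema has many
more columns, the "kind" column here is illustrative, and the index is
the one from my first mail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use an on-disk "wcx.db" for a real test
conn.execute("""create table base_node (
                  wc_id integer,
                  local_relpath text,
                  kind text)""")
# Roughly 100,000 files in 900 directories, as in the original test.
conn.executemany(
    "insert into base_node values (1, ?, 'file')",
    (("dir%d/file%d" % (d, f),) for d in range(900) for f in range(110)))
# Without this index, each per-path lookup degrades to a full table scan.
conn.execute(
    "create unique index i_local on base_node (wc_id, local_relpath)")
row = conn.execute(
    "select kind from base_node where wc_id = 1 and local_relpath = ?",
    ("dir5/file42",)).fetchone()
```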

-- 
Philip

Re: Is sqlite fast enough?

Posted by km...@rockwellcollins.com.
Greg Stein <gs...@gmail.com> wrote on 02/18/2010 03:31:19 PM:
> Awesome work! We've just been assuming/hoping it would be fast enough,
> and would resolve any problems "later". It is good to see we're in the
> right ballpark.
> 
> On Thu, Feb 18, 2010 at 13:21, Philip Martin <philip.martin@wandisco.com> wrote:
> > How fast are sqlite queries compared to the old-wc entry caching?
> > Operations like update benefit from faster locking, but in old-wc
> > status does no locking at all.  Also the multiple entries files give
> > the data a tree-like organisation that is not modelled quite so well
> > by an SQL table.  Is sqlite going to be at least as fast as old-wc at
> > running status when the OS cache is hot?  I've been vaguely assuming
> > that it would be OK but this week I felt it was time to investigate.
> > What would we do in 3 months time if wc-ng turns out to be an order of
> > magnitude slower than old-wc?
> >...

What platform were these tests executed on?  We need to make sure
windows platforms are just as zippy.

Kevin R.

Re: Is sqlite fast enough?

Posted by Greg Stein <gs...@gmail.com>.
Awesome work! We've just been assuming/hoping it would be fast enough,
and would resolve any problems "later". It is good to see we're in the
right ballpark.

On Thu, Feb 18, 2010 at 13:21, Philip Martin <ph...@wandisco.com> wrote:
> How fast are sqlite queries compared to the old-wc entry caching?
> Operations like update benefit from faster locking, but in old-wc
> status does no locking at all.  Also the multiple entries files give
> the data a tree-like organisation that is not modelled quite so well
> by an SQL table.  Is sqlite going to be at least as fast as old-wc at
> running status when the OS cache is hot?  I've been vaguely assuming
> that it would be OK but this week I felt it was time to investigate.
> What would we do in 3 months time if wc-ng turns out to be an order of
> magnitude slower than old-wc?
>...