You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2010/10/29 07:58:02 UTC

SQLite and SELECT WHERE local_relpath LIKE

We have started using queries of the form

   SELECT ... WHERE ... AND local_relpath LIKE ...

and I was curious about the performance of LIKE.  So I tried a large
database:

$ sqlite3 wcx.db "select count(*) from nodes"
377021

and a LIKE query:

$ time sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zig1/zag27' or local_relpath like 'zig1/zag27/%' escape '%'"
101

real    0m0.398s
user    0m0.348s
sys     0m0.048s

Using escape adds a bit to the time, as does the double check on
local_relpath caused by '/%' instead of '%', but both of those are small
effects.

A suggested optimisation is

$ time sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath >= 'zig1/zag27' and local_relpath < 'zig1/zag28'"
101

real    0m0.005s
user    0m0.004s
sys     0m0.000s

I checked using "select *" and the two queries return the same data.

Can we reliably calculate the "next string" for a given UTF-8 string?
That is 'zig1/zag28' given 'zig1/zag27'?  Can we treat the string as a
bytes and just increment and carry?

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Philip Martin wrote on Fri, Oct 29, 2010 at 08:58:02 +0100:
>> Can we treat the string as a bytes and just increment and carry?
>
> No, this might invalidate the UTF-8 sequence.  If a character spans
> multiple bytes, then those bytes have header bits of the form /^1+0/
> (in regex), so incrementing the byte blindly will eventually invalidate
> the byte sequence by turning the 0 into a 1.

It turns out we don't need a general "next character", just the next
character after '/', i.e. '0'.  Thus

  local_relpath LIKE 'foo/bar/%' ESCAPE '%'

becomes 

  local_relpath > 'foo/bar/' AND local_relpath < 'foo/bar0'


-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Philip Martin wrote on Fri, Oct 29, 2010 at 08:58:02 +0100:
> Can we treat the string as a bytes and just increment and carry?

No, this might invalidate the UTF-8 sequence.  If a character spans
multiple bytes, then those bytes have header bits of the form /^1+0/
(in regex), so incrementing the byte blindly will eventually invalidate
the byte sequence by turning the 0 into a 1.

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Florian Weimer <fw...@bfk.de>.
* Philip Martin:

>> I suppose I will have to try using EXPLAIN.

As expected, the slow query doesn't use the index.  I think you should
bring this up on SQLite user mailing list.

-- 
Florian Weimer                <fw...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> I suppose I will have to try using EXPLAIN.

For the fast query
  "select ... where wc_id = ...
                and local_relpath > ... and local_relpath < ..."
explain starts:

0|Ttace|0|0|0||00|
1|Integer|1|1|0||00|
2|String8|0|2|0|zig1/zag27/|00|
3|String8|0|3|0|zig1/zag270|00|
4|Goto|0|42|0||00|
5|OpenRead|0|10|0|21|00|
6|OpenRead|1|11|0|keyinfo(3,BINARY,BINARY)|00|
7|SCopy|1|4|0||00|
8|Copy|2|5|0||00|
9|SeekGt|1|39|4|2|00|
10|Copy|3|5|0||00|
11|IdxGE|1|39|4|2|00|
12|Column|1|1|6||00|
13|IsNull|6|38|0||00|
14|IdxRowid|1|6|0||00|
15|Seek|0|6|0||00|
...
45|Goto|0|5|0||00|


For the slow query
 "select ... where wc_id = ...
               and local_relpath = ...
               or (local_relpath > ... and local_relpath < ...)"
explain starts:

0|Trace|0|0|0||00|
1|Integer|1|1|0||00|
2|String8|0|2|0|zig1/zag27|00|
3|String8|0|3|0|zig1/zag27/|00|
4|String8|0|4|0|zig1/zag270|00|
5|Goto|0|40|0||00|
6|OpenRead|0|10|0|21|00|
7|Rewind|0|38|0||00|
8|Column|0|0|5||00|
9|Ne|1|12|5|collseq(BINARY)|6c|
10|Column|0|1|6||00|
11|Eq|2|15|6|collseq(BINARY)|61|
12|Column|0|1|6||00|
13|Le|3|37|6|collseq(BINARY)|69|
14|Ge|4|37|6|collseq(BINARY)|69|
...
43|Goto|0|6|0||00|

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Peter Samuelson <pe...@p12n.org>.
[Philip Martin]
>   sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zag1/zag27'"
> 
>   sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"

>   sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zig1/zag27' or (local_
relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"

AND has higher precedence than OR, so you want more parens:

    sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and (local_relpath = 'zig1/zag27' or (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270'))"

Peter

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

>   sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zag1/zag27'"

Typo, that should be 'zig1/zag27'.

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Oct 29, 2010 at 05:57, Philip Martin <ph...@wandisco.com> wrote:
> Florian Weimer <fw...@bfk.de> writes:
>
>> It seems an optimizer issue.  Which version of SQLite do you use?
>
> I was using 3.6.21-2~bpo50 on Debian/stable.  I've just built a local
> 3.7.3-1 and get the same result.
>
> The database has 377021 rows.  The exact commands are:
>
>  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zag1/zag27'"
>
>  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"
>
> Which select 1 row and 100 rows and take 0.006s.  The combined command
>
>  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zig1/zag27' or (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"
>
> selects 101 rows and takes 0.35s.
>...

Why not simply use two queries? Run the first query, and if you get a
row, then run the second query and append the results.

Cheers,
-g

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Florian Weimer <fw...@bfk.de> writes:

> It seems an optimizer issue.  Which version of SQLite do you use?

I was using 3.6.21-2~bpo50 on Debian/stable.  I've just built a local
3.7.3-1 and get the same result.

The database has 377021 rows.  The exact commands are:

  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zag1/zag27'"

  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"

Which select 1 row and 100 rows and take 0.006s.  The combined command

  sqlite3 wcx.db "select count(*) from nodes where wc_id = 1 and local_relpath = 'zig1/zag27' or (local_relpath > 'zig1/zag27/' and local_relpath < 'zig1/zag270')"

selects 101 rows and takes 0.35s.

The script to create the database is:


#!/usr/bin/python

import os, sqlite3

try: os.remove('wcx.db')
except: pass

c = sqlite3.connect('wcx.db')
c.execute("""pragma case_sensitive_like=1""")
c.execute("""pragma foreign_keys=on""")
c.execute("""pragma synchronous=off""")
c.execute("""create table repository (
               id integer primary key autoincrement,
               root text unique not null,
               uuid text not null)""")
c.execute("""create index i_uuid on repository (uuid)""")
c.execute("""create index i_root on repository (root)""")
c.execute("""create table wcroot (
               id integer primary key autoincrement,
               local_abspath text unique)""")
c.execute("""create unique index i_local_abspath on wcroot (local_abspath)""")
c.execute("""create table nodes (
               wc_id integer not null references wcroot (id),
               local_relpath text not null,
               op_depth integer not null,
               parent_relpath text,
               repos_id integer references repository (id),
               repos_path text,
               revision integer,
               presence text not null,
               depth text,
               moved_here integer,
               moved_to text,
               kind text not null,
               changed_revision integer,
               changed_date integer,
               changed_author text,
               checksum text
               properties blob,
               translated_size integer,
               last_mod_time integer,
               dav_cache blob,
               symlink_target text,
               file_external text,
               primary key(wc_id, local_relpath, op_depth))""")
c.execute("""create index i_parent on nodes (wc_id,
                                             parent_relpath,
                                             local_relpath, op_depth)""")
c.execute("""create table lock (
                repos_id integer not null references repository (id),
                repos_relpath text not null,
                lock_token text not null,
                lock_owner text,
                lock_comment text,
                lock_date integer,
                primary key (repos_id, repos_relpath))""")
c.execute("""insert into repository (root, uuid) values (
               "http://example.com/repo",
               "f738be9e-409d-481f-b246-1fb6a969aba2")""")
c.execute("""insert into wcroot(local_abspath) values ("/wc")""")

c.execute("""insert into nodes (
               wc_id,
               local_relpath,
               op_depth,
               repos_id,
               repos_path,
               parent_relpath,
               presence,
               kind)
             values (
               1,
               "",
               0,
               1,
               "trunk",
               NULL,
               "normal",
               "dir")""")

for i in range(100):
    c.execute("""insert into nodes (
                   wc_id,
                   local_relpath,
                   op_depth,
                   repos_id,
                   repos_path,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "foo"""+str(i)+"""",
                   0,
                   1,
                   "trunk/foo"""+str(i)+"""",
                   "",
                   "normal",
                   "file")""")
    if i >= 60:
        continue;
    c.execute("""insert into nodes (
                   wc_id,
                   local_relpath,
                   op_depth,
                   repos_id,
                   repos_path,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zag"""+str(i)+"""",
                   0,
                   1,
                   "trunk/zag"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")
    c.execute("""insert into nodes (
                   wc_id,
                   local_relpath,
                   op_depth,
                   repos_id,
                   repos_path,
                   parent_relpath,
                   presence,
                   kind)
                 values (
                   1,
                   "zig"""+str(i)+"""",
                   0,
                   1,
                   "trunk/zig"""+str(i)+"""",
                   "",
                   "normal",
                   "dir")""")

    for j in range(100):
        c.execute("""insert into nodes (
                       wc_id,
                       local_relpath,
                       op_depth,
                       repos_id,
                       repos_path,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zag"""+str(i)+"/foo"+str(j)+"""",
                       0,
                       1,
                       "trunk/zag"""+str(i)+"/foo"+str(j)+"""",
                       "zag"""+str(i)+"""",
                       "normal",
                       "file")""")
        if j % 10 == 1:
          c.execute("""insert into nodes (
                         wc_id,
                         local_relpath,
                         op_depth,
                         repos_id,
                         repos_path,
                         parent_relpath,
                         presence,
                         kind)
                       values (
                         1,
                         "zag"""+str(i)+"/foo"+str(j)+"""",
                         3,
                         1,
                         "trunk/zag"""+str(i)+"/foo"+str(j)+"""",
                         "zag"""+str(i)+"""",
                         "base-delete",
                         "file")""")
          c.execute("""insert into nodes (
                         wc_id,
                         local_relpath,
                         op_depth,
                         repos_id,
                         repos_path,
                         parent_relpath,
                         presence,
                         kind)
                       values (
                         1,
                         "zag"""+str(i)+"/bar"+str(j)+"""",
                         3,
                         null,
                         null,
                         "zag"""+str(i)+"""",
                         "normal",
                         "file")""")
        c.execute("""insert into nodes (
                       wc_id,
                       local_relpath,
                       op_depth,
                       repos_id,
                       repos_path,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/foo"+str(j)+"""",
                       0,
                       1,
                       "trunk/zig"""+str(i)+"/foo"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "file")""")
        if j >= 60:
            continue
        c.execute("""insert into nodes (
                       wc_id,
                       local_relpath,
                       op_depth,
                       repos_id,
                       repos_path,
                       parent_relpath,
                       presence,
                       kind)
                     values (
                       1,
                       "zig"""+str(i)+"/zag"+str(j)+"""",
                       0,
                       1,
                       "trunk/zig"""+str(i)+"/zag"+str(j)+"""",
                       "zig"""+str(i)+"""",
                       "normal",
                       "dir")""")
        for k in range(100):
            c.execute("""insert into nodes (
                           wc_id,
                           local_relpath,
                           op_depth,
                           repos_id,
                           repos_path,
                           parent_relpath,
                           presence,
                           kind)
                         values (
                           1,
                           "zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           0,
                           1,
                           "trunk/zig"""+str(i)+"/zag"+str(j)+"/foo"+str(k)+"""",
                           "zig"""+str(i)+"/zag"+str(j)+"""",
                           "normal",
                           "file")""")

c.commit()

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Florian Weimer <fw...@bfk.de>.
* Philip Martin:

>> and that is as slow as LIKE.  Adding that "local_relpath =" is the
>> problem, without it I get the children in 0.006s.  With it I get the
>> path and the children but it takes 0.35s.
>
> SQL is tricky: I can run a query using either
>
>       local_relpath = 'zig1/zag27
>
> or using
>
>      local_relpath > 'zig1/zag27/' AND local_relpath < 'zig1/zag270'
>
> and it takes 0.006s, but if I put in both parts in together with OR it
> takes 0.35s.  That's 2 orders of magnitude!  I suppose I will have to
> try using EXPLAIN.

It seems an optimizer issue.  Which version of SQLite do you use?

-- 
Florian Weimer                <fw...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> I need something like
>
>      local_relpath = 'zig1/zag27
>      OR (local_relpath > 'zig1/zag27/' AND local_relpath < 'zig1/zag270')
>
> and that is as slow as LIKE.  Adding that "local_relpath =" is the
> problem, without it I get the children in 0.006s.  With it I get the
> path and the children but it takes 0.35s.

SQL is tricky: I can run a query using either

      local_relpath = 'zig1/zag27

or using

     local_relpath > 'zig1/zag27/' AND local_relpath < 'zig1/zag270'

and it takes 0.006s, but if I put in both parts in together with OR it
takes 0.35s.  That's 2 orders of magnitude!  I suppose I will have to
try using EXPLAIN.

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Philip Martin <ph...@wandisco.com>.
Florian Weimer <fw...@bfk.de> writes:

> * Philip Martin:
>
>> We have started using queries of the form
>>
>>    SELECT ... WHERE ... AND local_relpath LIKE ...
>>
>> and I was curious about the performance of LIKE.
>
> LIKE is ASCII-case-insensitive in SQLite, and the indexes are
> case-sensitive by default, so SQLite can't do the usual range
> optimization.
>
> You could try pragma case_sensitive_like, or try switching to the GLOB
> operator.

We already use case_sensitive_like. (I did forgot to set it initially in
my tests but setting it doesn't appear to make much difference).

Bert pointed out that my two statements were not quite equivalent. To
get

     local_relpath = 'zig1/zag27'
     OR local_relpath LIKE 'zig1/zag27/%' ESCAPE '%'

I need something like

     local_relpath = 'zig1/zag27
     OR (local_relpath > 'zig1/zag27/' AND local_relpath < 'zig1/zag270')

and that is as slow as LIKE.  Adding that "local_relpath =" is the
problem, without it I get the children in 0.006s.  With it I get the
path and the children but it takes 0.35s.

-- 
Philip

Re: SQLite and SELECT WHERE local_relpath LIKE

Posted by Florian Weimer <fw...@bfk.de>.
* Philip Martin:

> We have started using queries of the form
>
>    SELECT ... WHERE ... AND local_relpath LIKE ...
>
> and I was curious about the performance of LIKE.

LIKE is ASCII-case-insensitive in SQLite, and the indexes are
case-sensitive by default, so SQLite can't do the usual range
optimization.

You could try pragma case_sensitive_like, or try switching to the GLOB
operator.

-- 
Florian Weimer                <fw...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99