Posted to slide-dev@jakarta.apache.org by Peter Hawkins <pe...@globalvision.com.au> on 2001/05/03 11:39:30 UTC

JDBC*Store efficiency

Folks,

In light of recent comments about slide running slowly when we have 100+
documents, I thought I'd take a look into it. I'd say that there isn't a lot
we could do with indices in the database, but rather that we're being very
inefficient in our SQL. I did some measurements and found that on doing a
PROPFIND in a scope containing two text files, slide issued 61 select
statements (approx 1000ms). With 32 files, a PROPFIND resulted in 438
selects (4400ms). The startup time on a query is pretty significant and so
we can expect this to have a dramatic effect with any number of documents.

There is scope for "JOIN" to reduce the number of selects, but I suspect
that this is not the best solution on its own.

For example the sequence

  select * from objects where uri='/users/root/foo'
  select * from children where uri='/users/root/foo'
  select * from links where linkto='/users/root/foo'
  select * from permissions where object='/users/root/foo'

  select * from objects where uri='/users/root/bar'
  select * from children where uri='/users/root/bar'
  select * from links where linkto='/users/root/bar'
  select * from permissions where object='/users/root/bar'

[etc]

accounts for a lot of the work.

Surely a far better way to do it would be to perform a _single_ search on a
scope which returns a set of records corresponding to its children with
fields along the lines of:

		uri classname linkcount permission

There could be several of these for a uri, corresponding to different
permissions (one per role for example), or better yet, compute the target
before issuing the select, passing the role/person in question as a part of
the "WHERE" clause.
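To make the shape of that single search concrete, here's a sketch using the table names from the selects above. The column names (childuri, subject and so on) are my own guesses, not the real Slide schema, and I'm using SQLite from Python purely to show the query consolidating four-selects-per-child into one select per scope:

```python
import sqlite3

# Toy schema based on the table names in the selects above;
# the column names are assumptions, not the actual Slide schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE objects (uri TEXT PRIMARY KEY, classname TEXT);
    CREATE TABLE children (uri TEXT, childuri TEXT);
    CREATE TABLE links (link TEXT, linkto TEXT);
    CREATE TABLE permissions (object TEXT, subject TEXT, action TEXT);

    INSERT INTO objects VALUES ('/users/root/foo', 'SubjectNode');
    INSERT INTO objects VALUES ('/users/root/bar', 'SubjectNode');
    INSERT INTO children VALUES ('/users/root', '/users/root/foo');
    INSERT INTO children VALUES ('/users/root', '/users/root/bar');
    INSERT INTO links VALUES ('/alias', '/users/root/foo');
    INSERT INTO permissions VALUES ('/users/root/foo', 'root', 'read');
""")

# One select per scope instead of four selects per child: each row
# carries uri, classname, linkcount and a permission subject.
rows = db.execute("""
    SELECT o.uri, o.classname,
           (SELECT COUNT(*) FROM links l WHERE l.linkto = o.uri) AS linkcount,
           p.subject
    FROM children c
    JOIN objects o ON o.uri = c.childuri
    LEFT JOIN permissions p ON p.object = o.uri
    WHERE c.uri = '/users/root'
""").fetchall()

for row in rows:
    print(row)   # e.g. ('/users/root/foo', 'SubjectNode', 1, 'root')
```

The LEFT JOIN on permissions gives the "several rows per uri" behaviour described above when an object carries more than one permission record.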

Now... all this is really nice but it would involve a little more than
simply reworking the schema. What I propose is that we extend the store's
cache for the JDBC case and build it so that it gets populated with the
children of the current node using a single search.

Thinking initially about a depth=1 PROPFIND, if we work on the statistical
likelihood of an initial request for a uri being followed by requests for
its children, we can save time by dragging all of them from the store at
once. To prevent the cache growing too fast, there is plenty of scope for
reducing its size by altering the schema - the classname could be a pointer
to a lookup table and permissions could be expressed using lookup tables as
well. We might reasonably be able to expect to make a typical record around
32 bytes.
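The lookup-table idea might look something like this - again SQLite from Python as a sketch, with invented table and column names, just to show a long fully-qualified classname shrinking to a small integer key in each object record:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical normalised schema: classnames live in a lookup
    -- table, so each object row stores a small integer instead of
    -- repeating a long fully-qualified Java class name.
    CREATE TABLE classes (id INTEGER PRIMARY KEY, classname TEXT UNIQUE);
    CREATE TABLE objects (uri TEXT PRIMARY KEY, class_id INTEGER);

    INSERT INTO classes (classname)
        VALUES ('org.apache.slide.structure.SubjectNode');
    INSERT INTO objects VALUES ('/users/root/foo', 1);
""")

# The full classname is recovered with one join against the lookup
# table; permissions could be normalised the same way.
(uri, classname) = db.execute("""
    SELECT o.uri, k.classname
    FROM objects o JOIN classes k ON k.id = o.class_id
""").fetchone()
print(uri, classname)
```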

Now... for the situation where we're talking about recursive PROPFINDs I can
see a few ways to proceed - but let's say we pass the depth with the initial
request and have the store retrieve the entire relevant part of the tree.
This is probably OK sizewise and is perhaps the simplest way to do it
programmatically.
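Passing the depth down could then drive how much of the tree gets pulled in one go. A rough sketch in plain Python (the store interface and helper names are invented for illustration - one batched select per level here, though a reworked schema might manage it in a single query):

```python
class MemoryStore:
    """Stand-in for a JDBC store; counts round-trips, which is the
    thing we want to minimise."""
    def __init__(self, tree):
        self.tree = tree        # uri -> list of child uris
        self.batches = 0

    def batch_select(self, uris):
        # Hypothetical batch call: one select with an IN (...) clause
        # covering the whole frontier, instead of one select per uri.
        self.batches += 1
        return {u: {"children": self.tree.get(u, [])} for u in uris}

def prefetch(store, root_uri, depth):
    """Pull root_uri and everything down to `depth` levels below it,
    issuing one batched select per level."""
    frontier = [root_uri]
    fetched = {}
    for _ in range(depth + 1):
        if not frontier:
            break
        records = store.batch_select(frontier)
        fetched.update(records)
        frontier = [child for rec in records.values()
                    for child in rec["children"]]
    return fetched

store = MemoryStore({"/": ["/a", "/b"], "/a": ["/a/x"]})
fetched = prefetch(store, "/", depth=1)
```

For the depth=1 case above this is two round-trips regardless of how many children the scope has, instead of one set of selects per child.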

There is no need for testing for nodes being "clean" or "dirty" for now -
that could be a future extension, but for now all we're talking about is
condensing all the searches pertaining to a *single* PROPFIND command
("snapshot") into one and using a cache during the execution of it. We could
scrub the cache at the end of the PROPFIND for now and still ought to see a
significant performance boost.

Comments?

Regards,
Peter




RE: JDBC*Store efficiency

Posted by Peter Hawkins <pe...@globalvision.com.au>.
Remy

>First of all, thanks for the detailed report :)

No probs :)

>Really? I thought queries on the children and permissions tables (among
>others) would get a big boost if they were indexed.

Sorry if that wasn't clear - I meant "indices alone" with a revamped schema
yes I'd use indices all over.

>Ok. The thing is, I have no idea which particular db schema would yield the
>best performance. This probably also depends on the db used.

I have a few ideas and yes, it might vary, but not by too much I'd say.

>The current implementation of a JDBC store is meant to be mostly a
>"reference implementation", used as an example to develop new stores. I
>expect the performance of JDBC stores tweaked for a particular database to
>be a lot better.
>
>Basically, I would suggest leaving the JDBC store as is, and work in any
>optimised store in another package.

Oh yes, I agree - we need a simple reference one. The thing that led me to
post was that, in addition to creating a new store the way I suggested, we
need to modify the higher layers to pass down the depth of a search (see
later).

>The child store (here the JDBC store) is free to override the default
>caching when it makes sense. Here, I think it would be an excellent idea to
>do that, since hopefully it would reduce the number of queries needed.

Yea, that's what I meant by "extend" - forgive me if I'm not yet up with
java terminology :) I meant "extend" as in "extend a class". Sounds like
"override" is the right term. :) Another one for the C -> Java
mind-conversion notebook.

>The SlideToken could hold a recommended caching policy field which would be
>set by the client. The store could use it to fill out its cache in a more
>intelligent fashion, knowing what's coming next.

Love it. I'll give it a go.

>Yes, this makes a lot of sense.

OK then I have a new project :) Seemed sensible to me but I just wanted to
run it by you.

Peter


Re: JDBC*Store efficiency

Posted by Remy Maucherat <re...@apache.org>.
> Folks,

First of all, thanks for the detailed report :)

> In light of recent comments about slide running slowly when we have 100+
> documents, I thought I'd take a look into it. I'd say that there isn't a lot
> we could do with indices in the database,

Really? I thought queries on the children and permissions tables (among
others) would get a big boost if they were indexed.

> but rather that we're being very
> inefficient in our SQL. I did some measurements and found that on doing a
> PROPFIND in a scope containing two text files, slide issued 61 select
> statements (approx 1000ms). With 32 files, a PROPFIND resulted in 438
> selects (4400ms). The startup time on a query is pretty significant and so
> we can expect this to have a dramatic effect with any number of documents.
>
> There is scope for "JOIN" to reduce the number of selects, but I suspect
> that this is not the best solution on its own.
>
> For example the sequence
>
>   select * from objects where uri='/users/root/foo'
>   select * from children where uri='/users/root/foo'
>   select * from links where linkto='/users/root/foo'
>   select * from permissions where object='/users/root/foo'
>
>   select * from objects where uri='/users/root/bar'
>   select * from children where uri='/users/root/bar'
>   select * from links where linkto='/users/root/bar'
>   select * from permissions where object='/users/root/bar'
>
> [etc]
>
> accounts for a lot of the work.
>
> Surely a far better way to do it would be to perform a _single_ search on a
> scope which returns a set of records corresponding to its children with
> fields along the lines of:
>
> uri classname linkcount permission

Ok. The thing is, I have no idea which particular db schema would yield the
best performance. This probably also depends on the db used.

The current implementation of a JDBC store is meant to be mostly a
"reference implementation", used as an example to develop new stores. I
expect the performance of JDBC stores tweaked for a particular database to
be a lot better.

Basically, I would suggest leaving the JDBC store as is, and work in any
optimised store in another package.

> There could be several of these for a uri, corresponding to different
> permissions (one per role for example), or better yet, compute the target
> before issuing the select, passing the role/person in question as a part of
> the "WHERE" clause.
>
> Now... all this is really nice but it would involve a little more than
> simply reworking the schema. What I propose is that we extend the store's
> cache for the JDBC case and build it so that it gets populated by the
> children of the current node using a single search

The child store (here the JDBC store) is free to override the default
caching when it makes sense. Here, I think it would be an excellent idea to
do that, since hopefully it would reduce the number of queries needed.

> Thinking initially about a depth=1 PROPFIND, if we work on the statistical
> likelihood of an initial request for a uri being followed by requests for
> its children, we can save time by dragging all of them from the store at
> once. To prevent the cache growing too fast, there is plenty of scope for
> reducing its size by altering the schema - the classname could be a pointer
> to a lookup table and permissions could be expressed using lookup tables as
> well. We might reasonably be able to expect to make a typical record around
> 32 bytes.
>
> Now... for the situation where we're talking about recursive PROPFINDs I can
> see a few ways to proceed - but let's say we pass the depth with the initial
> request and have the store retrieve the entire relevant part of the tree.
> This is probably OK sizewise and is perhaps the simplest way to do it
> programmatically.

The SlideToken could hold a recommended caching policy field which would be
set by the client. The store could use it to fill out its cache in a more
intelligent fashion, knowing what's coming next.
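That recommended-caching-policy field might be as simple as carrying a prefetch depth along with the token. A hypothetical shape in Python, not the actual SlideToken API:

```python
class Token:
    """Hypothetical request token carrying a caching hint, in the
    spirit of the SlideToken idea above (not the real Slide API)."""
    def __init__(self, credentials, prefetch_depth=0):
        self.credentials = credentials
        # 0 = no prefetch; 1 = bulk-load children for a depth=1
        # PROPFIND; higher values for deeper snapshots.
        self.prefetch_depth = prefetch_depth

def retrieve(token, uri):
    # The store decides how aggressively to fill its cache based
    # on the hint the client set on the token.
    if token.prefetch_depth > 0:
        return ("prefetch", uri, token.prefetch_depth)
    return ("single", uri)

result = retrieve(Token("root", prefetch_depth=1), "/users/root")
```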

> There is no need for testing for nodes being "clean" or "dirty" for now -
> that could be a future extension, but for now all we're talking about is
> condensing all the searches pertaining to a *single* PROPFIND command
> ("snapshot") into one and using a cache during the execution of it. We could
> scrub the cache at the end of the PROPFIND for now and still ought to see a
> significant performance boost.

Yes, this makes a lot of sense.

Remy