Posted to derby-user@db.apache.org by legolas wood <le...@gmail.com> on 2006/11/03 12:16:10 UTC
Does Derby have some form of performance problem?
Hi
Thank you for reading my post
I want to know whether this benchmark result is correct or not.
We are using Derby as an embedded database, and it takes a long time to start
the database.
We perform a proper shutdown every time we leave the application.
Has anyone here used Derby for real-world jobs (embedded mode) where the
database grows beyond 100 MB?
Benchmark result:
http://www.h2database.com/html/frame.html?performance.html&main
Is it usual for Derby to take 20-30 seconds to start?
Does startup time depend on database size?
Thanks
Re: slow subqueries
Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.
>>>>>>>>>>>> Jim Newsham wrote (2006-11-11 10:07:18):
>
>
> > > This is why a nested loop is not going to work here... 20,000 squared
> > > operations is very expensive, let alone millions squared. For a query
> > with
> > > this profile, the inner query should only be executed once.
> >
> > Perhaps you can get the behavior you desire by explicitly creating
> > a temporary table, selecting the data from your inner query into
> > the temporary table, then re-writing your main query to join against
> > the temporary table?
>
> I'd be willing to try that. Some questions come to mind:
>
> - Is there a way in sql to select the results of a query into a
> - table?
insert into someTable (select * from someOtherTable);
> - When I'm done with the temp table, can I just drop it as for a normal
> table?
Yes. DROP table works on temporary tables.
> - I think I read somewhere that I can't add an index to temporary tables.
> Is this true? I can't even have a primary key?
Yes, neither indices nor primary keys.
> If so, I think the performance will still be intractable for
> thousands or millions of results in the temp table.
Why not let your temporary table be a normal table in your schema?
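Bernt's suggestion can be sketched as follows. This uses Python's sqlite3 module purely for illustration (not Derby), and the table and column names are made up:

```python
import sqlite3

# Materialize the inner query into a regular table, index it, then join.
# sqlite3 stands in for Derby here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10), (2, 10), (3, 20), (4, 30)])

# 1. A regular table standing in for the "temporary" table.
cur.execute("CREATE TABLE tmp_inner (customer_id INTEGER)")
# 2. Materialize the inner query exactly once.
cur.execute("INSERT INTO tmp_inner "
            "SELECT DISTINCT customer_id FROM orders WHERE customer_id > 10")
# 3. Unlike a declared temporary table, a regular table can be indexed.
cur.execute("CREATE INDEX idx_tmp ON tmp_inner (customer_id)")
# 4. Join against the materialized rows instead of re-running the subquery.
cur.execute("SELECT COUNT(*) FROM orders o "
            "JOIN tmp_inner t ON o.customer_id = t.customer_id")
count = cur.fetchone()[0]
print(count)  # prints 2: the rows with customer_id 20 and 30
# 5. Drop it when done, as with any normal table.
cur.execute("DROP TABLE tmp_inner")
```

The same INSERT ... SELECT / CREATE INDEX / DROP TABLE sequence applies in Derby SQL, which is the point of using a normal table rather than a declared temporary one.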
>
> Thanks,
> Jim
>
>
--
Bernt Marius Johnsen, Database Technology Group,
Staff Engineer, Technical Lead Derby/Java DB
Sun Microsystems, Trondheim, Norway
RE: slow subqueries
Posted by de...@segel.com.
This is an "ugly" workaround for an existing bug.
AFAIK, the problem arises when you use an IN clause in the inner select.
This is a known Derby bug.
Going to a temp table, you probably won't see any improvement, and you don't
have the ability to put an index on the temp table. Unless you're returning
over 10K rows on the inner query, a simple table scan would be more
efficient. (At least for IDS et al. YMMV for Derby.)
> -----Original Message-----
> From: Jim Newsham [mailto:jnewsham@referentia.com]
> Sent: Saturday, November 11, 2006 2:07 PM
> To: 'Derby Discussion'
> Subject: RE: slow subqueries
>
>
>
> > > This is why a nested loop is not going to work here... 20,000 squared
> > > operations is very expensive, let alone millions squared. For a query
> > with
> > > this profile, the inner query should only be executed once.
> >
> > Perhaps you can get the behavior you desire by explicitly creating
> > a temporary table, selecting the data from your inner query into
> > the temporary table, then re-writing your main query to join against
> > the temporary table?
>
> I'd be willing to try that. Some questions come to mind:
>
> - Is there a way in sql to select the results of a query into a table?
> - When I'm done with the temp table, can I just drop it as for a normal
> table?
> - I think I read somewhere that I can't add an index to temporary tables.
> Is this true? I can't even have a primary key? If so, I think the
> performance will still be intractable for thousands or millions of results
> in the temp table.
>
> Thanks,
> Jim
>
RE: slow subqueries
Posted by Jim Newsham <jn...@referentia.com>.
> > This is why a nested loop is not going to work here... 20,000 squared
> > operations is very expensive, let alone millions squared. For a query
> with
> > this profile, the inner query should only be executed once.
>
> Perhaps you can get the behavior you desire by explicitly creating
> a temporary table, selecting the data from your inner query into
> the temporary table, then re-writing your main query to join against
> the temporary table?
I'd be willing to try that. Some questions come to mind:
- Is there a way in sql to select the results of a query into a table?
- When I'm done with the temp table, can I just drop it as for a normal
table?
- I think I read somewhere that I can't add an index to temporary tables.
Is this true? I can't even have a primary key? If so, I think the
performance will still be intractable for thousands or millions of results
in the temp table.
Thanks,
Jim
Re: slow subqueries
Posted by Bryan Pendleton <bp...@amberpoint.com>.
> This is why a nested loop is not going to work here... 20,000 squared
> operations is very expensive, let alone millions squared. For a query with
> this profile, the inner query should only be executed once.
Perhaps you can get the behavior you desire by explicitly creating
a temporary table, selecting the data from your inner query into
the temporary table, then re-writing your main query to join against
the temporary table?
thanks,
bryan
Re: slow subqueries
Posted by Army <qo...@gmail.com>.
Jim Newsham wrote:
>
> You previously mentioned a Derby property called
> derby.language.maxMemoryPerTable. I don't see it mentioned in the current
> documentation, but I found a jira issue which contains some proposed
> documentation (http://issues.apache.org/jira/browse/DERBY-1397). But it's
> still not entirely clear to me. Is this property only used when deciding
> whether to perform a hash join or nested loop join? Or does it also
> influence how much of the hash join is kept in memory, or when that hash
> join is spilled to disk?
Great question. My first thought was that the property *just* determined
whether or not to perform a hash join. But I did some looking around in the
code and was a bit surprised to see that the answer depends on whether or not
the inner result set of the hash join is a base table. If the inner result set
*is* a base table then the maxMemoryPerTable property does in fact determine
when we will spill to disk. For any other hash join, though, the determination
of when to spill over is based on the JVM memory size at the time the in-memory
hash table is created.
> Essentially, I'd like to know, if using the above query with optimizer
> overrides which force a hash join, what is the threshold for spilling a
> hash to disk, and is this a tunable parameter?
Since your queries involve hash joins with subqueries as the inner table, I
think this means that spill-over to disk occurs when the size of the
in-memory result set reaches or exceeds one hundredth of the *current* memory in
the JVM. In the code this is expressed as:
Runtime.getRuntime().totalMemory()/100;
Thus for your case, there does not appear to be a way to tune the threshold for
spilling the materialized subquery rows to disk. At least, no direct way; you
can of course change the start and max heap size for the JVM, which indirectly
affects the spill-over threshold. But I'm not sure that's what you were looking
for...
Army
RE: slow subqueries
Posted by Jim Newsham <jn...@referentia.com>.
> Army wrote:
>
> > In order to get Derby to recognize the equijoin predicate, the subquery
> > has to appear in the FROM list. So I created a view for the subquery:
>
> <snip view>
>
> > and then I changed the query to select and join with that view:
>
> <snip>
>
> > Working off the modified query above I used "optimizer overrides", which
> > were added as part of Derby 10.2, to 1) fix the join order so that the
> > subquery becomes the "inner" table, and 2) to force ("guarantee") a hash
> > join using the equijoin predicate, thereby allowing materialization of
> > the inner table (i.e. of the subquery). The query to do that is as
> > follows:
> >
> > select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
> > time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
> > where time.id = v_hash.id;
>
> Just one thing to mention as an afterthought: you do not have to create a
> separate view for this to work. I did that for the sake of clarity, but
> you
> could just as easily write:
>
> select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
> time,
> (select distinct
> time.id from time, double_sample, band
> where band.id = double_sample.fk_band_id
> and double_sample.fk_time_id = time.id and
> band.id in (11, 12)
> ) v_hash --DERBY-PROPERTIES joinStrategy=HASH
> where time.id = v_hash.id;
>
> Of course after writing all of that I realized that you said your queries
> are
> generated (as opposed to hand-written) and that they "are only
> representative of
> my problem. [...] While the above sample query can be easily modified to
> run
> well, I'm not sure how to do it in the general case."
>
> So maybe this isn't going to work for you, after all...
Hi Army,
Thanks for the equijoin/subquery example, it makes sense to me now.
Regarding the optimizer override, I should have no problem getting those
hints into my sql statement as I'm generating the sql statements myself, and
they all follow a similar structure. So on the contrary, your above idea
seems very feasible.
I have three possible workaround ideas, but I'm going to pursue this one
first, as it seems likely to work reasonably well while not taking too much
development time, which is a critical issue now. (The other ideas are to
simulate a subquery by iterating over two sorted queries in parallel; and to
use a temporary table for inner query results, using an equijoin with the
outer query).
You previously mentioned a Derby property called
derby.language.maxMemoryPerTable. I don't see it mentioned in the current
documentation, but I found a jira issue which contains some proposed
documentation (http://issues.apache.org/jira/browse/DERBY-1397). But it's
still not entirely clear to me. Is this property only used when deciding
whether to perform a hash join or nested loop join? Or does it also
influence how much of the hash join is kept in memory, or when that hash
join is spilled to disk? Essentially, I'd like to know, if using the above
query with optimizer overrides which force a hash join, what is the
threshold for spilling a hash to disk, and is this a tunable parameter?
Thanks,
Jim
Re: slow subqueries
Posted by Army <qo...@gmail.com>.
Army wrote:
> In order to get Derby to recognize the equijoin predicate, the subquery
> has to appear in the FROM list. So I created a view for the subquery:
<snip view>
> and then I changed the query to select and join with that view:
<snip>
> Working off the modified query above I used "optimizer overrides", which
> were added as part of Derby 10.2, to 1) fix the join order so that the
> subquery becomes the "inner" table, and 2) to force ("guarantee") a hash
> join using the equijoin predicate, thereby allowing materialization of
> the inner table (i.e. of the subquery). The query to do that is as
> follows:
>
> select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
> time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
> where time.id = v_hash.id;
Just one thing to mention as an afterthought: you do not have to create a
separate view for this to work. I did that for the sake of clarity, but you
could just as easily write:
select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
time,
(select distinct
time.id from time, double_sample, band
where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and
band.id in (11, 12)
) v_hash --DERBY-PROPERTIES joinStrategy=HASH
where time.id = v_hash.id;
Of course after writing all of that I realized that you said your queries are
generated (as opposed to hand-written) and that they "are only representative of
my problem. [...] While the above sample query can be easily modified to run
well, I'm not sure how to do it in the general case."
So maybe this isn't going to work for you, after all...
Army
Re: slow subqueries
Posted by Army <qo...@gmail.com>.
Jim Newsham wrote:
> Thanks for the tips... both on getting compilation time, and getting the
> plan without running the full query. Does the query plan really show up in
> the derby log?
It did for me, and I've always thought that was where it was supposed to go.
Maybe I'm missing something about your environment, though...
> To turn my query into an equijoin subquery, would I just change "where x in
> (...subquery...)" to "where x = (...subquery...)"? I believe I could do
> that above, if I change the outer query to return distinct results.
In Derby an equijoin predicate has to exist between two result sets in the FROM
list of the query. So I don't think changing "where x in " to "where x = " will
be enough.
I did some (very quick) playing with the following query that you posted in an
earlier thread:
select count(*) from time where time.id =
(select distinct
time.id from time, double_sample, band
where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and
band.id in (11, 12));
In order to get Derby to recognize the equijoin predicate, the subquery has to
appear in the FROM list. So I created a view for the subquery:
create view v_hash as
select distinct
time.id from time, double_sample, band
where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and
band.id in (11, 12);
and then I changed the query to select and join with that view:
select count(*) from time, v_hash where time.id = v_hash.id;
When I did that, Derby chose a query plan that put "time" as the inner table and
then did a nested loop join with a table scan on "time"--but that's not what you
wanted. Which brings us to your next question:
> But if I understand what you wrote below correctly, this is a required
> condition but not a guarantee that a hash join will be used. If the above
> transformation makes sense (including for performance) and there is a way to
> guarantee hash join materialization including caching to disk, I would love
> to hear about it.
Working off the modified query above I used "optimizer overrides", which were
added as part of Derby 10.2, to 1) fix the join order so that the subquery
becomes the "inner" table, and 2) to force ("guarantee") a hash join using the
equijoin predicate, thereby allowing materialization of the inner table (i.e. of
the subquery). The query to do that is as follows:
select count(*) from --DERBY-PROPERTIES joinOrder=FIXED
time, v_hash --DERBY-PROPERTIES joinStrategy=HASH
where time.id = v_hash.id;
Note that a newline ("return") is required after each Derby override, since the override is written as a SQL line comment.
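Because the overrides live in `--` line comments, a query string assembled in application code must contain actual newline characters after each override, or the comment swallows the rest of the statement. A sketch of building Army's query as a string (Python here only for illustration):

```python
# Assemble the override query as one string. The "\n" after each
# --DERBY-PROPERTIES comment is essential: a SQL line comment runs to
# end-of-line, so without the newline the remainder of the statement
# would be treated as part of the comment.
sql = (
    "select count(*) from --DERBY-PROPERTIES joinOrder=FIXED\n"
    "time, v_hash --DERBY-PROPERTIES joinStrategy=HASH\n"
    "where time.id = v_hash.id"
)
print(sql)
```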
At that point the query plan showed a hash join in which the view "v_hash" was
materialized into memory--which is, I think, what you are looking for.
Are you working with 10.2, and if so would something like this work for you? I
did all of this with zero rows, so I have no idea how or if this will work for
tables with *millions* of rows in them. Might be worth a shot, though...
> In the end, we'd like to stick with Derby if we can, because it is pure Java
> and embedded, and this is a really big plus for deployment within a desktop
> application such as ours. I'm hoping that I can come up with a feasible
> workaround for the subquery performance issue we have, and hope that in the
> near future Derby can make my hack obsolete.
If we're lucky the above workaround will make things more bearable from a
performance standpoint. It's a bit of a pain in that you would have to rewrite
all of your queries, but I'll leave it up to you decide whether or not it's
worth it...
Army
RE: slow subqueries
Posted by Jim Newsham <jn...@referentia.com>.
Hi everyone, and thanks Army for your detailed response. My responses are
interspersed below...
> -----Original Message-----
> From: Army [mailto:qozinx@gmail.com]
> Sent: Friday, November 10, 2006 6:53 AM
> To: Derby Discussion
> Subject: Re: slow subqueries
>
> Jim Newsham wrote:
> > Why do queries with subqueries perform so poorly?
>
> This is one of the places where Derby optimization could definitely use
> some
> improvement. There are several known issues with the optimization of
> subqueries
> that can potentially lead to sub-optimal plans, including unreasonably
> high cost
> estimates (DERBY-1905, DERBY-1260), lack of an effective pruning algorithm
> (DERBY-1907), and a possibly insufficient "timeout" mechanism (DERBY-
> 1906).
> There has been work in this area to hopefully improve the situation for
> 10.2
> (DERBY-805, DERBY-781, DERBY-1007), but as you yourself have noticed, we
> have a
> long way to go.
>
> > The following query completes almost instantly. I suppose Derby
> simplifies
> > the subquery internally to "band.id = 11":
> >
> > ij(SAMPLEBASE)> select count(*) from time where time.id in (select
> distinct
> > time.id from time, double_sample, band where band.id =
> > double_sample.fk_band_id and double_sample.fk_time_id = time.id and
> band.id
> > in (11));
> >
> > This query "does not complete":
> >
> > ij(SAMPLEBASE)> select count(*) from time where time.id in (select
> distinct
> > time.id from time, double_sample, band where band.id =
> > double_sample.fk_band_id and double_sample.fk_time_id = time.id and
> band.id
> > in (11, 12));
>
> Using the DDL that you posted in an earlier email I ran these queries and
> they
> both completed in less than a second. Note, though, that I didn't have
> any data
> in the tables :) That may sound like useless information to you, but it
> tells
> me that the query compilation is at least completing in a reasonable
> amount of
> time, which is a first step.
>
> See http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
>
> In your case do you know how much time is being spent in compilation vs
> execution? The easiest way to tell this in ij is to prepare the query
> first,
> then execute it:
>
> ij> elapsedtime on;
> ij> prepare ps1 as 'select count(*) from time where time.id in (select
> distinct
> time.id from time, double_sample, band where band.id =
> double_sample.fk_band_id
> and double_sample.fk_time_id = time.id and band.id in (11))';
> ELAPSED TIME = 2344 milliseconds
> ij> execute ps1;
> 1
> -----------
> 0
>
> 1 row selected
> ELAPSED TIME = 60 milliseconds
> ij> prepare ps2 as 'select count(*) from time where time.id in (select
> distinct
> time.id from time, double_sample, band where band.id =
> double_sample.fk_band_id
> and double_sample.fk_time_id = time.id and band.id in (11, 12))';
> ELAPSED TIME = 301 milliseconds
> ij> execute ps2;
> 1
> -----------
> 0
>
> 1 row selected
> ELAPSED TIME = 10 milliseconds
>
> If you have a script or a Java program that loads data into your tables,
> and if
> you are willing and able to post that script/program to the list, then
> those who
> are following this discussion might be more easily able to reproduce the
> problem, which would help investigation. If you cannot or do not want to
> post
> such a script/program, then any description about the kind and amount of
> data in
> the relevant tables would be helpful, as well. For example, what value do
> you
> expect "count(*)" to return in these queries? How many rows does the
> subquery
> in the IN clause return? How many rows are in the various tables?
Unfortunately I don't have a data load script that I can release at the
moment. My biggest problem is that I'm working on something that needs to
have been done yesterday. I'd be happy to see performance enhancements in
Derby which apply to the cases I am using, but they won't come soon enough.
I need to find an interim solution. Though if I do have some time later
I'll try to put together a test schema, data set, and collection of queries
that I can release.
Both the inner query and the outer query may return a very large number of
records. In my test data set, this is approximately 20,000 each, although
in practice it can be a much larger number... perhaps millions or more.
This is why a nested loop is not going to work here... 20,000 squared
operations is very expensive, let alone millions squared. For a query with
this profile, the inner query should only be executed once. If it needs to
be cached, fine... but...
In my previous email I suggested the idea of Derby generating two internal
result sets (one for the outer, and one for the inner query) and then
scanning these two result sets in parallel to produce the external result
set. I know I don't really understand the internals of Derby... does this
even make sense? Is it feasible? It seems like the most efficient solution
if both the inner and outer queries have a very large number of results.
For this to work, the internal result sets would need to be sorted on the
column that they're related on. For the cases I'm working with, this would
always be a primary key column.
In fact, as a workaround to the performance problem I have, I'm very
seriously considering simulating my subqueries outside of Derby, by writing
a ResultSet implementation which merges the results of two other (Derby)
result sets iterated over in parallel as described above. Integrating this
into my other code is going to be a pretty ugly hack, and obviously it would
be much more efficient if Derby could do this internally.
Obviously, I'd like to hear about any flaws you see with this approach (both
within Derby and for my hack/workaround).
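Jim's parallel-iteration idea is essentially a merge join over two sorted streams. A minimal sketch, with plain Python iterators standing in for the two Derby result sets (an assumption about how the real wrapper would look):

```python
def merge_join_ids(outer, inner):
    """Yield the ids present in both inputs.

    Both inputs are assumed sorted ascending and duplicate-free on the
    join column (a primary key for the outer query, DISTINCT for the
    inner), as described in the thread. Each side is advanced at most
    once per step, so the cost is O(n + m) rather than the O(n * m) of
    a nested-loop evaluation.
    """
    outer_it, inner_it = iter(outer), iter(inner)
    o = next(outer_it, None)
    i = next(inner_it, None)
    while o is not None and i is not None:
        if o == i:
            yield o
            o = next(outer_it, None)
            i = next(inner_it, None)
        elif o < i:
            o = next(outer_it, None)
        else:
            i = next(inner_it, None)

result = list(merge_join_ids([1, 3, 5, 7, 9], [2, 3, 4, 7, 10]))
print(result)  # prints [3, 7]
```

In the real workaround each iterator would wrap a Derby ResultSet ordered by the key column, pulling rows lazily so neither result set has to fit in memory.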
>
> > I've been searching/reading around and trying to understand a little
> about
> > how Derby works and how to check the statement plan. I've read through
> the
> > tuning document. Unfortunately if a query doesn't complete, I can't get
> the
> > statement plan.
>
> Actually, I think you only need to retrieve the first row in order to get
> the
> statement plan. For example, if you set
> "derby.language.logQueryPlan=true" and
> then run the following:
>
> ij> get cursor c1 as 'select * from time where time.id in (select distinct
> time.id from time, double_sample, band where band.id =
> double_sample.fk_band_id
> and double_sample.fk_time_id = time.id and band.id in (11, 12))';
> ij> next c1;
> ij> close c1;
> ij> exit;
>
> then look at derby.log, you should (hopefully) see the query plan.
>
> NOTE: I removed the "count(*)" from the query so that we can fetch one row
> at a
> time. If you do this, you should see the query plan in the log file after
> the
> first row has been retrieved. Of course, whether or not this will help
> depends
> on how long it takes to retrieve the first row--but it's worth a try.
Thanks for the tips... both on getting compilation time, and getting the
plan without running the full query. Does the query plan really show up in
the derby log? I was getting the plan in ij, which is not so convenient. I
suppose this is because I configured logging using a database-wide property
instead of a global property.
>
> > Is the entire inner query being re-evaluated for every candidate record
> in
> > the outer query?
>
> That's possible, yes. DERBY-781 made some changes to materialize a
> subquery by
> use of a hash join, but a hash join requires an equijoin predicate. For
> your
> query there is no explicit equijoin between "time" and the subquery, so I
> do not
> think DERBY-781 will help. Therefore it is quite possible that the
> subquery is
> in fact being re-evaluated multiple times, as you say.
To turn my query into an equijoin subquery, would I just change "where x in
(...subquery...)" to "where x = (...subquery...)"? I believe I could do
that above, if I change the outer query to return distinct results.
But if I understand what you wrote below correctly, this is a required
condition but not a guarantee that a hash join will be used. If the above
transformation makes sense (including for performance) and there is a way to
guarantee hash join materialization including caching to disk, I would love
to hear about it.
>
> > I read somewhere that inner queries may not be materialized if they are
> > larger than a certain threshold.
>
> When talking about hash joins, the estimated in-memory size of the inner
> table/query has to be less than derby.language.maxMemoryPerTable, where
> that
> property is specified in kilobytes and defaults to 1024 (i.e. default max
> mem
> per table is 1 meg). You can try setting that property to a larger value
> (within the constraints of JVM heap), but I do not know if that will help.
> It
> may make hash joins more feasible *within* the subquery, but I do not
> think it
> will allow materialization of the subquery itself since, as mentioned
> above,
> there is no explicit equijoin with the subquery.
>
> > If the inner result set is too large for memory, couldn't it be cached
> > to disk?
>
> Yes, it can. And if, for example, you set the
> derby.language.maxMemoryPerTable
> property to a value that is too large for memory, this "spill-over" to
> disk will
> in fact occur. I think there is an improvement filed somewhere to make it
> so
> that instead of "aborting" a hash join when the inner result set is too
> large,
> the optimizer should just make the join more expensive to account for the
> spill-to-disk i/o. But I don't think anyone has made that improvement
> yet...
>
> So in the end, there's probably not much information in this email that
> will
> tell you how to make your query run more quickly. As you can see, though,
> there
> is plenty of room for improvement in this particular area of Derby. If
> this is
> something that is exciting and/or essential for you, you should definitely
> feel
> free to join the development community and to help address the problem.
> We'd be
> thrilled to have you!
>
> Army
I appreciate your feedback, and there was plenty of useful information in
your response. I really wish I had the time to dig into Derby's source code
to see what it's doing, let alone spend time contributing improvements.
Unfortunately I'm working on borrowed time and have absolutely no time to
deviate.
In the end, we'd like to stick with Derby if we can, because it is pure Java
and embedded, and this is a really big plus for deployment within a desktop
application such as ours. I'm hoping that I can come up with a feasible
workaround for the subquery performance issue we have, and hope that in the
near future Derby can make my hack obsolete.
Thanks,
Jim Newsham
Re: slow subqueries
Posted by Army <qo...@gmail.com>.
Jim Newsham wrote:
> Why do queries with subqueries perform so poorly?
This is one of the places where Derby optimization could definitely use some
improvement. There are several known issues with the optimization of subqueries
that can potentially lead to sub-optimal plans, including unreasonably high cost
estimates (DERBY-1905, DERBY-1260), lack of an effective pruning algorithm
(DERBY-1907), and a possibly insufficient "timeout" mechanism (DERBY-1906).
There has been work in this area to hopefully improve the situation for 10.2
(DERBY-805, DERBY-781, DERBY-1007), but as you yourself have noticed, we have a
long way to go.
> The following query completes almost instantly. I suppose Derby simplifies
> the subquery internally to "band.id = 11":
>
> ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
> time.id from time, double_sample, band where band.id =
> double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
> in (11));
>
> This query "does not complete":
>
> ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
> time.id from time, double_sample, band where band.id =
> double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
> in (11, 12));
Using the DDL that you posted in an earlier email I ran these queries and they
both completed in less than a second. Note, though, that I didn't have any data
in the tables :) That may sound like useless information to you, but it tells
me that the query compilation is at least completing in a reasonable amount of
time, which is a first step.
See http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
In your case do you know how much time is being spent in compilation vs
execution? The easiest way to tell this in ij is to prepare the query first,
then execute it:
ij> elapsedtime on;
ij> prepare ps1 as 'select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and band.id in (11))';
ELAPSED TIME = 2344 milliseconds
ij> execute ps1;
1
-----------
0
1 row selected
ELAPSED TIME = 60 milliseconds
ij> prepare ps2 as 'select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and band.id in (11, 12))';
ELAPSED TIME = 301 milliseconds
ij> execute ps2;
1
-----------
0
1 row selected
ELAPSED TIME = 10 milliseconds
If you have a script or a Java program that loads data into your tables, and if
you are willing and able to post that script/program to the list, then those who
are following this discussion might be more easily able to reproduce the
problem, which would help investigation. If you cannot or do not want to post
such a script/program, then any description about the kind and amount of data in
the relevant tables would be helpful, as well. For example, what value do you
expect "count(*)" to return in these queries? How many rows does the subquery
in the IN clause return? How many rows are in the various tables?
> I've been searching/reading around and trying to understand a little about
> how Derby works and how to check the statement plan. I've read through the
> tuning document. Unfortunately if a query doesn't complete, I can't get the
> statement plan.
Actually, I think you only need to retrieve the first row in order to get the
statement plan. For example, if you set "derby.language.logQueryPlan=true" and
then run the following:
ij> get cursor c1 as 'select * from time where time.id in (select distinct
time.id from time, double_sample, band where band.id = double_sample.fk_band_id
and double_sample.fk_time_id = time.id and band.id in (11, 12))';
ij> next c1;
ij> close c1;
ij> exit;
then look at derby.log, you should (hopefully) see the query plan.
NOTE: I removed the "count(*)" from the query so that we can fetch one row at a
time. If you do this, you should see the query plan in the log file after the
first row has been retrieved. Of course, whether or not this will help depends
on how long it takes to retrieve the first row--but it's worth a try.
> Is the entire inner query being re-evaluated for every candidate record in
> the outer query?
That's possible, yes. DERBY-781 made some changes to materialize a subquery by
use of a hash join, but a hash join requires an equijoin predicate. For your
query there is no explicit equijoin between "time" and the subquery, so I do not
think DERBY-781 will help. Therefore it is quite possible that the subquery is
in fact being re-evaluated multiple times, as you say.
> I read somewhere that inner queries may not be materialized if they are
> larger than a certain threshold.
When talking about hash joins, the estimated in-memory size of the inner
table/query has to be less than derby.language.maxMemoryPerTable, where that
property is specified in kilobytes and defaults to 1024 (i.e. default max mem
per table is 1 meg). You can try setting that property to a larger value
(within the constraints of JVM heap), but I do not know if that will help. It
may make hash joins more feasible *within* the subquery, but I do not think it
will allow materialization of the subquery itself since, as mentioned above,
there is no explicit equijoin with the subquery.
> If the inner result set is too large for memory, couldn't it be cached
> to disk?
Yes, it can. And if, for example, you set the derby.language.maxMemoryPerTable
property to a value that is too large for memory, this "spill-over" to disk will
in fact occur. I think there is an improvement filed somewhere to make it so
that instead of "aborting" a hash join when the inner result set is too large,
the optimizer should just make the join more expensive to account for the
spill-to-disk i/o. But I don't think anyone has made that improvement yet...
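The "spill-over" Army describes is essentially what a Grace-style partitioned
hash join does. A toy sketch of that idea (plain Python, illustrative only,
not Derby's code):

```python
import os
import pickle
import tempfile

def spilling_semi_join(outer, inner, max_in_memory=100, partitions=4):
    """Toy spill-to-disk semi-join: when the inner side exceeds the
    memory budget, partition both inputs to temp files by hash value
    and join each partition independently (Grace hash join)."""
    if len(inner) <= max_in_memory:
        inner_set = set(inner)                  # fits: plain hash semi-join
        return [x for x in outer if x in inner_set]

    tmp = tempfile.mkdtemp()

    def spill(rows, side):
        # Route each row to a partition file chosen by its hash.
        files = [open(os.path.join(tmp, f"{side}{p}"), "wb")
                 for p in range(partitions)]
        for r in rows:
            pickle.dump(r, files[hash(r) % partitions])
        for f in files:
            f.close()

    def load(side, p):
        rows = []
        with open(os.path.join(tmp, f"{side}{p}"), "rb") as f:
            while True:
                try:
                    rows.append(pickle.load(f))
                except EOFError:
                    return rows

    spill(outer, "o")
    spill(inner, "i")
    result = []
    for p in range(partitions):
        inner_set = set(load("i", p))           # one partition fits in memory
        result.extend(x for x in load("o", p) if x in inner_set)
    return result
```

Costing this i/o, rather than abandoning the hash strategy, is exactly the
improvement Army alludes to.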
So in the end, there's probably not much information in this email that will
tell you how to make your query run more quickly. As you can see, though, there
is plenty of room for improvement in this particular area of Derby. If this is
something that is exciting and/or essential for you, you should definitely feel
free to join the development community and to help address the problem. We'd be
thrilled to have you!
Army
slow subqueries (was: StackOverflowError)
Posted by Jim Newsham <jn...@referentia.com>.
Hi everyone,
The stack overflow problem went away after I upgraded to 10.2.1.6. The query
no longer fails; however, it is very slow. To be honest, I haven't run the
query to completion: my computer makes a prohibitively annoying sound when the
CPU is pegged, and I have to kill Derby after a number of minutes. But I know
the query runs for many, many minutes without completing. Why do queries with
subqueries perform so poorly?
The following query completes almost instantly. I suppose Derby simplifies
the subquery internally to "band.id = 11":
ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id =
double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
in (11));
This query "does not complete":
ij(SAMPLEBASE)> select count(*) from time where time.id in (select distinct
time.id from time, double_sample, band where band.id =
double_sample.fk_band_id and double_sample.fk_time_id = time.id and band.id
in (11, 12));
I've been searching/reading around and trying to understand a little about
how Derby works and how to check the statement plan. I've read through the
tuning document. Unfortunately if a query doesn't complete, I can't get the
statement plan.
Is the entire inner query being re-evaluated for every candidate record in
the outer query? I read somewhere that inner queries may not be
materialized if they are larger than a certain threshold. But if the outer
query is also very large, then this seems a poor strategy. If the inner
result set is too large for memory, couldn't it be cached to disk?
But a much better optimization would be for Derby to notice that the outer
variable (time.id) and the result of the inner query (time.id) are both
indexed/unique (already sorted?), and then to internally generate two
sub-resultsets which are processed together in parallel, instead of using a
very expensive nested loop.
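The merge strategy proposed above is easy to sketch (plain Python, assuming
both inputs arrive sorted and duplicate-free, as index scans could provide;
this is an illustration, not something Derby does today):

```python
def merge_semi_join(outer_sorted, inner_sorted):
    """Advance two cursors over sorted, duplicate-free inputs in
    lockstep: O(n + m) total work, with no nested loop."""
    result = []
    i = j = 0
    while i < len(outer_sorted) and j < len(inner_sorted):
        a, b = outer_sorted[i], inner_sorted[j]
        if a == b:
            result.append(a)    # id present on both sides
            i += 1
            j += 1
        elif a < b:
            i += 1              # outer id has no match; skip it
        else:
            j += 1              # inner id not a candidate; skip it
    return result

assert merge_semi_join([1, 2, 5, 7, 9], [2, 3, 5, 9, 11]) == [2, 5, 9]
```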
The above queries are only representative of my problem. I have a generic
API which uses a SQL database under the covers, and the SQL queries are
generated dynamically. The queries involve only a small number of tables, but
will typically involve subqueries, unions, and intersects. While the above
sample query can easily be modified to run well, I'm not sure how to do that
in the general case.
Any feedback or responses to the above comments are welcome.
Regards,
Jim
RE: StackOverflowError
Posted by Jim Newsham <jn...@referentia.com>.
Thanks for the response, Rajesh. I'll try the latest version of Derby later
and report my results. In the meantime, here's the schema for the tables
involved.
-- a sample time
create table time (
id int not null generated always as identity primary key,
time timestamp not null unique
);
-- a node; nodes are hierarchically arranged structures which contain bands
create table node (
id int not null generated always as identity (start with
1),
name varchar(32) not null,
fk_parent_id int not null,
constraint node_pk primary key (id),
constraint node_unique unique (name, fk_parent_id),
constraint node_fk_parent foreign key (fk_parent_id) references node (id)
on delete cascade
);
-- insert root node with self as parent to satisfy "not null" constraint
insert into node (name, fk_parent_id) values ('', 1);
-- a band; an identifier for a series of data
create table band (
id int not null generated always as identity primary key,
fk_node_id int not null,
name varchar(32) not null,
type int not null,
constraint band_unique unique (fk_node_id, name, type),
constraint band_fk_node foreign key (fk_node_id) references node (id) on
delete cascade
);
-- a sample in a double band
create table double_sample (
fk_band_id int not null,
fk_time_id int not null,
value double not null,
constraint double_sample_unique unique (fk_band_id, fk_time_id),
constraint double_sample_fk_band foreign key (fk_band_id) references band
(id) on delete cascade,
constraint double_sample_fk_time foreign key (fk_time_id) references time
(id) on delete cascade
);
Regards,
Jim
Re: StackOverflowError
Posted by Rajesh Kartha <ka...@gmail.com>.
Hello Jim,
Is it possible to try this query with the latest GA release, 10.2.1.6?
http://db.apache.org/derby/derby_downloads.html#Latest+Official+Release
There have been some fixes in the related code generation areas for 10.2:
DERBY-766, DERBY-1714 etc.
Also, can you post a schema that reproduces this issue?
Regards,
Rajesh
RE: StackOverflowError
Posted by Jim Newsham <jn...@referentia.com>.
Hi everyone,
I thought the problem would go away if I gave Derby a better-written query,
so I fixed my query generator to be a bit smarter. Unfortunately, I still
get a stack overflow error. Here's the new query:
ij(SAMPLEBASE)> select count(*) from time where time.id in (select time.id
from time, double_sample, band where band.id = double_sample.fk_band_id and
double_sample.fk_time_id = time.id and (band.id = 39 or band.id = 55));
1
-----------
ERROR 38000: The exception 'java.lang.StackOverflowError' was thrown while
evaluating an expression. SQLSTATE: XJ001: Java exception: ':
java.lang.StackOverflowError'.
This looks like an ugly bug for such a simple query. I didn't find any bug
in jira which seemed to relate to this. Is this a known bug?
Any advice on how to work around the problem is appreciated.
I'm using Derby 10.1.3.1.
Thanks,
Jim
P.S. I just included the outer query ("select count(*) from..") to
reproduce the problem in ij. My program actually uses jdbc, executes the
inner query, and calls ResultSet.last(). The result is the same, a
StackOverflowError. Here's the stack trace I get in my app:
Caused by: org.apache.derby.client.am.SqlException: The exception
'java.lang.StackOverflowError' was thrown while evaluating an expression.
SQLSTATE: XJ001: Java exception: ': java.lang.StackOverflowError'.
at org.apache.derby.client.am.ResultSet.completeSqlca(Unknown
Source)
at
org.apache.derby.client.net.NetResultSetReply.parseFetchError(Unknown
Source)
at
org.apache.derby.client.net.NetResultSetReply.parseCNTQRYreply(Unknown
Source)
at
org.apache.derby.client.net.NetResultSetReply.readPositioningFetch(Unknown
Source)
at
org.apache.derby.client.net.ResultSetReply.readPositioningFetch(Unknown
Source)
at
org.apache.derby.client.net.NetResultSet.readPositioningFetch_(Unknown
Source)
at org.apache.derby.client.am.ResultSet.getRowCount(Unknown Source)
at org.apache.derby.client.am.ResultSet.lastX(Unknown Source)
at org.apache.derby.client.am.ResultSet.last(Unknown Source)
at
com.referentia.sdf.monitor.samplebase.derby.QueryDataSet.getSize(QueryDataSe
t.java:139)
StackOverflowError
Posted by Jim Newsham <jn...@referentia.com>.
Any reason why I should get a stack overflow error with the following query?
Yes, I know the query is a bit odd... it's not hand-written. The query
generator could be optimized. Nevertheless... is the stack overflow here
considered a bug or a limitation? If limitation, what specifically is the
limitation?
ij(SAMPLEBASE)> select count(*) from time where time.id in (select time.id
from time, double_sample, band where band.id = double_sample.fk_band_id and
double_sample.fk_time_id = time.id and band.id = 57 union select id from
time where 1 = 0);
1
-----------
ERROR 38000: The exception 'java.lang.StackOverflowError' was thrown while
evaluating an expression. SQLSTATE: XJ001: Java exception: ':
java.lang.StackOverflowError'.
Thanks,
Jim
RE: does Derby has some form of performance problem ?
Posted by Michael Segel <ms...@segel.com>.
> -----Original Message-----
> From: Bernt.Johnsen@Sun.COM [mailto:Bernt.Johnsen@Sun.COM]
> Sent: Sunday, November 05, 2006 3:49 PM
> To: Derby Discussion
> Subject: Re: does Derby has some form of performance problem ?
>
> >>>>>>>>>>>> legolas wood wrote (2006-11-03 14:46:10):
> > Hi
> > Thank you for reading my post
> > i want to know whether this benchmark result is correct or not.
>
> All benchmarks are correct. You just have to figure out what the
> benchmark actually tests and then interpret the results in the proper
> context. It may or may not be relevant for your application.
>
[mjs]
All benchmarks are lies unless you do them yourselves.
In the database wars of the 90's, each company would find a hardware partner
and tweak the configuration to give them the best performance. Then there
was Oracle which wrote some internals specifically to enhance their OLTP
performance.
> When it comes to H2, I would recomend that you read page 43 in the H2
> documentation about durability: "This database does not guarantee that
> all committed transactions survive a power failure". Durability comes
> at a cost, it's as simple as that.
>
[mjs]
What this suggests is that they consider the transaction complete once it is
written to the buffer/cache, prior to the actual write to disk.
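Durability here comes down to whether a commit forces data out of the OS
cache. A minimal sketch of the distinction (plain Python, POSIX-style,
illustrative only; not how any particular engine implements it):

```python
import os
import tempfile

def commit(path, record, durable=True):
    """Append a record; if durable, force it to stable storage.
    flush() only moves data from the user-space buffer into the OS page
    cache, so a power failure can still lose it. os.fsync() asks the OS
    to push it to the device, and that is the expensive part."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()                 # user buffer -> OS page cache
        if durable:
            os.fsync(f.fileno())  # OS page cache -> disk (the slow part)

log = os.path.join(tempfile.mkdtemp(), "txn.log")
commit(log, b"txn-1\n")                  # durable commit
commit(log, b"txn-2\n", durable=False)   # fast, but at risk on power loss
```

Skipping the fsync is the kind of shortcut that makes a commit benchmark
look dramatically faster.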
> > we are using derby as embedded database and it took a long time to start
> > the database.
> > we use a proper shutdown every time we are leaving the application.
> > does any one here used derby for real world jobs (embedded mode) that
> > database size growth to +100 MB ?
> >
> > benchmark result
> > http://www.h2database.com/html/frame.html?performance.html&main
> >
> > is it usual for derby to take 20-30 seconds to start ?
> > does startup time depend on database size ?
>
>
Startup time is going to be relative. And it's "cheatable".
Re: does Derby has some form of performance problem ?
Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.
>>>>>>>>>>>> legolas wood wrote (2006-11-03 14:46:10):
> Hi
> Thank you for reading my post
> i want to know whether this benchmark result is correct or not.
All benchmarks are correct. You just have to figure out what the
benchmark actually tests and then interpret the results in the proper
context. It may or may not be relevant for your application.
When it comes to H2, I would recommend that you read page 43 in the H2
documentation about durability: "This database does not guarantee that
all committed transactions survive a power failure". Durability comes
at a cost, it's as simple as that.
> we are using derby as embedded database and it took a long time to start
> the database.
> we use a proper shutdown every time we are leaving the application.
> does any one here used derby for real world jobs (embedded mode) that
> database size growth to +100 MB ?
>
> benchmark result
> http://www.h2database.com/html/frame.html?performance.html&main
>
> is it usual for derby to take 20-30 seconds to start ?
> does startup time depend on database size ?
--
Bernt Marius Johnsen, Database Technology Group,
Staff Engineer, Technical Lead Derby/Java DB
Sun Microsystems, Trondheim, Norway
Re: does Derby has some form of performance problem ?
Posted by Olav Sandstå <ol...@gmail.com>.
On 04/11/06, legolas wood <le...@gmail.com> wrote:
> so it is not normal that it takes 30 seconds for a system with one 100MB
> database ?
No, it is not normal for Derby to take 30 seconds to start. I have done a
small test where I inserted 100 MB of data into Derby and did a proper
shutdown. In a new JVM I started Derby again and measured the time it took
to load the embedded driver and get the first connection to the database,
i.e. the time the following two lines of code take:
Class.forName(driver).newInstance();
Connection conn = DriverManager.getConnection(jdbcUrl);
I repeated this experiment 10 times, and in all cases it took between 1.0
and 1.1 seconds to start Derby (i.e. to run the two lines above). This was
done on a Pentium 4 desktop PC.
I do not think the startup time should be much influenced by the size
of the database given that you do a proper shutdown.
Regards,
Olav
Re: does Derby has some form of performance problem ?
Posted by legolas wood <le...@gmail.com>.
Daniel Noll wrote:
> legolas wood wrote:
>> Can i ask how much time does it takes ?
>> so it is normal that our database take 30 second to boot up just with
>> 100 MB size ?
>
> 30 seconds sounds abnormal but we see 10 second startups all the time.
>
> In a way it's good, because it means when we run unit tests, we can
> fit in a whole second coffee. ;-)
>
> The other side of the coin is that when a user opens a case which
> happens to contain 20 databases, it may take a *very* long time. But
> at the same rate, HSQLDB really isn't capable of maintaining databases
> as big as we encounter anyway.
>
> Daniel
>
>
Thank you, Daniel.
So it is not normal, then, that it takes 30 seconds for a system with a
single 100 MB database?
Thanks
Re: does Derby has some form of performance problem ?
Posted by Daniel Noll <da...@nuix.com.au>.
legolas wood wrote:
> Can i ask how much time does it takes ?
> so it is normal that our database take 30 second to boot up just with
> 100 MB size ?
30 seconds sounds abnormal but we see 10 second startups all the time.
In a way it's good, because it means when we run unit tests, we can fit
in a whole second coffee. ;-)
The other side of the coin is that when a user opens a case which
happens to contain 20 databases, it may take a *very* long time. But at
the same rate, HSQLDB really isn't capable of maintaining databases as
big as we encounter anyway.
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/ Fax: +61 2 9212 6902
This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
Re: does Derby has some form of performance problem ?
Posted by legolas wood <le...@gmail.com>.
Daniel Noll wrote:
> legolas wood wrote:
>> Hi
>> Thank you for reading my post
>> i want to know whether this benchmark result is correct or not.
>> we are using derby as embedded database and it took a long time to
>> start the database.
>> we use a proper shutdown every time we are leaving the application.
>> does any one here used derby for real world jobs (embedded mode) that
>> database size growth to +100 MB ?
>>
>> benchmark result
>> http://www.h2database.com/html/frame.html?performance.html&main
>>
>> is it usual for derby to take 20-30 seconds to start ?
>> does startup time depend on database size ?
>
> Derby does take very long to start up. Our database sizes are
> generally quite large (gigabytes), but even smaller databases take
> "too long" to start up, particularly comparing to HSQLDB.
>
> IIRC, someone said they were going to address this issue the last time
> it was brought up (may have been by me.)
>
> Daniel
>
> (...PS... This H2 thing is new to me. I wonder what it's like at
> handling tens of gigabytes of data.)
>
>
>
Can I ask how long it takes for you?
And is it normal that our database takes 30 seconds to boot up at just
100 MB?
Thanks
Re: does Derby has some form of performance problem ?
Posted by Daniel Noll <da...@nuix.com.au>.
legolas wood wrote:
> Hi
> Thank you for reading my post
> i want to know whether this benchmark result is correct or not.
> we are using derby as embedded database and it took a long time to start
> the database.
> we use a proper shutdown every time we are leaving the application.
> does any one here used derby for real world jobs (embedded mode) that
> database size growth to +100 MB ?
>
> benchmark result
> http://www.h2database.com/html/frame.html?performance.html&main
>
> is it usual for derby to take 20-30 seconds to start ?
> does startup time depend on database size ?
Derby does take a very long time to start up. Our database sizes are
generally quite large (gigabytes), but even smaller databases take "too
long" to start up, particularly compared to HSQLDB.
IIRC, someone said they were going to address this issue the last time
it was brought up (may have been by me.)
Daniel
(...PS... This H2 thing is new to me. I wonder what it's like at
handling tens of gigabytes of data.)
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/ Fax: +61 2 9212 6902