You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by John English <jo...@gmail.com> on 2021/09/25 14:39:30 UTC
Using indexes
I asked something like this some months ago, but am still not getting
anywhere...
I have a table used for logging events, which current contains just
under 400,000 rows:
CREATE TABLE system_log (
id INTEGER GENERATED ALWAYS AS IDENTITY,
time TIMESTAMP DEFAULT NULL,
username VARCHAR(15),
name VARCHAR(520),
facility VARCHAR(15) NOT NULL,
event VARCHAR(31) NOT NULL,
details VARCHAR(32000),
CONSTRAINT systemlog_pk PRIMARY KEY (id)
);
I generally want to display most recent events first, so have defined an
index for the purpose:
CREATE INDEX log_index ON system_log (id DESC);
I execute the following query:
SELECT id,DateTimeFormat(time,null) AS
t_time,name,username,facility,event,details
FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY;
(DateTimeFormat is a simple function that converts a timestamp into
something human-readable like "25-Sep-2021 at 12:30".)
A timing test showed that it takes 6.348 seconds to execute this. If I
order by id ASC, it takes 0.124 seconds.
I used SYSCS_SET_RUNTIMESTATISTICS(1) and
SYSCS_SET_STATISTICS_TIMING(1), and got a dump showing that it's doing
an external sort and not using the index using ORDER BY id DESC, but
using the PK index for ORDER BY id ASC (see below)
Why isn't the first query using the index I created, and what can I do
to persuade Derby to use it?
Rick Hillegas mentioned "covering indexes", but thanks to the large
"details" column I was unable to create such a beast. Would a CLOB be
any better?
TIA,
--
John English
-------------------------------------------------------------------------
Using ORDER BY id DESC:
Statement Text:
SELECT id,DateTimeFormat(time,null) AS
t_time,name,username,facility,event,details
FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY
Parse Time: 20
Bind Time: 1
Optimize Time: 1
Generate Time: 1
Compile Time: 23
Execute Time: 6305
Begin Compilation Timestamp : 2021-09-25 17:15:19.947
End Compilation Timestamp : 2021-09-25 17:15:19.97
Begin Execution Timestamp : 2021-09-25 17:15:19.971
End Execution Timestamp : 2021-09-25 17:15:26.295
Statement Execution Plan Text:
Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 20
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 0
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 18
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388218
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
Number of merge runs=11
Number of rows input=388218
Number of rows output=388218
Size of merge runs=[32767, 32767, 32767, 32767, 32767, 32767, 32767,
32767, 32767, 32767, 32767]
Sort type=external
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 18
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 388218
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 3881
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 1434
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation
level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 388218
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 2423
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 1, 2, 3, 4, 7, 8}
Number of columns fetched=7
Number of pages visited=2661
Number of rows qualified=388218
Number of rows visited=388404
Scan type=heap
start position:
null
stop position:
null
qualifiers:
None
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
-------------------------------------------------------------------------
Using ORDER BY id ASC:
Statement Name:
null
Statement Text:
SELECT id,DateTimeFormat(time,null) AS
t_time,name,username,facility,event,details
FROM system_log
ORDER BY id ASC
NULLS LAST
FETCH FIRST 20 ROWS ONLY
Parse Time: 24
Bind Time: 1
Optimize Time: 1
Generate Time: 1
Compile Time: 27
Execute Time: 123
Begin Compilation Timestamp : 2021-09-25 17:14:57.128
End Compilation Timestamp : 2021-09-25 17:14:57.155
Begin Execution Timestamp : 2021-09-25 17:20:31.615
End Execution Timestamp : 2021-09-25 17:20:31.738
Statement Execution Plan Text:
Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 20
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 123
close time (milliseconds) = 0
optimizer estimated row count: 388223.00
optimizer estimated cost: 1035759.04
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 123
close time (milliseconds) = 0
optimizer estimated row count: 388223.00
optimizer estimated cost: 1035759.04
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 123
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 1
optimizer estimated row count: 388223.00
optimizer estimated cost: 1035759.04
Source result set:
Index Row to Base Row ResultSet for SYSTEM_LOG:
Number of opens = 1
Rows seen = 20
Columns accessed from heap = {0, 1, 2, 3, 4, 7, 8}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 122
close time (milliseconds) = 0
optimizer estimated row count: 388223.00
optimizer estimated cost: 1035759.04
Index Scan ResultSet for SYSTEM_LOG using constraint SYSTEMLOG_PK at
read uncommitted isolation level using share row locking chosen by the
optimizer
Number of opens = 1
Rows seen = 20
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 121
close time (milliseconds) = 0
next time in milliseconds/row = 6
scan information:
Bit set of columns fetched={1}
Number of columns fetched=1
Number of deleted rows visited=94
Number of pages visited=1187
Number of rows qualified=20
Number of rows visited=114
Scan type=btree
Tree height=3
start position:
None
stop position:
None
qualifiers:
None
optimizer estimated row count: 388223.00
optimizer estimated cost: 1035759.04
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by John English <jo...@gmail.com>.
On Tue, 28 Sep 2021, 17:52 Rick Hillegas, <ri...@gmail.com> wrote:
> Glad that you have made progress on this puzzle. One comment inline...
>
> On 9/28/21 4:10 AM, John English wrote:
> > The WHERE clause seems to prevent the index from being used.
> The index is not usable in this query because username is not the
> leading column in the index.
>
Ah, right.
>
Re: Using indexes
Posted by Rick Hillegas <ri...@gmail.com>.
Glad that you have made progress on this puzzle. One comment inline...
On 9/28/21 4:10 AM, John English wrote:
> A couple more data points, from testing different variants of the
> inner select:
>
> 1) SELECT time FROM system_log
> ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;
>
> 419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at
> read uncommitted isolation level using share row locking chosen by the
> optimizer"
>
> 2) SELECT time FROM system_log
> WHERE username='foo'
> ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;
>
> 2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted
> isolation level using share row locking chosen by the optimizer"
>
> Removing "FETCH NEXT 20 ROWS ONLY":
>
> 1) 388,237 rows in 3859ms, otherwise same as above (uses index).
> 2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index).
>
> The WHERE clause seems to prevent the index from being used.
The index is not usable in this query because username is not the
leading column in the index.
Re: Using indexes
Posted by John English <jo...@gmail.com>.
A couple more data points, from testing different variants of the inner
select:
1) SELECT time FROM system_log
ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;
419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at
read uncommitted isolation level using share row locking chosen by the
optimizer"
2) SELECT time FROM system_log
WHERE username='foo'
ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;
2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted
isolation level using share row locking chosen by the optimizer"
Removing "FETCH NEXT 20 ROWS ONLY":
1) 388,237 rows in 3859ms, otherwise same as above (uses index).
2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index).
The WHERE clause seems to prevent the index from being used.
--
John English
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by John English <jo...@gmail.com>.
On 27/09/2021 18:23, John English wrote:
> I'll try the temporary table approach as soon as I get a minute, and
> will let you know what happens.
I'm finally getting somewhere. Having (a) moved the row counting out to
a separate query, (b) renamed columns in the table to match what my code
requires so I can access the table directly instead of using a view, and
(c) using your (Rick's) original idea of a nested select (which I
decided to try again before going for a temporary table), I get an
execution time of 455ms (woo-hoo!) for this query:
SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details
FROM system_log,
(SELECT time FROM system_log
ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x
WHERE system_log.time=x.time;
However, if I add in a WHERE clause:
SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details
FROM system_log,
(SELECT time FROM system_log
WHERE username='foo'
ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x
WHERE system_log.time=x.time;
the execution time goes up to 2 seconds (I'm still happy!). The left
result set (the nested select, by the look of it) still doesn't use the
index:
Left result set:
Row Count (2):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
...
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 23211
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
Number of rows input=23211
Number of rows output=23211
Sort type=internal
...
optimizer estimated row count: 38824.20
optimizer estimated cost: 399695.58
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 23211
Rows filtered = 0
restriction = false
projection = true
...
Source result set:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted
isolation level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 23211
but the right result set uses the PK index.
So, I'm in a much better position as the result of Rick's advice, but I
still have no idea why the descending PK index isn't ever used!!!
--
John English
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by John English <jo...@gmail.com>.
On 26/09/2021 22:57, Rick Hillegas wrote:
> The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer
> isn't smart enough to know that the subquery returns only 20 small rows.
>
> What happens if you dump the results of the subquery into a temporary
> table and then join that with system_log?
Thanks, I'll give it a whirl. Meanwhile,I've got it down to about 5 or 6
seconds by (a) eliminating the view in favour of a direct table access,
and (b) using a separate SELECT COUNT(*) to get the row count, again
using the table directly and without using an ORDER BY clause. (I
renamed the id and time columns to time and t_time respectively, so the
names match what the code expects to get back from the view, and then
recreated the index.)
It still doesn't seem to be using the index (see trace below) but it's
an improvement. Sorting by time ASC use the PK index and takes about
200ms, but time DESC doesn't use an index and takes about 3s. And I
still don't understand why it's ignoring the index.
I'll try the temporary table approach as soon as I get a minute, and
will let you know what happens.
Thanks for the help,
--
John English
----------------------------------------------------
Statement Text:
SELECT DateTimeFormat(t_time,null) AS
t_time,facility,event,details,name,username FROM system_log
ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY
Parse Time: 19
Bind Time: 1
Optimize Time: 1
Generate Time: 0
Compile Time: 21
Execute Time: 4474
Begin Compilation Timestamp : 2021-09-27 17:52:23.802
End Compilation Timestamp : 2021-09-27 17:52:23.823
Begin Execution Timestamp : 2021-09-27 17:52:23.824
End Execution Timestamp : 2021-09-27 17:52:28.298
Statement Execution Plan Text:
Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 20
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388231
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
Number of rows input=388231
Number of rows output=388231
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 388231
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4232
close time (milliseconds) = 3
restriction time (milliseconds) = 0
projection time (milliseconds) = 1807
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation
level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 388231
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 2390
close time (milliseconds) = 3
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 1, 2, 3, 4, 7, 8}
Number of columns fetched=7
Number of pages visited=2666
Number of rows qualified=388231
Number of rows visited=388417
Scan type=heap
start position:
null
stop position:
null
qualifiers:
None
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by Rick Hillegas <ri...@gmail.com>.
On 9/25/21 11:59 AM, John English wrote:
> On 25/09/2021 21:14, Rick Hillegas wrote:
>> On 9/25/21 7:39 AM, John English wrote:
>>> SELECT id,DateTimeFormat(time,null) AS
>>> t_time,name,username,facility,event,details
>>> FROM system_log
>>> ORDER BY id DESC
>>> NULLS LAST
>>> FETCH FIRST 20 ROWS ONLY;
>>
>> I can remember whether you tried to rewrite the query to use a
>> subquery. Something like this:
>>
>> SELECT id, time AS t_time,name,username,facility,event,details
>> FROM
>> system_log s,
>> (
>> SELECT id AS log_id
>> FROM system_log
>> ORDER BY id DESC
>> NULLS LAST
>> FETCH FIRST 20 ROWS ONLY
>> ) t
>> WHERE s.id = t.log_id
>> ;
>>
>> Does that help?
>>
>> -Rick
>
> No, when I tried it, it made it even worse if anything. The subquery
> still fetches all 400,000 rows, sorts them without using the index,
> and then uses the results to select again in the outer query. I really
> don't understand why the index is ignored.
>
The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer
isn't smart enough to know that the subquery returns only 20 small rows.
What happens if you dump the results of the subquery into a temporary
table and then join that with system_log?
Re: Using indexes
Posted by John English <jo...@gmail.com>.
On 25/09/2021 21:14, Rick Hillegas wrote:
> On 9/25/21 7:39 AM, John English wrote:
>> SELECT id,DateTimeFormat(time,null) AS
>> t_time,name,username,facility,event,details
>> FROM system_log
>> ORDER BY id DESC
>> NULLS LAST
>> FETCH FIRST 20 ROWS ONLY;
>
> I can remember whether you tried to rewrite the query to use a subquery.
> Something like this:
>
> SELECT id, time AS t_time,name,username,facility,event,details
> FROM
> system_log s,
> (
> SELECT id AS log_id
> FROM system_log
> ORDER BY id DESC
> NULLS LAST
> FETCH FIRST 20 ROWS ONLY
> ) t
> WHERE s.id = t.log_id
> ;
>
> Does that help?
>
> -Rick
No, when I tried it, it made it even worse if anything. The subquery
still fetches all 400,000 rows, sorts them without using the index, and
then uses the results to select again in the outer query. I really don't
understand why the index is ignored.
--
John English
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by John English <jo...@gmail.com>.
On 25/09/2021 21:14, Rick Hillegas wrote:
> SELECT id, time AS t_time,name,username,facility,event,details
> FROM
> system_log s,
> (
> SELECT id AS log_id
> FROM system_log
> ORDER BY id DESC
> NULLS LAST
> FETCH FIRST 20 ROWS ONLY
> ) t
> WHERE s.id = t.log_id
> ;
Ideally I want to access the table through a view, defined like this:
CREATE VIEW system_log_view AS
SELECT time AS t_time,
facility,
event,
details,
name,
username,
id AS time
FROM system_log
(The renamings may seem strange but they serve a purpose!)
Selecting the ids directly from the table takes 0.4s:
SELECT id FROM system_log
ORDER BY id DESC NULLS LAST
FETCH FIRST 20 ROWS ONLY;
Going through the view takes much longer (2.9s):
SELECT time FROM system_log_view
ORDER BY time DESC NULLS LAST
FETCH FIRST 20 ROWS ONLY;
It seems like the view is materializing all the rows, and then 20 rows
are being selected from the 400,000:
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388226
Rows returned = 20
In any case I want to know the total number of rows, even though I
display them in pages of 20 at a time. I thought I might get away with
using a separate query for this, but no:
SELECT COUNT(*) FROM system_log; -- 0.271s
SELECT COUNT(*) FROM system_log_view; -- 2.184s
I don't understand why the view is so much slower than accessing the
table directly. The direct table access says this:
Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at read
uncommitted isolation level using share row locking chosen by the optimizer
The corresponding line from executing the view doesn't mention the index:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level
using share row locking chosen by the optimizer
What am I not understanding about views here?
--
John English
--
This email has been checked for viruses by AVG.
https://www.avg.com
Re: Using indexes
Posted by Rick Hillegas <ri...@gmail.com>.
On 9/25/21 7:39 AM, John English wrote:
> SELECT id,DateTimeFormat(time,null) AS
> t_time,name,username,facility,event,details
> FROM system_log
> ORDER BY id DESC
> NULLS LAST
> FETCH FIRST 20 ROWS ONLY;
I can remember whether you tried to rewrite the query to use a subquery.
Something like this:
SELECT id, time AS t_time,name,username,facility,event,details
FROM
system_log s,
(
SELECT id AS log_id
FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY
) t
WHERE s.id = t.log_id
;
Does that help?
-Rick