You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by "Hohl, Gerrit" <g....@aurenz.de> on 2021/01/04 14:43:08 UTC

Paging performance problem

Hello everyone,

we're trying to use OFFSET and LIMIT for paging. But as our tables our growing, we're running in some performance problems we didn't expect.

Let's say we have the following structure (the table can also have more columns, but for the sake of simplicity...):

CREATE TABLE data (id INTEGER NOT NULL, create_timestamp TIMESTAMP NOT NULL, PRIMARY(id));
CREATE INDEX data_create_timestamp_key ON data (create_timestamp ASC);

We have around 5m records in it. And now we do the two following queries:

SELECT * FROM data ORDER BY create_timestamp OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
SELECT * FROM data ORDER BY create_timestamp OFFSET 4499990 ROWS FETCH NEXT 10 ROWS ONLY;

While the first query returns right after it was started, the 2nd query takes almost a minute to complete.
First we thought that Derby might not have used the index for some reasons.
But the runtime statistics showed that it is using it.

We assume that it is due to the fact that the index might be some kind of unbalanced binary tree.
Is that true? And is there any work-around?

Gruß
Gerrit

P.S.: We still use Apache Derby 10.14.2.0 as we're still tight to Java 8.

AW: AW: AW: Paging performance problem

Posted by "Hohl, Gerrit" <g....@aurenz.de>.

Hi Rick,

good to know. 😉

Regards
Gerrit

-----Ursprüngliche Nachricht-----
Von: Rick Hillegas <ri...@gmail.com> 
Gesendet: Dienstag, 5. Januar 2021 20:00
An: Derby Discussion <de...@db.apache.org>; Hohl, Gerrit <g....@aurenz.de>
Betreff: Re: AW: AW: Paging performance problem

Hi Gerrit,

I'm glad that you found a satisfactory answer to your performance question. For the record, I can find my away around query plans printed in German.

Cheers,
-Rick

On 1/5/21 12:31 AM, Hohl, Gerrit wrote:
> Hi Rick,
>
>
> unfortunately I had to realize that the output of the query plans are printed by Derby in German.
> And I'm not quiet sure how to change that.
>
> But we maybe found the reason: The slower version uses the "HEAP" (based on the runtime statistics) to get the result while the faster version uses only the index ("projection").
>
> The reason behind it seems that the slower version loads records from the table (I guess that is what HEAP means - loading records into memory).
> It gets the create_timestamp from the index, but any other column - even if it is only the "id" - from the table.
> It even doesn't matter much - at least in the described case - if we only query for the "id" or also for the other columns.
>
> The faster version uses a "projection" and works only on the index, no access on the table needed.
> If we put the faster version into JOIN like I described in my mail earlier we then load only the needed records from the table.
>
> I hope I could explain the magic behind it a little bit, so helps others who may come across similar issues.
>
>
> Best regards,
> Gerrit
>
> -----Ursprüngliche Nachricht-----
> Von: Rick Hillegas <ri...@gmail.com>
> Gesendet: Montag, 4. Januar 2021 16:43
> An: Derby Discussion <de...@db.apache.org>; Hohl, Gerrit 
> <g....@aurenz.de>
> Betreff: Re: AW: Paging performance problem
>
> Hi Gerrit,
>
> It's hard to say without seeing the query plans for these scenarios.
> What query plans do you see when you follow the instructions in the "Working with RunTimeStatistics" section of the Derby Tuning Guide:
> http://db.apache.org/derby/docs/10.15/tuning/ctundepth13055.html
>
> -Rick
>
> On 1/4/21 7:08 AM, Hohl, Gerrit wrote:
>> Hello everyone,
>>
>> I guess I found at least one solution:
>>
>> CREATE INDEX data_create_timestamp_id_key ON data (create_timestamp 
>> ASC, id ASC)
>>
>> SELECT d.* FROM (SELECT id FROM data ORDER BY create_timestamp OFFSET
>> 4499990 ROWS FETCH NEXT 10 ROWS ONLY) as tmp, data d WHERE (tmp.id =
>> d.id)
>>
>> Takes less than 4 seconds with my test table.
>> But I'm wondering if that is really the best solution, especially as I'm still not sure about the reason.
>>
>> Isn't that new index I created also an unbalanced binary tree? Shouldn't it take the same amount of time?
>> But if I drop it and have only the primary key and data_create_timestamp_key index, the query takes even longer than my originally one (100s ).
>>
>> In addition: I passed a simply SQL query. Wondering if I also can recreate that with JPA/JPQL...
>>
>> Regards,
>> Gerrit
>>
>> Von: Hohl, Gerrit
>> Gesendet: Montag, 4. Januar 2021 15:43
>> An: Derby Discussion <de...@db.apache.org>
>> Betreff: Paging performance problem
>>
>> Hello everyone,
>>
>> we're trying to use OFFSET and LIMIT for paging. But as our tables our growing, we're running in some performance problems we didn't expect.
>>
>> Let's say we have the following structure (the table can also have more columns, but for the sake of simplicity...):
>>
>> CREATE TABLE data (id INTEGER NOT NULL, create_timestamp TIMESTAMP 
>> NOT NULL, PRIMARY(id)); CREATE INDEX data_create_timestamp_key ON 
>> data (create_timestamp ASC);
>>
>> We have around 5m records in it. And now we do the two following queries:
>>
>> SELECT * FROM data ORDER BY create_timestamp OFFSET 0 ROWS FETCH NEXT
>> 10 ROWS ONLY; SELECT * FROM data ORDER BY create_timestamp OFFSET
>> 4499990 ROWS FETCH NEXT 10 ROWS ONLY;
>>
>> While the first query returns right after it was started, the 2nd query takes almost a minute to complete.
>> First we thought that Derby might not have used the index for some reasons.
>> But the runtime statistics showed that it is using it.
>>
>> We assume that it is due to the fact that the index might be some kind of unbalanced binary tree.
>> Is that true? And is there any work-around?
>>
>> Gruß
>> Gerrit
>>
>> P.S.: We still use Apache Derby 10.14.2.0 as we're still tight to Java 8.
>>

AW: AW: Paging performance problem

Posted by "Hohl, Gerrit" <g....@aurenz.de>.

Hi Rick,

unfortunately I had to realize that the output of the query plans are printed by Derby in German.
And I'm not quiet sure how to change that.

But we maybe found the reason: The slower version uses the "HEAP" (based on the runtime statistics) to get the result while the faster version uses only the index ("projection").

The reason behind it seems that the slower version loads records from the table (I guess that is what HEAP means - loading records into memory).
It gets the create_timestamp from the index, but any other column - even if it is only the "id" - from the table.
It even doesn't matter much - at least in the described case - if we only query for the "id" or also for the other columns.

The faster version uses a "projection" and works only on the index, no access on the table needed.
If we put the faster version into JOIN like I described in my mail earlier we then load only the needed records from the table.

I hope I could explain the magic behind it a little bit, so helps others who may come across similar issues.

Best regards,
Gerrit

-----Ursprüngliche Nachricht-----
Von: Rick Hillegas <ri...@gmail.com> 
Gesendet: Montag, 4. Januar 2021 16:43
An: Derby Discussion <de...@db.apache.org>; Hohl, Gerrit <g....@aurenz.de>
Betreff: Re: AW: Paging performance problem

Hi Gerrit,

It's hard to say without seeing the query plans for these scenarios. 
What query plans do you see when you follow the instructions in the "Working with RunTimeStatistics" section of the Derby Tuning Guide: 
http://db.apache.org/derby/docs/10.15/tuning/ctundepth13055.html

-Rick

On 1/4/21 7:08 AM, Hohl, Gerrit wrote:
> Hello everyone,
>
> I guess I found at least one solution:
>
> CREATE INDEX data_create_timestamp_id_key ON data (create_timestamp 
> ASC, id ASC)
>
> SELECT d.* FROM (SELECT id FROM data ORDER BY create_timestamp OFFSET 
> 4499990 ROWS FETCH NEXT 10 ROWS ONLY) as tmp, data d WHERE (tmp.id = 
> d.id)
>
> Takes less than 4 seconds with my test table.
> But I'm wondering if that is really the best solution, especially as I'm still not sure about the reason.
>
> Isn't that new index I created also an unbalanced binary tree? Shouldn't it take the same amount of time?
> But if I drop it and have only the primary key and data_create_timestamp_key index, the query takes even longer than my originally one (100s ).
>
> In addition: I passed a simply SQL query. Wondering if I also can recreate that with JPA/JPQL...
>
> Regards,
> Gerrit
>
> Von: Hohl, Gerrit
> Gesendet: Montag, 4. Januar 2021 15:43
> An: Derby Discussion <de...@db.apache.org>
> Betreff: Paging performance problem
>
> Hello everyone,
>
> we're trying to use OFFSET and LIMIT for paging. But as our tables our growing, we're running in some performance problems we didn't expect.
>
> Let's say we have the following structure (the table can also have more columns, but for the sake of simplicity...):
>
> CREATE TABLE data (id INTEGER NOT NULL, create_timestamp TIMESTAMP NOT 
> NULL, PRIMARY(id)); CREATE INDEX data_create_timestamp_key ON data 
> (create_timestamp ASC);
>
> We have around 5m records in it. And now we do the two following queries:
>
> SELECT * FROM data ORDER BY create_timestamp OFFSET 0 ROWS FETCH NEXT 
> 10 ROWS ONLY; SELECT * FROM data ORDER BY create_timestamp OFFSET 
> 4499990 ROWS FETCH NEXT 10 ROWS ONLY;
>
> While the first query returns right after it was started, the 2nd query takes almost a minute to complete.
> First we thought that Derby might not have used the index for some reasons.
> But the runtime statistics showed that it is using it.
>
> We assume that it is due to the fact that the index might be some kind of unbalanced binary tree.
> Is that true? And is there any work-around?
>
> Gruß
> Gerrit
>
> P.S.: We still use Apache Derby 10.14.2.0 as we're still tight to Java 8.
>

Re: AW: Paging performance problem

Posted by Rick Hillegas <ri...@gmail.com>.

Hi Gerrit,

It's hard to say without seeing the query plans for these scenarios. 
What query plans do you see when you follow the instructions in the 
"Working with RunTimeStatistics" section of the Derby Tuning Guide: 
http://db.apache.org/derby/docs/10.15/tuning/ctundepth13055.html

-Rick

On 1/4/21 7:08 AM, Hohl, Gerrit wrote:
> Hello everyone,
>
> I guess I found at least one solution:
>
> CREATE INDEX data_create_timestamp_id_key ON data (create_timestamp ASC, id ASC)
>
> SELECT d.* FROM (SELECT id FROM data ORDER BY create_timestamp OFFSET 4499990 ROWS FETCH NEXT 10 ROWS ONLY) as tmp, data d
> WHERE (tmp.id = d.id)
>
> Takes less than 4 seconds with my test table.
> But I'm wondering if that is really the best solution, especially as I'm still not sure about the reason.
>
> Isn't that new index I created also an unbalanced binary tree? Shouldn't it take the same amount of time?
> But if I drop it and have only the primary key and data_create_timestamp_key index, the query takes even longer than my originally one (100s ).
>
> In addition: I passed a simply SQL query. Wondering if I also can recreate that with JPA/JPQL...
>
> Regards,
> Gerrit
>
> Von: Hohl, Gerrit
> Gesendet: Montag, 4. Januar 2021 15:43
> An: Derby Discussion <de...@db.apache.org>
> Betreff: Paging performance problem
>
> Hello everyone,
>
> we're trying to use OFFSET and LIMIT for paging. But as our tables our growing, we're running in some performance problems we didn't expect.
>
> Let's say we have the following structure (the table can also have more columns, but for the sake of simplicity...):
>
> CREATE TABLE data (id INTEGER NOT NULL, create_timestamp TIMESTAMP NOT NULL, PRIMARY(id));
> CREATE INDEX data_create_timestamp_key ON data (create_timestamp ASC);
>
> We have around 5m records in it. And now we do the two following queries:
>
> SELECT * FROM data ORDER BY create_timestamp OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
> SELECT * FROM data ORDER BY create_timestamp OFFSET 4499990 ROWS FETCH NEXT 10 ROWS ONLY;
>
> While the first query returns right after it was started, the 2nd query takes almost a minute to complete.
> First we thought that Derby might not have used the index for some reasons.
> But the runtime statistics showed that it is using it.
>
> We assume that it is due to the fact that the index might be some kind of unbalanced binary tree.
> Is that true? And is there any work-around?
>
> Gruß
> Gerrit
>
> P.S.: We still use Apache Derby 10.14.2.0 as we're still tight to Java 8.
>

AW: Paging performance problem

Posted by "Hohl, Gerrit" <g....@aurenz.de>.

Hello everyone,

I guess I found at least one solution:

CREATE INDEX data_create_timestamp_id_key ON data (create_timestamp ASC, id ASC)

SELECT d.* FROM (SELECT id FROM data ORDER BY create_timestamp OFFSET 4499990 ROWS FETCH NEXT 10 ROWS ONLY) as tmp, data d
WHERE (tmp.id = d.id)

Takes less than 4 seconds with my test table.
But I'm wondering if that is really the best solution, especially as I'm still not sure about the reason.

Isn't that new index I created also an unbalanced binary tree? Shouldn't it take the same amount of time?
But if I drop it and have only the primary key and data_create_timestamp_key index, the query takes even longer than my originally one (100s ).

In addition: I passed a simply SQL query. Wondering if I also can recreate that with JPA/JPQL...

Regards,
Gerrit

Von: Hohl, Gerrit
Gesendet: Montag, 4. Januar 2021 15:43
An: Derby Discussion <de...@db.apache.org>
Betreff: Paging performance problem

Hello everyone,

we're trying to use OFFSET and LIMIT for paging. But as our tables our growing, we're running in some performance problems we didn't expect.

Let's say we have the following structure (the table can also have more columns, but for the sake of simplicity...):

CREATE TABLE data (id INTEGER NOT NULL, create_timestamp TIMESTAMP NOT NULL, PRIMARY(id));
CREATE INDEX data_create_timestamp_key ON data (create_timestamp ASC);

We have around 5m records in it. And now we do the two following queries:

SELECT * FROM data ORDER BY create_timestamp OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
SELECT * FROM data ORDER BY create_timestamp OFFSET 4499990 ROWS FETCH NEXT 10 ROWS ONLY;

While the first query returns right after it was started, the 2nd query takes almost a minute to complete.
First we thought that Derby might not have used the index for some reasons.
But the runtime statistics showed that it is using it.

We assume that it is due to the fact that the index might be some kind of unbalanced binary tree.
Is that true? And is there any work-around?

Gruß
Gerrit

P.S.: We still use Apache Derby 10.14.2.0 as we're still tight to Java 8.