You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "geoff hendrey (JIRA)" <ji...@apache.org> on 2010/05/27 04:37:36 UTC
[jira] Issue Comment Edited: (DERBY-4675) OFFSET/FETCH SYNTAX
EFFICIENCY
[ https://issues.apache.org/jira/browse/DERBY-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872030#action_12872030 ]
geoff hendrey edited comment on DERBY-4675 at 5/26/10 10:36 PM:
----------------------------------------------------------------
Here is the optimizer dump. Estimated cost is 207956.54. Perhaps it is due to the string concatenation (I am selecting an expression that references other columns).
2010-05-26 18:17:34.966 GMT Thread[qtp0-0,5,main] (XID = 354560), (SESSIONID = 3), SELECT
"r"."PK" AS "r.PK",
'<a href="http://localhost:8080/nextdb/rest/geoff/PHOTOS/USER_PIC/row/' || TRIM(CHAR("r".PK)) || '/pic.' || SUBSTR( "pic_CONTENT_TYPE", LOCATE('/',"pic_CONTENT_TYPE")+1) || '" target="_self">show</a>' AS "r.pic",
"r"."pic_CONTENT_TYPE" AS "r.pic_CONTENT_TYPE",
"r"."pic_LENGTH" AS "r.pic_LENGTH",
"r"."pic_FID" AS "r.pic_FID",
"r"."title" AS "r.title",
"r"."date" AS "r.date",
"r"."public" AS "r.public"
FROM
"GEOFF__PHOTOS__USER_PIC" AS "r"
OFFSET
40 ROWS FETCH NEXT 10 ROWS ONLY ******* Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 10
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 10
Rows filtered = 40
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 50
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Table Scan ResultSet for GEOFF__PHOTOS__USER_PIC at read uncommitted isolation level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 50
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 2, 3, 4, 5, 6, 7}
Number of columns fetched=7
Number of pages visited=58
Number of rows qualified=64
Number of rows visited=67
Scan type=heap
start position:
null
stop position:
null
qualifiers:
None
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Sorry, I was wrong. I did not have an index covering all the columns and I only noticed it when I looked at the dump above. pic_CONTENT_TYPE and pic_LENGTH are not covered. I removed the columns not covered by the index and the query is fast. I also tried forcing the optimizer to use the PK index with "--DERBY-PROPERTIES index=GEOFF__PHOTOS__USER__PK" but that didn't speed it up, whereas just making sure I had an index covering all the columns did. I'm not sure how realistic a strategy it is to have to have indexes covering all the columns on any table that has binary data. Could you comment on that?
was (Author: geoff_hendrey):
Here is the optimizer dump. Estimated cost is 207956.54. Perhaps it is due to the string concatenation (I am selecting an expression that references other columns).
2010-05-26 18:17:34.966 GMT Thread[qtp0-0,5,main] (XID = 354560), (SESSIONID = 3), SELECT
"r"."PK" AS "r.PK",
'<a href="http://localhost:8080/nextdb/rest/geoff/PHOTOS/USER_PIC/row/' || TRIM(CHAR("r".PK)) || '/pic.' || SUBSTR( "pic_CONTENT_TYPE", LOCATE('/',"pic_CONTENT_TYPE")+1) || '" target="_self">show</a>' AS "r.pic",
"r"."pic_CONTENT_TYPE" AS "r.pic_CONTENT_TYPE",
"r"."pic_LENGTH" AS "r.pic_LENGTH",
"r"."pic_FID" AS "r.pic_FID",
"r"."title" AS "r.title",
"r"."date" AS "r.date",
"r"."public" AS "r.public"
FROM
"GEOFF__PHOTOS__USER_PIC" AS "r"
OFFSET
40 ROWS FETCH NEXT 10 ROWS ONLY ******* Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 10
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 10
Rows filtered = 40
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 50
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
Source result set:
Table Scan ResultSet for GEOFF__PHOTOS__USER_PIC at read uncommitted isolation level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 50
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 2, 3, 4, 5, 6, 7}
Number of columns fetched=7
Number of pages visited=58
Number of rows qualified=64
Number of rows visited=67
Scan type=heap
start position:
null
stop position:
null
qualifiers:
None
optimizer estimated row count: 82.00
optimizer estimated cost: 207956.54
> OFFSET/FETCH SYNTAX EFFICIENCY
> ------------------------------
>
> Key: DERBY-4675
> URL: https://issues.apache.org/jira/browse/DERBY-4675
> Project: Derby
> Issue Type: Improvement
> Affects Versions: 10.6.1.0
> Environment: unix
> Reporter: geoff hendrey
>
> Using OFFSET and FETCH isn't any faster for paging than using existing JDBC methods:
> //p is page number, n is rows per page
> setMaxRows(n*p);
> setFetchSize(n);
> //...
> rs.absolute((p-1)*n);
> When used on a table with blobs, both the JDBC way, and the OFFSET/FETCH way are unexpectedly slow ('unexpectedly' because I am not even retrieving the BLOB column).
> I need a way to do paging that doesn't slow down proportionately to the page I am requesting. It must also maintain this performance on tables with blobs. Following is discussion from users mail list, year or so ago.
> > 2) what are the performance implications for users of the embedded
> > driver? In particular, with the embedded driver I am hoping that
> > this feature allows portions of a result set to be retrieved without
> > the overhead of retrieving the entire result set. For example, if I
> I am afraid that with embedded driver, you will only save a little CPU
> (by avoiding some JDBC calls) since under the hood, the code siphons
> off the rows till it hits the offset, so if you have a large offset,
> you will still incur reading of those rows (modulo page caching). In
> client/server driver context the savings are larger, of course, in
> that fewer rows are sent over the wire. For simple queries that can
> use an index, the optimizer could make use of the offset information
> to avoid reading the entire row when skipping rows before offset, just
> counting rows in the index to get to the first qualifying row, but
> this optimization is not yet implemented.
> Often, this feature is used together with ORDER BY which would entail
> some sorting of the result set and then all the rows would have to be
> read anyway. Again, for some simple queries, sort avoidance is used by
> the optimizer, so optimization is still possible for for such queries.
> If you think this optimization is an important capability feel free to
> file an improvement issue for it.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.