You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "geoff hendrey (JIRA)" <ji...@apache.org> on 2010/05/27 04:37:36 UTC

[jira] Issue Comment Edited: (DERBY-4675) OFFSET/FETCH SYNTAX EFFICIENCY

    [ https://issues.apache.org/jira/browse/DERBY-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872030#action_12872030 ] 

geoff hendrey edited comment on DERBY-4675 at 5/26/10 10:36 PM:
----------------------------------------------------------------

Here is the optimizer dump. Estimated cost is 207956.54. Perhaps it is due to the string concatenation (I am selecting an expression that references other columns).

2010-05-26 18:17:34.966 GMT Thread[qtp0-0,5,main] (XID = 354560), (SESSIONID = 3), SELECT
"r"."PK" AS "r.PK",
	'<a href="http://localhost:8080/nextdb/rest/geoff/PHOTOS/USER_PIC/row/' || TRIM(CHAR("r".PK)) || '/pic.' || SUBSTR( "pic_CONTENT_TYPE", LOCATE('/',"pic_CONTENT_TYPE")+1) || '" target="_self">show</a>' AS "r.pic",
	"r"."pic_CONTENT_TYPE" AS "r.pic_CONTENT_TYPE",
	"r"."pic_LENGTH" AS "r.pic_LENGTH",
	"r"."pic_FID" AS "r.pic_FID",
	"r"."title" AS "r.title",
	"r"."date" AS "r.date",
	"r"."public" AS "r.public"
FROM
	"GEOFF__PHOTOS__USER_PIC" AS "r"
OFFSET
	 40 ROWS FETCH NEXT 10 ROWS ONLY ******* Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 10
	constructor time (milliseconds) = 0
	open time (milliseconds) = 0
	next time (milliseconds) = 0
	close time (milliseconds) = 0
	optimizer estimated row count:           82.00
	optimizer estimated cost:       207956.54

Source result set:
	Row Count (1):
	Number of opens = 1
	Rows seen = 10
	Rows filtered = 40
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		optimizer estimated row count:           82.00
		optimizer estimated cost:       207956.54

	Source result set:
		Project-Restrict ResultSet (3):
		Number of opens = 1
		Rows seen = 50
		Rows filtered = 0
		restriction = false
		projection = true
			constructor time (milliseconds) = 0
			open time (milliseconds) = 0
			next time (milliseconds) = 0
			close time (milliseconds) = 0
			restriction time (milliseconds) = 0
			projection time (milliseconds) = 0
			optimizer estimated row count:           82.00
			optimizer estimated cost:       207956.54

		Source result set:
			Table Scan ResultSet for GEOFF__PHOTOS__USER_PIC at read uncommitted isolation level using share row locking chosen by the optimizer
			Number of opens = 1
			Rows seen = 50
			Rows filtered = 0
			Fetch Size = 16
				constructor time (milliseconds) = 0
				open time (milliseconds) = 0
				next time (milliseconds) = 0
				close time (milliseconds) = 0
				next time in milliseconds/row = 0

			scan information:
				Bit set of columns fetched={0, 2, 3, 4, 5, 6, 7}
				Number of columns fetched=7
				Number of pages visited=58
				Number of rows qualified=64
				Number of rows visited=67
				Scan type=heap
				start position:
					null
				stop position:
					null
				qualifiers:
					None
				optimizer estimated row count:           82.00
				optimizer estimated cost:       207956.54


Sorry, I was wrong. I did not have an index covering all the columns and I only noticed it when I looked at the dump above. pic_CONTENT_TYPE and pic_LENGTH are not covered. I removed the columns not covered by the index and the query is fast. I also tried forcing the optimizer to use the PK index with "--DERBY-PROPERTIES index=GEOFF__PHOTOS__USER__PK" but that didn't speed it up, whereas just making sure I had an index covering all the columns did. I'm not sure how realistic a strategy it is to have to have indexes covering all the columns on any table that has binary data. Could you comment on that?

      was (Author: geoff_hendrey):
    Here is the optimizer dump. Estimated cost is 207956.54. Perhaps it is due to the string concatenation (I am selecting an expression that references other columns).

2010-05-26 18:17:34.966 GMT Thread[qtp0-0,5,main] (XID = 354560), (SESSIONID = 3), SELECT
"r"."PK" AS "r.PK",
	'<a href="http://localhost:8080/nextdb/rest/geoff/PHOTOS/USER_PIC/row/' || TRIM(CHAR("r".PK)) || '/pic.' || SUBSTR( "pic_CONTENT_TYPE", LOCATE('/',"pic_CONTENT_TYPE")+1) || '" target="_self">show</a>' AS "r.pic",
	"r"."pic_CONTENT_TYPE" AS "r.pic_CONTENT_TYPE",
	"r"."pic_LENGTH" AS "r.pic_LENGTH",
	"r"."pic_FID" AS "r.pic_FID",
	"r"."title" AS "r.title",
	"r"."date" AS "r.date",
	"r"."public" AS "r.public"
FROM
	"GEOFF__PHOTOS__USER_PIC" AS "r"
OFFSET
	 40 ROWS FETCH NEXT 10 ROWS ONLY ******* Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 10
	constructor time (milliseconds) = 0
	open time (milliseconds) = 0
	next time (milliseconds) = 0
	close time (milliseconds) = 0
	optimizer estimated row count:           82.00
	optimizer estimated cost:       207956.54

Source result set:
	Row Count (1):
	Number of opens = 1
	Rows seen = 10
	Rows filtered = 40
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		optimizer estimated row count:           82.00
		optimizer estimated cost:       207956.54

	Source result set:
		Project-Restrict ResultSet (3):
		Number of opens = 1
		Rows seen = 50
		Rows filtered = 0
		restriction = false
		projection = true
			constructor time (milliseconds) = 0
			open time (milliseconds) = 0
			next time (milliseconds) = 0
			close time (milliseconds) = 0
			restriction time (milliseconds) = 0
			projection time (milliseconds) = 0
			optimizer estimated row count:           82.00
			optimizer estimated cost:       207956.54

		Source result set:
			Table Scan ResultSet for GEOFF__PHOTOS__USER_PIC at read uncommitted isolation level using share row locking chosen by the optimizer
			Number of opens = 1
			Rows seen = 50
			Rows filtered = 0
			Fetch Size = 16
				constructor time (milliseconds) = 0
				open time (milliseconds) = 0
				next time (milliseconds) = 0
				close time (milliseconds) = 0
				next time in milliseconds/row = 0

			scan information:
				Bit set of columns fetched={0, 2, 3, 4, 5, 6, 7}
				Number of columns fetched=7
				Number of pages visited=58
				Number of rows qualified=64
				Number of rows visited=67
				Scan type=heap
				start position:
					null
				stop position:
					null
				qualifiers:
					None
				optimizer estimated row count:           82.00
				optimizer estimated cost:       207956.54
  
> OFFSET/FETCH SYNTAX EFFICIENCY
> ------------------------------
>
>                 Key: DERBY-4675
>                 URL: https://issues.apache.org/jira/browse/DERBY-4675
>             Project: Derby
>          Issue Type: Improvement
>    Affects Versions: 10.6.1.0
>         Environment: unix
>            Reporter: geoff hendrey
>
> Using OFFSET and FETCH isn't any faster for paging than using existing JDBC methods:
> //p is page number, n is rows per page
> setMaxRows(n*p);
> setFetchSize(n);
> //...
> rs.absolute((p-1)*n);
> When used on a table with blobs, both the JDBC way, and the OFFSET/FETCH way are unexpectedly slow ('unexpectedly' because I am not even retrieving the BLOB column).
> I need a way to do paging that doesn't slow down proportionately to the page I am requesting. It must also maintain this performance on tables with blobs. Following is discussion from users mail list, year or so ago.
> > 2) what are the performance implications for users of the embedded
> > driver? In particular, with the embedded driver I am hoping that
> > this feature allows portions of a result set to be retrieved without
> > the overhead of retrieving the entire result set. For example, if I
> I am afraid that with embedded driver, you will only save a little CPU
> (by avoiding some JDBC calls) since under the hood, the code siphons
> off the rows till it hits the offset, so if you have a large offset,
> you will still incur reading of those rows (modulo page caching). In
> client/server driver context the savings are larger, of course, in
> that fewer rows are sent over the wire. For simple queries that can
> use an index, the optimizer could make use of the offset information
> to avoid reading the entire row when skipping rows before offset, just
> counting rows in the index to get to the first qualifying row, but
> this optimization is not yet implemented.
> Often, this feature is used together with ORDER BY which would entail
> some sorting of the result set and then all the rows would have to be
> read anyway. Again, for some simple queries, sort avoidance is used by
> the optimizer, so optimization is still possible for for such queries.
> If you think this optimization is an important capability feel free to
> file an improvement issue for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.