Posted to derby-dev@db.apache.org by "Craig Russell (JIRA)" <de...@db.apache.org> on 2005/09/20 20:16:46 UTC

[jira] Created: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Modify SQL to skip N rows of the result and return the next M rows
------------------------------------------------------------------

         Key: DERBY-581
         URL: http://issues.apache.org/jira/browse/DERBY-581
     Project: Derby
        Type: New Feature
  Components: Unknown  
 Environment: All
    Reporter: Craig Russell
    Priority: Minor

I agree that the information should be expressed in SQL so that the query optimizer and execution strategy can know what the user needs in terms of cardinality.

I'd also like to ask that when we consider extending the SQL in this manner we consider skipping the first N rows and returning the next M rows.

Craig
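
To make the request concrete, here is a minimal sketch of the intended semantics: order the result, skip the first N rows, return the next M. It is plain Python over an in-memory list, not Derby syntax; the helper name `skip_then_take` and the sample rows are hypothetical and chosen only for illustration.

```python
from itertools import islice

def skip_then_take(rows, skip, take, key):
    """Return up to `take` rows after skipping the first `skip` rows in `key` order."""
    ordered = sorted(rows, key=key)
    return list(islice(ordered, skip, skip + take))

# Hypothetical (order_id, amount) rows, deliberately out of order.
orders = [("o5", 50), ("o1", 10), ("o4", 40), ("o2", 20), ("o3", 30)]

# Skip N=1 row, return the next M=2 rows, ordered by order_id.
page = skip_then_take(orders, skip=1, take=2, key=lambda r: r[0])
print(page)  # [('o2', 20), ('o3', 30)]
```

Note that the result is only well defined because an ordering is specified; without one, "the first N rows" can be any N rows.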

On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:

Another little detail about optimization is that Statement.setMaxRows()-style functions on the JDBC side may not be sufficient, since they are called after the SQL statement is prepared and returned as an object (after the query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch the first 1000 rows only, so that this intention can be taken into account when the query is prepared.
Regards,
Ali

Mike Matrigali <mi...@sbcglobal.net> wrote:
As Craig points out, it is important in performance testing to say
exactly what you are measuring. In general, Derby will try to
stream rows to the user before it has finished looking at all rows.
So often looking at the first row and stopping will mean that
many rows have not been processed. BUT when an order by is involved
and the query plan either has no appropriate matching index, or decides
to use a different index, then all the rows are processed, then they are
sent to the sorter, and finally, after all rows are processed, they are
streamed to the client.

So, as you have seen, reading the first 1000 rows of a much larger data
set can happen very quickly.

As subsequent mail threads have pointed out, returning the top 1000
sorted rows is an interesting problem which could be costed and executed
differently if that information were pushed into the optimizer and the
sorter (and medium-level projects were done in those areas).
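
As a rough illustration of why pushing the row limit into the sorter could matter, the sketch below contrasts sorting every qualifying row with keeping only the N smallest rows while scanning. This is plain Python, not Derby's sorter; the function names and the generated data are hypothetical, and the cost remarks in the comments are general properties of the two techniques rather than measurements.

```python
import heapq
import random

def top_n_full_sort(rows, n, key):
    # Sort every row, then keep the first n: all rows must be held and sorted.
    return sorted(rows, key=key)[:n]

def top_n_bounded(rows, n, key):
    # Scan once, retaining only the n smallest rows seen so far.
    return heapq.nsmallest(n, rows, key=key)

# Hypothetical data: 10,000 rows with unique order ids in random order.
random.seed(0)
ids = random.sample(range(10**6), 10_000)
rows = [(f"order-{n:06d}", n) for n in ids]

by_order_id = lambda r: r[0]
full = top_n_full_sort(rows, 1000, by_order_id)
bounded = top_n_bounded(rows, 1000, by_order_id)
print(full == bounded)  # True
```

Both functions return the same 1000 rows; the difference is that the bounded version never needs to keep more than about N rows at a time, which is the kind of saving a limit-aware sorter could exploit.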

> On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
>
> I have observed some interesting query performance behavior and am
> hoping someone here can explain.
>
> In my scenario, it appears that an existing index is not being used for
> the 'order by' part of the operation and as a result the performance of
> certain queries is suffering. Can someone explain if this is supposed
> to be what is happening and why? Please see below for the specific
> queries and their performance characteristics.
>
> Here are the particulars:
>
> ---------------------------------
>
> create table orders(
>     order_id varchar(50) NOT NULL
>         CONSTRAINT ORDERS_PK PRIMARY KEY,
>     amount numeric(31,2),
>     time date,
>     inv_num varchar(50),
>     line_num varchar(50),
>     phone varchar(50),
>     prod_num varchar(50));
>
> --Load a large amount of data (720,000 records) into the 'orders' table
>
> --Create an index on the time column as that will be used in the 'where'
> --clause.
>
> create index IX_ORDERS_TIME on orders(time);
>
> --When I run a query against this table returning top 1,000 records,
> --this query returns very quickly, consistently less than .010 seconds.
>
> select * from orders
> where time > '10/01/2002' and time < '11/30/2002'
> order by time;
>
> --Now run a similar query against the same table, returning the top
> --1,000 records.
> --The difference is that the results are now sorted by the primary key
> --('order_id') rather than 'time'.
> --This query returns slowly, approximately 15 seconds. Why??
>
> select * from orders
> where time > '10/01/2002' and time < '11/30/2002'
> order by order_id;
>
> --Now run a third query against the same 'orders' table, removing the
> --where clause.
> --This query returns quickly, around .010 seconds.
>
> select * from orders
> order by order_id;

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Rick Hillegas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557353#action_12557353 ] 

Rick Hillegas commented on DERBY-581:
-------------------------------------

Hi Jaco,

It looks as though Bryan and Thomas are working on this issue through the subtasks listed above: DERBY-2965, DERBY-2998, DERBY-3002, and DERBY-3050. It looks as though the first subtask is finished. I believe that Bryan and Thomas intend to finish the other subtasks for the 10.4 release. See http://wiki.apache.org/db-derby/DerbyTenFourRelease

Hope this helps,
-Rick

> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
>                 Key: DERBY-581
>                 URL: https://issues.apache.org/jira/browse/DERBY-581
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>         Environment: All
>            Reporter: Craig Russell
>            Assignee: Bryan Pendleton
>            Priority: Minor

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513910 ] 

Bryan Pendleton commented on DERBY-581:
---------------------------------------

I've started a Wiki page to collect information: http://wiki.apache.org/db-derby/OLAPOperations



[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645816#action_12645816 ] 

A B commented on DERBY-581:
---------------------------

> Is there actually a way to order and limit at the same time? There does not seem to be.
>> I'm afraid there's no way to do that currently.

Well, there is *one* way to do this, at least for simple queries, though it's far from intuitive.  I don't know if this is really usable for your environment, but note the following quote from a comment on DERBY-2998:

"[The rows returned from the subquery have no guaranteed ordering (Derby doesn't allow ORDER BY in subqueries), and thus any predicate which restricts based on row_number() will restrict the rows based on an *undefined* order. Since the order of the rows from the subquery may depend on the presence of indexes, the set of rows which survives a row_number()-based restriction may depend on the indexes, as well. In the end I do _not_ think this is a bug--but it does strike me as a probable point of confusion for users. It seems that anyone who wants "the first x rows only" has to either accept the fact that "first" does not imply "ordered" (and thus results can vary depending on what conglomerate the optimizer chooses), or else s/he has to use optimizer overrides to force the optimizer to use an index which is ordered on the desired columns."

So given that, I think you could try something like:

create table testing (i int, title varchar(40), author varchar(20));
insert into testing values (5, 'title_5', 'author_5');
insert into testing values (4, 'title_4', 'author_4');
insert into testing values (1, 'title_1', 'inigo');
insert into testing values (3, 'title_3', 'author_3');
insert into testing values (2, 'title_2', 'montoya');

-- Your original query; subquery results aren't ordered so you don't know
-- which rows you'll actually get back.

SELECT * FROM (
  SELECT ROW_NUMBER() OVER () as rownum, testing.*
    FROM testing
    WHERE title is not null
  ) AS tmp
  WHERE rownum >= 2 and rownum < 4;

ROWNUM              |I          |TITLE                         |AUTHOR
-----------------------------------------------------------------------------
2                   |4          |title_4                       |author_4
3                   |1          |title_1                       |inigo

-- Create an index whose ordering matches the ordering you want for your subquery.
create index ix_title_asc on testing(title asc);

-- Now use optimizer overrides to force the index for the subquery.  That will force
-- the subquery results to come back in index order, which then means your limit
-- using row_number gives you predictable results.

SELECT * FROM (
  SELECT ROW_NUMBER() OVER () as rownum, testing.*
    FROM testing --DERBY-PROPERTIES index=ix_title_asc
    WHERE title is not null
  ) AS tmp
  WHERE rownum >= 2 and rownum < 4;

ROWNUM              |I          |TITLE                         |AUTHOR
-----------------------------------------------------------------------------
2                   |2          |title_2                       |montoya
3                   |3          |title_3                       |author_3

As I said, it's not at all intuitive and it's probably not a viable option for complicated queries.  But I thought I'd post it just in case it proves useful...
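
The effect described above can be mimicked outside the database. The sketch below is plain Python, not Derby: it numbers rows in whatever order a scan delivers them and then keeps 2 <= rownum < 4, as the queries do. It assumes, for illustration only, that the unindexed scan returns rows in insertion order (Derby does not guarantee this) and that the forced index scan returns them in title order; the function name `row_number_window` is hypothetical.

```python
# Rows in the order they were inserted in the example above: (i, title, author).
rows = [
    (5, "title_5", "author_5"),
    (4, "title_4", "author_4"),
    (1, "title_1", "inigo"),
    (3, "title_3", "author_3"),
    (2, "title_2", "montoya"),
]

def row_number_window(scan, low, high):
    """Number non-null-title rows in scan order; keep low <= rownum < high."""
    kept = (r for r in scan if r[1] is not None)
    return [(n, *r) for n, r in enumerate(kept, start=1) if low <= n < high]

# Assumed scan orders: insertion order without the index, title order with it.
heap_scan = rows
index_scan = sorted(rows, key=lambda r: r[1])

print(row_number_window(heap_scan, 2, 4))
# [(2, 4, 'title_4', 'author_4'), (3, 1, 'title_1', 'inigo')]
print(row_number_window(index_scan, 2, 4))
# [(2, 2, 'title_2', 'montoya'), (3, 3, 'title_3', 'author_3')]
```

Under those assumptions the two calls reproduce the two result sets shown above: the same row-number predicate selects different rows purely because the scan order changed.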


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Aaron Zeckoski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644977#action_12644977 ] 

Aaron Zeckoski commented on DERBY-581:
--------------------------------------

The example shown above does not seem to actually work on Derby 10.4.2:

This test query works fine (2 results out of 5):
SELECT * FROM testing WHERE title is not null order by title

SELECT * FROM ( 
SELECT ROW_NUMBER() OVER (ORDER BY title asc) as rownum, columns FROM testing WHERE title is not null
) AS tmp WHERE rownum >= 2 and rownum < 4

Syntax error: Encountered "ORDER" at line 2, column 27.

Removing the ORDER results in working SQL again:
SELECT * FROM ( 
SELECT ROW_NUMBER() OVER () as rownum, testing.* FROM testing WHERE title is not null
) AS tmp WHERE rownum >= 2 and rownum < 4;

However, attempting to make this work with an order by is hopeless:
SELECT * FROM ( 
SELECT ROW_NUMBER() OVER () as rownum, testing.* FROM testing WHERE title is not null ORDER BY title
) AS tmp WHERE rownum >= 2 and rownum < 4

Syntax error: Encountered "ORDER" at line 2, column 87.

Is there actually a way to order and limit at the same time? There does not seem to be.


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Jaco Bester (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555289#action_12555289 ] 

Jaco Bester commented on DERBY-581:
-----------------------------------

So, who is in control of this ISSUE!!! Can someone please push this issue to be resolved, PLEASE!!! Great pain to build software solutions for structural weaknesses, which leads to bad software design!!!  This issue is now more than 2 years old, and fundamentally one of the most important! Just tell me who I should sleep with  ;-)

I would appreciate some ACTION!!!

Thanx

> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
>                 Key: DERBY-581
>                 URL: https://issues.apache.org/jira/browse/DERBY-581
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>         Environment: All
>            Reporter: Craig Russell
>            Assignee: Bryan Pendleton
>            Priority: Minor
>
> I agree that the information should be expressed in SQL so that the query optimized and execution strategy can know what the user needs in terms of cardinality.
> I'd also like to ask that when we consider extending the SQL in this manner we consider skipping the first N rows and returning the next M rows.
> Craig
> On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:
> Another little detail about optimization is that Statement.setMaxRows() kind of functions on the JDBC side may not be sufficient since it is called after SQL statement is prepared and returned as an object (after query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch first 1000 rows only, so that when the query is prepared, this intention can be taken into account.
> Regards,
> Ali
> Mike Matrigali <mi...@sbcglobal.net> wrote:
> As craig points out it is important in performance testing to say
> exactly what you are measuring. In general Derby will try to
> stream rows to the user before it has finished looking at all rows.
> So often looking at the first row will and stopping will mean that
> many rows have not been processed. BUT when an order by is involved
> and the query plan either has no appropriate matching index, or decides
> to use a different index then all the rows are processed, then they are
> sent to the sorter and finally after all rows are processed they are
> streamed to the client.
> So as you have seen reading the first 1000 rows of a much larger data
> set can happen very quickly.
> As subsequent mail threads have pointed out, returning the top 1000
> sorted rows is an interesting problem which could be costed and executed
> differently if that information was pushed into the optimizer and the
> sorter (and medium level projects were done in those areas).
> > On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
> > 
> > 
> > 
> > I have observed some interesting query performance behavior and am
> > hoping someone here can explain. 
> > 
> > In my scenario, it appears that an existing index is not being used for
> > the 'order by' part of the operation and as a result the perfo rmance of
> > certain queries is suffering. Can someone explain if this is supposed
> > to be what is happening and why? Please see below for the specific
> > queries and their performance characteristics. 
> > 
> > Here are the particulars:
> > 
> > ---------------------------------
> > 
> > create table orders(
> > 
> > order_id varchar(50) NOT NULL
> > 
> > CONSTRAINT ORDERS_PK PRIMARY KEY,
> > 
> > amount numeric(31,2),
> > 
> > time date,
> > 
> > inv_num varchar(50),
> > 
> > line_num varchar(50),
> > 
> > phone varchar(50),
> > 
> > prod_num varchar(50));
> > > --Load a large amount of data (720,000 records) into the 'orders' table
> > > 
> > --Create an index on the time column as that will be used in the 'where'
> > clause.
> > 
> > create index IX_ORDERS_TIME on orders(time);
> > > 
> > --When I run a query against this table returning top 1,000 records,
> > this query returns very quickly, consistently less than .010 seconds.
> >> 
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by time;
> >>
> >> --Now run a similar query against the same table, returning the top
> >> 1,000 records.
> >>
> >> --The difference is that the results are now sorted by the primary key
> >> ('order_id') rather than 'time'. 
> >>
> >> --This query returns slowly, approximately 15 seconds. Why??
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by order_id;
> >>
> >> --Now run a third query against the same 'orders' table, removing the
> >> where clause
> >>
> >> --This query returns quickly, around .010 seconds. 
> >>
> >> 
> >>
> >> select * from orders
> >>
> >> order by order_id;
> >>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton reassigned DERBY-581:
-------------------------------------

    Assignee: Bryan Pendleton


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514248 ] 

Bryan Pendleton commented on DERBY-581:
---------------------------------------

I'm considering filing 3 new sub-task JIRA entries, to track the first (of what may be many) sub-projects to investigate specific parts of the OLAP operations features:
1) null ordering (NULLS FIRST / NULLS LAST in the ORDER BY clause)
2) GROUPING function (returns 1 if this row is one whose values are the results of aggregation over the specified column reference(s) during a grouped query, and 0 otherwise)
3) ROW_NUMBER function

I picked these three as they seemed like incremental, bite-sized features which can be implemented independently and will help us start to get our feet wet with the OLAP operations features.



[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Christian d'Heureuse (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514337 ] 

Christian d'Heureuse commented on DERBY-581:
--------------------------------------------

Bryan, there is already a JIRA entry for NULLS FIRST/LAST: DERBY-2887


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696080#action_12696080 ] 

Knut Anders Hatlen commented on DERBY-581:
------------------------------------------

Since all sub-tasks of this issue have been marked as resolved, and since DERBY-4079 introduced OFFSET/FETCH which provides the functionality requested here, I suggest that we mark this issue as resolved. Any objections?
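
For reference, a minimal sketch of the skip-N/return-M form using the OFFSET/FETCH clauses mentioned above, written against the 'orders' table from the quoted report. The row counts (10 and 20) are arbitrary illustration values, and the exact clause support should be checked against the reference manual for the Derby release in use.

```sql
-- Skip the first 10 rows of the sorted result and return the next 20.
-- The values 10 and 20 are illustration values, not from the original report.
SELECT *
FROM orders
ORDER BY order_id
OFFSET 10 ROWS
FETCH NEXT 20 ROWS ONLY;
```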


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645773#action_12645773 ] 

Knut Anders Hatlen commented on DERBY-581:
------------------------------------------

> Is there actually a way to order and limit at the same time? There does not seem to be.

I'm afraid there's no way to do that currently. The implementation of ROW_NUMBER in 10.4 only supports an empty window clause, so you cannot specify OVER (ORDER BY title ASC) yet. Also, ORDER BY in nested SELECT statements is not allowed by the SQL standard, so no help there.
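
As a rough illustration of what the 10.4 implementation does allow, the sketch below limits rows with ROW_NUMBER() and an empty window clause, using the 'orders' table from the quoted report. The aliases rn and numbered and the bounds are made up for the example. Because the OVER () clause is empty, the numbering is not tied to any sort order, which is the limitation described above.

```sql
-- Returns 20 rows (those numbered 11 through 30), but in no guaranteed order,
-- because the window clause is empty.
-- "rn" and "numbered" are hypothetical aliases chosen for this sketch.
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER () AS rn, orders.*
    FROM orders
) AS numbered
WHERE rn > 10 AND rn <= 30;
```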


[jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Rick Hillegas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696097#action_12696097 ] 

Rick Hillegas commented on DERBY-581:
-------------------------------------

Hi Knut,

I think there's more work which could be done to improve our FETCH/OFFSET support (such as adding ORDER BY to subselects), but I agree that this particular issue can be closed. Thanks.

> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
>                 Key: DERBY-581
>                 URL: https://issues.apache.org/jira/browse/DERBY-581
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>         Environment: All
>            Reporter: Craig Russell
>            Assignee: Bryan Pendleton
>            Priority: Minor
>
> I agree that the information should be expressed in SQL so that the query optimized and execution strategy can know what the user needs in terms of cardinality.
> I'd also like to ask that when we consider extending the SQL in this manner we consider skipping the first N rows and returning the next M rows.
> Craig
> On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:
> Another little detail about optimization is that Statement.setMaxRows() kind of functions on the JDBC side may not be sufficient since it is called after SQL statement is prepared and returned as an object (after query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch first 1000 rows only, so that when the query is prepared, this intention can be taken into account.
> Regards,
> Ali
> Mike Matrigali <mi...@sbcglobal.net> wrote:
> As craig points out it is important in performance testing to say
> exactly what you are measuring. In general Derby will try to
> stream rows to the user before it has finished looking at all rows.
> So often looking at the first row will and stopping will mean that
> many rows have not been processed. BUT when an order by is involved
> and the query plan either has no appropriate matching index, or decides
> to use a different index then all the rows are processed, then they are
> sent to the sorter and finally after all rows are processed they are
> streamed to the client.
> So as you have seen reading the first 1000 rows of a much larger data
> set can happen very quickly.
> As subsequent mail threads have pointed out, returning the top 1000
> sorted rows is an interesting problem which could be costed and executed
> differently if that information was pushed into the optimizer and the
> sorter (and medium level projects were done in those areas).
> > On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
> > 
> > 
> > 
> > I have observed some interesting query performance behavior and am
> > hoping someone here can explain. 
> > 
> > In my scenario, it appears that an existing index is not being used for
> > the 'order by' part of the operation and as a result the perfo rmance of
> > certain queries is suffering. Can someone explain if this is supposed
> > to be what is happening and why? Please see below for the specific
> > queries and their performance characteristics. 
> > 
> > Here are the particulars:
> > 
> > ---------------------------------
> > 
> > create table orders(
> > 
> > order_id varchar(50) NOT NULL
> > 
> > CONSTRAINT ORDERS_PK PRIMARY KEY,
> > 
> > amount numeric(31,2),
> > 
> > time date,
> > 
> > inv_num varchar(50),
> > 
> > line_num varchar(50),
> > 
> > phone varchar(50),
> > 
> > prod_num varchar(50));
> > > --Load a large amount of data (720,000 records) into the 'orders' table
> > > 
> > --Create an index on the time column as that will be used i n the 'where'
> > clause.
> > 
> > create index IX_ORDERS_TIME on orders(time);
> > > 
> > --When I run a query against this table returning the top 1,000 records,
> > --this query returns very quickly, consistently less than .010 seconds.
> > 
> > select * from orders
> > where time > '10/01/2002' and time < '11/30/2002'
> > order by time;
> > 
> > --Now run a similar query against the same table, returning the top
> > --1,000 records.
> > --The difference is that the results are now sorted by the primary key
> > --('order_id') rather than 'time'.
> > --This query returns slowly, approximately 15 seconds. Why??
> > 
> > select * from orders
> > where time > '10/01/2002' and time < '11/30/2002'
> > order by order_id;
> > 
> > --Now run a third query against the same 'orders' table, removing the
> > --where clause.
> > --This query returns quickly, around .010 seconds.
> > 
> > select * from orders
> > order by order_id;
> > 


[jira] Resolved: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen resolved DERBY-581.
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 10.5.0.0

Resolving the issue with fix-version 10.5.0.0.
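
A minimal sketch of what "skip N rows and return the next M rows" can look like in SQL, assuming the SQL:2008 OFFSET and FETCH clauses documented for Derby 10.5 and the 'orders' table quoted earlier in this thread. The values 1000 and 100 are arbitrary illustrations, and this message does not itself say which syntax was adopted for the issue.

```sql
-- Assumption: Derby 10.5 or later, using the SQL:2008 result offset
-- and fetch first clauses.
-- Skip the first 1000 rows (N) of the sorted result and return the
-- next 100 rows (M).
SELECT *
FROM orders
ORDER BY order_id
OFFSET 1000 ROWS
FETCH NEXT 100 ROWS ONLY;
```

Because the row limits are part of the statement text, they are known when the statement is prepared, unlike Statement.setMaxRows(), which is called after the query plan is built.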

> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
>                 Key: DERBY-581
>                 URL: https://issues.apache.org/jira/browse/DERBY-581
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>         Environment: All
>            Reporter: Craig Russell
>            Assignee: Bryan Pendleton
>            Priority: Minor
>             Fix For: 10.5.0.0
>
>


[jira] Updated: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows

Posted by "Andrew McIntyre (JIRA)" <de...@db.apache.org>.

     [ http://issues.apache.org/jira/browse/DERBY-581?page=all ]

Andrew McIntyre updated DERBY-581:
----------------------------------

    Component: SQL
                   (was: Unknown)

> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
>          Key: DERBY-581
>          URL: http://issues.apache.org/jira/browse/DERBY-581
>      Project: Derby
>         Type: New Feature
>   Components: SQL
>  Environment: All
>     Reporter: Craig Russell
>     Priority: Minor

>
