You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "Tars Joris (JIRA)" <ji...@apache.org> on 2008/10/29 09:58:44 UTC

[jira] Created: (DERBY-3926) Incorrect ORDER BY caused by index

Incorrect ORDER BY caused by index
----------------------------------

                 Key: DERBY-3926
                 URL: https://issues.apache.org/jira/browse/DERBY-3926
             Project: Derby
          Issue Type: Bug
          Components: SQL
    Affects Versions: 10.4.2.0
            Reporter: Tars Joris
         Attachments: derby-reproduce.zip

I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.

I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.

The output of the attached script is as follows (results should be ordered on the middle column):
ID                  |VALUE        |VALUE
----------------------------------------------
2147483653          |000002       |21857
2147483654          |000003       |21857
4294967297          |000001       |21857

While I would expect:
ID                  |VALUE        |VALUE
----------------------------------------------
4294967297          |000001       |21857
2147483653          |000002       |21857
2147483654          |000003       |21857

This is the definition:
CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
CREATE INDEX key1 ON table1(id);
CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
CREATE UNIQUE INDEX key2 ON table2(id, name);
CREATE INDEX key3 ON table2(value);

This is the query:
SELECT table1.id, m0.value, m1.value
FROM table1, table2 m0, table2 m1
WHERE table1.id=m0.id
AND m0.name='PageSequenceId'
AND table1.id=m1.id
AND m1.name='PostComponentId'
AND m1.value='21857'
ORDER BY m0.value;

The bug can be reproduced by just executing the attached script with the ij-tool.
Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.

See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.

Michael Segel pointed out:
"It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705726#action_12705726 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Mike, thanks for your time on this jira. Yes, your understanding of my description is correct. I think the key here is that the outer table(table3) is returning more than one row and each one of those row is requiring us to look at the middle table (table2) which results into 3 scans on table2. So even though, table2.value has index on it, it is not helping in this case because of 3 different scans on table2. If it were just one scan on table2, then the index table2.value would have returned the rows to us in proper order. The 3 different scans would require a sorting on them to return the rows in sorted order.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Tars Joris (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701827#action_12701827 ] 

Tars Joris commented on DERBY-3926:
-----------------------------------

I will find out if the test code can be granted to ASF.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699721#action_12699721 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

Thanks Army for taking the time to look at this, and for pointing us to a good theory!

I like Army's suggestion. I wish I had a clearer grasp on the concept of a 'partial join order'. I think that
this is an intermediate stage during optimization, where the optimizer has so far chosen an ordering
for some, but not yet all, of the tables. It looks like the code is trying to handle the problem of how
to make a sort avoidance check at this intermediate point in optimization.

I think that all *chosen* query plans eventually reach a *complete* join order, but a query plan which
is *discarded* due to being too expensive may never get beyond a *partial* join order. Is that true?

It seems to me that there may be two topics here:

1) it would be valid to make such a check for the purposes of doing some preliminary costing of 
the join order up to this point, but for such an algorithm, the code would have to then re-visit the 
sort avoidance decision later in the optimization, once the partial join order had become a full join order.
That is, I wonder if the overall flow is something like:
 - optimizer investigates a partial join order, decides to cost it, determines that (so far) a sort is not required.
 - optimizer later completes this join order, decides that it is acceptable, but does NOT re-analyze
   whether a sort is now required for the complete join order
 - optimizer then chooses the correct full join order, but incorrectly avoids the sort due to the decision
   it made when considering the partial join order.

2) The comment ("ORDER BY S.A, T.B, S.C") raises the interesting question of the situation in
which each column, considered individually, is ordered properly, but because the ORDER BY
clause interleaves columns from different tables, a sort is still required. That is, if S had an index
on (A, C), and T had an index on B, we might look at ORDER BY S.A, T.B, S.C and think that no
sorting of the results was required, because the join from S -> T would emit the rows in the
correct order, but that is wrong; the interleaving of the columns means that the sort must still be performed.
And, presumably, there is the interesting situation where we need to cost out a query at a point
where we have determined a partial join order that contains a position for S, but not T, or vice versa.

It would be interesting to know more about the Wisconsin test cases, about the queries involved, and
about the before- and after- query plan differences, with respect to sorting. Are *all* the changes due
to situations where we formerly avoided a sort, but now we choose one? That's interesting, I think; we
want to be careful to avoid introducing un-necessary sorts because that could be a substantial
performance regression (of course, if the old query was returning the rows in the wrong order, but
the test didn't check, and the new query is now being performed correctly, that's important to know, too!)

It would also be interesting to see if we can construct an example along the lines of the
ORDER BY S.A, T.B, S.C case, such that various other permutations (ORDER BY S.A, S.C, T.B or
ORDER BY T.B, S.A, S.C) did not require sorting, but ORDER BY S.A, T.B, S.C did, and see how
the query plans emitted for these various cases behaved.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705646#action_12705646 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimize phase through the debugger and it appears to me (I may be wrong and would appreciate others looking at my detail analysis of the optimize phase below) that the problem may be with the generate phase or the execute phase where we may be not using the non-unique index on table2 correctly to fetch the orders in row.

The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

For the query above, in addition to the predicates supplied by the user, optimizer internally generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id 
2)table1.id=table3.id 
3)table3.id=table2.id
4)table2.name='PageSequenceId' 
5)table3.name='PostComponentId' 
6)table3.value='21857' 

Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding optimizables ie 4) will be associated with table2 and 5),6) will be associated with table3. This is because these predicates are constant comparison with columns. This leaves us with 3 predicates, namely 1), 2), 3)
which are multitable join predicates.

Optimizer has a class called RowOrdering associated with it (OptimizerImpl.currentRowOrdering). 
currentRowOrdering has following fields in it.
currentRowOrdering	RowOrderingImpl  
	alwaysOrderedOptimizables	Vector<E>
	columnsAlwaysOrdered	ColumnOrdering
	currentColumnOrdering	null	
	ordering	Vector<E>  
	unorderedOptimizables	Vector<E>  
All the predicates that are constant comparison will go into columnsAlwaysOrdered. These pushing of constant comparison predicates happen per optimizable basis when that particular optimizable is being consdiered in the possible join order combination.

In our specific query, through optimizer overrides, we have instructed optimizer to only consider join order [table3, table2, table1]. The optimizer starts with [table3, -1, -1]. First thing it does is it goes through the join predicates (which are 1), 2) and 3) in the predicate list above). But since
all the referenced tables for any of the 3 predicates are not covered by the current join order of [table3, -1, -1], nothing gets done to those join predicates. Next, the optimizer will tell 
currentRowOrdering to (this happens in FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering, OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above list into it's columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows 
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. So far, the logic for currentRowOrdering seems to be working fine. Next, we have asked the optimizer to use the index index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6) since that predicate is on the same column on which the index is created but it does not cover the other columns from table3 that are being referenced in this query (which table3.id and table3.name). Because of this, we determine that the index being considered is not a covering index. The code to determine whether the sorting can be avoided for [table3, -1, -1], is in OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on table2.value, the order by column's table does not match with table3 and hecne we determine that sorting is not required based on what optimizer has seen so far. So it appears that we leave it to table2 when its turn comes in the join order to decide whether sorting should indeed be avoided or not.

Next we consider the join order [table3, table2, -1]. For table2, we have asked the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is go through the join predicates 1), 2) and 3). Predicate number 3) which is TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the current join order [table3, table2, -1] includes the tables referenced by predicate 3). So, at this point, there are 2 predicates pushed down to table3, they are number 5) and 6). And for table2, there are 2 prdicates pushed down to it, they are number 3) and 4). Also, since predicate 4) is a constant comparison, it will get added to currentRowOrdering. At this point, currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. In addition, Table at position 1(which is Table2 in our join order) has column 2 which is always ordered because it is being compared with constant. Next, we have asked the optimizer to use the index nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate 4) since that predicate is on column name and not value. Notice, this is a different code path we are following for table2 compared to table3 above. Because table3.value is not already an ordered column in currentRowOrdering because there is no
constant comparison predicate on it, we add it to the "ordering " vector in currentOrdering object. This is the first object that gets added to the currentRowOrdering."ordering" vector in our eg. So, at this point, the currentRowOrdering has only 3 of it's fields propulated and they are as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it does not cover all the column from table2 that are being referenced in this query and hence it is not a covering index. Next, the code to determine whether sort can be avoided for join order [table3, table2, -1], we go through the code path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the order by column's table matches with table2 in join order. Because of this match, we need to look at currentRowOrdering to see if it will take care of the sorting and if so we can avoid the sort. To look into currentRowOrdering, we first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call, cr is the order by column). So, we are checking if table2 is always ordered in currentRowOrdering. Since table2 is not always ordered in this query, this check returns false. Next, we check if not the entire table, is the order by table.order by column combination always ordered in currentRowOrdering. In our query, that will be table2.value Since there is no constant comparison predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does not include Table 1, Column 3. So, we have not found table2 to be always ordered and we have not found table2.value to be always ordered either. The last place to check is the ordering vector in columnsAlwaysOrdered. This vector does include Table 1, Column 3 which is table2.value and hence we determine that sorting is not needed to table2. All this code of checking the columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet). Assuming that this code is working as intended, I think then the culprit might be when we generate the code. The only step in optimize left is to add table1 to the join order. So, at the end of the optimize phase, the join order will look as follows [table2, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 2, Column 1
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]

The only change to currentRowOrdering that is caused by adding of table1 in third join order position is that we are going to use primary key on table1 and hence we need to reflect that in currentRowOrdering by adding it to the ordering vector.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699763#action_12699763 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

> due to the presence of "W" in the join order, I think the results would still end up out of order.

I'm not seeing that, seems like the data from W is irrelevant to the final ordering, but I think it's 
good to be cautious and verify all of this with some actual tables and some actual queries.

Mamta, you might want to mark this issue as assigned to you...?

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699556#action_12699556 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I talked to Army offline on this jira entry and following was his feedback on the issue.

He tends to agree that the outermost table m1 should not avoid sorting when the ordering required is on a column that is from an optimizable that is not in the join order yet. The join order is 0,-1,-1 when m1 decides that we can choose a plan which will avoid sorting. The code that makes that decision is in OrderByList.sortRequired(...), lines 496 - 533 Apparently, this code has been there since Cloudscape 2.0 days. 

**************Start of Army's analysis of the issue*******************
The comment preceding the code segment mentioned above says:

            /*
            ** Check whether the table referred to is in the table map (if any).
            ** If it isn't, we may have an ordering that does not require
            ** sorting for the tables in a partial join order.  Look for
            ** columns beyond this column to see whether a referenced table
            ** is found - if so, sorting is required (for example, in a
            ** case like ORDER BY S.A, T.B, S.C, sorting is required).
            */

The 2nd and 3rd line in comment confuses me.  The only time I can think of where we could assume "sort avoidance" was okay when a table number is missing would be if the ORDER BY expression did not refer to any tables, ex. if it was a constant:

    ORDER BY 'some literal', m0.value

But from what I can tell, we catch that during preprocessing and remove the constant, so the above code still wouldn't be useful. (Note: the above ORDER BY will actually cause the query to return the correct results because the presence of a non-column expression (esp. the literal) causes the optimizer to ALWAYS perform a sort--that's a workaround to the problem if the user really needs one...).

If the code mentioned above was removed, then I think the optimizer would require sorting for the first optimizable (m1), and that would in turn mean that we have to sort for the entire join order--which should return the correct results.  Of course, the thought of just removing code that's been in there for years is a bit scary...It would be nice to understand what the intended use case was, but the comments are not clear about that at all.
**************End of Army's analysis of the issue*******************

Based on the above feedback, I commented out the code from 496-533 lines in OrderByList and ran the junit tests and they all ran fine. The old harness had one test case failing (lang/wisconsin) and it failed because the query plans now include sorting when the original plans (without my code changes) did not include sorting.  It appears that wisconsin test does not check the results of the cursors which are getting prepared. It just opens few cursors in order to get their query plans and dumps the query plans without checking the results of those queries. So I am not sure if the results from those queries have changed because of the additional sorting which is being added into their query plans.

Ofcourse, the problem query shown below works fine with the code commenting suggested by Army 
SELECT table1.id, m0.value, m1.value FROM  --DERBY-PROPERTIES joinOrder=FIXED 
table2  m1 -- DERBY-PROPERTIES index=key3
,  table2 m0 -- DERBY-PROPERTIES index=key3
, table1 
WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;

I would like to know what the community thinks of the code removal suggested by Army.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723320#action_12723320 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

BTW, Kathey, thanks for taking care of backporting the changes for this jira entry while my hard drive died.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703351#action_12703351 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I have changed the test to use the test script submitted by Tars this morning. Thanks for the updated script, Tars Joris.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Myrna van Lunteren (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718242#action_12718242 ] 

Myrna van Lunteren commented on DERBY-3926:
-------------------------------------------

Looks like this caused 1 javadoc warning:
C:\nightlies\main\src\opensource\java\engine\org\apache\derby\iapi\sql\compile\OptimizablePredicateList.java:136: warning - @return tag has no arguments.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


Do you know how the current code determines that sort can be avoided in the one remaining 
case?  Does it do something special with the equality condition?

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689790#action_12689790 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

Including the alias name in the query plan output seems like a great idea. Thanks for tracking that down!


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Closed: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali closed DERBY-3926.
---------------------------------

    Resolution: Fixed

regressions introduced by this fix have been resolved by checkins tracked under DERBY-4311

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.1.3.3, 10.2.2.1, 10.3.3.1, 10.4.2.1, 10.6.0.0, 10.5.2.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716735#action_12716735 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

To elaborate more on the upgrade test failures, I had the junit tests using the gui and that's when I saw upgrade test failures.
java -Xmx1024m -cp '../../tools/java/junit.jar;../../classes' junit.swingui.TestRunner -noloading org.apache.derbyTesting.functionTests.suites.All

BTW, the platform for the test runs is Windows XP v2.10 and IBM 16 jdk.

I ran the upgrade tests by themselves using the gui TestRunner and didn't see the failures. I am now running the entire junit test suit using text runner to see if the failures I saw earlier reproduce.
java -Xmx256M -XX:MaxPermSize=128M -cp '../../tools/java/junit.jar;../../classes' junit.textui.TestRunner org.apache.derbyTesting.functionTests.suites.All

I will appreciate if someone has time to apply my patch and run the entire junit suite on their machine to see if the upgrade tests fail there. Thank you.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717311#action_12717311 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Thanks, Mike, for ruuning the tests. I will plan on committing the patch tomorrow. 

One piece of possible improvement that can be made to the patch in future is to hide all the information about the column ordering in the RowOrdering object. Currently, we check if the current Optimizable is a one-row resultset or not and if not, then the check to make sure that all the previous optimizables should be one-row resultset happens in OrderByList.sortRequired(this check is done if we have found earlier that there is no equijoin between the current optimizable's order by column with columns already ordered from the previous optimizables. This check is made much earlier in the optimization phase and the result of that check is encapsulated in RowOrdering object and so no code restructuring is needed for this equijoin part of the logic. The possible code improvement is for part of the logic where the current optimizable is multi-row, there is no equijoin on this current optimizable's order by columns so say that ordering already exists and the previous optimizables are not all one-row resultset. It may be possible for this piece of logic to be encapsulated somehow in RowOrdering object). I am not sure if this encapsulation is feasible or not but just wanted to note it in the jira.

This observation was also made by Mike and Army as comments to this jira entry,
********************************************************
A B added a comment - 20/May/09 10:17 AM 
mike> I thought it would be cleaner if all the information that was necessary 
mike> [...] was actually already located in the current RowOrdering object. 
mike> [...] The work here is that some more information may need to be tracked 
mike> in the RowOrdering. 

For what little it's worth, I agree, I think this is a nice idea that would in fact be a 
bit cleaner. I think the current patch is acceptable, as well, but if I had the luxury 
of choosing, I'd probably go with Mike's approach, if possible... 
********************************************

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

      Environment: 
Checked into 10.3. This will still go to 10.2 and 10.1

    Fix Version/s: 10.3.3.1

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I also agree that including the alias name would be great.  Make sure to include a sample of what the query plan looks like after your change in the other jira.

I also was guessing that the optimizer got confused by the two aliases on the same table.  It seems like you should find where in the optimizer we determine how the current query plan is sorted and we must compare that with how the query wants to be sorted to determine if the current plan is a valid "sort avoidance" plan.  My guess is that at this point we somehow pass in that the plan is sorted a table2.value but somewhere we are not recognizing that we need to 
check for m0.value.  but just a guess.  It could also be a bug where we are passing in the wrong alias to the check.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712480#action_12712480 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Thanks, Knut.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704227#action_12704227 ] 

Mamta A. Satoor edited comment on DERBY-3926 at 4/29/09 11:32 AM:
------------------------------------------------------------------

I am attaching a newer version of script3.sql The new script is called script3WithUserFriendlyIndexNames.sql The only changes made to script3.sql are to use user-friendly index names so that it is easier to understand the query which is using index names through optimizer overrides to demonstrate the buggy behavior. Hopefully it will be easier to read the query plan as well. The query to see the problem is as follows

SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

The query plan for the query above looks as follows
Statement Name:
        null
Statement Text:
        SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId'
AND table1.id=table3.id
AND table3.name='PostComponentId'
AND table3.value='21857' ORDER BY table2.value
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text:
Project-Restrict ResultSet (10):
Number of opens = 1
Rows seen = 3
Rows filtered = 0
restriction = false
projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:            0.20
        optimizer estimated cost:         1185.66

Source result set:
        User supplied optimizer overrides for join are { joinOrder=FIXED }
        Nested Loop Exists Join ResultSet:
        Number of opens = 1
        Rows seen from the left = 3
        Rows seen from the right = 3
        Rows filtered = 0
        Rows returned = 3
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                optimizer estimated row count:            0.20
                optimizer estimated cost:         1185.66

        Left result set:
                User supplied optimizer overrides for join are { joinOrder=FIXED }
                Nested Loop Join ResultSet:
                Number of opens = 1
                Rows seen from the left = 3
                Rows seen from the right = 3
                Rows filtered = 0
                Rows returned = 3
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        optimizer estimated row count:            0.20
                        optimizer estimated cost:         1185.35

                Left result set:
                        Project-Restrict ResultSet (5):
                        Number of opens = 1
                        Rows seen = 3
                        Rows filtered = 0
                        restriction = true
                        projection = true
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                restriction time (milliseconds) = 0
                                projection time (milliseconds) = 0
                                optimizer estimated row count:            0.20
                                optimizer estimated cost:            6.97

                        Source result set:
                                Index Row to Base Row ResultSet for TABLE3:
                                Number of opens = 1
                                Rows seen = 3
                                Columns accessed from heap = {0, 1, 2}
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        optimizer estimated row count:  0.20
                                        optimizer estimated cost:            6.97

                                        User supplied optimizer overrides on TABLE3 are { index=NONUNIQUEONVALUE_TABLE3 }
                                        Index Scan ResultSet for TABLE3 using index NONUNIQUEONVALUE_TABLE3 at read committed isolation level using share row locking chosen by the optimizer
                                        Number of opens = 1
                                        Rows seen = 3
                                        Rows filtered = 0
                                        Fetch Size = 1
                                                constructor time (milliseconds) = 0
                                                open time (milliseconds) = 0
                                                next time (milliseconds) = 0
                                                close time (milliseconds) = 0
                                                next time in milliseconds/row = 0

                                        scan information:
                                                Bit set of columns fetched=All
                                                Number of columns fetched=2
                                                Number of deleted rows visited=0

                                                Number of pages visited=2
                                                Number of rows qualified=3
                                                Number of rows visited=4
                                                Scan type=btree
                                                Tree height=-1
                                                start position:        >= on first 1 column(s).
        Ordered null semantics on the following columns:

                                                stop position:        > on first 1 column(s).
        Ordered null semantics on the following columns:

                                                qualifiers:None
                                                optimizer estimated row count:          0.20
                                                optimizer estimated cost:     6.97


                Right result set:
                        Project-Restrict ResultSet (8):
                        Number of opens = 3
                        Rows seen = 8688
                        Rows filtered = 8685
                        restriction = true
                        projection = true
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                restriction time (milliseconds) = 0
                                projection time (milliseconds) = 0
                                optimizer estimated row count:            0.20
                                optimizer estimated cost:         1178.38

                        Source result set:
                                Index Row to Base Row ResultSet for TABLE2:
                                Number of opens = 3
                                Rows seen = 8688
                                Columns accessed from heap = {0, 1, 2}
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        optimizer estimated row count:  0.20
                                        optimizer estimated cost:         1178.38

                                        User supplied optimizer overrides on TABLE2 are { index=NONUNIQUEONVALUE_TABLE2 }
                                        Index Scan ResultSet for TABLE2 using index NONUNIQUEONVALUE_TABLE2 at read committed isolation level using share row locking chosen by the optimizer
                                        Number of opens = 3
                                        Rows seen = 8688
                                        Rows filtered = 0
                                        Fetch Size = 1
                                                constructor time (milliseconds)= 0
                                                open time (milliseconds) = 0
                                                next time (milliseconds) = 0
                                                close time (milliseconds) = 0
                                                next time in milliseconds/row =0

                                        scan information:
                                                Bit set of columns fetched={1}
                                                Number of columns fetched=1
                                                Number of deleted rows visited=0

                                                Number of pages visited=12
                                                Number of rows qualified=8688
                                                Number of rows visited=8688
                                                Scan type=btree
                                                Tree height=2
                                                start position:        None
                                                stop position:        None
                                                qualifiers:None
                                                optimizer estimated row count:          0.20
                                                optimizer estimated cost:  1178.38



        Right result set:
                Index Scan ResultSet for TABLE1 using constraint SQL090429102526750 at read committed isolation level using share row locking chosen by the optimizer
                Number of opens = 3
                Rows seen = 3
                Rows filtered = 0
                Fetch Size = 1
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched={0}
                        Number of columns fetched=1
                        Number of deleted rows visited=0
                        Number of pages visited=3
                        Number of rows qualified=3
                        Number of rows visited=3
                        Scan type=btree
                        Tree height=1
                        start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:0
                        stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:0
                        qualifiers:
Column[0][0] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

                        optimizer estimated row count:            0.20
                        optimizer estimated cost:            0.31


      was (Author: mamtas):
    I am attaching a newer version of script3.sql The new script is called script3WithUserFriendlyIndexNames.sql The only changes made to script3.sql are to use user-friendly index names so that it is easier to understand the query which is using index names through optimizer overrides to demonstrate the buggy behavior. Hopefully it will be easier to read the query plan as well. The query to see the problem is as follows

SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

  
> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697757#action_12697757 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

For the incorrect case, the join order picked is 2,1,0 which m1, m0, table1 where as for the correct case, the join order picked is 2,0,1 which is m1,table1,m0. The other interesting thing in case of incorrect case is, for m0, for the plan used for m0, both the "optimizer estimated row count" and "optimizer estimated cost" are 0.00. Is it possible for the optimizer to be able to find a plan for a table such that the "optimizer estimated row count" and "optimizer estimated cost" can be 0? Maybe that is a valid situation, I don't know enough about the optimizer yet to answer that question but I though those statistics were worth bringing it up for discussion. 



> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702147#action_12702147 ] 

A B commented on DERBY-3926:
----------------------------

I haven't looked at the details of your previous post, but regarding:

> So think the current fix may not be valid as it is causing sort where they do not need to be.

I agree.  That's what I was hoping to convey in my comment on April 16th when I wrote:

> *if* that's a correct statement of how the code is *supposed* to work, then it is actually
> quite useful and it does make sense.

I.e. we shouldn't remove it.  As for:

> is it expected that the optimizer should recognize that a one row result set requires
> no ordering? 

I don't know the details of how it works, but there is definitely logic in the optimizer to try to recognize one row result sets and to make decisions based on that information.  The RowOrderingImpl.java class even includes a field called "alwaysOrderedOptimizables" for which the javadoc says:

    /*
    ** This vector contains table numbers for tables that are always ordered.
    ** This happens for one-row tables.
    */

Not sure how that plays into the above queries, but I thought I'd mention it...

> Another case I have seen is an order by of A.key, B.key being satisfied by join on
> A.key, B.key when it is known A.key is a single value

Yes, I think that's what the "columnsAlwaysOrdered" field in RowOrderingImpl seems to be for.  Any columns which are compared to constants (and probably parameter markers as well?) are considered "always ordered" and hence should not require sorting in and of themselves.  Ex. In the query for this issue the columnsAlwaysOrdered field includes columns m0.name, m1.name, and m1.value because all of those columns are compared to literals in the WHERE clause.  It seems quite possible that the optimizer will apply these rules transitively to detect other "always ordered" columns as well...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702107#action_12702107 ] 

A B commented on DERBY-3926:
----------------------------

Thanks Mamta!

Could you perhaps include the join order that was chosen for each of the above queries?  I'm not looking for the full query plan, just something short like the { TENKTUP1, TENKTUP2 } notation.  You should be able to get the join order by reading the query plan top to bottom; the order in which you see the table names should reflect the join order chosen by the optimizer.

Note that the order by clauses for queries #2 thru #7 all match the "ORDER BY S.A, T.B" shape that I mentioned in my first April 16th comment, so if the theory as to how that code is supposed to work was correct, that *might* explain why these queries fail to avoid the sort when the relevant code is commented out...?  Of course, more tracing/debugging of the individual queries would be necessary to know for sure...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711781#action_12711781 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimizer code for following query
connect 'jdbc:derby:wombat';
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 


During the optimize phase, for an optimizable, everytime we are considering a new access path for it, we check if the conglomerate being considered for the optimizable is an index conglomerate. If yes, then we check if the current ordering for the given join order already has the index columns in it. That check is done by the following call in FromBaseTable.nextAccessPath: 478	
if ( ! rowOrdering.orderedOnColumn(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING,
	getTableNumber(), baseColumnPositions[i]))
The RowOrderingImpl.orderedOnColumn:119 first checks if the optimizable we are dealing with is marked as always ordered(I think this happens if the optimizable is a one-row table). If yes, then the we will return true from RowOrderingImpl.orderedOnColumn. If not, we check if there was a predicate on the index column which makes it always ordered(line 127). If not,then we check if the column is already being ordered by checking ordering vector at line 133. If not, we go ahead and add the index column to the ordering vector in currentRowOrdering object.

Going to the query above when the optimizer is going through the join order [0, -1] which means it is considering TENKTUP1 at the outermost join order position and no optimizable has yet been considered in the next position yet. For TENKTUP1, we go through the above piece of code. The method RowOrderingImpl.orderedOnColumn is going to return false for the index column UNIQUE1(index TK1UNIQUE1 is on that column) because 
a)TENKTUP1 is not always ordered meaning it is not one-row table
b)there is no constant predicate on TENKTUP1.UNIQUE1
c)there is no other ordering on TENKTUP1.UNIQUE1

Since for our query, the index column UNIQUE1 is not already ordered yet in the currentRowOrdering object, we go ahead and add it to the ordering vector inside currentRowOrdering object (this is done in FromBaseTable.nextAccessPath at line: 484 with following 2 code lines.)
	rowOrdering.nextOrderPosition(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING);
	rowOrdering.addOrderedColumn(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING,
		getTableNumber(), baseColumnPositions[i]);
This adding of TENKTUP1.UNIQUE1 to ordering object will indicate that the rows are ordered on that column. So, at the end of considering access path for [0, -1] join order position, currentRowOrdering object will have TENKTUP1.UNIQUE1 in ordering vector and it will be marked as the current column ordering (this is done by having currentColumnOrdering = TENKTUP1.UNIQUE1). 

Next, we work on finding the cost of the given access path for [0, -1] join order. Once we find the cost, we check to see if it makes sense to avoid sort on it from what we know so far. This is done in OptimizerImpl.costBasedCostOptimizable through following piece of code
	if (considerSortAvoidance && requiredRowOrdering.sortRequired(currentRowOrdering,
		assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
What we are doing here is checking if the ordering requested by the user for the current table (that information is in requiredRowOrdering) is satisfied by row ordering provided by the join order being considered so far. The row ordering provided for the current join order is in currentRowOrdering. In our specific eg, since user has requested for ordering on TENKTUP1.UNIQUE1 and we are ordering on it because of the index that is being used, the above call to requiredRowOrdering.sortRequired is going to return sorting not required.

Next, we put the next optimizable in the join order, meaning moving from [0,-1] join order to [0, 1]. At this point, optimizer is considering TENKTUP2 in the 2nd join order position. We have asked the optimizer to use index TK2UNIQUE1 for optimizable TENKTUP2. We go through the same code path as above for this optimizable. We will find that the index column TENKTUP2.unique1 needs to be added to ordering vector in currentRowOrdering because TENKTUP2 is not one-row table and there is no constant predicate on TENKTUP2.unique1 and hence the current index being considered on TENKTUP2 is going to provide ordering on TENKTUP2.unique1. Next, we see if sort can be aovided for TENKTUP2 by comparing required row ordering against current row ordering. User has required ordering on TENKTUP2.unique1 and current row ordering satisfies that ordering because of the index which is being considered for TENKTUP2.

So, if my understanding of code is correct, the sorting is getting avoided NOT based on the fact that equality condition exists between the two optimizables ie TENKTUP1.unique1 = TENKTUP2.unique1 In other words, sorting avoidacne decision was not based on TENKTUP1.unique1 = TENKTUP2.unique1 It was because optimizer decides that the individual sorting required on the optimizables have been satisfied by indexes picked on them.

Hope this helps understand the current behavior of the trunk code for the query in question.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705646#action_12705646 ] 

Mamta A. Satoor edited comment on DERBY-3926 at 5/4/09 1:38 PM:
----------------------------------------------------------------

I went through the optimize phase through the debugger and it appears to me (I may be wrong and would appreciate others looking at my detail analysis of the optimize phase below) that the problem may be with the generate phase or the execute phase where we may be not using the non-unique index on table2 correctly to fetch the orders in row.

The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

For the query above, in addition to the predicates supplied by the user, optimizer internally generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id 
2)table1.id=table3.id 
3)table3.id=table2.id
4)table2.name='PageSequenceId' 
5)table3.name='PostComponentId' 
6)table3.value='21857' 

Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding optimizables ie 4) will be associated with table2 and 5),6) will be associated with table3. This is because these predicates are constant comparison with columns. This leaves us with 3 predicates, namely 1), 2), 3)
which are multitable join predicates.

Optimizer has a class called RowOrdering associated with it (OptimizerImpl.currentRowOrdering). 
currentRowOrdering has following fields in it.
currentRowOrdering	RowOrderingImpl  
	alwaysOrderedOptimizables	Vector<E>
	columnsAlwaysOrdered	ColumnOrdering
	currentColumnOrdering	null	
	ordering	Vector<E>  
	unorderedOptimizables	Vector<E>  
All the predicates that are constant comparison will go into columnsAlwaysOrdered. These pushing of constant comparison predicates happen per optimizable basis when that particular optimizable is being consdiered in the possible join order combination.

In our specific query, through optimizer overrides, we have instructed optimizer to only consider join order [table3, table2, table1]. The optimizer starts with [table3, -1, -1]. First thing it does is it goes through the join predicates (which are 1), 2) and 3) in the predicate list above). But since
all the referenced tables for any of the 3 predicates are not covered by the current join order of [table3, -1, -1], nothing gets done to those join predicates. Next, the optimizer will tell 
currentRowOrdering to (this happens in FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering, OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above list into it's columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows 
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. So far, the logic for currentRowOrdering seems to be working fine. Next, we have asked the optimizer to use the index index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6) since that predicate is on the same column on which the index is created but it does not cover the other columns from table3 that are being referenced in this query (which table3.id and table3.name). Because of this, we determine that the index being considered is not a covering index. The code to determine whether the sorting can be avoided for [table3, -1, -1], is in OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on table2.value, the order by column's table does not match with table3 and hecne we determine that sorting is not required based on what optimizer has seen so far. So it appears that we leave it to table2 when its turn comes in the join order to decide whether sorting should indeed be avoided or not.

Next we consider the join order [table3, table2, -1]. For table2, we have asked the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is go through the join predicates 1), 2) and 3). Predicate number 3) which is TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the current join order [table3, table2, -1] includes the tables referenced by predicate 3). So, at this point, there are 2 predicates pushed down to table3, they are number 5) and 6). And for table2, there are 2 prdicates pushed down to it, they are number 3) and 4). Also, since predicate 4) is a constant comparison, it will get added to currentRowOrdering. At this point, currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. In addition, Table at position 1(which is Table2 in our join order) has column 2 which is always ordered because it is being compared with constant. Next, we have asked the optimizer to use the index nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate 4) since that predicate is on column name and not value. Notice, this is a different code path we are following for table2 compared to table3 above. Because table3.value is not already an ordered column in currentRowOrdering because there is no
constant comparison predicate on it, we add it to the "ordering " vector in currentOrdering object. This is the first object that gets added to the currentRowOrdering."ordering" vector in our eg. So, at this point, the currentRowOrdering has only 3 of it's fields propulated and they are as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it does not cover all the column from table2 that are being referenced in this query and hence it is not a covering index. Next, the code to determine whether sort can be avoided for join order [table3, table2, -1], we go through the code path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the order by column's table matches with table2 in join order. Because of this match, we need to look at currentRowOrdering to see if it will take care of the sorting and if so we can avoid the sort. To look into currentRowOrdering, we first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call, cr is the order by column). So, we are checking if table2 is always ordered in currentRowOrdering. Since table2 is not always ordered in this query, this check returns false. Next, we check if not the entire table, is the order by table.order by column combination always ordered in currentRowOrdering. In our query, that will be table2.value Since there is no constant comparison predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does not include Table 1, Column 3. So, we have not found table2 to be always ordered and we have not found table2.value to be always ordered either. The last place to check is the ordering vector in columnsAlwaysOrdered. This vector does include Table 1, Column 3 which is table2.value and hence we determine that sorting is not needed to table2. All this code of checking the columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet). Assuming that this code is working as intended, I think then the culprit might be when we generate the code. The only step in optimize left is to add table1 to the join order. So, at the end of the optimize phase, the join order will look as follows [table3, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 2, Column 1
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]

The only change to currentRowOrdering that is caused by adding of table1 in third join order position is that we are going to use primary key on table1 and hence we need to reflect that in currentRowOrdering by adding it to the ordering vector.

      was (Author: mamtas):
    I went through the optimize phase through the debugger and it appears to me (I may be wrong and would appreciate others looking at my detail analysis of the optimize phase below) that the problem may be with the generate phase or the execute phase where we may be not using the non-unique index on table2 correctly to fetch the orders in row.

The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

For the query above, in addition to the predicates supplied by the user, optimizer internally generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id 
2)table1.id=table3.id 
3)table3.id=table2.id
4)table2.name='PageSequenceId' 
5)table3.name='PostComponentId' 
6)table3.value='21857' 

Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding optimizables ie 4) will be associated with table2 and 5),6) will be associated with table3. This is because these predicates are constant comparison with columns. This leaves us with 3 predicates, namely 1), 2), 3)
which are multitable join predicates.

Optimizer has a class called RowOrdering associated with it (OptimizerImpl.currentRowOrdering). 
currentRowOrdering has following fields in it.
currentRowOrdering	RowOrderingImpl  
	alwaysOrderedOptimizables	Vector<E>
	columnsAlwaysOrdered	ColumnOrdering
	currentColumnOrdering	null	
	ordering	Vector<E>  
	unorderedOptimizables	Vector<E>  
All the predicates that are constant comparison will go into columnsAlwaysOrdered. These pushing of constant comparison predicates happen per optimizable basis when that particular optimizable is being consdiered in the possible join order combination.

In our specific query, through optimizer overrides, we have instructed optimizer to only consider join order [table3, table2, table1]. The optimizer starts with [table3, -1, -1]. First thing it does is it goes through the join predicates (which are 1), 2) and 3) in the predicate list above). But since
all the referenced tables for any of the 3 predicates are not covered by the current join order of [table3, -1, -1], nothing gets done to those join predicates. Next, the optimizer will tell 
currentRowOrdering to (this happens in FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering, OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above list into it's columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows 
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. So far, the logic for currentRowOrdering seems to be working fine. Next, we have asked the optimizer to use the index index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6) since that predicate is on the same column on which the index is created but it does not cover the other columns from table3 that are being referenced in this query (which table3.id and table3.name). Because of this, we determine that the index being considered is not a covering index. The code to determine whether the sorting can be avoided for [table3, -1, -1], is in OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on table2.value, the order by column's table does not match with table3 and hecne we determine that sorting is not required based on what optimizer has seen so far. So it appears that we leave it to table2 when its turn comes in the join order to decide whether sorting should indeed be avoided or not.

Next we consider the join order [table3, table2, -1]. For table2, we have asked the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is go through the join predicates 1), 2) and 3). Predicate number 3) which is TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the current join order [table3, table2, -1] includes the tables referenced by predicate 3). So, at this point, there are 2 predicates pushed down to table3, they are number 5) and 6). And for table2, there are 2 prdicates pushed down to it, they are number 3) and 4). Also, since predicate 4) is a constant comparison, it will get added to currentRowOrdering. At this point, currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. In addition, Table at position 1(which is Table2 in our join order) has column 2 which is always ordered because it is being compared with constant. Next, we have asked the optimizer to use the index nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate 4) since that predicate is on column name and not value. Notice, this is a different code path we are following for table2 compared to table3 above. Because table3.value is not already an ordered column in currentRowOrdering because there is no
constant comparison predicate on it, we add it to the "ordering " vector in currentOrdering object. This is the first object that gets added to the currentRowOrdering."ordering" vector in our eg. So, at this point, the currentRowOrdering has only 3 of it's fields propulated and they are as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it does not cover all the column from table2 that are being referenced in this query and hence it is not a covering index. Next, the code to determine whether sort can be avoided for join order [table3, table2, -1], we go through the code path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the order by column's table matches with table2 in join order. Because of this match, we need to look at currentRowOrdering to see if it will take care of the sorting and if so we can avoid the sort. To look into currentRowOrdering, we first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call, cr is the order by column). So, we are checking if table2 is always ordered in currentRowOrdering. Since table2 is not always ordered in this query, this check returns false. Next, we check if not the entire table, is the order by table.order by column combination always ordered in currentRowOrdering. In our query, that will be table2.value Since there is no constant comparison predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does not include Table 1, Column 3. So, we have not found table2 to be always ordered and we have not found table2.value to be always ordered either. The last place to check is the ordering vector in columnsAlwaysOrdered. This vector does include Table 1, Column 3 which is table2.value and hence we determine that sorting is not needed to table2. All this code of checking the columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet). Assuming that this code is working as intended, I think then the culprit might be when we generate the code. The only step in optimize left is to add table1 to the join order. So, at the end of the optimize phase, the join order will look as follows [table2, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 2, Column 1
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]

The only change to currentRowOrdering that is caused by adding of table1 in third join order position is that we are going to use primary key on table1 and hence we need to reflect that in currentRowOrdering by adding it to the ordering vector.
  
> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688322#action_12688322 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

My apologies for false alarm on 10.1 codeline. I noticed that I had some local changes made on that client. Once i reverted them, the npe went away. I will mark the new jira entry  DERBY-4015 for npe as not a bug.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697764#action_12697764 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

During query optimization phase, we need to know the estimated row count for the tables involved in the query. What I have found during debugging  of the correct case and incorrect case is that the estimated row count is not the same in the 2 cases. When we finish the database session in which the tables, indexes are created, as part of closing the session, we must update the estimated row count and that would explain why the count is different for the same table. Of course, this change in estimated row count must be factoring in what plan gets picked up by the optimizer. 

Next, I am planning on pursuing the broken case cost estimates during different join permutation because it seems like we are ending up with scenarios where the cost estimate shows the cost and row count to be 0 and hence we end up picking that plan because it is costing nothing to pickup that plan. Like I said in my earlier comment, I am not sure if it is ok to cost nothing for a plan and hence it ending up becoming the best plan and probably the best sort avoidance plan too.
costEstimate	
	cost	0.0	
	rowCount	0.0	


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711301#action_12711301 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

I've been following this discussion, and learning a lot! Thanks much for the careful writeups and explanations.

It seems to me that the problem with the current Wisconsin query has to do with the precise definition of "one row result set".

If we go back to Mike's original insight some time ago, he said:

> the key here is that the outer table(table3) is returning more than one row and
> each one of those row is requiring us to look at the middle table (table2) which
> results into 3 scans on table2

And that seems correct to me. But at some point I think we lost the "AND" part of Mike's statement
when it was translated into the bits-and-bytes of actual code.

That is, in the Wisconsin query in question, the outer table (TENKTUP1) is returning
more than one row, *however* each one of those rows only results in a single
open of the inner table (TENKTUP2), because the join key is unique.

This comment from the patch seems to describe the essence of the issue:

+	 * ... and hence we need to make sure 
+	 * that the outer predicates in the join order are all
+	 * one row optimizables meaning that they can at the
+	 * most return only one row. If they return more than
+	 * one row, then it will require multiple scans of the
+	 * current optimizable and the rows returned from
+	 * those multiple scans may not be ordered correctly.

The thing I think we need to do is to figure out some way to encode the following test:

   Are the outer predicates in the join such that they will require at most one scan of the current optimizable?

I think the reason the Wisconsin query has changed behavior (now includes an unnecessary
sort) is because the patch isn't quite expressing exactly this idea. Outer predicates which
are *not* one row result sets should still be able to perform sort avoidance plans, *as long as*
the join condition will perform only a single open of the inner optimizable.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I don't think tracking down the difference in estimated row count is going to lead you to the solution to this problem.  For this issue it is key to figure out why the particular plan chosen in the bug case is wrong.  It may be that the cost of the plan is wrong, but that should not affect whether any chose plan gives the right result - the cost should never affect that.  

So again I believe the key thing here is to track down why this plan that has m1 as the outermost table thinks that this plan does not have to add a sort at the top to get proper
ordering required by order by m0.  My guess is that the bad plan uses the same key for
m1 and m0 which are just ranges on the same table, but has incorrect logic to not see
that it needs m0 using index key as outer most rather than just index key.  

Also don't get confused when the bug is fixed.  My guess is that once the bug is fixed that the currently chosen plan will get marked correctly as NOT a sort avoidance plan which then will require an additional sort node which will add to the cost.  This will probably make it such that the optimizer no longer picks this plan.   

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701729#action_12701729 ] 

Kathey Marsden commented on DERBY-3926:
---------------------------------------

Hi Mamta, I noticed the grant to ASF checkbox was not marked with the attachment derby-reproduce.zip. Do you think it is ok to turn it into a test?  This may be fine since it is just SQL and not java code, but I wanted to just check and make sure.



> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: script3.sql

I modified the last script provided by Tars to create a table identical to table2 and the new table is called table3. The original query for Tars test case which returns incorrect order of rows used a self join on table2 along with table1. To avoid any possibilities of problems associated with may be self join, I created this table3 which is identical to table2. But despite that, the query is still returning incorrect results. Here is how the new query (without the self join) looks like.
SELECT table1.id, table2.value, table3.value FROM table1, table2, table3 WHERE table1.id=table2.id AND 

table2.name='PageSequenceId' AND table1.id=table3.id AND table3.name='PostComponentId' AND 

table3.value='21857' ORDER BY table2.value;

The database can be setup to run the query above is enclosed as script3.sql to this jira entry.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_patch6_060309_stat.txt
                DERBY3926_patch6_060309_diff.txt

I am attaching DERBY3926_patch5_060309_diff.txt Hopefully, this is the final patch for this jira entry. This patch takes care of the original problem query and one query from wisconsin which was getting an unnecessary sort node on it with the previous patch (DERBY3926_patch5_052709_stat.txt). 

Following are the files that were touched by the patch.
M      java\engine\org\apache\derby\impl\sql\compile\RowOrderingImpl.java
M      java\engine\org\apache\derby\impl\sql\compile\OrderByList.java
M      java\engine\org\apache\derby\impl\sql\compile\FromBaseTable.java
M      java\engine\org\apache\derby\impl\sql\compile\OptimizerImpl.java
M      java\engine\org\apache\derby\impl\sql\compile\PredicateList.java
M      java\engine\org\apache\derby\iapi\sql\compile\RowOrdering.java
M      java\engine\org\apache\derby\iapi\sql\compile\RequiredRowOrdering.java
M      java\engine\org\apache\derby\iapi\sql\compile\OptimizablePredicateList.java
M      java\testing\org\apache\derbyTesting\functionTests\tests\lang\wisc_setup.sql
M      java\testing\org\apache\derbyTesting\functionTests\tests\lang\_Suite.java
M      java\testing\org\apache\derbyTesting\functionTests\tests\lang\OrderByAndSortAvoidance.java
M      java\testing\org\apache\derbyTesting\functionTests\master\wisconsin.out


Following is the patch description.
The problem with the trunk codeline is that when optimizer goes through optimizables in a join order, it only looks at those optimizables individually to decide whether sorting can be avoided on them or not. That approach leaves out few queries which require sorting but do not get sorted. The decision for avoiding sorting should also include relationship between the optimizables in a given join order. Following query demonstrates the trunk problem
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED 
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2 
, table1 
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value; 

In the query above, when optimizer is considering [table3, table2, -1] join order, it determines that sorting can be avoided on this join order because the order by column table2.value is already covered by the index nonUniqueOnValue_Table2. It does not see that the outermost optimizable table3 will qualify more than one row and hence it will be a multi-row resulset and for each one of those rows, we will be doing a scan into table2. In other words, there will be multiple scans into table2(and the rows returned by each one of those scans will be ordered on table2.value) but the collective rows from those multiple scans are not necessarily going to be ordered on table2.value. This patch is attempting to fix that problem.

Currently, in trunk, a column is marked always ordered during a query processing when the optimizer finds that there is constant comparison predicate on the order by column. If the column does not have a constant predicate (as in our example above), we next see if we are using an index which will provide the required ordering on column (which is true in our case. The required ordering on table2.value is provided by the index nonUniqueOnValue_Table2). But as we can see in the query above, this index coverage is not enough to say that sorting is not needed. We need to add 2 more conditions before we can decide to avoid the sorting. One of those cases is 1)if the order by column does not belong to the outermost optimizable, then check if the order by column's optimizable is a one-row resultset. If yes, then it will be safe for the optimizer to avoid the sorting. The second case to consider is 2)if the order by column does not belong to the outermost optimizable, then check if the order by column's optimizable is multi-row resultset BUT all the outer optimizables are one-row resulsets. If either of these 2 additional conditions are satisfied then optimizer can choose to avoid the sorting. Otherwise
sorting should be added to the query plan. The example query above does not satisfy the 2 additional checks and hence sorting should be done as part of the query plan.

The changes for the 1)check above has gone into OrderbyByList.sortRequired(RowOrdering, JBitSet, OptimizableList). The implementation of this change just required us to check the outer optimizables to be one row since the order by column's optimizable is not one row. If outer optimizables are all one-row, then we say that sorting can be avoided. Otherwise sorting is required.

The changes for the 2)check above has gone into FromBaseTable.nextAccessPath(Optimizer optimizer, OptimizablePredicateList predList, RowOrdering rowOrdering) The implementation of this change requires us to see if the order by column is involved in equijoin with outer optimizable's indexed column. If yes, then we know that since outer optimizable is ordered, the rows qualified via the equijoin will also be ordered and hence sorting can be avoided. But if this is not true, then we can't rely on outer optimizables' rows to be ordered on the order by column. To avoid sorting, we need to identify this case 2) as another case when the column can be marked as always ordered and that is when there is an equijoin predicate on the order by column with some other column
which is already known to be always ordered. Taking the query from wisconsin as an example will explain this behavior
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 

For the above query, as per the current trunk codeline, none of the order by columns are marked as always ordered because there is no constant comparison predicate on them. But, for the given join order, with TENKTUP1 as the outermost resultset and with the index TK1UNIQUE1, we know that the current row ordering at this point is going to ensure that rows from TENKTUP1 are ordered on UNIQUE1. Next, when we process TEKTUP2 in the 2nd join order position, we find that there is no constant predicate on TENKTUP2.unique1 and hence we conculde that the rows from TENKTUP2 are not going to be ordered and we decide to force a sort node on the top of the query. But in reality, even though the outer optimizable is not a single row resultset, it is ordered on TENKTUP1.unique1 and hence all those rows from outer optimizable are going to be ordered on TENKTUP1.unique1 and the inner optimizable has an equality join on 

TENKTUP1.unique1 using the order by column TENKTUP2.unique1 What that translates to is that even if there will be multiple scans into TENKTUP2, the rows qualified are going to be all ordered because of the equijoin between the outer and inner optimizables on the order by columns. So, with my latest patch, I have expanded the notion of always ordered columns to include both constant comparison predicates AND ordered column that has equijoin with an outer optimizable's ordered column. 

I think this patch is also improving the existing queries to include a better path than what it was picking up before. Following is an example of one such query from wisconsin.
select * from TENKTUP1, TENKTUP2 
where TENKTUP1.unique1 = TENKTUP2.unique1 
and TENKTUP2.unique1 < 100
order by TENKTUP1.unique1; 
For this query, the trunk currently decides to use TENKTUP1 as the outermost optimizable using the TK1UNIQUE1 index and then those rows are filtered using TENKTUP2.unique1 < 100. Each of the 2 tables involved in the query have 10000 rows each. So we are going through 10000 qualified indexed rows from TENKTUP1 and then applying TENKTUP2.unique1 < 100 on them. With the attached patch, we use TENKTUP2 as the outermost optimizable with the index TK2UNIQUE1 and only gets the indexed rows which satisfy TENKTUP2.unique1 < 100 and then on them, we use the equlity join to fetch qualified rows from TENKTUP1. 

I hope the above explanation helps understand the patch. I would appreciate if someone can take the time to go through the patch and provide any feedback they may have. If I don't hear anything by early next week, I will go ahead and commit the patch.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689785#action_12689785 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I debugged to see which table is picked as the leftmost table since it is unclear from the query plan what we mean when we say TABLE2 is the leftmost table. There is a self-join involved with TABLE2 and we use aliases m0 and m1 for them but in the query plan, we talk about them as just TABLE2 and not along  with their alias names. It appears that in both failing and passing case, m1 is the leftmost table in the query plan chosen for the queries. (I made one line change in my client to impl\sql\compile\IndexToBaseRowNode.java where rather than just sending the table name, I send both table name and alias name to the query plan collection logic. The code change is shown below
-               mb.push(source.getBaseTableName());
+               mb.push(source.getBaseTableName() + " " + source.getCorrelationName());
With the change above, I can see in the query plan for the failing and passing case that the leftmost resultset is for TABLE2 M1. I will go ahead and enter a jira entry to make query plan more readable by including the alias name along with the table name.

I will next look at why we decide to skip the sorting in the failing case. That seems to be the only difference between the 2 query plans.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kristian Waagan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-3926:
-----------------------------------

          Urgency: Normal
    Fix Version/s:     (was: 11.0.0.0)
                   10.6.0.0

Triaged July 3, 2009: Assigned normal urgency, replaced 11.0 with 10.6 in Fix versions.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.1.2, 10.6.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707021#action_12707021 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

It appears that we need to catch the case where an optimizable is not the outermost node and optimizer is considering using an index on that optimizable (the index is on the order by column) and there is no constant comparison predicate on that column. This pretty much makes the index of no use but say the optimizer has been forced to use that index through optimizer overrides (ie what we have done in our test queries in this jira). If the outer tables in the join order are all one-row resultset, then it is not an issues because we will be doing only one scan on the optimizable in question and all the rows returned for that optimizable will be sorted on the index being considered on optimizable. The problem case is when there are outer optimizable involved and the outer optimizables will qualify more than one row which will be returned for them and for each one of those rows, we will be doing a scan on the optimizable in question and hence the rows satisfied through multiple scans of the optimizable in question will not be in any sorted order. To fix this, I am planning on adding additional code in OptimizerImpl.costBasedCostOptimizable after the following existing if statement at line 2239
if (joinPosition == 0 ||  optimizableList.getOptimizable(proposedJoinOrder[joinPosition - 1]).considerSortAvoidancePath())
Following is the psudeo code of what I am planning on adding
if (joinPosition != 0) //if we are the outermost optimizable, we are good to go.
{
    if (optimizable.currentPlanUsingIndex() && optimizable.indexOnOrderByColumn() && optimizable.noConstantPredicateOnIndexColumn())
        Sorting can't be avoided on this optimizable
    else
    {
        Continue with the existing code which is 
        if (requiredRowOrdering.sortRequired(currentRowOrdering,assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
        ................... 
    }
}

I will try to implement this psedo code. Let me know if anyone has any comments if this does not look like a good possible solution.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701587#action_12701587 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I have worked on writing a junit test which is currently going to fail because we are returning the data in incorrect order. I thought it would be useful to have the test for people to quickly run the test if they wanted to. This junit test will not be part of any suite currently since the bug is not fixed yet. Putting it in the suite is going to make it fail everytime because the test is asserting that the data be returned in the correct order. I will check that test in soon. It was painful to convert the setup script provided for this jira into a junit test. The script is huge. I was able to use Army's test converter DERBY-2151(it was extremely helpful because it atleast converted half of the script into junit test. My understanding is that the converter works on the older canon based master file. I think it takes sql delimited by ; from the canon file and assumes that next line is the output of that sql. So, the converter skipped every other sql from my setup script. I ended up hand putting every other line which was skipped by the converter. )

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719921#action_12719921 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Merged changes for DERBY-3926 into 10.4.2.1 codeline using revision 785073. Had to hand copy some additional methods into RuntimeStatisticsParser.java because those methods didn't exist in 10.4.2.1 codeline.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717318#action_12717318 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

BTW, i reran the reproducible test case for DERBY-4240 with my patch and the test case returns correct result. Before committing the patch tomorrow, I will go ahead and add a test case for DERBY-4240 in OrderByAndSortAvoidance.

Also, I forgot to mention that I did take care of following comments from Mike on my earlier patch.
********************************
The new routine you added isColumnAlwaysOrdered() is doc'd as: 
+ * Return true if the column is always ordered. That will be true if the 
+ * column has a constant comparison predicate on it. 
But the comment in the call to it seems to say it is expecting something different: 
* The current order by column does not have any constant 
* comparison predicate on it nor does it belong to a 
* single row table which means that the rows will not 

o some lines over 80 

o mostly uses brace on new line, but on the following uses brace on same line: 
 if (moreThanOneTableInJoinOrder) { 
********************************

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I ran full set of tests against ibm16 on a windows xp box with the latest patch and got no unexpected errors
(continue to get the replication test error seen in many nightly test runs).  I also by hand ran the upgrade tests
in the gui and did not get any errors with the patch also.

Unless anyone else reviews and disagrees I think we should get this patch in and backported.  Any additional
improvements to the patch can be handled by subsequent work.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_patch5_052709_stat.txt
                DERBY3926_patch5_052709_diff.txt

I am attaching a new patch, namely DERBY3926_patch5_052709_stat.txt. This patch still does the same thing as the previous patch which is to make sure that if order by column does not belong to the outermost optimizable, then check if the order by column has a constant comparison predicate on it or it is coming from a single-row table. If either of the 2 conditions is true then we do not need to worry if the outer opitimizable to the optimizable for order by column are one-row resultsets or not. One thing to note is that I have the additional check to see if the order by column comes from a single row table. Earlier patch only checked if the order by column has constant comparison predicate on it. All this logic is now in RequiredRowOrdering:sortRequired(RowOrdering rowOrdering, JBitSet tableMap, OptimizableList optimizableList) throws StandardException;
This is more logical place for code to be in rather than putting in OptimizerImpl which is where the previous patch put the code changes.

The one item remaining on this jira is one query from wisconsin that now requires a sorting node rather than avoiding the sort. Mike and Bryan are right about checking if the previous optimizables are multi-row optimizables then see if the multiple scan into order by column's optimizable are all going to return same single row resulset. If so, then there is no need to require sorting even though outer optimizables are multu-row resultsets.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688804#action_12688804 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I will spend little time on this jira. As Mike pointed out, it is a possibilit that in the wrong result case, may be we are getting confused with m0 and m1 which basically are involved in a self join. 

For a given query, if the optimizer chooses a query plan with an index key for say a where clause and it finds that the same key can be used to satisfy the order by, then in that case, optimizer will avoid doing the sorting required for order by because the rows are already in order based on the index key used to satisfy where clause.

In this particular case, there is an index called key3 on table2.value which can be used to satisfy the m1.value='21857'.  It is possible that may be there is some bug in optimizer which thinks that the index used for m1.value can be used to avoid the sorting required for m0.value. But the sort should not be avoided because the order by is on m0.value and not m1.value and hence index used for m1.value can't be used to avoid the sort on m0.value. I will debug this a little to see what is index key3 being used for ie m0 or m1. It is unclear from the query plan whether the index key3 is being used on m0 or m1.

SELECT table1.id, m0.value, m1.value 
FROM table1, table2 m0, table2 m1 
WHERE table1.id=m0.id 
AND m0.name='PageSequenceId' 
AND table1.id=m1.id 
AND m1.name='PostComponentId' 
AND m1.value='21857' 
ORDER BY m0.value; 


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A B updated DERBY-3926:
-----------------------

    Attachment: d3926_repro.sql

Attaching another SQL file, d3926_repro.sql, which reproduces the problem (for me) with a simpler set of tables and data.

This repro was motivated by the observation made by Mike and re-iterated by Mamta, namely:

> the key here is that the outer table(table3) is returning more than one row and
> each one of those row is requiring us to look at the middle table (table2) which
> results into 3 scans on table2 

Put differently, the outer table is the one which is "driving" the iteration because a) each row from the outer table leads to a scan on the middle table, and b) the optimizer determines that no sort is necessary.  Thus the order of the result is based solely on the order of the rows that are retrieved from the outer table.

Some observations about what was necessary to get this particular repro to work:

  1) The outer table (T1) has a non-unique index on column I1, and we force the optimizer to use that non-unique index.

      Why? If the optimizer were to use a table scan for T1 then we would check to see if T1 was a "oneRowResultSet", which it isn't (and can't be, since we need T1 to return multiple rows in order to satisfy observation 3 below).  Since it's not a one-row result set T1 would then get added to the list of "unordered optimizables" for the join order--and if that list has at least one optimizable in it, we would end up doing an explicit sort and thus the problem would not repro.  So the plan for T1 must use an index.

  2) There is a predicate in the WHERE clause which compares the non-unique indexed column T1.I1 to a CONSTANT expression.

      Why? If such a predicate did not exist then T1.I1 would be added as the first column in the "rowOrdering" for the join order.  Then when the optimizer adds the middle table (T2) to the join order, it would see that the index for T2 does *not* satisfy the ordering requirement of T1.I1, which means we would end up doing an explicit sort for the whole plan.  So the problem would not repro.  By adding a predicate to compare T1.I1 with a constant, we effectively make T1.I1 "always ordered" and so we do not need to add it to the row ordering.

  3) The outer table (T1) has multiple rows which have the same value for the indexed column T1.I1.

      Why? The presence of multiple rows is important because that's what leads to multiple scans on the middle table (as pointed out by Mike and Mamta).  So we need to have a predicate which compares to a fixed constant value (observation 2), but we also need that predicate to return multiple rows.  Thus there must be multiple rows in T1 which have I1 column values that equal the constant value used in the predicate.  (This is why the index for T1 must be non-unique.)

  4) The middle table (T2) has an index that is ordered the same way as the ORDER BY clause.  We force the optimizer to use that index for T2.

      Why? If the optimizer is using an index that satisfies the ordering requirement for the query, it will try to avoid sorting the resultant rows.  Sort avoidance is key to reproducing the reported behavior--esp. the optimizer *thinks* it can avoid the sort, and does so, but in truth it should *not* have done so.

  5) The index for T2 is not covering--and esp. it does *not* include the column T2.I2 that is used for joining with the outer table.

      Why? The fact that the index is non-covering means that we will have to go to the T2 table conglomerate to fetch the row that has the current join value--and access to the table conglomerate is _unordered_.

  6)  The column from T1 (outer table) that is joined with T2 (middle table) has varied values for each of the rows.

      Why? The presence of different values in T1.J1 means that we will scan T2's table conglomerate multiple times for different T2.I2 values, and since the table conglomerate is unordered (observation 5), those multiple scans will return the rows in an order that does *not* match the index order.

  7) The rows that are inserted into T2 are inserted in an order that does NOT match the ORDER BY ordering.

      Why? It appears that, when inserting rows into a table, the order of the rows in the base table conglomerate generally matches the insertion order.  I don't think there are any guarantees of that, but that's what I observed for this simple data.  So if we were to insert the rows in proper order, access to the base table conglomerate (observations 5 & 6) might in fact return the rows in the desired order by accident, which would hide the problem that we're trying to reproduce.

With all of those observations in place, I was able to write the attached script to reproduce the problem for me.  The results I see when I run are:

J1         |J2  |J3
---------------------------
0          |f   |five
1          |g   |six
2          |e   |four

but the query specifies "ORDER BY t2.j2", so the rows are in the wrong order.

It would be great if others could try to run the script to make sure they see the same behavior (if not, then some or all of this comment may in fact be wrong or incomplete...).

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699744#action_12699744 ] 

A B commented on DERBY-3926:
----------------------------

> if condition "b" was in place, we would see that the "leading set" of order by
> columns--i.e. the number of order by columns before "m0.value", is EMPTY,
> which means that "so far" nothing is sorted, and thus the sort would be required.

One easy way to check this theory would be to make the following change:

@@ -503,7 +503,7 @@
                        */
                        if (tableMap != null)
                        {
-                               if ( ! tableMap.get(cr.getTableNumber()))
+                               if ((position > 0) && !tableMap.get(cr.getTableNumber()))
                                {
                                        /* Table not in partial join order */
                                        for (int remainingPosition = loc + 1;

and see what happens...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Reopened: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Rick Hillegas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Hillegas reopened DERBY-3926:
----------------------------------


I am re-opening this issue. Based on the discussion around DERBY-4331 it appears that this issue may have not been fixed. It seems that the ordering problem may simply have been moved from one family of queries to another.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.1.3.3, 10.2.2.1, 10.3.3.1, 10.4.2.1, 10.5.2.0, 10.6.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


could it be that the leftmost join is actually on the m1.value part of the query rather than the m0.value of the query.  maybe something got confused about the self join with 2 different "range" variables.  

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683990#action_12683990 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

The query plan when the incorrect results are returned is as follows
Statement Name:
        null
Statement Text:
        SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 W
HERE table1.id=m0.id AND
m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m
1.value='21857' ORDER BY
m0.value
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text:
Project-Restrict ResultSet (10):
Number of opens = 1
Rows seen = 3
Rows filtered = 0
restriction = false
projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:            0.00
        optimizer estimated cost:            3.52

Source result set:
        Nested Loop Join ResultSet:
        Number of opens = 1
        Rows seen from the left = 168
        Rows seen from the right = 3
        Rows filtered = 0
        Rows returned = 3
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                optimizer estimated row count:            0.00
                optimizer estimated cost:            3.52

        Left result set:
                Nested Loop Join ResultSet:
                Number of opens = 1
                Rows seen from the left = 3
                Rows seen from the right = 168
                Rows filtered = 0
                Rows returned = 168
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        optimizer estimated row count:            0.00
                        optimizer estimated cost:            3.52

                Left result set:
                        Project-Restrict ResultSet (5):
                        Number of opens = 1
                        Rows seen = 3
                        Rows filtered = 0
                        restriction = true
                        projection = true
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                restriction time (milliseconds) = 0
                                projection time (milliseconds) = 0
                                optimizer estimated row count:            0.00
                                optimizer estimated cost:            3.52

                        Source result set:
                                Index Row to Base Row ResultSet for TABLE2:
                                Number of opens = 1
                                Rows seen = 3
                                Columns accessed from heap = {0, 1, 2}
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        optimizer estimated row count:  0.00
                                        optimizer estimated cost:            3.52
                                        Index Scan ResultSet for TABLE2 using index KEY3 at read committed isolation level using share row locking chosen by the optimizer
                                        Number of opens = 1
                                        Rows seen = 3
                                        Rows filtered = 0
                                        Fetch Size = 1
                                                constructor time (milliseconds)= 0
                                                open time (milliseconds) = 0
                                                next time (milliseconds) = 0
                                                close time (milliseconds) = 0
                                                next time in milliseconds/row =0
                                        scan information:
                                                Bit set of columns fetched=All
                                                Number of columns fetched=2
                                                Number of deleted rows visited=0

                                                Number of pages visited=2
                                                Number of rows qualified=3
                                                Number of rows visited=4
                                                Scan type=btree
                                                Tree height=-1
                                                start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:

                                                stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:

                                                qualifiers:None
                                                optimizer estimated row count:        0.00
                                                optimizer estimated cost:     3.52


                Right result set:
                        Project-Restrict ResultSet (8):
                        Number of opens = 3
                        Rows seen = 8688
                        Rows filtered = 8520
                        restriction = true
                        projection = true
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                restriction time (milliseconds) = 0
                                projection time (milliseconds) = 0
                                optimizer estimated row count:            0.00
                                optimizer estimated cost:            0.00

                        Source result set:
                                Index Row to Base Row ResultSet for TABLE2:
                                Number of opens = 3
                                Rows seen = 8688
                                Columns accessed from heap = {0, 1, 2}
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        optimizer estimated row count:  0.00
                                        optimizer estimated cost:            0.00

                                        Index Scan ResultSet for TABLE2 using index KEY3 at read committed isolation level using share row locking chosen by the
 optimizer
                                        Number of opens = 3
                                        Rows seen = 8688
                                        Rows filtered = 0
                                        Fetch Size = 1
                                                constructor time (milliseconds)= 0
                                                open time (milliseconds) = 0
                                                next time (milliseconds) = 0
                                                close time (milliseconds) = 0
                                                next time in milliseconds/row =0

                                        scan information:
                                                Bit set of columns fetched={1}
                                                Number of columns fetched=1
                                                Number of deleted rows visited=0

                                                Number of pages visited=12
                                                Number of rows qualified=8688
                                                Number of rows visited=8688
                                                Scan type=btree
                                                Tree height=2
                                                start position:        None
                                                stop position:        None
                                                qualifiers:None
                                                optimizer estimated row count:          0.00
                                                optimizer estimated cost:     0.00



        Right result set:
                Table Scan ResultSet for TABLE1 at read committed isolation level using instantaneous share row locking chosen by the optimizer
                Number of opens = 168
                Rows seen = 3
                Rows filtered = 0
                Fetch Size = 16
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=1
                        Number of pages visited=1
                        Number of rows qualified=3
                        Number of rows visited=9408
                        Scan type=heap
                        start position:
null                    stop position:
null                    qualifiers:
Column[0][0] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
Column[0][1] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

                        optimizer estimated row count:            0.00
                        optimizer estimated cost:            0.00

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_patch4_051519_stat.txt
                DERBY3926_patch4_051519_diff.txt

Army, thanks for looking at the patch. I need to remove the commented out if condition along with corresponding {} brackets. Also, I have added the comment for currentRowOrdering.alwaysOrdered(previousOptimizable.getTableNumber()) Let me know if I should reword it differently. 

I have attached a patch with these changes and some more order by tests. That is the only difference between this patch and the earlier patch.,

On another note, I was planning on moving all this new code inside the existing method OrderByList.sortRequired which gets called by OptimizerImpl. This way, all the decision regarding sort avoidance would have been made in the appropriate named method sortRequired. But the new code added by me needs access to current join order position, previous optimizables, predicate lists etc which are not available to OrderByList.sortRequired method. Passing these additional parameters to the method will require changes to other part of the Derby code where we call this method. Based on that, I am thinking that I should leave my new code where it is right now. If anyone feels differently about the location of the new code, do let me know.

The only issue with the suggested patch is that a query like following is now going to require a sort node on the top which we didn't require earlier. Other than this one case, all the other test cases
have worked fine with my patch. I am inclied on going ahead and committing the patch with this know one case. If I don't hear anything back on the patch, I will plan on comitting it towards the end of the week.
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707215#action_12707215 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Sorry, I don't know how but I missed the crucial logic for checking one row resultset on previous optimizables. 

Here is my new pseudo code for handling the check to see if we are going to consider a new order by column and that column does not have a constant predicate on it. Before it can decide to do sort avoidance, It needs to make sure that previous optimizables do not return more than one row.

if (joinPosition != 0) { 
    if (optimizable.addingNewOrderByColumn() && optimizable.doesNotHaveConstantPredOnOrderByColumn())
    {
	if (previousOptimizablesAreAllSingleRow()) 
	        consider sort avoidanve (by running the current code)
    	else
                             Sorting can't be avoided on this optimizable 
    } else
	        consider sort avoidanve (by running the current code)
}

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718310#action_12718310 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Thanks for catching that, Myrna. I have fixed it with revision 783641.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710809#action_12710809 ] 

A B commented on DERBY-3926:
----------------------------

I took a look at the patch and the changes look reasonable to me. The latest version of the
patch seems a bit cleaner than the previous ones, which is nice.  And I like the detailed
comments in OptimizerImpl.java :)

I did notice that in the latest patch there is an "if" condition which is commented out.  Was
that intentional, and if so, can it (and the corresponding {} brackets) be removed entirely,
instead of just being commented out?

It might be useful if you could add a comment to indicate that the call to:

  currentRowOrdering.alwaysOrdered(previousOptimizable.getTableNumber())

will return true if the optimizable was found to be a "oneRowResultSet" during costing.
That might make it more clear that the code is actually in sync with the comments which
precede it (I had to do some investigation before I could come to that conclusion).

Your explanation of the failure in the wisconsin test makes sense to me. Thanks so much
for your diligence with this particular issue!

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


to me looks like the problem comes with plan that is trying to do sort avoidance.  The working plan has a sort node on the top.  

But to me the "bad" plan looks like a valid sort avoidance plan.  Seems to have index scans on the left side, using key3 which is sorted on the value that the query in the end wants to be sorted on.  

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719596#action_12719596 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Merged changes for DERBY-3926 into 10.5.1.2 codeline using revision 784809

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699728#action_12699728 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Bryan, I haven't read your comment completely but I noticed " I wish I had a clearer grasp on the concept of a 'partial join order.  I think that this is an intermediate stage during optimization, where the optimizer has so far chosen an ordering for some, but not yet all, of the tables. It looks like the code is trying to handle the problem of how to make a sort avoidance check at this intermediate point in optimization." Your understanding of partial join order is correct. For our specific query case, the optimizer is eventually going to have a join order consisting of 3 tables once it determines the best join order. But to arrive to that join order, it goes through many iterations of possible join orders. It starts out by considering only one optimizable in the first position of the join order, eg, 0,-1,-1 and this will be considered partial join order.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: script3WithUserFriendlyIndexNames.sql

I am attaching a newer version of script3.sql The new script is called script3WithUserFriendlyIndexNames.sql The only changes made to script3.sql are to use user-friendly index names so that it is easier to understand the query which is using index names through optimizer overrides to demonstrate the buggy behavior. Hopefully it will be easier to read the query plan as well. The query to see the problem is as follows

SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699755#action_12699755 ] 

A B commented on DERBY-3926:
----------------------------

> *and* b) the leading set of order by columns canNOT be empty.

Okay, check that, that's not a complete solution.

Consider

    ORDER BY S.A, T.B

with a join order of { S, W, - }.

If we can avoid sorting for S.A and then we process "T.B" and say that we can avoid the sort there, as well, we'll get into trouble.  Once we get to join order { S, W, T } we might find that our access plan for T can avoid the sort for B...but due to the presence of "W" in the join order, I think the results would still end up out of order.  So the condition would have to be generalized more than what I posted earlier...I think...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

    Affects Version/s: 10.3.3.1
                       10.2.3.0
                       10.1.3.3

I noticed that this issue also occurs on versions back to 10.1


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Stan Bradbury (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658240#action_12658240 ] 

Stan Bradbury commented on DERBY-3926:
--------------------------------------

I verified the report then dropped and recreated the index key3.  The records ordered properly after that.  Go figure...??  With the newly created cardinality statistics it may be the correct index is selected this time but the original results are still wrong.

ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE tab
ER BY m0.value;
ID                  |VALUE
----------------------------------------------------------------------------------
------------------------------------------------------------------------------
2147483653          |000002
2147483654          |000003
4294967297          |000001
3 rows selected

ij> drop index key3;
0 rows inserted/updated/deleted

ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE
table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND
m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;
ID                  |VALUE
------------------------------------------------------------------------------------
------------------------------------------------------------------------------
4294967297          |000001
2147483653          |000002
2147483654          |000003
3 rows selected

ij> CREATE INDEX key3 ON table2(value);
0 rows inserted/updated/deleted
ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE
table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND
m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;
ID                  |VALUE
------------------------------------------------------------------------------------
------------------------------------------------------------------------------
4294967297          |000001
2147483653          |000002
2147483654          |000003


3 rows selected

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden resolved DERBY-3926.
-----------------------------------

    Resolution: Fixed

I neglected to resolve this issue after backporting it to 10.1. Resolving it now. Note: DERBY-4331 has been identified as a regression from this change.



> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.1.3.3, 10.2.2.1, 10.3.3.1, 10.4.2.1, 10.5.2.0, 10.6.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor reassigned DERBY-3926:
--------------------------------------

    Assignee: Mamta A. Satoor

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683992#action_12683992 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

The query plan when the correct results are returned is as follows
Statement Name:
        null
Statement Text:
        SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 W
HERE table1.id=m0.id AND
m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m
1.value='21857' ORDER BY
m0.value
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text:
Sort ResultSet:
Number of opens = 1
Rows input = 3
Rows returned = 3
Eliminate duplicates = false
In sorted order = false
Sort information:
        Number of rows input=3
        Number of rows output=3
        Sort type=internal
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        optimizer estimated row count:            0.20
        optimizer estimated cost:            8.26

Source result set:
        Project-Restrict ResultSet (9):
        Number of opens = 1
        Rows seen = 3
        Rows filtered = 0
        restriction = false
        projection = true
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count:            0.20
                optimizer estimated cost:            8.26

        Source result set:
                Nested Loop Exists Join ResultSet:
                Number of opens = 1
                Rows seen from the left = 3
                Rows seen from the right = 3
                Rows filtered = 0
                Rows returned = 3
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        optimizer estimated row count:            0.20
                        optimizer estimated cost:            8.26

                Left result set:
                        Nested Loop Exists Join ResultSet:
                        Number of opens = 1
                        Rows seen from the left = 3
                        Rows seen from the right = 3
                        Rows filtered = 0
                        Rows returned = 3
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                optimizer estimated row count:            0.20
                                optimizer estimated cost:            7.29

                        Left result set:
                                Project-Restrict ResultSet (5):
                                Number of opens = 1
                                Rows seen = 3
                                Rows filtered = 0
                                restriction = true
                                projection = true
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        restriction time (milliseconds) = 0
                                        projection time (milliseconds) = 0
                                        optimizer estimated row count:
  0.20
                                        optimizer estimated cost:            6.9
7

                                Source result set:
                                        Index Row to Base Row ResultSet for TABL
E2:
                                        Number of opens = 1
                                        Rows seen = 3
                                        Columns accessed from heap = {0, 1, 2}
                                                constructor time (milliseconds)
= 0
                                                open time (milliseconds) = 0
                                                next time (milliseconds) = 0
                                                close time (milliseconds) = 0
                                                optimizer estimated row count:
          0.20
                                                optimizer estimated cost:
     6.97

                                                Index Scan ResultSet for TABLE2using index KEY3 at read committed isolation level using instantaneous share row
 locking chosen by the optimizer
                                                Number of opens = 1
                                                Rows seen = 3
                                                Rows filtered = 0
                                                Fetch Size = 16
                                                        constructor time (millis
econds) = 0
                                                        open time (milliseconds)
 = 0
                                                        next time (milliseconds)
 = 0
                                                        close time (milliseconds
) = 0
                                                        next time in millisecond
s/row = 0

                                                scan information:
                                                        Bit set of columns fetch
ed=All
                                                        Number of columns fetche
d=2
                                                        Number of deleted rows v
isited=0
                                                        Number of pages visited=
2
                                                        Number of rows qualified
=3
                                                        Number of rows visited=4

                                                        Scan type=btree
                                                        Tree height=2
                                                        start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:

                                                        stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:

                                                        qualifiers:
None
                                                        optimizer estimated row
count:            0.20
                                                        optimizer estimated cost
:            6.97


                        Right result set:
                                Index Scan ResultSet for TABLE1 using constraint SQL090320113016460 at read committed isolation level using share row locking chosen by the optimizer
                                Number of opens = 3
                                Rows seen = 3
                                Rows filtered = 0
                                Fetch Size = 1
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        next time in milliseconds/row = 0

                                scan information:
                                        Bit set of columns fetched={0}
                                        Number of columns fetched=1
                                        Number of deleted rows visited=0
                                        Number of pages visited=3
                                        Number of rows qualified=3
                                        Number of rows visited=3
                                        Scan type=btree
                                        Tree height=1
                                        start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:
0
                                        stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:
0
                                        qualifiers:
None
                                        optimizer estimated row count:
  0.20
                                        optimizer estimated cost:            0.31


                Right result set:
                        Index Row to Base Row ResultSet for TABLE2:
                        Number of opens = 3
                        Rows seen = 3
                        Columns accessed from heap = {2}
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                optimizer estimated row count:            0.20
                                optimizer estimated cost:            0.97

                                Index Scan ResultSet for TABLE2 using constraint SQL090320113016670 at read committed isolation level using share row locking chosen by the optimizer
                                Number of opens = 3
                                Rows seen = 3
                                Rows filtered = 0
                                Fetch Size = 1
                                        constructor time (milliseconds) = 0
                                        open time (milliseconds) = 0
                                        next time (milliseconds) = 0
                                        close time (milliseconds) = 0
                                        next time in milliseconds/row = 0

                                scan information:
                                        Bit set of columns fetched=All
                                        Number of columns fetched=3
                                        Number of deleted rows visited=0
                                        Number of pages visited=6
                                        Number of rows qualified=3
                                        Number of rows visited=3
                                        Scan type=btree
                                        Tree height=2
                                        start position:
        >= on first 2 column(s).
        Ordered null semantics on the following columns:0 1
                                        stop position:
        > on first 2 column(s).
        Ordered null semantics on the following columns:0 1
                                        qualifiers:None
                                        optimizer estimated row count:  0.20
                                        optimizer estimated cost:            0.97

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716280#action_12716280 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

After doing a sync of the codeline and re-running the junit tests, I see failures in upgrade tests but I do not think they are related to my changes. 

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_notforcheckin_patch1_051109_stat.txt
                DERBY3926_notforcheckin_patch1_051109_diff.txt

I have attached a patch (not intended for checkin) DERBY3926_notforcheckin_patch1_051109_diff.txt based on the pseudocode that I posted last week. It fixes the problem query in question but when I run wisconsin test, I see that now we are adding sort nodes on top of few queries. I am trying to understand if it makes sense for us to have the additional sort node case by case. The first case I am looking at seems like should not get a sort node when the patch is choosing to add one. The query for that case is as follows
	select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 order by TENKTUP1.unique1;

The query plan for the above query shows tektup1 to be the outermost query so I am not sure why we need the sort node on the top. I will look further into it. If anyone has time to look at the patch, I will greatly appreciate it. There are no comments(minimal) for the new code. I will work on adding some comments and repost the patch to make it easier to read but the code should correspond fairly straightforward to the psuedo code posted last week.

The old query plan for the query above from wisconsin is as follows
ij> values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Statement Name: 
	C
Statement Text: 
	select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 order by TENKTUP1.unique1
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text: 
Nested Loop Exists Join ResultSet:
<filtered number of opens>
<filtered rows seen from the left>
<filtered rows seen from the right>
Rows filtered = 0
<filtered rows returned>
	constructor time (milliseconds) = 0
	open time (milliseconds) = 0
	next time (milliseconds) = 0
	close time (milliseconds) = 0
Left result set:
	Index Row to Base Row ResultSet for TENKTUP1:
	<filtered number of opens>
	<filtered rows seen>
	Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		Index Scan ResultSet for TENKTUP1 using index TK1UNIQUE1 at serializable isolation level using share table locking chosen by the optimizer
		<filtered number of opens>
		<filtered rows seen>
		Rows filtered = 0
		Fetch Size = 1
			constructor time (milliseconds) = 0
			open time (milliseconds) = 0
			next time (milliseconds) = 0
			close time (milliseconds) = 0
		scan information: 
			Bit set of columns fetched={1}
			Number of columns fetched=1
			Number of deleted rows visited=0
			<filtered number of pages visited>
			<filtered number of rows qualified>
			<filtered number of rows visited>
			Scan type=btree
			Tree height=2
			start position: 	None
			stop position: 	None
			qualifiers:None
Right result set:
	Index Row to Base Row ResultSet for TENKTUP2:
	<filtered number of opens>
	<filtered rows seen>
	Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		Index Scan ResultSet for TENKTUP2 using index TK2UNIQUE1 at serializable isolation level using share row locking chosen by the optimizer
		<filtered number of opens>
		<filtered rows seen>
		Rows filtered = 0
		Fetch Size = 1
			constructor time (milliseconds) = 0
			open time (milliseconds) = 0
			next time (milliseconds) = 0
			close time (milliseconds) = 0
		scan information: 
			Bit set of columns fetched=All
			Number of columns fetched=2
			Number of deleted rows visited=0
			<filtered number of pages visited>
			<filtered number of rows qualified>
			<filtered number of rows visited>
			Scan type=btree
			Tree height=2
			start position: 
	>= on first 1 column(s).
	Ordered null semantics on the following columns: 0 
			stop position: 
	> on first 1 column(s).
	Ordered null semantics on the following columns: 0 
			qualifiers:None



The new query plan after my changes is as follows
ij> values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Statement Name:
        null
Statement Text:
        select * from TENKTUP1, TENKTUP2
         where TENKTUP1.unique1 = TENKTUP2.unique1
         order by TENKTUP1.unique1

Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text:
Sort ResultSet:
Number of opens = 1
Rows input = 10000
Rows returned = 10000
Eliminate duplicates = false
In sorted order = false
Sort information:
        Number of merge runs=3
        Number of rows input=10000
        Number of rows output=10000
        Size of merge runs=[3215, 3215, 3215]
        Sort type=external
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        optimizer estimated row count:        10005.00
        optimizer estimated cost:        73930.40

Source result set:
        Nested Loop Exists Join ResultSet:
        Number of opens = 1
        Rows seen from the left = 10000
        Rows seen from the right = 10000
        Rows filtered = 0
        Rows returned = 10000
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                optimizer estimated row count:        10005.00
                optimizer estimated cost:        73930.40

        Left result set:
                Table Scan ResultSet for TENKTUP1 at read committed isolation level using instantaneous share row locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 10000
                Rows filtered = 0
                Fetch Size = 16
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=16
                        Number of pages visited=771
                        Number of rows qualified=10000
                        Number of rows visited=10000
                        Scan type=heap
                        start position:null                    
                        stop position:null                    
                        qualifiers:None
                        optimizer estimated row count:        10005.00
                        optimizer estimated cost:        14870.88

        Right result set:
                Index Row to Base Row ResultSet for TENKTUP2:
                Number of opens = 10000
                Rows seen = 10000
                Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        optimizer estimated row count:        10005.00
                        optimizer estimated cost:        59059.52

                        Index Scan ResultSet for TENKTUP2 using index TK2UNIQUE1 at read committed isolation level using share row locking chosen by the optimizer
                        Number of opens = 10000
                        Rows seen = 10000
                        Rows filtered = 0
                        Fetch Size = 1
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                next time in milliseconds/row = 0

                        scan information:
                                Bit set of columns fetched=All
                                Number of columns fetched=2
                                Number of deleted rows visited=0
                                Number of pages visited=20000
                                Number of rows qualified=10000
                                Number of rows visited=10000
                                Scan type=btree
                                Tree height=2
                                start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:0
                                stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:0
                                qualifiers:None
                                optimizer estimated row count:        10005.00
                                optimizer estimated cost:        59059.52


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706509#action_12706509 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Army, I tried your much simpler repro and it reproduces the problem.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712398#action_12712398 ] 

Knut Anders Hatlen commented on DERBY-3926:
-------------------------------------------

OrderByAndSortAvoidance.java has been causing warnings when building the javadocs in UTF-8 locale because it contains non-ASCII characters encoded with ISO-8859-1. The warnings look like this:

  [javadoc] /code/derby/trunk1/java/testing/org/apache/derbyTesting/functionTests/tests/lang/OrderByAndSortAvoidance.java:738: warning: unmappable character for encoding UTF8
  [javadoc]                 + "documents.  Ils ont �t� faits pour quelque chose.')");
  [javadoc]                                        ^
  [javadoc] /code/derby/trunk1/java/testing/org/apache/derbyTesting/functionTests/tests/lang/OrderByAndSortAvoidance.java:738: warning: unmappable character for encoding UTF8
  [javadoc]                 + "documents.  Ils ont �t� faits pour quelque chose.')");
  [javadoc]                                          ^
  [javadoc] /code/derby/trunk1/java/testing/org/apache/derbyTesting/functionTests/tests/lang/OrderByAndSortAvoidance.java:901: warning: unmappable character for encoding UTF8
  [javadoc]                             "'BatchTypeInstructions', 'Ne pas jeter ces documents.  Ils ont �t� faits pour quelque chose.')");
  [javadoc]                                                                                             ^
.
.
.

I fixed it in revision 777886 by replacing the occurrences of the problematic non-ASCII character (é) with a Unicode escape sequence (\u00e).

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694206#action_12694206 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I spent some time in comparing the code path for the working case and non-working case in a debugger. What I found is in optimize phase, at one point, one of the join orders we try in non-working case is 2,0,1 and for the working case it is 2,0,-1. 
0 stands for Table1 and 1 stands for Table2 m0, 2 stands for Table2 m1. -1 means that join position is not being used. I am not sure why we are picking different join orders. In addition to that, I see that for the 2,0,1 join order in non-working case, we indeed mark sort avoidance for Table2 m0, I definitely need to do more debugging to see why we think for this join order, we can avoid sorting for Table2 m0. I realize this is not much information but wanted to share what I have. 

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I took a close look at the original query plans for queries 1 through 3, and believe all of the original 
query plans with no sort at the top are valid plans.  So think the current fix may not be valid as it is causing sort where they do not need to be.  I am new to this, so if my logic is wrong please point it
out.

For query 1:
1)
ij> -- one row from joining table
get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique2 = TENKTUP2.unique2
	 and TENKTUP2.unique1 = 0
	 order by TENKTUP1.unique1';

Because TENKTUP2.unique1 is a unique column and the query requires TENKTUP2.unique1 = 0,
only 1 or 0 rows qualifies from TENKTUP2.   Because TENKTUP2.unique2 and TENKTUP1.unique2 are 
also both unique then TENKTUP1.unique2 = TENKTUP2.unique2 means only 1 or 0 rows qualify
qualify from TENKTUP1.  This means the query can only return 1 or 0 rows.  Because this is the
case there is no need to sort a 1 or 0 row result set.

My question is, is it expected that the optimizer should recognize that a one row result set requires 
no ordering?  

It looks like queries 3 through 7 are all variations on one row result sets.

Query 2 is different:
get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 order by TENKTUP1.unique1, TENKTUP2.unique1';

In this case the interesting thing is that TENKTUP1.unique1 must be the same as 
TENKTUP2.unique1.  So this means that any plan that is sorted on TENKTUP1.unique1 OR
TENKTUP2.unique1 fulfills the order by requirement to be sorted by TENKTUP1.unique1, TENKTUP2.unique1.

So in the original queryplan it turns out the outermost index scan is:
TENKTUP1 using index TK1UNIQUE1  which because of the predicate is enough to avoid sorting.

Again the question is whether the code is meant to catch this case or did it get lucky?

Originally I have to admit I didn't understand how any multiple table order by could be a sort 
avoidance plan since we have no indexes that cover multiple tables.  I believe the above show
some of the cases.

Another case I have seen is an order by of A.key, B.key being satisfied by join on A.key, B.key when it is known A.key is a single value.  Or satisfied also by join on B.key, A.key where it is also known that
A is a single value like A.key = ?. 





> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702085#action_12702085 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

wisconsin test was showing diffs after I commented out the code in OrderByList (code through 504-533). The diffs were for 7 queries and for those 7 queries, now the plan picked does a sorting (prior to my changes, the sorting was getting avoided). wisconsin test only does query plan dump, it does not check the actual data returned for those queries. For the 7 queries that changed, I added a check to dump the data returned from those queries. Rerunning wisconsin with and without my code changes atleast confirms that the data returned because of the additional sorting node has not been affected. All of these 7 queries involved multiple tables in the FROM list and they had ORDER BY clause. (I have included the 7 queries below for reference).

I went through all the queries in wisconsin test and see that there are still quite a few queries (even the ones with more than one table in the FROM list and have ORDER BY) that do not have sorting node on top of their query plan because of the commenting of the code. One of such query example is
get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 and TENKTUP2.unique1 < 6000
	 order by TENKTUP1.unique1';


BTW, the 7 queries that have changed their plans so that they now require sorting are as follows(Note that the comment for the query 2 "says that sort avoidance with joins and order by on columns in different tables". Well, with other change in the code, we are not avoiding sort anymore)
1)
ij> -- one row from joining table
get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique2 = TENKTUP2.unique2
	 and TENKTUP2.unique1 = 0
	 order by TENKTUP1.unique1';

2)
ij> -- Sort avoidance with joins and order by on columns in different tables
--
-- order on joining columns
get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 order by TENKTUP1.unique1, TENKTUP2.unique1';

3)
ij> get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 and TENKTUP1.unique2 = 0
	 and TENKTUP2.unique2 = 0
	 order by TENKTUP1.unique1, TENKTUP2.unique1';

4)
ij> get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 and TENKTUP1.unique2 < 6000
	 and TENKTUP2.unique2 = 0
	 order by TENKTUP1.unique1, TENKTUP2.unique1';

5)
ij> get cursor c as
	'select * from TENKTUP1, TENKTUP2
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 and TENKTUP1.unique2 < 6000
	 and TENKTUP2.unique2 = 0
	 order by TENKTUP1.unique2, TENKTUP2.unique2';

6)
ij> get cursor c as
	'select * from TENKTUP1, TENKTUP2, ONEKTUP
	 where TENKTUP1.unique1 = TENKTUP2.unique1
	 and ONEKTUP.unique1 = TENKTUP1.unique1
	 and TENKTUP1.unique2 = 0
	 and TENKTUP2.unique2 = 0
	 order by TENKTUP1.unique1, TENKTUP2.unique1';

7)
ij> get cursor c as
	'select * from TENKTUP1, TENKTUP2, ONEKTUP
	 where TENKTUP1.unique2 = TENKTUP2.unique2
	 and ONEKTUP.unique2 = TENKTUP1.unique2
	 and TENKTUP1.unique2 = 0
	 and TENKTUP2.unique2 = 0
	 order by TENKTUP1.unique1, TENKTUP2.unique1';



> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Fix Version/s: 11.0.0.0
                   10.5.1.2
                   10.4.2.1

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721244#action_12721244 ] 

Kathey Marsden commented on DERBY-3926:
---------------------------------------

Hi Mamta,

I can go ahead and finish backporting this change to 10.3, 10.2, and 10.1.  I am backporting DERBY-4268 and running tests anyway and so can easily run tests for the two changes together.  The two changes don't have any common files, so I will check in separately,


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

    Fix Version/s: 10.2.2.1
                   10.1.3.3

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.1.3.3, 10.2.2.1, 10.3.3.1, 10.4.2.1, 10.5.2.0, 10.6.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689969#action_12689969 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

Hi Mamta thanks for all the hard work on this.

I just wanted to comment that I think your theory that "may be there is some bug in optimizer 
which thinks that the index used for m1.value can be used to avoid the sorting required for m0.value"
is a really good theory, and I think this is a good idea to explore in depth.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699729#action_12699729 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Bryan, I think you are right that there might be some partial join orders which will never reach complete join order stage because the partial join order is turning out to be too expensive.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695057#action_12695057 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Another interesting thing I noticed (in both working and non-working case) is we do recognize the optimize phase that sorting is required for m0 for the order by clause as shown belowHere, for the join order [2, 0, 1], we identify that sorting is required for M0.
Thread [main] (Suspended)	
	OrderByList.sortRequired(RowOrdering, JBitSet) line: 549	
	Level2OptimizerImpl(OptimizerImpl).costBasedCostOptimizable(Optimizable, TableDescriptor, ConglomerateDescriptor, OptimizablePredicateList, CostEstimate) line: 2248	
	Level2OptimizerImpl(OptimizerImpl).costOptimizable(Optimizable, TableDescriptor, ConglomerateDescriptor, OptimizablePredicateList, CostEstimate) line: 1984	
	FromBaseTable.optimizeIt(Optimizer, OptimizablePredicateList, CostEstimate, RowOrdering) line: 521	
	ProjectRestrictNode.optimizeIt(Optimizer, OptimizablePredicateList, CostEstimate, RowOrdering) line: 316	
	Level2OptimizerImpl(OptimizerImpl).costPermutation() line: 1938	
	SelectNode.optimize(DataDictionary, PredicateList, double) line: 1767	
	CursorNode(DMLStatementNode).optimizeStatement() line: 305	
	CursorNode.optimizeStatement() line: 515	
	GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 367	
	GenericStatement.prepare(LanguageConnectionContext, boolean) line: 88	
	GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 802	
	EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 606	
	ij.executeImmediate(String) line: 329	
	utilMain.doCatch(String) line: 505	
	utilMain.runScriptGuts() line: 347	
	utilMain.go(LocalizedInput[], LocalizedOutput) line: 245	
	Main.go(LocalizedInput, LocalizedOutput) line: 210	
	Main.mainCore(String[], Main) line: 177	
	Main.main(String[]) line: 73	
	Main.main(String[]) line: 73	
	ij.main(String[]) line: 59	
But it is obvious from the query plan for non-working that somehow we later decide to do sort avoidance for m0. I will look more to see where the optimizer changes it mind about the sort requirement for m0.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_notforcheckin_patch2_051109_diff.txt

I am reattaching the earlier patch with little more comments(DERBY3926_notforcheckin_patch2_051109_diff.txt)  to help understand the code for someone who might be looking at the patch. My patch is adding sort node for the query from the wisconsin test as shown below is
connect 'jdbc:derby:wombat';
select * from tenktup1, tenktup2
where tenktup1.unique1 = tenktup2.unique1
order by tenktup1.unique1;
I think the reason for the sort node might be the code below that I have added in my patch
if (currentRowOrdering.orderingRequiredOnTable(optimizable.getTableNumber()))

With the if statement above, I was trying to see if the current optimizable is covering some columns from the order by and if there are constant predicates on those column. I am trying to do this by just looking at currentRowOrdering object. I think the correct code should look at both requiredRowOrdering provided by the user and current row ordering info for the current join order as collected by the optimizer in the currentRowOrdering object. I will work on fixing the if statement code above to see if it gets rid of the sort node for the query above.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711239#action_12711239 ] 

A B commented on DERBY-3926:
----------------------------

mike> I thought it would be cleaner if all the information that was necessary
mike> [...] was actually already located in the current RowOrdering object. 
mike> [...] The work here is that some more information may need to be tracked
mike> in the RowOrdering.

For what little it's worth, I agree, I think this is a nice idea that would in fact be a
bit cleaner.  I think the current patch is acceptable, as well, but if I had the luxury
of choosing, I'd probably go with Mike's approach, if possible...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

    Derby Categories: [High Value Fix, Wrong query result]  (was: [Wrong query result])

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


Let me know if I am understanding what is going on.
To me what looks like is happening is that the optimizer is looking
at each of the join nodes and says the following:
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 :
    always ordered because we only get rows where table3.value = 21857
    but note that it is a non-unique key so multiple rows can come back.

table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
    here it see's that an index scan on table2 will return keys in table2.valu
    order.

Now it seems like the optimizer is saying that since table3 is always ordered
that it can then consider the result set to be solely ordered on
table2.value.  This would be a correct assumption if table3 returned a single
row, but not correct when it returns multiple rows.

In the bug case it looks to me like table3 returns 3 rows where
table3.value = 21857.

table3.id  | table3.value
----------------------------------------------
2147483653 |21857
2147483654 |21857
4294967297 |21857

Now the next part of the join has been forced to use
table2.nonUniqueOnValue_Table2, which is not useful other than providing
sorted access, so it turns into a table scan. The key is that it turns
into 3 table scans, searching for table3.id=table2.id leading to:

ID         |table2.VALUE |VALUE
----------------------------------------------
2147483653 |000002       |21857
2147483654 |000003       |21857
4294967297 |000001       |21857

The code does THREE full scan using the index and thus the rows are not
ordered by table2.value.  So even though we are using a key that should give
us the correct sorted order on table2.value, we are traversing it more than
once so it does not matter.  If the probes had been done using a single
scan with something like where (id = 2147483653 or 2147483654 or 4294967297)
it would have been sorted correctly, but that is not how joins work.

I believe the execution code is doing the right thing.   It seems to me that
the optimizer code is incorrectly using the "always sorted information"
incorrectly.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


If others who know this code are ok with your patch then I am ok.

In looking at the current code I thought it would be cleaner if all the information that was 
necessary to answer the question of whether the current join ordering met the current
order by requirement was actually already located in the current RowOrdering object.

This would not require pushing the information down into the OrderBy.requiredOrdering()
interface.  Instead the information should already be in the RowOrdering object, so that
OrderBy.sortRequired(), would be implemented as it is currently using just information
from the RowOrdering interface.  The work here is that some more information may need
to be tracked in the RowOrdering to implement the additional logic you have come up with.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Tars Joris (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tars Joris updated DERBY-3926:
------------------------------

    Attachment: derby-reproduce.zip

derby.log
output.txt
script.sql
sysinfo.txt

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I reviewed patch 5 and only comment I think requires change is that patch should enable the test you already added to always be run as part of the suite.

nits:
I find the "alwaysOrdered", isColumnAlwaysOrdered, terminology confusing.   Sometimes we need to be checking one row result set, and sometimes we seem it to mean a single value but may be multiple duplicate values.  It would be nice if the routine names reflected what we are counting on.

The new routine you added  isColumnAlwaysOrdered() is doc'd as:
+    * Return true if the column is always ordered. That will be true if the
+    * column has a constant comparison predicate on it.
But the comment in the call to it seems to say it is expecting something different:
* The current order by column does not have any constant
* comparison predicate on it nor does it belong to a
* single row table which means that the rows will not

o some lines over 80

o mostly uses brace on new line, but on the following uses brace on same line:
 if (moreThanOneTableInJoinOrder) {




> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701802#action_12701802 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Thanks for catching that, Kathey. I don't know the answer whether it is ok to create a SQL test out of an attachement which is not marked grant to ASF. Maybe someone else has a definitive answer.

Tars Joris, do you think you can grant the derby-reproduce.zip to ASF?

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698508#action_12698508 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I have done more debugging on the simpler query with optimizer overrides as shown below
SELECT table1.id, m0.value, m1.value FROM  --DERBY-PROPERTIES joinOrder=FIXED 
table2  m1 -- DERBY-PROPERTIES index=key3
,  table2 m0 -- DERBY-PROPERTIES index=key3
, table1 
WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;

For the query above, when the optimizer will start considering the join orders, it is going to assume table2 m1 is 0, table2 m0 is 1 and table1 is 3. So, when I say in my analysis below, that optimizer is working with say join order 0,1,-1, what I mean is optimizer is considering the join order 
m1,m0,-1. -1 means we are not taking into consideration any table for that position at this point. With that in mind, let me share what is happening and what part of it I don't understand.

For the query above, optimizer first considers 0,-1,-1 and in OptimizerImpl.costBasedCostOptimizable line 2248 which is shown as below
if (requiredRowOrdering.sortRequired(currentRowOrdering,assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
The 2 params to the call above are set as follows
At this point, currentRowOrdering	RowOrderingImpl  (id=62)	
	alwaysOrderedOptimizables	Vector<E>  (id=92)	
		elementCount	0	
	columnsAlwaysOrdered	ColumnOrdering  (id=94)	
		columns	Vector<E>  (id=158)	
		myDirection	3	
		tables	Vector<E>  (id=159)	
	currentColumnOrdering	null	
	ordering	Vector<E>  (id=96)	
		elementCount	0	
	unorderedOptimizables	Vector<E>  (id=97)	
		elementCount	0	
(ColumnsAlwaysOrdered in currentRowOrdering is set to Direction: 3 Table 0, Column 3 Table 0, Column 2)
The 2nd param assignedTableMap just has {0} because we are only considering 0 in the join order.

We return for the call above with true and that qualifies us to consider sort avoidance for m1. One of my question is should m1 get quailified to have sort avoidance set to true when we are not really ordering on any column that comes from m1?

Moving on, once we finish with 0,-1,-1 where we decided that sorting can be avoided for m1, we move to the join order 0,1,-1. Again, we come to the code mentioned above for 0,1,-1
if (requiredRowOrdering.sortRequired(currentRowOrdering,assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
If the earlier decision for considering sort avoidance true for m1 was correct(for join order 0,-1,-1), we should definitely return sort avoidance false for join order 0,1,-1 because with m1 as the outermost table, we can't avoid sorting on m0. The 2 params at this point in code are as follows for the call above
The first param is currentRowOrdering	RowOrderingImpl  (id=62)	
	alwaysOrderedOptimizables	Vector<E>  (id=99)	
		elementCount	0	
	columnsAlwaysOrdered	ColumnOrdering  (id=156)	
		columns	Vector<E>  (id=245)	
		myDirection	3	
		tables	Vector<E>  (id=246)	
	currentColumnOrdering	ColumnOrdering  (id=157)	
		columns	Vector<E>  (id=252)	
		myDirection	1	
		tables	Vector<E>  (id=255)	
	ordering	Vector<E>  (id=103)	
		elementCount	1	
		elementData	Object[10]  (id=259)	
		modCount	3	
	unorderedOptimizables	Vector<E>  (id=104)	
		elementCount	0	
(ColumnsAlwaysOrdered in currentRowOrdering is set to Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2)
(currentColumnOrdering in currentRowOrdering is set to Direction: 1 Table 1, Column 3)
(ordering in currentRowOrdering is set to [Direction: 1 Table 1, Column 3])
The 2nd param assignedTableMap just has {0,1} because we are only considering 0 and 1 in the join order.

Going further deep in 0,1,-1 join order, the call above leads to following line in impl.sql.compile.OrderByList.sortRequired(RowOrdering, JBitSet) line: 535	
if ( ! rowOrdering.alwaysOrdered(cr.getTableNumber())) 

In this if block, we say that sorting is not required. In my mind, we should required sorting because the outermost table m1 is not going to be able to satisfy the ordering requirement for m0.value and hence we can't consider sort avoidance when m1 is outermost. But of course, code is deciding that sort avoidance is ok in this case. I will highly appreciate if someone can help me understand this piece of code which ends up making the decision for sorting required or not required. I think the code is mostly centered in org.apache.derby.impl.sql.compile.RowOrderingImple (protocol class org.apache.derby.iapi.sql.compile.RowOrdering). There is not much of javadoc at the class level which is making it hard me to grasp what this class is supposed to do and what are the different memebers in it. I am going to debug further on my own to try to understand it but any help from community will be appreciated.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

    Attachment: wisconsin_10.1_result.zip

Merging the change to 10.1, I see a diff in wisconsin. The main thing is that it is not using optimizer directives because they do not exist in 10.1 I think. I confirmed that the original repro works on the patched 10.1 and plan to update the wisconsin master, but would appreciate if someone could take a look and make sure the 10.1 output looks ok. Attached is the test output (wisconsin_10.1_result.zip)

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698038#action_12698038 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Yes, even if the cost calculation for a plan is wrong, Derby should not return wrong resutls and hence for this jira entry, at this point, it might not be worth it to pursue the lead if the cost calculation is wrong.

I did find one consistently reproducible SQL which will cause the problem behavior whether we are in the same session where the tables/indexes were created or whether we start a fresh database session. So, once the database has been setup, one can open a new ij session and consistently repro the problem case with following optimizer overrides (this way, one does not have to setup the whole database in the same session as the origina problem SQL to repro the problem)
SELECT table1.id, m0.value, m1.value FROM  --DERBY-PROPERTIES joinOrder=FIXED 
table2  m1 -- DERBY-PROPERTIES index=key3
, table2 m0 -- DERBY-PROPERTIES index=key3
, table1 
WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;

So, the important thing is have both m1 and m0 use the index KEY3 which is on the column value on which ordering is happening for table m0.

Now that I have a simple repro case (ie I don't have to go through countless iteration of optimizer for all different join orders and different predicate pulling down in different join orders), I can focus on the problem join order. 

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699739#action_12699739 ] 

A B commented on DERBY-3926:
----------------------------

For what it's worth, I agree with everything Bryan wrote in his April 16th comment :)

> I think that all *chosen* query plans eventually reach a *complete* join order, but a query
> plan which is *discarded* due to being too expensive may never get beyond a *partial*
> join order. Is that true? 

Yes, that's true.

> The comment ("ORDER BY S.A, T.B, S.C") raises the interesting question of the
> situation in which each column, considered individually, is ordered properly, but
> because the ORDER BY clause interleaves columns from different tables, a sort is
> still required.

Yes, it does.  And as I re-think about this, perhaps the code was written for a situation like

    "ORDER BY S.A, T.B"    -- Note that we do *NOT* have "S.C".

In that case a partial join order with "S" would satisfy the first order by column, and the second order by colum, "T.B", would have a table that is not in the join order.  Without the logic in question, I think the method would determine that a sort was required because table "T" wasn't found.  But the logic in question would see if there was anything 'after' T.B, and since there isn't, it would say that the partial join order can avoid a sort *so far*, with the assumption that if the next optimizable to be placed in the join order is "T", we might be able to avoid the sort entirely.

As soon as "S.C" gets added to the list, though, the logic sees that we have interleaving columns and therefore correctly requires a sort.

So *if* that's a correct statement of how the code is *supposed* to work, then it is actually quite useful and it does make sense.  But there seems to be a glitch in the logic--namely, it should perhaps require that a) all of the tables for the LEADING SET of order by columns, up to the one whose table cannot be found, MUST exist within the join order, *and* b) the leading set of order by columns canNOT be empty.  I think the code as written checks for "a", but it does not check for "b".

So in the query for this issue, we have "ORDER BY m0.value".  When we get a partial join order with { m1 } in it, we check the order by column "m0.value" and find that "m0" is not in the (partial) join order.  Today, due to the lack of condition "b", we think we can avoid the sort.  But if condition "b" was in place, we would see that the "leading set" of order by columns--i.e. the number of order by columns before "m0.value", is EMPTY, which means that "so far" nothing is sorted, and thus the sort would be required.

I haven't actually tried that out, I'm just writing as things occur to me, so this could be incomplete and/or entirely incorrect...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701593#action_12701593 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I am working on the wisconsin test so I can print out the test results and see if the code changes suggested by Army affects the data output in anyways.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3926:
----------------------------------

    Attachment: derby-3926_10.3_mergeattempt.txt

Attached is a patch for my first attempt at merging this to 10.3.  This should not be committed.

  The new test is failing seeing extra columns in the query:
  sql1 = "select c.col1, b.col1, a.col1 from a, b, c where a.col1=1 "+
        "and b.col1 = 2 and c.col1=3 order by c.col1, b.col1, a.col1";

Probably one of those phantom column bugs needs to be backported too.  I'll take a look and see.  
) testAdditionalOrderByCases(org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance)junit.framewo
sertionFailedError: Unexpected column count: expected:<3> but was:<5>
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:750)
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:700)
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:658)
       at org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance.testAdditionalOrderByCases(Order
SortAvoidance.java:10102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
       at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:89)
       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
       at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
       at junit.extensions.TestSetup.run(TestSetup.java:23)
       at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
) testAdditionalOrderByCases(org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance)junit.framewo
sertionFailedError: Unexpected column count: expected:<3> but was:<5>
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:750)
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:700)
       at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java:658)
       at org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance.testAdditionalOrderByCases(Order
SortAvoidance.java:10102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
       at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:89)
       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
       at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
       at junit.extensions.TestSetup.run(TestSetup.java:23)
       at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
       at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
       at junit.extensions.TestSetup.run(TestSetup.java:23)
       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
       at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
       at junit.extensions.TestSetup.run(TestSetup.java:23)
       at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)

AILURES!!!
ests run: 10,  Failures: 2,  Errors: 0


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


I was expecting a fix like the following, maybe I don't understand the code path that you
are proposing.  The key point seems to be whether the other optimizables in the join order which have been marked
"already sorted" are single row result sets or not.

I don't think the issue is whether a "useful" index is being used or not.  Even a useful index
in the middle of a join may result in multiple probes.

So was expecting something like:

if (not outer most join)
    if (all previous join nodes which have been marked already sorted are single row result sets)
        consider sort avoidance on the current join node



> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683974#action_12683974 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I ran the reporducible test case with couple different releases of Derby and noticed following
1)on trunk, if I quit out of the ij session which does the setup and runs the query with incorrect results, and then start a new session and run the query again it returns correct result. I didn't have to drop and recreate the index for it to work. The plan used in the 2 cases for the query are different
2)I found the same behavior in 10.2 codeline
3)With 10.1 codeline, I ran into npe when trying to run the setup script. Not sure what is going on there. I will enter a jira entry for the 10.1 behavior.


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Tars Joris (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tars Joris updated DERBY-3926:
------------------------------

    Attachment: test-script.zip

Test script that can be included in the Test-Suite.

Some data was changed, but I verified that it still reproduced the bug.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717880#action_12717880 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Committed the patch into trunk with revision 783168. In next couple days, will start backporting it to earlier codelines.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_patch3_051509_stat.txt
                DERBY3926_patch3_051509_diff.txt

I have attached a patch (DERBY3926_patch3_051509_diff.txt)  that fixes the reproducible order by query case. The problem was that when considering inner optimizable nodes that required ordering(but the user query has no constant comparison predicate(s) on those columns), we did not check if the previous optimizables all returned single rows before deciding to avoid sorting. If the previous optimizables return more than one row, then that would require multiple scans into the inner optimizable and the rows satisfied by the multiple scans may or may not be ordered as per the user query requirement. 

The new logic has gone into impl\sql\compile\OptimizerImpl.java and the supporting code to determine if the current inner optimizable requires ordering on columns with no constant comparison predicates on them is in impl\sql\compile\OrderByList.java

The logic has also been explained in detail in OptimizerImpl.java I will appreciate if someone can take a look at it. I have run junit tests and derbyall. There are two failures in derbyall. One of them is for T_RawStoreFactory which I think is existing known jira issue DERBY-3993. The other failure is in wisconsin test. Only one sql is failing in wisconsin. That sql with some optimizer overrides looks as follows.
select * from   --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1   -- DERBY-PROPERTIES index=TK1UNIQUE1
, TENKTUP2  -- DERBY-PROPERTIES index=TK2UNIQUE1
where TENKTUP1.unique1 = TENKTUP2.unique1
order by TENKTUP1.unique1, TENKTUP2.unique1;

As the name of the columns suggest, there are unique indexes on the columns we are dealing with in the query above. With my changes in the patch, now we are requiring a sort on the top of the query plan. Prior to my changes, we were avoiding sort on this query. The reason we are requiring sort as per the new logic is - The outermost optimizable is TENKTUP1 and it is going to return more than one row. Next, we consider TENKTUP2 as the inner optimizable. We see that user has requested ordering on
TENKTUP2.unique1 and there is no constant predicate on TENKTUP2.unique1 AND the previous optimizable is not one row resultset and because of these conditions, we require that sorting is necessary. I think ideally, we should be able to avoid sort because even though the previous optimizable is returning more than one row, the current optimizable has equality check with the previous optimizable (on the ordered columns) and hence even though there will be multiple scans into current optimizable, the rows will
all be ordered because of the equality check. I haven't given this additional logic much thought. I will look more into it.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723318#action_12723318 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Kathey, I looked through the diff and like you said, it may have to do with no way of using optimizer overrides to have the optimizer use the specific join order. One other thing that can be done(though not necessary) is to run it on other codelines without optimizer overrides to see if we similar diff there.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>         Environment: Checked into 10.3. This will still go to 10.2 and 10.1
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.1.2, 11.0.0.0
>
>         Attachments: d3926_repro.sql, derby-3926_10.3_mergeattempt.txt, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, DERBY3926_patch5_052709_diff.txt, DERBY3926_patch5_052709_stat.txt, DERBY3926_patch6_060309_diff.txt, DERBY3926_patch6_060309_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip, wisconsin_10.1_result.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3926:
----------------------------------


My take on the most recent queryplan that you posted is that it does need a sort node as the outermost node
is a tablescan, not an ordered index scan.  Having said that it seems likely that somehow your change incorrectly
eliminated a sort avoidance path, and then that plan was costed differently and not chosen.  To be sure you could
take the problem wisconsin query, force the old join order/index choice using hints and then see if your new code
chooses sort avoidance.

It seems to me the code you want to change is the path in current code where it has just chosen to add a node
to the query ordering vector (not look at the ordering vector after the fact).  At the point the code is about to add
a node, then you want to do the checks you have in your patch.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711252#action_12711252 ] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

Mike asked "Do you know how the current code determines that sort can be avoided in the one remaining 
case? Does it do something special with the equality condition? " I do not know how the code knows about avoiding the sort in case of following query. I will spend some time understanding the code path.
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt, DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt, DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024). I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.