You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Mamta Satoor <ms...@gmail.com> on 2005/05/06 01:47:57 UTC

[PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Hi,

I have a patch for this optimizer bug. Basically, the issue turned out
to be the logic for DISTINCT elimination. During the optimization
phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
class's returnsAtMostSingleRow() method gets called. This method
returns true if the method concludes that DISTINCT in the query is
redundant (based on a complex logic that decides that the query is
going to return distinct rows on its own without the DISTINCT clause.
The details of the current logic for DISTINCT elimination can be found
in the comments at the method level.)

For the query in question in this bug, the method returned true for
DISTINCT elimination which is wrong. The explanation is as follows.

First of all, I was able to simplify the query reported in the bug to
following query.
select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
(select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))

This query gets converted to following during optimization
select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
(select  q3."NO1" from IDEPT q3 where
(  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;

This optimized query has 4 predicates associated with it
q3.reportto_no = q2.no1
q2.discrim_dept = 'HardwareDept'
q1.descrim_dept = 'SoftwareDept'
q1.no1 = q3.no1

Next, on this optimized query(since it has DISTINCT clause in it), the
returnsAtMostSingleRow() method gets called. The method incorrectly
returns true indicating that DISTINCT can be eliminated. The reason
for this is that method is looking at predicates that belong to the
inside query with the exists clause (which is on table IDEPT q3) to
determine DISTINCT elimination for the outer level.

The fix is that the predicates from the exists query, (in this
particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
should not be considered when deciding elimination of DISTINCT in the
outer query. That is what the attached patch does.

Hope this helps understand the problem and the proposed fix for it.
The files impacted by the change are as follows
svn stat
M java\engine\org\apache\derby\impl\sql\compile\FromList.java
M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out

Please send in comments you may have. I have run the existing tests
and the patch didn't cause any failures.

thanks,
Mamta

Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Posted by Mamta Satoor <ms...@gmail.com>.
Can someone please review this for me?

thanks,
Mamta

On 5/5/05, Mamta Satoor <ms...@gmail.com> wrote:
> Hi,
> 
> I have a patch for this optimizer bug. Basically, the issue turned out
> to be the logic for DISTINCT elimination. During the optimization
> phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
> class's returnsAtMostSingleRow() method gets called. This method
> returns true if the method concludes that DISTINCT in the query is
> redundant (based on a complex logic that decides that the query is
> going to return distinct rows on its own without the DISTINCT clause.
> The details of the current logic for DISTINCT elimination can be found
> in the comments at the method level.)
> 
> For the query in question in this bug, the method returned true for
> DISTINCT elimination which is wrong. The explanation is as follows.
> 
> First of all, I was able to simplify the query reported in the bug to
> following query.
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> (select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
> 
> This query gets converted to following during optimization
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
> (select  q3."NO1" from IDEPT q3 where
> (  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
> 
> This optimized query has 4 predicates associated with it
> q3.reportto_no = q2.no1
> q2.discrim_dept = 'HardwareDept'
> q1.descrim_dept = 'SoftwareDept'
> q1.no1 = q3.no1
> 
> Next, on this optimized query(since it has DISTINCT clause in it), the
> returnsAtMostSingleRow() method gets called. The method incorrectly
> returns true indicating that DISTINCT can be eliminated. The reason
> for this is that method is looking at predicates that belong to the
> inside query with the exists clause (which is on table IDEPT q3) to
> determine DISTINCT elimination for the outer level.
> 
> The fix is that the predicates from the exists query, (in this
> particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
> should not be considered when deciding elimination of DISTINCT in the
> outer query. That is what the attached patch does.
> 
> Hope this helps understand the problem and the proposed fix for it.
> The files impacted by the change are as follows
> svn stat
> M java\engine\org\apache\derby\impl\sql\compile\FromList.java
> M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> 
> Please send in comments you may have. I have run the existing tests
> and the patch didn't cause any failures.
> 
> thanks,
> Mamta
> 
> 
>

Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Posted by Jack Klebanoff <kl...@sbcglobal.net>.
+1

The revised patch looks OK to me. I think that it should be submitted if 
derbyall passes. Probably derbylang is enough.

Jack Klebanoff
Mamta Satoor wrote:

>Hi Jack,
>
>Appreciate you taking the time to do the review and catching the
>predicate with expression. I have changed the code and also added a
>test case for it. I have fired the derbyall suite on my codeline to
>make sure everything else runs smoothly. Attached is the updated patch
>anyways.
>
>thanks,
>Mamta
>
>On 5/11/05, Jack Klebanoff <kl...@sbcglobal.net> wrote:
>  
>
>>The patch does not completely fix the problem. It does not handle the
>>case where the exists table column is embedded in an expression. Try the
>>following variation on the test select:
>>
>>select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
>>where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
>>and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
>>(select  q3."NO1" from IDEPT q3 where  ( ABS(q3."REPORTTO_NO") =  q2."NO1")))
>>
>>Because q3."REPORTTO_NO" is inside a call to ABS the code added to
>>FromList.returnsAtMostSingleRow does not see it.
>>
>>I would suggest using
>>
>>                       JBitSet referencedTables =
>>and.getLeftOperand().getTablesReferenced();
>>                       if( referencedTables.get( existsTableNumber))
>>                       {
>>
>>predicatesTemp.removeElementAt(predicatesTempIndex);
>>                           break;
>>                       }
>>
>>instead of
>>                       BinaryRelationalOperatorNode beon =
>>(BinaryRelationalOperatorNode)
>>                           and.getLeftOperand();
>>                       ValueNode left = beon.getLeftOperand();
>>                       ValueNode right = beon.getRightOperand();
>>
>>                       /* If left or right side of predicate refer to
>>exists base table,
>>                       then remove it */
>>                       if ((left instanceof ColumnReference) &&
>>                           ((ColumnReference) left).getTableNumber() ==
>>existsTableNumber)
>>                       {
>>
>>predicatesTemp.removeElementAt(predicatesTempIndex);
>>                           break;
>>                       }
>>                       else if ((right instanceof ColumnReference) &&
>>                           ((ColumnReference) right).getTableNumber()
>>== existsTableNumber)
>>                       {
>>
>>predicatesTemp.removeElementAt(predicatesTempIndex);
>>                           break;
>>                       }
>>
>>I have tried it out and it seems to work.
>>
>>Jack Klebanoff
>>
>>    
>>


Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Posted by Mamta Satoor <ms...@gmail.com>.
Thanks for committing the patch. Also, the tests ran fine with no
failure with the patch.

Mamta

On 5/11/05, Satheesh Bandaram <sa...@sourcery.org> wrote:
> Submitted this patch. Thanks for addressing the review comments quickly.
> 
> Satheesh
> 
> Sending       
> java\engine\org\apache\derby\impl\sql\compile\FromList.java
> Sending       
> java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> Sending       
> java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> Transmitting file data ...
> Committed revision 169735.
> 
> Mamta Satoor wrote:
> 
> Hi Jack,

Appreciate you taking the time to do the review and catching
> the
predicate with expression. I have changed the code and also added a
test
> case for it. I have fired the derbyall suite on my codeline to
make sure
> everything else runs smoothly. Attached is the updated
> patch
anyways.

thanks,
Mamta

On 5/11/05, Jack Klebanoff
> <kl...@sbcglobal.net> wrote:
 
> The patch does not completely fix the problem. It does not handle the
case
> where the exists table column is embedded in an expression. Try
> the
following variation on the test select:

select distinct q1."NO1" from
> IDEPT q1, IDEPT q2
where ( q2."DISCRIM_DEPT" = 'HardwareDept')
and (
> q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
(select q3."NO1"
> from IDEPT q3 where ( ABS(q3."REPORTTO_NO") = q2."NO1")))

Because
> q3."REPORTTO_NO" is inside a call to ABS the code added
> to
FromList.returnsAtMostSingleRow does not see it.

I would suggest using

> JBitSet referencedTables =
and.getLeftOperand().getTablesReferenced();
 if(
> referencedTables.get( existsTableNumber))
> {

predicatesTemp.removeElementAt(predicatesTempIndex);
 break;
 }

instead
> of
 BinaryRelationalOperatorNode beon =
(BinaryRelationalOperatorNode)
> and.getLeftOperand();
 ValueNode left = beon.getLeftOperand();
 ValueNode
> right = beon.getRightOperand();

 /* If left or right side of predicate
> refer to
exists base table,
 then remove it */
 if ((left instanceof
> ColumnReference) &&
 ((ColumnReference) left).getTableNumber()
> ==
existsTableNumber)
> {

predicatesTemp.removeElementAt(predicatesTempIndex);
 break;
 }
 else if
> ((right instanceof ColumnReference) &&
 ((ColumnReference)
> right).getTableNumber()
== existsTableNumber)
> {

predicatesTemp.removeElementAt(predicatesTempIndex);
 break;
 }

I have
> tried it out and it seems to work.

Jack Klebanoff

Mamta Satoor wrote:

 
> Hi,

I have a patch for this optimizer bug. Basically, the issue turned
> out
to be the logic for DISTINCT elimination. During the optimization
phase,
> if a query has DISTINCT clause, then impl.sql.compile.FromList
class's
> returnsAtMostSingleRow() method gets called. This method
returns true if the
> method concludes that DISTINCT in the query is
redundant (based on a complex
> logic that decides that the query is
going to return distinct rows on its
> own without the DISTINCT clause.
The details of the current logic for
> DISTINCT elimination can be found
in the comments at the method level.)

For
> the query in question in this bug, the method returned true for
DISTINCT
> elimination which is wrong. The explanation is as follows.

First of all, I
> was able to simplify the query reported in the bug to
following
> query.
select distinct q1."NO1" from IDEPT q1, IDEPT q2
where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
(select q3."NO1" from IDEPT q3 where
> (q3."REPORTTO_NO" = q2."NO1")))

This query gets converted to following
> during optimization
select distinct q1."NO1" from IDEPT q1, IDEPT q2
where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and not exists (
(select q3."NO1" from IDEPT q3 where
(
> q3."REPORTTO_NO" = q2."NO1" and q3."NO1" = q1."NO1") ) ) ;

This optimized
> query has 4 predicates associated with it
q3.reportto_no =
> q2.no1
q2.discrim_dept = 'HardwareDept'
q1.descrim_dept =
> 'SoftwareDept'
q1.no1 = q3.no1

Next, on this optimized query(since it has
> DISTINCT clause in it), the
returnsAtMostSingleRow() method gets called. The
> method incorrectly
returns true indicating that DISTINCT can be eliminated.
> The reason
for this is that method is looking at predicates that belong to
> the
inside query with the exists clause (which is on table IDEPT q3)
> to
determine DISTINCT elimination for the outer level.

The fix is that the
> predicates from the exists query, (in this
particular case, q3."NO1" =
> q1."NO1" and q3.reportto_no = q2.no1)
should not be considered when deciding
> elimination of DISTINCT in the
outer query. That is what the attached patch
> does.

Hope this helps understand the problem and the proposed fix for
> it.
The files impacted by the change are as follows
svn stat
M
> java\engine\org\apache\derby\impl\sql\compile\FromList.java
M
> java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
M
> java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out

Please
> send in comments you may have. I have run the existing tests
and the patch
> didn't cause any failures.

thanks,
Mamta


 
> >
 
> ________________________________
> 
Index:
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
===================================================================
---
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
> (revision 169642)
+++
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
> (working copy)
@@ -1040,7 +1040,7 @@
 }
 
 /**
- * Decrement (query block)
> level (0-based) for 
+ * Decrement (query block) level (0-based) for
 * all
> of the tables in this from list.
 * This is useful when flattening a
> subquery.
 *
@@ -1069,7 +1069,7 @@
 }
 }
 
- 
+
 /**
 * This method is used
> for both subquery flattening and distinct
 * elimination based on a
> uniqueness condition. For subquery
@@ -1079,7 +1079,7 @@
 * any duplicates.
> * This is true if every table in the from list is
 * (a base table and the
> set of columns from the table that
- * are in equality comparisons with
> expressions that do not include columns 
+ * are in equality comparisons
> with expressions that do not include columns
 * from the same table is a
> superset of any unique index
 * on the table) or an EXISTS FBT. In addition,
> at least 1 of the tables
 * in the list has a set of columns in equality
> comparisons with expressions
@@ -1105,28 +1105,28 @@
 * create an array of
> columns from the table(eqOuterCol)
 * (this is used to determine that only
> one row will be returned
 * from a join)
- * 
+ *
 * if the current table is
> the table for the result columns
 * set the result columns in the eqOuterCol
> and tableColMap
 * (if these columns are a superset of a unique index and
 *
> all joining tables result in only one row, the
 * results will be distinct)
> * go through all the predicates and update tableColMap and
- * eqOuterCol
> with join columns and correlation variables, 
+ * eqOuterCol with join
> columns and correlation variables,
 * parameters and constants
 * since
> setting constants, correlation variables and parameters,
- * reduces the
> number of columns required for uniqueness in a 
+ * reduces the number of
> columns required for uniqueness in a
 * multi-column index, they are set for
> all the tables (if the
 * table is not the result table, in this case only
> the column of the
 * result table is set)
 * join columns are just updated
> for the column in the row of the
 * joining table.
- * 
- * check if the
> marked columns in tableColMap are a superset of a unique 
- * index 
+ *
+ *
> check if the marked columns in tableColMap are a superset of a unique
+ *
> index
 * (This means that the join will only produce 1 row when joined
 *
> with 1 row of another table)
- * check that there is a least one table for
> which the columns in 
+ * check that there is a least one table for which
> the columns in
 * eqOuterCol(i.e. constant values) are a superset of a
> unique index
 * (This quarantees that there will be only one row selected
 *
> from this table).
@@ -1134,8 +1134,8 @@
 * Once all tables have been
> evaluated, check that all the tables can be
 * joined by unique index or
> will have only one row
 *
- * 
 *
+ *
 * @param rcl If non-null, the RCL
> from the query block.
 * If non-null for subqueries, then entry can
 * be
> considered as part of an = comparison.
@@ -1150,8 +1150,8 @@
 *
 *
> @exception StandardException Thrown on error
 */
- boolean
> returnsAtMostSingleRow(ResultColumnList rcl, 
- ValueNode
> whereClause, 
+ boolean
> returnsAtMostSingleRow(ResultColumnList rcl,
+ ValueNode
> whereClause,
 PredicateList wherePredicates,
 DataDictionary dd)
 throws
> StandardException
@@ -1160,6 +1160,13 @@
 int[] tableNumbers;
> ColumnReference additionalCR = null;
 
+ PredicateList predicatesTemp;
+
> predicatesTemp = (PredicateList) getNodeFactory().getNode(
+
> C_NodeTypes.PREDICATE_LIST, getContextManager());
+ int wherePredicatesSize
> = wherePredicates.size();
+ for (int index = 0; index < wherePredicatesSize;
> index++)
+
> predicatesTemp.addPredicate((Predicate)wherePredicates.elementAt(index));
+
> /* When considering subquery flattening, we are interested
 * in the 1st
> (and only) entry in the RCL. (The RCL will be
 * null if result column is
> not of interest for subquery flattening.)
@@ -1193,6 +1200,77 @@
 {
 return
> false;
 }
+ FromBaseTable fbt = (FromBaseTable) prn.getChildResult();
+
> //Following for loop code is to take care of Derby-251 (DISTINCT returns
+
> //duplicate rows).
+ //Derby-251 returned duplicate rows because we were
> looking at predicates
+ //that belong to existsTable to determine DISTINCT
> elimination
+ //
+ //(Check method level comments to understand DISTINCT
> elimination rules.)
+ //
+ //For one specific example, consider the query
> below
+ //select distinct q1."NO1" from IDEPT q1, IDEPT q2
+ //where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
+ //and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
+ //(select q3."NO1" from IDEPT q3
> where (q3."REPORTTO_NO" = q2."NO1")))
+ //(select q3."NO1" from IDEPT q3
> where ( ABS(q3."REPORTTO_NO") = q2."NO1")))
+ //
+ //Table IDEPT in the
> query above has a primary key defined on column "NO1"
+ //This query gets
> converted to following during optimization
+ //
+ //select distinct q1."NO1"
> from IDEPT q1, IDEPT q2
+ //where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+
> //and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and not exists (
+ //(select
> q3."NO1" from IDEPT q3 where
+ //( ( ABS(q3."REPORTTO_NO") = q2."NO1") and
> q3."NO1" = q1."NO1") ) ) ;
+ //
+ //For the optimized query above, Derby
> generates following predicates.
+ //ABS(q3.reportto_no) = q2.no1
+
> //q2.discrim_dept = 'HardwareDept'
+ //q1.descrim_dept = 'SoftwareDept'
+
> //q1.no1 = q3.no1
+ //The predicate ABS(q3."NO1") = q1."NO1" should not be
> considered when trying
+ //to determine if q1 in the outer query has
> equality comparisons. 
+ //Similarly, the predicate q3.reportto_no = q2.no1
> should not be
+ //considered when trying to determine if q2 in the outer
> query has
+ //equality comparisons. To achieve this, predicates based on
> exists base
+ //table q3 (the first and the last predicate) should be
> removed while
+ //evaluating outer query for uniqueness.
+ //
+ if
> (fbt.getExistsBaseTable())
+ {
+ int existsTableNumber =
> fbt.getTableNumber();
+ int predicatesTempSize = predicatesTemp.size();
+
> for (int predicatesTempIndex = predicatesTempSize-1;
+ predicatesTempIndex
> >= 0; predicatesTempIndex--)
+ {
+ AndNode topAndNode = (AndNode)
+
> ((Predicate)
> predicatesTemp.elementAt(predicatesTempIndex)).getAndNode();
+
+
> for (ValueNode whereWalker = topAndNode; whereWalker instanceof AndNode;
+
> whereWalker = ((AndNode) whereWalker).getRightOperand())
+ {
+ // See if
> this is a candidate =
+ AndNode and = (AndNode) whereWalker;
+
+ //we only
> need to worry about equality predicates because only those
+ //predicates
> are considered during DISTINCT elimination.
+ if
> (!and.getLeftOperand().isRelationalOperator() ||
+
> !(((RelationalOperator)(and.getLeftOperand())).getOperator() ==
+
> RelationalOperator.EQUALS_RELOP))
+ {
+ continue;
+ }
+
+ JBitSet
> referencedTables = and.getLeftOperand().getTablesReferenced();
+ if
> (referencedTables.get(existsTableNumber))
+ {
+
> predicatesTemp.removeElementAt(predicatesTempIndex);
+ break;
+ }
+ }
+ }
+
> }
 }
 
 /* Build an array of tableNumbers from this query block.
@@ -1245,7
> +1323,7 @@
 /* Now see if there are any equality conditions
 * of interest
> in the where predicates.
 */
-
> wherePredicates.checkTopPredicatesForEqualsConditions(
+
> predicatesTemp.checkTopPredicatesForEqualsConditions(
> tableNumber, eqOuterCols, tableNumbers,
 tableColMap[index],
> resultColTable);
 
@@ -1298,7 +1376,7 @@
 /* unique key join - exists tables
> already marked as 
 * 1 row - so don't need to look at them
 */
- if
> (!oneRow[i] && tableColMap[i][index].get(0)) 
+ if (!oneRow[i] &&
> tableColMap[i][index].get(0))
 {
 oneRow[i] = true;
 foundOneRow =
> true;
Index:
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
===================================================================
---
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
> (revision 169642)
+++
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
> (working copy)
@@ -12,6 +12,11 @@
 create unique index three_c1 on
> three(c1);
 create table four(c1 int, c2 int, c3 int, c4 int, c5 int);
> create unique index four_c1c3 on four(c1, c3);
+CREATE TABLE "APP"."IDEPT"
> ("DISCRIM_DEPT" VARCHAR(32), "NO1" INTEGER NOT NULL, 
+"NAME" VARCHAR(50),
> "AUDITOR_NO" INTEGER, "REPORTTO_NO" INTEGER, "HARDWAREASSET"
+ VARCHAR(15),
> "SOFTWAREASSET" VARCHAR(15));
+-- primary/unique
+ALTER TABLE "APP"."IDEPT"
> ADD CONSTRAINT "PK_IDEPT" PRIMARY KEY ("NO1");
 
 insert into one values (1,
> 1, 1, 1, 1);
 insert into one values (2, 1, 1, 1, 1);
@@ -51,6 +56,12 @@
> insert into four values (3, 1, 2, 1, 1);
 insert into four values (3, 1, 3,
> 1, 1);
 
+insert into idept values ('Dept', 1, 'Department1', null, null,
> null, null);
+insert into idept values ('HardwareDept', 2, 'Department2',
> 25, 1, 'hardwareaset2', null);
+insert into idept values ('HardwareDept', 3,
> 'Department3', 25, 2, 'hardwareaset3', null);
+insert into idept values
> ('SoftwareDept', 4, 'Department4', 25, 1, null, 'softwareasset4');
+insert
> into idept values ('SoftwareDept', 5, 'Department5', 30, 4, null,
> 'softwareasset5');
+
 call
> SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
> maximumdisplaywidth 20000;
 
@@ -61,6 +72,24 @@
 -- Following runtime
> statistics output should have Distinct Scan in it
 values
> SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
 
+--Derby251
> Distinct should not get eliminated for following query
+--because there is
> no equality condition on unique column of table
+--in the outside
> query
+select distinct q1."NO1", q1."NAME", q1."AUDITOR_NO",
> 
+q1."REPORTTO_NO", q1."DISCRIM_DEPT", q1."SOFTWAREASSET" from 
+IDEPT q1,
> IDEPT q2 where ( q2."DISCRIM_DEPT" = 'HardwareDept') 
+ and (
> q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" 
+<> ALL ( select
> q3."NO1" from IDEPT q3 where ( ( 
+q3."DISCRIM_DEPT" = 'Dept') or (
> q3."DISCRIM_DEPT" = 
+'HardwareDept') or ( q3."DISCRIM_DEPT" =
> 'SoftwareDept') ) 
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+--
+--Another
> test case of Derby251 where the exists table column is embedded in an
> expression.
+select distinct q1."NO1" from IDEPT q1, IDEPT q2
+where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
+and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
+(select q3."NO1" from IDEPT q3 where
> ( ABS(q3."REPORTTO_NO") = q2."NO1")));
+
 -- result ordering is not
> guaranteed, but order by clause will change how
 -- distinct is executed. So
> test by retrieving data into a temp table and
 -- return results ordered
> after making sure the query was executed as expected.
@@ -170,3 +199,4 @@
> drop table two;
 drop table three;
 drop table four;
+drop table
> idept;
Index:
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
===================================================================
---
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
> (revision 169642)
+++
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
> (working copy)
@@ -19,6 +19,13 @@
 0 rows inserted/updated/deleted
 ij>
> create unique index four_c1c3 on four(c1, c3);
 0 rows
> inserted/updated/deleted
+ij> CREATE TABLE "APP"."IDEPT" ("DISCRIM_DEPT"
> VARCHAR(32), "NO1" INTEGER NOT NULL, 
+"NAME" VARCHAR(50), "AUDITOR_NO"
> INTEGER, "REPORTTO_NO" INTEGER, "HARDWAREASSET"
+ VARCHAR(15),
> "SOFTWAREASSET" VARCHAR(15));
+0 rows inserted/updated/deleted
+ij> --
> primary/unique
+ALTER TABLE "APP"."IDEPT" ADD CONSTRAINT "PK_IDEPT" PRIMARY
> KEY ("NO1");
+0 rows inserted/updated/deleted
 ij> insert into one values
> (1, 1, 1, 1, 1);
 1 row inserted/updated/deleted
 ij> insert into one values
> (2, 1, 1, 1, 1);
@@ -87,6 +94,16 @@
 1 row inserted/updated/deleted
 ij>
> insert into four values (3, 1, 3, 1, 1);
 1 row
> inserted/updated/deleted
+ij> insert into idept values ('Dept', 1,
> 'Department1', null, null, null, null);
+1 row inserted/updated/deleted
+ij>
> insert into idept values ('HardwareDept', 2, 'Department2', 25, 1,
> 'hardwareaset2', null);
+1 row inserted/updated/deleted
+ij> insert into
> idept values ('HardwareDept', 3, 'Department3', 25, 2, 'hardwareaset3',
> null);
+1 row inserted/updated/deleted
+ij> insert into idept values
> ('SoftwareDept', 4, 'Department4', 25, 1, null, 'softwareasset4');
+1 row
> inserted/updated/deleted
+ij> insert into idept values ('SoftwareDept', 5,
> 'Department5', 30, 4, null, 'softwareasset5');
+1 row
> inserted/updated/deleted
 ij> call
> SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
 0 rows
> inserted/updated/deleted
 ij> maximumdisplaywidth 20000;
@@ -143,6 +160,31
> @@
 None
 next qualifiers:
 None
+ij> --Derby251 Distinct should not get
> eliminated for following query
+--because there is no equality condition on
> unique column of table
+--in the outside query
+select distinct q1."NO1",
> q1."NAME", q1."AUDITOR_NO", 
+q1."REPORTTO_NO", q1."DISCRIM_DEPT",
> q1."SOFTWAREASSET" from 
+IDEPT q1, IDEPT q2 where ( q2."DISCRIM_DEPT" =
> 'HardwareDept') 
+ and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1"
> 
+<> ALL ( select q3."NO1" from IDEPT q3 where ( ( 
+q3."DISCRIM_DEPT" =
> 'Dept') or ( q3."DISCRIM_DEPT" = 
+'HardwareDept') or ( q3."DISCRIM_DEPT" =
> 'SoftwareDept') ) 
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+NO1 |NAME
> |AUDITOR_NO |REPORTTO_NO|DISCRIM_DEPT |SOFTWAREASSET
> 
+---------------------------------------------------------------------------------------------------------------------------------------
+4
> |Department4 |25 |1 |SoftwareDept |softwareasset4 
+5 |Department5 |30 |4
> |SoftwareDept |softwareasset5 
+ij> --
+--Another test case of Derby251
> where the exists table column is embedded in an expression.
+select distinct
> q1."NO1" from IDEPT q1, IDEPT q2
+where ( q2."DISCRIM_DEPT" =
> 'HardwareDept')
+and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <>
> ALL
+(select q3."NO1" from IDEPT q3 where ( ABS(q3."REPORTTO_NO") =
> q2."NO1")));
+NO1 
+-----------
+4 
+5 
 ij> -- result ordering is not
> guaranteed, but order by clause will change how
 -- distinct is executed. So
> test by retrieving data into a temp table and
 -- return results ordered
> after making sure the query was executed as expected.
@@ -2319,4 +2361,6 @@
> 0 rows inserted/updated/deleted
 ij> drop table four;
 0 rows
> inserted/updated/deleted
+ij> drop table idept;
+0 rows
> inserted/updated/deleted
 ij> 
 
>

Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Posted by Mamta Satoor <ms...@gmail.com>.
Hi Jack,

Appreciate you taking the time to do the review and catching the
predicate with expression. I have changed the code and also added a
test case for it. I have fired the derbyall suite on my codeline to
make sure everything else runs smoothly. Attached is the updated patch
anyways.

thanks,
Mamta

On 5/11/05, Jack Klebanoff <kl...@sbcglobal.net> wrote:
> The patch does not completely fix the problem. It does not handle the
> case where the exists table column is embedded in an expression. Try the
> following variation on the test select:
> 
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> (select  q3."NO1" from IDEPT q3 where  ( ABS(q3."REPORTTO_NO") =  q2."NO1")))
> 
> Because q3."REPORTTO_NO" is inside a call to ABS the code added to
> FromList.returnsAtMostSingleRow does not see it.
> 
> I would suggest using
> 
>                        JBitSet referencedTables =
> and.getLeftOperand().getTablesReferenced();
>                        if( referencedTables.get( existsTableNumber))
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
> 
> instead of
>                        BinaryRelationalOperatorNode beon =
> (BinaryRelationalOperatorNode)
>                            and.getLeftOperand();
>                        ValueNode left = beon.getLeftOperand();
>                        ValueNode right = beon.getRightOperand();
> 
>                        /* If left or right side of predicate refer to
> exists base table,
>                        then remove it */
>                        if ((left instanceof ColumnReference) &&
>                            ((ColumnReference) left).getTableNumber() ==
> existsTableNumber)
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
>                        else if ((right instanceof ColumnReference) &&
>                            ((ColumnReference) right).getTableNumber()
> == existsTableNumber)
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
> 
> I have tried it out and it seems to work.
> 
> Jack Klebanoff
> 
> Mamta Satoor wrote:
> 
> >Hi,
> >
> >I have a patch for this optimizer bug. Basically, the issue turned out
> >to be the logic for DISTINCT elimination. During the optimization
> >phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
> >class's returnsAtMostSingleRow() method gets called. This method
> >returns true if the method concludes that DISTINCT in the query is
> >redundant (based on a complex logic that decides that the query is
> >going to return distinct rows on its own without the DISTINCT clause.
> >The details of the current logic for DISTINCT elimination can be found
> >in the comments at the method level.)
> >
> >For the query in question in this bug, the method returned true for
> >DISTINCT elimination which is wrong. The explanation is as follows.
> >
> >First of all, I was able to simplify the query reported in the bug to
> >following query.
> >select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> >where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> >(select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
> >
> >This query gets converted to following during optimization
> >select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> >where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
> >(select  q3."NO1" from IDEPT q3 where
> >(  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
> >
> >This optimized query has 4 predicates associated with it
> >q3.reportto_no = q2.no1
> >q2.discrim_dept = 'HardwareDept'
> >q1.descrim_dept = 'SoftwareDept'
> >q1.no1 = q3.no1
> >
> >Next, on this optimized query(since it has DISTINCT clause in it), the
> >returnsAtMostSingleRow() method gets called. The method incorrectly
> >returns true indicating that DISTINCT can be eliminated. The reason
> >for this is that method is looking at predicates that belong to the
> >inside query with the exists clause (which is on table IDEPT q3) to
> >determine DISTINCT elimination for the outer level.
> >
> >The fix is that the predicates from the exists query, (in this
> >particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
> >should not be considered when deciding elimination of DISTINCT in the
> >outer query. That is what the attached patch does.
> >
> >Hope this helps understand the problem and the proposed fix for it.
> >The files impacted by the change are as follows
> >svn stat
> >M java\engine\org\apache\derby\impl\sql\compile\FromList.java
> >M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> >M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> >
> >Please send in comments you may have. I have run the existing tests
> >and the patch didn't cause any failures.
> >
> >thanks,
> >Mamta
> >
> >
> 
>

Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows

Posted by Jack Klebanoff <kl...@sbcglobal.net>.
The patch does not completely fix the problem. It does not handle the 
case where the exists table column is embedded in an expression. Try the 
following variation on the test select:

select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
(select  q3."NO1" from IDEPT q3 where  ( ABS(q3."REPORTTO_NO") =  q2."NO1")))

Because q3."REPORTTO_NO" is inside a call to ABS the code added to 
FromList.returnsAtMostSingleRow does not see it.

I would suggest using

                        JBitSet referencedTables = 
and.getLeftOperand().getTablesReferenced();
                        if( referencedTables.get( existsTableNumber))
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }

instead of
                        BinaryRelationalOperatorNode beon = 
(BinaryRelationalOperatorNode)
                            and.getLeftOperand();
                        ValueNode left = beon.getLeftOperand();
                        ValueNode right = beon.getRightOperand();

                        /* If left or right side of predicate refer to 
exists base table,
                        then remove it */
                        if ((left instanceof ColumnReference) &&
                            ((ColumnReference) left).getTableNumber() == 
existsTableNumber)
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }
                        else if ((right instanceof ColumnReference) &&
                            ((ColumnReference) right).getTableNumber() 
== existsTableNumber)
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }

I have tried it out and it seems to work.

Jack Klebanoff

Mamta Satoor wrote:

>Hi,
>
>I have a patch for this optimizer bug. Basically, the issue turned out
>to be the logic for DISTINCT elimination. During the optimization
>phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
>class's returnsAtMostSingleRow() method gets called. This method
>returns true if the method concludes that DISTINCT in the query is
>redundant (based on a complex logic that decides that the query is
>going to return distinct rows on its own without the DISTINCT clause.
>The details of the current logic for DISTINCT elimination can be found
>in the comments at the method level.)
>
>For the query in question in this bug, the method returned true for
>DISTINCT elimination which is wrong. The explanation is as follows.
>
>First of all, I was able to simplify the query reported in the bug to
>following query.
>select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
>where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
>and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
>(select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
>
>This query gets converted to following during optimization
>select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
>where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
>and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
>(select  q3."NO1" from IDEPT q3 where
>(  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
>
>This optimized query has 4 predicates associated with it
>q3.reportto_no = q2.no1
>q2.discrim_dept = 'HardwareDept'
>q1.descrim_dept = 'SoftwareDept'
>q1.no1 = q3.no1
>
>Next, on this optimized query(since it has DISTINCT clause in it), the
>returnsAtMostSingleRow() method gets called. The method incorrectly
>returns true indicating that DISTINCT can be eliminated. The reason
>for this is that method is looking at predicates that belong to the
>inside query with the exists clause (which is on table IDEPT q3) to
>determine DISTINCT elimination for the outer level.
>
>The fix is that the predicates from the exists query, (in this
>particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
>should not be considered when deciding elimination of DISTINCT in the
>outer query. That is what the attached patch does.
>
>Hope this helps understand the problem and the proposed fix for it.
>The files impacted by the change are as follows
>svn stat
>M java\engine\org\apache\derby\impl\sql\compile\FromList.java
>M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
>M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
>
>Please send in comments you may have. I have run the existing tests
>and the patch didn't cause any failures.
>
>thanks,
>Mamta
>  
>