You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by de...@db.apache.org on 2004/09/30 22:23:40 UTC

[jira] Created: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Message:

  A new issue has been created in JIRA.

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/DERBY-25

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: DERBY-25
    Summary: INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
       Type: Bug

     Status: Unassigned
   Priority: Minor

    Project: Derby
 Components: 
             SQL
   Versions:
             10.0.2.0

   Assignee: 
   Reporter: Tulika Agrawal

    Created: Thu, 30 Sep 2004 1:23 PM
    Updated: Thu, 30 Sep 2004 1:23 PM

Description:
Reporting for Mamta Satoor.

If we use insert into desttable, select 
distinct from source, into a desttable which has autoincrement 
column in it, we might see gaps in the autoincrement column if 
there are duplicated rows in the source table. The reason for 
this is Derby projects values into destination table columns before 
building a distinct resultset from the source table. The piece 
of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 




---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

[jira] Resolved: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.

     [ http://issues.apache.org/jira/browse/DERBY-25?page=all ]
     
Satheesh Bandaram resolved DERBY-25:
------------------------------------

    Resolution: Duplicate

This is a Duplicate of Derby-3.

> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://issues.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by "Christian d'Heureuse (JIRA)" <de...@db.apache.org>.

     [ http://issues.apache.org/jira/browse/DERBY-25?page=comments#action_54693 ]
     
Christian d'Heureuse commented on DERBY-25:
-------------------------------------------

This is the same bug as DERBY-3 (http://issues.apache.org/jira/browse/DERBY-3)

> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://issues.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

[jira] Closed: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.

     [ http://issues.apache.org/jira/browse/DERBY-25?page=all ]
     
Satheesh Bandaram closed DERBY-25:
----------------------------------


Duplicate of Derby-3.

> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://issues.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by "Shreyas Kaushik (JIRA)" <de...@db.apache.org>.

     [ http://nagoya.apache.org/jira/browse/DERBY-25?page=comments#action_56020 ]
     
Shreyas Kaushik commented on DERBY-25:
--------------------------------------

I have been working on this trying to resolve this. Here are a few comments on this:

1) The main reason for this happeing is the auto increment value getting projected before the duplicate rows are discovered. But I was not able to get a row from the source ResultSet without actually getting the projected value.

2) From the activation I discovered that a hierarchy of ResultSets are involved here, where each ResultSet has a specific job and it least bothers where data is coming from or where it is going to send the data. So in this case the ProjectRestrictResultSet is where the projection happens. Any call to get the next row from the source is routed through this ResultSet where the actual projection happens.

3) One solution that I thought about was doing the elimination of duplicate rows before we do the Projection, since I was unable to figure out how to bypass the projection ( as stated in the above comment) , I thought of using the SYS.SYSCOLUMNS table where Derby stores the latest auto increment value and increment ident and all other details about the table and the column itself.

4) But one problem in using this table is that whenever updates need to happen to this table it gets locked and if we are in the middle of transaction and some errors take place the table might be in a in consistent state. The Derby docs also say that in the case of using an auto increment column, if a transaction that is using this is rolled back gaps might result as the SYS.SYSCOLUMNS table is not locked for rolling back the auto increment values.



> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://nagoya.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://nagoya.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

Re: [jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by Daniel John Debrunner <dj...@debrunners.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shreyas Kaushik wrote:

> Hi Dan,
>
>    Thanks for all the info and suggestions you gave about going around a
> thing like this. Definitely going through the code is the best way to
> understand Derby internals.
>
> I did more digging on the activation plan and figured out the ResultSet
> tree that is formed which propogates the results through them. This is
> how the tree looks.
>
> InsertResultSet -> NormalizeResultSet ->SortResultSet
> ->ProjectRestrictResultSet.
>
> The actual projection of the values happens in the doProjection method
> in the ProjectRestrictResultSet and control returns back to the
> SortResultSet. This in turn calls the MergeInserter(implements
> SortController interface) to actaully eliminate duplicate rows.
>   So for a table that has an indentity column that is generated, value
> is projected for all the rows and duplicate rows are eliminated in the
> SortResultSet, hence the gap in the identity column keys.
>
> So a solution to this bug will invovle determing the timing of the
> projection. When *select distinct* is used a source ResultSet with no
> repetitions needs to be built before the projection is done.
>
> Some sugestions:
>
> i) Now whether doing the above would cause the projection to happen
>      elsewhere.
> ii)  making the doProjection a public method .
> iii) is there a better approach to do this than the above two ?
>
> My suggestion would be to do the projection after the sorter has
> finished eliminating the duplicate rows. ( Still need to work on this
> from an implementation point of view)

You are heading in the right direction I think, but keep in mind I don't
think you want to change how the basic building blocks (ResultSet
implementations) work, just how they are put together. Thus your option
ii) is not the correct approach.

It would be interesting to also look at the query plan/tree for a simple
insert into the table (insert into t values (?,?)) and for the select
distinct by itself (without the insert). Looking at those may give you a
better idea of how the problem could be solved. It may turn out the
correct tree should be something like

InsertResultSet -> NormalizeResultSet -> ProjectRestrictResultSet(1)
- ->SortResultSet->ProjectRestrictResultSet(2).

Where the identity column is handled in ProjectRestrictResultSet(1)
above the SortResultSet, while the lower ProjectRestrictResultSet(2)
handles any projection/restriction from the SELECT statement.

Dan.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBrdYwIv0S4qsbfuQRAm3lAKDIjStY4fMlLkgXHWP8ox893TkxZgCcCS8X
LppzOTUzM2fLROupa6hLRNg=
=Lqoz
-----END PGP SIGNATURE-----

Re: [jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by Daniel John Debrunner <dj...@debrunners.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shreyas Kaushik (JIRA) wrote:

>      [
http://nagoya.apache.org/jira/browse/DERBY-25?page=comments#action_55864 ]
>
> Shreyas Kaushik commented on DERBY-25:
> --------------------------------------
>
> I was looking at this issue, trying to make a patch for this. I use
the following logic to do this:
>
> 1) Initially previous row is null
>
> 2) SInce this is the first row add it to the temp ResultSet for later
insertion.
>
> 3) After this store the current Row as previous Row.
>
> 4) For every pass compare previous row and current row, if same don't
insert otherwise
>    insert and do a projection.
>
>   I will put the code that I wrote for this inline here. At this point
of time I still have some failures while running the tests which I hope
to resolve soon.
>

My guess is that you are now eliminating rows from a result set that
should not be eliminated. ProjectRestrictResultSet is a general purpose
result set to filter rows both horizontally (predicates) and vertically
(columns). It seems from your changes (the call to isEquals) the code
will always eliminate duplicate consecutive rows. Thus in a simple query
like

select a, b from t where c = ?

the code now will incorrectly remove adjacent rows that happen to have
the same column values. Note that in many cases the input to
ProjectRestrictResultSet will not be ordered, which is what you change
would require.

The result set classes (which are not java.sql.ResultSet) are arranged
together in a tree by the generate class to fulfill the functionality of
the SQL statement. For a query rows flow from the bottom of the tree
upwards, towards the top result set, which is the result set that plugs
into the JDBC result set that forms the result of the query. Thus my
simple query above may be formed by a ProjectRestrictResultSet above a
TableScanResultSet. In general any result set does not know or care what
type of result set(s) is feeding it or what kind of result set it is
feeding into.

I don't how how this 'insert ... select distinct from ...' query is put
together in terms of result sets, and thinking about it briefly I not
sure I understand how the bug is happening. It would be useful if you
could get the query plan (the layout of the tree of result sets) for the
complete statement, and then see where the identity column is being
defined. Maybe it is happening too low in the tree.

This is a great way to learn Derby, get in the code, modify it and if it
doesn't seem to work the way you thought it would, ask the list!
Dan.

> thanks
> Shreyas
>
> --------------------------------Diff for
ProjectRestrictResultSet--------------------------------
> Index:
java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java
> ===================================================================

> +            if (candidateRow != null)
>  			{
> +                retVal = isEquals(candidateRow);
> +
> +                previousRow = candidateRow.getClone();
> +
> +                if(!retVal && !firstRow)
> +                    continue;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBpgQ6Iv0S4qsbfuQRAohqAKCKXHA8EVN9wT7E8o/8gq/rx2cgowCff5dj
QI89lVi4t/BGDZu1vkU2PB8=
=0fVd
-----END PGP SIGNATURE-----

[jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by "Shreyas Kaushik (JIRA)" <de...@db.apache.org>.

     [ http://nagoya.apache.org/jira/browse/DERBY-25?page=comments#action_55864 ]
     
Shreyas Kaushik commented on DERBY-25:
--------------------------------------

I was looking at this issue, trying to make a patch for this. I use the following logic to do this:

1) Initially previous row is null

2) SInce this is the first row add it to the temp ResultSet for later insertion.

3) After this store the current Row as previous Row.

4) For every pass compare previous row and current row, if same don't insert otherwise
   insert and do a projection.

  I will put the code that I wrote for this inline here. At this point of time I still have some failures while running the tests which I hope to resolve soon.

thanks
Shreyas

--------------------------------Diff for ProjectRestrictResultSet--------------------------------
Index: java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java	(revision 106542)
+++ java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java	(working copy)
@@ -43,8 +43,10 @@
 import org.apache.derby.iapi.error.StandardException;
 
 import org.apache.derby.iapi.types.RowLocation;
+import org.apache.derby.iapi.types.Orderable;
 
 import org.apache.derby.catalog.types.ReferencedColumnsDescriptorImpl;
+import org.apache.derby.impl.jdbc.Util;
 
 
 /**
@@ -77,6 +79,9 @@
 
 	private ExecRow projRow;
 
+    private ExecRow previousRow = null;
+    private boolean firstRow = false;
+
     //
     // class interface
     //
@@ -245,6 +250,7 @@
 	    boolean restrict = false;
 	    DataValueDescriptor restrictBoolean;
 		long	beginRT = 0;
+            boolean retVal;
 
 		/* Return null if open was short circuited by false constant expression */
 		if (shortCircuitOpen)
@@ -256,8 +262,17 @@
 	    do 
 		{
 			candidateRow = source.getNextRowCore();
-			if (candidateRow != null) 
+
+            if (candidateRow != null)
 			{
+                retVal = isEquals(candidateRow);
+
+                previousRow = candidateRow.getClone();
+
+                if(!retVal && !firstRow)
+                    continue;
+
+
 				beginRT = getCurrentTimeMillis();
 				/* If restriction is null, then all rows qualify */
 				if (restriction == null)
@@ -492,7 +507,8 @@
 		if (projection != null)
 		{
 	        result = (ExecRow) projection.invoke(activation);
-		}
+        }
+
 		else
 		{
 			result = mappedResultRow;
@@ -501,10 +517,9 @@
 		// Copy any mapped columns from the source
 		for (int index = 0; index < projectMapping.length; index++)
 		{
-			if (projectMapping[index] != -1)
-			{
-				result.setColumn(index + 1, sourceRow.getColumn(projectMapping[index]));
-			}
+            if(projectMapping[index] != -1) {
+			    result.setColumn(index + 1, sourceRow.getColumn(projectMapping[index]));
+            }
 		}
 
 		/* We need to reSet the current row after doing the projection */
@@ -528,6 +543,27 @@
 		return source.isForUpdate();
 	}
 
+    private boolean isEquals(ExecRow candidateRow)  throws StandardException {
+        boolean notEqual = false;
+        DataValueDescriptor []sourceDesc, destDesc;
+
+        // Do the checking if a row is being repeated
+        if(previousRow != null) {
+            firstRow = false;
+            sourceDesc = candidateRow.getRowArray();
+            destDesc = previousRow.getRowArray();
+
+            for ( int i = 0; i < destDesc.length; i++) {
+                // Do the actual comparision here
+                if(!(sourceDesc[i].compare(Orderable.ORDER_OP_EQUALS,destDesc[i],true,true))) {
+                   notEqual = true;
+                }
+            }
+        }
+        if(previousRow == null)
+           firstRow = true;
+        return notEqual;
+    }
 }
-------------------------------------Diff for ProjectRestrictResultSet ends--------------------------


> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://nagoya.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://nagoya.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira