You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Shreyas Kaushik (JIRA)" <de...@db.apache.org> on 2004/11/25 13:31:21 UTC

[jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

     [ http://nagoya.apache.org/jira/browse/DERBY-25?page=comments#action_55864 ]
     
Shreyas Kaushik commented on DERBY-25:
--------------------------------------

I was looking at this issue, trying to make a patch for this. I use the following logic to do this:

1) Initially previous row is null

2) SInce this is the first row add it to the temp ResultSet for later insertion.

3) After this store the current Row as previous Row.

4) For every pass compare previous row and current row, if same don't insert otherwise
   insert and do a projection.

  I will put the code that I wrote for this inline here. At this point of time I still have some failures while running the tests which I hope to resolve soon.

thanks
Shreyas

--------------------------------Diff for ProjectRestrictResultSet--------------------------------
Index: java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java	(revision 106542)
+++ java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java	(working copy)
@@ -43,8 +43,10 @@
 import org.apache.derby.iapi.error.StandardException;
 
 import org.apache.derby.iapi.types.RowLocation;
+import org.apache.derby.iapi.types.Orderable;
 
 import org.apache.derby.catalog.types.ReferencedColumnsDescriptorImpl;
+import org.apache.derby.impl.jdbc.Util;
 
 
 /**
@@ -77,6 +79,9 @@
 
 	private ExecRow projRow;
 
+    private ExecRow previousRow = null;
+    private boolean firstRow = false;
+
     //
     // class interface
     //
@@ -245,6 +250,7 @@
 	    boolean restrict = false;
 	    DataValueDescriptor restrictBoolean;
 		long	beginRT = 0;
+            boolean retVal;
 
 		/* Return null if open was short circuited by false constant expression */
 		if (shortCircuitOpen)
@@ -256,8 +262,17 @@
 	    do 
 		{
 			candidateRow = source.getNextRowCore();
-			if (candidateRow != null) 
+
+            if (candidateRow != null)
 			{
+                retVal = isEquals(candidateRow);
+
+                previousRow = candidateRow.getClone();
+
+                if(!retVal && !firstRow)
+                    continue;
+
+
 				beginRT = getCurrentTimeMillis();
 				/* If restriction is null, then all rows qualify */
 				if (restriction == null)
@@ -492,7 +507,8 @@
 		if (projection != null)
 		{
 	        result = (ExecRow) projection.invoke(activation);
-		}
+        }
+
 		else
 		{
 			result = mappedResultRow;
@@ -501,10 +517,9 @@
 		// Copy any mapped columns from the source
 		for (int index = 0; index < projectMapping.length; index++)
 		{
-			if (projectMapping[index] != -1)
-			{
-				result.setColumn(index + 1, sourceRow.getColumn(projectMapping[index]));
-			}
+            if(projectMapping[index] != -1) {
+			    result.setColumn(index + 1, sourceRow.getColumn(projectMapping[index]));
+            }
 		}
 
 		/* We need to reSet the current row after doing the projection */
@@ -528,6 +543,27 @@
 		return source.isForUpdate();
 	}
 
+    private boolean isEquals(ExecRow candidateRow)  throws StandardException {
+        boolean notEqual = false;
+        DataValueDescriptor []sourceDesc, destDesc;
+
+        // Do the checking if a row is being repeated
+        if(previousRow != null) {
+            firstRow = false;
+            sourceDesc = candidateRow.getRowArray();
+            destDesc = previousRow.getRowArray();
+
+            for ( int i = 0; i < destDesc.length; i++) {
+                // Do the actual comparision here
+                if(!(sourceDesc[i].compare(Orderable.ORDER_OP_EQUALS,destDesc[i],true,true))) {
+                   notEqual = true;
+                }
+            }
+        }
+        if(previousRow == null)
+           firstRow = true;
+        return notEqual;
+    }
 }
-------------------------------------Diff for ProjectRestrictResultSet ends--------------------------


> INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
> --------------------------------------------------------------------------
>
>          Key: DERBY-25
>          URL: http://nagoya.apache.org/jira/browse/DERBY-25
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.0.2.0
>     Reporter: Tulika Agrawal
>     Priority: Minor

>
> Reporting for Mamta Satoor.
> If we use insert into desttable, select 
> distinct from source, into a desttable which has autoincrement 
> column in it, we might see gaps in the autoincrement column if 
> there are duplicated rows in the source table. The reason for 
> this is Derby projects values into destination table columns before 
> building a distinct resultset from the source table. The piece 
> of code doing this is in org.apache.derby.impl.sql.execute.ProjectRestrictResultSet class's getNextRowCore() method where it calls doProjection. 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://nagoya.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


Re: [jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by Daniel John Debrunner <dj...@debrunners.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shreyas Kaushik wrote:

> Hi Dan,
>
>    Thanks for all the info and suggestions you gave about going around a
> thing like this. Definitely going through the code is the best way to
> understand Derby internals.
>
> I did more digging on the activation plan and figured out the ResultSet
> tree that is formed which propogates the results through them. This is
> how the tree looks.
>
> InsertResultSet -> NormalizeResultSet ->SortResultSet
> ->ProjectRestrictResultSet.
>
> The actual projection of the values happens in the doProjection method
> in the ProjectRestrictResultSet and control returns back to the
> SortResultSet. This in turn calls the MergeInserter(implements
> SortController interface) to actaully eliminate duplicate rows.
>   So for a table that has an indentity column that is generated, value
> is projected for all the rows and duplicate rows are eliminated in the
> SortResultSet, hence the gap in the identity column keys.
>
> So a solution to this bug will invovle determing the timing of the
> projection. When *select distinct* is used a source ResultSet with no
> repetitions needs to be built before the projection is done.
>
> Some sugestions:
>
> i) Now whether doing the above would cause the projection to happen
>      elsewhere.
> ii)  making the doProjection a public method .
> iii) is there a better approach to do this than the above two ?
>
> My suggestion would be to do the projection after the sorter has
> finished eliminating the duplicate rows. ( Still need to work on this
> from an implementation point of view)

You are heading in the right direction I think, but keep in mind I don't
think you want to change how the basic building blocks (ResultSet
implementations) work, just how they are put together. Thus your option
ii) is not the correct approach.

It would be interesting to also look at the query plan/tree for a simple
insert into the table (insert into t values (?,?)) and for the select
distinct by itself (without the insert). Looking at those may give you a
better idea of how the problem could be solved. It may turn out the
correct tree should be something like

InsertResultSet -> NormalizeResultSet -> ProjectRestrictResultSet(1)
- ->SortResultSet->ProjectRestrictResultSet(2).

Where the identity column is handled in ProjectRestrictResultSet(1)
above the SortResultSet, while the lower ProjectRestrictResultSet(2)
handles any projection/restriction from the SELECT statement.

Dan.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBrdYwIv0S4qsbfuQRAm3lAKDIjStY4fMlLkgXHWP8ox893TkxZgCcCS8X
LppzOTUzM2fLROupa6hLRNg=
=Lqoz
-----END PGP SIGNATURE-----


Re: [jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column

Posted by Daniel John Debrunner <dj...@debrunners.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shreyas Kaushik (JIRA) wrote:

>      [
http://nagoya.apache.org/jira/browse/DERBY-25?page=comments#action_55864 ]
>
> Shreyas Kaushik commented on DERBY-25:
> --------------------------------------
>
> I was looking at this issue, trying to make a patch for this. I use
the following logic to do this:
>
> 1) Initially previous row is null
>
> 2) SInce this is the first row add it to the temp ResultSet for later
insertion.
>
> 3) After this store the current Row as previous Row.
>
> 4) For every pass compare previous row and current row, if same don't
insert otherwise
>    insert and do a projection.
>
>   I will put the code that I wrote for this inline here. At this point
of time I still have some failures while running the tests which I hope
to resolve soon.
>


My guess is that you are now eliminating rows from a result set that
should not be eliminated. ProjectRestrictResultSet is a general purpose
result set to filter rows both horizontally (predicates) and vertically
(columns). It seems from your changes (the call to isEquals) the code
will always eliminate duplicate consecutive rows. Thus in a simple query
like

select a, b from t where c = ?

the code now will incorrectly remove adjacent rows that happen to have
the same column values. Note that in many cases the input to
ProjectRestrictResultSet will not be ordered, which is what you change
would require.


The result set classes (which are not java.sql.ResultSet) are arranged
together in a tree by the generate class to fulfill the functionality of
the SQL statement. For a query rows flow from the bottom of the tree
upwards, towards the top result set, which is the result set that plugs
into the JDBC result set that forms the result of the query. Thus my
simple query above may be formed by a ProjectRestrictResultSet above a
TableScanResultSet. In general any result set does not know or care what
type of result set(s) is feeding it or what kind of result set it is
feeding into.

I don't how how this 'insert ... select distinct from ...' query is put
together in terms of result sets, and thinking about it briefly I not
sure I understand how the bug is happening. It would be useful if you
could get the query plan (the layout of the tree of result sets) for the
complete statement, and then see where the identity column is being
defined. Maybe it is happening too low in the tree.

This is a great way to learn Derby, get in the code, modify it and if it
doesn't seem to work the way you thought it would, ask the list!
Dan.


> thanks
> Shreyas
>
> --------------------------------Diff for
ProjectRestrictResultSet--------------------------------
> Index:
java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java
> ===================================================================

> +            if (candidateRow != null)
>  			{
> +                retVal = isEquals(candidateRow);
> +
> +                previousRow = candidateRow.getClone();
> +
> +                if(!retVal && !firstRow)
> +                    continue;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBpgQ6Iv0S4qsbfuQRAohqAKCKXHA8EVN9wT7E8o/8gq/rx2cgowCff5dj
QI89lVi4t/BGDZu1vkU2PB8=
=0fVd
-----END PGP SIGNATURE-----