You are viewing a plain text version of this content. The canonical link for it is here.

Posted to ojb-user@db.apache.org by "Janssen, Roger" <ro...@ibanx.nl> on 2006/07/06 10:46:26 UTC

Bug : duplicate objects in resultset [new patch]

Hi,

Sorry for the late reply but i have been ill the last week.

Jakob indicated that with my patch applied, still duplicate objects were
being returned in the resultset. I had a look at the testcase
(testOrderByJoined2) and i can confirm this.

The trick is that you have to execute a distinct query. But... Jakob
also indicated that HSQLDB than gives a sql syntax error, and i can
confirm this. 
The case is that:

  SELECT A0.X, A0.Y, A0.Z FROM T0 AS A0 JOIN T1 as A1 ORDER BY A1.Q
  
is a correct statement according to HSQLDB, but  

  SELECT DISTINCT A0.X, A0.Y, A0.Z FROM T0 AS A0 JOIN T1 as A1 ORDER BY
A1.Q
  
is incorrect, then it suddenly requires the orderby collumns in the
resultset like

  SELECT DISTINCT A0.X, A0.Y, A0.Z, A1.Q FROM T0 AS A0 JOIN T1 as A1
ORDER BY A1.Q
  
but THAT is what creates the duplicate records in the resultset.

Imho i consider this to be an inconsistency/bug in HSQLDB. I do not
think it is correct that the rdbms requires me to have specific columns
in the resultset actually affecting the queryresult.

I believe that MySQL does NOT have this problem, my guess is that loads
of other rdbms's do not either. So it now has become a platform specific
problem. I extended my patch making the query generation regarding the
inclusion of the orderby columns platform specific.

What did i do?
- added method to Platform interface : boolean requiresOrderbyColumns();
- added method to PlatformDefaultImpl class : boolean
requiresOrderbyColumns() returning true;
- added method to PlatformMySQLImpl class : boolean
requiresOrderbyColumns() returning false;
- in SqlQueryStatement declared m_platform to be protected instead of
private
- in SqlQueryStatement added addJoinColumn methods (as in earlier patch)
- in SqlQueryStatement added ensureJoinColumns method (as in earlier
patch)
- in SqlQueryStatement modified method appendOrderByClause to use real
names instead
  of resultset column indexes in orderby clause (as in earlier patch)
- in SqlSelectStatement modified method buildStatement:
	// iBanx patch - start
	if(m_platform.requiresOrderbyColumns())
	{
		ensureColumns(orderByFields, columnList, stmt);
	}
	else
	{
		ensureJoinColumns(orderByFields, columnList);
	}
	// iBanx patch - end


Todo : implement requiresOrderbyColumns method in other rbms specific
Platform implementations returning false if it the rdbms does NOT
require the orderby columns to be in the resultset.

This patch ensures correct sql syntax and builds a query that does not
return duplicate objects when the database supports the required sql
syntax. Of course, it has to be a 'distinct' query!

So, running the tests in QueryTest, on an HSQLDB they will fail, on a
MySSQL db they will pass.

I included modified files in the zip. Modification are marked with "//
iBanx patch - start" and
"// iBanx patch - end".

I also suggested another solution, modifying RsIterator so that the
'next' and 'hasNext' 
methods filter out the duplicate records in the resultset. But this will
not work when the
duplicate records/objects are spread over multiple pages of the
resultset, since only duplicates
in one page would be filtered out. So this is not a good solution.

Will this new patch work, and if so, will it be applied in 1.0.5 and
1.1?

Roger Janssen
iBanx
 

-----Original Message-----
From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch] 
Sent: Friday, June 23, 2006 9:54 PM
To: OJB Users List
Cc: Selders, Bart
Subject: Re: Bug : duplicate objects in resultset : Modify RsIterator
class?

hi roger,

with your path applied the testcase

     public void testOrderByJoined2()
     {
         Criteria crit = new Criteria();
         QueryByCriteria q = QueryFactory.newQuery(Project.class, crit);
         q.addOrderByAscending("persons.lastname");

         Collection results = broker.getCollectionByQuery(q);
         assertNotNull(results);

         // compare with count
         int count = broker.getCount(q);
         assertEquals(results.size(), count);

         // check duplicates
         Set resultsAsSet = new HashSet(results);
         assertEquals(resultsAsSet.size(), results.size());
     }

still returns duplicates ???

jakob

Jakob Braeuchi schrieb:
> hi roger,
> 
> i ran this new testcase in QueryTest:
> 
>     /**
>      * test OrderBy joined column. Read all Projects
>      */
>     public void testOrderByJoined2()
>     {
>         Criteria crit = new Criteria();
>         QueryByCriteria q = QueryFactory.newQuery(Project.class, crit,

> true);
>         q.addOrderByAscending("persons.lastname");
> 
>         Collection results = broker.getCollectionByQuery(q);
>         assertNotNull(results);
> 
>         ManageableHashSet resultsMgmt = (ManageableHashSet) 
> broker.getCollectionByQuery(RemovalAwareSet.class, q);
>         assertNotNull(resultsMgmt);
> 
>         // compare with count
>         int count = broker.getCount(q);
>         assertEquals(resultsMgmt.size(), count);
>     }
> 
> and it causes the following failure with hsqldb
> 
> * Can't prepare statement:
> * sql statement was 'SELECT DISTINCT A0.ID,A0.TITLE,A0.DESCRIPTION 
> FROM PROJECT A0 INNER JOIN PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID 
> INNER JOIN PERSON A2 ON A1.PERSON_ID=A2.ID ORDER BY A2.LASTNAME'
> * Exception message is [ORDER BY item should be in the SELECT DISTINCT
> list: org.hsqldb.Expression@c6eff5 in statement [SELECT DISTINCT 
> A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER JOIN 
> PERSON_PROJECT
> A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON A1.PERSON_ID=A2.ID 
> ORDER BY A2.LASTNAME]]
> * Vendor error code [-71]
> * SQL state code [37000]
> 
> the original testcase testOrderByJoined() produces the following sql 
> with your patch:
> 
> SELECT A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER JOIN 
> PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON 
> A1.PERSON_ID=A2.ID WHERE A0.TITLE LIKE
'testOrderByJoined1151088454187'
> ORDER BY A0.TITLE,A2.LASTNAME,A2.FIRSTNAME
> 
> and although A2.LASTNAME,A2.FIRSTNAME are not in the select, hsqldb is

> happy. the problem is the DISTINCT.
> 
> jakob
> 
> Janssen, Roger schrieb:
>> Hi,
>>
>> If that is the case (I do not get this result with the OJB testsuite 
>> and my patch and testcase, or don't you guys use hsql anymore for the

>> tests?), then probably some modification has to be made to RsIterator

>> class. The next and hasNext methods must make sure that you do NOT 
>> get a duplicate object in the resultset. For the next-method (return 
>> the next not-yet-returned object), this would not be problem, I have 
>> my doubts about the hasNext method (hasNext should be true if there 
>> is a next object AND it has not been returned yet). This implies that

>> paging
>> (pagination) will also work again, because this is also affected by 
>> the duplicate-records bug.
>>
>> The problem is though, that with these appearent contraints by some 
>> databases, I do not know how the paging implementation using native 
>> SQL is going to work, since I can limit the returned records to say 
>> 10, but if there are 5 duplicates in there, it is not the result 
>> expected (By the way, is the paging implementation using native SQL 
>> on the roadmap for OJB 1.1? We really need it because performance 
>> with queries on large resultsets are killing our apps.).
>>
>> Another alternative is that the SQL query generation implementation 
>> regarding this issue could be db platform specific, and just accept 
>> the fact that with some platforms you get duplicate objects.
>>
>> Roger Janssen
>> iBanx
>>
>> -----Original Message-----
>> From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch] Sent: Wednesday, June 
>> 21, 2006 10:13 PM
>> To: OJB Users List
>> Subject: Re: Bug : duplicate objects in resultset : Why add orderby 
>> columns to resultset?
>>
>> hi all,
>>
>> this is what hsql says when the orderby column is not in the select 
>> :(
>>
>> Caused by: java.sql.SQLException: ORDER BY item should be in the 
>> SELECT DISTINCT list: org.hsqldb.Expression@1566a6c in statement 
>> [SELECT DISTINCT A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER 
>> JOIN PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON

>> A1.PERSON_ID=A2.ID ORDER BY A2.LASTNAME]
>>
>> hth
>> jakob
>>
>> Thomas Dudziak schrieb:
>>> On 6/19/06, Janssen, Roger <ro...@ibanx.nl> wrote:
>>>
>>>> Diving deeper into the OJB code I see that the ensureColumns method
>> in
>>>> fact does two things:
>>>> (1) add the columns to the returnable set of columns in the 
>>>> resultset
>>>> (2) adds the columns to a list of columns to be used in force a 
>>>> join
>>>>
>>>> (2) is okay for orderby's, but (1) is not. If for orderby's (1) was
>> left
>>>> out, would that be a problem?
>>> As far as I remember, there was a problem a couple of months ago 
>>> that for one database all orderby columns had to be selected columns

>>> as well (though I don't remember offhand which database it was,
though).
>>>
>>> Tom
>>>
>>> --------------------------------------------------------------------
>>> - To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>
>>
>>
>> *********************************************************************
>> ****T
>>
>>  he information contained in this communication is confidential and 
>> is intended solely for the use of the individual or entity to  whom 
>> it is addressed.You should not copy, disclose or distribute this 
>> communication without the authority of iBanx bv. iBanx bv is neither 
>> liable for the proper and complete transmission of the information 
>> has been maintainedn  or that the communication is free of viruses, 
>> interceptions or interference.
>> If you are not the intended recipient of this communication please 
>> returnt  he communication to the sender and delete and destroy all 
>> copies.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
> For additional commands, e-mail: ojb-user-help@db.apache.org
> 
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-user-help@db.apache.org

Re: Bug : duplicate objects in resultset [new patch]

Posted by Jakob Braeuchi <jb...@gmx.ch>.

hi roger,

thanks for the new patch. i'll check it asap.
i googled a little bit for "ORDER BY item should be in the SELECT 
DISTINCT list" and i found some posts about the same problem in 
sql-server and oracle.

http://www.codecomments.com/archive352-2005-12-731772.html
http://forums.oracle.com/forums/thread.jspa?messageID=753442&#753442
http://www.dbtalk.net/microsoft-public-sqlserver-programming/select-distinct-order-gives-error-177949.html

could you please verify your fix on these dbms ?

jakob


Janssen, Roger schrieb:
> Hi,
> 
> Sorry for the late reply but i have been ill the last week.
> 
> Jakob indicated that with my patch applied, still duplicate objects were
> being returned in the resultset. I had a look at the testcase
> (testOrderByJoined2) and i can confirm this.
> 
> The trick is that you have to execute a distinct query. But... Jakob
> also indicated that HSQLDB than gives a sql syntax error, and i can
> confirm this. 
> The case is that:
> 
>   SELECT A0.X, A0.Y, A0.Z FROM T0 AS A0 JOIN T1 as A1 ORDER BY A1.Q
>   
> is a correct statement according to HSQLDB, but  
> 
>   SELECT DISTINCT A0.X, A0.Y, A0.Z FROM T0 AS A0 JOIN T1 as A1 ORDER BY
> A1.Q
>   
> is incorrect, then it suddenly requires the orderby collumns in the
> resultset like
> 
>   SELECT DISTINCT A0.X, A0.Y, A0.Z, A1.Q FROM T0 AS A0 JOIN T1 as A1
> ORDER BY A1.Q
>   
> but THAT is what creates the duplicate records in the resultset.
> 
> Imho i consider this to be an inconsistency/bug in HSQLDB. I do not
> think it is correct that the rdbms requires me to have specific columns
> in the resultset actually affecting the queryresult.
> 
> I believe that MySQL does NOT have this problem, my guess is that loads
> of other rdbms's do not either. So it now has become a platform specific
> problem. I extended my patch making the query generation regarding the
> inclusion of the orderby columns platform specific.
> 
> What did i do?
> - added method to Platform interface : boolean requiresOrderbyColumns();
> - added method to PlatformDefaultImpl class : boolean
> requiresOrderbyColumns() returning true;
> - added method to PlatformMySQLImpl class : boolean
> requiresOrderbyColumns() returning false;
> - in SqlQueryStatement declared m_platform to be protected instead of
> private
> - in SqlQueryStatement added addJoinColumn methods (as in earlier patch)
> - in SqlQueryStatement added ensureJoinColumns method (as in earlier
> patch)
> - in SqlQueryStatement modified method appendOrderByClause to use real
> names instead
>   of resultset column indexes in orderby clause (as in earlier patch)
> - in SqlSelectStatement modified method buildStatement:
> 	// iBanx patch - start
> 	if(m_platform.requiresOrderbyColumns())
> 	{
> 		ensureColumns(orderByFields, columnList, stmt);
> 	}
> 	else
> 	{
> 		ensureJoinColumns(orderByFields, columnList);
> 	}
> 	// iBanx patch - end
> 
> 
> Todo : implement requiresOrderbyColumns method in other rbms specific
> Platform implementations returning false if it the rdbms does NOT
> require the orderby columns to be in the resultset.
> 
> This patch ensures correct sql syntax and builds a query that does not
> return duplicate objects when the database supports the required sql
> syntax. Of course, it has to be a 'distinct' query!
> 
> So, running the tests in QueryTest, on an HSQLDB they will fail, on a
> MySSQL db they will pass.
> 
> I included modified files in the zip. Modification are marked with "//
> iBanx patch - start" and
> "// iBanx patch - end".
> 
> I also suggested another solution, modifying RsIterator so that the
> 'next' and 'hasNext' 
> methods filter out the duplicate records in the resultset. But this will
> not work when the
> duplicate records/objects are spread over multiple pages of the
> resultset, since only duplicates
> in one page would be filtered out. So this is not a good solution.
> 
> Will this new patch work, and if so, will it be applied in 1.0.5 and
> 1.1?
> 
> Roger Janssen
> iBanx
>  
> 
> -----Original Message-----
> From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch] 
> Sent: Friday, June 23, 2006 9:54 PM
> To: OJB Users List
> Cc: Selders, Bart
> Subject: Re: Bug : duplicate objects in resultset : Modify RsIterator
> class?
> 
> hi roger,
> 
> with your path applied the testcase
> 
>      public void testOrderByJoined2()
>      {
>          Criteria crit = new Criteria();
>          QueryByCriteria q = QueryFactory.newQuery(Project.class, crit);
>          q.addOrderByAscending("persons.lastname");
> 
>          Collection results = broker.getCollectionByQuery(q);
>          assertNotNull(results);
> 
>          // compare with count
>          int count = broker.getCount(q);
>          assertEquals(results.size(), count);
> 
>          // check duplicates
>          Set resultsAsSet = new HashSet(results);
>          assertEquals(resultsAsSet.size(), results.size());
>      }
> 
> still returns duplicates ???
> 
> jakob
> 
> Jakob Braeuchi schrieb:
>> hi roger,
>>
>> i ran this new testcase in QueryTest:
>>
>>     /**
>>      * test OrderBy joined column. Read all Projects
>>      */
>>     public void testOrderByJoined2()
>>     {
>>         Criteria crit = new Criteria();
>>         QueryByCriteria q = QueryFactory.newQuery(Project.class, crit,
> 
>> true);
>>         q.addOrderByAscending("persons.lastname");
>>
>>         Collection results = broker.getCollectionByQuery(q);
>>         assertNotNull(results);
>>
>>         ManageableHashSet resultsMgmt = (ManageableHashSet) 
>> broker.getCollectionByQuery(RemovalAwareSet.class, q);
>>         assertNotNull(resultsMgmt);
>>
>>         // compare with count
>>         int count = broker.getCount(q);
>>         assertEquals(resultsMgmt.size(), count);
>>     }
>>
>> and it causes the following failure with hsqldb
>>
>> * Can't prepare statement:
>> * sql statement was 'SELECT DISTINCT A0.ID,A0.TITLE,A0.DESCRIPTION 
>> FROM PROJECT A0 INNER JOIN PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID 
>> INNER JOIN PERSON A2 ON A1.PERSON_ID=A2.ID ORDER BY A2.LASTNAME'
>> * Exception message is [ORDER BY item should be in the SELECT DISTINCT
>> list: org.hsqldb.Expression@c6eff5 in statement [SELECT DISTINCT 
>> A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER JOIN 
>> PERSON_PROJECT
>> A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON A1.PERSON_ID=A2.ID 
>> ORDER BY A2.LASTNAME]]
>> * Vendor error code [-71]
>> * SQL state code [37000]
>>
>> the original testcase testOrderByJoined() produces the following sql 
>> with your patch:
>>
>> SELECT A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER JOIN 
>> PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON 
>> A1.PERSON_ID=A2.ID WHERE A0.TITLE LIKE
> 'testOrderByJoined1151088454187'
>> ORDER BY A0.TITLE,A2.LASTNAME,A2.FIRSTNAME
>>
>> and although A2.LASTNAME,A2.FIRSTNAME are not in the select, hsqldb is
> 
>> happy. the problem is the DISTINCT.
>>
>> jakob
>>
>> Janssen, Roger schrieb:
>>> Hi,
>>>
>>> If that is the case (I do not get this result with the OJB testsuite 
>>> and my patch and testcase, or don't you guys use hsql anymore for the
> 
>>> tests?), then probably some modification has to be made to RsIterator
> 
>>> class. The next and hasNext methods must make sure that you do NOT 
>>> get a duplicate object in the resultset. For the next-method (return 
>>> the next not-yet-returned object), this would not be problem, I have 
>>> my doubts about the hasNext method (hasNext should be true if there 
>>> is a next object AND it has not been returned yet). This implies that
> 
>>> paging
>>> (pagination) will also work again, because this is also affected by 
>>> the duplicate-records bug.
>>>
>>> The problem is though, that with these appearent contraints by some 
>>> databases, I do not know how the paging implementation using native 
>>> SQL is going to work, since I can limit the returned records to say 
>>> 10, but if there are 5 duplicates in there, it is not the result 
>>> expected (By the way, is the paging implementation using native SQL 
>>> on the roadmap for OJB 1.1? We really need it because performance 
>>> with queries on large resultsets are killing our apps.).
>>>
>>> Another alternative is that the SQL query generation implementation 
>>> regarding this issue could be db platform specific, and just accept 
>>> the fact that with some platforms you get duplicate objects.
>>>
>>> Roger Janssen
>>> iBanx
>>>
>>> -----Original Message-----
>>> From: Jakob Braeuchi [mailto:jbraeuchi@gmx.ch] Sent: Wednesday, June 
>>> 21, 2006 10:13 PM
>>> To: OJB Users List
>>> Subject: Re: Bug : duplicate objects in resultset : Why add orderby 
>>> columns to resultset?
>>>
>>> hi all,
>>>
>>> this is what hsql says when the orderby column is not in the select 
>>> :(
>>>
>>> Caused by: java.sql.SQLException: ORDER BY item should be in the 
>>> SELECT DISTINCT list: org.hsqldb.Expression@1566a6c in statement 
>>> [SELECT DISTINCT A0.ID,A0.TITLE,A0.DESCRIPTION FROM PROJECT A0 INNER 
>>> JOIN PERSON_PROJECT A1 ON A0.ID=A1.PROJECT_ID INNER JOIN PERSON A2 ON
> 
>>> A1.PERSON_ID=A2.ID ORDER BY A2.LASTNAME]
>>>
>>> hth
>>> jakob
>>>
>>> Thomas Dudziak schrieb:
>>>> On 6/19/06, Janssen, Roger <ro...@ibanx.nl> wrote:
>>>>
>>>>> Diving deeper into the OJB code I see that the ensureColumns method
>>> in
>>>>> fact does two things:
>>>>> (1) add the columns to the returnable set of columns in the 
>>>>> resultset
>>>>> (2) adds the columns to a list of columns to be used in force a 
>>>>> join
>>>>>
>>>>> (2) is okay for orderby's, but (1) is not. If for orderby's (1) was
>>> left
>>>>> out, would that be a problem?
>>>> As far as I remember, there was a problem a couple of months ago 
>>>> that for one database all orderby columns had to be selected columns
> 
>>>> as well (though I don't remember offhand which database it was,
> though).
>>>> Tom
>>>>
>>>> --------------------------------------------------------------------
>>>> - To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>>>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>>>
>>>>
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>>
>>>
>>>
>>> *********************************************************************
>>> ****T
>>>
>>>  he information contained in this communication is confidential and 
>>> is intended solely for the use of the individual or entity to  whom 
>>> it is addressed.You should not copy, disclose or distribute this 
>>> communication without the authority of iBanx bv. iBanx bv is neither 
>>> liable for the proper and complete transmission of the information 
>>> has been maintainedn  or that the communication is free of viruses, 
>>> interceptions or interference.
>>> If you are not the intended recipient of this communication please 
>>> returnt  he communication to the sender and delete and destroy all 
>>> copies.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>>
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
>> For additional commands, e-mail: ojb-user-help@db.apache.org
>>
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
> For additional commands, e-mail: ojb-user-help@db.apache.org
> 
> 
> 
> ------------------------------------------------------------------------
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
> For additional commands, e-mail: ojb-user-help@db.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-user-help@db.apache.org