Posted to torque-dev@db.apache.org by Scott Eade <se...@backstagetech.com.au> on 2003/05/31 03:19:55 UTC

LargeSelect returning different numbers of rows. (Was: [vote] release torque-3.0.1)

Federico Spinazzi wrote:

> - LargeSelect gives me different results for the total number of 
> objects retrieved with different parameters;

LargeSelect doesn't know the total number of records until such time as 
the buffer of records hits the last record.  Prior to this it just 
indicates that more records exist than the number that have been 
retrieved so far.  Is this the behavior you are seeing or is it 
something else?  If you do believe there is a problem can you provide a 
test case?
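
To illustrate the behaviour described above, here is a minimal sketch of a 
pager that only learns the total once a page comes back short.  It is not 
Torque's LargeSelect code: the class PagedCountSketch, its methods and the 
fetch-one-extra-row trick are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pager (not Torque's LargeSelect): the total is unknown until the last row is seen. */
class PagedCountSketch {
    private final List<Integer> table;   // in-memory stand-in for the database table
    private final int pageSize;
    private int fetched = 0;             // rows handed out so far
    private boolean totalKnown = false;  // becomes true once the last row has been reached

    PagedCountSketch(List<Integer> table, int pageSize) {
        this.table = table;
        this.pageSize = pageSize;
    }

    /** Builds a table of n dummy rows. */
    static List<Integer> rows(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(i);
        }
        return result;
    }

    boolean hasNext() {
        return !totalKnown;
    }

    List<Integer> nextPage() {
        // Ask for one row more than the page size to find out whether more rows exist.
        int end = Math.min(fetched + pageSize + 1, table.size());
        List<Integer> page = new ArrayList<>(table.subList(fetched, end));
        if (page.size() <= pageSize) {
            totalKnown = true;              // short page: the last row has been reached
        } else {
            page.remove(page.size() - 1);   // drop the probe row; it belongs to the next page
        }
        fetched += page.size();
        return page;
    }

    /** Exact total once known, otherwise only a lower bound such as "> 20". */
    String describeTotal() {
        return totalKnown ? String.valueOf(fetched) : "> " + fetched;
    }

    public static void main(String[] args) {
        PagedCountSketch pager = new PagedCountSketch(rows(25), 10);
        while (pager.hasNext()) {
            pager.nextPage();
            System.out.println("total so far: " + pager.describeTotal());
        }
    }
}
```

With 25 rows and a page size of 10, the reported totals are "> 10", then 
"> 20", and only after the third page the exact value 25.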

Cheers,

Scott

-- 
Scott Eade
Backstage Technologies Pty. Ltd.
http://www.backstagetech.com.au
seade@backstagetech.com.au





Re: LargeSelect returning different numbers of rows. (Was: [vote] release torque-3.0.1)

Posted by Scott Eade <se...@backstagetech.com.au>.
Hi Federico,

You touch upon a number of issues, I hope the following information is 
helpful.

Firstly, LargeSelect isn't going to work very well (and I have certainly 
not tested it) with databases that do not natively support limit and 
offset in their SQL.  Torque 3.0 provides this support for MySQL and 
PostgreSQL (though I think the syntax has changed for PostgreSQL 7.3).  
I think there is code for Oracle somewhere (either in CVS HEAD or 
attached to an issue in Scarab), but I am reasonably sure that there is 
no support for DB2.  If DB2's SQL dialect supports these concepts, then 
the first step in getting things to work will be to implement this in 
BasePeer.

If the database in use does not support limit/offset, or Torque has not 
been coded to use the database's native limit/offset support, then these 
features fall back to the Village API.  With 37000 rows this will fall 
flat on its face (OutOfMemoryError) because Village will run the entire 
query and then discard the unwanted rows (using a big chunk of memory 
and many CPU cycles in the process).  Is there not some filtering that 
can be applied to reduce the number of records that need to be 
selected?  Not sure about you, but the last time I ran a query that 
retrieved 37000 rows I only looked at a couple of hundred of them :-).
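
As a rough illustration of why the fallback is so expensive, the sketch 
below compares the two strategies on an in-memory stand-in for a table.  
It uses neither Torque nor Village: the class LimitOffsetSketch and its 
methods are hypothetical, and the counter simply records how many rows 
each strategy has to materialize to produce the same page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of native limit/offset with fetch-everything-then-discard. */
class LimitOffsetSketch {

    /**
     * Native style: only the requested window is produced, as a database would do for
     * MySQL's "LIMIT offset, count" or PostgreSQL's "LIMIT count OFFSET offset".
     */
    static List<Integer> nativePage(int tableSize, int offset, int limit, int[] materialized) {
        List<Integer> page = new ArrayList<>();
        int end = Math.min(offset + limit, tableSize);
        for (int row = offset; row < end; row++) {
            page.add(row);
            materialized[0]++;   // one row held in memory per row returned
        }
        return page;
    }

    /** Fallback style: every row is read into memory first, then the unwanted rows are dropped. */
    static List<Integer> fallbackPage(int tableSize, int offset, int limit, int[] materialized) {
        List<Integer> all = new ArrayList<>();
        for (int row = 0; row < tableSize; row++) {
            all.add(row);
            materialized[0]++;   // the whole result set is held in memory
        }
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        int[] nativeCount = {0};
        int[] fallbackCount = {0};
        nativePage(37000, 6000, 3000, nativeCount);
        fallbackPage(37000, 6000, 3000, fallbackCount);
        System.out.println("native rows materialized:   " + nativeCount[0]);
        System.out.println("fallback rows materialized: " + fallbackCount[0]);
    }
}
```

Both methods return the same 3000 rows, but the fallback touches all 37000 
rows for every page requested, which is where the memory and CPU go.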

For an example test case please see LargeSelectTest in CVS.  This can be 
executed as part of building Torque with Maven.

I can only guess that the record count issues are a result of relying on 
Village scrolling the records - not to say that the problem is with 
Village, but rather that LargeSelect has not been tested in this 
situation.  The fact that it is producing different results at different 
times is strange.
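
One way to narrow this down would be a test that sweeps several page sizes 
and checks each total against the known row count.  The following is only 
a sketch of that idea: PageSizeSweepSketch, its methods and the in-memory 
pager are hypothetical stand-ins, and a real test would wrap LargeSelect 
behind the same Iterator shape.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical harness: checks that the row total does not depend on the page size. */
class PageSizeSweepSketch {

    /** Sums the sizes of all pages the pager delivers. */
    static int countRows(Iterator<List<Integer>> pages) {
        int total = 0;
        while (pages.hasNext()) {
            total += pages.next().size();
        }
        return total;
    }

    /** Returns the page sizes for which the total differs from the expected row count. */
    static List<Integer> failingPageSizes(IntFunction<Iterator<List<Integer>>> pagerFactory,
                                          int expected, int... pageSizes) {
        List<Integer> failing = new ArrayList<>();
        for (int pageSize : pageSizes) {
            if (countRows(pagerFactory.apply(pageSize)) != expected) {
                failing.add(pageSize);
            }
        }
        return failing;
    }

    /** In-memory stand-in pager over rows 0..tableSize-1; a real test would page a database. */
    static Iterator<List<Integer>> inMemoryPager(final int tableSize, final int pageSize) {
        return new Iterator<List<Integer>>() {
            private int offset = 0;

            @Override
            public boolean hasNext() {
                return offset < tableSize;
            }

            @Override
            public List<Integer> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int end = Math.min(offset + pageSize, tableSize);
                List<Integer> page = new ArrayList<>();
                for (int row = offset; row < end; row++) {
                    page.add(row);
                }
                offset = end;
                return page;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> failing = failingPageSizes(
                pageSize -> inMemoryPager(37187, pageSize), 37187, 10, 3000, 5000);
        System.out.println("page sizes with a wrong total: " + failing);
    }
}
```

A page size that shows up in the returned list is a reproducible starting 
point for debugging, which is more useful than a single run that happens 
to give the wrong total.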

The ConcurrentModificationExceptions you are experiencing may be caused 
by the fact that the data is still being processed when you attempt to 
retrieve more data.  This would be a bug, but certainly not one that I 
have seen (perhaps because I haven't tested LargeSelect without native 
limit/offset support or with a page size of 5000).  It would perhaps be 
a good idea to generate a more extreme test case to try and root this 
problem out.
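
For reference, this kind of exception is easy to provoke with plain 
java.util collections.  The sketch below is not LargeSelect code and I am 
not claiming this is what happens inside it; CmeSketch and its method are 
hypothetical, and the "loader" is simulated in a single thread so the 
result is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical demo: reading a buffer while rows are still being appended to it. */
class CmeSketch {

    /**
     * Iterates over the buffer while more rows are appended to it.
     * Returns true if the iteration failed with a ConcurrentModificationException.
     */
    static boolean readWhileLoading(boolean snapshotFirst) {
        List<Integer> buffer = new ArrayList<>(Arrays.asList(1, 2, 3));
        // Either read the live buffer, or a private copy taken before loading continues.
        List<Integer> view = snapshotFirst ? new ArrayList<>(buffer) : buffer;
        try {
            for (Integer row : view) {
                buffer.add(row + 100);   // simulates the loader appending another row
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;                 // ArrayList's fail-fast iterator noticed the change
        }
    }

    public static void main(String[] args) {
        System.out.println("live buffer fails: " + readWhileLoading(false));
        System.out.println("snapshot fails:    " + readWhileLoading(true));
    }
}
```

Reading the live buffer fails, reading a snapshot does not.  With real 
threads the fail-fast check is only best effort, so the exception may 
appear in some runs and not in others.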

If you are interested in contributing to Torque, please see 
http://jakarta.apache.org/site/getinvolved.html

Regards,

Scott

Federico Spinazzi wrote:

> Scott Eade wrote:
>
>> Federico Spinazzi wrote:
>>
>>> - LargeSelect gives me different results for the total number of 
>>> objects retrieved with different parameters;
>>
>> LargeSelect doesn't know the total number of records until such time 
>> as the buffer of records hits the last record.  Prior to this it just 
>> indicates that more records exist than the number that have been 
>> retrieved so far.  Is this the behavior you are seeing or is it 
>> something else?  If you do believe there is a problem can you provide 
>> a test case?
>>
>> Cheers,
>>
>> Scott
>>
> Hmm,
> I have tried to retrieve about 37000 records from a DB2 table with 
> LargeSelect because I otherwise get an OutOfMemory error. I gave up 
> on that as well because it was too slow.
> The code is the following:
>        try {
>            Criteria crit = new Criteria();
>            LargeSelect ls = new LargeSelect(crit,
>                    3000, "it.masterhouse.torque.termopoli.ArticoliPeer");
>            int total = 0;
>            while (ls.getNextResultsAvailable()) {
>                List l = ls.getNextResults();
>                total += l.size();
>            }
>            System.out.println("record selected: " + total);
>        } catch (Throwable t) {
>            t.printStackTrace();
>        }
> (I don't mean that this code makes sense; I just want to spot the 
> 'bug'.)
> When trying to choose the best values for pageSize and memoryPageLimit 
> I discovered that many combinations gave me the correct result, while 
> a pageSize of 3000 (as in the code) gives me 69000 records instead of 
> 37187.
> I can try to move the data into hsql and see if the problem shows up 
> there too, because I don't know another way to build a test case ...
> However, I'm just retrying the failing test and I'm getting an 
> OutOfMemoryError ...
> Moreover, if I try with pageSize 10 and memoryPageLimit 5000 I get a 
> java.util.ConcurrentModificationException ...
> I think that LargeSelect, if useful, should be reworked.
> As I'll need to use Torque in the future, I volunteer to help.
> Can someone help me to understand how I can do that?
> Thank you very much for your attention.
> Federico Spinazzi

-- 
Scott Eade
Backstage Technologies Pty. Ltd.
http://www.backstagetech.com.au





Re: LargeSelect returning different numbers of rows. (Was: [vote] release torque-3.0.1)

Posted by Federico Spinazzi <f....@masterhouse.it>.
Scott Eade wrote:

> Federico Spinazzi wrote:
>
>> - LargeSelect gives me different results for the total number of 
>> objects retrieved with different parameters;
>
> LargeSelect doesn't know the total number of records until such time 
> as the buffer of records hits the last record.  Prior to this it just 
> indicates that more records exist than the number that have been 
> retrieved so far.  Is this the behavior you are seeing or is it 
> something else?  If you do believe there is a problem can you provide 
> a test case?
>
> Cheers,
>
> Scott
>
Hmm,
I have tried to retrieve about 37000 records from a DB2 table with 
LargeSelect because I otherwise get an OutOfMemory error. I gave up on 
that as well because it was too slow.
The code is the following:
        try {
            Criteria crit = new Criteria();
            LargeSelect ls = new LargeSelect(crit,
                    3000, "it.masterhouse.torque.termopoli.ArticoliPeer");
            int total = 0;
            while (ls.getNextResultsAvailable()) {
                List l = ls.getNextResults();
                total += l.size();
            }
            System.out.println("record selected: " + total);
        } catch (Throwable t) {
            t.printStackTrace();
        }
(I don't mean that this code makes sense; I just want to spot the 'bug'.)
When trying to choose the best values for pageSize and memoryPageLimit I 
discovered that many combinations gave me the correct result, while a 
pageSize of 3000 (as in the code) gives me 69000 records instead of 37187.
I can try to move the data into hsql and see if the problem shows up 
there too, because I don't know another way to build a test case ...
However, I'm just retrying the failing test and I'm getting an 
OutOfMemoryError ...
Moreover, if I try with pageSize 10 and memoryPageLimit 5000 I get a 
java.util.ConcurrentModificationException ...
I think that LargeSelect, if useful, should be reworked.
As I'll need to use Torque in the future, I volunteer to help.
Can someone help me to understand how I can do that?
Thank you very much for your attention.
Federico Spinazzi