You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cayenne.apache.org by Andrey Razumovsky <ra...@gmail.com> on 2009/05/15 11:03:13 UTC

Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Hi,

Yes of course. You can use SelectQuery.setFetchLimit(int) and
setFetchStart(int) methods. They do just that.

Andrey

2009/5/15 stefcl <st...@gmail.com>

>
> Hello,
>
> Is there a way I could configure the fetch size of a SelectQuery?
>
> In pure JDBC, in order to be able to handle a large resultset datarow by
> datarow, I need to indicate to the JDBC driver that it should make use of
> cursors by calling Statement.setFetchSize( int ).
>
> If I don't, the driver waits until all rows are returned before giving
> access to the ResultSet, which causes OutOfMemory exceptions.
> Unfortunately, I haven't found how I could force a statement level fetch
> size when working with SelectQuery objects in the cayenne API.
> (Warning: I'm not talking about paging here)
>
> Can someone help?
>
> PS: I use postgresql jdbc4 driver.
>
> --
> View this message in context:
> http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23555842.html
> Sent from the Cayenne - User mailing list archive at Nabble.com.
>
>

Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.
Hello,

Setting fetchSize with postgres does make a difference, even in a simple
JDBC select.
If you consider the following code copy-pasted from my test case:

PreparedStatement stmt = 
sqlMapper.getCurrentConnection().prepareStatement("SELECT
itemID,sellingprice,margin,popularityrating,name,att1,att2,att3,att4,att5,longtailpos,timest 
FROM itemdetails WHERE profileID= 10 ORDER BY longtailpos");

        stmt.setFetchSize(1000);
        stmt.execute();
        ResultSet set = stmt.executeQuery();

        int i = 0;


        while( set.next())
        {
            i++;
            System.out.println(set.getString(1));
        }


It can iterates through approx 1'500'000 rows without any memory issues
(java process stays at approx 20mo while the loop executes). I can also see
that it takes no more than a few seconds before the first row is printed in
the output console, meaning that results are still being fetched from the DB
at the moment the resultset is accessed.

Now if I comment the line "stmt.setFetchSize(1000);" in the above code,
execution stops at "ResultSet set = stmt.executeQuery();" for approx 15
seconds and then an OutOfMemoryException is thrown from JDBC code. Also in
cayenne code, that would be before the resultset is even accessed (making
ResultIterator useless).

A similar problem is exposed here, with a sample stacktrace :
http://www.nabble.com/java.sql.Statement-generates-java.lang.OutOfMemoryError-in-big-tabe-td21568763.html

And the reason is detailed here :
http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor

Perhaps other drivers that automatically use server side cursors aren't
affected or just ignore this setting but I can say for sure that this
setting is necessary with postGres.

Andrey, you say you keep getting OutOfMemory exceptions no matter which
value you set. Do you get these exceptions while iterating through the
results or during the call to executeQuery?
If you get it while looping through the resultset, it could mean that
somehow the rows objects aren't garbage collected, it can happen with
cayenne version 2.0 due to the dataContext holding strong references to the
dataObjects. I didn't want to write a fix for v3 at first, but I realized
that in v2, I had to manually unregister the dataObjects from the context in
order to keep the memory usage in safe areas.

If I can be of any help, please ask.
Best regards...

Stéphane
-- 
View this message in context: http://www.nabble.com/Re%3A-Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of--a-selectquery-tp23652781p23703577.html
Sent from the Cayenne - Dev mailing list archive at Nabble.com.


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrus Adamchik <an...@objectstyle.org>.
Somehow I am not surprised. I may also try that on Oracle when the  
code becomes available on trunk.

(BTW not sure that Stephane is subscribed to the dev list, so cc'ying  
this message.)

Andrus

On May 24, 2009, at 12:52 PM, Andrey Razumovsky wrote:
> I should say my tests on Postgres and mysql didn't show any results.  
> If I
> use small heap size, i get OutOfMemory no matter which fetch size  
> was set.
> Fetch speed and memory usage are the same. Seems drivers just ignore  
> this
> parameter. Stephane, did your workaround help you?
>
> Drivers of hsql, postres, mysql didn't throw any exceptions.. Maybe  
> other
> DBMS drivers work well with this param. I'm going to commit today or
> tomorrow. Nothing will break anything if we add this ability.
>
> Andrey
>
> 2009/5/21 Andrus Adamchik <an...@objectstyle.org>
>
>> Cool :-)
>>
>> On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:
>>
>> Now when I know of this JDBC feature, I'd prefer to have it in  
>> Cayenne
>>> sooner that later. We would also want to have it for other queries  
>>> than
>>> just
>>> only SelectQuery.
>>>
>>
>> Absolutely. This has to be a part of the QueryMetadata on the  
>> backend. On
>> the frontend any query that can potentially select data should have a
>> corresponding setter.
>>
>> Could someone help me testing it against different types
>>> of database if I commit?
>>>
>>
>> I can test it on almost all DB's that we support. Of course we  
>> should have
>> Cayenne unit tests that will provide regression (i.e. driver XYZ  
>> doesn't
>> throw UnsupportedOperationException when we call a corresponding JDBC
>> method).
>>
>> In addition to that I'd like to see if there's really memory/speed  
>> savings
>> when using that (i.e. is it really worth it). For that I suggest  
>> writing a
>> JDBC test outside of Cayenne, that can be run in profiler against  
>> different
>> DB's.
>>
>> Andrus
>>


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrey Razumovsky <ra...@gmail.com>.
I should say my tests on Postgres and mysql didn't show any results. If I
use small heap size, i get OutOfMemory no matter which fetch size was set.
Fetch speed and memory usage are the same. Seems drivers just ignore this
parameter. Stephane, did your workaround help you?

Drivers of hsql, postres, mysql didn't throw any exceptions.. Maybe other
DBMS drivers work well with this param. I'm going to commit today or
tomorrow. Nothing will break anything if we add this ability.

Andrey

2009/5/21 Andrus Adamchik <an...@objectstyle.org>

> Cool :-)
>
> On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:
>
>  Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
>> sooner that later. We would also want to have it for other queries than
>> just
>> only SelectQuery.
>>
>
> Absolutely. This has to be a part of the QueryMetadata on the backend. On
> the frontend any query that can potentially select data should have a
> corresponding setter.
>
>  Could someone help me testing it against different types
>> of database if I commit?
>>
>
> I can test it on almost all DB's that we support. Of course we should have
> Cayenne unit tests that will provide regression (i.e. driver XYZ doesn't
> throw UnsupportedOperationException when we call a corresponding JDBC
> method).
>
> In addition to that I'd like to see if there's really memory/speed savings
> when using that (i.e. is it really worth it). For that I suggest writing a
> JDBC test outside of Cayenne, that can be run in profiler against different
> DB's.
>
> Andrus
>

Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrus Adamchik <an...@objectstyle.org>.
Cool :-)

On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:

> Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
> sooner that later. We would also want to have it for other queries  
> than just
> only SelectQuery.

Absolutely. This has to be a part of the QueryMetadata on the backend.  
On the frontend any query that can potentially select data should have  
a corresponding setter.

> Could someone help me testing it against different types
> of database if I commit?

I can test it on almost all DB's that we support. Of course we should  
have Cayenne unit tests that will provide regression (i.e. driver XYZ  
doesn't throw UnsupportedOperationException when we call a  
corresponding JDBC method).

In addition to that I'd like to see if there's really memory/speed  
savings when using that (i.e. is it really worth it). For that I  
suggest writing a JDBC test outside of Cayenne, that can be run in  
profiler against different DB's.

Andrus

Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrey Razumovsky <ra...@gmail.com>.
Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
sooner that later. We would also want to have it for other queries than just
only SelectQuery. Could someone help me testing it against different types
of database if I commit?

2009/5/20 stefcl <st...@gmail.com>

>
> I have posted a small fix in the jira.
>
>
> I added the following in SelectQuery.java :
> ***************************************
>        //quick fix : property for setting the jdbc fetch size
>
>        protected Integer cursorFetchSize;
>
>        /**
>        * Gets the JDBC fetch size for this query.
>        * @return the fetch size defined for this query, null if not set.
>        */
>        public Integer getCursorFetchSize()
>        {
>                return cursorFetchSize;
>        }
>
>
>        /**
>        * Sets the JDBC fetch size for this query.
>        * @param cursorFetchSize the desired fetch size, or null to reset
> it.
>        */
>        public void setCursorFetchSize(Integer cursorFetchSize)
>        {
>                this.cursorFetchSize = cursorFetchSize;
>        }
>
>        //end quick fix
> ************************************
>
> And modified SelectAction.java to add the check in performAction :
> public void performAction(Connection connection, OperationObserver
> observer)
>            throws SQLException, Exception {
>
>        long t1 = System.currentTimeMillis();
>
>        SelectTranslator translator = createTranslator(connection);
>        PreparedStatement prepStmt = translator.createStatement();
>
>        //quick fix : sets the fetch size
>        if( query.getCursorFetchSize() != null )
>                prepStmt.setFetchSize( query.getCursorFetchSize() );
>        //quick fix end
>
>        ResultSet rs = prepStmt.executeQuery();
>        QueryMetadata md = query.getMetaData(getEntityResolver());
>
> (...)
>
> ********************************************************
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23635409.html
> Sent from the Cayenne - User mailing list archive at Nabble.com.
>
>

Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.
I have posted a small fix in the jira.


I added the following in SelectQuery.java :
***************************************
	//quick fix : property for setting the jdbc fetch size
	
	protected Integer cursorFetchSize;
	
	/**
	* Gets the JDBC fetch size for this query.
	* @return the fetch size defined for this query, null if not set.
	*/
	public Integer getCursorFetchSize()
	{
		return cursorFetchSize;
	}
	
	
	/**
	* Sets the JDBC fetch size for this query.
	* @param cursorFetchSize the desired fetch size, or null to reset it.
	*/
	public void setCursorFetchSize(Integer cursorFetchSize)
	{
		this.cursorFetchSize = cursorFetchSize;
	}
	
	//end quick fix
************************************

And modified SelectAction.java to add the check in performAction :
public void performAction(Connection connection, OperationObserver observer)
            throws SQLException, Exception {

        long t1 = System.currentTimeMillis();

        SelectTranslator translator = createTranslator(connection);
        PreparedStatement prepStmt = translator.createStatement();
	
	//quick fix : sets the fetch size	
	if( query.getCursorFetchSize() != null )
		prepStmt.setFetchSize( query.getCursorFetchSize() );
	//quick fix end

        ResultSet rs = prepStmt.executeQuery();
        QueryMetadata md = query.getMetaData(getEntityResolver());

(...)

********************************************************




-- 
View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23635409.html
Sent from the Cayenne - User mailing list archive at Nabble.com.


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.
Just to say, because it's a standard JDBC feature I don't think that letting
the user set this param can do much harm. Nevertheless, I perfectly
understand that you can't take any change lightly.

In fact most other OR tools let you set this param. In Ibatis you can do it
directly in your query mapping file. Hibernate criteria queries have a
setFetchSize() method and, looking for similar issues on the Jira I found
something like this concerning OJB. Don't get me wrong, I'm not trying to
say that it's a shame or anything negative, just sharing what I could learn
elsewhere on this topic.

I try to avoid this as much as possible but I think I'm going to make this
change myself so I can begin working on my project while you do all the
required testing. I really like cayenne and I have seen that there's a nice
community that kindly answers your questions without trying to make you feel
like an idiot. 

Thanks everyone...


Andrus Adamchik wrote:
> 
> Thanks for the Jira. Can't give you a time estimate on that, we are an  
> open source project after all. All I can say now is that adding this  
> API to Cayenne is fairly trivial (you can even take a shot at this  
> yourself, if you feel like hacking the Cayenne internals). But before  
> we commit anything like that, I'd like us to research the effects of  
> this on at least a few of the databases that Cayenne supports, and  
> various driver-specific caveats that I am sure will arise. So this  
> will take some time to make it a production quality feature.
> 
> Andrus
> 
> 

-- 
View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23586272.html
Sent from the Cayenne - User mailing list archive at Nabble.com.


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrus Adamchik <an...@objectstyle.org>.
Thanks for the Jira. Can't give you a time estimate on that, we are an  
open source project after all. All I can say now is that adding this  
API to Cayenne is fairly trivial (you can even take a shot at this  
yourself, if you feel like hacking the Cayenne internals). But before  
we commit anything like that, I'd like us to research the effects of  
this on at least a few of the databases that Cayenne supports, and  
various driver-specific caveats that I am sure will arise. So this  
will take some time to make it a production quality feature.

Andrus


On May 17, 2009, at 12:52 PM, stefcl wrote:

>
> According to the JDBC javadoc, setFetchSize(int) is an *hint* you  
> can give,
> the driver may decide to ignore it. Nevetheless I'm sure it's useful  
> because
> other OR/mapper like IBatis provide a way to set this parameter.
>
> I have filled a Jira issue about this :
> https://issues.apache.org/jira/browse/CAY-1225
> https://issues.apache.org/jira/browse/CAY-1225
>
> I really hope there's a chance to get a fix shortly, because Cayenne  
> is the
> only OR/mapper that gave me satisfaction.
>
>
>
> Andrus Adamchik wrote:
>>
>> I haven't experimented with that much (so no idea which drivers take
>> advantage of that and which would simply ignore it), but that would
>> indeed be a useful parameter for ResultIterator scenario. So maybe
>> open a feature request in Jira?
>>
>> Andrus
>>
> -- 
> View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23581761.html
> Sent from the Cayenne - User mailing list archive at Nabble.com.
>
>


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.
According to the JDBC javadoc, setFetchSize(int) is an *hint* you can give,
the driver may decide to ignore it. Nevetheless I'm sure it's useful because
other OR/mapper like IBatis provide a way to set this parameter.

I have filled a Jira issue about this :
https://issues.apache.org/jira/browse/CAY-1225
https://issues.apache.org/jira/browse/CAY-1225 

I really hope there's a chance to get a fix shortly, because Cayenne is the
only OR/mapper that gave me satisfaction. 



Andrus Adamchik wrote:
> 
> I haven't experimented with that much (so no idea which drivers take  
> advantage of that and which would simply ignore it), but that would  
> indeed be a useful parameter for ResultIterator scenario. So maybe  
> open a feature request in Jira?
> 
> Andrus
> 
-- 
View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23581761.html
Sent from the Cayenne - User mailing list archive at Nabble.com.


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Andrus Adamchik <an...@objectstyle.org>.
I haven't experimented with that much (so no idea which drivers take  
advantage of that and which would simply ignore it), but that would  
indeed be a useful parameter for ResultIterator scenario. So maybe  
open a feature request in Jira?

Andrus


On May 15, 2009, at 5:45 PM, stefcl wrote:

>
> Thanks,
>
> Since the beginning I am convinced that resultiterators are the  
> right way,
> unfortunately they are useless if the jdbc driver loads everything  
> in memory
> before giving access to the resultset.
>
> I hope there's something to do about it because it's a showstopper  
> for me.
>
>
> Robert Zeigler-6 wrote:
>>
>> Take a look at paginated queries (which isn't /quite/ what you want,
>> but may help you):
>>
>> http://cayenne.apache.org/doc/paginated-queries.html
>>
>> As well as ResultIterator:
>> http://cayenne.apache.org/doc/iterating-through-data-rows.html
>>
>> Robert
>>
>> On May 15, 2009, at 5/159:21 AM , stefcl wrote:
>>
>>>
>>>
>>> Thanks but my problem is not related to paging.
>>>
>>> I would like to be able to execute a select query which returns  
>>> approx
>>> 100000 rows with a single selectQuery and process them one by one,
>>> while
>>> keeping only a few of them in memory (using the resultiterator).
>>>
>>> In jdbc, Statement.setFetchSize(1000) tells the jdbc driver to
>>> retrieve
>>> results from the database 1000 at a time and ask for the next 1000
>>> as you're
>>> iterating the resultset. Otherwise its default behavior is to
>>> retrieve the
>>> 100'000 rows in memory before you can start iterating the resultset.
>>>
>>> It's not the same thing as inserting a TOP or a LIMIT clause in the
>>> query.
>>>
>>> Any help appreciated
>>>
>>>
>>> Andrey Razumovsky wrote:
>>>>
>>>> Hi,
>>>>
>>>> Yes of course. You can use SelectQuery.setFetchLimit(int) and
>>>> setFetchStart(int) methods. They do just that.
>>>>
>>>> Andrey
>>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23560831.html
>>> Sent from the Cayenne - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23561312.html
> Sent from the Cayenne - User mailing list archive at Nabble.com.
>
>


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.
Thanks,

Since the beginning I am convinced that resultiterators are the right way,
unfortunately they are useless if the jdbc driver loads everything in memory
before giving access to the resultset.

I hope there's something to do about it because it's a showstopper for me.


Robert Zeigler-6 wrote:
> 
> Take a look at paginated queries (which isn't /quite/ what you want,  
> but may help you):
> 
> http://cayenne.apache.org/doc/paginated-queries.html
> 
> As well as ResultIterator:
> http://cayenne.apache.org/doc/iterating-through-data-rows.html
> 
> Robert
> 
> On May 15, 2009, at 5/159:21 AM , stefcl wrote:
> 
>>
>>
>> Thanks but my problem is not related to paging.
>>
>> I would like to be able to execute a select query which returns approx
>> 100000 rows with a single selectQuery and process them one by one,  
>> while
>> keeping only a few of them in memory (using the resultiterator).
>>
>> In jdbc, Statement.setFetchSize(1000) tells the jdbc driver to  
>> retrieve
>> results from the database 1000 at a time and ask for the next 1000  
>> as you're
>> iterating the resultset. Otherwise its default behavior is to  
>> retrieve the
>> 100'000 rows in memory before you can start iterating the resultset.
>>
>> It's not the same thing as inserting a TOP or a LIMIT clause in the  
>> query.
>>
>> Any help appreciated
>>
>>
>> Andrey Razumovsky wrote:
>>>
>>> Hi,
>>>
>>> Yes of course. You can use SelectQuery.setFetchLimit(int) and
>>> setFetchStart(int) methods. They do just that.
>>>
>>> Andrey
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23560831.html
>> Sent from the Cayenne - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23561312.html
Sent from the Cayenne - User mailing list archive at Nabble.com.


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by Robert Zeigler <ro...@roxanemy.com>.
Take a look at paginated queries (which isn't /quite/ what you want,  
but may help you):

http://cayenne.apache.org/doc/paginated-queries.html

As well as ResultIterator:
http://cayenne.apache.org/doc/iterating-through-data-rows.html

Robert

On May 15, 2009, at 5/159:21 AM , stefcl wrote:

>
>
> Thanks but my problem is not related to paging.
>
> I would like to be able to execute a select query which returns approx
> 100000 rows with a single selectQuery and process them one by one,  
> while
> keeping only a few of them in memory (using the resultiterator).
>
> In jdbc, Statement.setFetchSize(1000) tells the jdbc driver to  
> retrieve
> results from the database 1000 at a time and ask for the next 1000  
> as you're
> iterating the resultset. Otherwise its default behavior is to  
> retrieve the
> 100'000 rows in memory before you can start iterating the resultset.
>
> It's not the same thing as inserting a TOP or a LIMIT clause in the  
> query.
>
> Any help appreciated
>
>
> Andrey Razumovsky wrote:
>>
>> Hi,
>>
>> Yes of course. You can use SelectQuery.setFetchLimit(int) and
>> setFetchStart(int) methods. They do just that.
>>
>> Andrey
>>
>
> -- 
> View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23560831.html
> Sent from the Cayenne - User mailing list archive at Nabble.com.
>


Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Posted by stefcl <st...@gmail.com>.

Thanks but my problem is not related to paging.

I would like to be able to execute a select query which returns approx
100000 rows with a single selectQuery and process them one by one, while
keeping only a few of them in memory (using the resultiterator).

In jdbc, Statement.setFetchSize(1000) tells the jdbc driver to retrieve
results from the database 1000 at a time and ask for the next 1000 as you're
iterating the resultset. Otherwise its default behavior is to retrieve the
100'000 rows in memory before you can start iterating the resultset. 

It's not the same thing as inserting a TOP or a LIMIT clause in the query.

Any help appreciated


Andrey Razumovsky wrote:
> 
> Hi,
> 
> Yes of course. You can use SelectQuery.setFetchLimit(int) and
> setFetchStart(int) methods. They do just that.
> 
> Andrey
> 

-- 
View this message in context: http://www.nabble.com/Big-selects-on-PostGres-%3A-Configuring-Statement.setFetchSize%28%29-of-a-selectquery-tp23555842p23560831.html
Sent from the Cayenne - User mailing list archive at Nabble.com.