Posted to user@cayenne.apache.org by Simon Schneider <ss...@mackoy.de> on 2012/12/12 14:18:33 UTC

Fetching and processing a large amount of objects

Hi list,

my task is to fetch and process a large amount of objects. Reading the documentation, my best choice
seemed to be dataContext.performIteratedQuery(query). The problem is that I get a Java Heap Space
error when executing this statement. It now seems to me that the problem is that performIteratedQuery
loads all the data into a ResultSet.
My question is how to handle such cases; is there any best practice? Would it be OK to use a paginated
query, then unregister processed objects from the DataContext they belong to and also remove them
from the list that resulted from the paginated query?

Regards Simon
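
For reference, a minimal sketch of the iterated-query pattern Simon describes, assuming a
Cayenne 3.0-era API; the SelectQuery, the exception handling, and the "CayenneObject" entity
name are illustrative and may differ slightly between Cayenne versions:

import java.util.Map;

import org.apache.cayenne.CayenneException;
import org.apache.cayenne.access.DataContext;
import org.apache.cayenne.access.ResultIterator;
import org.apache.cayenne.query.SelectQuery;

public class IteratedFetch
{
    public static void process(DataContext context) throws CayenneException
    {
        SelectQuery query = new SelectQuery("CayenneObject");

        // performIteratedQuery streams DataRows from an open JDBC ResultSet
        // instead of registering every result object in the context.
        ResultIterator iterator = context.performIteratedQuery(query);

        try
        {
            while (iterator.hasNextRow())
            {
                // Each row arrives as a plain Map (a DataRow), not a DataObject.
                Map row = (Map) iterator.nextRow();
                // ... process the row ...
            }
        }
        finally
        {
            // The iterator keeps a database connection open until it is closed.
            iterator.close();
        }
    }
}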

Re: Fetching and processing a large amount of objects

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 13/12/12 12:18am, Simon Schneider wrote:
> Hi list,
>
> my task is to fetch and process a large amount of objects. Reading the documentation, my best choice
> seemed to be dataContext.performIteratedQuery(query). The problem is that I get a Java Heap Space
> error when executing this statement. It now seems to me that the problem is that performIteratedQuery
> loads all the data into a ResultSet.
> My question is how to handle such cases; is there any best practice? Would it be OK to use a paginated
> query, then unregister processed objects from the DataContext they belong to and also remove them
> from the list that resulted from the paginated query?

By the time you step through to the end of the results, even with a paginated query you will have read every object into RAM. However, paginated queries are a really convenient way to fetch even millions of rows when you don't necessarily want to access every object (e.g. when you want to display a scrollable list of results and only need to resolve the objects visible in the current viewport).
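
A minimal sketch of the paginated fetch described above, assuming a Cayenne 3.0-era SelectQuery,
an existing DataContext named context, and the same illustrative "CayenneObject" entity as before;
only the pages whose elements are actually touched get resolved into full objects:

SelectQuery query = new SelectQuery("CayenneObject");

// Fetch lazily in pages of 100 rows: the returned list knows the total
// result size, but resolves objects one page at a time as elements are accessed.
query.setPageSize(100);

List results = context.performQuery(query);

// Touching an element faults in just the page that contains it.
Object first = results.get(0);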

As Michael says, an option is to read the data in batches into separate DataContexts.

Ari
  

-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Fetching and processing a large amount of objects

Posted by Michael Gentry <mg...@masslight.net>.
Hi Simon, some questions:

1) How many records are you talking about?
2) Are you updating your objects with a flag (or similar) that you can query
on later (to exclude objects you've already processed)?
3) What version of Cayenne are you using and what database?
4) When you convert your Map (from the iterated query) into a DataObject,
are you creating a new DataContext or using the old one over and over again?

For #4, if you are using the same DataContext repeatedly, try changing your
logic to something more like:

while (iterator.hasNextRow())
{
    DataContext context = DataContext.createDataContext();
    Map row = (Map) iterator.nextRow();
    CayenneObject object = (CayenneObject) context.objectFromDataRow("CayenneObject", row);
    ...
    object.doStuff();
    ...
    context.commitChanges();
}

This way you won't build up a ton of objects in a single DataContext and
possibly run out of memory.
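
A possible variation on the sketch above, combining it with Ari's suggestion of reading batches
into separate contexts: reuse one DataContext for a batch of rows, then replace it, so memory
stays bounded without committing on every single row. It assumes the same iterator, entity, and
conversion call as the sketch above, and the batch size of 1000 is arbitrary:

int count = 0;
DataContext context = DataContext.createDataContext();

while (iterator.hasNextRow())
{
    Map row = (Map) iterator.nextRow();
    CayenneObject object = (CayenneObject) context.objectFromDataRow("CayenneObject", row);
    object.doStuff();

    if (++count % 1000 == 0)
    {
        // Flush the finished batch, then drop the context (and every object
        // registered in it) and start a fresh one for the next batch.
        context.commitChanges();
        context = DataContext.createDataContext();
    }
}

// Commit whatever is left in the final, partially filled batch.
context.commitChanges();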

mrg


On Wed, Dec 12, 2012 at 8:18 AM, Simon Schneider <ss...@mackoy.de> wrote:
>
> Hi list,
>
> my task is to fetch and process a large amount of objects. Reading the documentation, my best choice
> seemed to be dataContext.performIteratedQuery(query). The problem is that I get a Java Heap Space
> error when executing this statement. It now seems to me that the problem is that performIteratedQuery
> loads all the data into a ResultSet.
> My question is how to handle such cases; is there any best practice? Would it be OK to use a paginated
> query, then unregister processed objects from the DataContext they belong to and also remove them
> from the list that resulted from the paginated query?
>
> Regards Simon