Posted to user@cayenne.apache.org by Tomi N/A <he...@gmail.com> on 2007/05/14 19:57:01 UTC

OutOfMemoryError: reading a large number of objects one by one

I'll try to be to the point.
I need:
- to "stream" data from potentially large tables
- want the app to use no more than a couple of MB of RAM, as
(theoretically) there shouldn't be a reason for it to need any more

Using Cayenne 1.2-something (no, switching to something a bit more
modern is not a high priority...yet).

Have tried with paginated query, failed.
Have tried with dc.performIteratedQuery(...), failed.
Have tried with a combination, failed.
Have tried various values of cayenne.DataRowStore.snapshot.size (10^3
to 10^5), failed.
Have not tried combining a low DataRowStore.snapshot.size value with
an iterated/paginated query.

This was the last (failed) attempt:

Expression condition = ExpressionFactory.matchExp(MyClassA.TO_B_PROPERTY, b);
SelectQuery query = new SelectQuery(MyClassA.class, condition);
query.setFetchLimit(10000);
query.setPageSize(100);

ResultIterator ri;
try {
    ri = Util.getCommonContext().performIteratedQuery(query);
    MyClassA mca;
    while (ri.hasNextRow()) {
        DataRow dr = (DataRow) ri.nextDataRow();
        mca = (MyClassA) Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);

        System.out.println(mca.getAttr1().toString());
        System.out.println(mca.getToMyClassC().getAttr3().toString());
    }
} catch (CayenneException ex) {
    ex.printStackTrace();
}

The fetch limit is only a temporary measure: I used it to deduce that
I run out of memory somewhere between 1,000 and 10,000 records.
Now, I see no reason why a snippet of code not unlike this one
(working with a handful of objects at any given time) couldn't handle
an arbitrarily large dataset while using no more than 1 MB of RAM, but
I'll be generous and say it's great if it works using under, say,
30 MB. :)

Opinions, hints, suggestions, possibilities?

TIA,
t.n.a.

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Derek Rendall <de...@gmail.com>.
My patch seemed to scale well - it got to about (I think) 40 MB or so and
then stabilized with only nominal growth from there. I did not test much
higher than 100 K records as I did not need to. I guess the total will be
roughly the size of each record when represented as a data row, multiplied
by the number of rows.
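
(Purely illustrative arithmetic, with a made-up per-row figure: at, say,
~400 bytes per data row, 100 K rows works out to roughly 40 MB, which is
in the ballpark of where I saw it level off.)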

I think this is a pretty common "problem" with ORM tools. When I looked
at it a couple of years ago, it appeared that neither Hibernate nor Kodo
JDO addressed the issue with any better approach. I did not check
TopLink. ORM tools tend to focus on simplifying user tasks rather than
batch-type tasks.

Note: some people will advocate sitting such logic on top of a standard JDBC
result set (I'm not commenting one way or another ;-). That's really the only
way to avoid loading at least something for each record up front.
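
If anyone wants to go that route, a bare-bones sketch looks something
like this (plain JDBC, imports from java.sql assumed; the URL,
credentials and table/column names are made up for the example):

    // Stream rows with plain JDBC - no ORM objects are created at all.
    static void streamAttr1(String url, String user, String password) throws SQLException {
        Connection con = DriverManager.getConnection(url, user, password);
        try {
            // some drivers (e.g. PostgreSQL) only stream when auto-commit is off
            con.setAutoCommit(false);
            Statement st = con.createStatement();
            // hint to the driver to fetch ~100 rows at a time
            st.setFetchSize(100);
            ResultSet rs = st.executeQuery("SELECT attr1 FROM my_class_a");
            while (rs.next()) {
                System.out.println(rs.getString("attr1"));
            }
            rs.close();
            st.close();
        } finally {
            con.close();
        }
    }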

Also, you should probably start tracking new objects before the while loop
as well (for the first 100 :-)
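
I.e. something along these lines (a rough sketch only, not tested;
getDataContext() here is whatever returns your DataContext, and "ri" is
the ResultIterator from the earlier post):

    DataContext ctx = getDataContext();
    // start tracking before the loop so the first batch gets cleaned up too
    ctx.getObjectStore().startTrackingNewObjects();
    int i = 0;
    while (ri.hasNextRow()) {
        DataRow dr = (DataRow) ri.nextDataRow();
        MyClassA mca = (MyClassA) ctx.objectFromDataRow(MyClassA.class, dr, false);
        // ... process mca ...
        if (++i % 100 == 0) {
            ctx.getObjectStore().unregisterNewObjects();
            ctx.getObjectStore().startTrackingNewObjects();
        }
    }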

Derek

On 5/16/07, Andrus Adamchik <an...@objectstyle.org> wrote:
>
>
> On May 15, 2007, at 12:47 AM, Tomi N/A wrote:
>
> > Reduced the max number of objects to 1000. The result? A NPE at:
> >               for (MyClassC mcc : (List<MyClassC>)
> > mca.getToMyClassC().getToParentClass().getMyClassCArray()) {
>
> Ok, so the cache size will have to be big enough to hold all resolved
> objects within the lifetime of a context. So let's try another
> strategy. Return the max objects back to 10000 and uncheck "use
> shared cache" for the DataDomain.
>
> If this doesn't work, I suggest to run the app in profiler to see
> exactly how objects are allocated and collected.
>
> > The database referential integrity ensures there can be no nulls if
> > (mcc != null), which it is.
> > As far as -Xmx is concerned, it's at its default value (64M), which
> > should be several times more than necessary for the job.
>
> Agreed - the default 64m should be enough if there's no leaks.
>
> Andrus
>

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Tomi N/A <he...@gmail.com>.
Is there a minimal safe streaming loop I could construct and use to
experiment with?
Let's say I have a billion rows in TestEntityA and want to print their
information to stdout...what would be the minimal safe code to do
that?
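
Something like this skeleton is roughly what I have in mind (just a
sketch off the top of my head, not tested; TestEntityA and its getName()
are made up, and exception handling is omitted):

    DataContext ctx = DataContext.createDataContext();
    SelectQuery query = new SelectQuery(TestEntityA.class);
    ResultIterator ri = ctx.performIteratedQuery(query);
    try {
        while (ri.hasNextRow()) {
            DataRow dr = (DataRow) ri.nextDataRow();
            TestEntityA a = (TestEntityA) ctx.objectFromDataRow(TestEntityA.class, dr, false);
            System.out.println(a.getName());
            // ... plus whatever cleanup keeps memory flat ...
        }
    } finally {
        ri.close();
    }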

t.n.a.

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Andrus Adamchik <an...@objectstyle.org>.
On May 15, 2007, at 12:47 AM, Tomi N/A wrote:

> Reduced the max number of objects to 1000. The result? A NPE at:
> 		for (MyClassC mcc : (List<MyClassC>)
> mca.getToMyClassC().getToParentClass().getMyClassCArray()) {

Ok, so the cache size will have to be big enough to hold all resolved  
objects within the lifetime of a context. So let's try another  
strategy. Return the max objects back to 10000 and uncheck "use  
shared cache" for the DataDomain.

If this doesn't work, I suggest to run the app in profiler to see  
exactly how objects are allocated and collected.

> The database referential integrity ensures there can be no nulls if
> (mcc != null), which it is.
> As far as -Xmx is concerned, it's at its default value (64M), which
> should be several times more than necessary for the job.

Agreed - the default 64m should be enough if there's no leaks.

Andrus

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Tomi N/A <he...@gmail.com>.
2007/5/14, Derek Rendall <de...@gmail.com>:
> OK, my memory on this stuff is now going back a year or two, but I did do
> some extensive playing around with exactly this scenario. My example died
> about 6-7 thousand records in 64 Megs - I found out why, and the reasons
> seemed pretty reasonable at the time (something to do with a data row being
> inflated to an object and then being cached in the object store as an
> object). As I had some time back then I ended up creating a version of
> Cayenne that handled pretty large data sets (I only needed 100 thousand
> records to be handled), by making some relatively minor adjustments. I did
> discuss some of the techniques on the mailing lists but I can't seem to find
> the entries now. I did find a Jira issue:
> http://issues.apache.org/cayenne/browse/CAY-294

Thanks for the comment, Derek.
Your patch would probably serve me just fine for this project (<80k
records), but I'm looking for a more general approach in light of
future projects...more importantly, because I think that this is the
sort of thing an ORM layer should support and should support well (for
an arbitrary number of records, basically), I'm inclined to look
deeper into the problem.

> Try doing the following every 100 records or so (BTW: not sure if this stuff
> is actually still around :-) YMMV:
>
>                 getDataContext().getObjectStore().unregisterNewObjects();
>                 getDataContext().getObjectStore().startTrackingNewObjects();

My mileage did vary.
Your suggestion had two effects:
1.) I got to work with datasets an order of magnitude larger
(I'm experimenting right now to see just how much bigger)
2.) I got inexplicable NPEs. For instance: I have a table A and a
table redundantA which holds cached data about records in A. They use
the same PK and are kept in sync automatically with triggers. However,
a.getToRedundantA() gives me null for certain records (always the same
ones?!). This should not be possible. I've checked, and the data in the
database is valid; it's an application problem.

First of all, I'm rather unnerved by the fact that this occurs in such
a (seemingly) random fashion: this, coupled with similar problems when
I set the DataRowStore size to low values like 1000, greatly undermines
my confidence that the objects I have in memory represent the database
state. Secondly, I'm interested to know why it happens. Here's what I
did this time:

int i = 0;
while (ri.hasNextRow()) {
    DataRow dr = (DataRow) ri.nextDataRow();
    mca = (MyClassA) Util.getCommonDataContext().objectFromDataRow(MyClassA.class, dr, false);
    // ... do something with mca ...
    if (i++ == 99) {
        Util.getCommonDataContext().getObjectStore().unregisterNewObjects();
        Util.getCommonDataContext().getObjectStore().startTrackingNewObjects();
        i = 0;
    }
}

I'll try to map out the memory consumption by varying the reset limit
(99) and other parameters...but the memory consumption is completely
irrelevant when compared to the problem of unreliable data.
I'd rather be back at square one than have to worry if an object is
correctly reconstructed after I do objectFromDataRow.

Cheers,
t.n.a.

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Derek Rendall <de...@gmail.com>.
OK, my memory on this stuff is now going back a year or two, but I did do
some extensive playing around with exactly this scenario. My example died
about 6-7 thousand records in 64 Megs - I found out why, and the reasons
seemed pretty reasonable at the time (something to do with a data row being
inflated to an object and then being cached in the object store as an
object). As I had some time back then, I ended up creating a version of
Cayenne that handled pretty large data sets (I only needed 100 thousand
records to be handled) by making some relatively minor adjustments. I did
discuss some of the techniques on the mailing lists, but I can't seem to find
the entries now. I did find a Jira issue:
http://issues.apache.org/cayenne/browse/CAY-294

Try doing the following every 100 records or so (BTW: not sure if this stuff
is actually still around :-) YMMV:

                getDataContext().getObjectStore().unregisterNewObjects();
                getDataContext().getObjectStore().startTrackingNewObjects();

Hope that helps

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Tomi N/A <he...@gmail.com>.
2007/5/14, Andrus Adamchik <an...@objectstyle.org>:
> You may also try to reduce the "Max Number of Objects" for the
> DataDomain from the default 10000 to something smaller.
>
> Also are you setting "-Xmx" to something small? You may want to start
> with the higher values, if only to be able to see the effect of
> various memory saving strategies before the app crashes.

Reduced the max number of objects to 1000. The result? A NPE at:
		for (MyClassC mcc : (List<MyClassC>)
				mca.getToMyClassC().getToParentClass().getMyClassCArray()) {

The database referential integrity ensures there can be no nulls if
(mcc != null), which it is.
As far as -Xmx is concerned, it's at its default value (64M), which
should be several times more than necessary for the job.
An embarrassing side problem is that I don't know how to set it to
something greater within the NetBeans IDE for the JUnit test I'm
running. However, that's not the main issue, is it?
I've run into this problem before: allocating 512 MB of RAM just to
calculate a report in memory is hardly justified, but that's exactly
what happens in another application I've built.

Any other thoughts?

BTW, this feels like a good moment to point out that the first reply
to my question came within 10 minutes of my sending it, and it wasn't
the first time that happened, either: I've never seen this kind of
support, free or commercial. It should go without saying that I
appreciate it, but I'll say it anyway: thanks Andrus, thanks
everyone.

Cheers,
t.n.a.

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Andrus Adamchik <an...@objectstyle.org>.
You may also try to reduce the "Max Number of Objects" for the  
DataDomain from the default 10000 to something smaller.

Also are you setting "-Xmx" to something small? You may want to start  
with the higher values, if only to be able to see the effect of  
various memory saving strategies before the app crashes.
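
E.g. (purely as an illustration - the exact command depends on how you
launch the test):

    java -Xmx256m -cp <your classpath> <your test runner / main class>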

Andrus


On May 14, 2007, at 10:39 PM, Tomi N/A wrote:

> 2007/5/14, Andrus Adamchik <an...@objectstyle.org>:
>>
>> On May 14, 2007, at 8:57 PM, Tomi N/A wrote:
>>
>> > mca = (MyClassA)
>> > Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);
>>
>> This is the source of the memory leak. You may want to replace the
>> context after processing an X number of rows.
>
> No effect, captain.
> I replaced
>    Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);
> with
>    Util.getNewContext().objectFromDataRow(SpisUtil.class, dr, false)
> where getNewContext() does
>    return Configuration.getSharedConfiguration().getDomain().createDataContext();
>
> I understand I wouldn't want to create a new DataContext for *every*
> DataObject, but I had hoped it would be proof of concept.
>
> Any other ideas, comments, hints as to why the above approach  
> didn't work?
>
> Cheers,
> t.n.a.
>


Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Tomi N/A <he...@gmail.com>.
2007/5/14, Andrus Adamchik <an...@objectstyle.org>:
>
> On May 14, 2007, at 8:57 PM, Tomi N/A wrote:
>
> > mca = (MyClassA)
> > Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);
>
> This is the source of the memory leak. You may want to replace the
> context after processing an X number of rows.

No effect, captain.
I replaced
    Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);
with
    Util.getNewContext().objectFromDataRow(SpisUtil.class, dr, false)
where getNewContext() does
    return Configuration.getSharedConfiguration().getDomain().createDataContext();

I understand I wouldn't want to create a new DataContext for *every*
DataObject, but I had hoped it would serve as a proof of concept.

Any other ideas, comments, hints as to why the above approach didn't work?

Cheers,
t.n.a.

Re: OutOfMemoryError: reading a large number of objects one by one

Posted by Andrus Adamchik <an...@objectstyle.org>.
On May 14, 2007, at 8:57 PM, Tomi N/A wrote:

> mca = (MyClassA)
> Util.getCommonContext().objectFromDataRow(SpisUtil.class, dr, false);

This is the source of the memory leak. You may want to replace the  
context after processing an X number of rows.
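
Roughly like this (an untested sketch of the idea only - the loop and
class names are from your post, and the batch size of 1000 is arbitrary):

    DataContext ctx = Configuration.getSharedConfiguration().getDomain().createDataContext();
    int processed = 0;
    while (ri.hasNextRow()) {
        DataRow dr = (DataRow) ri.nextDataRow();
        MyClassA mca = (MyClassA) ctx.objectFromDataRow(MyClassA.class, dr, false);
        // ... work with mca ...
        if (++processed % 1000 == 0) {
            // drop the old context so the objects it registered can be collected
            ctx = Configuration.getSharedConfiguration().getDomain().createDataContext();
        }
    }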

(BTW, the Cayenne 3.0 nightlies fix that, but you mentioned you don't
want to upgrade just yet)

Andrus