Posted to derby-user@db.apache.org by yves pielusenet <yv...@free.fr> on 2006/06/12 16:18:07 UTC

Re: Iterate over big table...

On Friday 19 May 2006 at 13:13 -0700, Stanley Bradbury wrote:
> yves pielusenet wrote:
> 
> >Hello,
> >I have such a table :
> >CREATE TABLE data (
> >   numvign INTEGER PRIMARY KEY,
> >   data BLOB NOT NULL
> >)
> >
> >This table should have millions of records. I have to iterate over all
> >records.
> >What is the best way (I don't want to have a outOfMemoryException) :
> >
> >for(int i=0; i<nbRows; i++){
> >    theData = select data from data where numvign=i;
> >    doSomething(theData);
> >}
> >
> >or :
> >
> >alldata = select data from data order by numvign;
> >while(alldata.hasNext()){
> >   doSomething(allData.next());
> >}
> >
> >When I do a big select, are all selected rows into memory or derby
> >access to database when it necessary ?
> >
> >thanks :)
> >
> >  
> >
> Hi -
> With the proper Derby pageCache and JVM maxHeap settings you should not 
> get OutOfMemory (OOM) exceptions when using Derby.  Determining the 
> exact best settings requires iterative testing of your application with 
> different values.  I've included guidelines below on how to balance 
> these factors for a system.  The default pageCache setting of Derby 
> works well for databases that do not make heavy use of LOB data 
> (LOB tables use 32 K pages rather than the standard 4 K pages).  When you 
> select from the above table you will be using all 32 K pages and should 
> compute the proper setting for pageCache and maxHeap to obtain a ratio 
> of 16:1 or higher.  I have not heard of OOM exceptions happening in 
> systems using a 20:1 ratio or higher.
> 
> Here's the basic information you need to know:
>      Avoiding OOM exceptions means balancing three things:
>      How much of the  physical memory of the machine you can allocate to 
> the JVM.
>           IMPORTANT: The JVM must fit in physical memory.
>      The maximum heap size allowed for the jvm (-Xmx )
>      The size of the Derby/Cloudscape pageCache (pageCache * avg. pageSize)
> 
> The JVM maximum heap size (maxHeap) is set using the java parameter 
> "-Xmx#m" where # is the maximum number of MB to allocate to the heap 
> (for example, java -Xmx128m for a 128 MB maxHeap).
> The Derby pageCache size is set using the property 
> derby.storage.pageCacheSize and is specified in database 
> pages (which can be of various sizes: 4 K for standard tables, 32 K for 
> tables with LOBs). 
> 
> The following sizing method computes the number of pages to allocate to 
> the pageCache as a ratio of the JVM maximum heap size.  This ratio for a 
> default installation of Derby is 16:1 (default sizes: pageCache of 
> 1000 pages, only 4 K pages in the database, running in a 64 MB JVM) and 
> can be calculated using the following formula:
>     ratio = JVM maxHeap / (pageCache * avg. page size)
>     e.g. 64 MB / (1000 * 4 KB) = 16
> If you are getting OOM exceptions in Derby it is likely that your 
> 'avg. page size' is greater than 4 K.
> 
> 
Hello,
I know it is an old message. I understand the answer. I use this in my
application:

// 16 KB page size for tables created after this property is set
String pageSize = System.getProperty("derby.storage.pageSize", "16384");
System.getProperties().put("derby.storage.pageSize", pageSize);

// number of pages Derby keeps in its page cache
String pageCacheSize = System.getProperty("derby.storage.pageCacheSize", "500");
System.getProperties().put("derby.storage.pageCacheSize", pageCacheSize);

// pages pre-allocated when a table is created
String nbInitialPage = System.getProperty("derby.storage.initialPages", "100");
System.getProperties().put("derby.storage.initialPages", nbInitialPage);
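
Relating these values to the 16:1 ratio described above (rough arithmetic,
just as an illustration):

    Derby defaults:   1000 pages *  4 KB =  4 MB cache -> 64 MB heap  (16:1)
    settings above:    500 pages * 16 KB =  8 MB cache -> 128 MB heap (16:1)
    all-LOB default:  1000 pages * 32 KB = 32 MB cache -> 512 MB heap (16:1)

So with the 16 KB pages and 500-page cache above, a maxHeap of about 128 MB
(-Xmx128m) or more keeps the ratio at 16:1 or better.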

It works well when I do lots of 'select's, each of which returns a single row.

As I said in my first post, I have to iterate over a very big table
(millions of rows).
So can I do a 'select * from data order by index' over this big table?
Is it better than doing:
for(int id=1; id<nbrow; id++){
   ResultSet rs = select * from data where index=id
}
?

Does Derby's JDBC ResultSet use the pageCache?

thanks :)

-- 
yves piel


Re: Iterate over big table...

Posted by Dy...@Sun.COM.
Suavi Ali Demir <de...@yahoo.com> writes:

> if it is a forward only cursor, when you do a select * and loop through all rows and doSomething() with each row (and not hold a reference to the data yourself), there would be only 1 row's data in memory at a given time, no? (or few rows if Derby prefetches in chunks as you do rs.next())

Could well be. I don't know :) I assumed that the OP had tried it and
WAS seeing out-of-memory errors when querying for all the
records.

-- 
dt

Re: Iterate over big table...

Posted by yves pielusenet <yv...@free.fr>.
On Monday 12 June 2006 at 08:02 -0700, Suavi Ali Demir wrote:
> if it is a forward only cursor, when you do a select * and loop
> through all rows and doSomething() with each row (and not hold a
> reference to the data yourself), there would be only 1 row's data in
> memory at a given time, no? (or few rows if Derby prefetches in chunks
> as you do rs.next())
Cool :)
If that's right, it is wonderful.
Yes, I only deal with forward-only cursors and I loop through all rows
without holding any references.
I can't use a stored procedure since this is BLOB data which contains
serialized objects.
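
A minimal sketch of that kind of per-row read (just a sketch, assuming each
BLOB holds one object written with java.io.ObjectOutputStream, and that 'rs'
is the forward-only ResultSet positioned on the current row):

InputStream in = rs.getBinaryStream("data");       // stream the BLOB, don't buffer it
ObjectInputStream oin = new ObjectInputStream(in);
Object obj = oin.readObject();                     // the serialized object of this row
doSomething(obj);                                  // process it, keep no reference afterwards
oin.close();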

best regards

-- 
yves piel

> 
> 
> Dyre.Tjeldvoll@Sun.COM wrote:
>         yves pielusenet writes:
>         
>         > On Friday 19 May 2006 at 13:13 -0700, Stanley Bradbury wrote:
>         >> yves pielusenet wrote:
>         >> 
>         >> >Hello,
>         >> >I have such a table :
>         >> >CREATE TABLE data (
>         >> > numvign INTEGER PRIMARY KEY,
>         >> > data BLOB NOT NULL
>         >> >)
>         >> >
>         >> >This table should have millions of records. I have to
>         iterate over all
>         >> >records.
>         >> >What is the best way (I don't want to have a
>         outOfMemoryException) :
>         >> >
>         >> >for(int i=0; i<nbRows; i++){
>         >> >    theData = select data from data where numvign=i;
>         >> >    doSomething(theData);
>         >> >}
>         >> >
>         >> >or :
>         >> >
>         >> >alldata = select data from data order by numvign;
>         >> >while(alldata.hasNext()){
>         >> > doSomething(allData.next());
>         >> >}
>         
>         Depends on what "doSomething" does. If doSomething could be
>         expressed
>         as SQL or done from a stored procedure, you wouldn't have to
>         materialize the entire 1 mill rowset in the client. 
>         
>         Instead of either getting all 1 mill rows into a single rs, or
>         performing 1 mill queries yielding a single-row rs, you could
>         try to
>         read as many rows as you think is safe, and repeat until you
>         are
>         done. But this doesn't work so well if there are "holes" in
>         your pk
>         sequence...
>         
>         -- 
>         dt
>         
> 


Re: Iterate over big table...

Posted by Suavi Ali Demir <de...@yahoo.com>.
if it is a forward only cursor, when you do a select * and loop through all rows and doSomething() with each row (and not hold a reference to the data yourself), there would be only 1 row's data in memory at a given time, no? (or few rows if Derby prefetches in chunks as you do rs.next())
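
For example, a loop over the table from the original post could look roughly
like this (just a sketch, assuming the embedded driver and a database named
'mydb'; doSomething() stands for whatever per-row processing is needed):

import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanData {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        ResultSet rs = stmt.executeQuery(
                "SELECT numvign, data FROM data ORDER BY numvign");
        while (rs.next()) {
            // getBinaryStream lets Derby stream the BLOB; as long as nothing
            // keeps a reference to the data, each row can be garbage
            // collected before the next call to rs.next().
            InputStream blob = rs.getBinaryStream("data");
            doSomething(rs.getInt("numvign"), blob);
            blob.close();
        }
        rs.close();
        stmt.close();
        conn.close();
    }

    private static void doSomething(int numvign, InputStream data) {
        // per-row processing goes here
    }
}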
  

Dyre.Tjeldvoll@Sun.COM wrote:
  yves pielusenet writes:

> On Friday 19 May 2006 at 13:13 -0700, Stanley Bradbury wrote:
>> yves pielusenet wrote:
>> 
>> >Hello,
>> >I have such a table :
>> >CREATE TABLE data (
>> > numvign INTEGER PRIMARY KEY,
>> > data BLOB NOT NULL
>> >)
>> >
>> >This table should have millions of records. I have to iterate over all
>> >records.
>> >What is the best way (I don't want to have a outOfMemoryException) :
>> >
>> >for(int i=0; i<nbRows; i++){
>> >    theData = select data from data where numvign=i;
>> > doSomething(theData);
>> >}
>> >
>> >or :
>> >
>> >alldata = select data from data order by numvign;
>> >while(alldata.hasNext()){
>> > doSomething(allData.next());
>> >}

Depends on what "doSomething" does. If doSomething could be expressed
as SQL or done from a stored procedure, you wouldn't have to
materialize the entire 1 mill rowset in the client. 

Instead of either getting all 1 mill rows into a single rs, or
performing 1 mill queries yielding a single-row rs, you could try to
read as many rows as you think is safe, and repeat until you are
done. But this doesn't work so well if there are "holes" in your pk
sequence...

-- 
dt



Re: Iterate over big table...

Posted by Dy...@Sun.COM.
yves pielusenet <yv...@free.fr> writes:

> On Friday 19 May 2006 at 13:13 -0700, Stanley Bradbury wrote:
>> yves pielusenet wrote:
>> 
>> >Hello,
>> >I have such a table :
>> >CREATE TABLE data (
>> >   numvign INTEGER PRIMARY KEY,
>> >   data BLOB NOT NULL
>> >)
>> >
>> >This table should have millions of records. I have to iterate over all
>> >records.
>> >What is the best way (I don't want to have a outOfMemoryException) :
>> >
>> >for(int i=0; i<nbRows; i++){
>> >    theData = select data from data where numvign=i;
>> >    doSomething(theData);
>> >}
>> >
>> >or :
>> >
>> >alldata = select data from data order by numvign;
>> >while(alldata.hasNext()){
>> >   doSomething(allData.next());
>> >}

Depends on what "doSomething" does. If doSomething could be expressed
as SQL or done from a stored procedure, you wouldn't have to
materialize the entire 1 mill rowset in the client. 

Instead of either getting all 1 mill rows into a single rs, or
performing 1 mill queries yielding a single-row rs, you could try to
read as many rows as you think is safe, and repeat until you are
done. But this doesn't work so well if there are "holes" in your pk
sequence...
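
One way to do that chunked reading while tolerating "holes" in the key
sequence (a sketch only; the batch size, database name and class name are
made up) is to restart each batch after the last key actually seen and to
cap each query with Statement.setMaxRows():

import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BatchedScan {
    static final int BATCH = 1000;

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
        PreparedStatement ps = conn.prepareStatement(
                "SELECT numvign, data FROM data WHERE numvign > ? ORDER BY numvign");
        ps.setMaxRows(BATCH);             // at most BATCH rows per query
        int lastKey = Integer.MIN_VALUE;  // assumes no key equals Integer.MIN_VALUE
        while (true) {
            ps.setInt(1, lastKey);
            ResultSet rs = ps.executeQuery();
            int n = 0;
            while (rs.next()) {
                lastKey = rs.getInt("numvign");   // gaps in numvign don't matter
                doSomething(rs.getBinaryStream("data"));
                n++;
            }
            rs.close();
            if (n < BATCH) break;         // fewer rows than the cap: we are done
        }
        ps.close();
        conn.close();
    }

    private static void doSomething(InputStream data) {
        // per-row processing goes here
    }
}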

-- 
dt