Posted to derby-user@db.apache.org by Jim Newsham <jn...@referentia.com> on 2008/10/21 00:32:01 UTC

excessive disk space allocation

 

Hi,

 

I'm doing some benchmarking of our application which stores data in derby.
The parts of the application which I am exercising only perform inserts, not
deletes.  The results suggest that derby disk space allocation is excessive,
particularly because compressing the tables reduces the size of the database
*substantially*.  For example, here are the results of several databases,
both before and after compression.  

 

Application running time.  original -> compressed

0.5 days.  178.2mb -> 63.1mb

1 day.  559.3mb -> 82.8mb

2 days.  1,879.1mb -> 120.8mb

4 days.  5,154.4mb -> 190.5mb

8 days. 11,443.7mb -> 291.6mb

16 days.  23,706.7mb -> 519.3mb

 

Plotting the data, I observe that both uncompressed and compressed sizes
appear to grow linearly, but the growth factor (slope of the linear
equation) is 53 times as large for the uncompressed database.  Needless to
say... this is huge.

 

I expected that with only inserts and no deletes, there should be little or
no wasted space (and no need for table compression).  Is this assumption
incorrect?

 

Thanks,

Jim
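
For reference, the "compression" discussed in this thread is done with
Derby's SYSCS_UTIL system procedures. A sketch, using a table name that
appears later in the thread (substitute your own schema and table):

```sql
-- Rebuild the table and its indexes to reclaim unused space.  A non-zero
-- third argument asks Derby to rebuild indexes one at a time, which uses
-- less memory at the cost of speed.
CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'ROUTER_FHK0908F44U_0_SAMPLE', 1);

-- The in-place variant avoids a full table copy; the three flags enable
-- purging committed deleted rows, defragmenting rows, and truncating
-- empty pages at the end of the file.
CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'ROUTER_FHK0908F44U_0_SAMPLE', 1, 1, 1);
```

Both procedures take an exclusive lock on the table while they run, so they
are best issued while the application is quiescent.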

 


RE: excessive disk space allocation

Posted by Jim Newsham <jn...@referentia.com>.

> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM]
> Sent: Thursday, October 23, 2008 5:26 AM
> To: Derby Discussion
> Subject: Re: excessive disk space allocation
> 
> On 23.10.08 02:26, Jim Newsham wrote:
> > Hasn't been a lot of response to this thread.  I have a 23gb database
> > holding only 500mb of data, all created with just inserts (no deletes).
> For
> > our app, this is a serious problem.
> 
> Hi Jim,
> 
> I would say this is a serious problem for any app...
> Do you have a way to reproduce this behavior?
> If so, it would be very helpful if you could attach the repro to a
> Jira[1] issue.

The behavior is exhibited in our app, which is unfortunately rather complex.
I tried writing a stand-alone test case which behaves similarly to our app,
but haven't been able to reproduce the problem yet.  The problem only
becomes apparent with a sufficiently large database, which takes hours to
generate... so trying to reproduce the problem is a bit time-consuming.
I'll try again later, but time is short for now, so I've got to move on to
"mitigation".

It's also hard to know which factors contribute to the problem, in creating
the test case.  I have numerous threads; each thread inserts into 2 common
tables, as well as 4 private tables.  Auto-commit is disabled, and
transactions are largish (on the order of 100, or perhaps 1,000, inserts per
transaction).  Each thread inserts every 30 seconds, so I'm not really
hammering the database.
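Concretely, each thread's cycle looks roughly like the following (table
names and values are made up for illustration; auto-commit is off):

```sql
-- Repeated every ~30 seconds by each thread; each transaction holds on
-- the order of 100-1,000 single-row inserts before committing.
INSERT INTO APP.COMMON_EVENTS VALUES (42, CURRENT_TIMESTAMP, 'sample');
INSERT INTO APP.THREAD_7_SAMPLES VALUES (42, CURRENT_TIMESTAMP, 0.5);
-- ... many more single-row inserts ...
COMMIT;
```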

> 
> 
> >
> > Someone suggested the problem is caused by multi-threaded inserts, but
> the
> > tables which exhibit the problem were only inserted into by a single
> thread,
> > each.
> >
> > Any suggestions?
> >
> > Is there a way to tell, before compacting, how much space would be saved
> by
> > compacting a table?  With this information, at least I would be able to
> > periodically compact just those tables which merit being compacted, as a
> > workaround to the real problem.
> 
> I don't remember the details, but if you have a look at the source for
> "DerbyDiskSpaceDiag.java" attached to DERBY-2549[2] you should be able
> to figure out the SQL required. I'm sure it's in the manual somewhere too.
> You can also just compile and run the program (preferably on a copy /
> non-production db) and see what it says. With 23 GB of data, I think it
> will chew for a while before you see any output.

Yeah, I've already found that "script".  It calls the SYSCS_DIAG.SPACE_TABLE
table function described here:

http://db.apache.org/derby/docs/10.4/ref/rrefsyscsdiagtables.html

Unfortunately, this table function doesn't really provide enough information
to know, in advance, how much wasted space would be freed by compacting a
table.  The NUMUNFILLEDPAGES and ESTIMSPACESAVING fields sound promising,
but the first field is always quite low (apparently derby thinks the pages
are "filled" when in fact they're very sparse), and the second field is
always zero.  

Here's an example (connection0 is the database before compaction;
connection1 is the database after compaction).  Notice how much
NUMALLOCATEDPAGES decreases:

ij(CONNECTION1)> set connection connection0;

0 rows selected
ij(CONNECTION0)> select * from table (syscs_diag.space_table('APP',
'ROUTER_FHK0908F44U_0_SAMPLE')) as t;
CONGLOMERATENAME            |ISINDEX|NUMALLOCATEDPAGES|NUMFREEPAGES|NUMUNFILLEDPAGES|PAGESIZE|ESTIMSPACESAVING
----------------------------|-------|-----------------|------------|----------------|--------|----------------
ROUTER_FHK0908F44U_0_SAMPLE |0      |206              |0           |1               |32768   |0
SQL081017094932450          |1      |22               |0           |0               |4096    |0
SQL081017094932451          |1      |13               |0           |0               |4096    |0

3 rows selected
ij(CONNECTION0)> SET CONNECTION CONNECTION1;
ij(CONNECTION1)> select * from table (syscs_diag.space_table('APP',
'ROUTER_FHK0908F44U_0_SAMPLE')) as t;
CONGLOMERATENAME            |ISINDEX|NUMALLOCATEDPAGES|NUMFREEPAGES|NUMUNFILLEDPAGES|PAGESIZE|ESTIMSPACESAVING
----------------------------|-------|-----------------|------------|----------------|--------|----------------
ROUTER_FHK0908F44U_0_SAMPLE |0      |3                |0           |1               |32768   |0
SQL081017094932450          |1      |12               |0           |0               |4096    |0
SQL081017094932451          |1      |8                |0           |0               |4096    |0
3 rows selected

I guess I can try to estimate based on the average record size, but that'll
probably be a little messy and rough.
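
That back-of-the-envelope estimate can be expressed directly against
SPACE_TABLE. A sketch, where the 64-byte average row size is an assumed
figure you would replace with your own:

```sql
-- Compare bytes allocated on disk (base conglomerate only, ISINDEX = 0)
-- against an estimate of live data: row count times an assumed average
-- row size.  A large gap suggests the table is worth compacting.
SELECT t2.conglomeratename,
       t2.numallocatedpages * t2.pagesize AS allocated_bytes,
       (SELECT COUNT(*) FROM APP.ROUTER_FHK0908F44U_0_SAMPLE) * 64
         AS estimated_live_bytes
FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', 'ROUTER_FHK0908F44U_0_SAMPLE')) AS t2
WHERE t2.isindex = 0;
```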

Regards,
Jim

> 
> 
> 
> --
> Kristian
> 
> [1] https://issues.apache.org/jira/browse/DERBY
> [2] https://issues.apache.org/jira/browse/DERBY-2549
> 
> 
> >
> > Thanks,
> > Jim
> >
> >> [earlier messages in the thread, quoted in full, trimmed]
> 




Re: excessive disk space allocation

Posted by Kristian Waagan <Kr...@Sun.COM>.
On 23.10.08 02:26, Jim Newsham wrote:
> Hasn't been a lot of response to this thread.  I have a 23gb database
> holding only 500mb of data, all created with just inserts (no deletes).  For
> our app, this is a serious problem.

Hi Jim,

I would say this is a serious problem for any app...
Do you have a way to reproduce this behavior?
If so, it would be very helpful if you could attach the repro to a 
Jira[1] issue.


> 
> Someone suggested the problem is caused by multi-threaded inserts, but the
> tables which exhibit the problem were only inserted into by a single thread,
> each.  
> 
> Any suggestions? 
> 
> Is there a way to tell, before compacting, how much space would be saved by
> compacting a table?  With this information, at least I would be able to
> periodically compact just those tables which merit being compacted, as a
> workaround to the real problem.

I don't remember the details, but if you have a look at the source for 
"DerbyDiskSpaceDiag.java" attached to DERBY-2549[2] you should be able 
to figure out the SQL required. I'm sure it's in the manual somewhere too.
You can also just compile and run the program (preferably on a copy / 
non-production db) and see what it says. With 23 GB of data, I think it 
will chew for a while before you see any output.
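
The SQL at the heart of DerbyDiskSpaceDiag.java boils down to the
SYSCS_DIAG.SPACE_TABLE diagnostic table function. A minimal sketch (the
table name is an example):

```sql
-- Page counts and page size for one table and its indexes.  In 10.4 the
-- function takes explicit schema and table arguments, so a program (such
-- as DerbyDiskSpaceDiag.java) iterates over SYS.SYSTABLES to cover them all.
SELECT * FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', 'MYTABLE')) AS t;
```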



-- 
Kristian

[1] https://issues.apache.org/jira/browse/DERBY
[2] https://issues.apache.org/jira/browse/DERBY-2549


> 
> Thanks,
> Jim
> 
>> [earlier messages in the thread, quoted in full, trimmed]
> 
> 
> 


RE: excessive disk space allocation

Posted by Jim Newsham <jn...@referentia.com>.
There hasn't been much response to this thread.  I have a 23gb database
holding only 500mb of data, all created with just inserts (no deletes).  For
our app, this is a serious problem. 

Someone suggested the problem is caused by multi-threaded inserts, but the
tables which exhibit the problem were each only inserted into by a single
thread.

Any suggestions? 

Is there a way to tell, before compacting, how much space would be saved by
compacting a table?  With this information, at least I would be able to
periodically compact just those tables which merit being compacted, as a
workaround to the real problem.

Thanks,
Jim

> [earlier messages in the thread, quoted in full, trimmed]




RE: excessive disk space allocation

Posted by Jim Newsham <jn...@referentia.com>.

> -----Original Message-----
> From: Knut.Hatlen@Sun.COM [mailto:Knut.Hatlen@Sun.COM]
> Sent: Monday, October 20, 2008 9:27 PM
> To: Derby Discussion
> Subject: Re: excessive disk space allocation
> 
> Jim Newsham <jn...@referentia.com> writes:
> 
> > [original message quoted in full, trimmed]
> 
> Hi Jim,
> 
> You may have come across a known issue with multi-threaded inserts to
> the same table:
> 
> http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
> https://issues.apache.org/jira/browse/DERBY-2337
> https://issues.apache.org/jira/browse/DERBY-2338

Thanks for those links.  I used the diagnostic dump program from the
mentioned discussion thread to see how much the individual tables in my
database are compacting.

The "multi-threaded inserts to the same table" theory doesn't quite jibe
here.  In my case, I have multiple threads inserting into the database, but
most of the data goes into tables which are only inserted into by a single
thread for the duration of the application.  

There are only two tables inserted into by more than one thread, and the
data they contain is relatively small (a few percent).  For a test database
I'm looking at right now, these two tables compress to 50% and 90% of
original size, respectively... not much at all.

By contrast, I am seeing most of the other tables (which aren't inserted
into by more than one thread) compress to between 0.5% and 3.8% of original
size.  For example, I see one table go from 783 pages to 4 pages.

Jim




Re: excessive disk space allocation

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Jim Newsham <jn...@referentia.com> writes:

> [original message quoted in full, trimmed]

Hi Jim,

You may have come across a known issue with multi-threaded inserts to
the same table:

http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
https://issues.apache.org/jira/browse/DERBY-2337
https://issues.apache.org/jira/browse/DERBY-2338

-- 
Knut Anders