You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by De Meyer Tim <Ti...@cegeka.be> on 2011/08/05 16:35:37 UTC

Derby eating up disk space + how to trigger ReclaimSpace elegantly

Hi,

We're working on a java webapp and using a derby database (v10.5) in embedded mode.
The first version of the application went live about 2 months ago.
The application is used to make invoices and we also store the XML of an invoice document (input data for generating a PDF against a template).
Today, the database is about 1,7GB on disk.

During these 2 months, we've released some minor upgrades, including database migration scripts (like extra tables for new functionality).
Now we've noticed that, when we ran our latest upgrade, the database has suddenly shrunk to a size of 600MB.
This latest upgrade contained a migration script that dropped a no longer needed column on practically every table.
It's after executing this script that the shrink happened.
We did some debugging and hit suspend when the disk size started shrinking, it lead us to a Derby class called "ReclaimSpace".

The shrink was a bliss, because the customer was already complaining about the large size on disk :-)
We're afraid the database will start using up unnecessary space again soon, and of course, we're not going to have a similar migration script in every upgrade.
Is there an elegant way to configure derby to do this cleaning continuously, or to let our webapp instruct derby to do some cleaning?
We're launching the webapp from within a small java webstart app (we launch a Jetty and attach our war file), so it's even OK for us to write some java code to do it programmatically.

We've found this, but it's on a per table basis.
http://db.apache.org/derby/docs/10.1/adminguide/cadminspace21579.html
http://db.apache.org/derby/docs/10.5/ref/rrefaltertablecompress.html

By the way, we're running about 10 of these webapps on different desktop PC's.
Each webapp synchronizes its data to a server running a postgresql 9 database.
This means that this postgres database accumulates all the data of these 10 webapps (some data is shared, so it's not times ten).
The size of this postgres DB is less than 1GB, which we think is surprisingly small compared to the derby DB for one webapp.

Any help on all of this would be more than welcome.

Kind regards,

Tim De Meyer


Re: Derby eating up disk space + how to trigger ReclaimSpace elegantly

Posted by "Dag H. Wanvik" <da...@oracle.com>.
On 05.08.2011 16:35, De Meyer Tim wrote:
>
> Hi,
>
> We're working on a java webapp and using a derby database (v10.5) in 
> embedded mode.
> The first version of the application went live about 2 months ago.
> The application is used to make invoices and we also store the XML of 
> an invoice document (input data for generating a PDF against a template).
> Today, the database is about 1,7GB on disk.
>
> During these 2 months, we've released some minor upgrades, including 
> database migration scripts (like extra tables for new functionality).
> Now we've noticed that, when we ran our latest upgrade, the database 
> has suddenly shrunk to a size of 600MB.
> This latest upgrade contained a migration script that dropped a no 
> longer needed column on practically every table.
> It's after executing this script that the shrink happened.
> We did some debugging and hit suspend when the disk size started 
> shrinking, it lead us to a Derby class called "ReclaimSpace".
>
> The shrink was a bliss, because the customer was already complaining 
> about the large size on disk :-)
> We're afraid the database will start using up unnecessary space again 
> soon, and of course, we're not going to have a similar migration 
> script in every upgrade.
> Is there an elegant way to configure derby to do this cleaning 
> continuously, or to let our webapp instruct derby to do some cleaning?
> We're launching the webapp from within a small java webstart app (we 
> launch a Jetty and attach our war file), so it's even OK for us to 
> write some java code to do it programmatically.
>
> We've found this, but it's on a per table basis.
> http://db.apache.org/derby/docs/10.1/adminguide/cadminspace21579.html
> http://db.apache.org/derby/docs/10.5/ref/rrefaltertablecompress.htm 
> <http://db.apache.org/derby/docs/10.5/ref/rrefaltertablecompress.html>
>
The SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE and 
SYSCS_UTIL.SYSCS_COMPRESS_TABLE are the normal two way to reclaim space 
in Derby. The latter will also return space to the operating system, but 
requires a table level lock. The fact that they procedures opererates on 
a table at a time should't pose any big problem, you can use metadata to 
iterate over the tables of interest.

Some users have apps that can't afford to wait for the cleaning and opt 
for partitioning tables, say, one per month of data, and then dropping 
the tables that are older than a certain threshold. Views and/or table 
functions can be used to allows viewing the data as a whole in queries.

Thanks,
Dag




>
> By the way, we're running about 10 of these webapps on different 
> desktop PC's.
> Each webapp synchronizes its data to a server running a postgresql 9 
> database.
> This means that this postgres database accumulates all the data of 
> these 10 webapps (some data is shared, so it's not times ten).
> The size of this postgres DB is less than 1GB, which we think is 
> surprisingly small compared to the derby DB for one webapp.
>
> Any help on all of this would be more than welcome.
>
> Kind regards,
>
> Tim De Meyer
>


Re: Derby eating up disk space + how to trigger ReclaimSpace elegantly

Posted by Phil Bradley <ph...@tower.ie>.
Hi,

For what it's worth, I had an approximately similar experience
with approximately similar volumes of data on a desktop app using
Derby embedded. I introduced a daily compress of all tables using
the following command (for each table):

call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('MYSCHEMA', 'MYTABLE', 0)

The schedule for this was handled at the application level rather
than the derby level. Specifically, we used quartz to do the
scheduling.

Also, as you mention, this is on a per table basis. I don't
believe there is a single command that you can use to compress
the whole database so we just use the database metadata to list
all tables and apply the procedure to each.

Regards,
Phil





On Fri, 05 Aug 2011 16:35 +0200, "De Meyer Tim"
<Ti...@cegeka.be> wrote:

  Hi,
  We're working on a java webapp and using a derby database
  (v10.5) in embedded mode.
  The first version of the application went live about 2 months
  ago.
  The application is used to make invoices and we also store the
  XML of an invoice document (input data for generating a PDF
  against a template).
  Today, the database is about 1,7GB on disk.
  During these 2 months, we've released some minor upgrades,
  including database migration scripts (like extra tables for
  new functionality).
  Now we've noticed that, when we ran our latest upgrade, the
  database has suddenly shrunk to a size of 600MB.
  This latest upgrade contained a migration script that dropped
  a no longer needed column on practically every table.
  It's after executing this script that the shrink happened.
  We did some debugging and hit suspend when the disk size
  started shrinking, it lead us to a Derby class called
  "ReclaimSpace".
  The shrink was a bliss, because the customer was already
  complaining about the large size on disk :-)
  We're afraid the database will start using up unnecessary
  space again soon, and of course, we're not going to have a
  similar migration script in every upgrade.
  Is there an elegant way to configure derby to do this cleaning
  continuously, or to let our webapp instruct derby to do some
  cleaning?
  We're launching the webapp from within a small java webstart
  app (we launch a Jetty and attach our war file), so it's even
  OK for us to write some java code to do it programmatically.
  We've found this, but it's on a per table basis.
  [1]http://db.apache.org/derby/docs/10.1/adminguide/cadminspace
  21579.html
  [2]http://db.apache.org/derby/docs/10.5/ref/rrefaltertablecomp
  ress.html
  By the way, we're running about 10 of these webapps on
  different desktop PC's.
  Each webapp synchronizes its data to a server running a
  postgresql 9 database.
  This means that this postgres database accumulates all the
  data of these 10 webapps (some data is shared, so it's not
  times ten).
  The size of this postgres DB is less than 1GB, which we think
  is surprisingly small compared to the derby DB for one webapp.
  Any help on all of this would be more than welcome.
  Kind regards,
  Tim De Meyer

References

1. http://db.apache.org/derby/docs/10.1/adminguide/cadminspace21579.html
2. http://db.apache.org/derby/docs/10.5/ref/rrefaltertablecompress.html