Posted to derby-user@db.apache.org by Amitava Kundu1 <am...@in.ibm.com> on 2014/01/30 07:46:00 UTC

Issue with large delete in derby

Hi,
We are using embedded Derby 10.5.1.1 in our product. This Derby database is
used as a regular RDBMS in which a lot of inserts, deletes, and selects
happen. There are business entities, each occurrence of which can be 10 GB
or more in size, e.g. the data of a huge log file.
In our application we use cascade delete and also have referential
integrity constraints ON.

This application runs on 64-bit Linux with 8 GB of RAM allocated to the JVM.
Similar times are observed on our development Windows box.

It takes more than 3 hours to delete those entities. During this time all
the relevant tables stay locked and no other operation is feasible.

We'd like to know what options/strategies could be adopted for:
   Speeding up the delete process
   The ability to run other database activities in parallel


Thanks
	Amitava Kundu


RE: Issue with large delete in derby

Posted by "John I. Moore, Jr." <so...@att.net>.
Here is a personal experience that could be related.

I once worked on an application that used a widely known commercial database (not Derby),
and a script that someone else wrote took several hours to perform a series of deletions
and insertions.  After examining the script, I realized that there was a single commit at
the end.  By adding a few commits at appropriate places where the database
would remain consistent for use by the application, we were able to achieve
an order-of-magnitude increase in performance.  Hope this helps.
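
For example, in JDBC that batching might look something like the sketch
below. The table and column names (LOG_ENTRY, ENTITY_ID, SEQ_NO) are made
up, and the chunk size is only a starting point to tune:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchedDelete {
    public static void main(String[] args) throws SQLException {
        long entityId = 42;        // hypothetical entity to purge
        long maxSeq = 5_000_000L;  // hypothetical highest sequence number
        long chunk = 10_000;       // rows per transaction; tune as needed

        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
             PreparedStatement ps = conn.prepareStatement(
                 "DELETE FROM LOG_ENTRY "
                 + "WHERE ENTITY_ID = ? AND SEQ_NO BETWEEN ? AND ?")) {
            conn.setAutoCommit(false);
            for (long low = 0; low <= maxSeq; low += chunk) {
                ps.setLong(1, entityId);
                ps.setLong(2, low);
                ps.setLong(3, low + chunk - 1);
                ps.executeUpdate();
                conn.commit();  // release locks and let other work proceed
            }
        }
    }
}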

_________________________________________

John I. Moore, Jr.
SoftMoore Consulting



Re: Issue with large delete in derby

Posted by mike matrigali <mi...@gmail.com>.
Here is a link about diagnosing performance issues that may help:
http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

Re: Issue with large delete in derby

Posted by mike matrigali <mi...@gmail.com>.
I think more information is needed to be able to help.  For instance:
o the DDL of the tables, indexes, and referential constraints
o the query plan of an example delete that takes longer than you expect
Posting a reproducible case using dummy data is the easiest way for the
community to help you.
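
For example, one way to capture the plan from inside the application is
Derby's runtime statistics calls; the DELETE statement below is just a
stand-in for your real one, and the table name is made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ShowQueryPlan {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
             Statement st = conn.createStatement()) {
            // Collect runtime statistics for statements on this connection.
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIME_STATISTICS(1)");

            // Run the statement you want to analyze.
            st.executeUpdate("DELETE FROM LOG_ENTRY WHERE ENTITY_ID = 42");

            // Print the plan of the statement executed just before this one.
            try (ResultSet rs = st.executeQuery(
                    "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIME_STATISTICS()")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}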

>From the query plan we can tell you if derby is doing expected row level
locking, and using expected indexes or if for some reason it
is doing a whole table scan.  The table scan could be because application
does not have appropriate indexes defined or it could be because derby has
chosen a bad plan.

For plan problems in 10.5.1.1, I would suggest making sure to run update
statistics on every table in your system, or upgrading to the latest 10.10
Derby release, which will automatically update the statistics in the
background for you.
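
In 10.5 the refresh can be done with the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS
system procedure. A rough sketch that loops over all user tables (database
name simplified, error handling omitted):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class UpdateAllStatistics {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb")) {
            // Collect every user table ('T') with its schema name first.
            List<String[]> tables = new ArrayList<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT s.SCHEMANAME, t.TABLENAME FROM SYS.SYSTABLES t "
                     + "JOIN SYS.SYSSCHEMAS s ON t.SCHEMAID = s.SCHEMAID "
                     + "WHERE t.TABLETYPE = 'T'")) {
                while (rs.next()) {
                    tables.add(new String[] { rs.getString(1), rs.getString(2) });
                }
            }
            // NULL as the third argument updates all indexes of the table.
            try (CallableStatement cs = conn.prepareCall(
                     "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(?, ?, NULL)")) {
                for (String[] t : tables) {
                    cs.setString(1, t[0]);
                    cs.setString(2, t[1]);
                    cs.execute();
                }
            }
        }
    }
}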

I don't know if it fits your application model, but a number of Derby
applications that follow the pattern of inserting a bunch of connected data
and then deleting it all later use a separate table for each batch of
connected data.  The usual case is that a unit of time's worth of data is
collected (say a week), the next week's data is put into a separate set of
tables, and the previous week's data is all deleted at once by dropping its
tables.  The benefit is that dropping a table in Derby does far less work
than individually deleting each row in the table, for a few reasons (see
the sketch after this list):
1) a delete must log a record for each row deleted, while a drop need only
log a few records associated with dropping the table(s)
2) a delete must do extra post-commit work to recover space for subsequent
inserts, while a drop need not do any
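
As a rough sketch of the pattern (the weekly naming scheme and the DDL here
are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class RotateWeeklyTables {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
             Statement st = conn.createStatement()) {
            // Start a fresh table for the new week's data.
            st.executeUpdate("CREATE TABLE LOG_WEEK_06 ("
                    + "ID BIGINT PRIMARY KEY, ENTITY_ID BIGINT, DATA CLOB)");

            // Retire the oldest week in one cheap operation: the drop logs
            // only a few records, versus one log record per deleted row.
            st.executeUpdate("DROP TABLE LOG_WEEK_01");
        }
    }
}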

If these applications need to run queries across all the related tables,
there are a few options (a sketch of the first follows):
1) hand-build a union
2) use table functions to make all the related tables look like one for
selects.
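
For the union option, one way is a view over the current set of tables so
selects can treat them as one logical table (names again made up).  Note
that Derby will not drop a table a view depends on, so the view has to be
dropped and recreated whenever the set of weekly tables changes:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class WeeklyUnionView {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb");
             Statement st = conn.createStatement()) {
            // One logical table over the current per-week tables.
            st.executeUpdate("CREATE VIEW LOG_ALL AS "
                    + "SELECT * FROM LOG_WEEK_05 "
                    + "UNION ALL SELECT * FROM LOG_WEEK_06");
        }
    }
}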


One other suggestion: your application sounds I/O bound.  If you have more
than one disk on your system, it is possible with Derby to place the
transaction log on one disk and the database on a different disk to get
better I/O performance.
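
This is done with the logDevice attribute on the connection URL when the
database is created (the directories below are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class SeparateLogDevice {
    public static void main(String[] args) throws SQLException {
        // logDevice is honored at creation time: data pages go under
        // /disk1/mydb while the transaction log goes under /disk2/mydblog,
        // so log writes and data I/O land on different spindles.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:derby:/disk1/mydb;create=true;logDevice=/disk2/mydblog")) {
            System.out.println("created with log on a separate device");
        }
    }
}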
