You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Monica Ferrero <MF...@accelrys.com> on 2009/01/14 12:49:39 UTC

Performance deleting lots of rows from a table

Hi!

I need to delete a considerable number of rows from a table, and it is taking too long.  The operation takes equally long from ij, so I think it is independent of my application code.

So for instance in the case below, it took about 23 seconds to execute the delete statement, which was deleting 156112 rows out of 380600:

ij> select count(*) from listhit;
1
-----------
380600

ij> delete from listhit where listresultid = 11501573;
156112 rows inserted/updated/deleted

23 seconds

The definition of the table is as follows:

create table ListHit (
    LISTRESULTID int,
    INDX int,
    HIT blob,
    TAG varchar(250),
    ATTS blob );

CREATE INDEX listhit_listresultid ON listhit(listresultid);
CREATE INDEX listhit_listresultid_indx ON listhit(listresultid,indx);

And from ij:

ij> describe listhit;
COLUMN_NAME         |TYPE_NAME|DEC&|NUM&|COLUM&|COLUMN_DEF|CHAR_OCTE&|IS_NULL&
------------------------------------------------------------------------------
LISTRESULTID        |INTEGER  |0   |10  |10    |NULL      |NULL      |YES
INDX                |INTEGER  |0   |10  |10    |NULL      |NULL      |YES
HIT                 |BLOB     |NULL|NULL|21474&|NULL      |NULL      |YES
TAG                 |VARCHAR  |NULL|NULL|250   |NULL      |500       |YES
ATTS                |BLOB     |NULL|NULL|21474&|NULL      |NULL      |YES

5 rows selected

ij> show indexes;
TABLE_NAME   |COLUMN_NAME  | NON_U&|TYPE|ASC&|CARDINA&|PAGES
----------------------------------------------------------------------------------------
LISTHIT                |LISTRESULTID          |1     |3   |A   |NULL    |NULL
LISTHIT                |LISTRESULTID          |1     |3   |A   |NULL    |NULL
LISTHIT                |INDX                           |1     |3   |A   |NULL    |NULL


Any ideas?

Thanks for your help,

Monica

________________________________
Accelrys Limited (http://accelrys.com)
Registered office: 334 Cambridge Science Park, Cambridge, CB4 0WN, UK
Registered in England: 2326316

RE: Performance deleting lots of rows from a table

Posted by Monica Ferrero <MF...@accelrys.com>.
Hi Kristian,


For what I can see in the query plan the index is being used (query plan below). I don't know if there is any obvious information in the query plan that I'm missing. There is a number of other queries being run, mainly system ones, I guess they are not relevant.

The delete took 38 seconds from the application code for the query plan shown.


In normal conditions the derby log is very minimal, so I don't think that would have an influence? In any case at least the System Information does not show any spike or anything for I/O whilst the delete takes place.

Regarding having to update the statistics, I would expect this to happen automatically... If this is not the case so far, it is a bit of a nuisance as we don't really want DB specific code at the application level, as it is meant to work with different databases.
If it is critical and it is recommended, I guess we would need some "if derby regenerate statistics" after a delete, but would prefer to avoid cluttering the code. I'd be interested to know a bit more about this so we can make a decision.


Thanks a lot for your help. Very much appreciated,

Monica


2009-01-14 14:24:26.996 GMT Thread[DRDAConnThread_5,5,main] (XID = 374750), (SESSIONID = 3), delete from listhit where listResultId = 11567113 ******* Delete ResultSet using row locking:
deferred: false
Rows deleted = 136263
Indexes updated = 2
Execute Time = 0
        Project-Restrict ResultSet (2):
        Number of opens = 1
        Rows seen = 136263
        Rows filtered = 0
        restriction = false
        projection = true
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count:        30341.00
                optimizer estimated cost:         9606.47

        Source result set:
                Project-Restrict ResultSet (1):
                Number of opens = 1
                Rows seen = 136263
                Rows filtered = 0
                restriction = false
                projection = true
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        restriction time (milliseconds) = 0
                        projection time (milliseconds) = 0
                        optimizer estimated row count:        30341.00
                        optimizer estimated cost:         9606.47

                Source result set:
                        Index Scan ResultSet for LISTHIT using index LISTHIT_LISTRESULTID_INDX at read committed isolation level using exclusive row locking chosen by the optimizer
                        Number of opens = 1
                        Rows seen = 136263
                        Rows filtered = 0
                        Fetch Size = 1
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 0
                                next time (milliseconds) = 0
                                close time (milliseconds) = 0
                                next time in milliseconds/row = 0

                        scan information:
                                Bit set of columns fetched={0, 1, 2}
                                Number of columns fetched=3
                                Number of deleted rows visited=0
                                Number of pages visited=944
                                Number of rows qualified=136263
                                Number of rows visited=136264
                                Scan type=btree
                                Tree height=4
                                start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:

                                stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:

                                qualifiers:
None
                                optimizer estimated row count:        30341.00
                                optimizer estimated cost:         9606.47

-----Original Message-----
From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM]
Sent: 14 January 2009 12:08
To: Derby Discussion
Subject: Re: Performance deleting lots of rows from a table

Monica Ferrero wrote:
> Hi!
>
>
>
> I need to delete a considerable number of rows from a table, and it is
> taking too long.  The operation takes equally long from ij, so I think
> it is independent of my application code.
>
>
>
> So for instance in the case below, it took about 23 seconds to execute
> the delete statement, which was deleting 156112 rows out of 380600:
>

Hello Monica,

Just a general observation.
After you delete that many rows from a table, you should regenerate the
statistics. If the statistics are outdated, the optimizer might take bad
decisions regarding the query plan.

You can do this by compressing the table, or dropping and recreating the
index. Work is also going on to make the statistics update automatic.

Regarding your delete, you can check the query plan to see if it is
doing a table scan or using the index. You should also see how your IO
system is taking the load. For instance, you may see improved
performance by putting the log on a separate device.


Regards,
--
Kristian


[ snip ]

Accelrys Limited (http://accelrys.com)
Registered office: 334 Cambridge Science Park, Cambridge, CB4 0WN, UK
Registered in England: 2326316

RE: Performance deleting lots of rows from a table

Posted by Monica Ferrero <MF...@accelrys.com>.
Thanks Bryan.

I may try this, but I doubt it be any quicker as this table can get pretty big (millions of rows), and when a new set of rows gets inserted for a result I  know it takes several seconds (for like a 100.000 rows), so doing so for most of the contents of the table I think it would take far too long.

Thanks for the suggestion,

Monica

-----Original Message-----
From: Bryan Pendleton [mailto:bpendleton@amberpoint.com]
Sent: 14 January 2009 15:19
To: Derby Discussion
Subject: Re: Performance deleting lots of rows from a table

> I need to delete a considerable number of rows from a table, and it is
> taking too long.  The operation takes equally long from ij, so I think
> it is independent of my application code.

As an alternate technique, you could try instead copying the rows that
you want to keep to a new table, then dropping the old table and switching
to the new table:

  - create new table with different table name, but same column definitions
  - insert into new table select rows-to-keep from current table
  - drop current table
  - rename new table to have same name as current table

You might benchmark this technique, to see if it is any faster.

bryan



Accelrys Limited (http://accelrys.com)
Registered office: 334 Cambridge Science Park, Cambridge, CB4 0WN, UK
Registered in England: 2326316

Re: Performance deleting lots of rows from a table

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> I need to delete a considerable number of rows from a table, and it is 
> taking too long.  The operation takes equally long from ij, so I think 
> it is independent of my application code.

As an alternate technique, you could try instead copying the rows that
you want to keep to a new table, then dropping the old table and switching
to the new table:

  - create new table with different table name, but same column definitions
  - insert into new table select rows-to-keep from current table
  - drop current table
  - rename new table to have same name as current table

You might benchmark this technique, to see if it is any faster.

bryan



Re: Performance deleting lots of rows from a table

Posted by Kristian Waagan <Kr...@Sun.COM>.
Monica Ferrero wrote:
> Hi!
> 
>  
> 
> I need to delete a considerable number of rows from a table, and it is 
> taking too long.  The operation takes equally long from ij, so I think 
> it is independent of my application code.
> 
>  
> 
> So for instance in the case below, it took about 23 seconds to execute 
> the delete statement, which was deleting 156112 rows out of 380600:
> 

Hello Monica,

Just a general observation.
After you delete that many rows from a table, you should regenerate the 
statistics. If the statistics are outdated, the optimizer might take bad 
decisions regarding the query plan.

You can do this by compressing the table, or dropping and recreating the 
index. Work is also going on to make the statistics update automatic.

Regarding your delete, you can check the query plan to see if it is 
doing a table scan or using the index. You should also see how your IO 
system is taking the load. For instance, you may see improved 
performance by putting the log on a separate device.


Regards,
-- 
Kristian


[ snip ]