You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Sten Nordström <st...@navielektro.fi> on 2005/05/31 11:12:32 UTC

Behaviour of SYSCS_COMPRESS_TABLE

We are usimg the embedded variant of derby to handle various persistence 
requirements. This means that we do a lot of insert and delete operations, 
quickly growing the size of the on-disk files. I have been looking into using 
the SYSCS_COMPRESS_TABLE procedure in order to compress the database, since the 
applications can run for months, we want to avoid filling up the disk with data 
that is not needed anymore.

Have I understood correctly, that while running SYSCS_COMPRESS_TABLE no other 
operations can be done on the database? 

Are there any numbers on how database size and number of active rows versus 
deleted ones, influence the time it takes to run the compress operation?


Best Regards,

-- sten

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

non-leaf pages are also reclaimed when they are empty as part of merge
process when last leaf page pointed at is reclaimed.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> Derby will also reclaim space automatically in btree indexes for this
>     MM> case as long as all the old rows on a given leaf are deleted and
>     MM> the background process gets a chance to get a table level lock
>     MM> at some point - the current algorithm requires a table level lock
>     MM> to do the index management to delete an entire leaf from the tree.
> 
> What about non-leaf pages of the B-Tree?
>

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Øystein Grøvlen <Oy...@Sun.COM>.

>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> Derby will also reclaim space automatically in btree indexes for this
    MM> case as long as all the old rows on a given leaf are deleted and
    MM> the background process gets a chance to get a table level lock
    MM> at some point - the current algorithm requires a table level lock
    MM> to do the index management to delete an entire leaf from the tree.

What about non-leaf pages of the B-Tree?

-- 
Øystein

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by The Wogster <wo...@yahoo.ca>.

Mike Matrigali wrote:
> Derby will also reclaim space automatically in btree indexes for this
> case as long as all the old rows on a given leaf are deleted and
> the background process gets a chance to get a table level lock
> at some point - the current algorithm requires a table level lock
> to do the index management to delete an entire leaf from the tree.
> 
> If you only deleted every other key in this scenario then we would
> not reclaim space.
> 
> In the case of a even distribution of keys and inserts, then we
> try automatic space reclamation any time an insert is going to
> cause a split and the destination page has any deleted row.
> 
> The hope is that most applications need never call the compress
> table functions, and I hope in the future to implement some more
> automatic space reclamation techniques based on the work I have
> been putting in this release to reclaim space in place.
> 
> Currently the application we don't handle well is one with does
> a lot of deletes and never does another insert.  In that case
> the space sits there ready to use for future inserts, but it is
> not returned to the OS automatically.

This would be quite rare, typically business databases collect inserts, 
for example an order entry system, so you do say 50 inserts a day, then 
it does a bunch of deletes (say all records over 1 year old), then 
starts collecting again.

An application has 2 possible actions here, it could simply reuse the 
space as it goes or an application might, backup the database, do the 
deletes, do a compress, returning the space to the OS, then continue on.

For a client/server database it's probably better to do the first, and 
let the DBA deal with maintenance.  For stand alone databases, it's 
probably better to take the second option.

W

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Derby will also reclaim space automatically in btree indexes for this
case as long as all the old rows on a given leaf are deleted and
the background process gets a chance to get a table level lock
at some point - the current algorithm requires a table level lock
to do the index management to delete an entire leaf from the tree.

If you only deleted every other key in this scenario then we would
not reclaim space.

In the case of a even distribution of keys and inserts, then we
try automatic space reclamation any time an insert is going to
cause a split and the destination page has any deleted row.

The hope is that most applications need never call the compress
table functions, and I hope in the future to implement some more
automatic space reclamation techniques based on the work I have
been putting in this release to reclaim space in place.

Currently the application we don't handle well is one with does
a lot of deletes and never does another insert.  In that case
the space sits there ready to use for future inserts, but it is
not returned to the OS automatically.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> Note that cloudscape automatically reuses space from deleted rows when
>     MM> new rows are inserted into the table.  The main problem
>     MM> SYSCS_COMPRESS_TABLE is solving is if there are a number of deletes
>     MM> which will not be followed by a number of inserts.  The reuse of space
>     MM> is not as efficient as the compress table at it squeezes every last bit
>     MM> of free space out, and returns that space to the OS.
> 
> Is this also true for B-tree indexes?  I would imagine that if you
> have a index on a monotocally increasing key (e.g., a timestamp) and
> where you regularly delete old records, there may be a lot of empty
> B-tree pages that will never be possible to reuse.
>

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by The Wogster <wo...@yahoo.ca>.

Mike Matrigali wrote:
> The split description below describes Derby behavior.
> Derby does not merge "almost" empty pages - only empty pages.
> Derby maintains a separate file per table and a separate file
> per index.  Derby maintains a free page pool per file and only
> can use the free pages in the same file to fill space requests.
> 

I was talking in generalities, each specific database works differently, 
   some store the entire database in a single file, like Firebird does 
others store each table in a file, like PostgreSQL does, and others 
store each item in a separate file, as it appears Derby does.

Personally I don't think it really matters, if your compressing a 
database to save space, then there is a bigger issue, like too small a 
hard drive.  Back in the days when $10,000 would buy you 10MB of storage
(roughly 1,000,000 per GB) saving space was a big deal, now, when $100 
buys you 100GB ($1 per GB) it's often a false economy to waste the time 
fussing with it, just to save disk space, especially when the guy doing 
the fussing is worth $40/hour.  Compression should be used because it's 
good for the database, not to save disk space.

It also ends up being a false economy, unless the database is static, 
because your just going to start adding to it again, and extending a 
file, typically means a kernel request, and that is going to be much 
slower then simply adding a page in the file already.

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

The split description below describes Derby behavior.
Derby does not merge "almost" empty pages - only empty pages.
Derby maintains a separate file per table and a separate file
per index.  Derby maintains a free page pool per file and only
can use the free pages in the same file to fill space requests.



The Wogster wrote:
> Øystein Grøvlen wrote:
> 
>>>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>>
>>
>>
>>     MM> Note that cloudscape automatically reuses space from deleted 
>> rows when
>>     MM> new rows are inserted into the table.  The main problem
>>     MM> SYSCS_COMPRESS_TABLE is solving is if there are a number of 
>> deletes
>>     MM> which will not be followed by a number of inserts.  The reuse 
>> of space
>>     MM> is not as efficient as the compress table at it squeezes every 
>> last bit
>>     MM> of free space out, and returns that space to the OS.
>>
>> Is this also true for B-tree indexes?  I would imagine that if you
>> have a index on a monotocally increasing key (e.g., a timestamp) and
>> where you regularly delete old records, there may be a lot of empty
>> B-tree pages that will never be possible to reuse.
>>
> 
> What happens in most databases. is that the database has a fixed page 
> size, say 8K, when an index page is full, it splits that page into 2 
> half pages.  When an index page is empty it's dropped from the index, 
> and added to the empty page pool.  Many will merge almost empty 
> neighbouring pages, but that doesn't matter for this discussion.
> 
> When a page is added, either an index page or a data page, and the EPP 
> is empty, then the database adds a number of empty pages, and those are 
> added to the empty page pool.  Now say you have a database and add 5 000 
> 000 entries, and get 5 entries per page, you now have ~1 000 000 pages. 
>  Now you delete 4 999 999 records, you end up with 1 data page, one 
> index page, and probably 1 header page, total of say 3 pages.  However 
> there are now 4 999 996 pages in the empty page pool (typically a table 
> itself).
> 
> When you compress the database it knows it only needs 3 pages, so it 
> builds a new file of only 3 pages.  Most databases create a number of 
> empty pages in addition to the 3 needed anyway.
> 
> 
> W
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

The only way to give the space from empty index pages back to the OS is
through manual compression.  Algorithms exist to reuse the empty index
and base table pages within the same table or index.

Øystein Grøvlen wrote:
>>>>>>"TW" == The Wogster <wo...@yahoo.ca> writes:
> 
> 
>     TW> Øystein Grøvlen wrote:
> 
>     >> Is this also true for B-tree indexes?  I would imagine that if you
>     >> have a index on a monotocally increasing key (e.g., a timestamp) and
>     >> where you regularly delete old records, there may be a lot of empty
>     >> B-tree pages that will never be possible to reuse.
>     >> 
> 
> 
>     TW> What happens in most databases. is  that the database has a fixed page
>     TW> size, say 8K, when  an index page is full, it splits  that page into 2
>     TW> half pages. When an index page  is empty it's dropped from the index,
>     TW> and  added to  the  empty page  pool.  Many will  merge almost  empty
>     TW> neighbouring pages, but that doesn't matter for this discussion.
> 
> I know this.  The reason I asked was because I have got the impression
> that in Derby the only way to drop empty index pages is to do
> compression.
>

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Wogster makes a good point, SYSCS_COMPRESS_TABLE may help get data
in tables to be more contiguous.

The Wogster wrote:
> Øystein Grøvlen wrote:
> 
>>>>>>> "TW" == The Wogster <wo...@yahoo.ca> writes:
>>
>>
>>
>>     TW> Øystein Grøvlen wrote:
>>
>>     >> Is this also true for B-tree indexes?  I would imagine that if you
>>     >> have a index on a monotocally increasing key (e.g., a 
>> timestamp) and
>>     >> where you regularly delete old records, there may be a lot of 
>> empty
>>     >> B-tree pages that will never be possible to reuse.
>>     >>
>>
>>     TW> What happens in most databases. is  that the database has a 
>> fixed page
>>     TW> size, say 8K, when  an index page is full, it splits  that 
>> page into 2
>>     TW> half pages. When an index page  is empty it's dropped from the 
>> index,
>>     TW> and  added to  the  empty page  pool.  Many will  merge 
>> almost  empty
>>     TW> neighbouring pages, but that doesn't matter for this discussion.
>>
>> I know this.  The reason I asked was because I have got the impression
>> that in Derby the only way to drop empty index pages is to do
>> compression.
>>
> 
> Derby should work in a similar way to other databases, the techniques 
> for developing a database were established years ago.  When it comes to 
> indexes there are 3 technologies:
> 
> ISAM: The index is loaded into an array in memory, it adds and deletes 
> index members through moving pointers around, then dumps the index back 
> to disk when the database is closed:  Dbase worked this way at one time.
> 
> B-Tree, traditional B-tree isn't used much anymore, because it's easy to 
> get an off-balance tree, which ends up very inefficient.
> 
> Balanced B-Tree, most databases use this one, logic in the indexing code 
>  is meant to keep shifting the tree around so that it stays in balance. 
>  Requires that the indexes at least be page based, so most page based 
> databases use this one.  Most efficient in reads, can be slower at adds 
> and deletes because of the shifting around, but for most databases there 
> are 100 reads for every add or delete.
> 
> For dropping index pages, look at the source code and see what it does 
> on a delete, if it releases the page to the empty page pool (more like a 
> cache actually), then it doesn't matter whether you compress or not.
> 
> One thing you need to remember, to shorten (if possible) or lengthen a 
> file are expensive operations, this is why most databases add a bunch of 
> pages to the empty page pool, rather then add a single page at a time. 
> They also reuse empty pages within the file, because to reuse an 
> existing page means changing a pointer, adding to or deleting from the 
> file means kernel operations, which are much slower.
> 
> Compression can be good though, as a database ages, an index can have 
> the first page on page 25, the next on page 234 535 the next on page 43,
> and these pages can be all over the disk volume as well, so your moving 
> the heads all over the place to find where these pages are. Compress a 
> database, follow this by a disk defrag can make the database faster.
> 
> W
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by The Wogster <wo...@yahoo.ca>.

Øystein Grøvlen wrote:
>>>>>>"TW" == The Wogster <wo...@yahoo.ca> writes:
> 
> 
>     TW> Øystein Grøvlen wrote:
> 
>     >> Is this also true for B-tree indexes?  I would imagine that if you
>     >> have a index on a monotocally increasing key (e.g., a timestamp) and
>     >> where you regularly delete old records, there may be a lot of empty
>     >> B-tree pages that will never be possible to reuse.
>     >> 
> 
> 
>     TW> What happens in most databases. is  that the database has a fixed page
>     TW> size, say 8K, when  an index page is full, it splits  that page into 2
>     TW> half pages. When an index page  is empty it's dropped from the index,
>     TW> and  added to  the  empty page  pool.  Many will  merge almost  empty
>     TW> neighbouring pages, but that doesn't matter for this discussion.
> 
> I know this.  The reason I asked was because I have got the impression
> that in Derby the only way to drop empty index pages is to do
> compression.
> 

Derby should work in a similar way to other databases, the techniques 
for developing a database were established years ago.  When it comes to 
indexes there are 3 technologies:

ISAM: The index is loaded into an array in memory, it adds and deletes 
index members through moving pointers around, then dumps the index back 
to disk when the database is closed:  Dbase worked this way at one time.

B-Tree, traditional B-tree isn't used much anymore, because it's easy to 
get an off-balance tree, which ends up very inefficient.

Balanced B-Tree, most databases use this one, logic in the indexing code 
  is meant to keep shifting the tree around so that it stays in balance. 
  Requires that the indexes at least be page based, so most page based 
databases use this one.  Most efficient in reads, can be slower at adds 
and deletes because of the shifting around, but for most databases there 
are 100 reads for every add or delete.

For dropping index pages, look at the source code and see what it does 
on a delete, if it releases the page to the empty page pool (more like a 
cache actually), then it doesn't matter whether you compress or not.

One thing you need to remember, to shorten (if possible) or lengthen a 
file are expensive operations, this is why most databases add a bunch of 
pages to the empty page pool, rather then add a single page at a time. 
They also reuse empty pages within the file, because to reuse an 
existing page means changing a pointer, adding to or deleting from the 
file means kernel operations, which are much slower.

Compression can be good though, as a database ages, an index can have 
the first page on page 25, the next on page 234 535 the next on page 43,
and these pages can be all over the disk volume as well, so your moving 
the heads all over the place to find where these pages are. Compress a 
database, follow this by a disk defrag can make the database faster.

W

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Øystein Grøvlen <Oy...@Sun.COM>.

>>>>> "TW" == The Wogster <wo...@yahoo.ca> writes:

    TW> Øystein Grøvlen wrote:

    >> Is this also true for B-tree indexes?  I would imagine that if you
    >> have a index on a monotocally increasing key (e.g., a timestamp) and
    >> where you regularly delete old records, there may be a lot of empty
    >> B-tree pages that will never be possible to reuse.
    >> 


    TW> What happens in most databases. is  that the database has a fixed page
    TW> size, say 8K, when  an index page is full, it splits  that page into 2
    TW> half pages. When an index page  is empty it's dropped from the index,
    TW> and  added to  the  empty page  pool.  Many will  merge almost  empty
    TW> neighbouring pages, but that doesn't matter for this discussion.

I know this.  The reason I asked was because I have got the impression
that in Derby the only way to drop empty index pages is to do
compression.

-- 
Øystein

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by The Wogster <wo...@yahoo.ca>.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> Note that cloudscape automatically reuses space from deleted rows when
>     MM> new rows are inserted into the table.  The main problem
>     MM> SYSCS_COMPRESS_TABLE is solving is if there are a number of deletes
>     MM> which will not be followed by a number of inserts.  The reuse of space
>     MM> is not as efficient as the compress table at it squeezes every last bit
>     MM> of free space out, and returns that space to the OS.
> 
> Is this also true for B-tree indexes?  I would imagine that if you
> have a index on a monotocally increasing key (e.g., a timestamp) and
> where you regularly delete old records, there may be a lot of empty
> B-tree pages that will never be possible to reuse.
> 

What happens in most databases. is that the database has a fixed page 
size, say 8K, when an index page is full, it splits that page into 2 
half pages.  When an index page is empty it's dropped from the index, 
and added to the empty page pool.  Many will merge almost empty 
neighbouring pages, but that doesn't matter for this discussion.

When a page is added, either an index page or a data page, and the EPP 
is empty, then the database adds a number of empty pages, and those are 
added to the empty page pool.  Now say you have a database and add 5 000 
000 entries, and get 5 entries per page, you now have ~1 000 000 pages. 
  Now you delete 4 999 999 records, you end up with 1 data page, one 
index page, and probably 1 header page, total of say 3 pages.  However 
there are now 4 999 996 pages in the empty page pool (typically a table 
itself).

When you compress the database it knows it only needs 3 pages, so it 
builds a new file of only 3 pages.  Most databases create a number of 
empty pages in addition to the 3 needed anyway.

W

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Øystein Grøvlen <Oy...@Sun.COM>.

>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> Note that cloudscape automatically reuses space from deleted rows when
    MM> new rows are inserted into the table.  The main problem
    MM> SYSCS_COMPRESS_TABLE is solving is if there are a number of deletes
    MM> which will not be followed by a number of inserts.  The reuse of space
    MM> is not as efficient as the compress table at it squeezes every last bit
    MM> of free space out, and returns that space to the OS.

Is this also true for B-tree indexes?  I would imagine that if you
have a index on a monotocally increasing key (e.g., a timestamp) and
where you regularly delete old records, there may be a lot of empty
B-tree pages that will never be possible to reuse.

-- 
Øystein

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Sten Nordström <st...@navielektro.fi>.

Mike Matrigali <mi...@...> writes:
> 
> Note that cloudscape automatically reuses space from deleted rows when
> new rows are inserted into the table.  The main problem
> SYSCS_COMPRESS_TABLE is solving is if there are a number of deletes
> which will not be followed by a number of inserts.  The reuse of space
> is not as efficient as the compress table at it squeezes every last bit
> of free space out, and returns that space to the OS.

Hello!

Thank you for answering! This explains why my testcase did not grow the 
physical size of the db as fast as I expected, since I do a lot of deletes and 
insertions. We will probably manage by running compress less often than I 
expected/feared. I must say that I'm quite impressed with what I have seen so 
far.

regards,

-- sten

Re: Behaviour of SYSCS_COMPRESS_TABLE

Posted by Mike Matrigali <mi...@sbcglobal.net>.

While running SYSCS_COMPRESS_TABLE on a particular table, no operations
can be done on that particular table or it's indexes.  Operations on
other tables and indexes are not affected, so work can be done
concurrently by other threads on other tables.

Note that cloudscape automatically reuses space from deleted rows when
new rows are inserted into the table.  The main problem
SYSCS_COMPRESS_TABLE is solving is if there are a number of deletes
which will not be followed by a number of inserts.  The reuse of space
is not as efficient as the compress table at it squeezes every last bit
of free space out, and returns that space to the OS.

I don't have numbers, but the compress table operation is a very
expensive operation.  It basically scans every row in the table and
inserts every non-deleted row into a new table.  And then rebuilds all
existing indexes on the new table, finally followed by deleting the old
files associated with the old tables and indexes.

/mikem

Sten Nordström wrote:

> We are usimg the embedded variant of derby to handle various persistence 
> requirements. This means that we do a lot of insert and delete operations, 
> quickly growing the size of the on-disk files. I have been looking into using 
> the SYSCS_COMPRESS_TABLE procedure in order to compress the database, since the 
> applications can run for months, we want to avoid filling up the disk with data 
> that is not needed anymore.
> 
> Have I understood correctly, that while running SYSCS_COMPRESS_TABLE no other 
> operations can be done on the database? 
> 
> Are there any numbers on how database size and number of active rows versus 
> deleted ones, influence the time it takes to run the compress operation?
> 
> 
> Best Regards,
> 
> -- sten
> 
>