Posted to derby-user@db.apache.org by Sundar Narayanaswamy <su...@yahoo.com> on 2011/10/28 06:26:54 UTC

SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Hi,
I am trying to use the Derby database in embedded mode in an application. I
wrote a test program with a table (named LOCATION) with 4 columns, as below:

create table location(id int, num int, addr varchar(40), zip int, primary key(id, zip))
create index loc_index on location (num)

I insert 10000 rows into the table, then delete all those rows. I then call
SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE with just the option to purge rows,
so that the space left behind by deleted rows can be reused for future
inserts. I have the inserts, deletes and compress in a loop. After running
through the loop a few times, I notice that the number of allocated and
unfilled pages for the primary key keeps growing (causing the database size
to grow over time). The longer I run the loop, the larger these numbers are.
That does not happen for the table or the index files, though.
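In outline, each pass of the loop does the following (a minimal JDBC sketch
of the test; it assumes an open embedded Connection conn in autocommit mode
and a running counter nextId, and the real program differs in details):

    // Fragment; uses java.sql.PreparedStatement, Statement, CallableStatement.
    PreparedStatement ins = conn.prepareStatement(
        "INSERT INTO location (id, num, addr, zip) VALUES (?, ?, ?, ?)");
    for (int i = 0; i < 10000; i++) {
        ins.setInt(1, nextId++);           // continuously increasing key
        ins.setInt(2, i);
        ins.setString(3, "address " + i);
        ins.setInt(4, 95000 + (i % 100));
        ins.executeUpdate();
    }
    ins.close();

    // Delete every row that was just inserted.
    Statement del = conn.createStatement();
    del.executeUpdate("DELETE FROM location");
    del.close();

    // Purge only: reclaim space from deleted rows, no defragment/truncate.
    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)");
    cs.setString(1, "APP");
    cs.setString(2, "LOCATION");
    cs.setShort(3, (short) 1);             // purge_rows
    cs.setShort(4, (short) 0);             // defragment_rows
    cs.setShort(5, (short) 0);             // truncate_end
    cs.execute();
    cs.close();

After a few iterations, the space report for the three conglomerates looks
like this: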


CONGLOMERATE        NumAllocatedPages  NumFreePages  NumUnFilledPages
LOCATION                            1           831                 0
SQL111027234806120               1342           294               594
LOC_INDEX                           1           521                 1
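(These page counts come from the SYSCS_DIAG.SPACE_TABLE diagnostic; I read
them with a query of roughly this shape, assuming the default APP schema:)

    Statement s = conn.createStatement();
    ResultSet rs = s.executeQuery(
        "SELECT conglomeratename, numallocatedpages, numfreepages, numunfilledpages "
      + "FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', 'LOCATION')) AS t");
    while (rs.next()) {
        // NumAllocatedPages, NumFreePages, NumUnFilledPages per conglomerate.
        System.out.printf("%-20s %10d %10d %10d%n",
            rs.getString(1), rs.getLong(2), rs.getLong(3), rs.getLong(4));
    }
    rs.close();
    s.close();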

The primary key space continues to grow even when I include the defragment_rows and truncate_end options in the above call.

CONGLOMERATE        NumAllocatedPages  NumFreePages  NumUnFilledPages
LOCATION                            1             0                 0
SQL111027233119770               1674            47               704
LOC_INDEX                          13           357                 3

The SYSCS_UTIL.SYSCS_COMPRESS_TABLE function works well and leaves no
free/unfilled pages, as expected. However, I am concerned about the
efficiency (time taken) of the compress function, since there could be tens
of millions of rows in a production setup. It seems that the
SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE function with just the purge_rows
option would work best for my situation. Is this function not expected to
free up deleted space in primary key files?
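(For reference, the full compress takes the schema, the table, and a
SMALLINT SEQUENTIAL flag that, when non-zero, selects a slower but
lower-memory one-index-at-a-time rebuild; the call I tested was of this
shape:)

    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'LOCATION', 0)");
    cs.execute();
    cs.close();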

Thanks very much for your help,
Sundar.

Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Sundar Narayanaswamy <su...@yahoo.com>.
Brett, 

Thanks for the suggestion. Based on prior discussions on this mailing list,
I have considered this option of partitioning the data chronologically and
just dropping (or truncating) old tables instead of using a single table.
However, that is a significant change to an application that has been in use
for over 10 years and works well with most other databases (granted,
databases like Oracle, MySQL, and SQL Server are more heavyweight database
servers). In any case, we have been experimenting with Derby over the past
few months since it is Java based and has an embedded option. Everything
about Derby works fabulously well for our needs except the space
reclamation part. I am hoping we can resolve it in some manner, even if the
solution is not perfect.


Thanks again for your thoughts,
Sundar





RE: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by "Bergquist, Brett" <BB...@canoga.com>.
Just a heads up: I recently went through this exercise of having a table that is always being inserted at one end and deleted at the other.  Network test results were being inserted at a rate of about 4 million per day, and the same amount from earlier days needed to be purged out.  I had major performance issues trying to delete rows while inserting rows; it turned out I could insert faster than I could delete.  On top of that, the space was not being reused efficiently enough, and I had no downtime in which to run the compress procedures.

What I ended up doing was data partitioning.  I created a table for each week of the year, used the Restricted VTI functions to build a "view" combining all of the tables with UNION back into one virtual table, and then used "truncate table" to purge a whole week of data in a few seconds.

The Restricted VTI was used because of its ability to optimize the returned rows based on the incoming query constraints.  Not perfect, as it is a poor man's partitioning, but it does work.
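In skeleton form the scheme looks like this (all names are invented, the
Restricted VTI layer is omitted, and TRUNCATE TABLE needs a Derby release
that has it):

    Statement s = conn.createStatement();
    // One identically-shaped table per week of the year:
    s.executeUpdate("CREATE TABLE results_w44 (id BIGINT PRIMARY KEY, payload VARCHAR(100))");
    s.executeUpdate("CREATE TABLE results_w45 (id BIGINT PRIMARY KEY, payload VARCHAR(100))");
    // Queries run against a union layer over the weekly tables; here that
    // is a Restricted VTI table function, so the incoming query's
    // constraints can be pushed down into each partition.
    // Purging a whole week is then a near-instant operation:
    s.executeUpdate("TRUNCATE TABLE results_w44");
    s.close();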

You might consider this route if you have no downtime to delete and compress.

Just some thoughts

Brett





Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Mike Matrigali <mi...@sbcglobal.net>.
David Zanter wrote:
>> The worst case for Derby would be a data distribution of an index which
>> resulted in one row on each leaf.
>
> By this, do you mean that any "CREATE UNIQUE INDEX" would be the worst-case
> scenario?
No. What I am referring to is the distribution, within the index, of the
inserted and deleted keys.  It depends on a number of factors, but let's
assume 1000 keys fit per leaf.

A worst-case application would be one that inserted keys 1 through 1000,
then 1001 through 2000, then deleted keys 2 through 1000 and keys 1002
through 2000, and continued this process forever.  This would leave each
leaf with a single key, and until a manual offline compress was called we
would never use the space on those leaves again.

Because the app never again inserts in the key ranges of the old leaves,
the split-time space reclamation code never runs.  And because the leaves
never become empty, the merge space reclamation code never gets called.
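Spelled out against a made-up single-column table t with an index on id,
one round of that pattern is:

    Statement s = conn.createStatement();
    // Fill two leaves: keys 1-1000 land on leaf A, 1001-2000 on leaf B.
    for (int id = 1; id <= 2000; id++) {
        s.executeUpdate("INSERT INTO t (id) VALUES (" + id + ")");
    }
    // Delete everything except the first key of each leaf.
    s.executeUpdate("DELETE FROM t WHERE id BETWEEN 2 AND 1000");
    s.executeUpdate("DELETE FROM t WHERE id BETWEEN 1002 AND 2000");
    // Leaves A and B now hold one key each.  No later insert falls in
    // their key ranges, so purge-at-split never touches them; and they
    // are never empty, so the empty-page merge never fires either.
    s.close();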


Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by David Zanter <dz...@gmail.com>.
> The worst case for Derby would be a data distribution of an index which
> resulted in one row on each leaf.

By this, do you mean that any "CREATE UNIQUE INDEX" would be the worst-case
scenario?

Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Sundar Narayanaswamy <su...@yahoo.com>.
Mike, thanks for looking into the issue. I think it would work if inplace
compress got table-level locks during the purge. Would the improvement you
are considering also help when Derby tries to reclaim space outside of the
inplace compress procedure? That would be great! I have read in the Derby
docs that Derby tries to reuse the space from deleted rows for future
inserts (even without inplace compress), but I found that this wasn't
happening in my case. That is why I was trying inplace compress with the
purge_rows option.

Please let me know if/how I can help further.


Mike Matrigali wrote:
> 
> I am thinking about this issue; thank you for reporting it.
> 
> Your application is the second-worst case for Derby.  I don't know if we
> can fix this at the row level, but there may be some extra work we could
> do to try to get table-level locks and do page merging more often and/or
> in inplace compress.  For your specific application, would it work if
> inplace compress got table-level locks during the purge phase?
> 



Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I am thinking about this issue; thank you for reporting it.

Sundar Narayanaswamy wrote:
> I have posted the issue to DERBY-5487. I have also attached the Java test
> program.
> 
> The test does insert rows at one end of the primary key range and delete
> them from the other end. Interestingly, I noticed that primary key space
> is reclaimed if I reuse the primary keys across the insert/delete loops.
> But my application
This is not surprising; it confirms that, in general, the reclaim-space-at-split-time path works well for uniform index key distributions.

Your application is the second-worst case for Derby.  I don't know if we
can fix this at the row level, but there may be some extra work we could do
to try to get table-level locks and do page merging more often and/or in
inplace compress.  For your specific application, would it work if inplace
compress got table-level locks during the purge phase?

The worst case for Derby would be a data distribution of an index which
resulted in one row on each leaf.  There is no support for merging
non-empty leaf pages other than a full offline compress.  Does anyone know
whether this case is handled in other databases?

> requires me to use continuously increasing primary keys (not reuse them).
> 
> 
> Mike Matrigali wrote:
>> Posting your test to a JIRA issue would be best.  It would be 
>> interesting to post the space table results after each
>> insert/delete/compress iteration (or every 10, ...).
>> When do you commit (every row or every 10000)?  Is it multi-threaded?
>> Does your test always insert rows at one end of the index and delete them
>> from the other end?  If so, it may be DERBY-5473 (a runtime issue,
>> not a compress table issue).
>>
>>
> 


Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Sundar Narayanaswamy <su...@yahoo.com>.
I have posted the issue to DERBY-5487. I have also attached the Java test
program. 

The test does insert rows at one end of the primary key range and delete
them from the other end. Interestingly, I noticed that primary key space is
reclaimed if I reuse the primary keys across the insert/delete loops. But
my application requires me to use continuously increasing primary keys (not
reuse them).
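The difference between the two key patterns comes down to one line in the
test (a sketch; the variable names are invented):

    // Key for row i (0..9999) of iteration iter:
    int reusedKey     = i;                // reused each iteration: inserts land
                                          //   back in the old leaves' key ranges,
                                          //   and their space is reclaimed
    int increasingKey = iter * 10000 + i; // continuously increasing: inserts only
                                          //   ever hit the rightmost leaf, and the
                                          //   emptied leaves behind are never reused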


Mike Matrigali wrote:
> 
> Posting your test to a JIRA issue would be best.  It would be 
> interesting to post the space table results after each
> insert/delete/compress iteration (or every 10, ...).
> When do you commit (every row or every 10000)?  Is it multi-threaded?
> Does your test always insert rows at one end of the index and delete them
> from the other end?  If so, it may be DERBY-5473 (a runtime issue,
> not a compress table issue).
> 
> 



Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Posting your test to a JIRA issue would be best.  It would be interesting
to post the space table results after each insert/delete/compress iteration
(or every 10, ...).  When do you commit (every row or every 10000)?  Is it
multi-threaded?  Does your test always insert rows at one end of the index
and delete them from the other end?  If so, it may be DERBY-5473 (a runtime
issue, not a compress table issue).

Inplace compress as currently implemented does not do much for indexes.
Originally the problem it was solving was row-level space reclamation in
base tables; indexes were not an issue.  Lately multiple reports of index
space issues have been coming up, so more work in this area would be
profitable.  Currently its only use for indexes is to give back space to
the OS if possible, and that depends on the highest page number of a used
page.
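For reference, the three phases discussed below line up one-to-one with the
procedure's three SMALLINT flags (non-zero enables a phase); with an open
connection conn, a purge-only call looks like this:

    CallableStatement cs = conn.prepareCall(
        "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)");
    cs.setString(1, "APP");        // schema
    cs.setString(2, "LOCATION");   // table
    cs.setShort(3, (short) 1);     // phase 1: purge_rows
    cs.setShort(4, (short) 0);     // phase 2: defragment_rows
    cs.setShort(5, (short) 0);     // phase 3: truncate_end
    cs.execute();
    cs.close();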

Here is a discussion of the 3 phases:

1) It does not purge index rows.
    The original reason for this is that there are no space savings from
    purging alone in indexes.  "Half filled" pages can't be reused the way
    they are in base tables.  Each leaf page "owns" a range of keys, and
    when a key is inserted in this range and no space is found on the page,
    the page automatically purges its deleted rows at that point, before
    splitting.

    Given DERBY-5473, it may make sense to implement purging and merging
    to handle cases where we "miss" reclaiming fully empty pages.

2) It does not defragment indexes.

    Especially without a table-level lock this is very complicated.
    Moving rows is not possible, since an index row can only go where its
    key places it.  It would be possible to move pages, but then all
    pointers to them would also have to be fixed up.  Currently code only
    exists to do splits under row-level locking, and this operation would
    be much more complicated than a split.  The process is further
    complicated by the fact that one cannot look at a page in an index and
    find, from that page alone, all the pointers that need to be updated
    (children carry no parent pointers).

3) It does try to truncate the index file if possible, but it does no
    work to rearrange the pages so that the last used page in the file
    is moved earlier in the file.





Re: SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE question

Posted by Bryan Pendleton <bp...@gmail.com>.
On 10/27/2011 09:26 PM, Sundar Narayanaswamy wrote:
> I insert 10000 rows into the table, then delete all those rows. I then call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE
> with just the option to purge rows so that the space left behind by deleted rows can be reused for future inserts. I have
> the inserts, deletes and compress in a loop. After running through the loop a few times, I notice that the
> number of allocated and unfilled pages for the primary key keeps growing (causing the database size to grow over time). The
> longer I run the loop, the larger these numbers are. That does not happen for the table or the index files, though.

This seems like incorrect behavior to me; I think all the space should be
reclaimed. I think you should open an issue in JIRA and provide your test
program, so the developers can have a look at the behavior and try to
understand what is going on.

http://db.apache.org/derby/DerbyBugGuidelines.html

thanks,

bryan