Posted to derby-dev@db.apache.org by "Volker Edelmann (JIRA)" <de...@db.apache.org> on 2005/08/16 14:50:55 UTC

[jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

DERBY-132 resolved ? Table not automatically compressed 
--------------------------------------------------------

         Key: DERBY-510
         URL: http://issues.apache.org/jira/browse/DERBY-510
     Project: Derby
        Type: Bug
    Versions: 10.1.1.0    
 Environment: JDK 1.4.2, JDK 1.5.0
Windows XP
    Reporter: Volker Edelmann


 I tried a test program that repeatedly inserts a bunch of data into one table and repeatedly deletes a bunch of it.

    // table is not empty  when test-program starts
     derby.executeSelect("select count(*) c from rclvalues");

   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2.000.000 rows
        derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
        derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");

     derby.executeSelect("select count(*) c from rclvalues");

At the end of these operations, the table contains approximately the same number of rows, but the size of the database has grown from
581 MB to 1.22 GB.  From the description of DERBY-132, I had hoped that Derby now performs this compression automatically (version 10.1.x).
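
(For reference, the space can currently be reclaimed manually with the documented
SYSCS_UTIL.SYSCS_COMPRESS_TABLE system procedure.  A minimal JDBC sketch, assuming the
table above lives in the APP schema and "conn" is an open connection to the database:)

    // Illustrative only: reclaim unused space in one table by calling the
    // documented Derby system procedure SYSCS_UTIL.SYSCS_COMPRESS_TABLE.
    // "conn" is assumed to be an open java.sql.Connection.
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    public class CompressExample {
        public static void compress(Connection conn) throws SQLException {
            CallableStatement cs =
                conn.prepareCall("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
            cs.setString(1, "APP");        // schema name (assumed)
            cs.setString(2, "RCLVALUES");  // table name from the report above
            cs.setShort(3, (short) 1);     // 1 = sequential rebuild (uses less temp space)
            cs.execute();
            cs.close();
        }
    }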


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> Full compression of derby tables is not done automatically, I
    MM> am looking for input on how to schedule such an operation.  An
    MM> operation like this is going to have a large cpu, i/o, and
    MM> possible temporary disk space impact on the rest of the server.
    MM> As a zero admin db I think we should figure out some way to
    MM> do this automatically, but I think there are a number of
    MM> applications which would not be happy with such a performance
    MM> impact not under their control.

Ideally, one should be able to do such maintenance tasks
incrementally.  That way, one could interrupt a table compression
when the user load is high and resume it at a later time.  One could
use a low-priority thread for such tasks.
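
(Not Derby code, just a rough sketch of what such an interruptible, low-priority
maintenance worker might look like; every class and method name here is made up
for illustration:)

    // Hypothetical sketch of an incremental maintenance worker: runs at
    // minimum thread priority, works in small units, and can be paused
    // while user load is high and resumed later.  Names are illustrative.
    public class MaintenanceWorker implements Runnable {
        private volatile boolean paused = false;
        private volatile boolean stopped = false;

        public void pause()      { paused = true; }
        public void unpause()    { paused = false; }
        public void stop()       { stopped = true; }

        public void run() {
            while (!stopped) {
                if (paused) {
                    try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
                    continue;
                }
                doOneUnitOfWork();   // e.g. compress a handful of pages, then loop
            }
        }

        private void doOneUnitOfWork() { /* placeholder */ }

        public static void start(Runnable worker) {
            Thread t = new Thread(worker, "maintenance");
            t.setPriority(Thread.MIN_PRIORITY);  // low priority, as suggested above
            t.setDaemon(true);
            t.start();
        }
    }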

    MM> My initial thoughts are to pick a default time frame, say
    MM> once every 30 days to check for table level events like
    MM> compression and statistics generation and then execute the operations
    MM> at low priority.  Also add some sort of parameter so that
    MM> applications could disable the automatic background jobs.

Once every thirty days seems very infrequent to me.

    MM> Note that derby does automatically reclaim space from deletes
    MM> for subsequent inserts, but the granularity currently is at
    MM> a page level.  So deleting every 3rd or 5th row is the worst
    MM> case behavior.  The page level decision was a tradeoff as
    MM> reclaiming the space is time consuming so did not want to
    MM> schedule to work on a row by row basis.  Currently we schedule
    MM> the work when all the rows on a page are marked deleted.

That an operation is more time-consuming would not matter if it could
be done at times when the system is idle.  (Assuming that it is not so
time-consuming that sufficient throughput cannot be achieved, e.g.,
space being freed quicker than one is able to reclaim it.)

-- 
Øystein


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Mike Matrigali <mi...@sbcglobal.net>.
thanks, this is useful.

It turns out that the compression metric is not too hard, as the
internal space vti already basically does the work - at least at
a page-level granularity.  I think there may be some use for an
admin thread to maintain info about this compression metric over time
and then use that info to decide to do the compression.  For instance,
if a table's total size is staying constant but the free space is
growing and shrinking over time, then there is no need to compress;
similarly, if the table is growing and free space is staying constant,
again no need to compress.  Another issue is picking a good default
for how much free space is enough: any space, 10%, 20%,
1 meg, 10 meg, ...
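
(For example, a per-table free-space ratio could be computed from the existing space
vti along these lines.  This is only a sketch: it assumes the org.apache.derby.diag.SpaceTable
diagnostic VTI and its documented columns, a hardcoded table name, and an open connection
"conn".)

    // Illustrative sketch: estimate how much of a table's allocated space is
    // free, using the existing space VTI (org.apache.derby.diag.SpaceTable).
    // Column names are taken from the Derby documentation.
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SpaceMetric {
        public static double freeSpaceRatio(Connection conn) throws SQLException {
            Statement s = conn.createStatement();
            ResultSet rs = s.executeQuery(
                "SELECT NUMALLOCATEDPAGES, NUMFREEPAGES " +
                "FROM NEW org.apache.derby.diag.SpaceTable('RCLVALUES') AS T " +
                "WHERE ISINDEX = 0");          // base table row only
            double ratio = 0.0;
            if (rs.next()) {
                long allocated = rs.getLong(1);
                long free      = rs.getLong(2);
                if (allocated + free > 0) {
                    ratio = (double) free / (allocated + free);
                }
            }
            rs.close();
            s.close();
            return ratio;
        }
    }

An admin thread could record this ratio periodically and only schedule a compress when it
keeps growing, along the lines described above.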

As you point out, scheduling a compression should be based on whether
it will actually compress anything.  A similar problem is that currently
there is no way to update the cardinality statistic.  This statistic
is currently updated only when an index is created explicitly, or
when an index is rebuilt internally as part of a compress table.
Unfortunately, a number of applications tend to create empty tables and
indexes and then load the data, so they never get this statistic correct.

Keeping with the zero-admin goal, it would be better to figure out a way
to automatically update this, rather than require an explicit call from
the user.   Currently the code to generate this statistic requires
a scan of the entire index and a compare of every row to the next one,
producing a single statistic for every leading set of columns in an
index.  It is basically used to determine the average number of rows
per key value in an index.  Note that the histogram-type information
used by other databases is gathered straight from the btree, and thus
doesn't require any kind of statistic maintenance.
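
(Roughly, the computation described above amounts to something like the following, shown
here over an already-sorted list of key values rather than a real btree scan; this only
illustrates the cost of the full pass, not Derby's actual code:)

    // Illustrative only: compute the "average rows per key value" statistic
    // by walking sorted keys once and counting distinct values, as described
    // above.  A real implementation scans the entire index in key order.
    import java.util.List;

    public class CardinalityStat {
        public static double rowsPerKey(List<String> sortedKeys) {
            if (sortedKeys.isEmpty()) {
                return 0.0;
            }
            long distinct = 1;
            for (int i = 1; i < sortedKeys.size(); i++) {
                // one compare of every row to the next one
                if (!sortedKeys.get(i).equals(sortedKeys.get(i - 1))) {
                    distinct++;
                }
            }
            return (double) sortedKeys.size() / distinct;
        }
    }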

Any good ideas on how to tell when we should update that statistic?
Some options include:
    o when the table has grown by X%
    o time based
    o when the number of rows has changed by X%
    o some sort of sampling scheme compared with the stored result
    o a default for a small number of rows, updated once the table reaches N rows.



Rick Hillegas wrote:
> Continuing to maunder, let me fine-tune this a bit:
> 
> 1) Imagine that, on an ongoing basis we maintain some CompressionMetric,
> which measures whether a given table needs compression/reoptimization.
> Dead space might be part of this metric or not. Time since last
> compression could be part of the metric. The metric could be as crude or
> fancy as we like.
> 
> 2) At some point, based on its CompressionMetric, a table Qualifies for
> compression/reoptimization.
> 
> 3) At some fairly fine-grained interval, a low priority thread wakes up,
> looks for Qualifying tables, and compresses/reoptimizes them. By
> default, this thread runs in a 0-administration database, but we expose
> a knob for scheduling/disabling the thread.
> 
> Your original proposal is a degenerate case of this approach and maybe
> it's the first solution we implement. However, we can get fancier as we
> need to support bigger datasets.
> 
> Cheers,
> -Rick
> 
> Rick Hillegas wrote:
> 
>> Hi Mike,
>>
>> I like your suggestions that a low priority thread should perform the
>> compressions and that we should expose a knob for disabling this
>> thread. Here are some further suggestions:
>>
>> Compressing all the tables and recalculating all the statistics once a
>> month could cause quite a hiccup for a large database. Maybe we could
>> do something finer grained. For instance, we could try to make it easy
>> to ask some question like "Is more than 20% of this table's space
>> dead?" No doubt there are some tricky issues in maintaining a
>> per-table dead-space counter and in keeping that counter from being a
>> sync point during writes. However, if we could answer a question like
>> that, then we could pay the compression/reoptimization penalty as we
>> go rather than incurring a heavy, monthly lump-sum tax.
>>
>> Cheers,
>> -Rick
>>
>> Mike Matrigali wrote:
>>
>>> Full compression of derby tables is not done automatically, I
>>> am looking for input on how to schedule such an operation.  An
>>> operation like this is going to have a large cpu, i/o, and
>>> possible temporary disk space impact on the rest of the server.
>>> As a zero admin db I think we should figure out some way to
>>> do this automatically, but I think there are a number of
>>> applications which would not be happy with such a performance
>>> impact not under their control.
>>>
>>> My initial thoughts are to pick a default time frame, say
>>> once every 30 days to check for table level events like
>>> compression and statistics generation and then execute the operations
>>> at low priority.  Also add some sort of parameter so that
>>> applications could disable the automatic background jobs.
>>>
>>> Note that derby does automatically reclaim space from deletes
>>> for subsequent inserts, but the granularity currently is at
>>> a page level.  So deleting every 3rd or 5th row is the worst
>>> case behavior.  The page level decision was a tradeoff as
>>> reclaiming the space is time consuming so did not want to
>>> schedule to work on a row by row basis.  Currently we schedule
>>> the work when all the rows on a page are marked deleted.
>>>
>>> Volker Edelmann (JIRA) wrote:
>>>
>>>  
>>>
>>>> DERBY-132 resolved ? Table not automatically compressed
>>>> --------------------------------------------------------
>>>>
>>>>         Key: DERBY-510
>>>>         URL: http://issues.apache.org/jira/browse/DERBY-510
>>>>     Project: Derby
>>>>        Type: Bug
>>>>    Versions: 10.1.1.0    Environment: JDK 1.4.2, JDK 1.5.0
>>>> Windows XP
>>>>    Reporter: Volker Edelmann
>>>>
>>>>
>>>> I tried a test-program that repeatedly inserts a bunch of data into
>>>> 1 table and repeatedly deletes a bunch of data.
>>>>
>>>>    // table is not empty  when test-program starts
>>>>     derby.executeSelect("select count(*) c from rclvalues");
>>>>
>>>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(),
>>>> 2000000); // insert 2.000.000 rows
>>>>        derby.executeDelete("delete from rclvalues where MOD(id, 3) =
>>>> 0");
>>>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
>>>>        derby.executeDelete("delete from rclvalues where MOD(id, 5) =
>>>> 0");
>>>>
>>>>     derby.executeSelect("select count(*) c from rclvalues");
>>>>
>>>> At the end of the operation, the table contains approximately the
>>>> same number of rows. But the size of the database has grown from
>>>> 581 MB to 1.22 GB. From the description of item DERBY-132, I hoped
>>>> that Derby does the compression now ( version 10.1.X.X.).
>>>>   
>>>
>>>
>>
> 
> 

Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Rick Hillegas <Ri...@Sun.COM>.
Continuing to maunder, let me fine-tune this a bit:

1) Imagine that, on an ongoing basis we maintain some CompressionMetric, 
which measures whether a given table needs compression/reoptimization. 
Dead space might be part of this metric or not. Time since last 
compression could be part of the metric. The metric could be as crude or 
fancy as we like.

2) At some point, based on its CompressionMetric, a table Qualifies for 
compression/reoptimization.

3) At some fairly fine-grained interval, a low priority thread wakes up, 
looks for Qualifying tables, and compresses/reoptimizes them. By 
default, this thread runs in a 0-administration database, but we expose 
a knob for scheduling/disabling the thread.
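
(A very rough sketch of how steps 1-3 might hang together; none of these types exist in
Derby, and the thresholds are arbitrary - they only illustrate the shape of the proposal:)

    // Hypothetical sketch of the three steps above: a metric per table, a
    // qualification test, and a low-priority pass that compresses whatever
    // qualifies, guarded by a knob that can disable it.
    import java.util.List;

    public class CompressionPass {

        /** Step 1: whatever we decide to measure - dead space, time since last compress, ... */
        public interface CompressionMetric {
            double deadSpaceRatio();
            long   daysSinceLastCompress();
        }

        public interface TableInfo {
            String name();
            CompressionMetric metric();
        }

        /** Step 2: does this table Qualify for compression/reoptimization? */
        static boolean qualifies(CompressionMetric m) {
            return m.deadSpaceRatio() > 0.20 || m.daysSinceLastCompress() > 30;
        }

        /** Step 3: called periodically from a low-priority thread. */
        static void runOnce(List<TableInfo> tables, boolean enabled) {
            if (!enabled) {                     // the knob for disabling the thread
                return;
            }
            for (TableInfo t : tables) {
                if (qualifies(t.metric())) {
                    compressAndReoptimize(t);   // e.g. via SYSCS_COMPRESS_TABLE
                }
            }
        }

        static void compressAndReoptimize(TableInfo t) { /* placeholder */ }
    }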

Your original proposal is a degenerate case of this approach and maybe 
it's the first solution we implement. However, we can get fancier as we 
need to support bigger datasets.

Cheers,
-Rick

Rick Hillegas wrote:

> Hi Mike,
>
> I like your suggestions that a low priority thread should perform the 
> compressions and that we should expose a knob for disabling this 
> thread. Here are some further suggestions:
>
> Compressing all the tables and recalculating all the statistics once a 
> month could cause quite a hiccup for a large database. Maybe we could 
> do something finer grained. For instance, we could try to make it easy 
> to ask some question like "Is more than 20% of this table's space 
> dead?" No doubt there are some tricky issues in maintaining a 
> per-table dead-space counter and in keeping that counter from being a 
> sync point during writes. However, if we could answer a question like 
> that, then we could pay the compression/reoptimization penalty as we 
> go rather than incurring a heavy, monthly lump-sum tax.
>
> Cheers,
> -Rick
>
> Mike Matrigali wrote:
>
>> Full compression of derby tables is not done automatically, I
>> am looking for input on how to schedule such an operation.  An
>> operation like this is going to have a large cpu, i/o, and
>> possible temporary disk space impact on the rest of the server.
>> As a zero admin db I think we should figure out some way to
>> do this automatically, but I think there are a number of
>> applications which would not be happy with such a performance
>> impact not under their control.
>>
>> My initial thoughts are to pick a default time frame, say
>> once every 30 days to check for table level events like
>> compression and statistics generation and then execute the operations
>> at low priority.  Also add some sort of parameter so that
>> applications could disable the automatic background jobs.
>>
>> Note that derby does automatically reclaim space from deletes
>> for subsequent inserts, but the granularity currently is at
>> a page level.  So deleting every 3rd or 5th row is the worst
>> case behavior.  The page level decision was a tradeoff as
>> reclaiming the space is time consuming so did not want to
>> schedule to work on a row by row basis.  Currently we schedule
>> the work when all the rows on a page are marked deleted.
>>
>> Volker Edelmann (JIRA) wrote:
>>
>>  
>>
>>> DERBY-132 resolved ? Table not automatically compressed 
>>> --------------------------------------------------------
>>>
>>>         Key: DERBY-510
>>>         URL: http://issues.apache.org/jira/browse/DERBY-510
>>>     Project: Derby
>>>        Type: Bug
>>>    Versions: 10.1.1.0    Environment: JDK 1.4.2, JDK 1.5.0
>>> Windows XP
>>>    Reporter: Volker Edelmann
>>>
>>>
>>> I tried a test-program that repeatedly inserts a bunch of data into 
>>> 1 table and repeatedly deletes a bunch of data.
>>>
>>>    // table is not empty  when test-program starts
>>>     derby.executeSelect("select count(*) c from rclvalues");
>>>
>>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 
>>> 2000000); // insert 2.000.000 rows
>>>        derby.executeDelete("delete from rclvalues where MOD(id, 3) = 
>>> 0");
>>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
>>>        derby.executeDelete("delete from rclvalues where MOD(id, 5) = 
>>> 0");
>>>
>>>     derby.executeSelect("select count(*) c from rclvalues");
>>>
>>> At the end of the operation, the table contains approximately the 
>>> same number of rows. But the size of the database has grown from
>>> 581 MB to 1.22 GB. From the description of item DERBY-132, I hoped 
>>> that Derby does the compression now ( version 10.1.X.X.). 
>>>
>>>   
>>
>


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Mike,

I like your suggestions that a low priority thread should perform the 
compressions and that we should expose a knob for disabling this thread. 
Here are some further suggestions:

Compressing all the tables and recalculating all the statistics once a 
month could cause quite a hiccup for a large database. Maybe we could do 
something finer grained. For instance, we could try to make it easy to 
ask some question like "Is more than 20% of this table's space dead?" No 
doubt there are some tricky issues in maintaining a per-table dead-space 
counter and in keeping that counter from being a sync point during 
writes. However, if we could answer a question like that, then we could 
pay the compression/reoptimization penalty as we go rather than 
incurring a heavy, monthly lump-sum tax.
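
(One cheap way to keep such a counter without it becoming a sync point on the write path
is a per-table atomic counter.  A sketch only, not Derby code; all names are made up:)

    // Illustrative sketch of a per-table dead-space counter that deletes can
    // bump without taking a lock, so it does not serialize writers.
    import java.util.concurrent.atomic.AtomicLong;

    public class DeadSpaceCounter {
        private final AtomicLong deadBytes  = new AtomicLong();
        private final AtomicLong totalBytes = new AtomicLong();

        public void onInsert(long rowBytes) { totalBytes.addAndGet(rowBytes); }
        public void onDelete(long rowBytes) { deadBytes.addAndGet(rowBytes); }
        public void onCompress()            { deadBytes.set(0); }

        /** "Is more than 20% of this table's space dead?" => deadFractionExceeds(0.20) */
        public boolean deadFractionExceeds(double fraction) {
            long total = totalBytes.get();
            return total > 0 && (double) deadBytes.get() / total > fraction;
        }
    }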

Cheers,
-Rick

Mike Matrigali wrote:

>Full compression of derby tables is not done automatically, I
>am looking for input on how to schedule such an operation.  An
>operation like this is going to have a large cpu, i/o, and
>possible temporary disk space impact on the rest of the server.
>As a zero admin db I think we should figure out some way to
>do this automatically, but I think there are a number of
>applications which would not be happy with such a performance
>impact not under their control.
>
>My initial thoughts are to pick a default time frame, say
>once every 30 days to check for table level events like
>compression and statistics generation and then execute the operations
>at low priority.  Also add some sort of parameter so that
>applications could disable the automatic background jobs.
>
>Note that derby does automatically reclaim space from deletes
>for subsequent inserts, but the granularity currently is at
>a page level.  So deleting every 3rd or 5th row is the worst
>case behavior.  The page level decision was a tradeoff as
>reclaiming the space is time consuming so did not want to
>schedule to work on a row by row basis.  Currently we schedule
>the work when all the rows on a page are marked deleted.
>
>Volker Edelmann (JIRA) wrote:
>
>  
>
>>DERBY-132 resolved ? Table not automatically compressed 
>>--------------------------------------------------------
>>
>>         Key: DERBY-510
>>         URL: http://issues.apache.org/jira/browse/DERBY-510
>>     Project: Derby
>>        Type: Bug
>>    Versions: 10.1.1.0    
>> Environment: JDK 1.4.2, JDK 1.5.0
>>Windows XP
>>    Reporter: Volker Edelmann
>>
>>
>> I tried a test-program that repeatedly inserts a bunch of data into 1 table and repeatedly deletes a bunch of data.
>>
>>    // table is not empty  when test-program starts
>>     derby.executeSelect("select count(*) c from rclvalues");
>>
>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2.000.000 rows
>>        derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
>>   TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
>>        derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");
>>
>>     derby.executeSelect("select count(*) c from rclvalues");
>>
>>At the end of the operation, the table contains approximately the same number of rows. But the size of the database has grown from
>>581 MB to 1.22 GB. From the description of item DERBY-132, I hoped that Derby does the compression now ( version 10.1.X.X.).  
>>
>>
>>    
>>


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Francois Orsini <fr...@gmail.com>.
Maybe something like a "HouseKeeper" module servicing "tasks/chores",
with one of them being data compression, triggered during Derby "idle"
times... configuration properties could let the service chore know which
tables to compress, if not all of them... just some thoughts...
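
(Something like this, perhaps; everything below, including the property name, is
hypothetical and only illustrates the idea of registered chores plus a configured
table list:)

    // Hypothetical sketch of a "HouseKeeper" that runs registered chores when
    // the engine believes it is idle.  The property name used below does not
    // exist in Derby; it only shows how the tables to compress might be configured.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    public class HouseKeeper {
        public interface Chore {
            void runChore();
        }

        private final List<Chore> chores = new ArrayList<Chore>();

        public void register(Chore c) { chores.add(c); }

        /** Called by the engine during an idle period. */
        public void onIdle() {
            for (Chore c : chores) {
                c.runChore();
            }
        }

        /** Hypothetical property listing which tables the compression chore handles. */
        public static String[] tablesToCompress(Properties config) {
            String val = config.getProperty("housekeeper.compressTables", "");
            return val.length() == 0 ? new String[0] : val.split(",");
        }
    }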

--francois

On 8/16/05, Mike Matrigali <mi...@sbcglobal.net> wrote:
> Full compression of derby tables is not done automatically, I
> am looking for input on how to schedule such an operation.  An
> operation like this is going to have a large cpu, i/o, and
> possible temporary disk space impact on the rest of the server.
> As a zero admin db I think we should figure out some way to
> do this automatically, but I think there are a number of
> applications which would not be happy with such a performance
> impact not under their control.
> 
> My initial thoughts are to pick a default time frame, say
> once every 30 days to check for table level events like
> compression and statistics generation and then execute the operations
> at low priority.  Also add some sort of parameter so that
> applications could disable the automatic background jobs.
> 
> Note that derby does automatically reclaim space from deletes
> for subsequent inserts, but the granularity currently is at
> a page level.  So deleting every 3rd or 5th row is the worst
> case behavior.  The page level decision was a tradeoff as
> reclaiming the space is time consuming so did not want to
> schedule to work on a row by row basis.  Currently we schedule
> the work when all the rows on a page are marked deleted.
> 
> Volker Edelmann (JIRA) wrote:
> 
> > DERBY-132 resolved ? Table not automatically compressed
> > --------------------------------------------------------
> >
> >          Key: DERBY-510
> >          URL: http://issues.apache.org/jira/browse/DERBY-510
> >      Project: Derby
> >         Type: Bug
> >     Versions: 10.1.1.0
> >  Environment: JDK 1.4.2, JDK 1.5.0
> > Windows XP
> >     Reporter: Volker Edelmann
> >
> >
> >  I tried a test-program that repeatedly inserts a bunch of data into 1 table and repeatedly deletes a bunch of data.
> >
> >     // table is not empty  when test-program starts
> >      derby.executeSelect("select count(*) c from rclvalues");
> >
> >    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2.000.000 rows
> >         derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
> >    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
> >         derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");
> >
> >      derby.executeSelect("select count(*) c from rclvalues");
> >
> > At the end of the operation, the table contains approximately the same number of rows. But the size of the database has grown from
> > 581 MB to 1.22 GB. From the description of item DERBY-132, I hoped that Derby does the compression now ( version 10.1.X.X.).
> >
> >
>

Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> 1.) is done today, as I was trying to say in the note.  We currently
    MM>      maintain a bit map that either marks pages as completely empty,
    MM>      or "somewhat" empty.  Both sets of pages are used when doing
    MM>      inserts.  There may be more work to get the "somewhat" empty
    MM>      pages to be used more.

So the question here is the definition of "somewhat" empty?

    MM> 2.) As you say, space is never returned to the OS unless the compress
    MM>      system procedures are called manually.

It would be very nice if one were able to do this automatically.
I guess the major problem is that, since this involves moving records
between pages, one will have to update index references.  Doing this
row by row will be more expensive than first making a compressed version
of the table and then rebuilding its indexes on this copy.

One idea I have is to only lazily update an index when a record is
moved. Instead, a mapping between previous and current record id will
be recorded. When an outdated index entry is used, this could be
detected and remapped to the new record id.  A background thread could
sequentially scan the index and update outdated references.  This
would be more efficient than an index look-up for each record that is
moved.
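
(To make the idea concrete, the mapping could look something like the sketch below; this
is purely illustrative and not a proposal for actual on-disk structures:)

    // Illustrative sketch of the lazy remapping idea: when compression moves a
    // row, record old -> new record id; an index probe that hits a stale entry
    // consults the map, and a background pass rewrites the index sequentially.
    import java.util.concurrent.ConcurrentHashMap;

    public class RecordIdRemap {
        private final ConcurrentHashMap<Long, Long> moved =
            new ConcurrentHashMap<Long, Long>();

        /** Called by compression when it relocates a row. */
        public void recordMove(long oldId, long newId) {
            moved.put(Long.valueOf(oldId), Long.valueOf(newId));
        }

        /** Called on an index probe whose entry may be outdated. */
        public long resolve(long recordId) {
            Long current = moved.get(Long.valueOf(recordId));
            return current != null ? current.longValue() : recordId;
        }

        /** Called by the background scan once an index entry has been rewritten. */
        public void entryFixed(long oldId) {
            moved.remove(Long.valueOf(oldId));
        }
    }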

-- 
Øystein


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Mike Matrigali <mi...@sbcglobal.net>.
1.) is done today, as I was trying to say in the note.  We currently
     maintain a bit map that either marks pages as completely empty,
     or "somewhat" empty.  Both sets of pages are used when doing
     inserts.  There may be more work to get the "somewhat" empty
     pages to be used more.

2.) As you say, space is never returned to the OS unless the compress
     system procedures are called manually.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> Note that derby does automatically reclaim space from deletes
>     MM> for subsequent inserts, but the granularity currently is at
>     MM> a page level.  So deleting every 3rd or 5th row is the worst
>     MM> case behavior.  The page level decision was a tradeoff as
>     MM> reclaiming the space is time consuming so did not want to
>     MM> schedule to work on a row by row basis.  Currently we schedule
>     MM> the work when all the rows on a page are marked deleted.
> 
> It seems like you are mixing two things here:
> 
>   1. Reclaiming space for subsequent inserts in the same table, and
> 
>   2. Reclaiming space to the file system (which can be used for
>      inserts into other tables).
> 
> I do not understand why reclaiming of space for 1. should be time
> consuming.  You can have a list of pages that have lot of free space,
> each time you delete a record so that the amount of free space is
> above a certain threshold, the page is added to this list.
> 
> For 2., it does not seem to happen automatically even for empty pages?
> I have today deleted all records of a table with a 440MB file.  The
> size of the file has not changed.
> 


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> Note that derby does automatically reclaim space from deletes
    MM> for subsequent inserts, but the granularity currently is at
    MM> a page level.  So deleting every 3rd or 5th row is the worst
    MM> case behavior.  The page level decision was a tradeoff as
    MM> reclaiming the space is time consuming so did not want to
    MM> schedule to work on a row by row basis.  Currently we schedule
    MM> the work when all the rows on a page are marked deleted.

It seems like you are mixing two things here:

  1. Reclaiming space for subsequent inserts in the same table, and

  2. Reclaiming space to the file system (which can be used for
     inserts into other tables).

I do not understand why reclaiming space for 1. should be time
consuming.  You could keep a list of pages that have a lot of free
space: each time a delete brings a page's free space above a certain
threshold, the page is added to this list.
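
(In other words, roughly the following bookkeeping on each delete; a sketch only, with an
arbitrary threshold - as noted elsewhere in this thread, Derby actually uses a per-page
bit map rather than a list:)

    // Illustrative sketch of point 1: on each delete, if the page's free space
    // has crossed a threshold, put the page on a reuse list that inserts consult
    // before extending the file.  A real implementation would avoid duplicate
    // entries (Derby keeps a bit per page instead of a list).
    import java.util.LinkedList;

    public class FreePageList {
        private static final double THRESHOLD = 0.5;   // assumed: "somewhat empty" = half free
        private final LinkedList<Integer> reusablePages = new LinkedList<Integer>();

        /** Called after a delete has freed space on the page. */
        public void afterDelete(int pageNumber, int freeBytes, int pageSize) {
            if ((double) freeBytes / pageSize >= THRESHOLD) {
                reusablePages.add(Integer.valueOf(pageNumber));
            }
        }

        /** Inserts take a candidate page from here before allocating a new one. */
        public Integer pageForInsert() {
            return reusablePages.isEmpty() ? null : reusablePages.removeFirst();
        }
    }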

For 2., it does not seem to happen automatically even for empty pages.
Today I deleted all records of a table with a 440 MB file, and the
size of the file has not changed.

-- 
Øystein


Re: [jira] Created: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Full compression of derby tables is not done automatically, I
am looking for input on how to schedule such an operation.  An
operation like this is going to have a large cpu, i/o, and
possible temporary disk space impact on the rest of the server.
As a zero admin db I think we should figure out some way to
do this automatically, but I think there are a number of
applications which would not be happy with such a performance
impact not under their control.

My initial thoughts are to pick a default time frame, say
once every 30 days to check for table level events like
compression and statistics generation and then execute the operations
at low priority.  Also add some sort of parameter so that
applications could disable the automatic background jobs.

Note that derby does automatically reclaim space from deletes
for subsequent inserts, but the granularity currently is at
a page level.  So deleting every 3rd or 5th row is the worst
case behavior.  The page level decision was a tradeoff: reclaiming
the space is time consuming, so we did not want to schedule the
work on a row-by-row basis.  Currently we schedule the work when
all the rows on a page are marked deleted.
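
(As a worked example of why this is the worst case: if a page holds, say, 90 rows,
deleting every 3rd row still leaves 60 live rows on every page, so no page ever has all
its rows marked deleted and no page-level reclaim is triggered; how much of the
partially-empty space the later bulk insert reuses then depends on how aggressively
"somewhat empty" pages are picked, which is discussed later in this thread and is
consistent with the growth from 581 MB to 1.22 GB reported above.  The rows-per-page
figure here is only illustrative.)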

Volker Edelmann (JIRA) wrote:

> DERBY-132 resolved ? Table not automatically compressed 
> --------------------------------------------------------
> 
>          Key: DERBY-510
>          URL: http://issues.apache.org/jira/browse/DERBY-510
>      Project: Derby
>         Type: Bug
>     Versions: 10.1.1.0    
>  Environment: JDK 1.4.2, JDK 1.5.0
> Windows XP
>     Reporter: Volker Edelmann
> 
> 
>  I tried a test-program that repeatedly inserts a bunch of data into 1 table and repeatedly deletes a bunch of data.
> 
>     // table is not empty  when test-program starts
>      derby.executeSelect("select count(*) c from rclvalues");
> 
>    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2.000.000 rows
>         derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
>    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
>         derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");
> 
>      derby.executeSelect("select count(*) c from rclvalues");
> 
> At the end of the operation, the table contains approximately the same number of rows. But the size of the database has grown from
> 581 MB to 1.22 GB. From the description of item DERBY-132, I hoped that Derby does the compression now ( version 10.1.X.X.).  
> 
> 

[jira] Closed: (DERBY-510) DERBY-132 resolved ? Table not automatically compressed

Posted by "Mike Matrigali (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-510?page=all ]
     
Mike Matrigali closed DERBY-510:
--------------------------------

    Resolution: Invalid

This is the current expected behavior.  Full compression is not done automatically; at
the page level, deleted space is reused by subsequent inserts.  This test case is
the worst-case scenario, as only every 3rd or 5th row is deleted.  I will file a separate
enhancement to somehow run compress table automatically.

> DERBY-132 resolved ? Table not automatically compressed
> -------------------------------------------------------
>
>          Key: DERBY-510
>          URL: http://issues.apache.org/jira/browse/DERBY-510
>      Project: Derby
>         Type: Bug
>     Versions: 10.1.1.0
>  Environment: JDK 1.4.2, JDK 1.5.0
> Windows XP
>     Reporter: Volker Edelmann

>
>  I tried a test-program that repeatedly inserts a bunch of data into 1 table and repeatedly deletes a bunch of data.
>     // table is not empty  when test-program starts
>      derby.executeSelect("select count(*) c from rclvalues");
>    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2.000.000 rows
>         derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
>    TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
>         derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");
>      derby.executeSelect("select count(*) c from rclvalues");
> At the end of the operation, the table contains approximately the same number of rows. But the size of the database has grown from
> 581 MB to 1.22 GB. From the description of item DERBY-132, I hoped that Derby does the compression now ( version 10.1.X.X.).  

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira