Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 14:15:53 UTC

[jira] [Created] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

bookie server needs to do compaction over entry log files to reclaim disk space
-------------------------------------------------------------------------------

                 Key: BOOKKEEPER-160
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
             Project: Bookkeeper
          Issue Type: Improvement
          Components: bookkeeper-server
    Affects Versions: 4.0.0
            Reporter: Sijie Guo


The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225156#comment-13225156 ] 

Hudson commented on BOOKKEEPER-160:
-----------------------------------

Integrated in bookkeeper-trunk #392 (See [https://builds.apache.org/job/bookkeeper-trunk/392/])
    BOOKKEEPER-160: bookie server needs to do compaction over entry log files to reclaim disk space (sijie via ivank) (Revision 1298357)

     Result = ABORTED
ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/conf/bk_server.conf
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java

                
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-160.patch, BK-160.patch_v2, BK-160.patch_v3
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211945#comment-13211945 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/
-----------------------------------------------------------

(Updated 2012-02-20 16:08:51.382283)


Review request for bookkeeper.


Changes
-------

Attached a new patch addressing Ivan's comments. Also changed EntryLogger to move GarbageCollectorThread into a separate class, which makes the code clearer.


Summary
-------

The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


This addresses bug BOOKKEEPER-160.
    https://issues.apache.org/jira/browse/BOOKKEEPER-160


Diffs (updated)
-----

  bookkeeper-server/conf/bk_server.conf d005d01 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java d4ece94 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java aca66e6 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java PRE-CREATION 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java 6bbe943 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java PRE-CREATION 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java f661e90 

Diff: https://reviews.apache.org/r/3874/diff


Testing
-------


Thanks,

Sijie


                
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-160.patch, BK-160.patch_v2
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210161#comment-13210161 ] 

Ivan Kelly commented on BOOKKEEPER-160:
---------------------------------------

{quote}
    > How I had imagined this working is that you create a new entry log to compact into. You copy the entries across and then rename the file to the name of the old entry log.

That approach is not safe. After compaction the positions of the entries have changed, and that change must be reflected in the index file; otherwise the entries would be lost. The safe way is to send the compacted entries through the addEntry flow again so that the position changes are applied to the index file.
{quote}
Ack, you're right. I don't like it though. It creates a very tight coupling between Bookie and EntryLogger. EntryLogger already has a coupling to Bookie, but it could be decoupled quite simply (it only uses it to receive some members). I'm not sure how to deal with this.

{quote}
What I am thinking is that we need a setting that lets the user control whether compaction is enabled and when it runs. For example, the user can configure the threshold to be half of the disk's capacity, so that compaction is enabled once disk usage reaches this threshold, since compaction is expensive because it moves entries.{quote}
They can disable compaction by setting both watermarks to 0.0. I think it's better to enable it by default, as it means the code gets run more often, so any bugs in the code will fall out of it quicker. 
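The thread does not spell out how the watermarks are evaluated, so the following is only a minimal sketch under stated assumptions: each entry log tracks how many of its bytes still belong to live ledgers, a log is compacted when that live fraction falls below the watermark, and a watermark of 0.0 can therefore never trigger. The class and method names (`WatermarkSketch`, `shouldCompact`) are hypothetical and are not taken from the patch.

```java
/** Hypothetical sketch of a watermark check; not the code from the patch. */
final class WatermarkSketch {

    /**
     * Returns true when the live fraction of an entry log is below the watermark.
     * A watermark of 0.0 (or less) is treated as "compaction disabled".
     */
    static boolean shouldCompact(long liveBytes, long totalBytes, double watermark) {
        if (watermark <= 0.0 || totalBytes <= 0) {
            return false; // disabled, or nothing to measure
        }
        double liveRatio = (double) liveBytes / totalBytes;
        return liveRatio < watermark;
    }

    public static void main(String[] args) {
        // Mostly dead log, watermark 0.2: candidate for compaction.
        System.out.println(shouldCompact(10, 100, 0.2));
        // Half-live log, watermark 0.2: left alone.
        System.out.println(shouldCompact(50, 100, 0.2));
        // Watermark 0.0: never compacted, even if nothing is live.
        System.out.println(shouldCompact(0, 100, 0.0));
    }
}
```

Under this reading, leaving compaction enabled by default costs nothing for logs that are still mostly live, because they never fall below the watermark.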

                
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-160.patch
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206372#comment-13206372 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/
-----------------------------------------------------------

Review request for bookkeeper.


Summary
-------

The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


This addresses bug BOOKKEEPER-160.
    https://issues.apache.org/jira/browse/BOOKKEEPER-160


Diffs
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java 57a6c29 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java 42f54d2 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java 0c83977 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java PRE-CREATION 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java 2e5d784 

Diff: https://reviews.apache.org/r/3874/diff


Testing
-------


Thanks,

Sijie


                


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210000#comment-13210000 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------



bq.  On 2012-02-16 17:43:31, Ivan Kelly wrote:
bq.  > In general the patch is good, but there are a few modifications I'd like to see.

Thanks, Ivan, for reviewing it. I will fix the issues raised in the comments and attach a new patch.


bq.  On 2012-02-16 17:43:31, Ivan Kelly wrote:
bq.  > bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java, line 739
bq.  > <https://reviews.apache.org/r/3874/diff/1/?file=74506#file74506line739>
bq.  >
bq.  >     I don't like how EntryLogger is reaching back into Bookie to write the entries. Also, I don't think it's necessary. 
bq.  >     
bq.  >     How I had imagined this working is that you create a new entry log to compact into. You copy the entries across and then rename the file to the name of the old entry log.

That approach is not safe. After compaction the positions of the entries have changed, and that change must be reflected in the index file; otherwise the entries would be lost. The safe way is to send the compacted entries through the addEntry flow again so that the position changes are applied to the index file.
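To make the point about positions concrete, here is a small self-contained sketch, not the patch itself. It models entry logs as in-memory lists and the index as a map from (ledger, entry) to a location, and it compacts a log by pushing the surviving entries back through the same `addEntry` path used for normal writes, so the index always ends up pointing at the new position. All names (`CompactionSketch`, `Location`, `rollLog`, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of entry logs plus an index; hypothetical, not BookKeeper code. */
final class CompactionSketch {
    static final class Entry {
        final long ledgerId; final long entryId; final String payload;
        Entry(long ledgerId, long entryId, String payload) {
            this.ledgerId = ledgerId; this.entryId = entryId; this.payload = payload;
        }
    }

    static final class Location {
        final long logId; final int slot;
        Location(long logId, int slot) { this.logId = logId; this.slot = slot; }
    }

    private final Map<Long, List<Entry>> logs = new HashMap<>();
    private final Map<String, Location> index = new HashMap<>();
    private long currentLogId = 0;

    CompactionSketch() { logs.put(currentLogId, new ArrayList<>()); }

    private static String key(long ledgerId, long entryId) { return ledgerId + ":" + entryId; }

    /** Normal write path: append to the active log and record the position in the index. */
    void addEntry(Entry e) {
        List<Entry> log = logs.get(currentLogId);
        log.add(e);
        index.put(key(e.ledgerId, e.entryId), new Location(currentLogId, log.size() - 1));
    }

    /** Starts a new active log, leaving the previous one eligible for compaction. */
    void rollLog() { currentLogId++; logs.put(currentLogId, new ArrayList<>()); }

    /** Re-adds entries of live ledgers through addEntry, then drops the old log. */
    void compact(long logId, Set<Long> liveLedgers) {
        if (logId == currentLogId) throw new IllegalArgumentException("cannot compact the active log");
        List<Entry> old = logs.remove(logId);
        if (old == null) return;
        for (int slot = 0; slot < old.size(); slot++) {
            Entry e = old.get(slot);
            String k = key(e.ledgerId, e.entryId);
            Location loc = index.get(k);
            // Skip copies the index no longer points at.
            if (loc == null || loc.logId != logId || loc.slot != slot) continue;
            if (liveLedgers.contains(e.ledgerId)) {
                addEntry(e);      // index now points at the new position
            } else {
                index.remove(k);  // ledger was deleted, drop its index entry
            }
        }
    }

    /** Reads through the index, returning null when the entry is unknown. */
    String read(long ledgerId, long entryId) {
        Location loc = index.get(key(ledgerId, entryId));
        return loc == null ? null : logs.get(loc.logId).get(loc.slot).payload;
    }

    int logCount() { return logs.size(); }

    public static void main(String[] args) {
        CompactionSketch s = new CompactionSketch();
        s.addEntry(new Entry(1L, 0L, "from deleted ledger"));
        s.addEntry(new Entry(2L, 0L, "from live ledger"));
        s.rollLog();
        s.compact(0L, new HashSet<>(Arrays.asList(2L)));
        System.out.println(s.read(2L, 0L) + " / logs left: " + s.logCount());
    }
}
```

A copy-and-rename scheme would leave the index pointing at the old slot numbers, which is the failure mode described above; routing through `addEntry` avoids that at the cost of coupling compaction to the write path.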


bq.  On 2012-02-16 17:43:31, Ivan Kelly wrote:
bq.  > bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java, line 412
bq.  > <https://reviews.apache.org/r/3874/diff/1/?file=74507#file74507line412>
bq.  >
bq.  >     I think we should run compaction based purely on the watermarks. Otherwise, once we hit disk pressure, we'll have to run a lot of compaction at once.

What I am thinking is that we need a setting that lets the user control whether compaction is enabled and when it runs. For example, the user can configure the threshold to be half of the disk's capacity, so that compaction is enabled once disk usage reaches this threshold, since compaction is expensive because it moves entries.


- Sijie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/#review5162
-----------------------------------------------------------


On 2012-02-12 08:47:34, Sijie Guo wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3874/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-12 08:47:34)
bq.  
bq.  
bq.  Review request for bookkeeper.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.
bq.  
bq.  
bq.  This addresses bug BOOKKEEPER-160.
bq.      https://issues.apache.org/jira/browse/BOOKKEEPER-160
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java 57a6c29 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java 42f54d2 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java 0c83977 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java PRE-CREATION 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java 2e5d784 
bq.  
bq.  Diff: https://reviews.apache.org/r/3874/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sijie
bq.  
bq.


                


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209532#comment-13209532 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/#review5162
-----------------------------------------------------------


In general the patch is good, but there are a few modifications I'd like to see.


bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
<https://reviews.apache.org/r/3874/#comment11305>

    EntryLogMeta or EntryLogMetadata would be a better name as it only refers to a single log of the entry logger.
    



bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
<https://reviews.apache.org/r/3874/#comment11307>

    Would it not be better to loop over entrySet rather than keySet here? 
    
    for (Map.Entry<Long, Etc> e : entryLogs2LedgersMap.entrySet()) {
    
    }



bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
<https://reviews.apache.org/r/3874/#comment11310>

    1024*1024 should be defined as a constant somewhere
    



bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
<https://reviews.apache.org/r/3874/#comment11309>

    Shouldn't you throw an exception here?



bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
<https://reviews.apache.org/r/3874/#comment11308>

    I don't like how EntryLogger is reaching back into Bookie to write the entries. Also, I don't think it's necessary. 
    
    How I had imagined this working is that you create a new entry log to compact into. You copy the entries across and then rename the file to the name of the old entry log. 



bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
<https://reviews.apache.org/r/3874/#comment11298>

    The defaults for the new options should be added to bk_server.conf in a commented out fashion like other options.
    



bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
<https://reviews.apache.org/r/3874/#comment11311>

    I think we should run compaction based purely on the watermarks. Otherwise, once we hit disk pressure, we'll have to run a lot of compaction at once.


- Ivan
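As an illustration of the entrySet suggestion in the review above, the sketch below contrasts the two loop styles on a plain `Map<Long, Integer>`. The map contents and the class name `EntrySetSketch` are made up for the example; the real value type in the patch is not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical comparison of keySet and entrySet iteration. */
final class EntrySetSketch {

    /** keySet iteration: every key costs an additional get() lookup. */
    static int sumViaKeySet(Map<Long, Integer> ledgersPerLog) {
        int total = 0;
        for (Long logId : ledgersPerLog.keySet()) {
            total += ledgersPerLog.get(logId);
        }
        return total;
    }

    /** entrySet iteration: key and value arrive together, no second lookup. */
    static int sumViaEntrySet(Map<Long, Integer> ledgersPerLog) {
        int total = 0;
        for (Map.Entry<Long, Integer> e : ledgersPerLog.entrySet()) {
            total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Long, Integer> ledgersPerLog = new LinkedHashMap<>();
        ledgersPerLog.put(1L, 3);
        ledgersPerLog.put(2L, 5);
        System.out.println(sumViaKeySet(ledgersPerLog) == sumViaEntrySet(ledgersPerLog));
    }
}
```

Both loops compute the same result; the entrySet form simply avoids the repeated lookup and reads more directly when both key and value are needed.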




                


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210164#comment-13210164 ] 

Ivan Kelly commented on BOOKKEEPER-160:
---------------------------------------

Hmm, I thought about this a bit, and the core problem is that Bookie has the LedgerCache and LedgerDescriptors and the EntryLogger, which I think is wrong and very tangled. There's no hierarchy, which leads to the tight coupling problem just observed. Bookie should only have access to the EntryLogger, and the EntryLogger should own the LedgerDescriptors and LedgerCache. LedgerCache should own the LedgerManager, and so on.

This is a major refactor, and I'm not suggesting we do it here. But I think it does need to be done to make the code more maintainable.
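The ownership chain proposed in this comment could look roughly like the sketch below. It is only an illustration of the layering (Bookie holds an EntryLogger, which holds a LedgerCache, which holds a LedgerManager); the nested classes reuse the component names for readability but are hypothetical stand-ins, and their methods do not correspond to the real BookKeeper classes. LedgerDescriptors are left out to keep it short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical layering sketch; each layer only knows the layer directly below it. */
final class OwnershipSketch {

    /** Bottom layer: tracks which ledgers exist. */
    static final class LedgerManager {
        private final Set<Long> active = new HashSet<>();
        void register(long ledgerId) { active.add(ledgerId); }
        boolean isActive(long ledgerId) { return active.contains(ledgerId); }
    }

    /** Index layer: owns the LedgerManager and maps entries to offsets. */
    static final class LedgerCache {
        private final LedgerManager ledgerManager = new LedgerManager();
        private final Map<String, Long> offsets = new HashMap<>();
        void putOffset(long ledgerId, long entryId, long offset) {
            ledgerManager.register(ledgerId);
            offsets.put(ledgerId + ":" + entryId, offset);
        }
        Long getOffset(long ledgerId, long entryId) { return offsets.get(ledgerId + ":" + entryId); }
        boolean isActive(long ledgerId) { return ledgerManager.isActive(ledgerId); }
    }

    /** Storage layer: owns the LedgerCache, so it can update positions by itself. */
    static final class EntryLogger {
        private final LedgerCache ledgerCache = new LedgerCache();
        private long nextOffset = 0;
        long addEntry(long ledgerId, long entryId, byte[] data) {
            long offset = nextOffset;
            nextOffset += data.length;
            ledgerCache.putOffset(ledgerId, entryId, offset);
            return offset;
        }
        Long locate(long ledgerId, long entryId) { return ledgerCache.getOffset(ledgerId, entryId); }
    }

    /** Top layer: only talks to the EntryLogger. */
    static final class Bookie {
        private final EntryLogger entryLogger = new EntryLogger();
        long addEntry(long ledgerId, long entryId, byte[] data) {
            return entryLogger.addEntry(ledgerId, entryId, data);
        }
        Long locate(long ledgerId, long entryId) { return entryLogger.locate(ledgerId, entryId); }
    }

    public static void main(String[] args) {
        Bookie bookie = new Bookie();
        bookie.addEntry(1L, 0L, new byte[4]);
        bookie.addEntry(1L, 1L, new byte[2]);
        System.out.println("offset of (1,1): " + bookie.locate(1L, 1L));
    }
}
```

With this layering, a compactor living inside the storage layer could rewrite entries and update the index without reaching back up into Bookie.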
                


[jira] [Updated] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-160:
---------------------------------

    Attachment: BK-160.patch_v3
    
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-160.patch, BK-160.patch_v2, BK-160.patch_v3
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221540#comment-13221540 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/
-----------------------------------------------------------

(Updated 2012-03-03 08:07:49.773622)


Review request for bookkeeper.


Changes
-------

Attached a new patch that addresses Ivan's comment by giving minor and major compaction separate thresholds and intervals, which makes the distinction between them clear.
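One way to picture separate thresholds and intervals is the scheduling sketch below. It is an assumption-laden illustration, not the patch: each compaction kind has its own interval and threshold, a major run also resets the minor timer, and a threshold of 0.0 disables that kind. The class `MinorMajorSketch`, the `Kind` enum, and all numeric values are hypothetical.

```java
/** Hypothetical scheduler for two compaction kinds with independent settings. */
final class MinorMajorSketch {
    enum Kind { NONE, MINOR, MAJOR }

    private final double minorThreshold;
    private final long minorIntervalMs;
    private final double majorThreshold;
    private final long majorIntervalMs;
    private long lastMinor = 0;
    private long lastMajor = 0;

    MinorMajorSketch(double minorThreshold, long minorIntervalMs,
                     double majorThreshold, long majorIntervalMs) {
        this.minorThreshold = minorThreshold;
        this.minorIntervalMs = minorIntervalMs;
        this.majorThreshold = majorThreshold;
        this.majorIntervalMs = majorIntervalMs;
    }

    /** Decides which compaction, if any, is due at time nowMs and records the run. */
    Kind due(long nowMs) {
        if (majorThreshold > 0.0 && nowMs - lastMajor >= majorIntervalMs) {
            lastMajor = nowMs;
            lastMinor = nowMs; // a major run covers everything a minor run would
            return Kind.MAJOR;
        }
        if (minorThreshold > 0.0 && nowMs - lastMinor >= minorIntervalMs) {
            lastMinor = nowMs;
            return Kind.MINOR;
        }
        return Kind.NONE;
    }

    /** Live-fraction threshold below which a log is compacted for the given kind. */
    double thresholdFor(Kind kind) {
        switch (kind) {
            case MAJOR: return majorThreshold;
            case MINOR: return minorThreshold;
            default: return 0.0;
        }
    }

    public static void main(String[] args) {
        // Illustrative values only: frequent minor runs, rare major runs.
        MinorMajorSketch s = new MinorMajorSketch(0.2, 10, 0.8, 100);
        for (long t : new long[] {5, 10, 15, 100, 105}) {
            System.out.println("t=" + t + " -> " + s.due(t));
        }
    }
}
```

The idea is that cheap minor runs only touch logs that are almost entirely dead, while the rarer major runs use a higher threshold and reclaim more space per pass.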


Summary
-------

The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each holding only a few messages; a single entry log file will then contain messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed, and the disk space it occupies cannot be reclaimed.


This addresses bug BOOKKEEPER-160.
    https://issues.apache.org/jira/browse/BOOKKEEPER-160


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java 23a1ffc 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java PRE-CREATION 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java f661e90 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java PRE-CREATION 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java aca66e6 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java 51195d4 
  bookkeeper-server/conf/bk_server.conf d005d01 

Diff: https://reviews.apache.org/r/3874/diff


Testing
-------


Thanks,

Sijie


                


[jira] [Updated] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-160:
---------------------------------

    Attachment: BK-160.patch

Added a patch to do entry log compaction in the GC thread.
                
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>         Attachments: BK-160.patch
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each with only a few messages, so an entry log file contains messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed and the disk space it occupies cannot be reclaimed.


        

[jira] [Updated] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-160:
---------------------------------

    Attachment: BK-160.patch_v2

Attached a new patch.
                


        

[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215751#comment-13215751 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-160:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3874/#review5322
-----------------------------------------------------------



bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
<https://reviews.apache.org/r/3874/#comment11609>

    I like this interface. It solves the entanglement issue nicely.



bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java
<https://reviews.apache.org/r/3874/#comment11604>

    I still find the meaning of low watermark, high watermark, minor compaction, and major compaction confusing. 
    
    As I understand it, a minor compaction is triggered when the low watermark is hit. A major compaction is triggered when the high watermark is hit and no minor compaction ran in this round. 
    
    The high watermark must be lower than the low watermark: for example, high is 0.6 and low is 0.9. This is confusing, as one would expect low to be lower than high. 
    
    Also, if a minor compaction is run at the low watermark, it means (in the example above) that 90% of the entries in the ledger will be copied. For a major compaction, 60% will be copied. So a minor compaction is heavier than a major compaction, which is confusing.
    
    I think having a minor and a major compaction requires that they run at different frequencies, and that a major compaction is heavier on the system than a minor compaction. For example, major compaction could run only once a day but with a threshold of 0.9, while minor compaction could run every hour with a threshold of 0.6. Both the frequency of the compactions and the thresholds should be configurable and explicit. I think we should have:
     - minorCompactionInterval
     - minorCompactionThreshold
     - majorCompactionInterval
     - majorCompactionThreshold
    
    And I think some compaction should be on by default, just to make sure this stuff gets run regularly. 
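
    To make the proposal concrete, here is a minimal sketch of how the four settings could drive the GC thread's decision. It is not the code from the attached patches. The class CompactionSketch, its methods, and the millisecond units are hypothetical, and it assumes an entry log is compacted when its ratio of live data falls below the threshold. The values in the usage note are the example values from the paragraph above, not defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and units are assumptions.
class CompactionSketch {
    enum Kind { NONE, MINOR, MAJOR }

    // Fields named after the four proposed settings (intervals in milliseconds).
    private final long minorCompactionInterval;
    private final double minorCompactionThreshold;
    private final long majorCompactionInterval;
    private final double majorCompactionThreshold;

    private long lastMinor;
    private long lastMajor;

    CompactionSketch(long minorInterval, double minorThreshold,
                     long majorInterval, double majorThreshold, long startTime) {
        this.minorCompactionInterval = minorInterval;
        this.minorCompactionThreshold = minorThreshold;
        this.majorCompactionInterval = majorInterval;
        this.majorCompactionThreshold = majorThreshold;
        this.lastMinor = startTime;
        this.lastMajor = startTime;
    }

    // Decide which compaction, if any, is due at time 'now'.
    // A major compaction takes precedence and also resets the minor timer,
    // since it covers every entry log a minor compaction would pick.
    Kind due(long now) {
        if (now - lastMajor >= majorCompactionInterval) {
            lastMajor = now;
            lastMinor = now;
            return Kind.MAJOR;
        }
        if (now - lastMinor >= minorCompactionInterval) {
            lastMinor = now;
            return Kind.MINOR;
        }
        return Kind.NONE;
    }

    // Threshold that applies to the given kind of compaction.
    double thresholdFor(Kind kind) {
        switch (kind) {
            case MAJOR: return majorCompactionThreshold;
            case MINOR: return minorCompactionThreshold;
            default:    return 0.0; // nothing is selected when no compaction is due
        }
    }

    // Select entry logs whose live-data ratio is below the threshold.
    // Assumption: the ratio is (bytes of undeleted ledgers) / (entry log size).
    static List<Long> selectLogs(Map<Long, Double> liveRatioByLogId, double threshold) {
        List<Long> selected = new ArrayList<>();
        for (Map.Entry<Long, Double> e : liveRatioByLogId.entrySet()) {
            if (e.getValue() < threshold) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```

    With an hourly minor interval at threshold 0.6 and a daily major interval at threshold 0.9, a call such as `new CompactionSketch(3_600_000L, 0.6, 86_400_000L, 0.9, start)` would give a cheap pass every hour and a heavier pass once a day, so "major" really is the heavier operation.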


- Ivan


On 2012-02-20 16:08:51, Sijie Guo wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3874/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-20 16:08:51)
bq.  
bq.  
bq.  Review request for bookkeeper.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each with only a few messages, so an entry log file contains messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed and the disk space it occupies cannot be reclaimed.
bq.  
bq.  
bq.  This addresses bug BOOKKEEPER-160.
bq.      https://issues.apache.org/jira/browse/BOOKKEEPER-160
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    bookkeeper-server/conf/bk_server.conf d005d01 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java d4ece94 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java aca66e6 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java PRE-CREATION 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java 6bbe943 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/CompactionTest.java PRE-CREATION 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java f661e90 
bq.  
bq.  Diff: https://reviews.apache.org/r/3874/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sijie
bq.  
bq.


                


        

[jira] [Commented] (BOOKKEEPER-160) bookie server needs to do compaction over entry log files to reclaim disk space

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210182#comment-13210182 ] 

Sijie Guo commented on BOOKKEEPER-160:
--------------------------------------

Actually, I am wondering why we need to put the GC thread in EntryLogger. Since the GC thread will communicate with other components, such as LedgerCache and EntryLogger, it would be better to run the GC thread in the bookie, just like SyncThread.
                
> bookie server needs to do compaction over entry log files to reclaim disk space
> -------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-160
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-160
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-160.patch
>
>
> The bookie server aggregates entries into entry log files. Suppose there are many ledgers, each with only a few messages, so an entry log file contains messages from many different ledgers. If even one of those ledgers has not been deleted, the entry log file cannot be removed and the disk space it occupies cannot be reclaimed.
