Posted to commits@bookkeeper.apache.org by "gaozhangmin (via GitHub)" <gi...@apache.org> on 2023/05/26 02:36:43 UTC

[GitHub] [bookkeeper] gaozhangmin opened a new issue, #3973: Introduce disk cache storage layer.

gaozhangmin opened a new issue, #3973:
URL: https://github.com/apache/bookkeeper/issues/3973

   In a typical BookKeeper deployment, SSDs store the journal and HDDs store ledger data. Writes are first held in memory and then flushed asynchronously to the HDDs in the background. Because memory is limited, only a small amount of data can stay cached, so reads of historical data ultimately go to the HDDs, which become the bottleneck for the whole BookKeeper cluster. Moreover, during data recovery after a node failure, a large amount of historical data must be read from the HDDs; this saturates disk I/O and causes long read delays or read failures.
   
   To address these challenges, a new architecture is proposed: the introduction of a disk cache between the memory cache and the HDD disk, utilizing an SSD disk as an intermediary medium to significantly extend data caching duration. The data flow is as follows: journal -> write cache -> SSD cache -> HDD disk. The SSD disk cache functions as a regular LedgerStorage layer and is compatible with all existing LedgerStorage implementations. The following outlines the process:
   
   1. Data eviction from the disk cache to the Ledger data disk occurs on a per-log file basis.
   2. A new configuration parameter, diskCacheRetentionTime, is added to set the duration for which hot data is retained. Files with write timestamps older than the retention time will be evicted to the Ledger data disk.
   3. A new configuration parameter, diskCacheThreshold, is added. If disk cache utilization exceeds the threshold, eviction is accelerated: data is evicted to the Ledger data disk in order of file write time until utilization falls back below the threshold.
   4. A new thread, ColdStorageArchiveThread, is introduced to periodically evict data from the disk cache to the Ledger data disk.
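
The eviction rules in steps 1 to 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from BookKeeper or from the proposal: the class `DiskCacheEvictionSketch`, the type `CachedLogFile`, and the method `selectForEviction` are hypothetical names, the threshold is assumed to be a fraction of disk capacity, and "order of file write time" is assumed to mean oldest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the proposed eviction policy; not BookKeeper code. */
final class DiskCacheEvictionSketch {

    /** One log file held in the SSD disk cache (hypothetical type). */
    static final class CachedLogFile {
        final long logId;
        final long lastWriteMillis;
        final long sizeBytes;

        CachedLogFile(long logId, long lastWriteMillis, long sizeBytes) {
            this.logId = logId;
            this.lastWriteMillis = lastWriteMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Selects whole log files to move to the Ledger data disk.
     * retentionMillis stands in for diskCacheRetentionTime and
     * usageThreshold (assumed to be a fraction in (0, 1]) for diskCacheThreshold.
     */
    static List<CachedLogFile> selectForEviction(List<CachedLogFile> files,
                                                 long nowMillis,
                                                 long retentionMillis,
                                                 long capacityBytes,
                                                 double usageThreshold) {
        // Assumption: "order of file write time" means oldest file first.
        List<CachedLogFile> oldestFirst = new ArrayList<>(files);
        oldestFirst.sort(Comparator.comparingLong((CachedLogFile f) -> f.lastWriteMillis));

        long usedBytes = 0;
        for (CachedLogFile f : files) {
            usedBytes += f.sizeBytes;
        }
        long limitBytes = (long) (capacityBytes * usageThreshold);

        List<CachedLogFile> toEvict = new ArrayList<>();
        for (CachedLogFile f : oldestFirst) {
            boolean expired = nowMillis - f.lastWriteMillis > retentionMillis;
            boolean overThreshold = usedBytes > limitBytes;
            if (!expired && !overThreshold) {
                break; // every remaining file is newer, so none of them qualifies
            }
            toEvict.add(f);
            usedBytes -= f.sizeBytes;
        }
        return toEvict;
    }
}
```

A periodic task such as the proposed ColdStorageArchiveThread could call a method like this on each run and then move the returned files to the Ledger data disk.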


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] gaozhangmin closed issue #3973: Introduce disk cache storage layer.

Posted by "gaozhangmin (via GitHub)" <gi...@apache.org>.
gaozhangmin closed issue #3973: Introduce disk cache storage layer.
URL: https://github.com/apache/bookkeeper/issues/3973








[GitHub] [bookkeeper] gaozhangmin commented on issue #3973: Introduce disk cache storage layer.

Posted by "gaozhangmin (via GitHub)" <gi...@apache.org>.
gaozhangmin commented on issue #3973:
URL: https://github.com/apache/bookkeeper/issues/3973#issuecomment-1612394973

   https://lists.apache.org/thread/r4tbxm9s3m2jd4vzynfcnrrw5jkooqqc
   Discussion




[GitHub] [bookkeeper] gaozhangmin commented on issue #3973: Introduce disk cache storage layer.

Posted by "gaozhangmin (via GitHub)" <gi...@apache.org>.
gaozhangmin commented on issue #3973:
URL: https://github.com/apache/bookkeeper/issues/3973#issuecomment-1643006599

   ## The revised proposal:
   The new features will not have any impact on the existing architecture implementation.
   
   We have introduced a new implementation called DirectDbLedgerStorage, which does not use the journal (the write-ahead log). Ledger data is now written directly to the SSD disk, so it no longer has to be written to the SSD disk twice.
   
   Data in the SSD disk cache is evicted to the HDD disk based on the write time at the granularity of entry logs.
   
   Furthermore, we have added support for write degradation. When the SSD disk cache reaches the warning threshold, the system automatically switches to the journal+ledger approach to ensure system stability.
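
The write degradation described above amounts to choosing a write path from the current cache utilization. A minimal sketch, assuming the warning threshold is a fraction of SSD cache capacity; the class `WriteDegradationSketch`, the enum `WritePath`, and the method `choose` are hypothetical names and are not part of BookKeeper or of the proposed DirectDbLedgerStorage:

```java
/** Hypothetical sketch of write degradation; names are illustrative only. */
final class WriteDegradationSketch {

    /** The two write paths mentioned in the proposal. */
    enum WritePath { DIRECT_TO_SSD_CACHE, JOURNAL_PLUS_LEDGER }

    private final double warnThreshold; // assumed to be a fraction in (0, 1]

    WriteDegradationSketch(double warnThreshold) {
        if (warnThreshold <= 0.0 || warnThreshold > 1.0) {
            throw new IllegalArgumentException("warnThreshold must be in (0, 1]");
        }
        this.warnThreshold = warnThreshold;
    }

    /** Falls back to journal+ledger once utilization reaches the warning threshold. */
    WritePath choose(long usedBytes, long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be positive");
        }
        double utilization = (double) usedBytes / capacityBytes;
        return utilization >= warnThreshold
                ? WritePath.JOURNAL_PLUS_LEDGER
                : WritePath.DIRECT_TO_SSD_CACHE;
    }
}
```

How and when the system switches back to direct writes after eviction frees space is not specified in the proposal, so the sketch leaves that out.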
   
   I'd like to split this proposal into tasks:
   - [ ] Introduce a new LedgerStorage implementation, `DirectDbLedgerStorage`
   - [ ] Introduce `ColdStorageArchiveThread`, the main class that archives log files from the disk cache to cold storage
   - [ ] Include cold ledger dirs in cookie validation
   - [ ] Add a disk cache writing status API
   




[GitHub] [bookkeeper] gaozhangmin closed issue #3973: Introduce disk cache storage layer.

Posted by "gaozhangmin (via GitHub)" <gi...@apache.org>.
gaozhangmin closed issue #3973: Introduce disk cache storage layer.
URL: https://github.com/apache/bookkeeper/issues/3973

