Posted to dev@bookkeeper.apache.org by Gavin gao <ga...@gmail.com> on 2023/05/26 02:36:10 UTC

Introduce cold ledger storage layer.

In a typical bookkeeper deployment, SSD disks are used to store Journal log
data, while HDD disks are used to store Ledger data. Data writes are
initially stored in memory and then asynchronously flushed to the HDD disk
in the background. However, due to memory limitations, the amount of data
that can be cached is restricted. Consequently, requests for historical
data ultimately rely on the HDD disk, which becomes a bottleneck for the
entire Bookkeeper cluster. Moreover, during data recovery processes
following node failures, a substantial amount of historical data needs to
be read from the HDD disk, leading to the disk's I/O utilization reaching
maximum capacity and resulting in significant read request delays or
failures.

To address these challenges, a new architecture is proposed: the
introduction of a disk cache between the memory cache and the HDD disk,
utilizing an SSD disk as an intermediary medium to significantly extend
data caching duration. The data flow is as follows: journal -> write cache
-> SSD cache -> HDD disk. The SSD disk cache functions as a regular
LedgerStorage layer and is compatible with all existing LedgerStorage
implementations. The following outlines the process:

   1. Data eviction from the disk cache to the Ledger data disk occurs on a
   per-log file basis.
   2. A new configuration parameter, diskCacheRetentionTime, is added to
   set the duration for which hot data is retained. Files with write
   timestamps older than the retention time will be evicted to the Ledger data
   disk.
   3. A new configuration parameter, diskCacheThreshold, is added. If the
   disk cache utilization exceeds the threshold, the eviction process is
   accelerated: data is evicted to the Ledger data disk in the order the files
   were written until the disk cache utilization falls back below the threshold.
   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
   evict data from the disk cache to the Ledger data disk (a sketch of how
   these pieces fit together follows this list).
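
A minimal Java sketch of how points 1-4 above could fit together. Only the
names ColdStorageArchiveThread, diskCacheRetentionTime, and diskCacheThreshold
come from the proposal; the DiskCache, LedgerDisk, and EntryLogFile
abstractions and every method on them are hypothetical placeholders, not
existing BookKeeper APIs.

    import java.time.Duration;
    import java.util.List;

    // Hypothetical abstractions over the SSD cache directory and the HDD ledger directory.
    interface EntryLogFile { }

    interface DiskCache {
        List<EntryLogFile> logsOlderThan(long cutoffMillis); // write timestamp < cutoff
        List<EntryLogFile> logsInWriteOrder();               // oldest written first
        double usageRatio();                                  // used / capacity, 0.0 to 1.0
        void remove(EntryLogFile log);
    }

    interface LedgerDisk {
        void archive(EntryLogFile log); // copy a whole entry-log file to the HDD ledger directory
    }

    class ColdStorageArchiveThread extends Thread {
        private final DiskCache diskCache;
        private final LedgerDisk ledgerDisk;
        private final Duration retentionTime;  // diskCacheRetentionTime
        private final double usageThreshold;   // diskCacheThreshold, e.g. 0.85
        private final Duration checkInterval;

        ColdStorageArchiveThread(DiskCache diskCache, LedgerDisk ledgerDisk,
                                 Duration retentionTime, double usageThreshold,
                                 Duration checkInterval) {
            super("ColdStorageArchiveThread");
            this.diskCache = diskCache;
            this.ledgerDisk = ledgerDisk;
            this.retentionTime = retentionTime;
            this.usageThreshold = usageThreshold;
            this.checkInterval = checkInterval;
        }

        @Override
        public void run() {
            while (!isInterrupted()) {
                // Points 1 and 2: evict whole entry-log files older than the retention time.
                long cutoff = System.currentTimeMillis() - retentionTime.toMillis();
                for (EntryLogFile log : diskCache.logsOlderThan(cutoff)) {
                    ledgerDisk.archive(log);
                    diskCache.remove(log);
                }
                // Point 3: if the cache is still over the threshold, keep evicting in write order.
                for (EntryLogFile log : diskCache.logsInWriteOrder()) {
                    if (diskCache.usageRatio() <= usageThreshold) {
                        break;
                    }
                    ledgerDisk.archive(log);
                    diskCache.remove(log);
                }
                // Point 4: run periodically.
                try {
                    Thread.sleep(checkInterval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

Evicting at the granularity of whole log files (point 1) keeps the unit of
movement coarse; as Andrey notes later in the thread, it also means the
entry-log index could be reused, at the cost of complicating compaction.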

Re: Introduce cold ledger storage layer.

Posted by Enrico Olivelli <eo...@gmail.com>.
Gavin,
This idea looks promising. As Dave mentioned, it could pave the way for
adding support for moving cold data to cheaper cloud storage.

Enrico

On Fri, May 26, 2023 at 06:17 Dave Fisher
<wa...@comcast.net> wrote:
>
>
>
> Sent from my iPhone
>
> > On May 25, 2023, at 7:37 PM, Gavin gao <ga...@gmail.com> wrote:
> >
> > In a typical bookkeeper deployment, SSD disks are used to store Journal log
> > data, while HDD disks are used to store Ledger data.
>
> What is used is a deployment choice. I know that when OMB is run, locally attached SSDs are used for both.
>
> I do agree that the choice of SSD and HDD disks can impact Bookkeeper performance. Increasing IOPS and throughput will impact performance significantly. For example, in AWS a default gp3 attached disk will have large latencies, and even with its performance raised it will still be maybe 4x slower than a locally attached SSD.
> > Data writes are
> > initially stored in memory and then asynchronously flushed to the HDD disk
> > in the background. However, due to memory limitations, the amount of data
> > that can be cached is restricted. Consequently, requests for historical
> > data ultimately rely on the HDD disk, which becomes a bottleneck for the
> > entire Bookkeeper cluster. Moreover, during data recovery processes
> > following node failures, a substantial amount of historical data needs to
> > be read from the HDD disk, leading to the disk's I/O utilization reaching
> > maximum capacity and resulting in significant read request delays or
> > failures.
> >
> > To address these challenges, a new architecture is proposed: the
> > introduction of a disk cache between the memory cache and the HDD disk,
> > utilizing an SSD disk as an intermediary medium to significantly extend
> > data caching duration. The data flow is as follows: journal -> write cache
> > -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> > LedgerStorage layer and is compatible with all existing LedgerStorage
> > implementations.
>
> A different way to look at this is to consider the cold layer as being optional and within HDD or even S3. In S3 you could have advantages with recovery into different AZs. You could also significantly improve replay.
> > The following outlines the process:
> >
> >   1. Data eviction from the disk cache to the Ledger data disk occurs on a
> >   per-log file basis.
> >   2. A new configuration parameter, diskCacheRetentionTime, is added to
> >   set the duration for which hot data is retained. Files with write
> >   timestamps older than the retention time will be evicted to the Ledger data
> >   disk.
>
> If you can adjust this to use a recently-used approach, then very long ledgers can be read easily by predictively moving ledgers from cold to hot.
>
> >   3. A new configuration parameter, diskCacheThreshold, is added. If the
> >   disk cache utilization exceeds the threshold, the eviction process is
> >   accelerated. Data is evicted to the Ledger data disk based on the order of
> >   file writes until the disk space recovers above the threshold.
> >   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
> >   evict data from the disk cache to the Ledger data disk.
>
> Another thread is also needed - ColdStorageRetrievalThread.
>
> Just some thoughts.
>
> Best,
> Dave

Re: Introduce cold ledger storage layer.

Posted by Dave Fisher <wa...@comcast.net>.

Sent from my iPhone

> On May 25, 2023, at 7:37 PM, Gavin gao <ga...@gmail.com> wrote:
> 
> In a typical bookkeeper deployment, SSD disks are used to store Journal log
> data, while HDD disks are used to store Ledger data.

What is used is a deployment choice. I know that when OMB is run, locally attached SSDs are used for both.

I do agree that the choice of SSD and HDD disks can impact Bookkeeper performance. Increasing IOPS and throughput will impact performance significantly. For example, in AWS a default gp3 attached disk will have large latencies, and even with its performance raised it will still be maybe 4x slower than a locally attached SSD.
> Data writes are
> initially stored in memory and then asynchronously flushed to the HDD disk
> in the background. However, due to memory limitations, the amount of data
> that can be cached is restricted. Consequently, requests for historical
> data ultimately rely on the HDD disk, which becomes a bottleneck for the
> entire Bookkeeper cluster. Moreover, during data recovery processes
> following node failures, a substantial amount of historical data needs to
> be read from the HDD disk, leading to the disk's I/O utilization reaching
> maximum capacity and resulting in significant read request delays or
> failures.
> 
> To address these challenges, a new architecture is proposed: the
> introduction of a disk cache between the memory cache and the HDD disk,
> utilizing an SSD disk as an intermediary medium to significantly extend
> data caching duration. The data flow is as follows: journal -> write cache
> -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> LedgerStorage layer and is compatible with all existing LedgerStorage
> implementations.

A different way to look at this is to consider the cold layer as being optional and within HDD or even S3. In S3 you could have advantages with recovery into different AZs. You could also significantly improve replay.
> The following outlines the process:
> 
>   1. Data eviction from the disk cache to the Ledger data disk occurs on a
>   per-log file basis.
>   2. A new configuration parameter, diskCacheRetentionTime, is added to
>   set the duration for which hot data is retained. Files with write
>   timestamps older than the retention time will be evicted to the Ledger data
>   disk.

If you can adjust this to use a recently-used approach, then very long ledgers can be read easily by predictively moving ledgers from cold to hot.

>   3. A new configuration parameter, diskCacheThreshold, is added. If the
>   disk cache utilization exceeds the threshold, the eviction process is
>   accelerated. Data is evicted to the Ledger data disk based on the order of
>   file writes until the disk space recovers above the threshold.
>   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
>   evict data from the disk cache to the Ledger data disk.

Another thread is also needed - ColdStorageRetrievalThread.

Just some thoughts.

Best,
Dave
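
Dave's suggested ColdStorageRetrievalThread could look roughly like the sketch
below: on a cache miss the entry log is pulled back from the cold tier (the HDD
ledger disk, or an object store such as S3) and re-warmed into the SSD cache.
Apart from the thread name Dave proposes, every identifier here is a
hypothetical placeholder, not an existing BookKeeper or S3 API.

    import java.util.Optional;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class ColdStorageRetrievalThread extends Thread {
        // Hypothetical abstractions: the hot SSD cache and a cold tier (HDD directory or S3 bucket).
        interface EntryLogFile { long logId(); }
        interface DiskCache   { boolean contains(long logId); void insert(EntryLogFile log); }
        interface ColdStorage { Optional<EntryLogFile> fetch(long logId); }

        private final BlockingQueue<Long> retrievalQueue = new LinkedBlockingQueue<>();
        private final DiskCache hotCache;
        private final ColdStorage coldStorage;

        ColdStorageRetrievalThread(DiskCache hotCache, ColdStorage coldStorage) {
            super("ColdStorageRetrievalThread");
            this.hotCache = hotCache;
            this.coldStorage = coldStorage;
        }

        // The read path calls this when an entry log is not found in the SSD cache.
        void requestRetrieval(long logId) {
            retrievalQueue.offer(logId);
        }

        @Override
        public void run() {
            while (!isInterrupted()) {
                try {
                    long logId = retrievalQueue.take();
                    if (!hotCache.contains(logId)) {
                        // Re-warm the SSD cache so later reads of this log are served locally.
                        coldStorage.fetch(logId).ifPresent(hotCache::insert);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

Combined with the recently-used retention Dave mentions, the same queue could
also be fed predictively so that ledgers are moved from cold to hot before they
are read.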

Re: Introduce cold ledger storage layer.

Posted by Wenbing Shen <ol...@gmail.com>.
Hi, Gavin gao

A very interesting new feature. Our team once discussed implementing an
SSD cache layer in BookKeeper, because our other internal message queues,
such as Kafka, use this architecture. We hope that BookKeeper storage can
run on the same machine type, but because of other internal work this
effort has not yet been formally scheduled.

With an SSD cache layer added, I believe the read and write timeouts
caused by hot traffic on the local HDD disk will be effectively
mitigated.

I'm really looking forward to this feature. :)

Thanks,
wenbingshen

Gavin gao <ga...@gmail.com> wrote on Fri, May 26, 2023 at 10:37:

> In a typical bookkeeper deployment, SSD disks are used to store Journal log
> data, while HDD disks are used to store Ledger data. Data writes are
> initially stored in memory and then asynchronously flushed to the HDD disk
> in the background. However, due to memory limitations, the amount of data
> that can be cached is restricted. Consequently, requests for historical
> data ultimately rely on the HDD disk, which becomes a bottleneck for the
> entire Bookkeeper cluster. Moreover, during data recovery processes
> following node failures, a substantial amount of historical data needs to
> be read from the HDD disk, leading to the disk's I/O utilization reaching
> maximum capacity and resulting in significant read request delays or
> failures.
>
> To address these challenges, a new architecture is proposed: the
> introduction of a disk cache between the memory cache and the HDD disk,
> utilizing an SSD disk as an intermediary medium to significantly extend
> data caching duration. The data flow is as follows: journal -> write cache
> -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> LedgerStorage layer and is compatible with all existing LedgerStorage
> implementations. The following outlines the process:
>
>    1. Data eviction from the disk cache to the Ledger data disk occurs on a
>    per-log file basis.
>    2. A new configuration parameter, diskCacheRetentionTime, is added to
>    set the duration for which hot data is retained. Files with write
>    timestamps older than the retention time will be evicted to the Ledger
> data
>    disk.
>    3. A new configuration parameter, diskCacheThreshold, is added. If the
>    disk cache utilization exceeds the threshold, the eviction process is
>    accelerated. Data is evicted to the Ledger data disk based on the order
> of
>    file writes until the disk space recovers above the threshold.
>    4. A new thread, ColdStorageArchiveThread, is introduced to periodically
>    evict data from the disk cache to the Ledger data disk.
>

Re: Introduce cold ledger storage layer.

Posted by Gavin Gao <zh...@apache.org>.
Based on the questions and suggestions raised by Andrey Yegorov, I have updated my proposal. Here is the revised introduction:

The new features will not have any impact on the existing architecture implementation.

We have introduced a new implementation called DirectDbLedgerStorage, which eliminates the journal write-ahead log. Ledger data can now be written directly to the SSD disk, so the same data no longer needs to be written to the SSD twice.

Data in the SSD disk cache is evicted to the HDD disk based on the write time at the granularity of entry logs.

Furthermore, we have added support for write degradation. When the SSD disk cache reaches the warning threshold, the system automatically switches to the journal+ledger approach to ensure system stability.
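
A rough sketch of the write-degradation behaviour described above, assuming a
simple guard object consulted on the write path. Only the behaviour itself
(direct SSD writes while healthy, falling back to the journal+ledger path once
the SSD cache reaches the warning threshold) comes from the proposal;
DirectDbLedgerStorage appears only as a name, and every other identifier is
hypothetical.

    class WriteDegradationGuard {
        // Hypothetical writer abstractions for the two paths.
        interface Entry { }
        interface DirectSsdWriter { void addEntry(Entry e); }
        interface JournalWriter   { void append(Entry e); }
        interface LedgerWriter    { void addEntry(Entry e); }

        private final double warnThreshold;       // e.g. 0.90 of SSD cache capacity
        private volatile boolean degraded = false;

        WriteDegradationGuard(double warnThreshold) {
            this.warnThreshold = warnThreshold;
        }

        // Called periodically (or on each flush) with the current SSD cache usage ratio.
        void onUsage(double usageRatio) {
            degraded = usageRatio >= warnThreshold;
        }

        // Direct SSD write while healthy; journal + ledger write once degraded.
        void write(Entry entry, DirectSsdWriter ssd, JournalWriter journal, LedgerWriter ledger) {
            if (!degraded) {
                ssd.addEntry(entry);    // DirectDbLedgerStorage path: no journal write-ahead log
            } else {
                journal.append(entry);  // degraded path: write to the journal first
                ledger.addEntry(entry); // then the regular ledger storage write
            }
        }
    }

The volatile flag keeps the degradation check cheap on the hot write path.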


On 2023/05/29 08:49:37 Gavin Gao wrote:
> The answer for question 1 is `yes`.
> After reading through it, your main concern is the impact on journal write throughput. I need to do some load tests to verify how much impact is added.
> 
> For question 2, the drawbacks of other solutions like Linux's storage tiering are:
> 1. Both FlashCache and OpenCAS, in every mode, will flush data back to the SSD cache, similar to PageCache, causing cache pollution.
> 2. A cache miss results in an additional access to the device, increasing latency.
> 3. All metadata is maintained by the operating system, leading to increased memory consumption by the kernel.
> 
> On 2023/05/26 17:13:13 Andrey Yegorov wrote:
> > Semi-random notes:
> > 
> > 1. Are you planning to use the journal's SSD as cache?
> > 
> > I guess this is a typical thing that pops up after looking at e.g. 256GB
> > SSD with e.g. only 10GB actually utilized by journals.
> > The problem with that idea is that reads and writes start mixing up on the
> > same disk.
> > The value of the dedicated disk for journal (even if that's HDD) is that it
> > only gets sequential write i/o.
> > As soon as you add cache writes (and random reads, you are adding cache to
> > read from it) you'll see a significant drop in write throughput and
> > significant increase in latency for the journal, fsync latency spikes up. I
> > do not have access to the numbers from the tests I've done a few years ago
> > but that was the conclusion.
> > 
> > At the end you are sacrificing write performance for read performance which
> > is ok for some usecases but not for all.
> > You can simulate this by running anything that generates
> > writes/reads/deletes to the journal's ssd (at expected cache rates) during
> > your regular load tests. You won't see read improvements but you will see
> > the write perf changes.
> > 
> > 2. In case you have dedicated SSD for the cache on the hardware that you
> > manage I'd look at other options first, like linux's storage tiering
> > https://blog.delouw.ch/2020/01/29/using-lvm-cache-for-storage-tiering/ and
> > proceed after having the perf test results.
> > 
> > 3. Writing to cache as an entry log and then moving the entry log to a
> > slower class of storage (as I understood; also this way the index can be
> > reused) complicates compaction.
> > It also assumes that journal(s) keep enough data for recovery of cached
> > data in case cache is lost. Both compaction and recovery from journals are
> > getting affected by this.
> > 
> > 4. for cloud-based deployments using local ssds as read cache could be an
> > option as long as we do not expect these disks to be reliable.
> > This means data goes to entry logs as usual and to the cache that can be
> > recreated with newly written data (not rebuilt)
> > One could use e.g. RocksDB and entries with TTL (is there a size-based
> > expiration in rocksdb?)
> > This has a few gotchas of its own:
> > - compaction will have to remove data from cache
> > - RocksDB's internal compaction can introduce latency spikes (it can block
> > reads/writes IIRC).
> > - to avoid returning stale data from the cache we need to read index
> > first to confirm the entry exists, then try cache, fallback to ledger
> > storage
> > As RocksDB alternative in this case, ehcache is mature, apache 2 licensed,
> > and supports tiering/cache persistence but I do not have any experience
> > with it. https://www.ehcache.org/documentation/3.10/tiering.html
> > 
> > 5. Cache has to be abstracted as an interface and support
> > pluggable/configurable implementations. the abstracted interface has to
> > define expected side-effects on compaction/recovery, how to access it
> > to prevent reading stale (compacted out) data.
> > 
> > To be clear, I am ok with the idea of having an optional tiered
> > storage/entry cache but I do not think that proposed implementation is the
> > way to go.
> > 
> > 
> > On Thu, May 25, 2023 at 7:37 PM Gavin gao <ga...@gmail.com> wrote:
> > 
> > > In a typical bookkeeper deployment, SSD disks are used to store Journal log
> > > data, while HDD disks are used to store Ledger data. Data writes are
> > > initially stored in memory and then asynchronously flushed to the HDD disk
> > > in the background. However, due to memory limitations, the amount of data
> > > that can be cached is restricted. Consequently, requests for historical
> > > data ultimately rely on the HDD disk, which becomes a bottleneck for the
> > > entire Bookkeeper cluster. Moreover, during data recovery processes
> > > following node failures, a substantial amount of historical data needs to
> > > be read from the HDD disk, leading to the disk's I/O utilization reaching
> > > maximum capacity and resulting in significant read request delays or
> > > failures.
> > >
> > > To address these challenges, a new architecture is proposed: the
> > > introduction of a disk cache between the memory cache and the HDD disk,
> > > utilizing an SSD disk as an intermediary medium to significantly extend
> > > data caching duration. The data flow is as follows: journal -> write cache
> > > -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> > > LedgerStorage layer and is compatible with all existing LedgerStorage
> > > implementations. The following outlines the process:
> > >
> > >    1. Data eviction from the disk cache to the Ledger data disk occurs on a
> > >    per-log file basis.
> > >    2. A new configuration parameter, diskCacheRetentionTime, is added to
> > >    set the duration for which hot data is retained. Files with write
> > >    timestamps older than the retention time will be evicted to the Ledger
> > > data
> > >    disk.
> > >    3. A new configuration parameter, diskCacheThreshold, is added. If the
> > >    disk cache utilization exceeds the threshold, the eviction process is
> > >    accelerated. Data is evicted to the Ledger data disk based on the order
> > > of
> > >    file writes until the disk space recovers above the threshold.
> > >    4. A new thread, ColdStorageArchiveThread, is introduced to periodically
> > >    evict data from the disk cache to the Ledger data disk.
> > >
> > 
> > 
> > -- 
> > Andrey Yegorov
> > 
> 

Re: Introduce cold ledger storage layer.

Posted by Gavin Gao <zh...@apache.org>.
The answer for question 1 is `yes`.
After reading through it, your main oncerning is the impact on journal write throughput.  I need to do some load test to verfiy How much impact is added?

For question 2, the drawbacks of other solutions like Linux's storage tiering are:
1. Both FlashCache and OpenCAS, in every mode, will flush data back to the SSD cache, similar to PageCache, causing cache pollution.
2. A cache miss results in an additional access to the device, increasing latency.
3. All metadata is maintained by the operating system, leading to increased memory consumption by the kernel.

On 2023/05/26 17:13:13 Andrey Yegorov wrote:
> Semi-random notes:
> 
> 1. Are you planning to use the journal's SSD as cache?
> 
> I guess this is a typical thing that pops up after looking at e.g. 256GB
> SSD with e.g. only 10GB actually utilized by journals.
> The problem with that idea is that reads and writes start mixing up on the
> same disk.
> The value of the dedicated disk for journal (even if that's HDD) is that it
> only gets sequential write i/o.
> As soon as you add cache writes (and random reads, you are adding cache to
> read from it) you'll see a significant drop in write throughput and
> significant increase in latency for the journal, fsync latency spikes up. I
> do not have access to the numbers from the tests I've done a few years ago
> but that was the conclusion.
> 
> At the end you are sacrificing write performance for read performance which
> is ok for some usecases but not for all.
> You can simulate this by running anything that generates
> writes/reads/deletes to the journal's ssd (at expected cache rates) during
> your regular load tests. You won't see read improvements but you will see
> the write perf changes.
> 
> 2. In case you have dedicated SSD for the cache on the hardware that you
> manage I'd look at other options first, like linux's storage tiering
> https://blog.delouw.ch/2020/01/29/using-lvm-cache-for-storage-tiering/ and
> proceed after having the perf test results.
> 
> 3. Writing to cache as an entry log and then moving the entry log to a
> slower class of storage (as I understood; also this way the index can be
> reused) complicates compaction.
> It also assumes that journal(s) keep enough data for recovery of cached
> data in case cache is lost. Both compaction and recovery from journals are
> getting affected by this.
> 
> 4. for cloud-based deployments using local ssds as read cache could be an
> option as long as we do not expect these disks to be reliable.
> This means data goes to entry logs as usual and to the cache that can be
> recreated with newly written data (not rebuilt)
> One could use e.g. RocksDB and entries with TTL (is there a size-based
> expiration in rocksdb?)
> This has a few gotchas of its own:
> - compaction will have to remove data from cache
> - RocksDB's internal compaction can introduce latency spikes (it can block
> reads/writes IIRC).
> - to avoid returning stale data from the cache we need to read index
> first to confirm the entry exists, then try cache, fallback to ledger
> storage
> As RocksDB alternative in this case, ehcache is mature, apache 2 licensed,
> and supports tiering/cache persistence but I do not have any experience
> with it. https://www.ehcache.org/documentation/3.10/tiering.html
> 
> 5. Cache has to be abstracted as an interface and support
> pluggable/configurable implementations. the abstracted interface has to
> define expected side-effects on compaction/recovery, how to access it
> to prevent reading stale (compacted out) data.
> 
> To be clear, I am ok with the idea of having an optional tiered
> storage/entry cache but I do not think that proposed implementation is the
> way to go.
> 
> 
> On Thu, May 25, 2023 at 7:37 PM Gavin gao <ga...@gmail.com> wrote:
> 
> > In a typical bookkeeper deployment, SSD disks are used to store Journal log
> > data, while HDD disks are used to store Ledger data. Data writes are
> > initially stored in memory and then asynchronously flushed to the HDD disk
> > in the background. However, due to memory limitations, the amount of data
> > that can be cached is restricted. Consequently, requests for historical
> > data ultimately rely on the HDD disk, which becomes a bottleneck for the
> > entire Bookkeeper cluster. Moreover, during data recovery processes
> > following node failures, a substantial amount of historical data needs to
> > be read from the HDD disk, leading to the disk's I/O utilization reaching
> > maximum capacity and resulting in significant read request delays or
> > failures.
> >
> > To address these challenges, a new architecture is proposed: the
> > introduction of a disk cache between the memory cache and the HDD disk,
> > utilizing an SSD disk as an intermediary medium to significantly extend
> > data caching duration. The data flow is as follows: journal -> write cache
> > -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> > LedgerStorage layer and is compatible with all existing LedgerStorage
> > implementations. The following outlines the process:
> >
> >    1. Data eviction from the disk cache to the Ledger data disk occurs on a
> >    per-log file basis.
> >    2. A new configuration parameter, diskCacheRetentionTime, is added to
> >    set the duration for which hot data is retained. Files with write
> >    timestamps older than the retention time will be evicted to the Ledger
> > data
> >    disk.
> >    3. A new configuration parameter, diskCacheThreshold, is added. If the
> >    disk cache utilization exceeds the threshold, the eviction process is
> >    accelerated. Data is evicted to the Ledger data disk based on the order
> > of
> >    file writes until the disk space recovers above the threshold.
> >    4. A new thread, ColdStorageArchiveThread, is introduced to periodically
> >    evict data from the disk cache to the Ledger data disk.
> >
> 
> 
> -- 
> Andrey Yegorov
> 

Re: Introduce cold ledger storage layer.

Posted by Andrey Yegorov <an...@datastax.com>.
Semi-random notes:

1. Are you planning to use the journal's SSD as cache?

I guess this is a typical thing that pops up after looking at e.g. 256GB
SSD with e.g. only 10GB actually utilized by journals.
The problem with that idea is that reads and writes start mixing up on the
same disk.
The value of the dedicated disk for journal (even if that's HDD) is that it
only gets sequential write i/o.
As soon as you add cache writes (and random reads, you are adding cache to
read from it) you'll see a significant drop in write throughput and
significant increase in latency for the journal; fsync latency spikes up. I
do not have access to the numbers from the tests I've done a few years ago
but that was the conclusion.

In the end you are sacrificing write performance for read performance, which
is OK for some use cases but not for all.
You can simulate this by running anything that generates
writes/reads/deletes to the journal's ssd (at expected cache rates) during
your regular load tests. You won't see read improvements but you will see
the write perf changes.

2. In case you have dedicated SSD for the cache on the hardware that you
manage I'd look at other options first, like linux's storage tiering
https://blog.delouw.ch/2020/01/29/using-lvm-cache-for-storage-tiering/ and
proceed after having the perf test results.

3. Writing to cache as an entry log and then moving the entry log to a
slower class of storage (as I understood; also this way the index can be
reused) complicates compaction.
It also assumes that journal(s) keep enough data for recovery of cached
data in case cache is lost. Both compaction and recovery from journals are
getting affected by this.

4. For cloud-based deployments, using local SSDs as a read cache could be an
option as long as we do not expect these disks to be reliable.
This means data goes to entry logs as usual and also to a cache that can be
recreated with newly written data (not rebuilt).
One could use e.g. RocksDB and entries with TTL (is there a size-based
expiration in RocksDB?)
This has a few gotchas of its own:
- compaction will have to remove data from cache
- RocksDB's internal compaction can introduce latency spikes (it can block
reads/writes IIRC).
- to avoid returning stale data from the cache we need to read index
first to confirm the entry exists, then try cache, fallback to ledger
storage
As a RocksDB alternative in this case, Ehcache is mature, Apache 2 licensed,
and supports tiering/cache persistence, but I do not have any experience
with it. https://www.ehcache.org/documentation/3.10/tiering.html

5. The cache has to be abstracted as an interface and support
pluggable/configurable implementations. The abstracted interface has to
define the expected side effects on compaction/recovery and how to access it
to prevent reading stale (compacted out) data (a sketch follows below).

To be clear, I am ok with the idea of having an optional tiered
storage/entry cache but I do not think that proposed implementation is the
way to go.
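
To make points 4 and 5 a bit more concrete, here are two illustrative Java
sketches. Neither reflects an existing BookKeeper interface; all names are
hypothetical, and they show only one way the abstraction could be cut.

First, the pluggable cache/cold-storage abstraction from point 5, with the
compaction and recovery interactions made explicit in the contract:

    import java.io.IOException;

    public interface EntryCacheStorage extends AutoCloseable {
        // Store an entry; the implementation decides which tier it lands on (SSD, HDD, object store).
        void put(long ledgerId, long entryId, byte[] payload) throws IOException;

        // Return the entry, or null on a miss; callers fall back to the regular ledger storage.
        byte[] get(long ledgerId, long entryId) throws IOException;

        // Compaction hook: entries of a compacted or deleted ledger must become unreadable here
        // too, so the cache can never serve stale (compacted-out) data.
        void onLedgerDeleted(long ledgerId) throws IOException;

        // Recovery hook: invoked after journal replay so the implementation can decide whether
        // the cached contents are still trustworthy or should be discarded and repopulated from
        // newly written data.
        void onRecoveryComplete() throws IOException;
    }

Second, the kind of disposable local read cache point 4 describes, sketched on
top of RocksJava's TtlDB so that entries expire after a retention period (note
that RocksDB enforces the TTL during its own compaction, so expired entries may
still be returned until then):

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.TtlDB;

    public class TtlEntryCache implements AutoCloseable {
        static { RocksDB.loadLibrary(); }

        private final Options options;
        private final TtlDB db;

        public TtlEntryCache(String path, int ttlSeconds) throws RocksDBException {
            this.options = new Options().setCreateIfMissing(true);
            this.db = TtlDB.open(options, path, ttlSeconds, /* readOnly */ false);
        }

        public void put(byte[] key, byte[] value) throws RocksDBException {
            db.put(key, value);
        }

        // Returns null on a miss; the caller checks the index first (as suggested above) and
        // falls back to ledger storage, so stale entries are never served from the cache.
        public byte[] getOrNull(byte[] key) throws RocksDBException {
            return db.get(key);
        }

        @Override
        public void close() {
            db.close();
            options.close();
        }
    }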


On Thu, May 25, 2023 at 7:37 PM Gavin gao <ga...@gmail.com> wrote:

> In a typical bookkeeper deployment, SSD disks are used to store Journal log
> data, while HDD disks are used to store Ledger data. Data writes are
> initially stored in memory and then asynchronously flushed to the HDD disk
> in the background. However, due to memory limitations, the amount of data
> that can be cached is restricted. Consequently, requests for historical
> data ultimately rely on the HDD disk, which becomes a bottleneck for the
> entire Bookkeeper cluster. Moreover, during data recovery processes
> following node failures, a substantial amount of historical data needs to
> be read from the HDD disk, leading to the disk's I/O utilization reaching
> maximum capacity and resulting in significant read request delays or
> failures.
>
> To address these challenges, a new architecture is proposed: the
> introduction of a disk cache between the memory cache and the HDD disk,
> utilizing an SSD disk as an intermediary medium to significantly extend
> data caching duration. The data flow is as follows: journal -> write cache
> -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> LedgerStorage layer and is compatible with all existing LedgerStorage
> implementations. The following outlines the process:
>
>    1. Data eviction from the disk cache to the Ledger data disk occurs on a
>    per-log file basis.
>    2. A new configuration parameter, diskCacheRetentionTime, is added to
>    set the duration for which hot data is retained. Files with write
>    timestamps older than the retention time will be evicted to the Ledger
> data
>    disk.
>    3. A new configuration parameter, diskCacheThreshold, is added. If the
>    disk cache utilization exceeds the threshold, the eviction process is
>    accelerated. Data is evicted to the Ledger data disk based on the order
> of
>    file writes until the disk space recovers above the threshold.
>    4. A new thread, ColdStorageArchiveThread, is introduced to periodically
>    evict data from the disk cache to the Ledger data disk.
>


-- 
Andrey Yegorov