You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Alex Plehanov <pl...@gmail.com> on 2020/11/18 15:15:39 UTC

[DISCUSS] Page replacement improvement

Hello, Igniters!

Currently, for page replacement (page rotation between page-memory and
disk) we use Random-LRU algorithm. It has a low maintenance cost and
relatively simple implementation, but it has many disadvantages and affects
performance very much when replacement is started. We even have warnings in
the log when page replacement started and a special event for this. I know
Ignite deployments where administrators force to restart cluster nodes
periodically to avoid page replacement.

I have a couple of proposals to improve page replacement in Ignite:

*Batch page replacement.*

Main idea: in some cases start background task to evict cold pages from
page-memory (for example, pages, last touched more than 12 hours ago).

The task can be started:
- Automatically, triggered by some events, for example, when we expect a
start of Random-LRU page replacing soon (allocated more than 90% of
page-memory) + we have enough amount of cold pages (we need some metric to
calculate the number of cold pages) + some time passed since last batch
page replacement (to avoid too much resource consumption by background
batch replacement).
- Manually (JMX or control.sh), if an administrator wants to control the
time of batch replacement more precisely (for example, to avoid the start
of this task during peak time).

Batch page replacement will be helpful in some workloads (when some data
much colder than another), it can prevent the starting of Random-LRU page
replacement, or if Random-LRU already started it can provide conditions to
stop it.

*Change the page replacement algorithm.*

Good page replacement algorithm should satisfy the requirements:
- low page-fault rates for typical workload
- low maintenance cost (low resource consumption to maintain additional
structures required for page replacement)
- fast searching of next page for replacement
- sequential scans resistance (one sequential scan should not evict all
relatively hot pages from page-memory)

Our Random-LRU has low maintenance cost and sequential scan resistant, but
to find the next page for replacement in the best case we scan 5 pages, in
the worst case we can scan all data region segment. Also, due to random
nature, it's not very effective in predicting the right page for
replacement to minimize the page-fault rate. And it's much time required to
totally evict old cold data.

Usually, database management systems and operating systems use
modifications of LRU algorithms. These algorithms have higher maintenance
costs (pages list should be modified on each page access), but often they
are effective from a "page-fault rate" point of view and have O(1)
complexity for a searching page to replace. Simple LRU is not sequential
scan resistant, but modifications that utilize page access frequency are
resistant to sequential scan.

We can try one of the modifications of LRU as well (for example, "segmented
LRU" seems suitable for Ignite).

Ignite is a memory-centric product, so "low maintenance cost" is very
critical. And there is a risk that page replacement algorithm can affect
workloads, where page replacement is not used (enough RAM to store all
data). Of course, any page replacement solution should be carefully
benchmarked.


Igniters, WDYT? If any of these proposals look reasonable to you, I will
create IEP and start implementation.

Also, I have a draft implementation of system view to determine how hot are
pages in page-memory [1]. I think it will be useful for any of these
approaches (and even if we decide to left page replacement as is).

[1]: https://issues.apache.org/jira/browse/IGNITE-13726

Re: [DISCUSS] Page replacement improvement

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Alex.

Do you have a reproducer for a Page memory issues, you described?
What is consequences of that types of issues?

Can we create some «benchmark» that will measure imrpovements of page replacement algorithm?
May be we can use ducktape branch for it?


> 20 нояб. 2020 г., в 12:05, Alex Plehanov <pl...@gmail.com> написал(а):
> 
> Zhenya,
> 
>> Alexey, we already have changes that partially fixes this issue [1]
> IGNITE-13086 it's a minor improvement. We still have major problems with
> our page replacement algorithm (slow page selection and non-optimal
> page-fault rate). I think changing from random 5 pages to 7 will make
> things even worse (it's better for page-fault rate, but page selection will
> be slower).
> 
>> This approach still not applicable for real life
> Why do you think batch replacement is not applicable for real-life? It can
> be applied for workloads, where some big amount of data periodically used,
> but not very often. For example, when OLAP request over historical data
> raised pages to page-memory, and after such request this data is not needed
> for a long time. Or when OLTP transactions mostly add new data and process
> recent data but rarely touch historical data. In these cases with the
> current approach, we will enter "page replacement mode" after some period
> of time and never leave it. With batch page replacement there is a chance
> to prevent random-LRU page replacement or postpone it.
> 
>> But request once more, do you really observe such problems with 2.9 ver ?
> Any graphs maybe ?
> I don't have production usage feedback after IGNITE-13086, but I doubt
> something changed significantly.
> 
> 
> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>> :
> 
>> 
>> Alexey, we already have changes that partially fixes this issue [1]
>> Easy way:
>> Looks like we already have converge in page replacement.
>> If we change 5 times touch iterator from random lru algo into, for
>> example — 7 we will obtain fast improvement from scratch.
>> 
>> » Batch page replacement
>> This approach still not applicable for real life if you wan`t to observe
>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
>> understand that dramatically reduce of such interval gives nothing?
>> 
>> » Change the page replacement algorithm.
>> That`s way i vote for ) But request once more, do you really observe such
>> problems with 2.9 ver ? Any graphs maybe ?
>> 
>> thanks !
>> 
>> [1] https://issues.apache.org/jira/browse/IGNITE-13086
>>> Hello, Igniters!
>>> 
>>> Currently, for page replacement (page rotation between page-memory and
>>> disk) we use Random-LRU algorithm. It has a low maintenance cost and
>>> relatively simple implementation, but it has many disadvantages and
>> affects
>>> performance very much when replacement is started. We even have warnings
>> in
>>> the log when page replacement started and a special event for this. I know
>>> Ignite deployments where administrators force to restart cluster nodes
>>> periodically to avoid page replacement.
>>> 
>>> I have a couple of proposals to improve page replacement in Ignite:
>>> 
>>> *Batch page replacement.*
>>> 
>>> Main idea: in some cases start background task to evict cold pages from
>>> page-memory (for example, pages, last touched more than 12 hours ago).
>>> 
>>> The task can be started:
>>> - Automatically, triggered by some events, for example, when we expect a
>>> start of Random-LRU page replacing soon (allocated more than 90% of
>>> page-memory) + we have enough amount of cold pages (we need some metric to
>>> calculate the number of cold pages) + some time passed since last batch
>>> page replacement (to avoid too much resource consumption by background
>>> batch replacement).
>>> - Manually (JMX or control.sh), if an administrator wants to control the
>>> time of batch replacement more precisely (for example, to avoid the start
>>> of this task during peak time).
>>> 
>>> Batch page replacement will be helpful in some workloads (when some data
>>> much colder than another), it can prevent the starting of Random-LRU page
>>> replacement, or if Random-LRU already started it can provide conditions to
>>> stop it.
>>> 
>>> *Change the page replacement algorithm.*
>>> 
>>> Good page replacement algorithm should satisfy the requirements:
>>> - low page-fault rates for typical workload
>>> - low maintenance cost (low resource consumption to maintain additional
>>> structures required for page replacement)
>>> - fast searching of next page for replacement
>>> - sequential scans resistance (one sequential scan should not evict all
>>> relatively hot pages from page-memory)
>>> 
>>> Our Random-LRU has low maintenance cost and sequential scan resistant, but
>>> to find the next page for replacement in the best case we scan 5 pages, in
>>> the worst case we can scan all data region segment. Also, due to random
>>> nature, it's not very effective in predicting the right page for
>>> replacement to minimize the page-fault rate. And it's much time required
>> to
>>> totally evict old cold data.
>>> 
>>> Usually, database management systems and operating systems use
>>> modifications of LRU algorithms. These algorithms have higher maintenance
>>> costs (pages list should be modified on each page access), but often they
>>> are effective from a "page-fault rate" point of view and have O(1)
>>> complexity for a searching page to replace. Simple LRU is not sequential
>>> scan resistant, but modifications that utilize page access frequency are
>>> resistant to sequential scan.
>>> 
>>> We can try one of the modifications of LRU as well (for example,
>> "segmented
>>> LRU" seems suitable for Ignite).
>>> 
>>> Ignite is a memory-centric product, so "low maintenance cost" is very
>>> critical. And there is a risk that page replacement algorithm can affect
>>> workloads, where page replacement is not used (enough RAM to store all
>>> data). Of course, any page replacement solution should be carefully
>>> benchmarked.
>>> 
>>> 
>>> Igniters, WDYT? If any of these proposals look reasonable to you, I will
>>> create IEP and start implementation.
>>> 
>>> Also, I have a draft implementation of system view to determine how hot
>> are
>>> pages in page-memory [1]. I think it will be useful for any of these
>>> approaches (and even if we decide to left page replacement as is).
>>> 
>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>>> 
>> 
>> 
>> 
>> 


Re: Re[2]: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Mirza, sure. Will wait for your review.

пн, 25 янв. 2021 г. в 11:06, Mirza Aliev <al...@gmail.com>:

> Hello, Alexey!
>
> Thank you for your effort, the feature looks great!
>
> Could you please give a bit more time to review it?
>
> Best regards,
> Mirza Aliev
>
> пт, 22 янв. 2021 г. в 10:19, Alex Plehanov <pl...@gmail.com>:
>
> > Hello, Igniters!
> >
> > Zhenya has already approved the patch that adds segmented-LRU and CLOCK
> > page replacement modes [1] (thanks for the review!). Would anyone else
> like
> > to review it?
> > I have plans to merge it next Monday if nobody else will be interested in
> > reviewing it.
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-13761
> >
> > пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <pl...@gmail.com>:
> >
> > > Guys,
> > >
> > > I've implemented Segmented-LRU page replacement algorithm and
> benchmarked
> > > results, it gives some boost (5-10%) when page replacement is
> > > heavily used, but, unfortunately, when replacement is not used it gives
> > > about 2% drop compared to the current Random-LRU page replacement
> > > implementation. Due to this, I think Segmented-LRU can't be used as the
> > > only option or option by default.
> > >
> > > Also, I've implemented CLOCK page replacement algorithm (basic,
> > > not scan-resistant version) and benchmark results are much better. It
> > gives
> > > about the same performance as Segmented-LRU when page replacement is
> used
> > > and about the same performance as Random-LRU where there is no page
> > > replacement. There are advanced modifications of CLOCK algorithm
> exists,
> > > but usually, they require additional maintenance cost and we can again
> > get
> > > drop on environments without page replacements compared to Random-LRU.
> > I've
> > > written a benchmark with background full cache scans and even in this
> > case
> > > basic CLOCK version looks promising.
> > >
> > > So, my proposals:
> > > 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to
> > the
> > > product.
> > > 2. Make page replacement algorithm configurable.
> > > 3. Recommend to use Random-LRU for environments with no page
> replacements
> > > (as it has zero maintenance cost). Recommend to use Segmented-LRU for
> > > environments with a high page replacement rate and a large amount of
> > > one-time scans (as it has near to optimal page to replace selection
> > policy
> > > and scan-resistant). Recommend to use CLOCK for all other cases (as it
> > has
> > > near to zero maintenance cost and efficiency of page replacement near
> to
> > > Segmented-LRU).
> > > 4. Set CLOCK as the default page replacement algorithm.
> > >
> > > WDYT?
> > >
> > > I've filled the IEP [1] for this discussion and create the pull request
> > > [2] for the last proposal. I would appreciate for review of the patch.
> > >
> > > [1]:
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements
> > > [2]: https://github.com/apache/ignite/pull/8513
> > >
> > > пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky
> > > <ar...@mail.ru.invalid>:
> > >
> > >>
> > >>
> > >> Nikolay, i hope such case ignite users already observed)
> > >> I suggest to: put data bigger then available, full scan, get data only
> > >> for available inmem data in loop, check PageReplacement metric, how
> > match
> > >> iterations will bring to zero this metric.
> > >>
> > >> >Hello, Alex.
> > >> >
> > >> >> Perhaps we can implement a special benchmark for this case with
> > >> manually triggered "batch page replacement" using yardstick (but I'm
> not
> > >> sure if ducktape can help here).
> > >> >
> > >> >I think we should carefully describe the issues with the current
> > >> approach and then we can choose right tool to benchmark improvements.
> > >> >You said:
> > >> >
> > >> >> we use Random-LRU algorithm … it has many disadvantages and affects
> > >> performance very much when replacement is started
> > >> >
> > >> >Can you list disadvantages of the Random-LRU?
> > >> >
> > >> >AFAIU the first benchmark should be:
> > >> >
> > >> >1. Start cluster with persistence and put data bigger then available
> > RAM
> > >> to it.
> > >> >2. Measure performance of the queries that selects one entry from the
> > >> cache.
> > >> >3. Make some queries that will iterate throw all data - `SELECT
> SUM(x)
> > >> FROM t` or something similar.
> > >> >4. Repeat step 2. in this time performance of random queries should
> be
> > >> lower due to the page replacement.
> > >> >
> > >> >Is this scenario correct?
> > >> >
> > >> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.alex@gmail.com
> >
> > >> написал(а):
> > >> >>
> > >> >> Nikolay, Zhenya,
> > >> >>
> > >> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random
> puts
> > >> >> uniformly. It can be used to compare different page replacement
> > >> algorithms
> > >> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to
> > >> benchmark
> > >> >> batch page replacement.
> > >> >> Perhaps we can implement a special benchmark for this case with
> > >> manually
> > >> >> triggered "batch page replacement" using yardstick (but I'm not
> sure
> > >> >> if ducktape can help here).
> > >> >>
> > >> >>> I understand case you described, but who will pull the switch ?
> > Human,
> > >> >> artificial intelligence ?
> > >> >> As I described before, we can implement both manual and automatic
> > >> "batch
> > >> >> page replacement" triggering. For automatic triggering, there is no
> > >> >> artificial intelligence needed, just several conditions with
> > reasonable
> > >> >> thresholds. Automatic triggering also can be disabled by default.
> > >> >>
> > >> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <
> > >> arzamas123@mail.ru.invalid
> > >> >>> :
> > >> >>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>> Zhenya,
> > >> >>>>
> > >> >>>>> Alexey, we already have changes that partially fixes this issue
> > [1]
> > >> >>>> IGNITE-13086 it's a minor improvement. We still have major
> problems
> > >> with
> > >> >>>> our page replacement algorithm (slow page selection and
> non-optimal
> > >> >>>> page-fault rate). I think changing from random 5 pages to 7 will
> > make
> > >> >>>> things even worse (it's better for page-fault rate, but page
> > >> selection
> > >> >>> will
> > >> >>>> be slower).
> > >> >>> All this words above need to be proven, i hope. + 1 with Nikolay,
> we
> > >> need
> > >> >>> correct reproduces or some graphs from 2.9 ver.
> > >> >>>
> > >> >>>>
> > >> >>>>> This approach still not applicable for real life
> > >> >>>> Why do you think batch replacement is not applicable for
> real-life?
> > >> It can
> > >> >>>> be applied for workloads, where some big amount of data
> > periodically
> > >> used,
> > >> >>>> but not very often. For example, when OLAP request over
> historical
> > >> data
> > >> >>>> raised pages to page-memory, and after such request this data is
> > not
> > >> >>> needed
> > >> >>>> for a long time. Or when OLTP transactions mostly add new data
> and
> > >> process
> > >> >>>> recent data but rarely touch historical data. In these cases with
> > the
> > >> >>>> current approach, we will enter "page replacement mode" after
> some
> > >> period
> > >> >>>> of time and never leave it. With batch page replacement there is
> a
> > >> chance
> > >> >>>> to prevent random-LRU page replacement or postpone it.
> > >> >>> I understand case you described, but who will pull the switch ?
> > Human,
> > >> >>> artificial intelligence ?
> > >> >>> You approach assume some triggering from inner, i don`t like this.
> > >> >>>
> > >> >>>>
> > >> >>>>> But request once more, do you really observe such problems with
> > 2.9
> > >> ver
> > >> >>> ?
> > >> >>>> Any graphs maybe ?
> > >> >>>> I don't have production usage feedback after IGNITE-13086, but I
> > >> doubt
> > >> >>>> something changed significantly.
> > >> >>>
> > >> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes
> > >> yardstik
> > >> >>> bench for PR proven, we can use it once more.
> > >> >>>
> > >> >>> Thanks !
> > >> >>>>
> > >> >>>>
> > >> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> > >> >>>  arzamas123@mail.ru.invalid
> > >> >>>>> :
> > >> >>>>
> > >> >>>>>
> > >> >>>>> Alexey, we already have changes that partially fixes this issue
> > [1]
> > >> >>>>> Easy way:
> > >> >>>>> Looks like we already have converge in page replacement.
> > >> >>>>> If we change 5 times touch iterator from random lru algo into,
> for
> > >> >>>>> example — 7 we will obtain fast improvement from scratch.
> > >> >>>>>
> > >> >>>>> » Batch page replacement
> > >> >>>>> This approach still not applicable for real life if you wan`t to
> > >> observe
> > >> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course,
> > you
> > >> >>>>> understand that dramatically reduce of such interval gives
> > nothing?
> > >> >>>>>
> > >> >>>>> » Change the page replacement algorithm.
> > >> >>>>> That`s way i vote for ) But request once more, do you really
> > observe
> > >> >>> such
> > >> >>>>> problems with 2.9 ver ? Any graphs maybe ?
> > >> >>>>>
> > >> >>>>> thanks !
> > >> >>>>>
> > >> >>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> > >> >>>>>> Hello, Igniters!
> > >> >>>>>>
> > >> >>>>>> Currently, for page replacement (page rotation between
> > page-memory
> > >> and
> > >> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance
> cost
> > >> and
> > >> >>>>>> relatively simple implementation, but it has many disadvantages
> > and
> > >> >>>>> affects
> > >> >>>>>> performance very much when replacement is started. We even have
> > >> >>> warnings
> > >> >>>>> in
> > >> >>>>>> the log when page replacement started and a special event for
> > >> this. I
> > >> >>> know
> > >> >>>>>> Ignite deployments where administrators force to restart
> cluster
> > >> nodes
> > >> >>>>>> periodically to avoid page replacement.
> > >> >>>>>>
> > >> >>>>>> I have a couple of proposals to improve page replacement in
> > Ignite:
> > >> >>>>>>
> > >> >>>>>> *Batch page replacement.*
> > >> >>>>>>
> > >> >>>>>> Main idea: in some cases start background task to evict cold
> > pages
> > >> from
> > >> >>>>>> page-memory (for example, pages, last touched more than 12
> hours
> > >> ago).
> > >> >>>>>>
> > >> >>>>>> The task can be started:
> > >> >>>>>> - Automatically, triggered by some events, for example, when we
> > >> expect
> > >> >>> a
> > >> >>>>>> start of Random-LRU page replacing soon (allocated more than
> 90%
> > of
> > >> >>>>>> page-memory) + we have enough amount of cold pages (we need
> some
> > >> >>> metric to
> > >> >>>>>> calculate the number of cold pages) + some time passed since
> last
> > >> batch
> > >> >>>>>> page replacement (to avoid too much resource consumption by
> > >> background
> > >> >>>>>> batch replacement).
> > >> >>>>>> - Manually (JMX or control.sh), if an administrator wants to
> > >> control
> > >> >>> the
> > >> >>>>>> time of batch replacement more precisely (for example, to avoid
> > the
> > >> >>> start
> > >> >>>>>> of this task during peak time).
> > >> >>>>>>
> > >> >>>>>> Batch page replacement will be helpful in some workloads (when
> > some
> > >> >>> data
> > >> >>>>>> much colder than another), it can prevent the starting of
> > >> Random-LRU
> > >> >>> page
> > >> >>>>>> replacement, or if Random-LRU already started it can provide
> > >> >>> conditions to
> > >> >>>>>> stop it.
> > >> >>>>>>
> > >> >>>>>> *Change the page replacement algorithm.*
> > >> >>>>>>
> > >> >>>>>> Good page replacement algorithm should satisfy the
> requirements:
> > >> >>>>>> - low page-fault rates for typical workload
> > >> >>>>>> - low maintenance cost (low resource consumption to maintain
> > >> additional
> > >> >>>>>> structures required for page replacement)
> > >> >>>>>> - fast searching of next page for replacement
> > >> >>>>>> - sequential scans resistance (one sequential scan should not
> > >> evict all
> > >> >>>>>> relatively hot pages from page-memory)
> > >> >>>>>>
> > >> >>>>>> Our Random-LRU has low maintenance cost and sequential scan
> > >> resistant,
> > >> >>> but
> > >> >>>>>> to find the next page for replacement in the best case we scan
> 5
> > >> >>> pages, in
> > >> >>>>>> the worst case we can scan all data region segment. Also, due
> to
> > >> random
> > >> >>>>>> nature, it's not very effective in predicting the right page
> for
> > >> >>>>>> replacement to minimize the page-fault rate. And it's much time
> > >> >>> required
> > >> >>>>> to
> > >> >>>>>> totally evict old cold data.
> > >> >>>>>>
> > >> >>>>>> Usually, database management systems and operating systems use
> > >> >>>>>> modifications of LRU algorithms. These algorithms have higher
> > >> >>> maintenance
> > >> >>>>>> costs (pages list should be modified on each page access), but
> > >> often
> > >> >>> they
> > >> >>>>>> are effective from a "page-fault rate" point of view and have
> > O(1)
> > >> >>>>>> complexity for a searching page to replace. Simple LRU is not
> > >> >>> sequential
> > >> >>>>>> scan resistant, but modifications that utilize page access
> > >> frequency
> > >> >>> are
> > >> >>>>>> resistant to sequential scan.
> > >> >>>>>>
> > >> >>>>>> We can try one of the modifications of LRU as well (for
> example,
> > >> >>>>> "segmented
> > >> >>>>>> LRU" seems suitable for Ignite).
> > >> >>>>>>
> > >> >>>>>> Ignite is a memory-centric product, so "low maintenance cost"
> is
> > >> very
> > >> >>>>>> critical. And there is a risk that page replacement algorithm
> can
> > >> >>> affect
> > >> >>>>>> workloads, where page replacement is not used (enough RAM to
> > store
> > >> all
> > >> >>>>>> data). Of course, any page replacement solution should be
> > carefully
> > >> >>>>>> benchmarked.
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to
> > you, I
> > >> >>> will
> > >> >>>>>> create IEP and start implementation.
> > >> >>>>>>
> > >> >>>>>> Also, I have a draft implementation of system view to determine
> > >> how hot
> > >> >>>>> are
> > >> >>>>>> pages in page-memory [1]. I think it will be useful for any of
> > >> these
> > >> >>>>>> approaches (and even if we decide to left page replacement as
> > is).
> > >> >>>>>>
> > >> >>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
>

Re: Re[2]: [DISCUSS] Page replacement improvement

Posted by Mirza Aliev <al...@gmail.com>.
Hello, Alexey!

Thank you for your effort, the feature looks great!

Could you please give a bit more time to review it?

Best regards,
Mirza Aliev

пт, 22 янв. 2021 г. в 10:19, Alex Plehanov <pl...@gmail.com>:

> Hello, Igniters!
>
> Zhenya has already approved the patch that adds segmented-LRU and CLOCK
> page replacement modes [1] (thanks for the review!). Would anyone else like
> to review it?
> I have plans to merge it next Monday if nobody else will be interested in
> reviewing it.
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-13761
>
> пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <pl...@gmail.com>:
>
> > Guys,
> >
> > I've implemented Segmented-LRU page replacement algorithm and benchmarked
> > results, it gives some boost (5-10%) when page replacement is
> > heavily used, but, unfortunately, when replacement is not used it gives
> > about 2% drop compared to the current Random-LRU page replacement
> > implementation. Due to this, I think Segmented-LRU can't be used as the
> > only option or option by default.
> >
> > Also, I've implemented CLOCK page replacement algorithm (basic,
> > not scan-resistant version) and benchmark results are much better. It
> gives
> > about the same performance as Segmented-LRU when page replacement is used
> > and about the same performance as Random-LRU where there is no page
> > replacement. There are advanced modifications of CLOCK algorithm exists,
> > but usually, they require additional maintenance cost and we can again
> get
> > drop on environments without page replacements compared to Random-LRU.
> I've
> > written a benchmark with background full cache scans and even in this
> case
> > basic CLOCK version looks promising.
> >
> > So, my proposals:
> > 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to
> the
> > product.
> > 2. Make page replacement algorithm configurable.
> > 3. Recommend to use Random-LRU for environments with no page replacements
> > (as it has zero maintenance cost). Recommend to use Segmented-LRU for
> > environments with a high page replacement rate and a large amount of
> > one-time scans (as it has near to optimal page to replace selection
> policy
> > and scan-resistant). Recommend to use CLOCK for all other cases (as it
> has
> > near to zero maintenance cost and efficiency of page replacement near to
> > Segmented-LRU).
> > 4. Set CLOCK as the default page replacement algorithm.
> >
> > WDYT?
> >
> > I've filled the IEP [1] for this discussion and create the pull request
> > [2] for the last proposal. I would appreciate for review of the patch.
> >
> > [1]:
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements
> > [2]: https://github.com/apache/ignite/pull/8513
> >
> > пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky
> > <ar...@mail.ru.invalid>:
> >
> >>
> >>
> >> Nikolay, i hope such case ignite users already observed)
> >> I suggest to: put data bigger then available, full scan, get data only
> >> for available inmem data in loop, check PageReplacement metric, how
> match
> >> iterations will bring to zero this metric.
> >>
> >> >Hello, Alex.
> >> >
> >> >> Perhaps we can implement a special benchmark for this case with
> >> manually triggered "batch page replacement" using yardstick (but I'm not
> >> sure if ducktape can help here).
> >> >
> >> >I think we should carefully describe the issues with the current
> >> approach and then we can choose right tool to benchmark improvements.
> >> >You said:
> >> >
> >> >> we use Random-LRU algorithm … it has many disadvantages and affects
> >> performance very much when replacement is started
> >> >
> >> >Can you list disadvantages of the Random-LRU?
> >> >
> >> >AFAIU the first benchmark should be:
> >> >
> >> >1. Start cluster with persistence and put data bigger then available
> RAM
> >> to it.
> >> >2. Measure performance of the queries that selects one entry from the
> >> cache.
> >> >3. Make some queries that will iterate throw all data - `SELECT SUM(x)
> >> FROM t` or something similar.
> >> >4. Repeat step 2. in this time performance of random queries should be
> >> lower due to the page replacement.
> >> >
> >> >Is this scenario correct?
> >> >
> >> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.alex@gmail.com >
> >> написал(а):
> >> >>
> >> >> Nikolay, Zhenya,
> >> >>
> >> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
> >> >> uniformly. It can be used to compare different page replacement
> >> algorithms
> >> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to
> >> benchmark
> >> >> batch page replacement.
> >> >> Perhaps we can implement a special benchmark for this case with
> >> manually
> >> >> triggered "batch page replacement" using yardstick (but I'm not sure
> >> >> if ducktape can help here).
> >> >>
> >> >>> I understand case you described, but who will pull the switch ?
> Human,
> >> >> artificial intelligence ?
> >> >> As I described before, we can implement both manual and automatic
> >> "batch
> >> >> page replacement" triggering. For automatic triggering, there is no
> >> >> artificial intelligence needed, just several conditions with
> reasonable
> >> >> thresholds. Automatic triggering also can be disabled by default.
> >> >>
> >> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <
> >> arzamas123@mail.ru.invalid
> >> >>> :
> >> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>> Zhenya,
> >> >>>>
> >> >>>>> Alexey, we already have changes that partially fixes this issue
> [1]
> >> >>>> IGNITE-13086 it's a minor improvement. We still have major problems
> >> with
> >> >>>> our page replacement algorithm (slow page selection and non-optimal
> >> >>>> page-fault rate). I think changing from random 5 pages to 7 will
> make
> >> >>>> things even worse (it's better for page-fault rate, but page
> >> selection
> >> >>> will
> >> >>>> be slower).
> >> >>> All this words above need to be proven, i hope. + 1 with Nikolay, we
> >> need
> >> >>> correct reproduces or some graphs from 2.9 ver.
> >> >>>
> >> >>>>
> >> >>>>> This approach still not applicable for real life
> >> >>>> Why do you think batch replacement is not applicable for real-life?
> >> It can
> >> >>>> be applied for workloads, where some big amount of data
> periodically
> >> used,
> >> >>>> but not very often. For example, when OLAP request over historical
> >> data
> >> >>>> raised pages to page-memory, and after such request this data is
> not
> >> >>> needed
> >> >>>> for a long time. Or when OLTP transactions mostly add new data and
> >> process
> >> >>>> recent data but rarely touch historical data. In these cases with
> the
> >> >>>> current approach, we will enter "page replacement mode" after some
> >> period
> >> >>>> of time and never leave it. With batch page replacement there is a
> >> chance
> >> >>>> to prevent random-LRU page replacement or postpone it.
> >> >>> I understand case you described, but who will pull the switch ?
> Human,
> >> >>> artificial intelligence ?
> >> >>> You approach assume some triggering from inner, i don`t like this.
> >> >>>
> >> >>>>
> >> >>>>> But request once more, do you really observe such problems with
> 2.9
> >> ver
> >> >>> ?
> >> >>>> Any graphs maybe ?
> >> >>>> I don't have production usage feedback after IGNITE-13086, but I
> >> doubt
> >> >>>> something changed significantly.
> >> >>>
> >> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes
> >> yardstik
> >> >>> bench for PR proven, we can use it once more.
> >> >>>
> >> >>> Thanks !
> >> >>>>
> >> >>>>
> >> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> >> >>>  arzamas123@mail.ru.invalid
> >> >>>>> :
> >> >>>>
> >> >>>>>
> >> >>>>> Alexey, we already have changes that partially fixes this issue
> [1]
> >> >>>>> Easy way:
> >> >>>>> Looks like we already have converge in page replacement.
> >> >>>>> If we change 5 times touch iterator from random lru algo into, for
> >> >>>>> example — 7 we will obtain fast improvement from scratch.
> >> >>>>>
> >> >>>>> » Batch page replacement
> >> >>>>> This approach still not applicable for real life if you wan`t to
> >> observe
> >> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course,
> you
> >> >>>>> understand that dramatically reduce of such interval gives
> nothing?
> >> >>>>>
> >> >>>>> » Change the page replacement algorithm.
> >> >>>>> That`s way i vote for ) But request once more, do you really
> observe
> >> >>> such
> >> >>>>> problems with 2.9 ver ? Any graphs maybe ?
> >> >>>>>
> >> >>>>> thanks !
> >> >>>>>
> >> >>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> >> >>>>>> Hello, Igniters!
> >> >>>>>>
> >> >>>>>> Currently, for page replacement (page rotation between
> page-memory
> >> and
> >> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost
> >> and
> >> >>>>>> relatively simple implementation, but it has many disadvantages
> and
> >> >>>>> affects
> >> >>>>>> performance very much when replacement is started. We even have
> >> >>> warnings
> >> >>>>> in
> >> >>>>>> the log when page replacement started and a special event for
> >> this. I
> >> >>> know
> >> >>>>>> Ignite deployments where administrators force to restart cluster
> >> nodes
> >> >>>>>> periodically to avoid page replacement.
> >> >>>>>>
> >> >>>>>> I have a couple of proposals to improve page replacement in
> Ignite:
> >> >>>>>>
> >> >>>>>> *Batch page replacement.*
> >> >>>>>>
> >> >>>>>> Main idea: in some cases start background task to evict cold
> pages
> >> from
> >> >>>>>> page-memory (for example, pages, last touched more than 12 hours
> >> ago).
> >> >>>>>>
> >> >>>>>> The task can be started:
> >> >>>>>> - Automatically, triggered by some events, for example, when we
> >> expect
> >> >>> a
> >> >>>>>> start of Random-LRU page replacing soon (allocated more than 90%
> of
> >> >>>>>> page-memory) + we have enough amount of cold pages (we need some
> >> >>> metric to
> >> >>>>>> calculate the number of cold pages) + some time passed since last
> >> batch
> >> >>>>>> page replacement (to avoid too much resource consumption by
> >> background
> >> >>>>>> batch replacement).
> >> >>>>>> - Manually (JMX or control.sh), if an administrator wants to
> >> control
> >> >>> the
> >> >>>>>> time of batch replacement more precisely (for example, to avoid
> the
> >> >>> start
> >> >>>>>> of this task during peak time).
> >> >>>>>>
> >> >>>>>> Batch page replacement will be helpful in some workloads (when
> some
> >> >>> data
> >> >>>>>> much colder than another), it can prevent the starting of
> >> Random-LRU
> >> >>> page
> >> >>>>>> replacement, or if Random-LRU already started it can provide
> >> >>> conditions to
> >> >>>>>> stop it.
> >> >>>>>>
> >> >>>>>> *Change the page replacement algorithm.*
> >> >>>>>>
> >> >>>>>> Good page replacement algorithm should satisfy the requirements:
> >> >>>>>> - low page-fault rates for typical workload
> >> >>>>>> - low maintenance cost (low resource consumption to maintain
> >> additional
> >> >>>>>> structures required for page replacement)
> >> >>>>>> - fast searching of next page for replacement
> >> >>>>>> - sequential scans resistance (one sequential scan should not
> >> evict all
> >> >>>>>> relatively hot pages from page-memory)
> >> >>>>>>
> >> >>>>>> Our Random-LRU has low maintenance cost and sequential scan
> >> resistant,
> >> >>> but
> >> >>>>>> to find the next page for replacement in the best case we scan 5
> >> >>> pages, in
> >> >>>>>> the worst case we can scan all data region segment. Also, due to
> >> random
> >> >>>>>> nature, it's not very effective in predicting the right page for
> >> >>>>>> replacement to minimize the page-fault rate. And it's much time
> >> >>> required
> >> >>>>> to
> >> >>>>>> totally evict old cold data.
> >> >>>>>>
> >> >>>>>> Usually, database management systems and operating systems use
> >> >>>>>> modifications of LRU algorithms. These algorithms have higher
> >> >>> maintenance
> >> >>>>>> costs (pages list should be modified on each page access), but
> >> often
> >> >>> they
> >> >>>>>> are effective from a "page-fault rate" point of view and have
> O(1)
> >> >>>>>> complexity for a searching page to replace. Simple LRU is not
> >> >>> sequential
> >> >>>>>> scan resistant, but modifications that utilize page access
> >> frequency
> >> >>> are
> >> >>>>>> resistant to sequential scan.
> >> >>>>>>
> >> >>>>>> We can try one of the modifications of LRU as well (for example,
> >> >>>>> "segmented
> >> >>>>>> LRU" seems suitable for Ignite).
> >> >>>>>>
> >> >>>>>> Ignite is a memory-centric product, so "low maintenance cost" is
> >> very
> >> >>>>>> critical. And there is a risk that page replacement algorithm can
> >> >>> affect
> >> >>>>>> workloads, where page replacement is not used (enough RAM to
> store
> >> all
> >> >>>>>> data). Of course, any page replacement solution should be
> carefully
> >> >>>>>> benchmarked.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to
> you, I
> >> >>> will
> >> >>>>>> create IEP and start implementation.
> >> >>>>>>
> >> >>>>>> Also, I have a draft implementation of system view to determine
> >> how hot
> >> >>>>> are
> >> >>>>>> pages in page-memory [1]. I think it will be useful for any of
> >> these
> >> >>>>>> approaches (and even if we decide to left page replacement as
> is).
> >> >>>>>>
> >> >>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >
> >>
> >>
> >>
> >>
> >
> >
>

Re: Re[2]: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Hello, Igniters!

Zhenya has already approved the patch that adds segmented-LRU and CLOCK
page replacement modes [1] (thanks for the review!). Would anyone else like
to review it?
I have plans to merge it next Monday if nobody else will be interested in
reviewing it.

[1]: https://issues.apache.org/jira/browse/IGNITE-13761

пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <pl...@gmail.com>:

> Guys,
>
> I've implemented Segmented-LRU page replacement algorithm and benchmarked
> results, it gives some boost (5-10%) when page replacement is
> heavily used, but, unfortunately, when replacement is not used it gives
> about 2% drop compared to the current Random-LRU page replacement
> implementation. Due to this, I think Segmented-LRU can't be used as the
> only option or option by default.
>
> Also, I've implemented CLOCK page replacement algorithm (basic,
> not scan-resistant version) and benchmark results are much better. It gives
> about the same performance as Segmented-LRU when page replacement is used
> and about the same performance as Random-LRU where there is no page
> replacement. There are advanced modifications of CLOCK algorithm exists,
> but usually, they require additional maintenance cost and we can again get
> drop on environments without page replacements compared to Random-LRU. I've
> written a benchmark with background full cache scans and even in this case
> basic CLOCK version looks promising.
>
> So, my proposals:
> 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to the
> product.
> 2. Make page replacement algorithm configurable.
> 3. Recommend to use Random-LRU for environments with no page replacements
> (as it has zero maintenance cost). Recommend to use Segmented-LRU for
> environments with a high page replacement rate and a large amount of
> one-time scans (as it has near to optimal page to replace selection policy
> and scan-resistant). Recommend to use CLOCK for all other cases (as it has
> near to zero maintenance cost and efficiency of page replacement near to
> Segmented-LRU).
> 4. Set CLOCK as the default page replacement algorithm.
>
> WDYT?
>
> I've filled the IEP [1] for this discussion and create the pull request
> [2] for the last proposal. I would appreciate for review of the patch.
>
> [1]:
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements
> [2]: https://github.com/apache/ignite/pull/8513
>
> пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky
> <ar...@mail.ru.invalid>:
>
>>
>>
>> Nikolay, i hope such case ignite users already observed)
>> I suggest to: put data bigger then available, full scan, get data only
>> for available inmem data in loop, check PageReplacement metric, how match
>> iterations will bring to zero this metric.
>>
>> >Hello, Alex.
>> >
>> >> Perhaps we can implement a special benchmark for this case with
>> manually triggered "batch page replacement" using yardstick (but I'm not
>> sure if ducktape can help here).
>> >
>> >I think we should carefully describe the issues with the current
>> approach and then we can choose right tool to benchmark improvements.
>> >You said:
>> >
>> >> we use Random-LRU algorithm … it has many disadvantages and affects
>> performance very much when replacement is started
>> >
>> >Can you list disadvantages of the Random-LRU?
>> >
>> >AFAIU the first benchmark should be:
>> >
>> >1. Start cluster with persistence and put data bigger then available RAM
>> to it.
>> >2. Measure performance of the queries that selects one entry from the
>> cache.
>> >3. Make some queries that will iterate throw all data - `SELECT SUM(x)
>> FROM t` or something similar.
>> >4. Repeat step 2. in this time performance of random queries should be
>> lower due to the page replacement.
>> >
>> >Is this scenario correct?
>> >
>> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.alex@gmail.com >
>> написал(а):
>> >>
>> >> Nikolay, Zhenya,
>> >>
>> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
>> >> uniformly. It can be used to compare different page replacement
>> algorithms
>> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to
>> benchmark
>> >> batch page replacement.
>> >> Perhaps we can implement a special benchmark for this case with
>> manually
>> >> triggered "batch page replacement" using yardstick (but I'm not sure
>> >> if ducktape can help here).
>> >>
>> >>> I understand case you described, but who will pull the switch ? Human,
>> >> artificial intelligence ?
>> >> As I described before, we can implement both manual and automatic
>> "batch
>> >> page replacement" triggering. For automatic triggering, there is no
>> >> artificial intelligence needed, just several conditions with reasonable
>> >> thresholds. Automatic triggering also can be disabled by default.
>> >>
>> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <
>> arzamas123@mail.ru.invalid
>> >>> :
>> >>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>> Zhenya,
>> >>>>
>> >>>>> Alexey, we already have changes that partially fixes this issue [1]
>> >>>> IGNITE-13086 it's a minor improvement. We still have major problems
>> with
>> >>>> our page replacement algorithm (slow page selection and non-optimal
>> >>>> page-fault rate). I think changing from random 5 pages to 7 will make
>> >>>> things even worse (it's better for page-fault rate, but page
>> selection
>> >>> will
>> >>>> be slower).
>> >>> All this words above need to be proven, i hope. + 1 with Nikolay, we
>> need
>> >>> correct reproduces or some graphs from 2.9 ver.
>> >>>
>> >>>>
>> >>>>> This approach still not applicable for real life
>> >>>> Why do you think batch replacement is not applicable for real-life?
>> It can
>> >>>> be applied for workloads, where some big amount of data periodically
>> used,
>> >>>> but not very often. For example, when OLAP request over historical
>> data
>> >>>> raised pages to page-memory, and after such request this data is not
>> >>> needed
>> >>>> for a long time. Or when OLTP transactions mostly add new data and
>> process
>> >>>> recent data but rarely touch historical data. In these cases with the
>> >>>> current approach, we will enter "page replacement mode" after some
>> period
>> >>>> of time and never leave it. With batch page replacement there is a
>> chance
>> >>>> to prevent random-LRU page replacement or postpone it.
>> >>> I understand case you described, but who will pull the switch ? Human,
>> >>> artificial intelligence ?
>> >>> You approach assume some triggering from inner, i don`t like this.
>> >>>
>> >>>>
>> >>>>> But request once more, do you really observe such problems with 2.9
>> ver
>> >>> ?
>> >>>> Any graphs maybe ?
>> >>>> I don't have production usage feedback after IGNITE-13086, but I
>> doubt
>> >>>> something changed significantly.
>> >>>
>> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes
>> yardstik
>> >>> bench for PR proven, we can use it once more.
>> >>>
>> >>> Thanks !
>> >>>>
>> >>>>
>> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
>> >>>  arzamas123@mail.ru.invalid
>> >>>>> :
>> >>>>
>> >>>>>
>> >>>>> Alexey, we already have changes that partially fixes this issue [1]
>> >>>>> Easy way:
>> >>>>> Looks like we already have converge in page replacement.
>> >>>>> If we change 5 times touch iterator from random lru algo into, for
>> >>>>> example — 7 we will obtain fast improvement from scratch.
>> >>>>>
>> >>>>> » Batch page replacement
>> >>>>> This approach still not applicable for real life if you wan`t to
>> observe
>> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
>> >>>>> understand that dramatically reduce of such interval gives nothing?
>> >>>>>
>> >>>>> » Change the page replacement algorithm.
>> >>>>> That`s way i vote for ) But request once more, do you really observe
>> >>> such
>> >>>>> problems with 2.9 ver ? Any graphs maybe ?
>> >>>>>
>> >>>>> thanks !
>> >>>>>
>> >>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
>> >>>>>> Hello, Igniters!
>> >>>>>>
>> >>>>>> Currently, for page replacement (page rotation between page-memory
>> and
>> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost
>> and
>> >>>>>> relatively simple implementation, but it has many disadvantages and
>> >>>>> affects
>> >>>>>> performance very much when replacement is started. We even have
>> >>> warnings
>> >>>>> in
>> >>>>>> the log when page replacement started and a special event for
>> this. I
>> >>> know
>> >>>>>> Ignite deployments where administrators force to restart cluster
>> nodes
>> >>>>>> periodically to avoid page replacement.
>> >>>>>>
>> >>>>>> I have a couple of proposals to improve page replacement in Ignite:
>> >>>>>>
>> >>>>>> *Batch page replacement.*
>> >>>>>>
>> >>>>>> Main idea: in some cases start background task to evict cold pages
>> from
>> >>>>>> page-memory (for example, pages, last touched more than 12 hours
>> ago).
>> >>>>>>
>> >>>>>> The task can be started:
>> >>>>>> - Automatically, triggered by some events, for example, when we
>> expect
>> >>> a
>> >>>>>> start of Random-LRU page replacing soon (allocated more than 90% of
>> >>>>>> page-memory) + we have enough amount of cold pages (we need some
>> >>> metric to
>> >>>>>> calculate the number of cold pages) + some time passed since last
>> batch
>> >>>>>> page replacement (to avoid too much resource consumption by
>> background
>> >>>>>> batch replacement).
>> >>>>>> - Manually (JMX or control.sh), if an administrator wants to
>> control
>> >>> the
>> >>>>>> time of batch replacement more precisely (for example, to avoid the
>> >>> start
>> >>>>>> of this task during peak time).
>> >>>>>>
>> >>>>>> Batch page replacement will be helpful in some workloads (when some
>> >>> data
>> >>>>>> much colder than another), it can prevent the starting of
>> Random-LRU
>> >>> page
>> >>>>>> replacement, or if Random-LRU already started it can provide
>> >>> conditions to
>> >>>>>> stop it.
>> >>>>>>
>> >>>>>> *Change the page replacement algorithm.*
>> >>>>>>
>> >>>>>> Good page replacement algorithm should satisfy the requirements:
>> >>>>>> - low page-fault rates for typical workload
>> >>>>>> - low maintenance cost (low resource consumption to maintain
>> additional
>> >>>>>> structures required for page replacement)
>> >>>>>> - fast searching of next page for replacement
>> >>>>>> - sequential scans resistance (one sequential scan should not
>> evict all
>> >>>>>> relatively hot pages from page-memory)
>> >>>>>>
>> >>>>>> Our Random-LRU has low maintenance cost and sequential scan
>> resistant,
>> >>> but
>> >>>>>> to find the next page for replacement in the best case we scan 5
>> >>> pages, in
>> >>>>>> the worst case we can scan all data region segment. Also, due to
>> random
>> >>>>>> nature, it's not very effective in predicting the right page for
>> >>>>>> replacement to minimize the page-fault rate. And it's much time
>> >>> required
>> >>>>> to
>> >>>>>> totally evict old cold data.
>> >>>>>>
>> >>>>>> Usually, database management systems and operating systems use
>> >>>>>> modifications of LRU algorithms. These algorithms have higher
>> >>> maintenance
>> >>>>>> costs (pages list should be modified on each page access), but
>> often
>> >>> they
>> >>>>>> are effective from a "page-fault rate" point of view and have O(1)
>> >>>>>> complexity for a searching page to replace. Simple LRU is not
>> >>> sequential
>> >>>>>> scan resistant, but modifications that utilize page access
>> frequency
>> >>> are
>> >>>>>> resistant to sequential scan.
>> >>>>>>
>> >>>>>> We can try one of the modifications of LRU as well (for example,
>> >>>>> "segmented
>> >>>>>> LRU" seems suitable for Ignite).
>> >>>>>>
>> >>>>>> Ignite is a memory-centric product, so "low maintenance cost" is
>> very
>> >>>>>> critical. And there is a risk that page replacement algorithm can
>> >>> affect
>> >>>>>> workloads, where page replacement is not used (enough RAM to store
>> all
>> >>>>>> data). Of course, any page replacement solution should be carefully
>> >>>>>> benchmarked.
>> >>>>>>
>> >>>>>>
>> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I
>> >>> will
>> >>>>>> create IEP and start implementation.
>> >>>>>>
>> >>>>>> Also, I have a draft implementation of system view to determine
>> how hot
>> >>>>> are
>> >>>>>> pages in page-memory [1]. I think it will be useful for any of
>> these
>> >>>>>> approaches (and even if we decide to left page replacement as is).
>> >>>>>>
>> >>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >
>>
>>
>>
>>
>
>

Re: Re[2]: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Guys,

I've implemented Segmented-LRU page replacement algorithm and benchmarked
results, it gives some boost (5-10%) when page replacement is
heavily used, but, unfortunately, when replacement is not used it gives
about 2% drop compared to the current Random-LRU page replacement
implementation. Due to this, I think Segmented-LRU can't be used as the
only option or option by default.

Also, I've implemented CLOCK page replacement algorithm (basic,
not scan-resistant version) and benchmark results are much better. It gives
about the same performance as Segmented-LRU when page replacement is used
and about the same performance as Random-LRU where there is no page
replacement. There are advanced modifications of CLOCK algorithm exists,
but usually, they require additional maintenance cost and we can again get
drop on environments without page replacements compared to Random-LRU. I've
written a benchmark with background full cache scans and even in this case
basic CLOCK version looks promising.

So, my proposals:
1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to the
product.
2. Make page replacement algorithm configurable.
3. Recommend to use Random-LRU for environments with no page replacements
(as it has zero maintenance cost). Recommend to use Segmented-LRU for
environments with a high page replacement rate and a large amount of
one-time scans (as it has near to optimal page to replace selection policy
and scan-resistant). Recommend to use CLOCK for all other cases (as it has
near to zero maintenance cost and efficiency of page replacement near to
Segmented-LRU).
4. Set CLOCK as the default page replacement algorithm.

WDYT?

I've filled the IEP [1] for this discussion and create the pull request [2]
for the last proposal. I would appreciate for review of the patch.

[1]:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements
[2]: https://github.com/apache/ignite/pull/8513

пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>:

>
>
> Nikolay, i hope such case ignite users already observed)
> I suggest to: put data bigger then available, full scan, get data only for
> available inmem data in loop, check PageReplacement metric, how match
> iterations will bring to zero this metric.
>
> >Hello, Alex.
> >
> >> Perhaps we can implement a special benchmark for this case with
> manually triggered "batch page replacement" using yardstick (but I'm not
> sure if ducktape can help here).
> >
> >I think we should carefully describe the issues with the current approach
> and then we can choose right tool to benchmark improvements.
> >You said:
> >
> >> we use Random-LRU algorithm … it has many disadvantages and affects
> performance very much when replacement is started
> >
> >Can you list disadvantages of the Random-LRU?
> >
> >AFAIU the first benchmark should be:
> >
> >1. Start cluster with persistence and put data bigger then available RAM
> to it.
> >2. Measure performance of the queries that selects one entry from the
> cache.
> >3. Make some queries that will iterate throw all data - `SELECT SUM(x)
> FROM t` or something similar.
> >4. Repeat step 2. in this time performance of random queries should be
> lower due to the page replacement.
> >
> >Is this scenario correct?
> >
> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.alex@gmail.com >
> написал(а):
> >>
> >> Nikolay, Zhenya,
> >>
> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
> >> uniformly. It can be used to compare different page replacement
> algorithms
> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to
> benchmark
> >> batch page replacement.
> >> Perhaps we can implement a special benchmark for this case with manually
> >> triggered "batch page replacement" using yardstick (but I'm not sure
> >> if ducktape can help here).
> >>
> >>> I understand case you described, but who will pull the switch ? Human,
> >> artificial intelligence ?
> >> As I described before, we can implement both manual and automatic "batch
> >> page replacement" triggering. For automatic triggering, there is no
> >> artificial intelligence needed, just several conditions with reasonable
> >> thresholds. Automatic triggering also can be disabled by default.
> >>
> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <
> arzamas123@mail.ru.invalid
> >>> :
> >>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> Zhenya,
> >>>>
> >>>>> Alexey, we already have changes that partially fixes this issue [1]
> >>>> IGNITE-13086 it's a minor improvement. We still have major problems
> with
> >>>> our page replacement algorithm (slow page selection and non-optimal
> >>>> page-fault rate). I think changing from random 5 pages to 7 will make
> >>>> things even worse (it's better for page-fault rate, but page selection
> >>> will
> >>>> be slower).
> >>> All this words above need to be proven, i hope. + 1 with Nikolay, we
> need
> >>> correct reproduces or some graphs from 2.9 ver.
> >>>
> >>>>
> >>>>> This approach still not applicable for real life
> >>>> Why do you think batch replacement is not applicable for real-life?
> It can
> >>>> be applied for workloads, where some big amount of data periodically
> used,
> >>>> but not very often. For example, when OLAP request over historical
> data
> >>>> raised pages to page-memory, and after such request this data is not
> >>> needed
> >>>> for a long time. Or when OLTP transactions mostly add new data and
> process
> >>>> recent data but rarely touch historical data. In these cases with the
> >>>> current approach, we will enter "page replacement mode" after some
> period
> >>>> of time and never leave it. With batch page replacement there is a
> chance
> >>>> to prevent random-LRU page replacement or postpone it.
> >>> I understand case you described, but who will pull the switch ? Human,
> >>> artificial intelligence ?
> >>> You approach assume some triggering from inner, i don`t like this.
> >>>
> >>>>
> >>>>> But request once more, do you really observe such problems with 2.9
> ver
> >>> ?
> >>>> Any graphs maybe ?
> >>>> I don't have production usage feedback after IGNITE-13086, but I doubt
> >>>> something changed significantly.
> >>>
> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes
> yardstik
> >>> bench for PR proven, we can use it once more.
> >>>
> >>> Thanks !
> >>>>
> >>>>
> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> >>>  arzamas123@mail.ru.invalid
> >>>>> :
> >>>>
> >>>>>
> >>>>> Alexey, we already have changes that partially fixes this issue [1]
> >>>>> Easy way:
> >>>>> Looks like we already have converge in page replacement.
> >>>>> If we change 5 times touch iterator from random lru algo into, for
> >>>>> example — 7 we will obtain fast improvement from scratch.
> >>>>>
> >>>>> » Batch page replacement
> >>>>> This approach still not applicable for real life if you wan`t to
> observe
> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> >>>>> understand that dramatically reduce of such interval gives nothing?
> >>>>>
> >>>>> » Change the page replacement algorithm.
> >>>>> That`s way i vote for ) But request once more, do you really observe
> >>> such
> >>>>> problems with 2.9 ver ? Any graphs maybe ?
> >>>>>
> >>>>> thanks !
> >>>>>
> >>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> >>>>>> Hello, Igniters!
> >>>>>>
> >>>>>> Currently, for page replacement (page rotation between page-memory
> and
> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost and
> >>>>>> relatively simple implementation, but it has many disadvantages and
> >>>>> affects
> >>>>>> performance very much when replacement is started. We even have
> >>> warnings
> >>>>> in
> >>>>>> the log when page replacement started and a special event for this.
> I
> >>> know
> >>>>>> Ignite deployments where administrators force to restart cluster
> nodes
> >>>>>> periodically to avoid page replacement.
> >>>>>>
> >>>>>> I have a couple of proposals to improve page replacement in Ignite:
> >>>>>>
> >>>>>> *Batch page replacement.*
> >>>>>>
> >>>>>> Main idea: in some cases start background task to evict cold pages
> from
> >>>>>> page-memory (for example, pages, last touched more than 12 hours
> ago).
> >>>>>>
> >>>>>> The task can be started:
> >>>>>> - Automatically, triggered by some events, for example, when we
> expect
> >>> a
> >>>>>> start of Random-LRU page replacing soon (allocated more than 90% of
> >>>>>> page-memory) + we have enough amount of cold pages (we need some
> >>> metric to
> >>>>>> calculate the number of cold pages) + some time passed since last
> batch
> >>>>>> page replacement (to avoid too much resource consumption by
> background
> >>>>>> batch replacement).
> >>>>>> - Manually (JMX or control.sh), if an administrator wants to control
> >>> the
> >>>>>> time of batch replacement more precisely (for example, to avoid the
> >>> start
> >>>>>> of this task during peak time).
> >>>>>>
> >>>>>> Batch page replacement will be helpful in some workloads (when some
> >>> data
> >>>>>> much colder than another), it can prevent the starting of Random-LRU
> >>> page
> >>>>>> replacement, or if Random-LRU already started it can provide
> >>> conditions to
> >>>>>> stop it.
> >>>>>>
> >>>>>> *Change the page replacement algorithm.*
> >>>>>>
> >>>>>> Good page replacement algorithm should satisfy the requirements:
> >>>>>> - low page-fault rates for typical workload
> >>>>>> - low maintenance cost (low resource consumption to maintain
> additional
> >>>>>> structures required for page replacement)
> >>>>>> - fast searching of next page for replacement
> >>>>>> - sequential scans resistance (one sequential scan should not evict
> all
> >>>>>> relatively hot pages from page-memory)
> >>>>>>
> >>>>>> Our Random-LRU has low maintenance cost and sequential scan
> resistant,
> >>> but
> >>>>>> to find the next page for replacement in the best case we scan 5
> >>> pages, in
> >>>>>> the worst case we can scan all data region segment. Also, due to
> random
> >>>>>> nature, it's not very effective in predicting the right page for
> >>>>>> replacement to minimize the page-fault rate. And it's much time
> >>> required
> >>>>> to
> >>>>>> totally evict old cold data.
> >>>>>>
> >>>>>> Usually, database management systems and operating systems use
> >>>>>> modifications of LRU algorithms. These algorithms have higher
> >>> maintenance
> >>>>>> costs (pages list should be modified on each page access), but often
> >>> they
> >>>>>> are effective from a "page-fault rate" point of view and have O(1)
> >>>>>> complexity for a searching page to replace. Simple LRU is not
> >>> sequential
> >>>>>> scan resistant, but modifications that utilize page access frequency
> >>> are
> >>>>>> resistant to sequential scan.
> >>>>>>
> >>>>>> We can try one of the modifications of LRU as well (for example,
> >>>>> "segmented
> >>>>>> LRU" seems suitable for Ignite).
> >>>>>>
> >>>>>> Ignite is a memory-centric product, so "low maintenance cost" is
> very
> >>>>>> critical. And there is a risk that page replacement algorithm can
> >>> affect
> >>>>>> workloads, where page replacement is not used (enough RAM to store
> all
> >>>>>> data). Of course, any page replacement solution should be carefully
> >>>>>> benchmarked.
> >>>>>>
> >>>>>>
> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I
> >>> will
> >>>>>> create IEP and start implementation.
> >>>>>>
> >>>>>> Also, I have a draft implementation of system view to determine how
> hot
> >>>>> are
> >>>>>> pages in page-memory [1]. I think it will be useful for any of these
> >>>>>> approaches (and even if we decide to left page replacement as is).
> >>>>>>
> >>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >
>
>
>
>

Re[2]: [DISCUSS] Page replacement improvement

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.

Nikolay, i hope such case ignite users already observed​​​​)
I suggest to: put data bigger then available, full scan, get data only for available inmem data in loop, check PageReplacement metric, how match iterations will bring to zero this metric. 
 
>Hello, Alex.
>
>> Perhaps we can implement a special benchmark for this case with manually triggered "batch page replacement" using yardstick (but I'm not sure if ducktape can help here).
>
>I think we should carefully describe the issues with the current approach and then we can choose right tool to benchmark improvements.
>You said:
>
>> we use Random-LRU algorithm … it has many disadvantages and affects performance very much when replacement is started
>
>Can you list disadvantages of the Random-LRU?
>
>AFAIU the first benchmark should be:
>
>1. Start cluster with persistence and put data bigger then available RAM to it.
>2. Measure performance of the queries that selects one entry from the cache.
>3. Make some queries that will iterate throw all data - `SELECT SUM(x) FROM t` or something similar.
>4. Repeat step 2. in this time performance of random queries should be lower due to the page replacement.
>
>Is this scenario correct?
>
>> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.alex@gmail.com > написал(а):
>>
>> Nikolay, Zhenya,
>>
>> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
>> uniformly. It can be used to compare different page replacement algorithms
>> (random-LRU vs segmented-LRU), but it is totally inapplicable to benchmark
>> batch page replacement.
>> Perhaps we can implement a special benchmark for this case with manually
>> triggered "batch page replacement" using yardstick (but I'm not sure
>> if ducktape can help here).
>>
>>> I understand case you described, but who will pull the switch ? Human,
>> artificial intelligence ?
>> As I described before, we can implement both manual and automatic "batch
>> page replacement" triggering. For automatic triggering, there is no
>> artificial intelligence needed, just several conditions with reasonable
>> thresholds. Automatic triggering also can be disabled by default.
>>
>> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky < arzamas123@mail.ru.invalid
>>> :
>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Zhenya,
>>>>
>>>>> Alexey, we already have changes that partially fixes this issue [1]
>>>> IGNITE-13086 it's a minor improvement. We still have major problems with
>>>> our page replacement algorithm (slow page selection and non-optimal
>>>> page-fault rate). I think changing from random 5 pages to 7 will make
>>>> things even worse (it's better for page-fault rate, but page selection
>>> will
>>>> be slower).
>>> All this words above need to be proven, i hope. + 1 with Nikolay, we need
>>> correct reproduces or some graphs from 2.9 ver.
>>>
>>>>
>>>>> This approach still not applicable for real life
>>>> Why do you think batch replacement is not applicable for real-life? It can
>>>> be applied for workloads, where some big amount of data periodically used,
>>>> but not very often. For example, when OLAP request over historical data
>>>> raised pages to page-memory, and after such request this data is not
>>> needed
>>>> for a long time. Or when OLTP transactions mostly add new data and process
>>>> recent data but rarely touch historical data. In these cases with the
>>>> current approach, we will enter "page replacement mode" after some period
>>>> of time and never leave it. With batch page replacement there is a chance
>>>> to prevent random-LRU page replacement or postpone it.
>>> I understand case you described, but who will pull the switch ? Human,
>>> artificial intelligence ?
>>> You approach assume some triggering from inner, i don`t like this.
>>>
>>>>
>>>>> But request once more, do you really observe such problems with 2.9 ver
>>> ?
>>>> Any graphs maybe ?
>>>> I don't have production usage feedback after IGNITE-13086, but I doubt
>>>> something changed significantly.
>>>
>>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes yardstik
>>> bench for PR proven, we can use it once more.
>>>
>>> Thanks !
>>>>
>>>>
>>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
>>>  arzamas123@mail.ru.invalid
>>>>> :
>>>>
>>>>>
>>>>> Alexey, we already have changes that partially fixes this issue [1]
>>>>> Easy way:
>>>>> Looks like we already have converge in page replacement.
>>>>> If we change 5 times touch iterator from random lru algo into, for
>>>>> example — 7 we will obtain fast improvement from scratch.
>>>>>
>>>>> » Batch page replacement
>>>>> This approach still not applicable for real life if you wan`t to observe
>>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
>>>>> understand that dramatically reduce of such interval gives nothing?
>>>>>
>>>>> » Change the page replacement algorithm.
>>>>> That`s way i vote for ) But request once more, do you really observe
>>> such
>>>>> problems with 2.9 ver ? Any graphs maybe ?
>>>>>
>>>>> thanks !
>>>>>
>>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
>>>>>> Hello, Igniters!
>>>>>>
>>>>>> Currently, for page replacement (page rotation between page-memory and
>>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost and
>>>>>> relatively simple implementation, but it has many disadvantages and
>>>>> affects
>>>>>> performance very much when replacement is started. We even have
>>> warnings
>>>>> in
>>>>>> the log when page replacement started and a special event for this. I
>>> know
>>>>>> Ignite deployments where administrators force to restart cluster nodes
>>>>>> periodically to avoid page replacement.
>>>>>>
>>>>>> I have a couple of proposals to improve page replacement in Ignite:
>>>>>>
>>>>>> *Batch page replacement.*
>>>>>>
>>>>>> Main idea: in some cases start background task to evict cold pages from
>>>>>> page-memory (for example, pages, last touched more than 12 hours ago).
>>>>>>
>>>>>> The task can be started:
>>>>>> - Automatically, triggered by some events, for example, when we expect
>>> a
>>>>>> start of Random-LRU page replacing soon (allocated more than 90% of
>>>>>> page-memory) + we have enough amount of cold pages (we need some
>>> metric to
>>>>>> calculate the number of cold pages) + some time passed since last batch
>>>>>> page replacement (to avoid too much resource consumption by background
>>>>>> batch replacement).
>>>>>> - Manually (JMX or control.sh), if an administrator wants to control
>>> the
>>>>>> time of batch replacement more precisely (for example, to avoid the
>>> start
>>>>>> of this task during peak time).
>>>>>>
>>>>>> Batch page replacement will be helpful in some workloads (when some
>>> data
>>>>>> much colder than another), it can prevent the starting of Random-LRU
>>> page
>>>>>> replacement, or if Random-LRU already started it can provide
>>> conditions to
>>>>>> stop it.
>>>>>>
>>>>>> *Change the page replacement algorithm.*
>>>>>>
>>>>>> Good page replacement algorithm should satisfy the requirements:
>>>>>> - low page-fault rates for typical workload
>>>>>> - low maintenance cost (low resource consumption to maintain additional
>>>>>> structures required for page replacement)
>>>>>> - fast searching of next page for replacement
>>>>>> - sequential scans resistance (one sequential scan should not evict all
>>>>>> relatively hot pages from page-memory)
>>>>>>
>>>>>> Our Random-LRU has low maintenance cost and sequential scan resistant,
>>> but
>>>>>> to find the next page for replacement in the best case we scan 5
>>> pages, in
>>>>>> the worst case we can scan all data region segment. Also, due to random
>>>>>> nature, it's not very effective in predicting the right page for
>>>>>> replacement to minimize the page-fault rate. And it's much time
>>> required
>>>>> to
>>>>>> totally evict old cold data.
>>>>>>
>>>>>> Usually, database management systems and operating systems use
>>>>>> modifications of LRU algorithms. These algorithms have higher
>>> maintenance
>>>>>> costs (pages list should be modified on each page access), but often
>>> they
>>>>>> are effective from a "page-fault rate" point of view and have O(1)
>>>>>> complexity for a searching page to replace. Simple LRU is not
>>> sequential
>>>>>> scan resistant, but modifications that utilize page access frequency
>>> are
>>>>>> resistant to sequential scan.
>>>>>>
>>>>>> We can try one of the modifications of LRU as well (for example,
>>>>> "segmented
>>>>>> LRU" seems suitable for Ignite).
>>>>>>
>>>>>> Ignite is a memory-centric product, so "low maintenance cost" is very
>>>>>> critical. And there is a risk that page replacement algorithm can
>>> affect
>>>>>> workloads, where page replacement is not used (enough RAM to store all
>>>>>> data). Of course, any page replacement solution should be carefully
>>>>>> benchmarked.
>>>>>>
>>>>>>
>>>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I
>>> will
>>>>>> create IEP and start implementation.
>>>>>>
>>>>>> Also, I have a draft implementation of system view to determine how hot
>>>>> are
>>>>>> pages in page-memory [1]. I think it will be useful for any of these
>>>>>> approaches (and even if we decide to left page replacement as is).
>>>>>>
>>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>  
 
 
 
 

Re: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Nikolay,

> Can you list disadvantages of the Random-LRU?
Already described this in previous letters: "To find the next page for
replacement in the best case we scan 5 pages, in the worst case we can scan
all data region segment. Also, due to random nature, it's not very
effective in predicting the right page for replacement to minimize the
page-fault rate."

> Is this scenario correct?
No, page replacement will start right after step 1. When you put data to
cache bigger than available RAM. The performance will be the same in steps
2 and 4.
To benchmark different page replacement algorithm, just the first two steps
from your scenario are enough.
To benchmark batch page replacement we should make some data colder than
another. For example, the scenario where batch page replacement can be
useful:
1. Put data bigger than available RAM.
2. Work with some set of keys for some period of time.
3. Work with another set of keys and benchmark performance.
4. Trigger batch page replacement in parallel to replace cold data from
step 2.
Without step 4 on step 3, there will be more page replacements, and I
suppose performance with step 4 will be better.



пн, 23 нояб. 2020 г. в 09:48, Nikolay Izhikov <ni...@apache.org>:

> Hello, Alex.
>
> > Perhaps we can implement a special benchmark for this case with manually
> triggered "batch page replacement" using yardstick (but I'm not sure if
> ducktape can help here).
>
> I think we should carefully describe the issues with the current approach
> and then we can choose right tool to benchmark improvements.
> You said:
>
> >  we use Random-LRU algorithm … it has many disadvantages and affects
> performance very much when replacement is started
>
> Can you list disadvantages of the Random-LRU?
>
> AFAIU the first benchmark should be:
>
> 1. Start cluster with persistence and put data bigger then available RAM
> to it.
> 2. Measure performance of the queries that selects one entry from the
> cache.
> 3. Make some queries that will iterate throw all data - `SELECT SUM(x)
> FROM t` or something similar.
> 4. Repeat step 2. in this time performance of random queries should be
> lower due to the page replacement.
>
> Is this scenario correct?
>
> > 23 нояб. 2020 г., в 09:12, Alex Plehanov <pl...@gmail.com>
> написал(а):
> >
> > Nikolay, Zhenya,
> >
> > Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
> > uniformly. It can be used to compare different page replacement
> algorithms
> > (random-LRU vs segmented-LRU), but it is totally inapplicable to
> benchmark
> > batch page replacement.
> > Perhaps we can implement a special benchmark for this case with manually
> > triggered "batch page replacement" using yardstick (but I'm not sure
> > if ducktape can help here).
> >
> >> I understand case you described, but who will pull the switch ? Human,
> > artificial intelligence ?
> > As I described before, we can implement both manual and automatic "batch
> > page replacement" triggering. For automatic triggering, there is no
> > artificial intelligence needed, just several conditions with reasonable
> > thresholds. Automatic triggering also can be disabled by default.
> >
> > пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky
> <arzamas123@mail.ru.invalid
> >> :
> >
> >>
> >>
> >>
> >>
> >>
> >>
> >>> Zhenya,
> >>>
> >>>> Alexey, we already have changes that partially fixes this issue [1]
> >>> IGNITE-13086 it's a minor improvement. We still have major problems
> with
> >>> our page replacement algorithm (slow page selection and non-optimal
> >>> page-fault rate). I think changing from random 5 pages to 7 will make
> >>> things even worse (it's better for page-fault rate, but page selection
> >> will
> >>> be slower).
> >> All this words above need to be proven, i hope. + 1 with Nikolay, we
> need
> >> correct reproduces or some graphs from 2.9 ver.
> >>
> >>>
> >>>> This approach still not applicable for real life
> >>> Why do you think batch replacement is not applicable for real-life? It
> can
> >>> be applied for workloads, where some big amount of data periodically
> used,
> >>> but not very often. For example, when OLAP request over historical data
> >>> raised pages to page-memory, and after such request this data is not
> >> needed
> >>> for a long time. Or when OLTP transactions mostly add new data and
> process
> >>> recent data but rarely touch historical data. In these cases with the
> >>> current approach, we will enter "page replacement mode" after some
> period
> >>> of time and never leave it. With batch page replacement there is a
> chance
> >>> to prevent random-LRU page replacement or postpone it.
> >> I understand case you described, but who will pull the switch ? Human,
> >> artificial intelligence ?
> >> You approach assume some triggering from inner, i don`t like this.
> >>
> >>>
> >>>> But request once more, do you really observe such problems with 2.9
> ver
> >> ?
> >>> Any graphs maybe ?
> >>> I don't have production usage feedback after IGNITE-13086, but I doubt
> >>> something changed significantly.
> >>
> >> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes yardstik
> >> bench for PR proven, we can use it once more.
> >>
> >> Thanks !
> >>>
> >>>
> >>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> >> arzamas123@mail.ru.invalid
> >>>> :
> >>>
> >>>>
> >>>> Alexey, we already have changes that partially fixes this issue [1]
> >>>> Easy way:
> >>>> Looks like we already have converge in page replacement.
> >>>> If we change 5 times touch iterator from random lru algo into, for
> >>>> example — 7 we will obtain fast improvement from scratch.
> >>>>
> >>>> » Batch page replacement
> >>>> This approach still not applicable for real life if you wan`t to
> observe
> >>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> >>>> understand that dramatically reduce of such interval gives nothing?
> >>>>
> >>>> » Change the page replacement algorithm.
> >>>> That`s way i vote for ) But request once more, do you really observe
> >> such
> >>>> problems with 2.9 ver ? Any graphs maybe ?
> >>>>
> >>>> thanks !
> >>>>
> >>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> >>>>> Hello, Igniters!
> >>>>>
> >>>>> Currently, for page replacement (page rotation between page-memory
> and
> >>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost and
> >>>>> relatively simple implementation, but it has many disadvantages and
> >>>> affects
> >>>>> performance very much when replacement is started. We even have
> >> warnings
> >>>> in
> >>>>> the log when page replacement started and a special event for this. I
> >> know
> >>>>> Ignite deployments where administrators force to restart cluster
> nodes
> >>>>> periodically to avoid page replacement.
> >>>>>
> >>>>> I have a couple of proposals to improve page replacement in Ignite:
> >>>>>
> >>>>> *Batch page replacement.*
> >>>>>
> >>>>> Main idea: in some cases start background task to evict cold pages
> from
> >>>>> page-memory (for example, pages, last touched more than 12 hours
> ago).
> >>>>>
> >>>>> The task can be started:
> >>>>> - Automatically, triggered by some events, for example, when we
> expect
> >> a
> >>>>> start of Random-LRU page replacing soon (allocated more than 90% of
> >>>>> page-memory) + we have enough amount of cold pages (we need some
> >> metric to
> >>>>> calculate the number of cold pages) + some time passed since last
> batch
> >>>>> page replacement (to avoid too much resource consumption by
> background
> >>>>> batch replacement).
> >>>>> - Manually (JMX or control.sh), if an administrator wants to control
> >> the
> >>>>> time of batch replacement more precisely (for example, to avoid the
> >> start
> >>>>> of this task during peak time).
> >>>>>
> >>>>> Batch page replacement will be helpful in some workloads (when some
> >> data
> >>>>> much colder than another), it can prevent the starting of Random-LRU
> >> page
> >>>>> replacement, or if Random-LRU already started it can provide
> >> conditions to
> >>>>> stop it.
> >>>>>
> >>>>> *Change the page replacement algorithm.*
> >>>>>
> >>>>> Good page replacement algorithm should satisfy the requirements:
> >>>>> - low page-fault rates for typical workload
> >>>>> - low maintenance cost (low resource consumption to maintain
> additional
> >>>>> structures required for page replacement)
> >>>>> - fast searching of next page for replacement
> >>>>> - sequential scans resistance (one sequential scan should not evict
> all
> >>>>> relatively hot pages from page-memory)
> >>>>>
> >>>>> Our Random-LRU has low maintenance cost and sequential scan
> resistant,
> >> but
> >>>>> to find the next page for replacement in the best case we scan 5
> >> pages, in
> >>>>> the worst case we can scan all data region segment. Also, due to
> random
> >>>>> nature, it's not very effective in predicting the right page for
> >>>>> replacement to minimize the page-fault rate. And it's much time
> >> required
> >>>> to
> >>>>> totally evict old cold data.
> >>>>>
> >>>>> Usually, database management systems and operating systems use
> >>>>> modifications of LRU algorithms. These algorithms have higher
> >> maintenance
> >>>>> costs (pages list should be modified on each page access), but often
> >> they
> >>>>> are effective from a "page-fault rate" point of view and have O(1)
> >>>>> complexity for a searching page to replace. Simple LRU is not
> >> sequential
> >>>>> scan resistant, but modifications that utilize page access frequency
> >> are
> >>>>> resistant to sequential scan.
> >>>>>
> >>>>> We can try one of the modifications of LRU as well (for example,
> >>>> "segmented
> >>>>> LRU" seems suitable for Ignite).
> >>>>>
> >>>>> Ignite is a memory-centric product, so "low maintenance cost" is very
> >>>>> critical. And there is a risk that page replacement algorithm can
> >> affect
> >>>>> workloads, where page replacement is not used (enough RAM to store
> all
> >>>>> data). Of course, any page replacement solution should be carefully
> >>>>> benchmarked.
> >>>>>
> >>>>>
> >>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I
> >> will
> >>>>> create IEP and start implementation.
> >>>>>
> >>>>> Also, I have a draft implementation of system view to determine how
> hot
> >>>> are
> >>>>> pages in page-memory [1]. I think it will be useful for any of these
> >>>>> approaches (and even if we decide to left page replacement as is).
> >>>>>
> >>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >>
>
>

Re: [DISCUSS] Page replacement improvement

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Alex.

> Perhaps we can implement a special benchmark for this case with manually triggered "batch page replacement" using yardstick (but I'm not sure if ducktape can help here).

I think we should carefully describe the issues with the current approach and then we can choose right tool to benchmark improvements.
You said:

>  we use Random-LRU algorithm … it has many disadvantages and affects performance very much when replacement is started

Can you list disadvantages of the Random-LRU?

AFAIU the first benchmark should be:

1. Start cluster with persistence and put data bigger then available RAM to it.
2. Measure performance of the queries that selects one entry from the cache.
3. Make some queries that will iterate throw all data - `SELECT SUM(x) FROM t` or something similar.
4. Repeat step 2. in this time performance of random queries should be lower due to the page replacement.

Is this scenario correct?

> 23 нояб. 2020 г., в 09:12, Alex Plehanov <pl...@gmail.com> написал(а):
> 
> Nikolay, Zhenya,
> 
> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
> uniformly. It can be used to compare different page replacement algorithms
> (random-LRU vs segmented-LRU), but it is totally inapplicable to benchmark
> batch page replacement.
> Perhaps we can implement a special benchmark for this case with manually
> triggered "batch page replacement" using yardstick (but I'm not sure
> if ducktape can help here).
> 
>> I understand case you described, but who will pull the switch ? Human,
> artificial intelligence ?
> As I described before, we can implement both manual and automatic "batch
> page replacement" triggering. For automatic triggering, there is no
> artificial intelligence needed, just several conditions with reasonable
> thresholds. Automatic triggering also can be disabled by default.
> 
> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>> :
> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> Zhenya,
>>> 
>>>> Alexey, we already have changes that partially fixes this issue [1]
>>> IGNITE-13086 it's a minor improvement. We still have major problems with
>>> our page replacement algorithm (slow page selection and non-optimal
>>> page-fault rate). I think changing from random 5 pages to 7 will make
>>> things even worse (it's better for page-fault rate, but page selection
>> will
>>> be slower).
>> All this words above need to be proven, i hope. + 1 with Nikolay, we need
>> correct reproduces or some graphs from 2.9 ver.
>> 
>>> 
>>>> This approach still not applicable for real life
>>> Why do you think batch replacement is not applicable for real-life? It can
>>> be applied for workloads, where some big amount of data periodically used,
>>> but not very often. For example, when OLAP request over historical data
>>> raised pages to page-memory, and after such request this data is not
>> needed
>>> for a long time. Or when OLTP transactions mostly add new data and process
>>> recent data but rarely touch historical data. In these cases with the
>>> current approach, we will enter "page replacement mode" after some period
>>> of time and never leave it. With batch page replacement there is a chance
>>> to prevent random-LRU page replacement or postpone it.
>> I understand case you described, but who will pull the switch ? Human,
>> artificial intelligence ?
>> You approach assume some triggering from inner, i don`t like this.
>> 
>>> 
>>>> But request once more, do you really observe such problems with 2.9 ver
>> ?
>>> Any graphs maybe ?
>>> I don't have production usage feedback after IGNITE-13086, but I doubt
>>> something changed significantly.
>> 
>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes yardstik
>> bench for PR proven, we can use it once more.
>> 
>> Thanks !
>>> 
>>> 
>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
>> arzamas123@mail.ru.invalid
>>>> :
>>> 
>>>> 
>>>> Alexey, we already have changes that partially fixes this issue [1]
>>>> Easy way:
>>>> Looks like we already have converge in page replacement.
>>>> If we change 5 times touch iterator from random lru algo into, for
>>>> example — 7 we will obtain fast improvement from scratch.
>>>> 
>>>> » Batch page replacement
>>>> This approach still not applicable for real life if you wan`t to observe
>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
>>>> understand that dramatically reduce of such interval gives nothing?
>>>> 
>>>> » Change the page replacement algorithm.
>>>> That`s way i vote for ) But request once more, do you really observe
>> such
>>>> problems with 2.9 ver ? Any graphs maybe ?
>>>> 
>>>> thanks !
>>>> 
>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
>>>>> Hello, Igniters!
>>>>> 
>>>>> Currently, for page replacement (page rotation between page-memory and
>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost and
>>>>> relatively simple implementation, but it has many disadvantages and
>>>> affects
>>>>> performance very much when replacement is started. We even have
>> warnings
>>>> in
>>>>> the log when page replacement started and a special event for this. I
>> know
>>>>> Ignite deployments where administrators force to restart cluster nodes
>>>>> periodically to avoid page replacement.
>>>>> 
>>>>> I have a couple of proposals to improve page replacement in Ignite:
>>>>> 
>>>>> *Batch page replacement.*
>>>>> 
>>>>> Main idea: in some cases start background task to evict cold pages from
>>>>> page-memory (for example, pages, last touched more than 12 hours ago).
>>>>> 
>>>>> The task can be started:
>>>>> - Automatically, triggered by some events, for example, when we expect
>> a
>>>>> start of Random-LRU page replacing soon (allocated more than 90% of
>>>>> page-memory) + we have enough amount of cold pages (we need some
>> metric to
>>>>> calculate the number of cold pages) + some time passed since last batch
>>>>> page replacement (to avoid too much resource consumption by background
>>>>> batch replacement).
>>>>> - Manually (JMX or control.sh), if an administrator wants to control
>> the
>>>>> time of batch replacement more precisely (for example, to avoid the
>> start
>>>>> of this task during peak time).
>>>>> 
>>>>> Batch page replacement will be helpful in some workloads (when some
>> data
>>>>> much colder than another), it can prevent the starting of Random-LRU
>> page
>>>>> replacement, or if Random-LRU already started it can provide
>> conditions to
>>>>> stop it.
>>>>> 
>>>>> *Change the page replacement algorithm.*
>>>>> 
>>>>> Good page replacement algorithm should satisfy the requirements:
>>>>> - low page-fault rates for typical workload
>>>>> - low maintenance cost (low resource consumption to maintain additional
>>>>> structures required for page replacement)
>>>>> - fast searching of next page for replacement
>>>>> - sequential scans resistance (one sequential scan should not evict all
>>>>> relatively hot pages from page-memory)
>>>>> 
>>>>> Our Random-LRU has low maintenance cost and sequential scan resistant,
>> but
>>>>> to find the next page for replacement in the best case we scan 5
>> pages, in
>>>>> the worst case we can scan all data region segment. Also, due to random
>>>>> nature, it's not very effective in predicting the right page for
>>>>> replacement to minimize the page-fault rate. And it's much time
>> required
>>>> to
>>>>> totally evict old cold data.
>>>>> 
>>>>> Usually, database management systems and operating systems use
>>>>> modifications of LRU algorithms. These algorithms have higher
>> maintenance
>>>>> costs (pages list should be modified on each page access), but often
>> they
>>>>> are effective from a "page-fault rate" point of view and have O(1)
>>>>> complexity for a searching page to replace. Simple LRU is not
>> sequential
>>>>> scan resistant, but modifications that utilize page access frequency
>> are
>>>>> resistant to sequential scan.
>>>>> 
>>>>> We can try one of the modifications of LRU as well (for example,
>>>> "segmented
>>>>> LRU" seems suitable for Ignite).
>>>>> 
>>>>> Ignite is a memory-centric product, so "low maintenance cost" is very
>>>>> critical. And there is a risk that page replacement algorithm can
>> affect
>>>>> workloads, where page replacement is not used (enough RAM to store all
>>>>> data). Of course, any page replacement solution should be carefully
>>>>> benchmarked.
>>>>> 
>>>>> 
>>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I
>> will
>>>>> create IEP and start implementation.
>>>>> 
>>>>> Also, I have a draft implementation of system view to determine how hot
>>>> are
>>>>> pages in page-memory [1]. I think it will be useful for any of these
>>>>> approaches (and even if we decide to left page replacement as is).
>>>>> 
>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 


Re: Re[2]: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Nikolay, Zhenya,

Benchmark from IGNITE-13034 is fully synthetic, it makes random puts
uniformly. It can be used to compare different page replacement algorithms
(random-LRU vs segmented-LRU), but it is totally inapplicable to benchmark
batch page replacement.
Perhaps we can implement a special benchmark for this case with manually
triggered "batch page replacement" using yardstick (but I'm not sure
if ducktape can help here).

> I understand case you described, but who will pull the switch ? Human,
artificial intelligence ?
As I described before, we can implement both manual and automatic "batch
page replacement" triggering. For automatic triggering, there is no
artificial intelligence needed, just several conditions with reasonable
thresholds. Automatic triggering also can be disabled by default.

пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>:

>
>
>
>
>
>
> >Zhenya,
> >
> >> Alexey, we already have changes that partially fixes this issue [1]
> >IGNITE-13086 it's a minor improvement. We still have major problems with
> >our page replacement algorithm (slow page selection and non-optimal
> >page-fault rate). I think changing from random 5 pages to 7 will make
> >things even worse (it's better for page-fault rate, but page selection
> will
> >be slower).
> All this words above need to be proven, i hope. + 1 with Nikolay, we need
> correct reproduces or some graphs from 2.9 ver.
>
> >
> >> This approach still not applicable for real life
> >Why do you think batch replacement is not applicable for real-life? It can
> >be applied for workloads, where some big amount of data periodically used,
> >but not very often. For example, when OLAP request over historical data
> >raised pages to page-memory, and after such request this data is not
> needed
> >for a long time. Or when OLTP transactions mostly add new data and process
> >recent data but rarely touch historical data. In these cases with the
> >current approach, we will enter "page replacement mode" after some period
> >of time and never leave it. With batch page replacement there is a chance
> >to prevent random-LRU page replacement or postpone it.
> I understand case you described, but who will pull the switch ? Human,
> artificial intelligence ?
> You approach assume some triggering from inner, i don`t like this.
>
> >
> >> But request once more, do you really observe such problems with 2.9 ver
> ?
> >Any graphs maybe ?
> >I don't have production usage feedback after IGNITE-13086, but I doubt
> >something changed significantly.
>
> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes yardstik
> bench for PR proven, we can use it once more.
>
> Thanks !
> >
> >
> >чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> arzamas123@mail.ru.invalid
> >>:
> >
> >>
> >> Alexey, we already have changes that partially fixes this issue [1]
> >> Easy way:
> >> Looks like we already have converge in page replacement.
> >> If we change 5 times touch iterator from random lru algo into, for
> >> example — 7 we will obtain fast improvement from scratch.
> >>
> >> » Batch page replacement
> >> This approach still not applicable for real life if you wan`t to observe
> >> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> >> understand that dramatically reduce of such interval gives nothing?
> >>
> >> » Change the page replacement algorithm.
> >> That`s way i vote for ) But request once more, do you really observe
> such
> >> problems with 2.9 ver ? Any graphs maybe ?
> >>
> >> thanks !
> >>
> >> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> >> >Hello, Igniters!
> >> >
> >> >Currently, for page replacement (page rotation between page-memory and
> >> >disk) we use Random-LRU algorithm. It has a low maintenance cost and
> >> >relatively simple implementation, but it has many disadvantages and
> >> affects
> >> >performance very much when replacement is started. We even have
> warnings
> >> in
> >> >the log when page replacement started and a special event for this. I
> know
> >> >Ignite deployments where administrators force to restart cluster nodes
> >> >periodically to avoid page replacement.
> >> >
> >> >I have a couple of proposals to improve page replacement in Ignite:
> >> >
> >> >*Batch page replacement.*
> >> >
> >> >Main idea: in some cases start background task to evict cold pages from
> >> >page-memory (for example, pages, last touched more than 12 hours ago).
> >> >
> >> >The task can be started:
> >> >- Automatically, triggered by some events, for example, when we expect
> a
> >> >start of Random-LRU page replacing soon (allocated more than 90% of
> >> >page-memory) + we have enough amount of cold pages (we need some
> metric to
> >> >calculate the number of cold pages) + some time passed since last batch
> >> >page replacement (to avoid too much resource consumption by background
> >> >batch replacement).
> >> >- Manually (JMX or control.sh), if an administrator wants to control
> the
> >> >time of batch replacement more precisely (for example, to avoid the
> start
> >> >of this task during peak time).
> >> >
> >> >Batch page replacement will be helpful in some workloads (when some
> data
> >> >much colder than another), it can prevent the starting of Random-LRU
> page
> >> >replacement, or if Random-LRU already started it can provide
> conditions to
> >> >stop it.
> >> >
> >> >*Change the page replacement algorithm.*
> >> >
> >> >Good page replacement algorithm should satisfy the requirements:
> >> >- low page-fault rates for typical workload
> >> >- low maintenance cost (low resource consumption to maintain additional
> >> >structures required for page replacement)
> >> >- fast searching of next page for replacement
> >> >- sequential scans resistance (one sequential scan should not evict all
> >> >relatively hot pages from page-memory)
> >> >
> >> >Our Random-LRU has low maintenance cost and sequential scan resistant,
> but
> >> >to find the next page for replacement in the best case we scan 5
> pages, in
> >> >the worst case we can scan all data region segment. Also, due to random
> >> >nature, it's not very effective in predicting the right page for
> >> >replacement to minimize the page-fault rate. And it's much time
> required
> >> to
> >> >totally evict old cold data.
> >> >
> >> >Usually, database management systems and operating systems use
> >> >modifications of LRU algorithms. These algorithms have higher
> maintenance
> >> >costs (pages list should be modified on each page access), but often
> they
> >> >are effective from a "page-fault rate" point of view and have O(1)
> >> >complexity for a searching page to replace. Simple LRU is not
> sequential
> >> >scan resistant, but modifications that utilize page access frequency
> are
> >> >resistant to sequential scan.
> >> >
> >> >We can try one of the modifications of LRU as well (for example,
> >> "segmented
> >> >LRU" seems suitable for Ignite).
> >> >
> >> >Ignite is a memory-centric product, so "low maintenance cost" is very
> >> >critical. And there is a risk that page replacement algorithm can
> affect
> >> >workloads, where page replacement is not used (enough RAM to store all
> >> >data). Of course, any page replacement solution should be carefully
> >> >benchmarked.
> >> >
> >> >
> >> >Igniters, WDYT? If any of these proposals look reasonable to you, I
> will
> >> >create IEP and start implementation.
> >> >
> >> >Also, I have a draft implementation of system view to determine how hot
> >> are
> >> >pages in page-memory [1]. I think it will be useful for any of these
> >> >approaches (and even if we decide to left page replacement as is).
> >> >
> >> >[1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> >> >
> >>
> >>
> >>
> >>
> >
>
>
>
>

Re[2]: [DISCUSS] Page replacement improvement

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.




 
>Zhenya,
>
>> Alexey, we already have changes that partially fixes this issue [1]
>IGNITE-13086 it's a minor improvement. We still have major problems with
>our page replacement algorithm (slow page selection and non-optimal
>page-fault rate). I think changing from random 5 pages to 7 will make
>things even worse (it's better for page-fault rate, but page selection will
>be slower).
All this words above need to be proven, i hope. + 1 with Nikolay, we need correct reproduces or some graphs from 2.9 ver.
 
>
>> This approach still not applicable for real life
>Why do you think batch replacement is not applicable for real-life? It can
>be applied for workloads, where some big amount of data periodically used,
>but not very often. For example, when OLAP request over historical data
>raised pages to page-memory, and after such request this data is not needed
>for a long time. Or when OLTP transactions mostly add new data and process
>recent data but rarely touch historical data. In these cases with the
>current approach, we will enter "page replacement mode" after some period
>of time and never leave it. With batch page replacement there is a chance
>to prevent random-LRU page replacement or postpone it.
I understand case you described, but who will pull the switch ? Human, artificial intelligence ?
You approach assume some triggering from inner, i don`t like this.  
 
>
>> But request once more, do you really observe such problems with 2.9 ver ?
>Any graphs maybe ?
>I don't have production usage feedback after IGNITE-13086, but I doubt
>something changed significantly.
 
Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes yardstik bench for PR proven, we can use it once more.
 
Thanks !
>
>
>чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky < arzamas123@mail.ru.invalid
>>:
>
>>
>> Alexey, we already have changes that partially fixes this issue [1]
>> Easy way:
>> Looks like we already have converge in page replacement.
>> If we change 5 times touch iterator from random lru algo into, for
>> example — 7 we will obtain fast improvement from scratch.
>>
>> » Batch page replacement
>> This approach still not applicable for real life if you wan`t to observe
>> ugly people for threshold (i.e. 12 h) interval. And, of course, you
>> understand that dramatically reduce of such interval gives nothing?
>>
>> » Change the page replacement algorithm.
>> That`s way i vote for ) But request once more, do you really observe such
>> problems with 2.9 ver ? Any graphs maybe ?
>>
>> thanks !
>>
>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
>> >Hello, Igniters!
>> >
>> >Currently, for page replacement (page rotation between page-memory and
>> >disk) we use Random-LRU algorithm. It has a low maintenance cost and
>> >relatively simple implementation, but it has many disadvantages and
>> affects
>> >performance very much when replacement is started. We even have warnings
>> in
>> >the log when page replacement started and a special event for this. I know
>> >Ignite deployments where administrators force to restart cluster nodes
>> >periodically to avoid page replacement.
>> >
>> >I have a couple of proposals to improve page replacement in Ignite:
>> >
>> >*Batch page replacement.*
>> >
>> >Main idea: in some cases start background task to evict cold pages from
>> >page-memory (for example, pages, last touched more than 12 hours ago).
>> >
>> >The task can be started:
>> >- Automatically, triggered by some events, for example, when we expect a
>> >start of Random-LRU page replacing soon (allocated more than 90% of
>> >page-memory) + we have enough amount of cold pages (we need some metric to
>> >calculate the number of cold pages) + some time passed since last batch
>> >page replacement (to avoid too much resource consumption by background
>> >batch replacement).
>> >- Manually (JMX or control.sh), if an administrator wants to control the
>> >time of batch replacement more precisely (for example, to avoid the start
>> >of this task during peak time).
>> >
>> >Batch page replacement will be helpful in some workloads (when some data
>> >much colder than another), it can prevent the starting of Random-LRU page
>> >replacement, or if Random-LRU already started it can provide conditions to
>> >stop it.
>> >
>> >*Change the page replacement algorithm.*
>> >
>> >Good page replacement algorithm should satisfy the requirements:
>> >- low page-fault rates for typical workload
>> >- low maintenance cost (low resource consumption to maintain additional
>> >structures required for page replacement)
>> >- fast searching of next page for replacement
>> >- sequential scans resistance (one sequential scan should not evict all
>> >relatively hot pages from page-memory)
>> >
>> >Our Random-LRU has low maintenance cost and sequential scan resistant, but
>> >to find the next page for replacement in the best case we scan 5 pages, in
>> >the worst case we can scan all data region segment. Also, due to random
>> >nature, it's not very effective in predicting the right page for
>> >replacement to minimize the page-fault rate. And it's much time required
>> to
>> >totally evict old cold data.
>> >
>> >Usually, database management systems and operating systems use
>> >modifications of LRU algorithms. These algorithms have higher maintenance
>> >costs (pages list should be modified on each page access), but often they
>> >are effective from a "page-fault rate" point of view and have O(1)
>> >complexity for a searching page to replace. Simple LRU is not sequential
>> >scan resistant, but modifications that utilize page access frequency are
>> >resistant to sequential scan.
>> >
>> >We can try one of the modifications of LRU as well (for example,
>> "segmented
>> >LRU" seems suitable for Ignite).
>> >
>> >Ignite is a memory-centric product, so "low maintenance cost" is very
>> >critical. And there is a risk that page replacement algorithm can affect
>> >workloads, where page replacement is not used (enough RAM to store all
>> >data). Of course, any page replacement solution should be carefully
>> >benchmarked.
>> >
>> >
>> >Igniters, WDYT? If any of these proposals look reasonable to you, I will
>> >create IEP and start implementation.
>> >
>> >Also, I have a draft implementation of system view to determine how hot
>> are
>> >pages in page-memory [1]. I think it will be useful for any of these
>> >approaches (and even if we decide to left page replacement as is).
>> >
>> >[1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>> >
>>
>>
>>
>>
>  
 
 
 
 

Re: [DISCUSS] Page replacement improvement

Posted by Alex Plehanov <pl...@gmail.com>.
Zhenya,

> Alexey, we already have changes that partially fixes this issue [1]
IGNITE-13086 it's a minor improvement. We still have major problems with
our page replacement algorithm (slow page selection and non-optimal
page-fault rate). I think changing from random 5 pages to 7 will make
things even worse (it's better for page-fault rate, but page selection will
be slower).

> This approach still not applicable for real life
Why do you think batch replacement is not applicable for real-life? It can
be applied for workloads, where some big amount of data periodically used,
but not very often. For example, when OLAP request over historical data
raised pages to page-memory, and after such request this data is not needed
for a long time. Or when OLTP transactions mostly add new data and process
recent data but rarely touch historical data. In these cases with the
current approach, we will enter "page replacement mode" after some period
of time and never leave it. With batch page replacement there is a chance
to prevent random-LRU page replacement or postpone it.

> But request once more, do you really observe such problems with 2.9 ver ?
Any graphs maybe ?
I don't have production usage feedback after IGNITE-13086, but I doubt
something changed significantly.


чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <arzamas123@mail.ru.invalid
>:

>
> Alexey, we already have changes that partially fixes this issue [1]
> Easy way:
> Looks like we already have converge in page replacement.
> If we change 5 times touch iterator from random lru algo into, for
> example — 7 we will obtain fast improvement from scratch.
>
> » Batch page replacement
> This approach still not applicable for real life if you wan`t to observe
> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> understand that dramatically reduce of such interval gives nothing?
>
> » Change the page replacement algorithm.
> That`s way i vote for ) But request once more, do you really observe such
> problems with 2.9 ver ? Any graphs maybe ?
>
> thanks !
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13086
> >Hello, Igniters!
> >
> >Currently, for page replacement (page rotation between page-memory and
> >disk) we use Random-LRU algorithm. It has a low maintenance cost and
> >relatively simple implementation, but it has many disadvantages and
> affects
> >performance very much when replacement is started. We even have warnings
> in
> >the log when page replacement started and a special event for this. I know
> >Ignite deployments where administrators force to restart cluster nodes
> >periodically to avoid page replacement.
> >
> >I have a couple of proposals to improve page replacement in Ignite:
> >
> >*Batch page replacement.*
> >
> >Main idea: in some cases start background task to evict cold pages from
> >page-memory (for example, pages, last touched more than 12 hours ago).
> >
> >The task can be started:
> >- Automatically, triggered by some events, for example, when we expect a
> >start of Random-LRU page replacing soon (allocated more than 90% of
> >page-memory) + we have enough amount of cold pages (we need some metric to
> >calculate the number of cold pages) + some time passed since last batch
> >page replacement (to avoid too much resource consumption by background
> >batch replacement).
> >- Manually (JMX or control.sh), if an administrator wants to control the
> >time of batch replacement more precisely (for example, to avoid the start
> >of this task during peak time).
> >
> >Batch page replacement will be helpful in some workloads (when some data
> >much colder than another), it can prevent the starting of Random-LRU page
> >replacement, or if Random-LRU already started it can provide conditions to
> >stop it.
> >
> >*Change the page replacement algorithm.*
> >
> >Good page replacement algorithm should satisfy the requirements:
> >- low page-fault rates for typical workload
> >- low maintenance cost (low resource consumption to maintain additional
> >structures required for page replacement)
> >- fast searching of next page for replacement
> >- sequential scans resistance (one sequential scan should not evict all
> >relatively hot pages from page-memory)
> >
> >Our Random-LRU has low maintenance cost and sequential scan resistant, but
> >to find the next page for replacement in the best case we scan 5 pages, in
> >the worst case we can scan all data region segment. Also, due to random
> >nature, it's not very effective in predicting the right page for
> >replacement to minimize the page-fault rate. And it's much time required
> to
> >totally evict old cold data.
> >
> >Usually, database management systems and operating systems use
> >modifications of LRU algorithms. These algorithms have higher maintenance
> >costs (pages list should be modified on each page access), but often they
> >are effective from a "page-fault rate" point of view and have O(1)
> >complexity for a searching page to replace. Simple LRU is not sequential
> >scan resistant, but modifications that utilize page access frequency are
> >resistant to sequential scan.
> >
> >We can try one of the modifications of LRU as well (for example,
> "segmented
> >LRU" seems suitable for Ignite).
> >
> >Ignite is a memory-centric product, so "low maintenance cost" is very
> >critical. And there is a risk that page replacement algorithm can affect
> >workloads, where page replacement is not used (enough RAM to store all
> >data). Of course, any page replacement solution should be carefully
> >benchmarked.
> >
> >
> >Igniters, WDYT? If any of these proposals look reasonable to you, I will
> >create IEP and start implementation.
> >
> >Also, I have a draft implementation of system view to determine how hot
> are
> >pages in page-memory [1]. I think it will be useful for any of these
> >approaches (and even if we decide to left page replacement as is).
> >
> >[1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> >
>
>
>
>

[DISCUSS] Page replacement improvement

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.
Alexey, we already have changes that partially fixes this issue [1]
Easy way:
Looks like we already have converge in page replacement.
If we change 5 times touch iterator from random lru algo into, for example — 7 we will obtain fast improvement from scratch.
 
» Batch page replacement
This approach still not applicable for real life if you wan`t to observe ugly people for threshold (i.e. 12 h) interval. And, of course, you understand that dramatically reduce of such interval gives nothing?
 
» Change the page replacement algorithm.
That`s way i vote for ) But request once more, do you really observe such problems with 2.9 ver ? Any graphs maybe ?
 
thanks !

[1] https://issues.apache.org/jira/browse/IGNITE-13086
>Hello, Igniters!
>
>Currently, for page replacement (page rotation between page-memory and
>disk) we use Random-LRU algorithm. It has a low maintenance cost and
>relatively simple implementation, but it has many disadvantages and affects
>performance very much when replacement is started. We even have warnings in
>the log when page replacement started and a special event for this. I know
>Ignite deployments where administrators force to restart cluster nodes
>periodically to avoid page replacement.
>
>I have a couple of proposals to improve page replacement in Ignite:
>
>*Batch page replacement.*
>
>Main idea: in some cases start background task to evict cold pages from
>page-memory (for example, pages, last touched more than 12 hours ago).
>
>The task can be started:
>- Automatically, triggered by some events, for example, when we expect a
>start of Random-LRU page replacing soon (allocated more than 90% of
>page-memory) + we have enough amount of cold pages (we need some metric to
>calculate the number of cold pages) + some time passed since last batch
>page replacement (to avoid too much resource consumption by background
>batch replacement).
>- Manually (JMX or control.sh), if an administrator wants to control the
>time of batch replacement more precisely (for example, to avoid the start
>of this task during peak time).
>
>Batch page replacement will be helpful in some workloads (when some data
>much colder than another), it can prevent the starting of Random-LRU page
>replacement, or if Random-LRU already started it can provide conditions to
>stop it.
>
>*Change the page replacement algorithm.*
>
>Good page replacement algorithm should satisfy the requirements:
>- low page-fault rates for typical workload
>- low maintenance cost (low resource consumption to maintain additional
>structures required for page replacement)
>- fast searching of next page for replacement
>- sequential scans resistance (one sequential scan should not evict all
>relatively hot pages from page-memory)
>
>Our Random-LRU has low maintenance cost and sequential scan resistant, but
>to find the next page for replacement in the best case we scan 5 pages, in
>the worst case we can scan all data region segment. Also, due to random
>nature, it's not very effective in predicting the right page for
>replacement to minimize the page-fault rate. And it's much time required to
>totally evict old cold data.
>
>Usually, database management systems and operating systems use
>modifications of LRU algorithms. These algorithms have higher maintenance
>costs (pages list should be modified on each page access), but often they
>are effective from a "page-fault rate" point of view and have O(1)
>complexity for a searching page to replace. Simple LRU is not sequential
>scan resistant, but modifications that utilize page access frequency are
>resistant to sequential scan.
>
>We can try one of the modifications of LRU as well (for example, "segmented
>LRU" seems suitable for Ignite).
>
>Ignite is a memory-centric product, so "low maintenance cost" is very
>critical. And there is a risk that page replacement algorithm can affect
>workloads, where page replacement is not used (enough RAM to store all
>data). Of course, any page replacement solution should be carefully
>benchmarked.
>
>
>Igniters, WDYT? If any of these proposals look reasonable to you, I will
>create IEP and start implementation.
>
>Also, I have a draft implementation of system view to determine how hot are
>pages in page-memory [1]. I think it will be useful for any of these
>approaches (and even if we decide to left page replacement as is).
>
>[1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>