Posted to dev@ignite.apache.org by ткаленко кирилл <tk...@yandex.ru> on 2021/05/04 08:29:37 UTC

Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Hello everybody!

At the moment, if any partitions are going to be rebalanced historically, we reserve segments in the WAL archive (that is, we do not allow the WAL archive to be cleaned) until the rebalance of all cache groups is over.

If a cluster is under load during the rebalance, WAL archive size may significantly exceed limits set in DataStorageConfiguration#getMaxWalArchiveSize until the process is complete. This may lead to user issues and nodes may crash with the "No space left on device" error.

We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE (0.5 by default), which sets the threshold (as a fraction of getMaxWalArchiveSize) down to which the WAL archive is cleared, i.e. it sets the size of the WAL archive that will always be kept on the node. I propose to replace this system property with DataStorageConfiguration#getWalArchiveSize, in bytes; the default would be (getMaxWalArchiveSize * 0.5), as it is now.
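
To illustrate the arithmetic, here is a minimal sketch of how the cleanup threshold in bytes follows from the maximum size and the fraction. This is not Ignite code: the class WalArchiveThreshold and its methods are hypothetical, and only the property name and the 0.5 default are taken from the text above.

```java
/** Hypothetical helper (not part of Ignite): derives the cleanup threshold in bytes. */
final class WalArchiveThreshold {
    /** Name of the system property mentioned above. */
    static final String PROP = "IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE";

    /** Default fraction, as stated above. */
    static final double DFLT_FRACTION = 0.5;

    /** Reads the fraction from the system property, falling back to the default. */
    static double fractionFromSystemProperty() {
        String val = System.getProperty(PROP);

        return val == null ? DFLT_FRACTION : Double.parseDouble(val);
    }

    /** Returns the archive size in bytes down to which cleanup runs. */
    static long thresholdBytes(long maxWalArchiveSize, double fraction) {
        if (maxWalArchiveSize < 0)
            throw new IllegalArgumentException("maxWalArchiveSize must be non-negative");

        if (fraction < 0 || fraction > 1)
            throw new IllegalArgumentException("fraction must be in [0, 1]");

        return (long)(maxWalArchiveSize * fraction);
    }

    public static void main(String[] args) {
        long max = 2L * 1024 * 1024 * 1024; // Assumed example value: 2 GB.

        System.out.println(thresholdBytes(max, fractionFromSystemProperty()));
    }
}
```

With the proposed byte-valued property, a user would set this number directly instead of deriving it from a fraction.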

Main proposal:
When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel the existing reservations of WAL segments and do not grant new ones until the archive shrinks to DataStorageConfiguration#getWalArchiveSize. In this case, if the segment needed for historical rebalance is no longer available, we will automatically switch to full rebalance.
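
A rough sketch of the proposed behavior, only to show the intended hysteresis between the two sizes. It is not the actual implementation: WalReservationGate, RebalanceMode and all method names are hypothetical, and the real reservation code in Ignite is organized differently.

```java
/** Hypothetical model of the proposal: reservations are refused between reaching max and shrinking to min. */
final class WalReservationGate {
    /** Rebalance modes used in this sketch. */
    enum RebalanceMode { HISTORICAL, FULL }

    /** Lower bound in bytes (getWalArchiveSize in the proposal). */
    private final long minSize;

    /** Upper bound in bytes (getMaxWalArchiveSize). */
    private final long maxSize;

    /** Whether WAL segment reservations are currently granted. */
    private boolean reservationsAllowed = true;

    WalReservationGate(long minSize, long maxSize) {
        if (minSize < 0 || minSize > maxSize)
            throw new IllegalArgumentException("Expected 0 <= minSize <= maxSize");

        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Updates the state for the current archive size and returns whether reservations are allowed. */
    boolean onArchiveSizeChanged(long archiveSize) {
        if (archiveSize >= maxSize)
            reservationsAllowed = false; // Max reached: cancel reservations and refuse new ones.
        else if (archiveSize <= minSize)
            reservationsAllowed = true; // Archive has shrunk to the lower bound: allow reservations again.

        return reservationsAllowed;
    }

    /** Falls back to full rebalance when the required segment cannot be reserved. */
    RebalanceMode rebalanceMode(boolean requiredSegmentPresent) {
        return reservationsAllowed && requiredSegmentPresent ? RebalanceMode.HISTORICAL : RebalanceMode.FULL;
    }
}
```

The state changes only at the two bounds, so a size between them keeps whatever decision was made last.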

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by ткаленко кирилл <tk...@yandex.ru>.

Created the second task for this discussion: IGNITE-14952.

17.06.2021, 14:26, "ткаленко кирилл" <tk...@yandex.ru>:
> Created the first task for this discussion: IGNITE-14923.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by ткаленко кирилл <tk...@yandex.ru>.

Created the first task for this discussion: IGNITE-14923.

13.05.2021, 18:37, "Stanislav Lukyanov" <st...@gmail.com>:
> What I mean by degradation when archive size < min is that, for example, historical rebalance is available for a smaller timespan than expected by the system design.
> It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong word we can call it "non-steady state" :)
> In any case, I think we're on the same page.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Stanislav Lukyanov <st...@gmail.com>.

What I mean by degradation when archive size < min is that, for example, historical rebalance is available for a smaller timespan than expected by the system design.
It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong word we can call it "non-steady state" :) 
In any case, I think we're on the same page.


> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
> 
> Stan
> 
>> If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
> 
> Why does the condition "archive size is less than min" lead to system
> degradation? Actually, the described case is a normal situation for
> brand new clusters.
> 
> I'm okay with the proposed minWalArchiveSize property. It looks like
> a relatively understandable property.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Andrey Gura <ag...@apache.org>.

Stan

> If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).

Why does the condition "archive size is less than min" lead to system
degradation? Actually, the described case is a normal situation for
brand new clusters.

I'm okay with the proposed minWalArchiveSize property. It looks like
a relatively understandable property.

On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
<st...@gmail.com> wrote:
>
> Discussed this with Kirill verbally.
>
> Kirill showed me that having the min threshold doesn't quite work.
> It doesn't work because we no longer know how much WAL we should remove if we reach getMaxWalArchiveSize.
>
> For example, say we have minWalArchiveTimespan=2 hours and maxWalArchiveSize=2GB.
> Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
> Now, say we're doing historical rebalance and reserve the WAL archive.
> The WAL archive starts growing and soon it occupies 2 GB.
> Now what?
> We're supposed to give up WAL reservations and start aggressively removing WAL archive.
> But it is not clear when we can stop removing WAL archive: since the last 2 hours of WAL are larger than our maxWalArchiveSize,
> there is no meaningful point the system can use as a "minimum" WAL size.
>
> I understand the description above is a bit messy but I believe that whoever is interested in this will understand it
> after drawing this on paper.
>
>
> I'm giving up on my latest suggestion about time-based minimum. Let's keep it simple.
>
> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
> with the behavior as initially described by Kirill.
>
> Stan

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Stanislav Lukyanov <st...@gmail.com>.

Discussed this with Kirill verbally.

Kirill showed me that having the min threshold doesn't quite work.
It doesn't work because we no longer know how much WAL we should remove if we reach getMaxWalArchiveSize.

For example, say we have minWalArchiveTimespan=2 hours and maxWalArchiveSize=2GB.
Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
Now, say we're doing historical rebalance and reserve the WAL archive.
The WAL archive starts growing and soon it occupies 2 GB.
Now what?
We're supposed to give up WAL reservations and start aggressively removing WAL archive.
But it is not clear when we can stop removing WAL archive: since the last 2 hours of WAL are larger than our maxWalArchiveSize,
there is no meaningful point the system can use as a "minimum" WAL size.

I understand the description above is a bit messy but I believe that whoever is interested in this will understand it
after drawing this on paper.
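
To make the conflict concrete, here is a minimal sketch with made-up numbers. It is not Ignite code: TimespanVsSizeCheck, its methods and the 64 MB segment size are assumptions used only for illustration. It sums the bytes needed to keep the last N hours of WAL and checks whether that fits under the size limit.

```java
import java.util.Arrays;

/** Hypothetical check (not part of Ignite): can a time-based minimum fit under a size-based maximum? */
final class TimespanVsSizeCheck {
    /** Returns ages (in ms) of {@code cnt} segments spread evenly over {@code timespanMs}. */
    static long[] evenAges(int cnt, long timespanMs) {
        long[] ages = new long[cnt];

        for (int i = 0; i < cnt; i++)
            ages[i] = timespanMs * i / cnt;

        return ages;
    }

    /** Sums the sizes of the segments that are not older than {@code minTimespanMs}. */
    static long bytesNeededForTimespan(long[] segAgesMs, long[] segSizes, long minTimespanMs) {
        if (segAgesMs.length != segSizes.length)
            throw new IllegalArgumentException("Arrays must have the same length");

        long total = 0;

        for (int i = 0; i < segAgesMs.length; i++) {
            if (segAgesMs[i] <= minTimespanMs)
                total += segSizes[i];
        }

        return total;
    }

    /** The two limits are compatible only if the required bytes fit under the maximum. */
    static boolean fitsUnderMax(long neededBytes, long maxWalArchiveSize) {
        return neededBytes <= maxWalArchiveSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long twoHoursMs = 2L * 60 * 60 * 1000;
        long maxSize = 2048 * mb; // maxWalArchiveSize = 2 GB, as in the example above.

        // Normal load: 16 segments of 64 MB in 2 hours (1 GB). Under rebalance: 40 segments (2.5 GB).
        for (int cnt : new int[] {16, 40}) {
            long[] sizes = new long[cnt];

            Arrays.fill(sizes, 64 * mb);

            long needed = bytesNeededForTimespan(evenAges(cnt, twoHoursMs), sizes, twoHoursMs);

            System.out.println(cnt + " segments: fits under max = " + fitsUnderMax(needed, maxSize));
        }
    }
}
```

In the second case the last 2 hours alone need more than the maximum, so no cleanup target can satisfy both limits at once.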


I'm giving up on my latest suggestion about time-based minimum. Let's keep it simple.

I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
with the behavior as initially described by Kirill.

Stan


> On 7 May 2021, at 15:09, ткаленко кирилл <tk...@yandex.ru> wrote:
> 
> Stas hello!
> 
> I didn't quite get your last idea. 
> What will we do if we reach getMaxWalArchiveSize? Should we not delete segments until minWalArchiveTimespan is satisfied?
> 
> 06.05.2021, 20:00, "Stanislav Lukyanov" <st...@gmail.com>:
>> An interesting suggestion I heard today.
>> 
>> The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number of seconds instead of a number of bytes!
>> 
>> I think this makes perfect sense from the user point of view.
>> "I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it".
>> 
>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>> Perhaps we can actually implement this?
>> 
>> Thanks,
>> Stan
>> 
>>>  On 6 May 2021, at 14:13, Stanislav Lukyanov <st...@gmail.com> wrote:
>>> 
>>>  +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>  +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>> 
>>>  I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
>>>  I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
>>>  The archive size at all times should be between min and max.
>>>  If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>>  I think these rules are intuitively understood from the "min" and "max" names.
>>> 
>>>  Ilya's suggestion about throttling is great although I'd do this in a different ticket.
>>> 
>>>  Thanks,
>>>  Stan
>>> 
>>>>  On 5 May 2021, at 19:25, Maxim Muzafarov <mm...@apache.org> wrote:
>>>> 
>>>>  Hello, Kirill
>>>> 
>>>>  +1 for this change, however, there are too many configuration settings
>>>>  that exist for the user to configure Ignite cluster. It is better to
>>>>  keep the options that we already have and fix the behaviour of the
>>>>  rebalance process as you suggested.
>>>> 
>>>>  On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tk...@yandex.ru> wrote:
>>>>>  Hi Ilya!
>>>>> 
>>>>>  Then we can greatly reduce the user load on the cluster until the rebalance is over. Which can be critical for the user.
>>>>> 
>>>>>  04.05.2021, 18:43, "Ilya Kasnacheev" <il...@gmail.com>:
>>>>>>  Hello!
>>>>>> 
>>>>>>  Maybe we can have a mechanic here similar (or equal) to checkpoint based
>>>>>>  write throttling?
>>>>>> 
>>>>>>  So we will be throttling for both checkpoint page buffer and WAL limit.
>>>>>> 
>>>>>>  Regards,
>>>>>>  --
>>>>>>  Ilya Kasnacheev
>>>>>> 
>>>>>>  вт, 4 мая 2021 г. в 11:29, ткаленко кирилл <tk...@yandex.ru>:
>>>>>> 
>>>>>>>  Hello everybody!
>>>>>>> 
>>>>>>>  At the moment, if there are partitions for the rebalance for which the
>>>>>>>  historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>  archive (we do not allow cleaning the WAL archive) until the rebalance for
>>>>>>>  all cache groups is over.
>>>>>>> 
>>>>>>>  If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>  significantly exceed limits set in
>>>>>>>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>  complete. This may lead to user issues and nodes may crash with the "No
>>>>>>>  space left on device" error.
>>>>>>> 
>>>>>>>  We have a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE by
>>>>>>>  default 0.5, which sets the threshold (multiplied by getMaxWalArchiveSize)
>>>>>>>  from which and up to which the WAL archive will be cleared, i.e. sets the
>>>>>>>  size of the WAL archive that will always be on the node. I propose to
>>>>>>>  replace this system property with the
>>>>>>>  DataStorageConfiguration#getWalArchiveSize in bytes, the default is
>>>>>>>  (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>> 
>>>>>>>  Main proposal:
>>>>>>>  When theDataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>>  and do not give the reservation of the WAL segments until we reach
>>>>>>>  DataStorageConfiguration#getWalArchiveSize. In this case, if there is no
>>>>>>>  segment for historical rebalance, we will automatically switch to full
>>>>>>>  rebalance.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by ткаленко кирилл <tk...@yandex.ru>.

Stas hello!

I didn't quite get your last idea. 
What do we do when we reach getMaxWalArchiveSize? Do we still keep the segments until minWalArchiveTimespan has passed?

06.05.2021, 20:00, "Stanislav Lukyanov" <st...@gmail.com>:
> An interesting suggestion I heard today.
>
> The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number of seconds instead of a number of bytes!
>
> I think this makes perfect sense from the user's point of view.
> "I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it".
>
> Do we have checkpoint timestamps stored anywhere (in the cp start markers, perhaps)?
> Perhaps we can actually implement this?
>
> Thanks,
> Stan

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Stanislav Lukyanov <st...@gmail.com>.

An interesting suggestion I heard today.

The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number of seconds instead of a number of bytes!

I think this makes perfect sense from the user's point of view.
"I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it".

Do we have checkpoint timestamps stored anywhere (in the cp start markers, perhaps)?
Perhaps we can actually implement this?
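
For illustration only, here is a minimal Java sketch of what a time-based minimum could look like. Everything in it is hypothetical: the Segment type, deletableCount and the parameter names are not Ignite API. It also assumes that the size limit wins over the timespan when the two conflict, which is not something settled in this discussion.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a time-based WAL archive minimum; not Ignite code. */
final class TimespanRetentionSketch {
    /** Archived segment: when it was archived and how big it is (assumed fields). */
    static final class Segment {
        final long archivedAtMillis;
        final long sizeBytes;

        Segment(long archivedAtMillis, long sizeBytes) {
            this.archivedAtMillis = archivedAtMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns how many of the oldest segments may be deleted.
     * A segment may go if it is older than the minimum timespan, or if the
     * archive is still over the size limit (assumption: the size limit wins).
     *
     * @param segments Archived segments, oldest first.
     */
    static int deletableCount(List<Segment> segments, long nowMillis,
                              long minTimespanMillis, long maxBytes) {
        long total = 0;
        for (Segment s : segments)
            total += s.sizeBytes;

        int deletable = 0;
        for (Segment s : segments) {
            boolean olderThanMin = nowMillis - s.archivedAtMillis > minTimespanMillis;
            boolean overLimit = total > maxBytes;

            // Stop at the first segment that is both recent enough and affordable.
            if (!olderThanMin && !overLimit)
                break;

            total -= s.sizeBytes;
            deletable++;
        }
        return deletable;
    }

    public static void main(String[] args) {
        List<Segment> segs = Arrays.asList(
            new Segment(0, 100), new Segment(50, 100), new Segment(90, 100));

        // Keep at least 30 ms of history, cap the archive at 1000 bytes.
        System.out.println(deletableCount(segs, 100, 30, 1000));
    }
}
```

With these rules the user states "N hours" as minTimespanMillis and "M gigabytes" as maxBytes; what should happen when the two conflict is exactly the open question.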

Thanks,
Stan


> On 6 May 2021, at 14:13, Stanislav Lukyanov <st...@gmail.com> wrote:
> 
> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
> +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
> 
> I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
> I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
> The archive size at all times should be between min and max.
> If the archive size is less than min or more than max, system functionality can degrade (e.g. historical rebalance may not work as expected).
> I think these rules are intuitively understood from the "min" and "max" names.
> 
> Ilya's suggestion about throttling is great although I'd do this in a different ticket.
> 
> Thanks,
> Stan

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Stanislav Lukyanov <st...@gmail.com>.

+1 to cancel WAL reservation on reaching getMaxWalArchiveSize
+1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE

I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
The archive size at all times should be between min and max.
If the archive size is less than min or more than max, system functionality can degrade (e.g. historical rebalance may not work as expected).
I think these rules are intuitively understood from the "min" and "max" names.
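
To make the min/max rules concrete, here is a minimal Java sketch. It is only a model of the behavior discussed in this thread, not Ignite code: the WalArchiveBounds class and its methods are hypothetical, and it assumes that cleanup starts when the archive reaches max, stops at min, and that min defaults to half of max as the current 0.5 threshold implies.

```java
/** Hypothetical model of the proposed min/max archive bounds; not Ignite code. */
final class WalArchiveBounds {
    private final long minBytes;
    private final long maxBytes;

    WalArchiveBounds(long minBytes, long maxBytes) {
        if (minBytes < 0 || minBytes > maxBytes)
            throw new IllegalArgumentException("Expected 0 <= min <= max");

        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    /** Default from the thread: min is max * 0.5, as the system property gives today. */
    static WalArchiveBounds withDefaultMin(long maxBytes) {
        return new WalArchiveBounds(maxBytes / 2, maxBytes);
    }

    /** Segment reservations are cancelled once the archive reaches max. */
    boolean mustReleaseReservations(long archiveBytes) {
        return archiveBytes >= maxBytes;
    }

    /** Bytes to delete so the archive shrinks back to min; 0 while below max. */
    long bytesToClean(long archiveBytes) {
        return archiveBytes >= maxBytes ? archiveBytes - minBytes : 0;
    }

    public static void main(String[] args) {
        WalArchiveBounds bounds = withDefaultMin(1024L * 1024 * 1024);

        // An archive that has grown past max must drop reservations and shrink.
        long archiveBytes = 1200L * 1024 * 1024;
        System.out.println(bounds.mustReleaseReservations(archiveBytes));
        System.out.println(bounds.bytesToClean(archiveBytes));
    }
}
```

If a reservation is released this way and the segment needed for historical rebalance is gone, the rebalance falls back to a full one, as in the original proposal.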

Ilya's suggestion about throttling is great although I'd do this in a different ticket.

Thanks,
Stan

> On 5 May 2021, at 19:25, Maxim Muzafarov <mm...@apache.org> wrote:
> 
> Hello, Kirill
> 
> +1 for this change. However, there are already too many configuration
> settings for a user to deal with when configuring an Ignite cluster. It is
> better to keep the options that we already have and fix the behaviour of
> the rebalance process as you suggested.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Maxim Muzafarov <mm...@apache.org>.

Hello, Kirill

+1 for this change. However, there are already too many configuration
settings for a user to deal with when configuring an Ignite cluster. It is
better to keep the options that we already have and fix the behaviour of
the rebalance process as you suggested.

On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tk...@yandex.ru> wrote:
>
> Hi Ilya!
>
> Then we could greatly reduce the user load on the cluster until the rebalance is over, which can be critical for the user.

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by ткаленко кирилл <tk...@yandex.ru>.

Hi Ilya!

Then we could greatly reduce the user load on the cluster until the rebalance is over, which can be critical for the user.

04.05.2021, 18:43, "Ilya Kasnacheev" <il...@gmail.com>:
> Hello!
>
> Maybe we can have a mechanism here similar (or identical) to checkpoint-based
> write throttling?
>
> So we would be throttling on both the checkpoint page buffer and the WAL limit.
>
> Regards,
> --
> Ilya Kasnacheev

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Maybe we can have a mechanism here similar (or identical) to checkpoint-based
write throttling?

So we would be throttling on both the checkpoint page buffer and the WAL limit.
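
As a rough sketch of the idea (not a description of how Ignite's checkpoint throttling actually works), writers could be parked for longer as the WAL archive fills up. The class, the method and all parameters below are hypothetical; the linear ramp between a start threshold and the limit is just the simplest assumption.

```java
/** Hypothetical WAL-archive-based write throttle; not Ignite code. */
final class WalArchiveThrottleSketch {
    /**
     * Returns how long a writer thread should be parked.
     * No delay below startBytes, the full delay at maxBytes and above,
     * and a linear ramp in between (assumed for simplicity).
     */
    static long parkNanos(long archiveBytes, long startBytes, long maxBytes, long maxParkNanos) {
        if (startBytes < 0 || startBytes >= maxBytes)
            throw new IllegalArgumentException("Expected 0 <= start < max");

        if (archiveBytes <= startBytes)
            return 0;

        if (archiveBytes >= maxBytes)
            return maxParkNanos;

        // Fraction of the way from the start threshold to the limit.
        double fill = (double)(archiveBytes - startBytes) / (maxBytes - startBytes);

        return (long)(fill * maxParkNanos);
    }

    public static void main(String[] args) {
        // Archive halfway between the start threshold and the limit.
        System.out.println(parkNanos(750, 500, 1000, 1_000_000));
    }
}
```

A writer would call parkNanos before each update and sleep for the returned time, e.g. via LockSupport.parkNanos.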

Regards,
-- 
Ilya Kasnacheev


On Tue, 4 May 2021 at 11:29, ткаленко кирилл <tk...@yandex.ru> wrote:

> Hello everybody!
>
> At the moment, if there are partitions for the rebalance for which the
> historical rebalance will be used, then we reserve segments in the WAL
> archive (we do not allow cleaning the WAL archive) until the rebalance for
> all cache groups is over.
>
> If a cluster is under load during the rebalance, WAL archive size may
> significantly exceed limits set in
> DataStorageConfiguration#getMaxWalArchiveSize until the process is
> complete. This may lead to user issues and nodes may crash with the "No
> space left on device" error.
>
> We have a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE by
> default 0.5, which sets the threshold (multiplied by getMaxWalArchiveSize)
> from which and up to which the WAL archive will be cleared, i.e. sets the
> size of the WAL archive that will always be on the node. I propose to
> replace this system property with the
>  DataStorageConfiguration#getWalArchiveSize in bytes, the default is
> (getMaxWalArchiveSize * 0.5) as it is now.
>
> Main proposal:
> When the DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
> and do not give the reservation of the WAL segments until we reach
> DataStorageConfiguration#getWalArchiveSize. In this case, if there is no
> segment for historical rebalance, we will automatically switch to full
> rebalance.
>