Posted to dev@ignite.apache.org by ткаленко кирилл <tk...@yandex.ru> on 2021/06/17 11:26:08 UTC

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Created the first task from this discussion: IGNITE-14923.

13.05.2021, 18:37, "Stanislav Lukyanov" <st...@gmail.com>:
> What I mean by degradation when archive size < min is that, for example, historical rebalance is available for a smaller timespan than expected by the system design.
> It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong word we can call it "non-steady state" :)
> In any case, I think we're on the same page.
>
>>  On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>>
>>  Stan
>>
>>>  If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>
>>  Why does the condition "archive size is less than min" lead to system
>>  degradation? Actually, the described case is a normal situation for
>>  brand new clusters.
>>
>>  I'm okay with the proposed minWalArchiveSize property. It looks like
>>  a relatively understandable property.
>>
>>  On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>>  <st...@gmail.com> wrote:
>>>  Discussed this with Kirill verbally.
>>>
>>>  Kirill showed me that having the min threshold doesn't quite work.
>>>  It doesn't work because we no longer know how much WAL we should remove if we reach getMaxWalArchiveSize.
>>>
>>>  For example, say we have minWalArchiveTimespan=2 hours and maxWalArchiveSize=2GB.
>>>  Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
>>>  Now, say we're doing historical rebalance and reserve the WAL archive.
>>>  The WAL archive starts growing and soon it occupies 2 GB.
>>>  Now what?
>>>  We're supposed to give up WAL reservations and start aggressively removing WAL archive.
>>>  But it is not clear when we can stop removing WAL archive - since the last 2 hours of WAL are larger than our maxWalArchiveSize,
>>>  there is no meaningful point the system can use as a "minimum" WAL size.
>>>
>>>  I understand the description above is a bit messy but I believe that whoever is interested in this will understand it
>>>  after drawing this on paper.
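The example above can also be checked with plain arithmetic. The following is a minimal sketch, not Ignite code: the class and method names are hypothetical, and the assumption that the write rate triples during rebalance is only for illustration. It shows that once the last 2 hours of WAL no longer fit under maxWalArchiveSize, a time-based minimum gives the cleanup no valid stopping point.

```java
public class TimespanMinimumExample {
    /** Bytes of WAL covering the last {@code hours} hours at a constant write rate. */
    static long walBytesForTimespan(long bytesPerHour, long hours) {
        return bytesPerHour * hours;
    }

    /** True when the time-based minimum can be kept without reaching the size limit. */
    static boolean timespanFitsUnderMax(long bytesPerHour, long minTimespanHours, long maxArchiveBytes) {
        return walBytesForTimespan(bytesPerHour, minTimespanHours) < maxArchiveBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long maxArchive = 2 * gb; // maxWalArchiveSize = 2 GB

        // Normal load: 2 hours of WAL take 1 GB, i.e. 0.5 GB per hour.
        System.out.println(timespanFitsUnderMax(gb / 2, 2, maxArchive));     // true

        // Assumed load during rebalance: 1.5 GB per hour, so 2 hours take 3 GB.
        System.out.println(timespanFitsUnderMax(3 * gb / 2, 2, maxArchive)); // false
    }
}
```

In the second case any cleanup that honors the 2-hour minimum leaves the archive above the maximum, which is why a size-based minimum is easier to reason about.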
>>>
>>>  I'm giving up on my latest suggestion about time-based minimum. Let's keep it simple.
>>>
>>>  I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
>>>  with the behavior as initially described by Kirill.
>>>
>>>  Stan
>>>
>>>>  On 7 May 2021, at 15:09, ткаленко кирилл <tk...@yandex.ru> wrote:
>>>>
>>>>  Hello, Stas!
>>>>
>>>>  I didn't quite get your last idea.
>>>>  What will we do if we reach getMaxWalArchiveSize? Shall we not delete the segment until minWalArchiveTimespan?
>>>>
>>>>  06.05.2021, 20:00, "Stanislav Lukyanov" <st...@gmail.com>:
>>>>>  An interesting suggestion I heard today.
>>>>>
>>>>>  The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number of seconds instead of a number of bytes!
>>>>>
>>>>>  I think this makes perfect sense from the user point of view.
>>>>>  "I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it".
>>>>>
>>>>>  Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>>>  Perhaps we can actually implement this?
>>>>>
>>>>>  Thanks,
>>>>>  Stan
>>>>>
>>>>>>  On 6 May 2021, at 14:13, Stanislav Lukyanov <st...@gmail.com> wrote:
>>>>>>
>>>>>>  +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>>  +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>
>>>>>>  I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
>>>>>>  I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
>>>>>>  The archive size at all times should be between min and max.
>>>>>>  If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>>>>>  I think these rules are intuitively understood from the "min" and "max" names.
>>>>>>
>>>>>>  Ilya's suggestion about throttling is great although I'd do this in a different ticket.
>>>>>>
>>>>>>  Thanks,
>>>>>>  Stan
>>>>>>
>>>>>>>  On 5 May 2021, at 19:25, Maxim Muzafarov <mm...@apache.org> wrote:
>>>>>>>
>>>>>>>  Hello, Kirill
>>>>>>>
>>>>>>>  +1 for this change, however, there are too many configuration settings
>>>>>>>  that exist for the user to configure Ignite cluster. It is better to
>>>>>>>  keep the options that we already have and fix the behaviour of the
>>>>>>>  rebalance process as you suggested.
>>>>>>>
>>>>>>>  On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tk...@yandex.ru> wrote:
>>>>>>>>  Hi Ilya!
>>>>>>>>
>>>>>>>>  Then we can greatly reduce the user load on the cluster until the rebalance is over, which can be critical for the user.
>>>>>>>>
>>>>>>>>  04.05.2021, 18:43, "Ilya Kasnacheev" <il...@gmail.com>:
>>>>>>>>>  Hello!
>>>>>>>>>
>>>>>>>>>  Maybe we can have a mechanic here similar (or equal) to checkpoint based
>>>>>>>>>  write throttling?
>>>>>>>>>
>>>>>>>>>  So we will be throttling for both checkpoint page buffer and WAL limit.
>>>>>>>>>
>>>>>>>>>  Regards,
>>>>>>>>>  --
>>>>>>>>>  Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>>  Tue, 4 May 2021 at 11:29, ткаленко кирилл <tk...@yandex.ru>:
>>>>>>>>>
>>>>>>>>>>  Hello everybody!
>>>>>>>>>>
>>>>>>>>>>  At the moment, if there are partitions for the rebalance for which the
>>>>>>>>>>  historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>>>>  archive (we do not allow cleaning the WAL archive) until the rebalance for
>>>>>>>>>>  all cache groups is over.
>>>>>>>>>>
>>>>>>>>>>  If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>>>>  significantly exceed limits set in
>>>>>>>>>>  DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>>  complete. This may lead to user issues and nodes may crash with the "No
>>>>>>>>>>  space left on device" error.
>>>>>>>>>>
>>>>>>>>>>  We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>>>>  (0.5 by default), which is multiplied by getMaxWalArchiveSize to set the
>>>>>>>>>>  size down to which the WAL archive is cleared, i.e. the size of the WAL
>>>>>>>>>>  archive that will always be kept on the node. I propose to
>>>>>>>>>>  replace this system property with
>>>>>>>>>>  DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>>>>>>>  (getMaxWalArchiveSize * 0.5), as it is now.
>>>>>>>>>>
>>>>>>>>>>  Main proposal:
>>>>>>>>>>  When the DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>>>>>  the reservation of WAL segments and do not grant new reservations until we reach
>>>>>>>>>>  DataStorageConfiguration#getWalArchiveSize. In this case, if there is no
>>>>>>>>>>  segment for historical rebalance, we will automatically switch to full
>>>>>>>>>>  rebalance.
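
To make the main proposal concrete, here is a minimal sketch of the behavior in plain Java. It is a toy model and not Ignite code: the class WalArchiveSketch and all of its methods are hypothetical, sizes are in arbitrary units, and cleanup is assumed to run synchronously. It only illustrates the rule discussed in this thread: once the archive reaches the maximum, the reservation is cancelled and the oldest segments are removed until the archive is back at the minimum.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WalArchiveSketch {
    private final long minSize;
    private final long maxSize;
    /** Archived segments, oldest first; each entry is {segmentIndex, sizeInBytes}. */
    private final Deque<long[]> segments = new ArrayDeque<>();
    private long totalSize;
    private long nextIdx;
    /** Oldest segment index held for historical rebalance, or -1 if nothing is reserved. */
    private long reservedFromIdx = -1;

    WalArchiveSketch(long minSize, long maxSize) {
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Reserves segments starting at idx; fails if that segment has already been removed. */
    boolean reserve(long idx) {
        if (segments.isEmpty() || segments.peekFirst()[0] > idx)
            return false;
        reservedFromIdx = idx;
        return true;
    }

    boolean isReserved() {
        return reservedFromIdx >= 0;
    }

    long totalSize() {
        return totalSize;
    }

    /** Archives one more segment and applies the proposed rule once max is reached. */
    void addSegment(long size) {
        segments.addLast(new long[] {nextIdx++, size});
        totalSize += size;
        if (totalSize < maxSize)
            return;
        // Max reached: cancel the reservation, then remove oldest segments down to min.
        // Cleanup is synchronous in this model, so no "reservations suspended" flag is kept;
        // a real implementation would have to refuse new reservations until cleanup finishes.
        reservedFromIdx = -1;
        while (totalSize > minSize && !segments.isEmpty())
            totalSize -= segments.removeFirst()[1];
    }

    public static void main(String[] args) {
        WalArchiveSketch archive = new WalArchiveSketch(4, 8);
        archive.addSegment(1);                    // segment 0
        System.out.println(archive.reserve(0));   // true: historical rebalance can start
        for (int i = 0; i < 7; i++)
            archive.addSegment(1);                // load during rebalance fills the archive
        System.out.println(archive.isReserved()); // false: reservation was cancelled at max
        System.out.println(archive.totalSize());  // 4: cleaned down to min
        System.out.println(archive.reserve(0));   // false: segment 0 is gone, use full rebalance
    }
}
```

The failed reserve call at the end corresponds to the fallback described above: if the segment needed for historical rebalance is no longer in the archive, the node switches to full rebalance.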

Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

Posted by ткаленко кирилл <tk...@yandex.ru>.
Created the second task from this discussion: IGNITE-14952.
