Posted to dev@ignite.apache.org by ткаленко кирилл <tk...@yandex.ru> on 2021/06/17 11:26:08 UTC
Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Created the first task from this discussion: IGNITE-14923.
13.05.2021, 18:37, "Stanislav Lukyanov" <st...@gmail.com>:
> What I mean by degradation when archive size < min is that, for example, historical rebalance is available for a smaller timespan than expected by the system design.
> It may not be an issue of course, especially for a new cluster. If "degradation" is the wrong word we can call it "non-steady state" :)
> In any case, I think we're on the same page.
>
>> On 11 May 2021, at 13:18, Andrey Gura <ag...@apache.org> wrote:
>>
>> Stan
>>
>>> If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>
>> Why does the condition "archive size is less than min" lead to system
>> degradation? Actually, the described case is a normal situation for
>> brand new clusters.
>>
>> I'm okay with the proposed minWalArchiveSize property. It looks like
>> a relatively understandable property.
>>
>> On Sun, May 9, 2021 at 7:12 PM Stanislav Lukyanov
>> <st...@gmail.com> wrote:
>>> Discussed this with Kirill verbally.
>>>
>>> Kirill showed me that having a time-based min threshold doesn't quite work.
>>> It doesn't work because we no longer know how much WAL we should remove if we reach getMaxWalArchiveSize.
>>>
>>> For example, say we have minWalArchiveTimespan=2 hours and maxWalArchiveSize=2GB.
>>> Say, under normal load on stable topology 2 hours of WAL use 1 GB of space.
>>> Now, say we're doing historical rebalance and reserve the WAL archive.
>>> The WAL archive starts growing and soon it occupies 2 GB.
>>> Now what?
>>> We're supposed to give up WAL reservations and start aggressively removing WAL archive.
>>> But it is not clear when we can stop removing WAL archive - since the last 2 hours of WAL are larger than our maxWalArchiveSize
>>> there is no meaningful point the system can use as a "minimum" WAL size.
>>>
>>> I understand the description above is a bit messy but I believe that whoever is interested in this will understand it
>>> after drawing this on paper.
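
The example above can also be worked through in a few lines of Java. This is only a sketch under stated assumptions: the class and method names are hypothetical, the WAL is assumed to grow at a constant rate, and the 1.5 GB/hour rate under load is an illustrative number that does not come from the thread. It shows that a time-based minimum yields a cleanup target only while the WAL written during the minimum timespan fits under the maximum size.

```java
import java.util.OptionalLong;

/** Hypothetical illustration of the time-based minimum problem; not Ignite code. */
final class TimeBasedMinSketch {
    static final long GB = 1L << 30;

    /**
     * Size the archive could be trimmed to while still keeping the last
     * minTimespanHours of WAL, assuming a constant WAL growth rate.
     * Empty when that much WAL does not fit under maxArchiveBytes,
     * i.e. when there is no meaningful "minimum" to stop at.
     */
    static OptionalLong cleanupTargetBytes(long walBytesPerHour, long minTimespanHours, long maxArchiveBytes) {
        long bytesToKeep = Math.multiplyExact(walBytesPerHour, minTimespanHours);

        return bytesToKeep < maxArchiveBytes ? OptionalLong.of(bytesToKeep) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Normal load from the example: 2 hours of WAL take 1 GB, so 0.5 GB per hour.
        System.out.println("Normal load: " + cleanupTargetBytes(GB / 2, 2, 2 * GB));

        // Assumed load during rebalance: 1.5 GB per hour, so 2 hours no longer fit into 2 GB.
        System.out.println("Under load:  " + cleanupTargetBytes(3 * GB / 2, 2, 2 * GB));
    }
}
```

With a byte-based minimum the target is always known up front, which is the reason given for dropping the time-based variant.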
>>>
>>> I'm giving up on my latest suggestion about time-based minimum. Let's keep it simple.
>>>
>>> I suggest the minWalArchiveSize and maxWalArchiveSize properties as the solution,
>>> with the behavior as initially described by Kirill.
>>>
>>> Stan
>>>
>>>> On 7 May 2021, at 15:09, ткаленко кирилл <tk...@yandex.ru> wrote:
>>>>
>>>> Hello, Stas!
>>>>
>>>> I didn't quite get your last idea.
>>>> What will we do if we reach getMaxWalArchiveSize? Shall we not delete the segment until minWalArchiveTimespan?
>>>>
>>>> 06.05.2021, 20:00, "Stanislav Lukyanov" <st...@gmail.com>:
>>>>> An interesting suggestion I heard today.
>>>>>
>>>>> The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. be a number of seconds instead of a number of bytes!
>>>>>
>>>>> I think this makes perfect sense from the user point of view.
>>>>> "I want to have WAL archive for at least N hours but I have a limit of M gigabytes to store it".
>>>>>
>>>>> Do we have checkpoint timestamp stored anywhere? (cp start markers?)
>>>>> Perhaps we can actually implement this?
>>>>>
>>>>> Thanks,
>>>>> Stan
>>>>>
>>>>>> On 6 May 2021, at 14:13, Stanislav Lukyanov <st...@gmail.com> wrote:
>>>>>>
>>>>>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>>>>>> +1 to add a public property to replace IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>
>>>>>> I don't like the name getWalArchiveSize - I think it's a bit confusing (is it the current size? the minimal size? the target size?)
>>>>>> I suggest naming the property getMinWalArchiveSize. I think that this is exactly what it is - the minimal size of the archive that we want to have.
>>>>>> The archive size at all times should be between min and max.
>>>>>> If archive size is less than min or more than max then the system functionality can degrade (e.g. historical rebalance may not work as expected).
>>>>>> I think these rules are intuitively understood from the "min" and "max" names.
>>>>>>
>>>>>> Ilya's suggestion about throttling is great although I'd do this in a different ticket.
>>>>>>
>>>>>> Thanks,
>>>>>> Stan
>>>>>>
>>>>>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mm...@apache.org> wrote:
>>>>>>>
>>>>>>> Hello, Kirill
>>>>>>>
>>>>>>> +1 for this change, however, there are too many configuration settings
>>>>>>> that exist for the user to configure Ignite cluster. It is better to
>>>>>>> keep the options that we already have and fix the behaviour of the
>>>>>>> rebalance process as you suggested.
>>>>>>>
>>>>>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tk...@yandex.ru> wrote:
>>>>>>>> Hi Ilya!
>>>>>>>>
>>>>>>>> But then we may greatly reduce the user load on the cluster until the rebalance is over, which can be critical for the user.
>>>>>>>>
>>>>>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <il...@gmail.com>:
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> Maybe we can have a mechanism here similar (or equal) to checkpoint-based
>>>>>>>>> write throttling?
>>>>>>>>>
>>>>>>>>> So we will be throttling for both checkpoint page buffer and WAL limit.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tk...@yandex.ru>:
>>>>>>>>>
>>>>>>>>>> Hello everybody!
>>>>>>>>>>
>>>>>>>>>> At the moment, if there are partitions for the rebalance for which the
>>>>>>>>>> historical rebalance will be used, then we reserve segments in the WAL
>>>>>>>>>> archive (we do not allow cleaning the WAL archive) until the rebalance for
>>>>>>>>>> all cache groups is over.
>>>>>>>>>>
>>>>>>>>>> If a cluster is under load during the rebalance, WAL archive size may
>>>>>>>>>> significantly exceed limits set in
>>>>>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>>>>>> complete. This may lead to user issues and nodes may crash with the "No
>>>>>>>>>> space left on device" error.
>>>>>>>>>>
>>>>>>>>>> We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>>>>>> (0.5 by default), which is multiplied by getMaxWalArchiveSize to get the
>>>>>>>>>> threshold down to which the WAL archive will be cleared, i.e. it sets the
>>>>>>>>>> size of the WAL archive that will always be kept on the node. I propose to
>>>>>>>>>> replace this system property with
>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes; the default is
>>>>>>>>>> (getMaxWalArchiveSize * 0.5), as it is now.
>>>>>>>>>>
>>>>>>>>>> Main proposal:
>>>>>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>>>>>> existing reservations of WAL segments and do not grant new ones until the
>>>>>>>>>> archive shrinks to
>>>>>>>>>> DataStorageConfiguration#getWalArchiveSize. In this case, if there is no
>>>>>>>>>> segment for historical rebalance, we will automatically switch to full
>>>>>>>>>> rebalance.
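
The behaviour proposed in the thread (stop reserving at the maximum, clean down to the minimum, fall back to full rebalance when the needed segment is gone) can be sketched as a small state machine. This is a minimal illustration, not Ignite's implementation: the class, its methods and the string return values are hypothetical, and sizes are plain byte counts supplied by the caller.

```java
/** Hypothetical sketch of the proposed min/max behaviour; not Ignite's actual code. */
final class WalArchiveLimitsSketch {
    private final long minBytes;
    private final long maxBytes;

    /** False from the moment max is reached until the archive shrinks back to min. */
    private boolean reservationsAllowed = true;

    WalArchiveLimitsSketch(long minBytes, long maxBytes) {
        if (minBytes < 0 || minBytes >= maxBytes)
            throw new IllegalArgumentException("Expected 0 <= min < max");

        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    /** Mirrors the default mentioned in the thread: min = max * 0.5. */
    static WalArchiveLimitsSketch withDefaultMin(long maxBytes) {
        return new WalArchiveLimitsSketch(maxBytes / 2, maxBytes);
    }

    /**
     * Reports the current archive size and updates the reservation state.
     * Returns the number of bytes the cleaner should remove now (0 if no cleanup is due).
     */
    long onArchiveSize(long archiveBytes) {
        if (archiveBytes >= maxBytes)
            reservationsAllowed = false; // Max reached: cancel reservations, start cleanup.
        else if (archiveBytes <= minBytes)
            reservationsAllowed = true;  // Back at min: reservations may be granted again.

        return reservationsAllowed ? 0 : archiveBytes - minBytes;
    }

    boolean reservationsAllowed() {
        return reservationsAllowed;
    }

    /** Historical rebalance needs the segment to be in the archive and reservable. */
    String rebalanceMode(boolean segmentInArchive) {
        return reservationsAllowed && segmentInArchive ? "historical" : "full";
    }

    public static void main(String[] args) {
        WalArchiveLimitsSketch limits = withDefaultMin(2048);

        for (long size : new long[] {1500, 2048, 1500, 1024}) {
            long toRemove = limits.onArchiveSize(size);

            System.out.println(size + " bytes: remove " + toRemove
                + ", reservations allowed: " + limits.reservationsAllowed());
        }
    }
}
```

The gap between the two limits keeps the node from flipping between reserving and cleaning on every new segment, which is also why the sketch rejects min equal to max.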
Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance
Posted by ткаленко кирилл <tk...@yandex.ru>.
Created the second task from this discussion: IGNITE-14952.
17.06.2021, 14:26, "ткаленко кирилл" <tk...@yandex.ru>:
> Created the first task from this discussion: IGNITE-14923.