Posted to issues@ignite.apache.org by "Sergey Uttsel (Jira)" <ji...@apache.org> on 2023/03/28 12:11:00 UTC

[jira] [Comment Edited] (IGNITE-19104) Late logicalTopology initialization in DistributionZoneManager

    [ https://issues.apache.org/jira/browse/IGNITE-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705856#comment-17705856 ] 

Sergey Uttsel edited comment on IGNITE-19104 at 3/28/23 12:10 PM:
------------------------------------------------------------------

Actually we have a race between the asynchronous initialization of 'logicalTopology' in initDataNodesFromVaultManager() and the updating of 'logicalTopology' in DistributionZoneManager#watchListener. So we need to initialize 'logicalTopology' synchronously in DistributionZoneManager#start().
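
To illustrate the proposed ordering, here is a minimal sketch, not the actual Ignite code. The class ZoneManagerSketch, the method readTopologyFromVault() and the listener wiring are hypothetical stand-ins: start() waits for the initial value before any topology listener is registered, so the asynchronous initialization can no longer overwrite a later update.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical stand-in for DistributionZoneManager; these names are not taken from Ignite. */
class ZoneManagerSketch {
    private volatile Set<String> logicalTopology = Set.of();

    /** Listener that applies topology updates; null until start() has finished initialization. */
    private volatile Consumer<Set<String>> listener;

    /** Hypothetical async read, standing in for an asynchronous Vault lookup. */
    CompletableFuture<Set<String>> readTopologyFromVault() {
        return CompletableFuture.supplyAsync(() -> Set.of("node-A"));
    }

    void start() {
        // Wait for the initial value here instead of assigning it in an async callback.
        logicalTopology = readTopologyFromVault().join();

        // Only after initialization has completed do we start accepting updates.
        listener = newTopology -> logicalTopology = newTopology;
    }

    /** Delivers a topology event; events arriving before start() are ignored in this sketch. */
    void onTopologyEvent(Set<String> newTopology) {
        Consumer<Set<String>> l = listener;
        if (l != null) {
            l.accept(newTopology);
        }
    }

    Set<String> logicalTopology() {
        return logicalTopology;
    }
}
```

Blocking in start() trades some startup latency for a simple ordering guarantee; whether that is acceptable in the real component is a separate question.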

Some additional remarks.
Another issue is a race in updating the metastorage data nodes between DistributionZoneManager#start() and DistributionZoneManager#watchListener. It is possible that DistributionZoneManager#watchListener updates 'logicalTopology' and schedules timers with some revision. Then the async logic in start() reads this revision and updates the metastorage data nodes immediately, before the timers have expired. This race will be fixed in https://ggsystems.atlassian.net/browse/IGN-21354.


was (Author: sergey uttsel):
Actually we have a race between the asynchronous initialization of 'logicalTopology' in initDataNodesFromVaultManager() and the updating of 'logicalTopology' in DistributionZoneManager#watchListener. So we need to initialize 'logicalTopology' synchronously in DistributionZoneManager#start().
Another issue is a race in the async invocation of DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage in DistributionZoneManager#start(). This method uses zonesChangeTriggerKey(zoneId) as the condition for its metastorage invoke, while the parallel metastorage invokes in DistributionZoneManager#watchListener use zoneScaleUpChangeTriggerKey(zoneId)/zoneScaleDownChangeTriggerKey(zoneId) as their condition.
So I think we need to invoke DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage synchronously in DistributionZoneManager#start().

> Late logicalTopology initialization in DistributionZoneManager
> --------------------------------------------------------------
>
>                 Key: IGNITE-19104
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19104
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> DistributionZoneManager runs the following methods on start:
> {code:java}
> initDataNodesFromVaultManager();
> initLogicalTopologyAndVersionInMetaStorageOnStart();
> {code}
> The first method gets logicalTopology from Vault and tries to put it into MetaStorage.
> The second one gets logicalTopology from CMG and tries to put it into MetaStorage.
> Both methods are actually asynchronous, because Vault.get() and TopologyService.logicalTopologyOnLeader() are async.
> There are 2 issues:
> * these methods may run concurrently in separate threads
> * we unconditionally overwrite the local volatile field 'logicalTopology' in initDataNodesFromVaultManager()
> Thus, we may see the initial value (empty topology) after DistributionZoneManager.start() finishes.
> Also, it seems there is a chance to see a stale value from Vault: a new value is obtained from config and is then overwritten by the stale value.
> DistributionZoneManagerConfigurationChangesTest passes because the test Metastorage initialization happens before DistributionZoneManagerConfigurationChangesTest starts (in reality, they start in a different order),
> and because test initialization seems a bit slower than DistributionZoneManagerConfigurationChangesTest.start().
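
For the unconditional overwrite described in the quoted issue, one common way to avoid a stale write is to attach a version to the value and only apply newer versions. The sketch below shows that general technique and is not the fix adopted in Ignite. The class VersionedTopologySketch and its methods are hypothetical, and it assumes that each topology snapshot carries a monotonically increasing version.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder that refuses to replace a newer topology with an older one. */
class VersionedTopologySketch {
    /** Immutable (version, nodes) pair; the version is assumed to grow monotonically. */
    static final class Versioned {
        final long version;
        final Set<String> nodes;

        Versioned(long version, Set<String> nodes) {
            this.version = version;
            this.nodes = nodes;
        }
    }

    private final AtomicReference<Versioned> topology =
            new AtomicReference<>(new Versioned(0L, Set.of()));

    /** Applies the candidate only if its version is strictly newer; returns true if applied. */
    boolean updateIfNewer(long version, Set<String> nodes) {
        Versioned candidate = new Versioned(version, Set.copyOf(nodes));
        while (true) {
            Versioned current = topology.get();
            if (candidate.version <= current.version) {
                return false; // Stale or duplicate write: keep the current value.
            }
            if (topology.compareAndSet(current, candidate)) {
                return true;
            }
            // Another thread won the race; re-read and compare versions again.
        }
    }

    Set<String> nodes() {
        return topology.get().nodes;
    }
}
```

With this shape, it does not matter which of two concurrent initializers finishes last: a value read earlier with a lower version cannot replace a newer one.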



--
This message was sent by Atlassian Jira
(v8.20.10#820010)