Posted to hdfs-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/14 10:12:00 UTC

[jira] [Updated] (HDFS-16804) AddVolume contains a race condition with shutdown block pool

     [ https://issues.apache.org/jira/browse/HDFS-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16804:
----------------------------------
    Labels: pull-request-available  (was: )

> AddVolume contains a race condition with shutdown block pool
> ------------------------------------------------------------
>
>                 Key: HDFS-16804
>                 URL: https://issues.apache.org/jira/browse/HDFS-16804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> AddVolume contains a race condition with shutdownBlockPool, which can leave the global ReplicaMap holding blocks that belong to the removed block pool.
> The newly added volume can also retain an unused BlockPoolSlice for the removed block pool, which causes problems such as an incorrect dfsUsed and an incorrect numBlocks for the volume.
> Let's review the logic of addVolume and shutdownBlockPool in turn.
>  
> AddVolume Logic:
>  * Step1: Get all NamespaceInfos from the blockPoolManager
>  * Step2: Create a temporary FsVolumeImpl object
>  * Step3: Create a BlockPoolSlice for each NamespaceInfo and add it to the temporary FsVolumeImpl
>  * Step4: Scan all blocks of those namespaces on the volume and store them in a temporary ReplicaMap
>  * Step5: Activate the temporary FsVolumeImpl created above (while holding the FsDatasetImpl lock)
>  ** Step5.1: Merge all blocks from the temporary ReplicaMap into the global ReplicaMap
>  ** Step5.2: Add the FsVolumeImpl to the volumes
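>
> For illustration, here is a heavily simplified Java sketch of the flow above. The type names mirror the HDFS ones, but the body paraphrases the steps rather than the real FsDatasetImpl code; getAllNamespaceInfos() and createTemporaryVolume() are invented placeholders:
>
>     // Sketch only: paraphrased from Steps 1~5 above, not copied from
>     // FsDatasetImpl. Step 1 runs before the dataset lock is taken,
>     // which opens the race window with shutdownBlockPool.
>     public void addVolume(StorageLocation location) throws IOException {
>       // Step 1: snapshot the namespaces (no lock held yet).
>       Collection<NamespaceInfo> nsInfos = getAllNamespaceInfos();
>
>       // Step 2: build a temporary volume object.
>       FsVolumeImpl fsVolume = createTemporaryVolume(location);
>
>       // Steps 3~4: one BlockPoolSlice per (possibly stale) namespace,
>       // then scan that namespace's blocks into a temporary ReplicaMap.
>       ReplicaMap tempVolumeMap = new ReplicaMap();
>       for (NamespaceInfo nsInfo : nsInfos) {
>         String bpid = nsInfo.getBlockPoolID();
>         fsVolume.addBlockPool(bpid, conf);
>         fsVolume.getVolumeMap(bpid, tempVolumeMap);
>       }
>
>       // Step 5: activate under the FsDatasetImpl lock. If a block pool
>       // was shut down after Step 1, its stale replicas and stale
>       // BlockPoolSlice are still merged/kept here.
>       synchronized (this) {
>         volumeMap.mergeAll(tempVolumeMap);  // Step 5.1
>         volumes.addVolume(fsVolume);        // Step 5.2
>       }
>     }
>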
> ShutdownBlockPool Logic (with the block pool write lock held):
>  * Step1: Clean up the block pool's replicas in the global ReplicaMap
>  * Step2: Shut down the block pool on all volumes
>  ** Step2.1: Run cleanup operations for the block pool, such as saveReplicas and saveDfsUsed
>  ** Step2.2: Remove the block pool from bpSlices
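>
> A matching sketch of the shutdown side, under the same simplified names (the real code takes the block pool write lock rather than a synchronized block):
>
>     // Sketch only: paraphrased from Steps 1~2 above.
>     public void shutdownBlockPool(String bpid) {
>       synchronized (this) {  // stands in for the block pool write lock
>         // Step 1: drop the pool's replicas from the global ReplicaMap.
>         volumeMap.cleanUpBlockPool(bpid);
>         // Step 2: shut the pool down on every *currently known* volume.
>         // A temporary volume still being built by addVolume is not in
>         // this list yet, so its stale BlockPoolSlice is never cleaned up.
>         for (FsVolumeImpl volume : volumes.getVolumes()) {
>           volume.shutdownBlockPool(bpid);  // Steps 2.1 and 2.2
>         }
>       }
>     }
>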
> The race condition can be reproduced by the following steps:
>  * AddVolume Step1: Get all NamespaceInfos from the blockPoolManager
>  * ShutdownBlockPool Step1: Clean up the block pool's replicas in the global ReplicaMap
>  * ShutdownBlockPool Step2: Shut down the block pool on all volumes
>  * AddVolume Steps 2~5
> Actual result:
>  * The global ReplicaMap contains blocks belonging to the removed block pool
>  * The bpSlices of the new FsVolumeImpl still contains a BlockPoolSlice belonging to the removed block pool
> Expected result:
>  * The global ReplicaMap shouldn't contain any blocks belonging to the removed block pool
>  * The bpSlices of any FsVolumeImpl shouldn't contain any BlockPoolSlice belonging to the removed block pool
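>
> One possible mitigation, sketched under the same simplified names (an illustration of the general shape of a fix, not necessarily the patch attached to this issue): re-validate the snapshotted namespaces inside the locked Step 5 and discard anything that was shut down in the meantime. isBlockPoolActive() is a hypothetical helper:
>
>     synchronized (this) {
>       for (String bpid : tempVolumeMap.getBlockPoolList()) {
>         if (!isBlockPoolActive(bpid)) {
>           // The pool was shut down between Step 1 and Step 5: drop its
>           // stale replicas and its stale BlockPoolSlice before merging.
>           tempVolumeMap.cleanUpBlockPool(bpid);
>           fsVolume.shutdownBlockPool(bpid);
>         }
>       }
>       volumeMap.mergeAll(tempVolumeMap);  // now merges only live pools
>       volumes.addVolume(fsVolume);
>     }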



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org