Posted to hdfs-issues@hadoop.apache.org by "Xiaoqiao He (Jira)" <ji...@apache.org> on 2023/06/14 08:06:00 UTC

[jira] [Commented] (HDFS-17048) FSNamesystem.delete() may cause data residue when the active NameNode crashes or shuts down

    [ https://issues.apache.org/jira/browse/HDFS-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732434#comment-17732434 ] 

Xiaoqiao He commented on HDFS-17048:
------------------------------------

[~liuguanghua] Thanks for your report. Sorry, I don't totally understand this case; would you mind describing a way to reproduce it? IIUC, if the Active NameNode crashes while cleaning blocks, the cluster fails over to a new Active NameNode, which continues the cleanup; and if both NameNodes crash (HA mode), the blocks will be cleaned up once the DataNodes re-send their full block reports, right? Thanks.

> FSNamesystem.delete() may cause data residue when the active NameNode crashes or shuts down
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17048
>                 URL: https://issues.apache.org/jira/browse/HDFS-17048
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>         Environment: hdfs3.3
>            Reporter: liuguanghua
>            Priority: Major
>
> Consider the following scenario:
> (1) A user deletes an HDFS directory containing many blocks.
> (2) Then the active NameNode crashes, shuts down, or is failed over to the standby NameNode by an administrator.
> (3) This may result in residual block data on the DataNodes.
>  
> FSNamesystem.delete() will:
> (1) delete the directory first,
> (2) add toRemovedBlocks into markedDeleteQueue, and
> (3) let the MarkedDeleteBlockScrubber thread consume markedDeleteQueue and delete the blocks.
> If the active NameNode crashes, the blocks still in markedDeleteQueue are lost and will never be deleted. Such blocks cannot be found via the hdfs fsck command, but they are still alive on the DataNode disks.
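>
> To make the failure mode concrete, here is a minimal, self-contained sketch of this produce/consume pattern (hypothetical class and method names; not the actual FSNamesystem code):
>
>     import java.util.List;
>     import java.util.concurrent.BlockingQueue;
>     import java.util.concurrent.LinkedBlockingQueue;
>
>     public class AsyncDeleteSketch {
>         // Stand-in for markedDeleteQueue: batches of block IDs waiting to
>         // be deleted. Purely in-memory, so it is empty after a restart.
>         private final BlockingQueue<List<Long>> markedDeleteQueue =
>                 new LinkedBlockingQueue<>();
>
>         // Steps (1)+(2): the namespace change is made durable via the edit
>         // log, but the queued block list is not persisted anywhere.
>         void delete(List<Long> blocksOfDeletedDir) {
>             markedDeleteQueue.add(blocksOfDeletedDir);
>         }
>
>         // Step (3): the scrubber thread drains the queue and would tell
>         // the DataNodes to drop the replicas. A crash before a batch is
>         // taken leaves those blocks on disk but unknown to the namespace.
>         void startScrubber() {
>             Thread scrubber = new Thread(() -> {
>                 try {
>                     while (true) {
>                         List<Long> batch = markedDeleteQueue.take();
>                         System.out.println("deleting blocks " + batch);
>                     }
>                 } catch (InterruptedException e) {
>                     Thread.currentThread().interrupt();
>                 }
>             }, "MarkedDeleteBlockScrubber");
>             scrubber.setDaemon(true);
>             scrubber.start();
>         }
>     }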
>  
> Thus:
> SummaryA = hdfs dfs -du -s /   (space the namespace still accounts for, replication included)
> SummaryB = sum(DFS Used reported by each DataNode)
> SummaryA < SummaryB
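>
> The same comparison can be made with the public HDFS client API; a sketch (assuming fs.defaultFS on the classpath points at the cluster; note that DFS Used also counts temporary and under-construction data, so a small gap is normal even without residue):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.ContentSummary;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.hdfs.DistributedFileSystem;
>     import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
>
>     public class UsageGap {
>         public static void main(String[] args) throws Exception {
>             Path root = new Path("/");
>             try (DistributedFileSystem dfs =
>                     (DistributedFileSystem) root.getFileSystem(new Configuration())) {
>                 // SummaryA: bytes the namespace still knows about, with
>                 // replication (second column of 'hdfs dfs -du -s /').
>                 ContentSummary cs = dfs.getContentSummary(root);
>                 long summaryA = cs.getSpaceConsumed();
>
>                 // SummaryB: what the DataNodes actually hold on disk.
>                 long summaryB = 0;
>                 for (DatanodeInfo dn : dfs.getDataNodeStats()) {
>                     summaryB += dn.getDfsUsed();
>                 }
>                 System.out.printf("SummaryA=%d SummaryB=%d gap=%d%n",
>                         summaryA, summaryB, summaryB - summaryA);
>             }
>         }
>     }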
>  
> This may be unavoidable. But is there any way to find the blocks that should have been deleted and clean them up?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org