Posted to hdfs-dev@hadoop.apache.org by "Adam Antal (JIRA)" <ji...@apache.org> on 2018/08/14 09:15:00 UTC
[jira] [Resolved] (HDFS-13031) To detect fsimage corruption on the spot
[ https://issues.apache.org/jira/browse/HDFS-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Antal resolved HDFS-13031.
-------------------------------
Resolution: Won't Fix
Created HDFS-13818 as a follow-up Jira for this issue.
> To detect fsimage corruption on the spot
> ----------------------------------------
>
> Key: HDFS-13031
> URL: https://issues.apache.org/jira/browse/HDFS-13031
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Environment:
> Reporter: Yongjun Zhang
> Assignee: Adam Antal
> Priority: Major
>
> Since we fixed HDFS-9406, new cases of similar fsimage corruption have been reported from the field. To reproduce the corruption, we need a good fsimage plus the edit logs to replay on top of it. However, by the time the corruption is detected (usually at a later NN restart), the good fsimage has typically already been deleted.
> We need a way to detect fsimage corruption on the spot. Here is what I think we could do:
> # After the SNN creates a new fsimage, it spawns a new, modified NN process (an NN started with some new command-line args) that only loads the fsimage and does nothing else.
> # If that process fails, the running SNN either a) backs up the fsimage + edit logs, or b) stops checkpointing. In either case, it needs some way to flag to the user that the fsimage is corrupt.
> In step 2, option a) requires a new NN->JN API to back up edit logs, while option b) changes the SNN's behavior and is somewhat incompatible.
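The two-step proposal can be sketched as a small control flow. This is a hypothetical Python illustration, not Hadoop code: `CheckpointVerifier`, `OnCorruption`, and the `verifier_argv` command are invented names, because the JIRA only says the checker would be an "NN with some new command line args" and does not name any flag. The sketch also simplifies option a) to copying local files; with JournalNodes, backing up edit logs would need the new NN->JN API discussed in the issue.

```python
import enum
import shutil
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Sequence


class OnCorruption(enum.Enum):
    BACKUP = "a"              # option a): keep copies of the fsimage + edit logs
    STOP_CHECKPOINTING = "b"  # option b): stop producing new checkpoints


@dataclass
class CheckpointVerifier:
    # Hypothetical argv prefix for a "load the fsimage, then exit" process.
    # No real Hadoop command-line flag is assumed here.
    verifier_argv: Sequence[str]
    policy: OnCorruption
    backup_dir: Path
    # The SNN's checkpoint loop would consult this before each checkpoint.
    checkpointing_enabled: bool = True
    # Stand-in for "raise a flag to the user" (e.g. a metric or log alert).
    alerts: List[str] = field(default_factory=list)

    def verify(self, fsimage: Path, edit_logs: Sequence[Path]) -> bool:
        """Step 1: load the new fsimage in a separate process.

        Returns True if the verifier process exits with status 0.
        """
        result = subprocess.run([*self.verifier_argv, str(fsimage)],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True

        # Step 2: the load failed, so react according to the chosen policy.
        if self.policy is OnCorruption.BACKUP:
            # Simplification: copies local files only. Edit logs held on
            # JournalNodes would need a new NN->JN API to be backed up.
            self.backup_dir.mkdir(parents=True, exist_ok=True)
            for f in [fsimage, *edit_logs]:
                shutil.copy2(f, self.backup_dir / f.name)
        else:
            self.checkpointing_enabled = False

        self.alerts.append(f"fsimage {fsimage.name} failed to load "
                           f"(exit {result.returncode}): {result.stderr.strip()}")
        return False
```

Running the check in a separate process keeps a crash or hang while loading a corrupt image from taking down the SNN itself. In a test, `verifier_argv` can be something like `[sys.executable, "-c", "import sys; sys.exit(1)"]` to simulate a failed load.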