Posted to issues@ozone.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2021/03/09 11:38:00 UTC

[jira] [Created] (HDDS-4935) Ratis server couldn't be recovered from failed initialization state

Marton Elek created HDDS-4935:
---------------------------------

             Summary: Ratis server couldn't be recovered from failed initialization state
                 Key: HDDS-4935
                 URL: https://issues.apache.org/jira/browse/HDDS-4935
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Marton Elek


I found this problem while testing Ratis 2.0.0-rc3 and earlier versions.

I noticed that in some cases the Ozone Manager (with Ratis enabled) couldn't be started anymore (see HDDS-4703 for details).

After some investigation I found the following problem:

 1. The Ratis server is initialized BEFORE the OM RPC server (OzoneManager.startRpcServer).
 2. If the RPC server fails to start (due to missing DNS, for example), the Ratis server is stopped during its initialization.
 3. AtomicOutputStream can leave tmp files behind (such as raft-meta.tmp, if it has not been renamed yet).
 4. After the DNS problem is fixed, the OM still can't be started, because RaftStorageImpl.analyzeAndRecoverStorage requires a FORMATTED or empty (!!!) directory. A directory with a leftover tmp file is not empty.

{code}
  private StorageState analyzeAndRecoverStorage(boolean toLock) throws IOException {
    StorageState storageState = storageDir.analyzeStorage(toLock);
    if (storageState == StorageState.NORMAL) {
      // ...
    } else if (storageState == StorageState.NOT_FORMATTED &&
        storageDir.isCurrentEmpty()) {
      // never reached if a .tmp file is left over from a previous attempt
      format();
      return StorageState.NORMAL;
    } else {
      return storageState;
    }
  }
{code}
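
The sequence in steps 3 and 4 can be reproduced outside of Ratis with plain java.nio. The following is only a minimal sketch: the class and method names (LeftoverTmpDemo, interruptedAtomicWrite, isEmpty) are hypothetical stand-ins, not the Ratis API, and the "atomic write" is simulated by writing the tmp file and never renaming it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LeftoverTmpDemo {

    /** Hypothetical stand-in for an emptiness check on the storage directory. */
    public static boolean isEmpty(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a write-to-tmp-then-rename sequence that stops before the rename. */
    public static Path interruptedAtomicWrite(Path dir) {
        try {
            Path tmp = dir.resolve("raft-meta.tmp");
            Files.write(tmp, "placeholder".getBytes(StandardCharsets.UTF_8));
            // The rename to the final file name never happens: the server was stopped here.
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the directory was empty before the failed write and non-empty after it. */
    public static boolean leftoverBlocksFormat() {
        try {
            Path dir = Files.createTempDirectory("raft-current");
            boolean emptyBefore = isEmpty(dir);
            Path tmp = interruptedAtomicWrite(dir);
            boolean emptyAfter = isEmpty(dir);
            // Remove the scratch files created by this demo.
            Files.delete(tmp);
            Files.delete(dir);
            return emptyBefore && !emptyAfter;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("format blocked by leftover tmp file: " + leftoverBlocksFormat());
    }
}
```

A single leftover file is enough to make the emptiness check fail, so the branch that calls format() is skipped on every later start.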

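The problem is that `cleanMetaTmpFile()` is called only in the first branch (the NORMAL state), and not before the check whether the directory is empty. A leftover tmp file is therefore never removed in the NOT_FORMATTED case, and the format() branch is never reached.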
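
One possible direction, shown here only as a sketch and not as the actual Ratis change, is to remove leftover tmp files before the emptiness check. All names below (CleanBeforeFormatSketch, cleanTmpFiles, isEmpty) are hypothetical, and the sketch assumes that any *.tmp file in the directory is safe to delete.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class CleanBeforeFormatSketch {

    /** Deletes every *.tmp file directly inside dir (assumption: such files are always disposable). */
    public static void cleanTmpFiles(Path dir) {
        try (DirectoryStream<Path> tmpFiles = Files.newDirectoryStream(dir, "*.tmp")) {
            for (Path tmp : tmpFiles) {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical stand-in for an emptiness check on the storage directory. */
    public static boolean isEmpty(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a directory holding only a leftover tmp file is empty after cleanup. */
    public static boolean emptyAfterCleanup() {
        try {
            Path dir = Files.createTempDirectory("raft-current");
            Files.write(dir.resolve("raft-meta.tmp"), "placeholder".getBytes(StandardCharsets.UTF_8));
            cleanTmpFiles(dir);            // cleanup runs first ...
            boolean empty = isEmpty(dir);  // ... then the emptiness check
            Files.delete(dir);             // remove the scratch directory
            return empty;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("directory empty after cleanup: " + emptyAfterCleanup());
    }
}
```

With the cleanup ordered before the check, a directory that contains nothing but a leftover tmp file is treated as empty again and could be formatted on the next start.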


