You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "adoroszlai (via GitHub)" <gi...@apache.org> on 2023/02/19 20:55:40 UTC

[GitHub] [ozone] adoroszlai opened a new pull request, #4287: HDDS-7991. Intermittent failure in FS contract tests

adoroszlai opened a new pull request, #4287:
URL: https://github.com/apache/ozone/pull/4287

   ## What changes were proposed in this pull request?
   
   Fix intermittent failures in `ITestOzoneContractRootDir` and `ITestOzoneContractDelete`, which were caused by `KeyManagerImpl#createFakeDirIfShould` due to:
   
   1. `OMKeysDeleteRequest` first marks keys deleted in the cache, removal is propagated to the DB in a background thread.  `createFakeDirIfShould` checks the key table, but not the cache.  Thus it intermittently finds the deleted entry, returning a fake dir that might no longer be needed.
   2. `createFakeDirIfShould` may even return a fake dir for a normal directory that was just deleted.
   
   ```
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneFileSystem.java:mkdirs(761)) - mkdir() path:/testmkdirdepth1 
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneClientAdapterImpl.java:createDirectory(312)) - creating dir for key:testmkdirdepth1
   [OM StateMachine ApplyTransaction Thread - 0] DEBUG (OMDirectoryCreateRequest.java:logResult(323)) - Directory created. Volume:volume21291, Bucket:bucket26900, Key:testmkdirdepth1
   [JUnit-testMkDirDepth1[0]] DEBUG (BasicOzoneFileSystem.java:delete(540)) - Delete path /testmkdirdepth1 - recursive true
   [JUnit-testMkDirDepth1[0]] DEBUG (BasicOzoneFileSystem.java:delete(567)) - delete: Path is a directory: /testmkdirdepth1
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneFileSystem.java:innerDelete(518)) - delete() path:/testmkdirdepth1 recursive:true
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneFileSystem.java:processKey(501)) - deleting key:[testmkdirdepth1/]
   [OM StateMachine ApplyTransaction Thread - 0] DEBUG (OMKeysDeleteRequest.java:validateAndUpdateCache(223)) - Keys delete success. Volume:volume21291, Bucket:bucket26900, Keys:testmkdirdepth1/
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneFileSystem.java:getFileStatus(779)) - getFileStatus() path:/testmkdirdepth1
   [JUnit-testMkDirDepth1[0]] TRACE (BasicOzoneFileSystem.java:pathToKey(1046)) - path for key:/testmkdirdepth1 is:o3fs://bucket26900.volume21291/testmkdirdepth1
   [IPC Server handler 2 on default port 45085] DEBUG (KeyManagerImpl.java:createFakeDirIfShould(1193)) - Returning fake dir testmkdirdepth1/ for /volume21291/bucket26900/testmkdirdepth1/ -> /volume21291/bucket26900/testmkdirdepth1/
   ```
   
   The PR proposes to fix this by:
   
   1. requiring that the item which makes the fake dir necessary is not deleted,
   2. requiring the fake dir to be strictly a prefix of the item, not equal to it.
   
   Also check the cache for any matching entries first.
   
   Also refactor contract tests to use `BeforeParam` and `AfterParam` to hook into the JUnit test lifecycle, instead of restarting the cluster from the test constructor (this turned out not to be the cause of the bug, but worth keeping).
   
   https://issues.apache.org/jira/browse/HDDS-7991
   
   ## How was this patch tested?
   
   Executed the intermittent contract tests repeatedly:
   
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4216890609 (100/100 passed)
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4217247319 (100/100 passed)
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4217317817 (non-FSO only, 100/100 passed)
   
   It probably also fixes HDDS-7940, so also included `TestOzoneFileSystem` and `TestRootedOzoneFileSystem` in one batch:
   
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4217350948 (90/91 passed; one failure in `TestOzoneFileSystem.testListStatusWithIntermediateDir`, which doesn't include delete, hence not affected by this bug/fix)
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4217340348


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #4287:
URL: https://github.com/apache/ozone/pull/4287#discussion_r1111675074


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java:
##########
@@ -1176,24 +1176,51 @@ private OzoneFileStatus getOzoneFileStatus(OmKeyArgs args,
    */
   private OmKeyInfo createFakeDirIfShould(String volume, String bucket,
       String keyName, BucketLayout layout) throws IOException {
-    OmKeyInfo fakeDirKeyInfo = null;
     String dirKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
     String targetKey = OzoneFSUtils.addTrailingSlashIfNeeded(
         metadataManager.getOzoneKey(volume, bucket, keyName));
+
+    Table<String, OmKeyInfo> keyTable = metadataManager.getKeyTable(layout);
+    Iterator<Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>>> cacheIterator =
+        keyTable.cacheIterator();
+    while (cacheIterator.hasNext()) {

Review Comment:
   KeyTableCache is `PartialTableCache`, which is implemented by `ConcurrentHashMap`. Will there be a problem If the iterator visits the entries out of order?
   I think it's fine since the exact cache hit should be filtered in https://github.com/apache/ozone/pull/4287#issuecomment-1436492008
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai merged pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai merged PR #4287:
URL: https://github.com/apache/ozone/pull/4287


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #4287:
URL: https://github.com/apache/ozone/pull/4287#issuecomment-1439726622

   Thanks @kaijchen for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on PR #4287:
URL: https://github.com/apache/ozone/pull/4287#issuecomment-1436492008

   @adoroszlai thanks a lot for the fix!
   
   Directory with the exact name is expected to be filtered here.
   Perhaps it's because of the cache hits but the cached value is `null`? 
   
   https://github.com/apache/ozone/blob/d2e3e7580dff33a262796a95d95f17660aad6f8c/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L1111-L1120


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #4287:
URL: https://github.com/apache/ozone/pull/4287#issuecomment-1436608298

   > Perhaps it's because of the cache hits but the cached value is `null`?
   
   Yes, that's how the item is marked as deleted.  This is accounted for by the `!isKeyDeleted` condition.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on code in PR #4287:
URL: https://github.com/apache/ozone/pull/4287#discussion_r1111706488


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java:
##########
@@ -1176,24 +1176,51 @@ private OzoneFileStatus getOzoneFileStatus(OmKeyArgs args,
    */
   private OmKeyInfo createFakeDirIfShould(String volume, String bucket,
       String keyName, BucketLayout layout) throws IOException {
-    OmKeyInfo fakeDirKeyInfo = null;
     String dirKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
     String targetKey = OzoneFSUtils.addTrailingSlashIfNeeded(
         metadataManager.getOzoneKey(volume, bucket, keyName));
+
+    Table<String, OmKeyInfo> keyTable = metadataManager.getKeyTable(layout);
+    Iterator<Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>>> cacheIterator =
+        keyTable.cacheIterator();
+    while (cacheIterator.hasNext()) {

Review Comment:
   The loop over the cache does not assume any order, iterates until the first match, so it should be OK.
   
   I'm not sure about the effect on performance, it may be better or worse with this.  I'm OK with omitting this first loop completely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #4287: HDDS-7991. Intermittent failure in FS contract tests

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #4287:
URL: https://github.com/apache/ozone/pull/4287#discussion_r1111718406


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java:
##########
@@ -1176,24 +1176,51 @@ private OzoneFileStatus getOzoneFileStatus(OmKeyArgs args,
    */
   private OmKeyInfo createFakeDirIfShould(String volume, String bucket,
       String keyName, BucketLayout layout) throws IOException {
-    OmKeyInfo fakeDirKeyInfo = null;
     String dirKey = OzoneFSUtils.addTrailingSlashIfNeeded(keyName);
     String targetKey = OzoneFSUtils.addTrailingSlashIfNeeded(
         metadataManager.getOzoneKey(volume, bucket, keyName));
+
+    Table<String, OmKeyInfo> keyTable = metadataManager.getKeyTable(layout);
+    Iterator<Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>>> cacheIterator =
+        keyTable.cacheIterator();
+    while (cacheIterator.hasNext()) {

Review Comment:
   Thanks for the confirmation. Let's keep the first loop then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org