You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/10/12 13:31:04 UTC

[GitHub] [ozone] adoroszlai commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

adoroszlai commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r993465720


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,98 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);

Review Comment:
   Do we need to wrap `ex`?  How about simply `throw ex`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org