You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/10/08 08:47:49 UTC

[GitHub] [ozone] Xushaohong opened a new pull request, #3811: HDDS-7300. Conflict between full data scan and block deletion

Xushaohong opened a new pull request, #3811:
URL: https://github.com/apache/ozone/pull/3811

   ## What changes were proposed in this pull request?
   
   We have enabled the full data scan and found that one container is marked as unhealthy due to the conflict between full data scan and block deletion.
   
   The block deleting service first deletes the block and then updates the DB, while the data scan first scans the DB and then checks the existence of the blocks. 
   
   Once getting the DB record and finds the block not existing in the FS, the `Missing chunk file exception` will be thrown and the container will be marked as unhealthy.
   
    
   
   The block deleting service has a write lock during the process but the data scan has no read lock to avoid the conflict.
   
   Even by double checking the block if the block is still in the block-data table when the block is not found on the FS for the first time, the problem still happens. The flush time of DB batch operation is not predictable, so the direct second retrieval may not be a good solution as we cannot determine a fixed delay that could guarantee every batch could be flushed after this delay.
   
   This PR adds a read lock in full data scan to avoid concurrent RW the block-data table.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7300
   
   ## How was this patch tested?
   /
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992076932


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);

Review Comment:
   It would be better to abstract a method which takes `BlockData` as input and do the verification.
   So we can make unit tests easier to write for this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992007985


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);
+        // In EC, client may write empty putBlock in padding block nodes.
+        // So, we need to make sure, chunk length > 0, before declaring
+        // the missing chunk file.
+        if (bdata != null && bdata.getChunks().size() > 0 && bdata
+            .getChunks().get(0).getLen() > 0) {
+          throw new IOException(

Review Comment:
   Can we check `getBlockDataFromDBWithLock() == null` here before throwing exception?
   
   PS: Do we need a lock to check DB for the second time here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Galsza commented on pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Galsza commented on PR #3811:
URL: https://github.com/apache/ozone/pull/3811#issuecomment-1275975502

   @adoroszlai could you restart the failing jobs here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992028884


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);
+        // In EC, client may write empty putBlock in padding block nodes.
+        // So, we need to make sure, chunk length > 0, before declaring
+        // the missing chunk file.
+        if (bdata != null && bdata.getChunks().size() > 0 && bdata
+            .getChunks().get(0).getLen() > 0) {
+          throw new IOException(

Review Comment:
   I mean something like this, (yes the lock is necessary):
   
   ```suggestion
             if (getBlockDataFromDBWithLock(db, block) == null) {
               if (LOG.isDebugEnabled()) {
                 LOG.debug("Scanned outdated blockData {} in container {}.",
                     block, containerID);
               }
             } else {
               throw new IOException(
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r993060584


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -851,7 +852,15 @@ public boolean scanData(DataTransferThrottler throttler, Canceler canceler) {
         new KeyValueContainerCheck(containerData.getMetadataPath(), config,
             containerId, containerData.getVolume());
 
-    return checker.fullCheck(throttler, canceler);
+    // Lock here avoids concurrent RW the block-data table by ongoing
+    // block deletion and the checker.
+    readLock();

Review Comment:
   Yes we have encountered this problem in our testing environment on  c93d0ea6f9bd82529639ff622bbcdbd3a6f4c659. The BlockDeletingService is throttled by the lock, and reports warning like this 
   ```
   2022-10-12 14:42:08,632 [BlockDeletingService#4] WARN org.apache.hadoop.hdds.utils.BackgroundService: BlockDeletingService Background task execution took 3103753128975ns > 300000000000ns(timeout)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on PR #3811:
URL: https://github.com/apache/ozone/pull/3811#issuecomment-1274275285

   @adoroszlai  
   Thx for the notice!  I have submitted a new commit.  Ideally, we don't hold the lock for checking the container, only when an exception happens do we double-check if the block data still exists. PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992178348


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);

Review Comment:
   Could you give an example?  
    Seems **getBlockDataFromDB(WithLock)** is already doing the retrieval and verification, and the logic to deal with the returned block data is different. If it returns null, meaning the concurrent modification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3811:
URL: https://github.com/apache/ozone/pull/3811#issuecomment-1278932248

   Thanks @Xushaohong, @JacksonYao287 for the patches, @errose28, @Galsza, @kaijchen for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on pull request #3811: HDDS-7300. Conflict between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3811:
URL: https://github.com/apache/ozone/pull/3811#issuecomment-1272302698

   "Race condition" might be a more accurate word than "Conflict" in title.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Galsza commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Galsza commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r991288472


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -851,7 +852,15 @@ public boolean scanData(DataTransferThrottler throttler, Canceler canceler) {
         new KeyValueContainerCheck(containerData.getMetadataPath(), config,
             containerId, containerData.getVolume());
 
-    return checker.fullCheck(throttler, canceler);
+    // Lock here avoids concurrent RW the block-data table by ongoing
+    // block deletion and the checker.
+    readLock();

Review Comment:
   A side effect of this change could be that we acquire the readlock on a container for 20+ minutes. It will NOT cause a problem however, as this is a closed container and the block will be deleted eventually anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r995294754


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,98 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(block, throttler, canceler);
+        } catch (IOException ex) {

Review Comment:
   there are two case we can catch IOException here, one is chunkFile not found , the other is verifyChecksum fail。so we need to check if ex is an instance of OzoneChecksumException. if yes, it means verifyChecksum error , we just throw it directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r994046535


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,98 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);

Review Comment:
   Good idea, done~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r993465720


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,98 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);

Review Comment:
   Do we need to wrap `ex`?  How about simply `throw ex`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r995309558


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,98 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(block, throttler, canceler);
+        } catch (IOException ex) {

Review Comment:
   Thx for the suggestion! We could catch the checksum exception in advance to avoid wasting lock



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992080035


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);
+        // In EC, client may write empty putBlock in padding block nodes.
+        // So, we need to make sure, chunk length > 0, before declaring
+        // the missing chunk file.
+        if (bdata != null && bdata.getChunks().size() > 0 && bdata
+            .getChunks().get(0).getLen() > 0) {
+          throw new IOException(

Review Comment:
   Ok, got your point, This is not reasonable. The deletion could happen when doing checksum, so we don't need to add the check here but after checksum instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai merged pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
adoroszlai merged PR #3811:
URL: https://github.com/apache/ozone/pull/3811


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Galsza commented on pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Galsza commented on PR #3811:
URL: https://github.com/apache/ozone/pull/3811#issuecomment-1273314712

   Looking good to me, thank you for this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] Xushaohong commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992017313


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);
+        // In EC, client may write empty putBlock in padding block nodes.
+        // So, we need to make sure, chunk length > 0, before declaring
+        // the missing chunk file.
+        if (bdata != null && bdata.getChunks().size() > 0 && bdata
+            .getChunks().get(0).getLen() > 0) {
+          throw new IOException(

Review Comment:
   
   > Can we check `getBlockDataFromDBWithLock() == null` here before throwing exception?
   > 
   No. To minimize the lock granularity,  here add the lock in the exception to double confirm if the block date is outdated. If check it before the exception, it will be as the same effect as what the first commit does. 
   
   > PS: Do we need a lock to check DB for the second time here?
   
   The getBlockDataFromDBWithLock() is acquiring the lock
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992028884


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);
+        // In EC, client may write empty putBlock in padding block nodes.
+        // So, we need to make sure, chunk length > 0, before declaring
+        // the missing chunk file.
+        if (bdata != null && bdata.getChunks().size() > 0 && bdata
+            .getChunks().get(0).getLen() > 0) {
+          throw new IOException(

Review Comment:
   I mean something like this, (yes the lock is necessary):
   
   ```suggestion
             if (getBlockDataFromDBWithLock(db, block) == null) {
               if (LOG.isDebugEnabled()) {
                 LOG.debug("Scanned outdated blockData {} in container {}.",
                     block, containerID);
               }
               break; // skip rest chunks of this block
             } else {
               throw new IOException(
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992076932


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);

Review Comment:
   It would be better to abstract a method which takes `BlockData` as input and do the verification.
   So we can make unit tests easier to write for this issue.
   
   (should be coupled with this https://github.com/apache/ozone/pull/3811/files#r992007985)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on a diff in pull request #3811: HDDS-7300. Race condition between full data scan and block deletion

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3811:
URL: https://github.com/apache/ozone/pull/3811#discussion_r992198466


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java:
##########
@@ -230,44 +232,100 @@ private void scanData(DataTransferThrottler throttler, Canceler canceler)
 
     onDiskContainerData.setDbFile(dbFile);
 
-    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
-
     try (DBHandle db = BlockUtils.getDB(onDiskContainerData, checkConfig);
         BlockIterator<BlockData> kvIter = db.getStore().getBlockIterator(
             onDiskContainerData.getContainerID(),
             onDiskContainerData.getUnprefixedKeyFilter())) {
 
       while (kvIter.hasNext()) {
         BlockData block = kvIter.nextBlock();
-        for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
-          File chunkFile = layout.getChunkFile(onDiskContainerData,
-              block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
-
-          if (!chunkFile.exists()) {
-            // concurrent mutation in Block DB? lookup the block again.
-            String blockKey =
-                onDiskContainerData.blockKey(block.getBlockID().getLocalID());
-            BlockData bdata = db.getStore()
-                    .getBlockDataTable()
-                    .get(blockKey);
-            // In EC, client may write empty putBlock in padding block nodes.
-            // So, we need to make sure, chunk length > 0, before declaring
-            // the missing chunk file.
-            if (bdata != null && bdata.getChunks().size() > 0 && bdata
-                .getChunks().get(0).getLen() > 0) {
-              throw new IOException(
-                  "Missing chunk file " + chunkFile.getAbsolutePath());
+
+        // If holding read lock for the entire duration, including wait() calls
+        // in DataTransferThrottler, would effectively make other threads
+        // throttled.
+        // Here try optimistically and retry with the container lock to
+        // make sure reading the latest record. If the record is just removed,
+        // the block should be skipped to scan.
+        try {
+          scanBlock(db, block, throttler, canceler);
+        } catch (IOException ex) {
+          if (getBlockDataFromDBWithLock(db, block) == null) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Scanned outdated blockData {} in container {}.",
+                  block, containerID);
             }
-          } else if (chunk.getChecksumData().getType()
-              != ContainerProtos.ChecksumType.NONE) {
-            verifyChecksum(block, chunk, chunkFile, layout, throttler,
-                canceler);
+          } else {
+            throw new IOException(ex);
           }
         }
       }
     }
   }
 
+  /**
+   *  Attempt to read the block data without the container lock.
+   *  The block onDisk might be in modification by other thread and not yet
+   *  flushed to DB, so the content might be outdated.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDB(DBHandle db, BlockData block)
+      throws IOException {
+    String blockKey =
+        onDiskContainerData.blockKey(block.getBlockID().getLocalID());
+    return db.getStore().getBlockDataTable().get(blockKey);
+  }
+
+  /**
+   *  Attempt to read the block data with the container lock.
+   *  The container lock ensure the latest DB record could be retrieved, since
+   *  other block related write operation will acquire the container write lock.
+   *
+   * @param db DB of container
+   * @param block last queried blockData
+   * @return blockData in DB
+   * @throws IOException
+   */
+  private BlockData getBlockDataFromDBWithLock(DBHandle db, BlockData block)
+      throws IOException {
+    container.readLock();
+    try {
+      return getBlockDataFromDB(db, block);
+    } finally {
+      container.readUnlock();
+    }
+  }
+
+  private void scanBlock(DBHandle db, BlockData block,
+      DataTransferThrottler throttler, Canceler canceler) throws IOException {
+    ContainerLayoutVersion layout = onDiskContainerData.getLayoutVersion();
+
+    for (ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+      File chunkFile = layout.getChunkFile(onDiskContainerData,
+          block.getBlockID(), ChunkInfo.getFromProtoBuf(chunk));
+
+      if (!chunkFile.exists()) {
+        // concurrent mutation in Block DB? lookup the block again.
+        BlockData bdata = getBlockDataFromDB(db, block);

Review Comment:
   Oops, I didn't realize this is the second time we retrive block data from DB.
   
   In this case, we can just pass different `DBHandle` and `BlockData` to `scanBlock` for the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org