Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/08/06 02:36:10 UTC

[GitHub] [hadoop-ozone] ChenSammi opened a new pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes information…

ChenSammi opened a new pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295


   https://issues.apache.org/jira/browse/HDDS-4037


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-672834932


   Thanks @adoroszlai for reviewing the code. 




[GitHub] [hadoop-ozone] avijayanhwx edited a comment on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
avijayanhwx edited a comment on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-670551035


   @ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier. 
   
   Sure. Currently there are two issues. 
   1.  The container keyCount on the datanode side is incorrect.  keyCount is increased by 1 for every putBlock command.  This is correct for the FilePerChunk container layout but incorrect for the FilePerBlock layout. With the FilePerBlock layout, the keyCount value ends up higher than the actual number of blocks. 
   2.  The container keyCount and bytesUsed held on the SCM side are based on the container report data from each datanode.  In AbstractContainerReportHandler#updateContainerStats, SCM currently updates the in-memory container data only if the reported value is bigger.  This is right for OPEN containers, but it does not hold for CLOSED containers. 
   
   In our cluster, there are a lot of containers whose blocks have all been deleted, but I can still get the keyCount and bytesUsed data through the CLI "sh container info" command.
   
   
   
   




[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#discussion_r468409907



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -766,6 +773,32 @@ private void handleUnstableContainer(final ContainerInfo container,
 
   }
 
+  /**
+   * Check and update Container key count and used bytes based on its replicas'
+   * data.
+   */
+  private void checkAndUpdateContainerState(final ContainerInfo container,

Review comment:
       I think something along the lines of `updateContainerStatsFromReplicas` would be more descriptive, especially since container state (closed, etc.) is not being changed.






[GitHub] [hadoop-ozone] ChenSammi edited a comment on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-671191882


   > 
   > 
   > @ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier.
   > 
   
   Sure. Currently there are two issues.     
    1. The container keyCount on the datanode side is incorrect.  keyCount is increased by 1 for every putBlock command.  This is correct for the FilePerChunk container layout but incorrect for the FilePerBlock layout. With the FilePerBlock layout, the keyCount value ends up higher than the actual number of blocks.
   
    2. The container keyCount and bytesUsed held on the SCM side are based on the container report data from each datanode.  In AbstractContainerReportHandler#updateContainerStats, SCM currently updates the in-memory container data only if the reported value is bigger.  This is right for OPEN containers, but it does not hold for CLOSED containers. 
   
    In our cluster, there are a lot of containers whose blocks have all been deleted, but I can still get the keyCount and bytesUsed data through the CLI "sh container info" command.
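
The second issue can be illustrated with a minimal sketch. This is not the actual SCM code: `ContainerRecord` and `applyReport` are hypothetical names, and the sketch only assumes the "update only if the reported value is bigger" rule described above.

```java
public class GrowOnlyStatsDemo {

  /** Hypothetical stand-in for the container record SCM keeps in memory. */
  static final class ContainerRecord {
    long keyCount;
    long usedBytes;
  }

  /** Applies a replica report only where the reported value is larger. */
  static void applyReport(ContainerRecord c, long reportedKeys,
      long reportedBytes) {
    if (reportedKeys > c.keyCount) {
      c.keyCount = reportedKeys;
    }
    if (reportedBytes > c.usedBytes) {
      c.usedBytes = reportedBytes;
    }
  }

  public static void main(String[] args) {
    ContainerRecord c = new ContainerRecord();
    applyReport(c, 3, 600_834_048L); // report sent while the data exists
    applyReport(c, 0, 0);            // report sent after all blocks are deleted
    // The second report is ignored, so the stale values are still shown.
    System.out.println(c.keyCount + " keys, " + c.usedBytes + " bytes");
  }
}
```

Under this rule the record can never shrink, which would explain why a container whose blocks are all deleted still shows its old keyCount and bytesUsed.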
   
   




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-671699950


   > However, there is another case where similar problem still exists: if key is so large that it spans multiple blocks, `numberOfKeys` will include each block separately.
   > 
   > Test with patch using default block size of 256MB (after previous 64MB test):
   > 
   > ```
   > $ ozone freon ockg -n1 -t1 -s 536870912 -p 512MB
   > ...
   > Successful executions: 1
   > 
   > $ ozone admin container list
   > {
   >   "state" : "OPEN",
   >   "replicationFactor" : "THREE",
   >   "replicationType" : "RATIS",
   >   "usedBytes" : 600834048,
   >   "numberOfKeys" : 3,
   >   "lastUsed" : "2020-08-10T08:32:43.112Z",
   >   "stateEnterTime" : "2020-08-10T08:22:39.855Z",
   >   "owner" : "f430cf45-c4ba-4271-8b3a-22bd314be60d",
   >   "containerID" : 1,
   >   "deleteTransactionId" : 0,
   >   "sequenceId" : 0,
   >   "open" : true
   > }
   > ```
   > 
   > Here `numberOfKeys` should be 2, but it is 3 = 1 (for 64MB key) + 2 (for 512MB key).
   
   @adoroszlai  thanks for the review and comments.  The keyCount in each container is different from the keyCount from the OM point of view.  The keyCount in a container is actually the block count (FilePerBlock) or the chunk count (FilePerChunk).  So in the above case, a numberOfKeys of 3 is the expected value. 
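
The arithmetic behind the value 3 can be checked with a short sketch. It assumes the FilePerBlock layout, the 256MB block size mentioned in the quoted test, and that both test keys landed in the same container; `blocksFor` is a hypothetical helper, not an Ozone API.

```java
public class BlockCountSketch {

  /** Number of blocks needed for a key: ceil(keySize / blockSize). */
  static long blocksFor(long keySizeBytes, long blockSizeBytes) {
    return (keySizeBytes + blockSizeBytes - 1) / blockSizeBytes;
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    final long blockSize = 256 * mb;

    long smallKey = blocksFor(64 * mb, blockSize);  // 64MB key fits in 1 block
    long largeKey = blocksFor(512 * mb, blockSize); // 512MB key spans 2 blocks

    System.out.println(smallKey + largeKey); // prints 3
  }
}
```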
   
   




[GitHub] [hadoop-ozone] github-actions[bot] commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-672894915


   To re-run CI checks, please follow these steps with the source branch checked out:
   ```
   git commit --allow-empty -m 'trigger new CI check'
   git push
   ```




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-671702603


   > I don't think this problem is specific to the FilePerBlock layout. PutBlock may be called for the same block multiple times in both layouts, depending on the buffer flush and block size configuration.
   > 
   > Config:
   > 
   > ```
   > ozone.client.stream.buffer.size=1MB
   > ozone.client.stream.buffer.flush.size=4MB
   > ozone.scm.chunk.size=1MB
   > ozone.scm.container.size=1GB
   > ```
   > 
   > Test with Ozone 0.5.0, before FilePerBlock was introduced:
   > 
   > ```
   > $ ozone freon ockg -n1 -t1 -s 67108864 -p 64MB
   > ...
   > Successful executions: 1
   > 
   > $ ozone scmcli container list
   > {
   >   "state" : "OPEN",
   >   "replicationFactor" : "THREE",
   >   "replicationType" : "RATIS",
   >   "usedBytes" : 67108864,
   >   "numberOfKeys" : 16,
   >   "lastUsed" : 156717083,
   >   "stateEnterTime" : 156589360,
   >   "owner" : "5b4981a0-61f4-42d1-af2d-b990e9a77741",
   >   "containerID" : 1,
   >   "deleteTransactionId" : 0,
   >   "sequenceId" : 0,
   >   "open" : true
   > }
   > ```
   > 
   > I have verified that this case is fixed by the patch.
   
   @adoroszlai, here I think the WriteBlock command will be called multiple times rather than the putBlock command, right? 
   




[GitHub] [hadoop-ozone] avijayanhwx edited a comment on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
avijayanhwx edited a comment on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-670551035


   @ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier. 




[GitHub] [hadoop-ozone] ChenSammi edited a comment on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi edited a comment on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-671702603


   > I don't think this problem is specific to the FilePerBlock layout. PutBlock may be called for the same block multiple times in both layouts, depending on the buffer flush and block size configuration.
   > 
   > Config:
   > 
   > ```
   > ozone.client.stream.buffer.size=1MB
   > ozone.client.stream.buffer.flush.size=4MB
   > ozone.scm.chunk.size=1MB
   > ozone.scm.container.size=1GB
   > ```
   > 
   > Test with Ozone 0.5.0, before FilePerBlock was introduced:
   > 
   > ```
   > $ ozone freon ockg -n1 -t1 -s 67108864 -p 64MB
   > ...
   > Successful executions: 1
   > 
   > $ ozone scmcli container list
   > {
   >   "state" : "OPEN",
   >   "replicationFactor" : "THREE",
   >   "replicationType" : "RATIS",
   >   "usedBytes" : 67108864,
   >   "numberOfKeys" : 16,
   >   "lastUsed" : 156717083,
   >   "stateEnterTime" : 156589360,
   >   "owner" : "5b4981a0-61f4-42d1-af2d-b990e9a77741",
   >   "containerID" : 1,
   >   "deleteTransactionId" : 0,
   >   "sequenceId" : 0,
   >   "open" : true
   > }
   > ```
   > 
   > I have verified that this case is fixed by the patch.
   
   @adoroszlai, here I think the WriteChunk command will be called multiple times rather than the putBlock command, right? 
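
For reference, the over-counting described in the quoted comment can be sketched as follows. This is an illustration only, not the datanode implementation: the class, its map-backed block table, and both counters are hypothetical. It assumes putBlock is issued once per 4MB flush for the same block, which would be consistent with the reported `numberOfKeys` of 16 for a single 64MB key.

```java
import java.util.HashMap;
import java.util.Map;

public class PutBlockCountSketch {

  /** Hypothetical in-memory stand-in for a container's block table. */
  private final Map<Long, Long> blockLengths = new HashMap<>();

  /** Counter that is bumped on every putBlock call (the reported behavior). */
  private long naiveKeyCount;

  /** Records the committed length of a block; repeated calls overwrite it. */
  void putBlock(long localBlockId, long committedLength) {
    naiveKeyCount++;
    blockLengths.put(localBlockId, committedLength);
  }

  long naiveKeyCount() {
    return naiveKeyCount;
  }

  /** Counts each block once, no matter how often putBlock was called. */
  long distinctBlockCount() {
    return blockLengths.size();
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    PutBlockCountSketch container = new PutBlockCountSketch();
    // One 64MB key written as a single block, committed after every 4MB flush.
    for (int flush = 1; flush <= 16; flush++) {
      container.putBlock(1L, flush * 4 * mb);
    }
    System.out.println("naive=" + container.naiveKeyCount()
        + " distinct=" + container.distinctBlockCount());
  }
}
```

Counting distinct block IDs rather than putBlock calls gives 1 here, which is the number of blocks actually stored.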
   




[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-671191882


   > 
   > 
   > @ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier.
   > 
   
   Sure. Currently there are two issues.
    1. The container keyCount on the datanode side is incorrect.  keyCount is increased by 1 for every putBlock command.  This is correct for the FilePerChunk container layout but incorrect for the FilePerBlock layout. With the FilePerBlock layout, the keyCount value ends up higher than the actual number of blocks.
    2. The container keyCount and bytesUsed held on the SCM side are based on the container report data from each datanode.  In AbstractContainerReportHandler#updateContainerStats, SCM currently updates the in-memory container data only if the reported value is bigger.  This is right for OPEN containers, but it does not hold for CLOSED containers. 
    In our cluster, there are a lot of containers whose blocks have all been deleted, but I can still get the keyCount and bytesUsed data through the CLI "sh container info" command.
   
   




[GitHub] [hadoop-ozone] avijayanhwx commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-670551035


   @ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier. 






[GitHub] [hadoop-ozone] ChenSammi commented on pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#issuecomment-672894409


   /retest




[GitHub] [hadoop-ozone] ChenSammi merged pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295


   




[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1295: HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1295:
URL: https://github.com/apache/hadoop-ozone/pull/1295#discussion_r467721841



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java
##########
@@ -82,7 +83,6 @@
 /**
  * Tests for Block deletion.
  */
-@Ignore
 public class TestBlockDeletion {

Review comment:
       Please see #1121 for an attempt to enable this integration test.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMBlockDeletingService.java
##########
@@ -109,8 +109,8 @@ public void handlePendingDeletes(PendingDeleteStatusList deletionStatusList) {
     DatanodeDetails dnDetails = deletionStatusList.getDatanodeDetails();
     for (PendingDeleteStatusList.PendingDeleteStatus deletionStatus :
         deletionStatusList.getPendingDeleteStatuses()) {
-      LOG.info(
-          "Block deletion txnID mismatch in datanode {} for containerID {}."
+      LOG.debug(
+          "Block deletion txnID behinds in datanode {} for containerID {}."

Review comment:
       ```suggestion
             "Block deletion txnID lagging in datanode {} for containerID {}."
   ```

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -312,6 +312,26 @@ private void processContainer(ContainerID id) {
           action -> replicas.stream()
               .noneMatch(r -> r.getDatanodeDetails().equals(action.datanode)));
 
+      if (state == LifeCycleState.CLOSED) {
+        // check container key count and bytes used
+        long maxUsedBytes = 0;
+        long maxKeyCount = 0;
+        ContainerReplica[] rps = replicas.toArray(new ContainerReplica[0]);
+        for (int i = 0; i < rps.length; i++) {
+          maxUsedBytes = Math.max(maxUsedBytes, rps[i].getBytesUsed());
+          maxKeyCount = Math.max(maxKeyCount, rps[i].getKeyCount());
+          LOG.info("Replica key count {}, bytes used {}",
+              rps[i].getKeyCount(), rps[i].getBytesUsed());
+        }
+        if (maxKeyCount < container.getNumberOfKeys()) {
+          container.setNumberOfKeys(maxKeyCount);
+        }
+        if (maxUsedBytes < container.getUsedBytes()) {
+          container.setUsedBytes(maxUsedBytes);
+        }
+        LOG.info("Container key count {}, bytes used {}",
+            container.getNumberOfKeys(), container.getUsedBytes());

Review comment:
       Please extract this block to a new method to keep the level of abstraction consistent.
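
A possible shape for the extracted method is sketched below. It is self-contained and uses hypothetical stand-ins (`Replica`, `Container`) rather than the real `ContainerReplica` and `ContainerInfo` classes. The method name follows the `updateContainerStatsFromReplicas` suggestion made elsewhere in this review, and the logic mirrors the block in the diff above: take the maximum over all replicas and lower the container's values if they exceed it.

```java
import java.util.Arrays;
import java.util.List;

public class ReplicaStatsSketch {

  /** Hypothetical stand-in for a container replica report. */
  static final class Replica {
    final long keyCount;
    final long bytesUsed;

    Replica(long keyCount, long bytesUsed) {
      this.keyCount = keyCount;
      this.bytesUsed = bytesUsed;
    }
  }

  /** Hypothetical stand-in for the container record held by SCM. */
  static final class Container {
    long numberOfKeys;
    long usedBytes;

    Container(long numberOfKeys, long usedBytes) {
      this.numberOfKeys = numberOfKeys;
      this.usedBytes = usedBytes;
    }
  }

  /**
   * Lowers the container's key count and used bytes to the maximum reported
   * by any replica. With no replicas the maximum is 0, as in the diff.
   */
  static void updateContainerStatsFromReplicas(Container container,
      List<Replica> replicas) {
    long maxKeyCount =
        replicas.stream().mapToLong(r -> r.keyCount).max().orElse(0L);
    long maxUsedBytes =
        replicas.stream().mapToLong(r -> r.bytesUsed).max().orElse(0L);
    if (maxKeyCount < container.numberOfKeys) {
      container.numberOfKeys = maxKeyCount;
    }
    if (maxUsedBytes < container.usedBytes) {
      container.usedBytes = maxUsedBytes;
    }
  }

  public static void main(String[] args) {
    Container container = new Container(5, 1000);
    updateContainerStatsFromReplicas(container,
        Arrays.asList(new Replica(3, 700), new Replica(2, 800)));
    System.out.println(container.numberOfKeys + " keys, "
        + container.usedBytes + " bytes");
  }
}
```

In the real code the caller would presumably keep the `state == LifeCycleState.CLOSED` check and invoke the method from `processContainer`; that wiring is omitted here.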



