Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/04/20 08:27:10 UTC

[GitHub] [ozone] guihecheng opened a new pull request, #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

guihecheng opened a new pull request, #3323:
URL: https://github.com/apache/ozone/pull/3323

   ## What changes were proposed in this pull request?
   
   EC: Fix Datanode block file INCONSISTENCY during heavy load.
   The problem description and analysis are in the JIRA below.
   In brief, we should get the file size from the FileChannel instead of the File.
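
A minimal Java sketch of the distinction, not the code from this PR: it writes through an open FileChannel and then reads the size both ways. The class and method names (ChannelSizeSketch, sizeViaChannel, sizeViaFile, writeAndMeasure) are hypothetical and exist only for illustration. One difference documented in the JDK is that File.length() reports a failed lookup as 0L, while FileChannel.size() throws IOException. The analysis of the inconsistency itself is in the JIRA and is not reproduced here.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelSizeSketch {

  /** Size looked up by path; the JDK documents 0L when the file does not exist. */
  static long sizeViaFile(File file) {
    return file.length();
  }

  /** Size queried on the channel the writer already holds open. */
  static long sizeViaChannel(FileChannel channel) {
    try {
      return channel.size();
    } catch (IOException e) {
      // Unlike File.length(), a failure here is reported, not folded into 0L.
      throw new UncheckedIOException(e);
    }
  }

  /** Writes payload to a temp file; returns {channel size, File.length()}. */
  static long[] writeAndMeasure(String payload) {
    try {
      Path path = Files.createTempFile("block", ".tmp");
      try (FileChannel channel =
               FileChannel.open(path, StandardOpenOption.WRITE)) {
        ByteBuffer buf =
            ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
          channel.write(buf);
        }
        return new long[] {sizeViaChannel(channel), sizeViaFile(path.toFile())};
      } finally {
        // Runs after the channel is closed by try-with-resources.
        Files.deleteIfExists(path);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] sizes = writeAndMeasure("chunk-data");
    System.out.println("channel.size() = " + sizes[0]
        + ", file.length() = " + sizes[1]);
  }
}
```

On an idle local file system both values normally agree, so this sketch only shows the two call paths side by side. It does not reproduce the heavy-load behaviour discussed in the JIRA.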
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6614
   
   ## How was this patch tested?
   
   Manually tested with the experiment programs described in the JIRA above.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] cchenax commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
cchenax commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1104993908

   > Thanks @guihecheng for reporting this. This issue looks very similar to https://issues.apache.org/jira/browse/HDDS-6356. I think it was consistently reproduced in @cchenax's environment, which basically started all DNs on the same machine. At that time we could not reproduce it in an actual cluster, so we did not dig further.
   > 
   > So, to make sure this issue is fixed, how about @cchenax helps reproduce it in his environment and tries this fix there?
   
   OK, I will test it.




[GitHub] [ozone] kerneltime commented on a diff in pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3323:
URL: https://github.com/apache/ozone/pull/3323#discussion_r854583566


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java:
##########
@@ -383,14 +383,21 @@ private static ContainerProtos.Result translate(Exception cause) {
    * Checks if the block file length is equal to the chunk offset.
    *
    */
-  public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
+  public static void validateChunkSize(FileChannel fileChannel,
+      ChunkInfo chunkInfo, String fileName)
       throws StorageContainerException {
     long offset = chunkInfo.getOffset();
-    long len = chunkFile.length();
-    if (chunkFile.length() != offset) {
+    long fileLen;
+    try {
+      fileLen = fileChannel.size();
+    } catch (IOException e) {
+      throw new StorageContainerException("IO error encountered while " +

Review Comment:
   Please add the offset details for the exception.
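
One way the request could be met, sketched in Java under stated assumptions: the helper names (SizeErrorMessageSketch, describeSizeFailure, wrap) and the exact wording of the message are hypothetical, and a plain IOException stands in for StorageContainerException so the example is self-contained. The sketch also keeps the original exception as the cause, which the diff above does not show.

```java
import java.io.IOException;

class SizeErrorMessageSketch {

  /** Builds a message that names both the block file and the chunk offset. */
  static String describeSizeFailure(String fileName, long offset) {
    return "IO error encountered while getting the file size for "
        + fileName + " at chunk offset " + offset;
  }

  /** Wraps the low-level failure, preserving it as the cause. */
  static IOException wrap(IOException cause, String fileName, long offset) {
    return new IOException(describeSizeFailure(fileName, offset), cause);
  }

  public static void main(String[] args) {
    IOException wrapped = wrap(new IOException("disk error"), "1.block", 4096L);
    System.out.println(wrapped.getMessage());
  }
}
```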





[GitHub] [ozone] guihecheng commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1103917071

   > https://issues.apache.org/jira/browse/HDDS-6614
   
   Yes, I've read this. You could read the description in the JIRA and try the small program provided there for some hints.




[GitHub] [ozone] guihecheng commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1104644811

   > Thanks @guihecheng for reporting this. This issue looks very similar to https://issues.apache.org/jira/browse/HDDS-6356. I think it was consistently reproduced in @cchenax's environment, which basically started all DNs on the same machine. At that time we could not reproduce it in an actual cluster, so we did not dig further.
   > 
   > So, to make sure this issue is fixed, how about @cchenax helps reproduce it in his environment and tries this fix there?
   
   Oh, yes, maybe this problem is similar to that one, but I'm not sure. I'll fix the failed tests and then ping cchenax.




[GitHub] [ozone] kaijchen commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1103903711

   > https://docs.oracle.com/javase/7/docs/api/java/io/File.html#length()
   >
   > `public long length()`
   >
   > **Returns:**
   > The length, in bytes, of the file denoted by this abstract pathname, or 0L if the file does not exist. Some operating systems may return 0L for pathnames denoting system-dependent entities such as devices or pipes.
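
The quoted contract means a caller cannot tell a missing file from an empty one by the return value alone. A small Java sketch of that ambiguity follows; the class and method names (FileLengthZeroSketch, lengthOfMissingFile, lengthOfEmptyFile) are hypothetical, and the temp-directory path is just a convenient place for a file that should not exist.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileLengthZeroSketch {

  /** File.length() for a path that does not exist: documented to return 0L. */
  static long lengthOfMissingFile() {
    File missing = new File(System.getProperty("java.io.tmpdir"),
        "no-such-block-" + System.nanoTime() + ".tmp");
    return missing.length();
  }

  /** File.length() for a file that exists but holds no bytes: also 0L. */
  static long lengthOfEmptyFile() {
    try {
      Path empty = Files.createTempFile("empty-block", ".tmp");
      try {
        return empty.toFile().length();
      } finally {
        Files.deleteIfExists(empty);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("missing: " + lengthOfMissingFile());
    System.out.println("empty:   " + lengthOfEmptyFile());
  }
}
```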




[GitHub] [ozone] umamaheswararao commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1104587044

   Thanks @guihecheng for reporting this. This issue looks very similar to https://issues.apache.org/jira/browse/HDDS-6356
   I think it was consistently reproduced in @cchenax's environment, which basically started all DNs on the same machine. At that time we could not reproduce it in an actual cluster, so we did not dig further.
   
   So, to make sure this issue is fixed, how about @cchenax helps reproduce it in his environment and tries this fix there?




[GitHub] [ozone] cchenax commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
cchenax commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1105021137

   > Thanks @guihecheng for reporting this. This issue looks very similar to https://issues.apache.org/jira/browse/HDDS-6356. I think it was consistently reproduced in @cchenax's environment, which basically started all DNs on the same machine. At that time we could not reproduce it in an actual cluster, so we did not dig further.
   > 
   > So, to make sure this issue is fixed, how about @cchenax helps reproduce it in his environment and tries this fix there?
   
   It may not be the same problem.




[GitHub] [ozone] guihecheng merged pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng merged PR #3323:
URL: https://github.com/apache/ozone/pull/3323




[GitHub] [ozone] guihecheng commented on a diff in pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on code in PR #3323:
URL: https://github.com/apache/ozone/pull/3323#discussion_r854721652


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java:
##########
@@ -383,14 +383,21 @@ private static ContainerProtos.Result translate(Exception cause) {
    * Checks if the block file length is equal to the chunk offset.
    *
    */
-  public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
+  public static void validateChunkSize(FileChannel fileChannel,
+      ChunkInfo chunkInfo, String fileName)
       throws StorageContainerException {
     long offset = chunkInfo.getOffset();
-    long len = chunkFile.length();
-    if (chunkFile.length() != offset) {
+    long fileLen;
+    try {
+      fileLen = fileChannel.size();
+    } catch (IOException e) {
+      throw new StorageContainerException("IO error encountered while " +

Review Comment:
   OK, makes sense.





[GitHub] [ozone] guihecheng commented on a diff in pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on code in PR #3323:
URL: https://github.com/apache/ozone/pull/3323#discussion_r854115145


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java:
##########
@@ -383,14 +383,21 @@ private static ContainerProtos.Result translate(Exception cause) {
    * Checks if the block file length is equal to the chunk offset.
    *
    */
-  public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
+  public static void validateChunkSize(FileChannel fileChannel,
+      ChunkInfo chunkInfo, String fileName)
       throws StorageContainerException {
     long offset = chunkInfo.getOffset();
-    long len = chunkFile.length();
-    if (chunkFile.length() != offset) {
+    long fileLen;
+    try {
+      fileLen = fileChannel.size();
+    } catch (IOException e) {
+      throw new StorageContainerException("IO error encountered while " +
+          "getting the file size for " + fileName, CHUNK_FILE_INCONSISTENCY);
+    }
+    if (fileLen != offset) {
       throw new StorageContainerException(
-          "Chunk file offset " + offset + " does not match blockFile length " +
-          len, CHUNK_FILE_INCONSISTENCY);
+          "Chunk offset " + offset + " does not match length " +
+          fileLen + "of blockFile " + fileName, CHUNK_FILE_INCONSISTENCY);

Review Comment:
   Ah, thanks for the indent.





[GitHub] [ozone] guihecheng commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1104841669

   Hi @kerneltime @umamaheswararao, thanks for reviewing. Test failures are addressed and the exception is refined.




[GitHub] [ozone] kerneltime commented on pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3323:
URL: https://github.com/apache/ozone/pull/3323#issuecomment-1104487246

   This makes sense. Thanks for catching this. LGTM.




[GitHub] [ozone] kaijchen commented on a diff in pull request #3323: HDDS-6614. EC: Fix Datanode block file INCONSISTENCY during heavy load.

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #3323:
URL: https://github.com/apache/ozone/pull/3323#discussion_r854095099


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java:
##########
@@ -383,14 +383,21 @@ private static ContainerProtos.Result translate(Exception cause) {
    * Checks if the block file length is equal to the chunk offset.
    *
    */
-  public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
+  public static void validateChunkSize(FileChannel fileChannel,
+      ChunkInfo chunkInfo, String fileName)
       throws StorageContainerException {
     long offset = chunkInfo.getOffset();
-    long len = chunkFile.length();
-    if (chunkFile.length() != offset) {
+    long fileLen;
+    try {
+      fileLen = fileChannel.size();
+    } catch (IOException e) {
+      throw new StorageContainerException("IO error encountered while " +
+          "getting the file size for " + fileName, CHUNK_FILE_INCONSISTENCY);
+    }
+    if (fileLen != offset) {
       throw new StorageContainerException(
-          "Chunk file offset " + offset + " does not match blockFile length " +
-          len, CHUNK_FILE_INCONSISTENCY);
+          "Chunk offset " + offset + " does not match length " +
+          fileLen + "of blockFile " + fileName, CHUNK_FILE_INCONSISTENCY);

Review Comment:
   Missing space here.
   
   ```suggestion
             fileLen + " of blockFile " + fileName, CHUNK_FILE_INCONSISTENCY);
   ```
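
Spacing slips like this are easy to make when a message is split across concatenated string literals. As an alternative sketch, not what the PR does, the same message can be built from a single format string so the spacing is visible in one place. The class and method names (MismatchMessageSketch, mismatchMessage) are hypothetical.

```java
class MismatchMessageSketch {

  /** Builds the offset/length mismatch message from one format string. */
  static String mismatchMessage(long offset, long fileLen, String fileName) {
    return String.format(
        "Chunk offset %d does not match length %d of blockFile %s",
        offset, fileLen, fileName);
  }

  public static void main(String[] args) {
    System.out.println(mismatchMessage(8L, 4L, "1.block"));
  }
}
```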


