Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2023/01/16 13:13:48 UTC

[GitHub] [ozone] sodonnel opened a new pull request, #4180: HDDS-7787. GetChecksum for EC files can fail intermittently with IndexOutOfBounds exception

sodonnel opened a new pull request, #4180:
URL: https://github.com/apache/ozone/pull/4180

   ## What changes were proposed in this pull request?
   
   When calculating a checksum for an EC file with Rack Topology enabled, you can get the following error intermittently:
   
   ```
   ERROR : Failed with exception null
     java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
   ...
   ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
   ...
   INFO  : Completed executing command(queryId=hive_20221214035652_bc45477d-98df-408e-b945-a63b4ac6896a); Time taken: 22.167 seconds
     INFO  : OK
     Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
           ...
   ```
   
   This happens because the wrong nodes are sometimes used to obtain the stripe checksum: the replicaIndex in the pipeline is not correctly used to order the nodes.
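
   As a rough illustration of the ordering idea (not the actual patch), the sketch below sorts nodes by replica index instead of trusting the order in which the pipeline lists them. Everything in it is hypothetical: the class name `ReplicaIndexOrdering`, the `dn-*` node names, the plain `Map` standing in for the pipeline, and the assumption that replica indexes are 1-based.

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Illustrative only: strings and a map stand in for the real datanode
   // and pipeline types, which are not shown in this description.
   class ReplicaIndexOrdering {

     // Returns the node names sorted by replica index, independent of the
     // order in which the (hypothetical) pipeline map lists them.
     public static List<String> orderByReplicaIndex(
         Map<String, Integer> replicaIndexByNode) {
       List<String> nodes = new ArrayList<>(replicaIndexByNode.keySet());
       nodes.sort(Comparator.comparingInt(
           (String node) -> replicaIndexByNode.get(node)));
       return nodes;
     }

     public static void main(String[] args) {
       // Assumed scenario: topology-aware sorting has reordered the nodes,
       // so list position no longer matches the replica index.
       Map<String, Integer> pipeline = new LinkedHashMap<>();
       pipeline.put("dn-d", 4);
       pipeline.put("dn-a", 1);
       pipeline.put("dn-e", 5);
       pipeline.put("dn-c", 3);
       pipeline.put("dn-b", 2);

       System.out.println("pipeline order: " + pipeline.keySet());
       System.out.println("replica order:  " + orderByReplicaIndex(pipeline));
     }
   }
   ```

   The point of the sketch is only that the node for a given replica index should be looked up by that index, never by its position in the node list, because position can change when rack topology reorders the pipeline.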
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-7787
   
   ## How was this patch tested?
   
   An existing test covers checksum validation, so it confirms that this change has not broken anything. The actual problem is difficult to reproduce in a unit test, because rack awareness is not easy to set up in a way that affects the node order in the pipeline. We do have a reproducible test with a Hive workload that triggers the failure, so we can validate the fix that way after this has been committed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #4180: HDDS-7787. GetChecksum for EC files can fail intermittently with IndexOutOfBounds exception

Posted by GitBox <gi...@apache.org>.
sodonnel merged PR #4180:
URL: https://github.com/apache/ozone/pull/4180

