You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/09/16 21:36:30 UTC

[GitHub] [ozone] jojochuang opened a new pull request, #3759: HDDS-7228. ChecksumByteBufferImpl.update() is expensive

jojochuang opened a new pull request, #3759:
URL: https://github.com/apache/ozone/pull/3759

   ChecksumByteBufferImpl.update() copies the bytebuffer before checksum calculation because the isReadOnly field of ByteBuffer is false.
   
   This is a quick&dirty hack, which uses inflection to reset the isReadOnly field of ByteBuffer to true, so that ChecksumByteBufferImpl.update performs checksum calculation directly without memory copy.
   
   Posting this as a draft. It's a dirty hack and I like it, and it does work without any visible overhead.
   
   ## What changes were proposed in this pull request?
   
   HDDS-7228
   
   ## What is the link to the Apache JIRA
   
   (Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)
   
   Please replace this section with the link to the Apache JIRA)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. Ex: unit tests, manual tests)
   (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #3759:
URL: https://github.com/apache/ozone/pull/3759#discussion_r1345530296


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java:
##########
@@ -27,15 +28,31 @@ public class ChecksumByteBufferImpl implements ChecksumByteBuffer {
 
   private Checksum checksum;
 
+  private Field isReadyOnlyField = null;
+
   public ChecksumByteBufferImpl(Checksum impl) {
     this.checksum = impl;
+
+    try {
+      isReadyOnlyField = ByteBuffer.class
+          .getDeclaredField("isReadOnly");
+      isReadyOnlyField.setAccessible(true);
+    } catch (NoSuchFieldException e) {
+      e.printStackTrace();
+    }
   }
 
   @Override
   // TODO - when we eventually move to a minimum Java version >= 9 this method
   //        should be refactored to simply call checksum.update(buffer), as the
   //        Checksum interface has been enhanced to allow this since Java 9.
   public void update(ByteBuffer buffer) {
+    // this is a hack to not do memory copy.
+    try {
+      isReadyOnlyField.setBoolean(buffer, false);

Review Comment:
   Its the protobuf that originates the buffer as readOnly. When the protobuf bytes are received, the data goes into a protobuf ByteString, which is backed by a byteArray. ByteString has some methods to return a ByteBuffer, one of which is "asReadOnlyByteBuffer". This wraps the original byte array and returns a readonly buffer. At that stage the Java checksum interface only accepts a byte or byteArray. Even the java.util.zip.CRC32 implementation has an implementation to take a ByteBuffer, but this is what it does:
   
   ```
   public void update(ByteBuffer buffer) {
           int pos = buffer.position();
           int limit = buffer.limit();
           assert (pos <= limit);
           int rem = limit - pos;
           if (rem <= 0)
               return;
           if (buffer instanceof DirectBuffer) {
               crc = updateByteBuffer(crc, ((DirectBuffer)buffer).address(), pos, rem);
           } else if (buffer.hasArray()) {
               crc = updateBytes(crc, buffer.array(), pos + buffer.arrayOffset(), rem);
           } else {
               byte[] b = new byte[rem];
               buffer.get(b);
               crc = updateBytes(crc, b, 0, b.length);
           }
           buffer.position(limit);
       }
   ```
   
   buffer.hasArray() checks for readOnly and returns false if it is readonly, meaning we cannot get at the array, and it will end up copying it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1757077946

   > We should probably move these to a log message, rather than printing directly to stdout / stderr, which is what I assume e.printStackTrace() does.
   
   Done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1746468245

   > I suspect it is the warning message that is breaking the acceptance tests, as they are likely expecting a certain output on stdout / stderr and this warning is breaking that expectation.
   
   Yes, the test expects empty output for these commands.
   
   This can be suppressed by:
    * add `--add-opens java.base/java.nio=ALL-UNNAMED` to `OZONE_MODULE_ACCESS_ARGS` for Java9+
    * add `$OZONE_MODULE_ACCESS_ARGS` to `OZONE_SH_OPTS`, `OZONE_FS_OPTS`, maybe some others, too
   
   https://github.com/apache/ozone/blob/b3c84845926fb79e4b3b4ae20ded2be1c97ada06/hadoop-ozone/dist/src/shell/ozone/ozone#L89-L95


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1751067213

   @jojochuang I've added the JVM arg to avoid the warning, plus changed initialization of the `Field` to `static`, please take a look if you're OK with that.
   
   I think we should refine the `printStackTrace()` statements, too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on pull request #3759: HDDS-7228. ChecksumByteBufferImpl.update() is expensive

Posted by GitBox <gi...@apache.org>.
jojochuang commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1273641429

   I'll still trying to come up with a better implementation. This is hacky.
   And while this hack looked okay on a spindle disk cluster, unfortunately this hack still shows some overhead on a NVME cluster I tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1755493296

   > @jojochuang I've added the JVM arg to avoid the warning, plus changed initialization of the `Field` to `static`, please take a look if you're OK with that.
   > 
   > I think we should refine the `printStackTrace()` statements, too.
   
   We should probably move these to a log message, rather than printing directly to stdout / stderr, which is what I assume e.printStackTrace() does. Otherwise I think we should commit this, as it greatly improves what is there performance wise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on pull request #3759: HDDS-7228. ChecksumByteBufferImpl.update() is expensive

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1251227580

   @duongkame 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1745132194

   Various suggestions to suppress the warning - https://stackoverflow.com/questions/46454995/how-to-hide-warning-illegal-reflective-access-in-java-9-without-jvm-argument 
   
   I suspect it is the warning message that is breaking the acceptance tests, as they are likely expecting a certain output on stdout / stderr and this warning is breaking that expectation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "Cyrill (via GitHub)" <gi...@apache.org>.
Cyrill commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1750651517

   @jojochuang Is there any estimation when this can be fixed. This change adds a significant boost to the Ozone performance and I wanted to continue investigating the performance further having this issue resolved.
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on code in PR #3759:
URL: https://github.com/apache/ozone/pull/3759#discussion_r1345508465


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java:
##########
@@ -27,15 +28,31 @@ public class ChecksumByteBufferImpl implements ChecksumByteBuffer {
 
   private Checksum checksum;
 
+  private Field isReadyOnlyField = null;
+
   public ChecksumByteBufferImpl(Checksum impl) {
     this.checksum = impl;
+
+    try {
+      isReadyOnlyField = ByteBuffer.class
+          .getDeclaredField("isReadOnly");
+      isReadyOnlyField.setAccessible(true);
+    } catch (NoSuchFieldException e) {
+      e.printStackTrace();
+    }
   }
 
   @Override
   // TODO - when we eventually move to a minimum Java version >= 9 this method
   //        should be refactored to simply call checksum.update(buffer), as the
   //        Checksum interface has been enhanced to allow this since Java 9.
   public void update(ByteBuffer buffer) {
+    // this is a hack to not do memory copy.
+    try {
+      isReadyOnlyField.setBoolean(buffer, false);

Review Comment:
   I must be missing something here, but isn't `buffer` defined as readonly by us here:
   
   https://github.com/apache/ozone/blob/b3c84845926fb79e4b3b4ae20ded2be1c97ada06/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBuffer.java#L42-L43
   
   Instead of using reflection, can't we simply remove the `.asReadOnlyBuffer()` call?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel merged PR #3759:
URL: https://github.com/apache/ozone/pull/3759


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on code in PR #3759:
URL: https://github.com/apache/ozone/pull/3759#discussion_r1345520315


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java:
##########
@@ -27,15 +28,31 @@ public class ChecksumByteBufferImpl implements ChecksumByteBuffer {
 
   private Checksum checksum;
 
+  private Field isReadyOnlyField = null;
+
   public ChecksumByteBufferImpl(Checksum impl) {
     this.checksum = impl;
+
+    try {
+      isReadyOnlyField = ByteBuffer.class
+          .getDeclaredField("isReadOnly");
+      isReadyOnlyField.setAccessible(true);
+    } catch (NoSuchFieldException e) {
+      e.printStackTrace();
+    }
   }
 
   @Override
   // TODO - when we eventually move to a minimum Java version >= 9 this method
   //        should be refactored to simply call checksum.update(buffer), as the
   //        Checksum interface has been enhanced to allow this since Java 9.
   public void update(ByteBuffer buffer) {
+    // this is a hack to not do memory copy.
+    try {
+      isReadyOnlyField.setBoolean(buffer, false);

Review Comment:
   I guess there are other places that are not so simple to change, e.g. this function used by `computeChecksum` / `verifyChecksum`:
   
   https://github.com/apache/ozone/blob/b3c84845926fb79e4b3b4ae20ded2be1c97ada06/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/utils/BufferUtils.java#L68-L75



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1745115170

   Huum, some acceptance tests are failing with:
   
   ```
   Put                                                                   | FAIL |
   'WARNING: An illegal reflective access operation has occurred
   --
   Put with Streaming                                                    | FAIL |
   'WARNING: An illegal reflective access operation has occurred
   --
   ozonefs-o3fs-link :: Ozone FS tests                                   | FAIL |
   23 tests, 21 passed, 2 failed
   ```
   
   Which I suspect is related


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-7228. ChecksumByteBufferImpl.update() is expensive [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #3759:
URL: https://github.com/apache/ozone/pull/3759#issuecomment-1742864836

   I've marked this ready for review - any reason not to commit this? It may not be perfect, but its a lot better than what is there at the moment without this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org