You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/25 13:13:01 UTC

[GitHub] [arrow] lidavidm opened a new pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

lidavidm opened a new pull request #7543:
URL: https://github.com/apache/arrow/pull/7543


   `ArrowBuf.setBytes` has an override that uses a 8-byte-at-a-time copy loop if the byte buffer does not provide an array and is not direct. Unfortunately, this means it'll mangle data when the byte buffer is big-endian, as it then writes the data into the little-endian ArrowBuf. This fixes it by setting the byte order before copying, and then restoring it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#discussion_r448338460



##########
File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java
##########
@@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {
           --length;
           ++dstAddress;
         }
+        src.order(originalByteOrder);

Review comment:
       should we place this in a finally block?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-652379935


   @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to big-endian](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/ByteBuffer.html#wrap(byte%5B%5D)), and some libraries, like Protobuf, inherit that behavior. We discovered this when developing with Flight internally; there are Protobuf APIs that return a ByteBuffer that may be big-endian; when Flight tries to copy the buffer into an ArrowBuf, it corrupts the data.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-652395634


   > @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to big-endian](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/ByteBuffer.html#wrap(byte%5B%5D)), and some libraries, like Protobuf, inherit that behavior. We discovered this when developing with Flight internally; there are Protobuf APIs that return a ByteBuffer that may be big-endian; when Flight tries to copy the buffer into an ArrowBuf, it corrupts the data.
   
   @lidavidm Thanks a lot for your clarification. This is really a problem that needs to be fixed. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-652984542


   Rebased!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 closed pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
liyafan82 closed pull request #7543:
URL: https://github.com/apache/arrow/pull/7543


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-649534666


   https://issues.apache.org/jira/browse/ARROW-9221


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-652211147


   @lidavidm Thanks for reporting the problem.
   I am curious how this problem has ever arised. The default byte order is platform dependent. For a big endian machine, the program should crash at the stage of class loading, as our current implementation does not support big endian platforms. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-651777225


   Would a Java maintainer be able to look at this? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#issuecomment-652737865


   Seems a rebase is required. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #7543:
URL: https://github.com/apache/arrow/pull/7543#discussion_r448345297



##########
File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java
##########
@@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {
           --length;
           ++dstAddress;
         }
+        src.order(originalByteOrder);

Review comment:
       Good idea, I've updated the PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org