You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/28 20:08:16 UTC

[GitHub] [arrow] lidavidm opened a new pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

lidavidm opened a new pull request #9354:
URL: https://github.com/apache/arrow/pull/9354


   It turns out the zero-copy optimization was never actually applied in Java. Fixing this brings us +50% throughput on localhost.
   
   Before:
   
   ```
   Transferred 100000000 records totaling 3200000000 bytes at 329.056671 MiB/s. 10782528.996111 record/s. 2633.309231 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 518.238607 MiB/s. 16981642.690398 record/s. 4147.256778 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 614.855615 MiB/s. 20147588.800460 record/s. 4920.444137 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 614.137783 MiB/s. 20124066.880701 record/s. 4914.699614 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 650.376169 MiB/s. 21311526.320277 record/s. 5204.700958 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 633.765603 MiB/s. 20767231.292374 record/s. 5071.773226 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 639.984121 MiB/s. 20970999.671369 record/s. 5121.537540 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 601.255186 MiB/s. 19701929.947094 record/s. 4811.605332 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 593.782434 MiB/s. 19457062.784742 record/s. 4751.803873 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 610.451501 MiB/s. 20003274.776120 record/s. 4885.199766 batch/s.
   Summary: 
   Average throughput: 580.590369 MiB/s, standard deviation: 90.650159 MiB/s
   ```
   
   After:
   ```
   Transferred 100000000 records totaling 3200000000 bytes at 411.363707 MiB/s. 13479565.944378 record/s. 3291.979595 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 709.494366 MiB/s. 23248711.374843 record/s. 5677.800292 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 870.269771 MiB/s. 28516999.860372 record/s. 6964.421706 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 963.586464 MiB/s. 31574801.258002 record/s. 7711.197963 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 924.437899 MiB/s. 30291981.064273 record/s. 7397.907616 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 954.509764 MiB/s. 31277375.943584 record/s. 7638.560753 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 966.933269 MiB/s. 31684469.365217 record/s. 7737.981108 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 884.759214 MiB/s. 28991789.926919 record/s. 7080.374936 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 962.418507 MiB/s. 31536529.643890 record/s. 7701.851270 batch/s.
   Transferred 100000000 records totaling 3200000000 bytes at 997.708658 MiB/s. 32692917.301419 record/s. 7984.264263 batch/s.
   Summary: 
   Average throughput: 864.548162 MiB/s, standard deviation: 170.041331 MiB/s
   ```
   
   I looked at adding a unit test, but we'd need reflection to construct the internal gRPC classes. IMO, we're better off establishing benchmarks to guard against a regression here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #9354:
URL: https://github.com/apache/arrow/pull/9354


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#issuecomment-769352026


   https://issues.apache.org/jira/browse/ARROW-11066


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#discussion_r567373267



##########
File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/AddWritableBuffer.java
##########
@@ -88,11 +87,8 @@
    * @param buf The buffer to add.
    * @param stream The Candidate OutputStream to add to.
    * @return True if added. False if not possible.
-   * @throws IOException on error
    */
-  public static boolean add(ByteBuf buf, OutputStream stream) throws IOException {
-    buf.readBytes(stream, buf.readableBytes());

Review comment:
       while I'm trying to parse the code, its not clear to me why this didn't end up doing a double send? (it is markers in the buf already indicate it is drained?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#issuecomment-770511209


   Yes, I was poking it...sometimes the Debian Java build fails, other times the MacOS build. I'll take a look when I get a chance, though without a MacOS machine, that might take a while.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#discussion_r567372744



##########
File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java
##########
@@ -68,7 +68,7 @@ public void register(FlightProducer.ServerStreamListener listener) {
 
       @Override
       public WaitResult waitForListener(long timeout) {
-        while (!listener.isReady()) {

Review comment:
       this is unrelated to the change is suppose?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#issuecomment-770496625


   It looks like this might have broken mac os java.   I'll rollback tonight if this is the case and we can open a new pr to fix


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#discussion_r567429952



##########
File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/AddWritableBuffer.java
##########
@@ -88,11 +87,8 @@
    * @param buf The buffer to add.
    * @param stream The Candidate OutputStream to add to.
    * @return True if added. False if not possible.
-   * @throws IOException on error
    */
-  public static boolean add(ByteBuf buf, OutputStream stream) throws IOException {
-    buf.readBytes(stream, buf.readableBytes());

Review comment:
       Yes - `readBytes` increments the reader index and so the subsequent getBytes call wouldn't do anything.
   
   https://github.com/apache/arrow/blob/6b85f6eba6d8403dbd549212fbc86f0e8ee192d0/java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L441




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#discussion_r567429767



##########
File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java
##########
@@ -68,7 +68,7 @@ public void register(FlightProducer.ServerStreamListener listener) {
 
       @Override
       public WaitResult waitForListener(long timeout) {
-        while (!listener.isReady()) {

Review comment:
       This was needed or the test would get stuck - I think it's just I bumped something that exposed a race condition.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #9354: ARROW-11066: [Java][FlightRPC] fix zero-copy optimization

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #9354:
URL: https://github.com/apache/arrow/pull/9354#issuecomment-770454227


   +1. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org