You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/18 15:47:25 UTC

[GitHub] [arrow] carols10cents opened a new pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

carols10cents opened a new pull request #8962:
URL: https://github.com/apache/arrow/pull/8962


   (repeating context from the Jira issue)
   
   # Problem
   
   ProtoBuf `proto3` specifies that [if a message does not contain a particular singular element, the field should get the default value](https://developers.google.com/protocol-buffers/docs/proto3#default). However, when the C++ `flight-test-integration-server` gets a `DoPut` request with a `FlightData` message for a record batch containing no items, and the `FlightData` is missing the `data_body` field, the server responds with an error "Expected body in IPC message of type record batch".
   
   ## What happens
   
   If I run the C++ `flight-test-integration-server` and the C++ `flight-test-integration-client` with the `generated_null_trivial` test case, the test passes and I see this in wireshark:
   
   <img width="712" alt="cpp-client-empty-data-body" src="https://user-images.githubusercontent.com/193874/102633180-13168780-411e-11eb-87a0-375271a39f83.png">
   
   Note the `data_body` field is present but has no value.
   
   If I run the Rust `flight-test-integration-client` that I'm working on developing, it does not send the `data_body` field at all if there are no bytes to send. I see this in wireshark:
   
   <img width="715" alt="rust-client-missing-data-body" src="https://user-images.githubusercontent.com/193874/102633213-1e69b300-411e-11eb-9f38-7d5e2759fbb6.png">
   
   Note the `data_body` field is not present.
   
   The C++ server then returns the error message `Expected body in IPC message of type record batch`, which comes from [this check for message body](https://github.com/apache/arrow/blob/519e9da4fc1698f686525f4226295f3680a3f3db/cpp/src/arrow/ipc/reader.cc#L92) called in [`ReadNext` of the record batch stream reader](https://github.com/apache/arrow/blob/519e9da4fc1698f686525f4226295f3680a3f3db/cpp/src/arrow/ipc/reader.cc#L787).
   
   ## What I expect to happen
   
   Instead of returning an error message because of a null pointer, the Message should get the default value of empty bytes.
   
   @shepmaster and I worked on this fix together, but I'm not at all confident this is the exact right fix. It's hard for me to trace through the code from a `FlightData` `data_body` to an IPC `Message` `body`, but this does fix the problem. I'm also not sure what other cases this change might affect. Feedback much appreciated!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748197373


   Thanks for catching & taking a look at this!
   
   I think, rather than change the IPC class, we should make the FlightData Protobuf deserializer act more like a real Protobuf parser, and fill in an empty body instead of leaving the buffer as null if there was not a buffer read from the wire. That should probably happen here: https://github.com/apache/arrow/blob/519e9da4fc1698f686525f4226295f3680a3f3db/cpp/src/arrow/flight/serialization_internal.cc#L376-L377
   
   I'll admit that even having written the code, it's hard to trace the path from here to where you currently have the fix...
   
   Here is where we invoke the deserializer:
   https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/arrow/flight/serialization_internal.cc#L429-L432
   That in turn is called from a peekable reader:
   https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/arrow/flight/serialization_internal.h#L133
   The reader fills in a FlightData which gets passed to the RecordBatchReader here:
   https://github.com/apache/arrow/blob/519e9da4fc1698f686525f4226295f3680a3f3db/cpp/src/arrow/flight/server.cc#L190
   Calling ReadNext implicitly bounces through this glue class:
   https://github.com/apache/arrow/blob/519e9da4fc1698f686525f4226295f3680a3f3db/cpp/src/arrow/flight/server.cc#L96-L119
   Which finally actually constructs the IPC Message in this PR:
   https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/arrow/flight/serialization_internal.cc#L382-L384


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] carols10cents commented on a change in pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
carols10cents commented on a change in pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#discussion_r546068440



##########
File path: cpp/src/arrow/flight/serialization_internal.cc
##########
@@ -374,7 +374,18 @@ grpc::Status FlightDataDeserialize(ByteBuffer* buffer, FlightData* out) {
   buffer->Clear();
 
   // TODO(wesm): Where and when should we verify that the FlightData is not
-  // malformed or missing components?
+  // malformed?
+
+  // Set default values for unspecified FlightData fields
+  if (out->app_metadata == nullptr) {
+    out->app_metadata = std::make_shared<Buffer>(nullptr, 0);
+  }
+  if (out->metadata == nullptr) {
+    out->metadata = std::make_shared<Buffer>(nullptr, 0);
+  }

Review comment:
       Whoops! So much for trying to be overly helpful 😅 Pushed a change so this just handles `body`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] carols10cents commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748281158


   I made the same sort of fix in the spot you pointed out, to each of the 3 byte fields in `FlightData`. What do you think? 
   
   > Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue. 
   
   Yuck. Should I open a new Jira ticket to track the Java side or should this ticket cover the same problem in the Java and C++ implementations?
   
   > (Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)
   
   Hm, isn't the C++ client currently writing the empty field? Or is it writing the field to `null`, rather than an empty list of bytes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748285044


   > > Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be _no_ buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)
   > 
   > Yes, I meant that because the C++ client is currently a Protobuf implementation that is writing out an empty field (I think), wouldn't the archery integration tests with the Java server and C++ client on the generated_null test currently be failing?
   
   Ah sorry - C++ is fine because it only writes out those bytes conditionally, for message types that require a body (e.g. a RecordBatch). But if an implementation wrote out an (empty) body for a Schema, the Java client/server would break.
   
   https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/arrow/flight/serialization_internal.cc#L274


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] carols10cents commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748283717


   > Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be _no_ buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)
   
   Yes, I meant that because the C++ client is currently a Protobuf implementation that is writing out an empty field (I think), wouldn't the archery integration tests with the Java server and C++ client on the generated_null test currently be failing?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748274099


   Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue. (Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #8962: ARROW-10960: [C++][FlightRPC] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #8962:
URL: https://github.com/apache/arrow/pull/8962


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] carols10cents commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
carols10cents commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748242049


   @lidavidm Thank you for the pointers! That's very helpful; I'm going to try making the fix in the spot you suggested instead and push an update in a bit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#discussion_r546065801



##########
File path: cpp/src/arrow/flight/serialization_internal.cc
##########
@@ -374,7 +374,18 @@ grpc::Status FlightDataDeserialize(ByteBuffer* buffer, FlightData* out) {
   buffer->Clear();
 
   // TODO(wesm): Where and when should we verify that the FlightData is not
-  // malformed or missing components?
+  // malformed?
+
+  // Set default values for unspecified FlightData fields
+  if (out->app_metadata == nullptr) {
+    out->app_metadata = std::make_shared<Buffer>(nullptr, 0);
+  }
+  if (out->metadata == nullptr) {
+    out->metadata = std::make_shared<Buffer>(nullptr, 0);
+  }

Review comment:
       I think the test failures might come from this, actually - if there's a metadata buffer, that causes some code to assume this must be a schema or record batch and try to parse it accordingly (but it'll then encounter the empty buffer and fail to parse it).
   
   Also, the app_metadata buffer can be omitted - it's meant to be optional so having an empty vs null one is no big deal.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748172758


   https://issues.apache.org/jira/browse/ARROW-10960


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748198461


   And FWIW, there's a long-standing issue (ARROW-4419) for testing the Flight implementation against a 'plain gRPC' client/server to catch issues like this; we intend for Flight to still be compatible with regular gRPC clients that haven't been specially optimized.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #8962: ARROW-10960: [C++] [Flight] Default to empty buffer instead of null

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #8962:
URL: https://github.com/apache/arrow/pull/8962#issuecomment-748282913


   > I made the same sort of fix in the spot you pointed out, to each of the 3 byte fields in `FlightData`. What do you think?
   
   Should be good once CI passes, thank you!
   
   > > Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue.
   > 
   > Yuck. Should I open a new Jira ticket to track the Java side or should this ticket cover the same problem in the Java and C++ implementations?
   
   I filed ARROW-10939 and am looking at it right now. I also cross-linked the three outstanding issues on Jira.
   
   > 
   > > (Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)
   > 
   > Hm, isn't the C++ client currently writing the empty field? Or is it writing the field to `null`, rather than an empty list of bytes?
   
   Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be _no_ buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org