You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/10 18:28:48 UTC

[GitHub] [arrow] lidavidm opened a new pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

lidavidm opened a new pull request #7398:
URL: https://github.com/apache/arrow/pull/7398


   (Recreated since I rebased before reopening.)
   
   I'd appreciate feedback on this. This is a specialized API to limit the payload size when a client is sending data to a server.
   
   While Flight by default removes the gRPC-level limits on message sizes, we've found setting server-side limits on the size of incoming messages is still useful to ensure clients don't overwhelm a server/exhaust server memory by sending very large messages. However, if you exceed the limit at the gRPC level, the server resets the connection, making it hard for a client to recover gracefully. This PR adds an optional client-side check, so the client can catch the error and retry with a smaller record batch, or provide a meaningful error to the user if the batch cannot be made small enough.
   
   Unlike solutions involving pa.ipc.get_record_batch_size, this avoids duplicate re-serialization of the record batch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7398:
URL: https://github.com/apache/arrow/pull/7398


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#discussion_r438946065



##########
File path: cpp/src/arrow/flight/client.h
##########
@@ -73,6 +91,16 @@ class ARROW_FLIGHT_EXPORT FlightClientOptions {
   std::string private_key;
   /// \brief A list of client middleware to apply.
   std::vector<std::shared_ptr<ClientMiddlewareFactory>> middleware;
+  /// \brief A soft limit on the number of bytes to write when sending
+  ///     Arrow data to a server.
+  ///
+  /// Used to help limit server memory consumption. Only enabled if
+  /// positive. When enabled, \a FlightStreamWriter.Write* may yield a

Review comment:
       I was under the mistaken impression that Doxygen uses them for monospace text, but it's just intended to annotate arguments, so I've removed them. https://www.doxygen.nl/manual/commands.html#cmda




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#issuecomment-642857268


   The s390x infrastructure on Travis-CI seems unreliable. I'm going to merge anyway.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#discussion_r438929219



##########
File path: python/pyarrow/_flight.pyx
##########
@@ -177,6 +185,16 @@ cdef class FlightUnavailableError(FlightError, ArrowException):
         return MakeFlightError(CFlightStatusUnavailable, tobytes(str(self)),
                                self.extra_info)
 
+
+class FlightWriteSizeExceededError(OSError, ArrowException):

Review comment:
       I'm not sure I get the point of `OSError` here. I would expect `ArrowInvalid`.

##########
File path: cpp/src/arrow/flight/client.h
##########
@@ -73,6 +91,16 @@ class ARROW_FLIGHT_EXPORT FlightClientOptions {
   std::string private_key;
   /// \brief A list of client middleware to apply.
   std::vector<std::shared_ptr<ClientMiddlewareFactory>> middleware;
+  /// \brief A soft limit on the number of bytes to write when sending
+  ///     Arrow data to a server.

Review comment:
       For a single message / batch, right?

##########
File path: cpp/src/arrow/flight/client.h
##########
@@ -73,6 +91,16 @@ class ARROW_FLIGHT_EXPORT FlightClientOptions {
   std::string private_key;
   /// \brief A list of client middleware to apply.
   std::vector<std::shared_ptr<ClientMiddlewareFactory>> middleware;
+  /// \brief A soft limit on the number of bytes to write when sending
+  ///     Arrow data to a server.
+  ///
+  /// Used to help limit server memory consumption. Only enabled if
+  /// positive. When enabled, \a FlightStreamWriter.Write* may yield a

Review comment:
       Why the `\a`s?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#discussion_r438945868



##########
File path: cpp/src/arrow/flight/client.h
##########
@@ -73,6 +91,16 @@ class ARROW_FLIGHT_EXPORT FlightClientOptions {
   std::string private_key;
   /// \brief A list of client middleware to apply.
   std::vector<std::shared_ptr<ClientMiddlewareFactory>> middleware;
+  /// \brief A soft limit on the number of bytes to write when sending
+  ///     Arrow data to a server.

Review comment:
       Yes - updated the docstring.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#issuecomment-642183483


   See #7299 and #7387 for context.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#discussion_r438945688



##########
File path: python/pyarrow/_flight.pyx
##########
@@ -177,6 +185,16 @@ cdef class FlightUnavailableError(FlightError, ArrowException):
         return MakeFlightError(CFlightStatusUnavailable, tobytes(str(self)),
                                self.extra_info)
 
+
+class FlightWriteSizeExceededError(OSError, ArrowException):

Review comment:
       Ah, I did this because it originates as IOError (which gets mapped to OSError in Python), but I've changed both to ArrowInvalid.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#issuecomment-642184909


   https://issues.apache.org/jira/browse/ARROW-8487


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #7398: ARROW-8487: [FlightRPC] Provide a way to target a particular payload size

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7398:
URL: https://github.com/apache/arrow/pull/7398#issuecomment-642640738


   Rebased. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org