You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/06 16:28:30 UTC

[GitHub] [arrow] lidavidm opened a new pull request, #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

lidavidm opened a new pull request, #12815:
URL: https://github.com/apache/arrow/pull/12815

   This reworks the Flight protocol documentation and the C++ Flight documentation, and adds a brief Python documentation page. It also fixes some warnings and errors I found.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12815:
URL: https://github.com/apache/arrow/pull/12815#issuecomment-1106004466

   Benchmark runs are scheduled for baseline = f2d12138f410d37c343109a59a35619b821ea870 and contender = 7dd8a4bd62879416eca189fe6d9e0023e4936d87. 7dd8a4bd62879416eca189fe6d9e0023e4936d87 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/85fe868743ec4fc9bf4c7082c5be248d...51bd15f1e58c435c9aef674af906e510/)
   [Finished :arrow_down:0.29% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6fa9076d38f54f77ba117099e8b37a32...982c25f4a7e74db49f8236612c2c61dc/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0cd11c1aeb2e4b26a9308f885925595a...c71300d7092e41f3b5cd76da8736bdb1/)
   [Finished :arrow_down:0.04% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/3bdbfb5519334e51bafbca05738db188...32796d2f8d404f3ea3e1830524fbc4cc/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/555| `7dd8a4bd` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/543| `7dd8a4bd` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/541| `7dd8a4bd` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/553| `7dd8a4bd` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/554| `f2d12138` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/542| `f2d12138` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/540| `f2d12138` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/552| `f2d12138` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #12815:
URL: https://github.com/apache/arrow/pull/12815#discussion_r844422589


##########
docs/source/cpp/flight.rst:
##########
@@ -82,38 +94,78 @@ server stops.
    std::cout << "Server listening on localhost:" << server->port() << std::endl;
    ARROW_CHECK_OK(server->Serve());
 
-
-Enabling TLS and Authentication
--------------------------------
-
-TLS can be enabled by providing a certificate and key pair to
-:func:`FlightServerBase::Init
-<arrow::flight::FlightServerBase::Init>`. Additionally, use
-:func:`Location::ForGrpcTls <arrow::flight::Location::ForGrpcTls>` to
-construct the :class:`arrow::flight::Location` to listen on.
-
-Similarly, authentication can be enabled by providing an
-implementation of :class:`ServerAuthHandler
-<arrow::flight::ServerAuthHandler>`. Authentication consists of two
-parts: on initial client connection, the server and client
-authentication implementations can perform any negotiation needed;
-then, on each RPC thereafter, the client provides a token. The server
-authentication handler validates the token and provides the identity
-of the client. This identity can be obtained from the
-:class:`arrow::flight::ServerCallContext`.
-
 Using the Flight Client
 =======================
 
 To connect to a Flight service, create an instance of
 :class:`arrow::flight::FlightClient` by calling :func:`Connect
 <arrow::flight::FlightClient::Connect>`. This takes a Location and
-returns the client through an out parameter. To authenticate, call
-:func:`Authenticate <arrow::flight::FlightClient::Authenticate>` with
-the desired client authentication implementation.
+returns the client through an out parameter.

Review Comment:
   Good point. I removed/reworded things (except for the server-side RPC handlers where they still use out parameters).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #12815:
URL: https://github.com/apache/arrow/pull/12815#issuecomment-1090468092

   <details>
   <summary>New protocol docs</summary>
   
   ![image](https://user-images.githubusercontent.com/327919/162022807-8f873830-f1ce-4f48-83e0-15a5b86a5f04.png)
   
   </details>
   
   <details>
   <summary>New C++ docs</summary>
   
   ![image](https://user-images.githubusercontent.com/327919/162022880-0d9df368-1915-462d-b9a8-50c9c9f41aac.png)
   
   </details>
   
   <details>
   <summary>New Python docs</summary>
   
   ![image](https://user-images.githubusercontent.com/327919/162022899-09180af5-d698-473d-99aa-8c67c21c842d.png)
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm closed pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation
URL: https://github.com/apache/arrow/pull/12815


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #12815:
URL: https://github.com/apache/arrow/pull/12815#discussion_r844317220


##########
docs/source/cpp/flight.rst:
##########
@@ -82,38 +94,78 @@ server stops.
    std::cout << "Server listening on localhost:" << server->port() << std::endl;
    ARROW_CHECK_OK(server->Serve());
 
-
-Enabling TLS and Authentication
--------------------------------
-
-TLS can be enabled by providing a certificate and key pair to
-:func:`FlightServerBase::Init
-<arrow::flight::FlightServerBase::Init>`. Additionally, use
-:func:`Location::ForGrpcTls <arrow::flight::Location::ForGrpcTls>` to
-construct the :class:`arrow::flight::Location` to listen on.
-
-Similarly, authentication can be enabled by providing an
-implementation of :class:`ServerAuthHandler
-<arrow::flight::ServerAuthHandler>`. Authentication consists of two
-parts: on initial client connection, the server and client
-authentication implementations can perform any negotiation needed;
-then, on each RPC thereafter, the client provides a token. The server
-authentication handler validates the token and provides the identity
-of the client. This identity can be obtained from the
-:class:`arrow::flight::ServerCallContext`.
-
 Using the Flight Client
 =======================
 
 To connect to a Flight service, create an instance of
 :class:`arrow::flight::FlightClient` by calling :func:`Connect
 <arrow::flight::FlightClient::Connect>`. This takes a Location and
-returns the client through an out parameter. To authenticate, call
-:func:`Authenticate <arrow::flight::FlightClient::Authenticate>` with
-the desired client authentication implementation.
+returns the client through an out parameter.

Review Comment:
   Should we be pointing users toward the `Result` variants of these functions instead given https://github.com/apache/arrow/pull/12719?



##########
docs/source/cpp/flight.rst:
##########
@@ -82,38 +94,78 @@ server stops.
    std::cout << "Server listening on localhost:" << server->port() << std::endl;
    ARROW_CHECK_OK(server->Serve());
 
-
-Enabling TLS and Authentication
--------------------------------
-
-TLS can be enabled by providing a certificate and key pair to
-:func:`FlightServerBase::Init
-<arrow::flight::FlightServerBase::Init>`. Additionally, use
-:func:`Location::ForGrpcTls <arrow::flight::Location::ForGrpcTls>` to
-construct the :class:`arrow::flight::Location` to listen on.
-
-Similarly, authentication can be enabled by providing an
-implementation of :class:`ServerAuthHandler
-<arrow::flight::ServerAuthHandler>`. Authentication consists of two
-parts: on initial client connection, the server and client
-authentication implementations can perform any negotiation needed;
-then, on each RPC thereafter, the client provides a token. The server
-authentication handler validates the token and provides the identity
-of the client. This identity can be obtained from the
-:class:`arrow::flight::ServerCallContext`.
-
 Using the Flight Client
 =======================
 
 To connect to a Flight service, create an instance of
 :class:`arrow::flight::FlightClient` by calling :func:`Connect
 <arrow::flight::FlightClient::Connect>`. This takes a Location and
-returns the client through an out parameter. To authenticate, call
-:func:`Authenticate <arrow::flight::FlightClient::Authenticate>` with
-the desired client authentication implementation.
+returns the client through an out parameter.
 
 Each RPC method returns :class:`arrow::Status` to indicate the
 success/failure of the request. Any other return values are specified
-through out parameters. They also take an optional :class:`options
-<arrow::flight::FlightCallOptions>` parameter that allows specifying a
-timeout for the call.
+through out parameters.
+
+Cancellation and Timeouts
+=========================
+
+When making a call, clients can optionally provide
+:class:`FlightCallOptions <arrow::flight::FlightCallOptions>`. This
+allows clients to set a timeout on calls or provide custom HTTP
+headers, among other features. Also, some objects returned by client
+RPC calls expose a ``Cancel`` method which allows terminating a call
+early.
+
+On the server side, timeouts are transparent. For cancellation, the

Review Comment:
   I find "transparent" isn't always straight foward in meaning. What do you think of this:
   
   ```suggestion
   On the server side, no additional code is needed to implement timeouts. For cancellation, the
   ```



##########
docs/source/format/Flight.rst:
##########
@@ -29,73 +30,187 @@ either downloaded from or uploaded to another service. A set of
 metadata methods offers discovery and introspection of streams, as
 well as the ability to implement application-specific methods.
 
-Methods and message wire formats are defined by Protobuf, enabling
+Methods and message wire formats are defined by Protobuf_, enabling
 interoperability with clients that may support gRPC and Arrow
 separately, but not Flight. However, Flight implementations include
 further optimizations to avoid overhead in usage of Protobuf (mostly
 around avoiding excessive memory copies).
 
 .. _gRPC: https://grpc.io/
+.. _Protobuf: https://developers.google.com/protocol-buffers/
 
-RPC Methods
------------
+RPC Methods and Request Patterns
+================================
 
 Flight defines a set of RPC methods for uploading/downloading data,
 retrieving metadata about a data stream, listing available data
 streams, and for implementing application-specific RPC methods. A
 Flight service implements some subset of these methods, while a Flight
-client can call any of these methods. Thus, one Flight client can
-connect to any Flight service and perform basic operations.
+client can call any of these methods.
+
+Data streams are identified by descriptors (the ``FlightDescriptor``
+message), which are either a path or an arbitrary binary command. For
+instance, the descriptor may encode a SQL query, a path to a file on a
+distributed file system, or even a pickled Python object; the
+application can use this message as it sees fit.
+
+Thus, one Flight client can connect to any service and perform basic
+operations. To facilitate this, Flight services are *expected* to
+support some common request patterns, described next. Of course,
+applications may ignore compatibility and simply treat the Flight RPC
+methods as low-level building blocks for their own purposes.
+
+See `Protocol Buffer Definitions`_ for full details on the methods and
+messages involved.
 
-Data streams are identified by descriptors, which are either a path or
-an arbitrary binary command. A client that wishes to download the data
-would:
+Downloading Data
+----------------
+
+A client that wishes to download the data would:
+
+.. figure:: ./Flight/DoGet.mmd.svg
+
+   Retrieving data via ``DoGet``.
 
 #. Construct or acquire a ``FlightDescriptor`` for the data set they
-   are interested in. A client may know what descriptor they want
-   already, or they may use methods like ``ListFlights`` to discover
-   them.
+   are interested in.
+
+   A client may know what descriptor they want already, or they may
+   use methods like ``ListFlights`` to discover them.
 #. Call ``GetFlightInfo(FlightDescriptor)`` to get a ``FlightInfo``
-   message containing details on where the data is located (as well as
-   other metadata, like the schema and possibly an estimate of the
-   dataset size).
+   message.
 
    Flight does not require that data live on the same server as
-   metadata: this call may list other servers to connect to. The
-   ``FlightInfo`` message includes a ``Ticket``, an opaque binary
-   token that the server uses to identify the exact data set being
-   requested.
-#. Connect to other servers (if needed).
-#. Call ``DoGet(Ticket)`` to get back a stream of Arrow record
-   batches.
+   metadata. Hence, ``FlightInfo`` contains details on where the data
+   is located, so the client can go fetch the data from an appropriate
+   server. This is encoded as a series of ``FlightEndpoint`` messages
+   inside ``FlightInfo``. Each endpoint represents some location that
+   contains a subset of the response data.
+
+   An endpoint contains a list of locations (server addresses) where
+   this data can be retrieved from, and a ``Ticket``, an opaque binary
+   token that the server will use to identify the data being
+   requested. There is no ordering defined on endpoints or the data
+   within, so if the dataset is sorted, applications should return
+   data in a single endpoint.
+
+   The response also contains other metadata, like the schema, and
+   optionally an estimate of the dataset size.
+#. Consume each endpoint returned by the server.
+
+   To consume an endpoint, the client should connect to one of the
+   locations in the endpoint, then call ``DoGet(Ticket)`` with the
+   ticket in the endpoint. This will give the client a stream of Arrow
+   record batches.
+
+   If the server wishes to indicate that the data is on the local
+   server and not a different location, then it can return an empty
+   list of locations. The client can then reuse the existing
+   connection to the original server to fetch data. Otherwise, the
+   client must connect to one of the indicated locations.
+
+   In this way, the locations inside an endpoint can also be thought
+   of as performing lookaside load balacing or service discovery

Review Comment:
   ```suggestion
   of as performing look-aside load balancing or service discovery
   ```



##########
docs/source/cpp/build_system.rst:
##########
@@ -176,7 +176,7 @@ text version of the IANA timezone database and add the Windows timezone mapping
 XML. To download, you can use the following batch script:
 
 .. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
-   :language: cmd
+   :language: batch

Review Comment:
   Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #12815:
URL: https://github.com/apache/arrow/pull/12815#issuecomment-1090468964

   CC @davisusanibar we can base the Java documentation page on the Python/C++ ones here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #12815: ARROW-16065: [FlightRPC][Docs] Improve Flight documentation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12815:
URL: https://github.com/apache/arrow/pull/12815#issuecomment-1090467382

   https://issues.apache.org/jira/browse/ARROW-16065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org