Posted to commits@arrow.apache.org by li...@apache.org on 2022/04/05 12:28:54 UTC

[arrow] branch master updated: ARROW-16046: [Docs][FlightRPC][Python] Ensure Flight Python API is documented

This is an automated email from the ASF dual-hosted git repository.

lidavidm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new c5a8129756 ARROW-16046: [Docs][FlightRPC][Python] Ensure Flight Python API is documented
c5a8129756 is described below

commit c5a812975686a094088a95ab35b6215d52bc2b80
Author: David Li <li...@gmail.com>
AuthorDate: Tue Apr 5 08:28:45 2022 -0400

    ARROW-16046: [Docs][FlightRPC][Python] Ensure Flight Python API is documented
    
    Add some missing classes to the docs, also fix a couple build warnings.
    
    Closes #12737 from lidavidm/arrow-16046
    
    Lead-authored-by: David Li <li...@gmail.com>
    Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
    Signed-off-by: David Li <li...@gmail.com>
---
 .../developers/guide/tutorials/r_tutorial.rst      |  22 +--
 docs/source/java/vector_schema_root.rst            |  41 +++--
 docs/source/python/api/flight.rst                  |   7 +
 python/pyarrow/_flight.pyx                         | 182 ++++++++++++++++++++-
 4 files changed, 219 insertions(+), 33 deletions(-)

diff --git a/docs/source/developers/guide/tutorials/r_tutorial.rst b/docs/source/developers/guide/tutorials/r_tutorial.rst
index d536f0de7e..3b8acaab65 100644
--- a/docs/source/developers/guide/tutorials/r_tutorial.rst
+++ b/docs/source/developers/guide/tutorials/r_tutorial.rst
@@ -52,12 +52,12 @@ to Arrow R package following the steps specified by the
 :ref:`step_by_step` section. Navigate there whenever there is
 some information you may find is missing here.
 
-The binding will be added to the ``expression.R`` file in the 
+The binding will be added to the ``expression.R`` file in the
 R package. But you can also follow these steps in case you are
 adding a binding that will live somewhere else.
 
 .. seealso::
-   
+
    To read more about the philosophy behind R bindings, refer to the
    `Writing Bindings article <https://arrow.apache.org/docs/r/articles/developers/bindings.html>`_.
 
@@ -219,7 +219,7 @@ tests we have is in ``test-dplyr-funcs-datetime.R``:
      )
    })
 
-And 
+And
 
 .. code-block:: R
 
@@ -245,7 +245,7 @@ more research and code corrections.
    ℹ Testing arrow
    See arrow_info() for available features
    ✔ | F W S  OK | Context
-   ✖ | 1     230 | dplyr-funcs-datetime [1.4s]                                                                                             
+   ✖ | 1     230 | dplyr-funcs-datetime [1.4s]
    ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    Failure (test-dplyr-funcs-datetime.R:187:3): strftime
    ``%>%`(...)` did not throw the expected error.
@@ -328,7 +328,7 @@ And ``git diff`` to see the changes in the files in order to spot any error we m
    @@ -444,6 +444,15 @@ test_that("extract wday from timestamp", {
       )
     })
-    
+
    +test_that("extract mday from timestamp", {
    +  compare_dplyr_binding(
    +    .input %>%
@@ -383,11 +383,11 @@ We can use ``git log`` to check the history of commits:
    Date:   Thu Jan 20 09:45:59 2022 +0900
 
        ARROW-15372: [C++][Gandiva] Gandiva now depends on boost/crc.hpp which is missing from the trimmed boost archive
-       
+
        See build error https://github.com/ursacomputing/crossbow/runs/4871392838?check_suite_focus=true#step:5:11762
-       
+
        Closes #12190 from kszucs/ARROW-15372
-       
+
        Authored-by: Krisztián Szűcs <sz...@gmail.com>
        Signed-off-by: Sutou Kouhei <ko...@clear-code.com>
 
@@ -411,10 +411,10 @@ on GitHub called origin.
    Writing objects: 100% (151/151), 35.78 KiB | 8.95 MiB/s, done.
    Total 151 (delta 129), reused 33 (delta 20), pack-reused 0
    remote: Resolving deltas: 100% (129/129), completed with 80 local objects.
-   remote: 
+   remote:
    remote: Create a pull request for 'ARROW-14816' on GitHub by visiting:
    remote:      https://github.com/AlenkaF/arrow/pull/new/ARROW-14816
-   remote: 
+   remote:
    To https://github.com/AlenkaF/arrow.git
     * [new branch]          ARROW-14816 -> ARROW-14816
 
@@ -423,7 +423,7 @@ to create a Pull Request. On the GitHub Arrow
 page (main or forked) we will see a yellow notice
 bar with a note that we made recent pushes to the branch
 ARROW-14816. That’s great, now we can make the Pull Request
-by clicking on **Compare & pull request**. 
+by clicking on **Compare & pull request**.
 
 .. figure:: /developers/images/R_tutorial_create_pr_notice.jpeg
    :scale: 60 %
diff --git a/docs/source/java/vector_schema_root.rst b/docs/source/java/vector_schema_root.rst
index 53c8c579dc..34392a46af 100644
--- a/docs/source/java/vector_schema_root.rst
+++ b/docs/source/java/vector_schema_root.rst
@@ -73,22 +73,23 @@ with some optional schema-wide metadata (in addition to per-field metadata).
 VectorSchemaRoot
 ================
 
-.. note::
+A `VectorSchemaRoot`_ is a container for batches of data. Batches flow through
+VectorSchemaRoot as part of a pipeline.
 
-    VectorSchemaRoot is somewhat analogous to tables and record batches in the other Arrow implementations
-    in that they all are 2D datasets, but the usage is different.
+.. note::
 
-A :class:`VectorSchemaRoot` is a container that can hold batches, batches flow through :class:`VectorSchemaRoot`
-as part of a pipeline. Note this is different from other implementations (i.e. in C++ and Python,
-a :class:`RecordBatch` is a collection of equal-length vector instances and was created each time for a new batch).
+    VectorSchemaRoot is somewhat analogous to tables or record batches in the
+    other Arrow implementations in that they all are 2D datasets, but their
+    usage is different.
 
-The recommended usage for :class:`VectorSchemaRoot` is creating a single :class:`VectorSchemaRoot`
-based on the known schema and populated data over and over into the same VectorSchemaRoot in a stream
-of batches rather than creating a new :class:`VectorSchemaRoot` instance each time
-(see `Flight`_ or ``ArrowFileWriter`` for better understanding). Thus at any one point a VectorSchemaRoot may have data or
-may have no data (say it was transferred downstream or not yet populated).
+The recommended usage is to create a single VectorSchemaRoot based on a known
+schema and populate data over and over into that root in a stream of batches,
+rather than creating a new instance each time (see `Flight`_ or
+``ArrowFileWriter`` as examples). Thus at any one point, a VectorSchemaRoot may
+have data or may have no data (say it was transferred downstream or not yet
+populated).
 
-Here is the example of building a :class:`VectorSchemaRoot`
+Here is an example of creating a VectorSchemaRoot:
 
 .. code-block:: Java
 
@@ -107,9 +108,10 @@ Here is the example of building a :class:`VectorSchemaRoot`
     List<FieldVector> vectors = Arrays.asList(bitVector, varCharVector);
     VectorSchemaRoot vectorSchemaRoot = new VectorSchemaRoot(fields, vectors);
 
-The vectors within a :class:`VectorSchemaRoot` could be loaded/unloaded via :class:`VectorLoader` and :class:`VectorUnloader`.
-:class:`VectorLoader` and :class:`VectorUnloader` handles converting between :class:`VectorSchemaRoot` and :class:`ArrowRecordBatch` (
-representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Examples as below
+Data can be loaded into/unloaded from a VectorSchemaRoot via `VectorLoader`_
+and `VectorUnloader`_.  They handle converting between VectorSchemaRoot and
+`ArrowRecordBatch`_ (a representation of a RecordBatch :ref:`IPC <format-ipc>`
+message). For example:
 
 .. code-block:: Java
 
@@ -123,13 +125,18 @@ representation of a RecordBatch :doc:`IPC <../format/IPC.rst>` message). Example
     VectorLoader loader = new VectorLoader(root2);
     loader.load(recordBatch);
 
-A new :class:`VectorSchemaRoot` could be sliced from an existing instance with zero-copy
+A new VectorSchemaRoot can be sliced from an existing root without copying
+data:
 
 .. code-block:: Java
 
     // 0 indicates start index (inclusive) and 5 indicates length (exclusive).
     VectorSchemaRoot newRoot = vectorSchemaRoot.slice(0, 5);
 
+.. _`ArrowRecordBatch`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ipc/message/ArrowRecordBatch.html
 .. _`Field`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Field.html
-.. _`Schema`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Schema.html
 .. _`Flight`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/flight/package-summary.html
+.. _`Schema`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/types/pojo/Schema.html
+.. _`VectorLoader`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorLoader.html
+.. _`VectorSchemaRoot`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorSchemaRoot.html
+.. _`VectorUnloader`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorUnloader.html
diff --git a/docs/source/python/api/flight.rst b/docs/source/python/api/flight.rst
index 0cfbb6b4bd..ea1e7d9018 100644
--- a/docs/source/python/api/flight.rst
+++ b/docs/source/python/api/flight.rst
@@ -46,6 +46,8 @@ Common Types
     FlightEndpoint
     FlightInfo
     Location
+    MetadataRecordBatchReader
+    MetadataRecordBatchWriter
     Ticket
     Result
 
@@ -57,6 +59,8 @@ Flight Client
 
     FlightCallOptions
     FlightClient
+    FlightStreamReader
+    FlightStreamWriter
     ClientMiddlewareFactory
     ClientMiddleware
 
@@ -66,9 +70,12 @@ Flight Server
 .. autosummary::
    :toctree: ../generated/
 
+    FlightDataStream
+    FlightMetadataWriter
     FlightServerBase
     GeneratorStream
     RecordBatchStream
+    ServerCallContext
     ServerMiddlewareFactory
     ServerMiddleware
 
diff --git a/python/pyarrow/_flight.pyx b/python/pyarrow/_flight.pyx
index 81a5c921fd..59e2e30f27 100644
--- a/python/pyarrow/_flight.pyx
+++ b/python/pyarrow/_flight.pyx
@@ -877,7 +877,12 @@ cdef class _MetadataRecordBatchReader(_Weakrefable, _ReadPandasMixin):
 
 
 cdef class MetadataRecordBatchReader(_MetadataRecordBatchReader):
-    """The virtual base class for readers for Flight streams."""
+    """The base class for readers for Flight streams.
+
+    See Also
+    --------
+    FlightStreamReader
+    """
 
 
 cdef class FlightStreamReader(MetadataRecordBatchReader):
@@ -1172,6 +1177,11 @@ cdef class FlightClient(_Weakrefable):
     def connect(cls, location, tls_root_certs=None, cert_chain=None,
                 private_key=None, override_hostname=None,
                 disable_server_verification=None):
+        """Connect to a Flight server.
+
+        .. deprecated:: 0.15.0
+            Use the ``FlightClient`` constructor or ``pyarrow.flight.connect`` function instead.
+        """
         warnings.warn("The 'FlightClient.connect' method is deprecated, use "
                       "FlightClient constructor or pyarrow.flight.connect "
                       "function instead")
@@ -1447,6 +1457,7 @@ cdef class FlightClient(_Weakrefable):
         return py_writer, py_reader
 
     def close(self):
+        """Close the client and disconnect."""
         check_flight_status(self.client.get().Close())
 
     def __del__(self):
@@ -1462,7 +1473,14 @@ cdef class FlightClient(_Weakrefable):
 
 
 cdef class FlightDataStream(_Weakrefable):
-    """Abstract base class for Flight data streams."""
+    """
+    Abstract base class for Flight data streams.
+
+    See Also
+    --------
+    RecordBatchStream
+    GeneratorStream
+    """
 
     cdef CFlightDataStream* to_stream(self) except *:
         """Create the C++ data stream for the backing Python object.
@@ -1474,7 +1492,12 @@ cdef class FlightDataStream(_Weakrefable):
 
 
 cdef class RecordBatchStream(FlightDataStream):
-    """A Flight data stream backed by RecordBatches."""
+    """A Flight data stream backed by RecordBatches.
+
+    The remainder of this DoGet request will be handled in C++,
+    without having to acquire the GIL.
+
+    """
     cdef:
         object data_source
         CIpcWriteOptions write_options
@@ -1485,7 +1508,9 @@ cdef class RecordBatchStream(FlightDataStream):
         Parameters
         ----------
         data_source : RecordBatchReader or Table
+            The data to stream to the client.
         options : pyarrow.ipc.IpcWriteOptions, optional
+            Optional IPC options to control how to write the data.
         """
         if (not isinstance(data_source, RecordBatchReader) and
                 not isinstance(data_source, lib.Table)):
@@ -1561,6 +1586,7 @@ cdef class ServerCallContext(_Weakrefable):
         return frombytes(self.context.peer(), safe=True)
 
     def is_cancelled(self):
+        """Check if the current RPC call has been canceled by the client."""
         return self.context.is_cancelled()
 
     def get_middleware(self, key):
@@ -2452,6 +2478,10 @@ cdef class _ServerMiddlewareWrapper(ServerMiddleware):
 cdef class FlightServerBase(_Weakrefable):
     """A Flight service definition.
 
+    To start the server, create an instance of this class with an
+    appropriate location. The server will be running as soon as the
+    instance is created; it is not required to call :meth:`serve`.
+
     Override methods to define your Flight service.
 
     Parameters
@@ -2564,32 +2594,169 @@ cdef class FlightServerBase(_Weakrefable):
         return self.server.get().port()
 
     def list_flights(self, context, criteria):
+        """List flights available on this service.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        criteria : bytes
+            Filter criteria provided by the client.
+
+        Returns
+        -------
+        iterator of FlightInfo
+
+        """
         raise NotImplementedError
 
     def get_flight_info(self, context, descriptor):
+        """Get information about a flight.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        descriptor : FlightDescriptor
+            The descriptor for the flight provided by the client.
+
+        Returns
+        -------
+        FlightInfo
+
+        """
         raise NotImplementedError
 
     def get_schema(self, context, descriptor):
+        """Get the schema of a flight.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        descriptor : FlightDescriptor
+            The descriptor for the flight provided by the client.
+
+        Returns
+        -------
+        Schema
+
+        """
         raise NotImplementedError
 
-    def do_put(self, context, descriptor, reader,
+    def do_put(self, context, descriptor, reader: MetadataRecordBatchReader,
                writer: FlightMetadataWriter):
+        """Write data to a flight.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        descriptor : FlightDescriptor
+            The descriptor for the flight provided by the client.
+        reader : MetadataRecordBatchReader
+            A reader for data uploaded by the client.
+        writer : FlightMetadataWriter
+            A writer to send responses to the client.
+
+        """
         raise NotImplementedError
 
     def do_get(self, context, ticket):
+        """Get a stream of data to send back to the client.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        ticket : Ticket
+            The ticket for the flight.
+
+        Returns
+        -------
+        FlightDataStream
+            A stream of data to send back to the client.
+
+        """
         raise NotImplementedError
 
     def do_exchange(self, context, descriptor, reader, writer):
+        """Exchange data with the client.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        descriptor : FlightDescriptor
+            The descriptor for the flight provided by the client.
+        reader : MetadataRecordBatchReader
+            A reader for data uploaded by the client.
+        writer : MetadataRecordBatchWriter
+            A writer to send responses to the client.
+
+        """
         raise NotImplementedError
 
     def list_actions(self, context):
+        """List custom actions available on this server.
+
+        Applications should override this method to implement their
+        own behavior. The default method raises a NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+
+        Returns
+        -------
+        iterator of ActionType or tuple
+
+        """
         raise NotImplementedError
 
     def do_action(self, context, action):
+        """Execute a custom action.
+
+        This method should return an iterator, or it should be a
+        generator. Applications should override this method to
+        implement their own behavior. The default method raises a
+        NotImplementedError.
+
+        Parameters
+        ----------
+        context : ServerCallContext
+            Common contextual information.
+        action : Action
+            The action to execute.
+
+        Returns
+        -------
+        iterator of bytes
+
+        """
         raise NotImplementedError
 
     def serve(self):
-        """Start serving.
+        """Block until the server shuts down.
 
         This method only returns if shutdown() is called or a signal is
         received.
@@ -2600,6 +2767,11 @@ cdef class FlightServerBase(_Weakrefable):
             check_flight_status(self.server.get().ServeWithSignals())
 
     def run(self):
+        """Block until the server shuts down.
+
+        .. deprecated:: 0.15.0
+            Use the ``FlightServerBase.serve`` method instead.
+        """
         warnings.warn("The 'FlightServer.run' method is deprecated, use "
                       "FlightServer.serve method instead")
         self.serve()