Posted to commits@arrow.apache.org by al...@apache.org on 2023/06/06 03:58:25 UTC

[arrow] branch main updated: MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)

This is an automated email from the ASF dual-hosted git repository.

alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new e2ae492b63 MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)
e2ae492b63 is described below

commit e2ae492b63245a8bc6330b10785777ab04659798
Author: Bryce Mecum <pe...@gmail.com>
AuthorDate: Mon Jun 5 19:58:15 2023 -0800

    MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)
    
    ### Rationale for this change
    
    @jorisvandenbossche pointed out that the docstring for RecordBatch.serialize might not be clear enough and that users may be surprised to find out that a Schema isn't included. This PR makes that clearer and improves related documentation.
    
    ### Are these changes tested?
    
    Yes. I built the docs and verified that the rendering was correct, and I ran the changed example locally. I didn't run it with --doctest-cython because it looks like that hasn't been set up for this module.
    
    ### Are there any user-facing changes?
    
    No, these are just docs improvements.
    
    Lead-authored-by: Bryce Mecum <pe...@gmail.com>
    Co-authored-by: Alenka Frim <Al...@users.noreply.github.com>
    Signed-off-by: AlenkaF <fr...@gmail.com>
---
 docs/source/format/Glossary.rst |  7 +++++--
 docs/source/format/Other.rst    | 10 ++++++----
 python/pyarrow/table.pxi        | 20 ++++++++++++++++++--
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/docs/source/format/Glossary.rst b/docs/source/format/Glossary.rst
index ac18c1618b..65d6e0afa4 100644
--- a/docs/source/format/Glossary.rst
+++ b/docs/source/format/Glossary.rst
@@ -151,8 +151,11 @@ Glossary
 
    IPC message
    message
-       The IPC representation of a particular in-memory structure,
-       like a record batch or schema.
+       The IPC representation of a particular in-memory structure, like a :term:`record
+       batch` or :term:`schema`. Will always be one of the members of ``MessageHeader``
+       in the `Flatbuffers protocol file
+       <https://github.com/apache/arrow/blob/main/format/Message.fbs>`_.
+
 
    IPC streaming format
    streaming format
diff --git a/docs/source/format/Other.rst b/docs/source/format/Other.rst
index 9504998d62..cb5234e0c2 100644
--- a/docs/source/format/Other.rst
+++ b/docs/source/format/Other.rst
@@ -18,10 +18,10 @@
 Other Data Structures
 =====================
 
-Our Flatbuffers protocol files have metadata for some other data
-structures defined to allow other kinds of applications to take
-advantage of common interprocess communication machinery. These data
-structures are not considered to be part of the columnar format.
+Our `Flatbuffers protocol definition files`_ have metadata for some other data
+structures defined to allow other kinds of applications to take advantage of
+common interprocess communication machinery. These data structures are not
+considered to be part of the columnar format.
 
 An Arrow columnar implementation is not required to implement these
 types.
@@ -61,3 +61,5 @@ region) to be multiples of 64 bytes: ::
 
 The contents of the sparse tensor index depends on what kind of sparse
 format is used.
+
+.. _Flatbuffers protocol definition files: https://github.com/apache/arrow/tree/main/format
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 5f1ee00201..2c3092d093 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -2274,7 +2274,12 @@ cdef class RecordBatch(_Tabular):
 
     def serialize(self, memory_pool=None):
         """
-        Write RecordBatch to Buffer as encapsulated IPC message.
+        Write RecordBatch to Buffer as encapsulated IPC message, which does not
+        include a Schema.
+
+        To reconstruct a RecordBatch from the encapsulated IPC message Buffer 
+        returned by this function, a Schema must be passed separately. See 
+        Examples.
 
         Parameters
         ----------
@@ -2292,8 +2297,19 @@ cdef class RecordBatch(_Tabular):
         >>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
         >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
         ...                                     names=["n_legs", "animals"])
-        >>> batch.serialize()
+        >>> buf = batch.serialize()
+        >>> buf
         <pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
+
+        Reconstruct RecordBatch from IPC message Buffer and original Schema
+
+        >>> pa.ipc.read_record_batch(buf, batch.schema)
+        pyarrow.RecordBatch
+        n_legs: int64
+        animals: string
+        ----
+        n_legs: [2,2,4,4,5,100]
+        animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
         """
         cdef shared_ptr[CBuffer] buffer
         cdef CIpcWriteOptions options = CIpcWriteOptions.Defaults()