Posted to commits@arrow.apache.org by al...@apache.org on 2023/06/06 03:58:25 UTC
[arrow] branch main updated: MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)
This is an automated email from the ASF dual-hosted git repository.
alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new e2ae492b63 MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)
e2ae492b63 is described below
commit e2ae492b63245a8bc6330b10785777ab04659798
Author: Bryce Mecum <pe...@gmail.com>
AuthorDate: Mon Jun 5 19:58:15 2023 -0800
MINOR: [Documentation][Python] Minor tweaks to docs around IPC messages (#35880)
### Rationale for this change
@jorisvandenbossche pointed out that the docstring for RecordBatch.serialize might not be clear enough and that users may be surprised to find that the returned Buffer doesn't include a Schema. This PR makes that clearer and improves related documentation.
### Are these changes tested?
Yes. I built the docs and verified that the rendering was correct, and I ran the changed example locally. I didn't run it with --doctest-cython because it looks like that hasn't been set up for this module.
### Are there any user-facing changes?
No, these are just docs improvements.
Lead-authored-by: Bryce Mecum <pe...@gmail.com>
Co-authored-by: Alenka Frim <Al...@users.noreply.github.com>
Signed-off-by: AlenkaF <fr...@gmail.com>
---
docs/source/format/Glossary.rst | 7 +++++--
docs/source/format/Other.rst | 10 ++++++----
python/pyarrow/table.pxi | 20 ++++++++++++++++++--
3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/docs/source/format/Glossary.rst b/docs/source/format/Glossary.rst
index ac18c1618b..65d6e0afa4 100644
--- a/docs/source/format/Glossary.rst
+++ b/docs/source/format/Glossary.rst
@@ -151,8 +151,11 @@ Glossary
IPC message
message
- The IPC representation of a particular in-memory structure,
- like a record batch or schema.
+ The IPC representation of a particular in-memory structure, like a :term:`record
+ batch` or :term:`schema`. Will always be one of the members of ``MessageHeader``
+ in the `Flatbuffers protocol file
+ <https://github.com/apache/arrow/blob/main/format/Message.fbs>`_.
+
IPC streaming format
streaming format
diff --git a/docs/source/format/Other.rst b/docs/source/format/Other.rst
index 9504998d62..cb5234e0c2 100644
--- a/docs/source/format/Other.rst
+++ b/docs/source/format/Other.rst
@@ -18,10 +18,10 @@
Other Data Structures
=====================
-Our Flatbuffers protocol files have metadata for some other data
-structures defined to allow other kinds of applications to take
-advantage of common interprocess communication machinery. These data
-structures are not considered to be part of the columnar format.
+Our `Flatbuffers protocol definition files`_ have metadata for some other data
+structures defined to allow other kinds of applications to take advantage of
+common interprocess communication machinery. These data structures are not
+considered to be part of the columnar format.
An Arrow columnar implementation is not required to implement these
types.
@@ -61,3 +61,5 @@ region) to be multiples of 64 bytes: ::
The contents of the sparse tensor index depends on what kind of sparse
format is used.
+
+.. _Flatbuffers protocol definition files: https://github.com/apache/arrow/tree/main/format
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 5f1ee00201..2c3092d093 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -2274,7 +2274,12 @@ cdef class RecordBatch(_Tabular):
def serialize(self, memory_pool=None):
"""
- Write RecordBatch to Buffer as encapsulated IPC message.
+ Write RecordBatch to Buffer as encapsulated IPC message, which does not
+ include a Schema.
+
+ To reconstruct a RecordBatch from the encapsulated IPC message Buffer
+ returned by this function, a Schema must be passed separately. See
+ Examples.
Parameters
----------
@@ -2292,8 +2297,19 @@ cdef class RecordBatch(_Tabular):
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
... names=["n_legs", "animals"])
- >>> batch.serialize()
+ >>> buf = batch.serialize()
+ >>> buf
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
+
+ Reconstruct RecordBatch from IPC message Buffer and original Schema
+
+ >>> pa.ipc.read_record_batch(buf, batch.schema)
+ pyarrow.RecordBatch
+ n_legs: int64
+ animals: string
+ ----
+ n_legs: [2,2,4,4,5,100]
+ animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
"""
cdef shared_ptr[CBuffer] buffer
cdef CIpcWriteOptions options = CIpcWriteOptions.Defaults()