Posted to commits@arrow.apache.org by th...@apache.org on 2021/09/15 11:26:20 UTC
[arrow-cookbook] branch main updated: ARROW-13716: Add RecordBatch recipe (#66)
This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new aa1c1b2 ARROW-13716: Add RecordBatch recipe (#66)
aa1c1b2 is described below
commit aa1c1b29f963c5e5c42428e9bc54dfa112f91926
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Wed Sep 15 13:26:16 2021 +0200
ARROW-13716: Add RecordBatch recipe (#66)
* Add RecordBatch recipe
* Apply suggestions from code review
Co-authored-by: Weston Pace <we...@gmail.com>
* Make example obvious
* Apply suggestions from code review
Co-authored-by: Nic <th...@gmail.com>
Co-authored-by: Weston Pace <we...@gmail.com>
Co-authored-by: Nic <th...@gmail.com>
---
python/source/create.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/python/source/create.rst b/python/source/create.rst
index 67ce6ef..3a1cb62 100644
--- a/python/source/create.rst
+++ b/python/source/create.rst
@@ -129,6 +129,58 @@ from a variety of inputs, including plain python objects
:func:`pyarrow.array` for conversion to Arrow arrays,
and will benefit from zero copy behaviour when possible.
+Creating Record Batches
+=======================
+
+Most I/O operations in Arrow work on batches of data that are shipped
+to their destination. :class:`pyarrow.RecordBatch` is how Arrow
+represents such a batch: a set of equal-length columns sharing one
+schema. A RecordBatch can be seen as a slice of a table.
+
+.. testcode::
+
+ import pyarrow as pa
+
+ batch = pa.RecordBatch.from_arrays([
+     pa.array([1, 3, 5, 7, 9]),
+     pa.array([2, 4, 6, 8, 10])
+ ], names=["odd", "even"])
+
+Multiple batches can be combined into a table using
+:meth:`pyarrow.Table.from_batches`:
+
+.. testcode::
+
+ second_batch = pa.RecordBatch.from_arrays([
+     pa.array([11, 13, 15, 17, 19]),
+     pa.array([12, 14, 16, 18, 20])
+ ], names=["odd", "even"])
+
+ table = pa.Table.from_batches([batch, second_batch])
+
+.. testcode::
+
+ print(table)
+
+.. testoutput::
+
+ pyarrow.Table
+ odd: int64
+ even: int64
+
+Conversely, a :class:`pyarrow.Table` can be converted to a list of
+:class:`pyarrow.RecordBatch` objects using the
+:meth:`pyarrow.Table.to_batches` method.
+
+.. testcode::
+
+ record_batches = table.to_batches(max_chunksize=5)
+ print(len(record_batches))
+
+.. testoutput::
+
+ 2
+
Store Categorical Data
======================