Posted to commits@arrow.apache.org by th...@apache.org on 2021/09/15 11:26:20 UTC

[arrow-cookbook] branch main updated: ARROW-13716: Add RecordBatch recipe (#66)

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new aa1c1b2  ARROW-13716: Add RecordBatch recipe (#66)
aa1c1b2 is described below

commit aa1c1b29f963c5e5c42428e9bc54dfa112f91926
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Wed Sep 15 13:26:16 2021 +0200

    ARROW-13716: Add RecordBatch recipe (#66)
    
    * Add RecordBatch recipe
    
    * Apply suggestions from code review
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Make example obvious
    
    * Apply suggestions from code review
    
    Co-authored-by: Nic <th...@gmail.com>
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    Co-authored-by: Nic <th...@gmail.com>
---
 python/source/create.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/python/source/create.rst b/python/source/create.rst
index 67ce6ef..3a1cb62 100644
--- a/python/source/create.rst
+++ b/python/source/create.rst
@@ -129,6 +129,58 @@ from a variety of inputs, including plain python objects
     :func:`pyarrow.array` for conversion to Arrow arrays,
     and will benefit from zero copy behaviour when possible.
 
+Creating Record Batches
+=======================
+
+Most I/O operations in Arrow ship data to its destination in
+batches.  :class:`pyarrow.RecordBatch` is how Arrow represents
+such a batch: a set of equal-length columns sharing one schema.
+A RecordBatch can be seen as a slice of a table.
+
+.. testcode::
+
+    import pyarrow as pa
+
+    batch = pa.RecordBatch.from_arrays([
+        pa.array([1, 3, 5, 7, 9]),
+        pa.array([2, 4, 6, 8, 10])
+    ], names=["odd", "even"])
+
+Multiple batches that share a schema can be combined into a table using
+:meth:`pyarrow.Table.from_batches`:
+
+.. testcode::
+
+    second_batch = pa.RecordBatch.from_arrays([
+        pa.array([11, 13, 15, 17, 19]),
+        pa.array([12, 14, 16, 18, 20])
+    ], names=["odd", "even"])
+
+    table = pa.Table.from_batches([batch, second_batch])
+
+.. testcode::
+
+    print(table)
+
+.. testoutput::
+
+    pyarrow.Table
+    odd: int64
+    even: int64
+
+Equally, a :class:`pyarrow.Table` can be converted to a list of
+:class:`pyarrow.RecordBatch` objects using the
+:meth:`pyarrow.Table.to_batches` method:
+
+.. testcode::
+
+    record_batches = table.to_batches(max_chunksize=5)
+    print(len(record_batches))
+
+.. testoutput::
+
+    2
+
 Store Categorical Data
 ======================