Posted to commits@arrow.apache.org by we...@apache.org on 2021/08/31 18:27:33 UTC

[arrow-cookbook] branch main updated: Update CSV recipe to use pyarrow.csv instead of pandas (#50)

This is an automated email from the ASF dual-hosted git repository.

westonpace pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new 70f70ce  Update CSV recipe to use pyarrow.csv instead of pandas (#50)
70f70ce is described below

commit 70f70ce72f1a288dcb1ead6ecbb4413266f1c12f
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Tue Aug 31 20:27:28 2021 +0200

    Update CSV recipe to use pyarrow.csv instead of pandas (#50)
    
    * Switch CSV writing to the arrow provided one
    
    * Add incremental recipe
    
    * Update python/source/io.rst
    
    Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
    
    * Update python/source/io.rst
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update python/source/io.rst
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
    Co-authored-by: Weston Pace <we...@gmail.com>
---
 python/source/io.rst | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/python/source/io.rst b/python/source/io.rst
index 717d1db..b5a9c70 100644
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -7,15 +7,12 @@ Apache Arrow.
 
 .. contents::
 
-Write a Parquet file
-====================
-
 .. testsetup::
 
-    import numpy as np
     import pyarrow as pa
 
-    arr = pa.array(np.arange(100))
+Write a Parquet file
+====================
 
 Given an array with 100 numbers, from 0 to 99
 
@@ -179,14 +176,37 @@ format can be memory mapped back directly from the disk.
 Writing CSV files
 =================
 
-It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+It is possible to write an Arrow :class:`pyarrow.Table` to
+a CSV file using the :func:`pyarrow.csv.write_csv` function:
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to write data to a CSV file incrementally
+as you generate or retrieve it, and you don't want to keep
+the whole table in memory to write it at once, you can use
+:class:`pyarrow.csv.CSVWriter` to write the data in chunks:
+
+.. testcode::
+
+    schema = pa.schema([("col1", pa.int32())])
+    with pa.csv.CSVWriter("table.csv", schema=schema) as writer:
+        for chunk in range(10):
+            datachunk = range(chunk*10, (chunk+1)*10)
+            table = pa.Table.from_arrays([pa.array(datachunk)], schema=schema)
+            writer.write(table)
+
+It's equally possible to write :class:`pyarrow.RecordBatch`
+objects by passing them to the writer as you would tables.
 
 Reading CSV files
 =================