Posted to commits@arrow.apache.org by we...@apache.org on 2021/08/31 18:27:33 UTC
[arrow-cookbook] branch main updated: Update CSV recipe to use
pyarrow.csv instead of pandas (#50)
This is an automated email from the ASF dual-hosted git repository.
westonpace pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new 70f70ce Update CSV recipe to use pyarrow.csv instead of pandas (#50)
70f70ce is described below
commit 70f70ce72f1a288dcb1ead6ecbb4413266f1c12f
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Tue Aug 31 20:27:28 2021 +0200
Update CSV recipe to use pyarrow.csv instead of pandas (#50)
* Switch CSV writing to the arrow provided one
* Add incremental recipe
* Update python/source/io.rst
Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
* Update python/source/io.rst
Co-authored-by: Weston Pace <we...@gmail.com>
* Update python/source/io.rst
Co-authored-by: Weston Pace <we...@gmail.com>
Co-authored-by: Joris Van den Bossche <jo...@gmail.com>
Co-authored-by: Weston Pace <we...@gmail.com>
---
python/source/io.rst | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/python/source/io.rst b/python/source/io.rst
index 717d1db..b5a9c70 100644
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -7,15 +7,12 @@ Apache Arrow.
.. contents::
-Write a Parquet file
-====================
-
.. testsetup::
- import numpy as np
import pyarrow as pa
- arr = pa.array(np.arange(100))
+Write a Parquet file
+====================
Given an array with 100 numbers, from 0 to 99
@@ -179,14 +176,37 @@ format can be memory mapped back directly from the disk.
Writing CSV files
=================
-It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+It is possible to write an Arrow :class:`pyarrow.Table` to
+a CSV file using the :func:`pyarrow.csv.write_csv` function.
.. testcode::
+ arr = pa.array(range(100))
table = pa.Table.from_arrays([arr], names=["col1"])
- table.to_pandas().to_csv("table.csv", index=False)
+
+ import pyarrow.csv
+ pa.csv.write_csv(table, "table.csv",
+ write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to write a CSV file incrementally
+as you generate or retrieve the data, and you don't want to keep
+the whole table in memory to write it all at once, you can use
+:class:`pyarrow.csv.CSVWriter` to write data incrementally.
+
+.. testcode::
+
+ schema = pa.schema([("col1", pa.int32())])
+ with pa.csv.CSVWriter("table.csv", schema=schema) as writer:
+ for chunk in range(10):
+ datachunk = range(chunk*10, (chunk+1)*10)
+ table = pa.Table.from_arrays([pa.array(datachunk)], schema=schema)
+ writer.write(table)
+
+It's equally possible to write :class:`pyarrow.RecordBatch` instances
+by passing them as you would a table.
Reading CSV files
=================