Posted to commits@arrow.apache.org by uw...@apache.org on 2018/04/03 09:19:05 UTC
[arrow] branch master updated: ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
This is an automated email from the ASF dual-hosted git repository.
uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 65493a6 ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
65493a6 is described below
commit 65493a68f56af4b4e0839fcd590ed21412e8e062
Author: Alex <al...@unexpectedeof.net>
AuthorDate: Tue Apr 3 11:18:57 2018 +0200
ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
Added read_pandas to the parquet documentation. Added a custom index to show that it is maintained when providing a subset of columns to read_pandas.
Author: Alex <al...@unexpectedeof.net>
Closes #1820 from AlexHagerman/arrow-2014 and squashes the following commits:
44f4412 <Alex> Added to parquet documentation per ticket 2014
---
python/doc/source/parquet.rst | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/python/doc/source/parquet.rst b/python/doc/source/parquet.rst
index 3d01e1d..b68d4d8 100644
--- a/python/doc/source/parquet.rst
+++ b/python/doc/source/parquet.rst
@@ -68,7 +68,8 @@ Let's look at a simple table:
df = pd.DataFrame({'one': [-1, np.nan, 2.5],
'two': ['foo', 'bar', 'baz'],
- 'three': [True, False, True]})
+ 'three': [True, False, True]},
+ index=list('abc'))
table = pa.Table.from_pandas(df)
We write this to Parquet format with ``write_table``:
@@ -94,6 +95,13 @@ the whole file (due to the columnar layout):
pq.read_table('example.parquet', columns=['one', 'three'])
+When reading a subset of columns from a file that used a Pandas dataframe as the
+source, we use ``read_pandas`` to maintain any additional index column data:
+
+.. ipython:: python
+
+ pq.read_pandas('example.parquet', columns=['two']).to_pandas()
+
We need not use a string to specify the origin of the file. It can be any of:
* A file path as a string