Posted to commits@arrow.apache.org by uw...@apache.org on 2018/04/03 09:19:05 UTC

[arrow] branch master updated: ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet

This is an automated email from the ASF dual-hosted git repository.

uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 65493a6  ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
65493a6 is described below

commit 65493a68f56af4b4e0839fcd590ed21412e8e062
Author: Alex <al...@unexpectedeof.net>
AuthorDate: Tue Apr 3 11:18:57 2018 +0200

    ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
    
    Added read_pandas to the parquet documentation. Added a custom index to show that it is maintained when providing a subset of columns to read_pandas.
    
    Author: Alex <al...@unexpectedeof.net>
    
    Closes #1820 from AlexHagerman/arrow-2014 and squashes the following commits:
    
    44f4412 <Alex> Added  to parquet documentation per ticket 2014
---
 python/doc/source/parquet.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/python/doc/source/parquet.rst b/python/doc/source/parquet.rst
index 3d01e1d..b68d4d8 100644
--- a/python/doc/source/parquet.rst
+++ b/python/doc/source/parquet.rst
@@ -68,7 +68,8 @@ Let's look at a simple table:
 
    df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                       'two': ['foo', 'bar', 'baz'],
-                      'three': [True, False, True]})
+                      'three': [True, False, True]},
+                      index=list('abc'))
    table = pa.Table.from_pandas(df)
 
 We write this to Parquet format with ``write_table``:
@@ -94,6 +95,13 @@ the whole file (due to the columnar layout):
 
    pq.read_table('example.parquet', columns=['one', 'three'])
 
+When reading a subset of columns from a file that used a Pandas dataframe as the
+source, we use ``read_pandas`` to maintain any additional index column data:
+
+.. ipython:: python
+
+   pq.read_pandas('example.parquet', columns=['two']).to_pandas()
+
 We need not use a string to specify the origin of the file. It can be any of:
 
 * A file path as a string
