Posted to commits@orc.apache.org by do...@apache.org on 2022/12/11 02:55:26 UTC
[orc] branch branch-1.8 updated: ORC-1331: Improve `PyArrow` page

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-1.8
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-1.8 by this push:
     new cb18c0534 ORC-1331: Improve `PyArrow` page
cb18c0534 is described below

commit cb18c0534fc5da82b7fc693fe6ca49e81bc54d67
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Sat Dec 10 18:55:07 2022 -0800

    ORC-1331: Improve `PyArrow` page
    
    ### What changes were proposed in this pull request?
    
    This PR aims to improve the [PyArrow](https://orc.apache.org/docs/pyarrow.html) page by adding the `compression` codec option to the example and a direct link to the [Apache Arrow ORC](https://arrow.apache.org/docs/python/orc.html) page.
    
    ### Why are the changes needed?
    
    To improve the Python ORC user experience.
    
    ### How was this patch tested?
    
    Manual review.
    
    Closes #1339 from dongjoon-hyun/ORC-1331.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 1c1af025daa6fd2653ed3d87981f80b728067f7c)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 site/_docs/pyarrow.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/_docs/pyarrow.md b/site/_docs/pyarrow.md
index c159246cf..d26b1d2b6 100644
--- a/site/_docs/pyarrow.md
+++ b/site/_docs/pyarrow.md
@@ -9,25 +9,25 @@ permalink: /docs/pyarrow.html
 [Apache Arrow](https://arrow.apache.org) project's [PyArrow](https://pypi.org/project/pyarrow/) is the recommended package.
 
 ```
-pip3 install pyarrow==7.0.0
+pip3 install pyarrow==10.0.1
 pip3 install pandas
 ```
 
 ## How to write and read an ORC file
 
 ```
-In [1]: import pandas as pd
+In [1]: import pyarrow as pa
 
-In [2]: import pyarrow as pa
+In [2]: from pyarrow import orc
 
-In [3]: from pyarrow import orc
+In [3]: orc.write_table(pa.table({"col1": [1, 2, 3]}), "test.orc", compression="zstd")
 
-In [4]: orc.write_table(pa.table({"col1": [1, 2, 3]}), "test.orc")
-
-In [5]: orc.read_table("test.orc").to_pandas()
-Out[5]:
+In [4]: orc.read_table("test.orc").to_pandas()
+Out[4]:
    col1
 0     1
 1     2
 2     3
 ```
+
+[Apache Arrow ORC](https://arrow.apache.org/docs/python/orc.html) page provides more information.