Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/05/27 15:54:00 UTC

[jira] [Resolved] (ARROW-9299) [Python] Expose ORC metadata() in Python ORCFile

     [ https://issues.apache.org/jira/browse/ARROW-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved ARROW-9299.
-----------------------------------
    Fix Version/s: 5.0.0
       Resolution: Fixed

Issue resolved by pull request 10157
[https://github.com/apache/arrow/pull/10157]

> [Python] Expose ORC metadata() in Python ORCFile
> ------------------------------------------------
>
>                 Key: ARROW-9299
>                 URL: https://issues.apache.org/jira/browse/ARROW-9299
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.17.1
>            Reporter: Jeremy Dyer
>            Assignee: Ying Zhou
>            Priority: Major
>              Labels: orc, pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is currently no way for a user to directly access the underlying ORC metadata of a given file. It seems the C++ functions and objects already exist and only the plumbing is missing: the Cython/Python bindings and potentially a few C++ shims. Giving users the ability to retrieve the metadata without first reading the entire file could help numerous applications improve their query performance by letting them determine which ORC stripes actually need to be read.
> This would allow for something like:
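> {code:python}
> import pyarrow.orc as orc  # the orc submodule has to be imported explicitly
> orc_metadata = orc.ORCFile(filename).metadata()  # proposed API, filename is a path to an ORC file
> {code}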
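
To illustrate the stripe-selection use case described above, here is a minimal pure-Python sketch. It assumes per-stripe min/max statistics for one column are already available; the `StripeStats` class, the `stripes_to_read` helper, and the numbers are all hypothetical and are not part of pyarrow or of the API proposed in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe min/max for one column (not a pyarrow type)."""
    index: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats], lo: int, hi: int) -> List[int]:
    """Return indices of stripes whose [min, max] range overlaps [lo, hi]."""
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


# Made-up statistics for a three-stripe file (assumption, for illustration only).
stats = [
    StripeStats(0, 0, 99),
    StripeStats(1, 100, 199),
    StripeStats(2, 200, 299),
]

# A query for values between 150 and 220 only needs stripes 1 and 2.
selected = stripes_to_read(stats, 150, 220)
print(selected)
```

With real metadata, the selected indices could then be read one at a time, for example with `ORCFile.read_stripe(n)`, instead of loading the whole file.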


