Posted to jira@arrow.apache.org by "Ying Zhou (Jira)" <ji...@apache.org> on 2021/04/20 11:12:00 UTC

[jira] [Commented] (ARROW-9299) [Python] Expose ORC metadata() in Python ORCFile

    [ https://issues.apache.org/jira/browse/ARROW-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325719#comment-17325719 ] 

Ying Zhou commented on ARROW-9299:
----------------------------------

I will try to do it by Oct.

> [Python] Expose ORC metadata() in Python ORCFile
> ------------------------------------------------
>
>                 Key: ARROW-9299
>                 URL: https://issues.apache.org/jira/browse/ARROW-9299
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.17.1
>            Reporter: Jeremy Dyer
>            Assignee: Ying Zhou
>            Priority: Major
>              Labels: orc
>
> There is currently no way for a user to directly access the underlying ORC metadata of a given file. It seems the C++ functions and objects already exist, and only the plumbing to Cython/Python, plus potentially a few C++ shims, is missing. Giving users the ability to retrieve the metadata without first reading the entire file could help numerous applications improve their query performance by allowing them to intelligently determine which ORC stripes should be read.
> This would allow for something like:
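> {code:python}
> from pyarrow import orc  # the orc submodule must be imported explicitly
> # metadata() is the accessor proposed in this issue
> orc_metadata = orc.ORCFile(filename).metadata()
> {code}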
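
To illustrate the stripe-selection idea in the description, here is a minimal sketch of how an application might use per-stripe metadata to decide which stripes to read. Everything in it is an assumption for illustration: `StripeStats`, `stripes_to_read`, and the min/max numbers are hypothetical and are not part of pyarrow or of the API proposed in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe summary for one integer column (not a pyarrow type)."""
    index: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats], lo: int, hi: int) -> List[int]:
    """Return indices of stripes whose [min_value, max_value] range overlaps [lo, hi]."""
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


# Made-up statistics for a three-stripe file.
stats = [
    StripeStats(index=0, min_value=0, max_value=99),
    StripeStats(index=1, min_value=100, max_value=199),
    StripeStats(index=2, min_value=200, max_value=299),
]

# A query filtering on 150 <= value <= 220 only needs the stripes that can contain matches.
selected = stripes_to_read(stats, 150, 220)
print(selected)
```

If such statistics were exposed, the selected indices could then be read one at a time, for example with the existing `ORCFile.read_stripe`, instead of loading the whole file.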



--
This message was sent by Atlassian Jira
(v8.3.4#803005)