Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/18 14:29:00 UTC

[jira] [Updated] (ARROW-9299) [Python] Expose ORC metadata() in Python ORCFile

     [ https://issues.apache.org/jira/browse/ARROW-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-9299:
-----------------------------------------
    Labels: orc  (was: )

> [Python] Expose ORC metadata() in Python ORCFile
> ------------------------------------------------
>
>                 Key: ARROW-9299
>                 URL: https://issues.apache.org/jira/browse/ARROW-9299
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.17.1
>            Reporter: Jeremy Dyer
>            Priority: Major
>              Labels: orc
>
> There is currently no way for a user to directly access the underlying ORC metadata of a given file. It seems the C++ functions and objects already exist; only the plumbing to Cython/Python, and potentially a few C++ shims, is missing. Giving users the ability to retrieve the metadata without first reading the entire file could help numerous applications increase their query performance by allowing them to intelligently determine which ORC stripes should be read.
> This would allow for something like:
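> {code:python}
> import pyarrow.orc as orc  # the orc submodule must be imported explicitly
> # metadata() is the proposed accessor; filename is the path to an ORC file
> orc_metadata = orc.ORCFile(filename).metadata()
> {code}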
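
To illustrate how stripe-level metadata could drive the stripe selection described above, here is a minimal sketch in plain Python. It does not use pyarrow: the `StripeStats` class, the `stripes_to_read` helper, and the statistics values are all hypothetical stand-ins for whatever the exposed metadata would actually provide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe summary for one column (not a pyarrow type)."""
    index: int      # stripe position in the file
    num_rows: int   # rows stored in the stripe
    min_value: int  # smallest column value in the stripe
    max_value: int  # largest column value in the stripe


def stripes_to_read(stats: List[StripeStats], low: int, high: int) -> List[int]:
    """Return indices of stripes whose [min, max] range overlaps [low, high]."""
    return [s.index for s in stats if s.max_value >= low and s.min_value <= high]


# Made-up statistics for a three-stripe file (assumption, for illustration only).
example_stats = [
    StripeStats(index=0, num_rows=1000, min_value=0, max_value=99),
    StripeStats(index=1, num_rows=1000, min_value=100, max_value=199),
    StripeStats(index=2, num_rows=500, min_value=200, max_value=249),
]

# A query for values between 150 and 220 only needs stripes 1 and 2.
selected = stripes_to_read(example_stats, low=150, high=220)
print(selected)
```

With real metadata available, the selected indices could presumably be passed one at a time to the existing `ORCFile.read_stripe(n)` method, so that stripes outside the queried range are never read.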


