Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/11 06:50:00 UTC

[jira] [Resolved] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment

     [ https://issues.apache.org/jira/browse/ARROW-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-8201.
------------------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 14301
[https://github.com/apache/arrow/pull/14301]

> [Python][Dataset] Improve ergonomics of FileFragment
> ----------------------------------------------------
>
>                 Key: ARROW-8201
>                 URL: https://issues.apache.org/jira/browse/ARROW-8201
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: dataset, pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> FileFragment can be made more directly useful by adding convenience methods.
> For example, a FileFragment could allow the underlying file or buffer to be opened directly:
> {code}
>     def open(self):
>         """
>         Open a NativeFile of the buffer or file viewed by this fragment.
>         """
>         cdef:
>             CFileSystem* c_filesystem
>             shared_ptr[CRandomAccessFile] opened
>             NativeFile out = NativeFile()
>         # In-memory source: wrap the buffer without touching a filesystem.
>         buf = self.buffer
>         if buf is not None:
>             return pa.BufferReader(buf)
>         # File source: open the path through the fragment's filesystem,
>         # releasing the GIL while the C++ call runs.
>         with nogil:
>             c_filesystem = self.file_fragment.source().filesystem()
>             opened = GetResultValue(c_filesystem.OpenInputFile(
>                 self.file_fragment.source().path()))
>         # Expose the opened C++ file as a readable Python NativeFile.
>         out.set_random_access_file(opened)
>         out.is_readable = True
>         return out
> {code}
> Additionally, a ParquetFileFragment's metadata could be made introspectable:
> {code}
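>     @property
>     def metadata(self):
>         # Imported lazily, since Parquet support is an optional pyarrow component.
>         from pyarrow._parquet import ParquetReader
>         reader = ParquetReader()
>         # Reuse open() so buffer- and file-backed fragments are handled alike.
>         reader.open(self.open())
>         return reader.metadata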
> {code}
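
The proposed `open()` branches on the fragment's source: an in-memory buffer is wrapped in a reader, while a path is opened through the fragment's filesystem. The following is a minimal pure-Python model of that dispatch so it can run without pyarrow. `FileFragmentSketch` and `LocalFileSystemSketch` are hypothetical names, not pyarrow APIs; `io.BytesIO` and the built-in `open` stand in for pyarrow's `BufferReader` and `NativeFile`.

```python
import io
import os
import tempfile
from typing import BinaryIO, Optional


class LocalFileSystemSketch:
    """Hypothetical stand-in for a filesystem object; it only opens local paths."""

    def open_input_file(self, path: str) -> BinaryIO:
        # Binary mode gives a seekable, random-access reader.
        return open(path, "rb")


class FileFragmentSketch:
    """Hypothetical model of a fragment viewing either an in-memory buffer or a file."""

    def __init__(self, path: Optional[str] = None,
                 filesystem: Optional[LocalFileSystemSketch] = None,
                 buffer: Optional[bytes] = None) -> None:
        self.path = path
        self.filesystem = filesystem
        self.buffer = buffer

    def open(self) -> BinaryIO:
        # Buffer-backed fragments never touch a filesystem: wrap the bytes directly.
        if self.buffer is not None:
            return io.BytesIO(self.buffer)
        # Path-backed fragments delegate to the filesystem that owns the path.
        return self.filesystem.open_input_file(self.path)


def demo() -> None:
    # Both kinds of fragment expose the same open() entry point.
    in_memory = FileFragmentSketch(buffer=b"hello")
    with in_memory.open() as f:
        print(f.read())

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "part-0.bin")
        with open(path, "wb") as out:
            out.write(b"world")
        on_disk = FileFragmentSketch(path=path, filesystem=LocalFileSystemSketch())
        with on_disk.open() as f:
            print(f.read())


if __name__ == "__main__":
    demo()
```

Returning one seekable file object regardless of source is what lets the proposed `metadata` property simply call `self.open()` without caring where the bytes live.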

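The proposed `metadata` property hands the opened file to `ParquetReader`, which parses the Parquet footer. As a rough illustration of what reading from the opened file involves, here is a stdlib-only sketch that locates the raw footer in any seekable binary file, following the Parquet file layout: the file ends with the Thrift-encoded FileMetaData, a 4-byte little-endian length, and the magic bytes `PAR1`. `read_raw_footer` is a hypothetical helper; it does not decode the Thrift payload, which is the job of a full reader such as `ParquetReader`.

```python
import io
import struct
from typing import BinaryIO

PARQUET_MAGIC = b"PAR1"


def read_raw_footer(f: BinaryIO) -> bytes:
    """Return the raw, still Thrift-encoded, FileMetaData bytes of a Parquet file.

    `f` is any seekable binary file object, such as one returned by an open() method.
    """
    # The last 8 bytes are: <4-byte little-endian metadata length> "PAR1".
    f.seek(-8, io.SEEK_END)
    tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: missing trailing magic bytes")
    (metadata_len,) = struct.unpack("<I", tail[:4])
    # Step back over the metadata block plus the 8-byte tail, then read the metadata.
    f.seek(-(8 + metadata_len), io.SEEK_END)
    return f.read(metadata_len)
```

Because the helper only needs `seek` and `read`, it works equally on an in-memory reader and on a file opened from disk, which mirrors why the proposal routes metadata access through `open()`.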


--
This message was sent by Atlassian Jira
(v8.20.10#820010)