Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/09/02 16:42:00 UTC

[jira] [Commented] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment

    [ https://issues.apache.org/jira/browse/ARROW-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599610#comment-17599610 ] 

Antoine Pitrou commented on ARROW-8201:
---------------------------------------

[~milesgranger] Perhaps you would be interested in checking whether this issue still applies and, if so, in coming up with a PR?

> [Python][Dataset] Improve ergonomics of FileFragment
> ----------------------------------------------------
>
>                 Key: ARROW-8201
>                 URL: https://issues.apache.org/jira/browse/ARROW-8201
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>
> FileFragment can be made more directly useful by adding convenience methods.
> For example, a FileFragment could allow the underlying file or buffer to be opened directly:
> {code}
>     def open(self):
>         """
>         Open a NativeFile of the buffer or file viewed by this fragment.
>         """
>         cdef:
>             CFileSystem* c_filesystem
>             shared_ptr[CRandomAccessFile] opened
>             NativeFile out = NativeFile()
>         buf = self.buffer
>         if buf is not None:
>             return pa.io.BufferReader(buf)
>         with nogil:
>             c_filesystem = self.file_fragment.source().filesystem()
>             opened = GetResultValue(c_filesystem.OpenInputFile(
>                 self.file_fragment.source().path()))
>         out.set_random_access_file(opened)
>         out.is_readable = True
>         return out
> {code}
> Additionally, a ParquetFileFragment's metadata could be introspectable:
> {code}
>     @property
>     def metadata(self):
>         from pyarrow._parquet import ParquetReader
>         reader = ParquetReader()
>         reader.open(self.open())
>         return reader.metadata
> {code}
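
For anyone picking this up, the shape of the proposal above can be sketched in plain Python without pyarrow. This is only an illustration of the dispatch (buffer-backed fragments are wrapped in memory, path-backed fragments are opened from disk) and of a derived property that reuses open(). The names SketchFragment, header and demo are hypothetical and are not part of the pyarrow API.

```python
import io
import os
import tempfile


class SketchFragment:
    """Hypothetical stand-in for a FileFragment: backed by a buffer or a path."""

    def __init__(self, path=None, buffer=None):
        # Exactly one backing source must be given.
        if (path is None) == (buffer is None):
            raise ValueError("provide exactly one of path or buffer")
        self.path = path
        self.buffer = buffer

    def open(self):
        """Return a readable binary file object for the underlying data."""
        if self.buffer is not None:
            # In-memory case: wrap the bytes, no filesystem access needed.
            return io.BytesIO(self.buffer)
        # File case: open the path for binary reads.
        return open(self.path, "rb")

    @property
    def header(self):
        """Hypothetical analogue of `metadata`: derived by reusing open()."""
        with self.open() as f:
            return f.read(4)


def demo():
    """Read the same 4-byte header through both kinds of fragment."""
    payload = b"PAR1 example bytes"
    from_buffer = SketchFragment(buffer=payload)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        from_file = SketchFragment(path=tmp.name)
        return from_buffer.header, from_file.header
    finally:
        os.unlink(tmp.name)
```

Callers never need to know which kind of source backs the fragment, which is the ergonomic gain the issue asks for: anything built on open(), such as the metadata property, works for both cases.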


