Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/09/02 16:42:00 UTC
[jira] [Comment Edited] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment
[ https://issues.apache.org/jira/browse/ARROW-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599610#comment-17599610 ]
Antoine Pitrou edited comment on ARROW-8201 at 9/2/22 4:41 PM:
---------------------------------------------------------------
[~milesgranger] Perhaps you would be interested in finding out whether this issue still applies and, if so, in coming up with a PR?
was (Author: pitrou):
[~milesgranger] Perhaps you would be interested whether this issue still applies, and if so, to come up with a PR?
> [Python][Dataset] Improve ergonomics of FileFragment
> ----------------------------------------------------
>
> Key: ARROW-8201
> URL: https://issues.apache.org/jira/browse/ARROW-8201
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Affects Versions: 0.16.0
> Reporter: Ben Kietzman
> Priority: Major
> Labels: dataset
>
> FileFragment can be made more directly useful by adding convenience methods.
> For example, a FileFragment could allow the underlying file/buffer to be opened directly:
> {code}
> def open(self):
>     """
>     Open a NativeFile of the buffer or file viewed by this fragment.
>     """
>     cdef:
>         CFileSystem* c_filesystem
>         shared_ptr[CRandomAccessFile] opened
>         NativeFile out = NativeFile()
>     buf = self.buffer
>     if buf is not None:
>         return pa.io.BufferReader(buf)
>     with nogil:
>         c_filesystem = self.file_fragment.source().filesystem()
>         opened = GetResultValue(c_filesystem.OpenInputFile(
>             self.file_fragment.source().path()))
>     out.set_random_access_file(opened)
>     out.is_readable = True
>     return out
> {code}
> Additionally, a ParquetFileFragment's metadata could be introspectable:
> {code}
> @property
> def metadata(self):
>     from pyarrow._parquet import ParquetReader
>     reader = ParquetReader()
>     reader.open(self.open())
>     return reader.metadata
> {code}
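
For anyone checking whether the issue still applies, the same behaviour can be approximated from user code without touching the Cython layer. The following is only a rough sketch, not the proposed implementation: the helper names open_fragment and fragment_parquet_metadata are hypothetical, and it assumes the fragment exposes buffer, path and filesystem attributes, as pyarrow.dataset.FileFragment does in current releases.

```python
def open_fragment(fragment):
    """Return a readable file-like object for a dataset file fragment.

    Hypothetical helper. Assumes `fragment` has `buffer`, `path` and
    `filesystem` attributes (as pyarrow.dataset.FileFragment does).
    """
    buf = fragment.buffer
    if buf is not None:
        # In-memory fragment: wrap the buffer, no filesystem access needed.
        import pyarrow as pa  # imported lazily; only this branch needs it
        return pa.BufferReader(buf)
    # File-backed fragment: let the fragment's own filesystem open the path.
    return fragment.filesystem.open_input_file(fragment.path)


def fragment_parquet_metadata(fragment):
    """Read Parquet file metadata for a fragment (hypothetical helper)."""
    import pyarrow.parquet as pq

    # ParquetFile reads the footer when constructed, so the metadata
    # object is available as soon as the file has been opened.
    return pq.ParquetFile(open_fragment(fragment)).metadata
```

With a Parquet dataset, something like fragment_parquet_metadata(next(iter(dataset.get_fragments()))) would then give the row group and schema information that the description asks to have exposed as a property.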