Posted to dev@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2020/03/24 20:49:00 UTC
[jira] [Created] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment
Ben Kietzman created ARROW-8201:
-----------------------------------
Summary: [Python][Dataset] Improve ergonomics of FileFragment
Key: ARROW-8201
URL: https://issues.apache.org/jira/browse/ARROW-8201
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset, Python
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Fix For: 1.0.0
FileFragment can be made more directly useful by adding convenience methods.
For example, a FileFragment could allow the underlying file or buffer to be opened directly:
{code}
def open(self):
    """
    Open a NativeFile of the buffer or file viewed by this fragment.
    """
    cdef:
        CFileSystem* c_filesystem
        shared_ptr[CRandomAccessFile] opened
        NativeFile out = NativeFile()

    # Buffer-backed fragments can be wrapped without touching a filesystem.
    buf = self.buffer
    if buf is not None:
        return pa.io.BufferReader(buf)

    # File-backed fragments are opened through the fragment's own filesystem.
    with nogil:
        c_filesystem = self.file_fragment.source().filesystem()
        opened = GetResultValue(c_filesystem.OpenInputFile(
            self.file_fragment.source().path()))

    out.set_random_access_file(opened)
    out.is_readable = True
    return out
{code}
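The dispatch in the snippet above (wrap the buffer if there is one, otherwise open the path through the filesystem) can be tried out without building Arrow. The following is only a pure-Python sketch of that idea: `SketchFragment` and `demo_open` are hypothetical names, not pyarrow classes, and the local filesystem stands in for the fragment's filesystem.

```python
import io
import os
import tempfile


class SketchFragment:
    """Hypothetical stand-in for a FileFragment; not a pyarrow class."""

    def __init__(self, path=None, buffer=None):
        if (path is None) == (buffer is None):
            raise ValueError("provide exactly one of path or buffer")
        self.path = path
        self.buffer = buffer

    def open(self):
        """Return a readable binary file object for the viewed buffer or file."""
        if self.buffer is not None:
            # In-memory source: wrap the bytes, no filesystem access needed.
            return io.BytesIO(self.buffer)
        # File-backed source: assumption here is the local filesystem.
        return open(self.path, "rb")


def demo_open():
    """Read the same payload through a file-backed and a buffer-backed fragment."""
    payload = b"example bytes"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(payload)
        with SketchFragment(path=path).open() as from_file:
            file_bytes = from_file.read()
    with SketchFragment(buffer=payload).open() as from_buffer:
        buffer_bytes = from_buffer.read()
    return file_bytes, buffer_bytes
```

Callers get a file-like object either way, so code that consumes a fragment does not need to know whether it is backed by a path or by memory.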
Additionally, a ParquetFileFragment's metadata could be introspectable:
{code}
@property
def metadata(self):
    from pyarrow._parquet import ParquetReader
    reader = ParquetReader()
    reader.open(self.open())
    return reader.metadata
{code}
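The property above layers metadata access on top of open(). As a rough illustration of that layering only, the sketch below uses the standard library to locate the footer of a Parquet-shaped byte string: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. It does not decode the footer itself, which is the part ParquetReader would handle. `SketchParquetFragment`, `footer_length` and `make_fake_file` are hypothetical names, and the test bytes are synthetic, not a valid Parquet file.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


class SketchParquetFragment:
    """Hypothetical stand-in for a ParquetFileFragment; not a pyarrow class."""

    def __init__(self, buffer):
        self.buffer = buffer

    def open(self):
        # Stand-in for the open() convenience method proposed in this issue.
        return io.BytesIO(self.buffer)

    @property
    def footer_length(self):
        """Size in bytes of the serialized footer metadata."""
        with self.open() as f:
            # The last 8 bytes are a 4-byte little-endian length plus the magic.
            f.seek(-8, io.SEEK_END)
            tail = f.read(8)
        if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
            raise ValueError("not a Parquet file: missing trailing magic")
        (length,) = struct.unpack("<I", tail[:4])
        return length


def make_fake_file(footer):
    """Build synthetic bytes with a Parquet-shaped trailer (not a valid file)."""
    trailer = struct.pack("<I", len(footer)) + PARQUET_MAGIC
    return PARQUET_MAGIC + b"column data" + footer + trailer
```

Because the property goes through open(), the same code path would serve both buffer-backed and file-backed fragments.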