Posted to dev@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2020/03/24 20:49:00 UTC

[jira] [Created] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment

Ben Kietzman created ARROW-8201:
-----------------------------------

             Summary: [Python][Dataset] Improve ergonomics of FileFragment
                 Key: ARROW-8201
                 URL: https://issues.apache.org/jira/browse/ARROW-8201
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++ - Dataset, Python
    Affects Versions: 0.16.0
            Reporter: Ben Kietzman
             Fix For: 1.0.0


FileFragment can be made more directly useful by adding convenience methods.

For example, a FileFragment could allow the underlying file or buffer to be opened directly:
{code}
    def open(self):
        """
        Open a NativeFile of the buffer or file viewed by this fragment.
        """
        cdef:
            CFileSystem* c_filesystem
            shared_ptr[CRandomAccessFile] opened
            NativeFile out = NativeFile()

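        # In-memory fragment: wrap the buffer in a reader; no filesystem needed.
        buf = self.buffer
        if buf is not None:
            return pa.io.BufferReader(buf)

        # File-backed fragment: open its path through the fragment's filesystem.
        with nogil:
            c_filesystem = self.file_fragment.source().filesystem()
            opened = GetResultValue(c_filesystem.OpenInputFile(
                self.file_fragment.source().path()))

        # Wrap the C++ file handle in a readable Python NativeFile.
        out.set_random_access_file(opened)
        out.is_readable = True
        return out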
{code}
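
The buffer-or-file dispatch proposed above can also be illustrated outside Cython. The following is a minimal pure-Python sketch using only the standard library. SketchFragment, its path and buffer attributes, and demo() are hypothetical names and not part of pyarrow; io.BytesIO stands in for BufferReader and the built-in open() stands in for the filesystem call.

```python
import io
import os
import tempfile


class SketchFragment:
    """Hypothetical stand-in for a FileFragment (not a pyarrow class)."""

    def __init__(self, path=None, buffer=None):
        # Exactly one of `path` or `buffer` is expected to be set.
        if (path is None) == (buffer is None):
            raise ValueError("provide exactly one of path or buffer")
        self.path = path
        self.buffer = buffer

    def open(self):
        """Open a readable binary file object over the fragment's data."""
        # In-memory fragments are served straight from the buffer.
        if self.buffer is not None:
            return io.BytesIO(self.buffer)
        # File-backed fragments are opened from their path.
        return open(self.path, "rb")


def demo():
    """Read the same bytes through a buffer-backed and a file-backed fragment."""
    payload = b"example bytes"
    with SketchFragment(buffer=payload).open() as f:
        from_buffer = f.read()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(payload)
        with SketchFragment(path=path).open() as f:
            from_file = f.read()
    return from_buffer, from_file
```

Callers get a file-like object either way, so code that consumes a fragment does not need to know how it is backed.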

Additionally, a ParquetFileFragment's metadata could be introspectable:
{code}
    @property
    def metadata(self):
        from pyarrow._parquet import ParquetReader
        reader = ParquetReader()
        reader.open(self.open())
        return reader.metadata
{code}
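
To show how a property can be layered on open(), here is a rough standard-library sketch. It does not use ParquetReader and does not decode any metadata; it only reads the footer length, relying on the Parquet file layout in which a file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. SketchParquetFragment, footer_length, and make_fake_parquet are hypothetical names, and the test payload is opaque bytes rather than real Thrift-encoded metadata.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


class SketchParquetFragment:
    """Hypothetical buffer-backed fragment (not a pyarrow class)."""

    def __init__(self, buffer):
        self.buffer = buffer

    def open(self):
        """Open a readable binary file object over the fragment's buffer."""
        return io.BytesIO(self.buffer)

    @property
    def footer_length(self):
        """Size in bytes of the serialized file metadata (the footer)."""
        with self.open() as f:
            size = f.seek(0, io.SEEK_END)
            # Smallest valid layout: leading magic + length field + trailing magic.
            if size < 12:
                raise ValueError("too small to be a Parquet file")
            # Last 8 bytes: 4-byte little-endian footer length, then magic.
            f.seek(size - 8)
            tail = f.read(8)
        if tail[4:] != PARQUET_MAGIC:
            raise ValueError("missing Parquet magic bytes")
        (length,) = struct.unpack("<I", tail[:4])
        return length


def make_fake_parquet(footer):
    """Wrap an opaque footer payload in Parquet's framing (for illustration only)."""
    return PARQUET_MAGIC + footer + struct.pack("<I", len(footer)) + PARQUET_MAGIC
```

Because the property goes through open(), the same code would work for file-backed fragments once open() supports them.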


