Posted to dev@parquet.apache.org by "Felix Schmalzel (Jira)" <ji...@apache.org> on 2021/02/16 15:21:00 UTC

[jira] [Created] (PARQUET-1983) Pool SeekableInputStreams in ParquetFileReader

Felix Schmalzel created PARQUET-1983:
----------------------------------------

             Summary: Pool SeekableInputStreams in ParquetFileReader
                 Key: PARQUET-1983
                 URL: https://issues.apache.org/jira/browse/PARQUET-1983
             Project: Parquet
          Issue Type: New Feature
          Components: parquet-mr
            Reporter: Felix Schmalzel


 

If https://issues.apache.org/jira/browse/PARQUET-1982 goes through, then we could allow parallel reading of row groups with a pool of SeekableInputStreams. This would significantly boost performance for applications that read data at random positions from a large file.

I've already developed a patch that would enable this functionality. I will link the merge request in the next few days.
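
For illustration only, the pooling idea can be sketched roughly as follows. This is not the patch mentioned above and does not use any parquet-mr API: java.io.RandomAccessFile stands in for SeekableInputStream, and the names StreamPool, readAt and PooledReadDemo are hypothetical. The sketch assumes a fixed number of independently seekable streams over the same file, each borrowed by one reader thread at a time.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed-size pool of seekable streams over one file.
// RandomAccessFile is only a stand-in for a SeekableInputStream.
final class StreamPool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;

    StreamPool(Path file, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new RandomAccessFile(file.toFile(), "r"));
        }
    }

    // Borrow a stream, read `length` bytes starting at `offset`, then return it.
    byte[] readAt(long offset, int length) throws IOException, InterruptedException {
        RandomAccessFile in = idle.take(); // blocks until a stream is free
        try {
            byte[] buf = new byte[length];
            in.seek(offset);
            in.readFully(buf);
            return buf;
        } finally {
            idle.add(in); // never full: we removed exactly one element above
        }
    }

    // Assumes every borrowed stream has already been returned to the pool.
    @Override
    public void close() throws IOException {
        RandomAccessFile in;
        while ((in = idle.poll()) != null) {
            in.close();
        }
    }
}

class PooledReadDemo {
    public static void main(String[] args) throws Exception {
        // Write a small file whose byte at position i is (byte) i.
        Path file = Files.createTempFile("pool-demo", ".bin");
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);

        ExecutorService workers = Executors.newFixedThreadPool(4);
        try (StreamPool streams = new StreamPool(file, 2)) {
            // Eight random-position reads share two pooled streams.
            List<Future<byte[]>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                final long offset = i * 512L;
                results.add(workers.submit(() -> streams.readAt(offset, 16)));
            }
            boolean ok = true;
            for (int i = 0; i < results.size(); i++) {
                byte[] chunk = results.get(i).get();
                ok &= chunk[0] == data[i * 512];
            }
            System.out.println(ok ? "all reads matched" : "mismatch");
        } finally {
            workers.shutdown();
            Files.deleteIfExists(file);
        }
    }
}
```

Each worker holds a stream only for the duration of one seek-and-read, so concurrent readers never share a file position. In this sketch the pool size caps the number of reads in flight at once.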

Is there a related ticket that I have overlooked?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)