Posted to dev@parquet.apache.org by "Felix Schmalzel (Jira)" <ji...@apache.org> on 2021/02/16 15:21:00 UTC
[jira] [Created] (PARQUET-1983) Pool SeekableInputStreams in ParquetFileReader
Felix Schmalzel created PARQUET-1983:
----------------------------------------
Summary: Pool SeekableInputStreams in ParquetFileReader
Key: PARQUET-1983
URL: https://issues.apache.org/jira/browse/PARQUET-1983
Project: Parquet
Issue Type: New Feature
Components: parquet-mr
Reporter: Felix Schmalzel
If https://issues.apache.org/jira/browse/PARQUET-1982 goes through, then we could allow parallel reading of row groups with a pool of SeekableInputStreams. This would significantly boost performance for applications that read data at random positions from a large file.
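To make the idea concrete, here is a minimal sketch of what such a pool could look like. It is not the patch mentioned in this ticket and not an existing parquet-mr API: StreamPool, borrow, release and available are hypothetical names chosen for illustration. The sketch assumes a fixed pool size and that every stream is opened up front by a caller-supplied factory. In parquet-mr the pooled type would presumably be SeekableInputStream, opened through InputFile.newStream().

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool of streams; not part of parquet-mr. */
final class StreamPool<S extends Closeable> implements AutoCloseable {
    private final BlockingQueue<S> idle;

    /** Opens {@code size} streams eagerly using the supplied factory. */
    StreamPool(int size, Supplier<S> opener) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(opener.get());
        }
    }

    /** Blocks until an idle stream is available, then hands it to the caller. */
    S borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a stream", e);
        }
    }

    /** Returns a borrowed stream; throws IllegalStateException if the pool is already full. */
    void release(S stream) {
        idle.add(stream);
    }

    /** Number of streams currently idle in the pool. */
    int available() {
        return idle.size();
    }

    /** Closes all idle streams; streams still borrowed are the caller's responsibility. */
    @Override
    public void close() {
        S s;
        while ((s = idle.poll()) != null) {
            try {
                s.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Under these assumptions, each worker thread would borrow a stream, seek to the start of its row group, read it, and release the stream in a finally block, so concurrent readers never share a file position.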
I've already developed a patch that would enable this functionality. I will link the merge request in the next few days.
Is there a related ticket that I have overlooked?