
Posted to dev@parquet.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/06/06 17:25:00 UTC

[jira] [Created] (PARQUET-2151) parquet-hadoop to drop Hadoop 1 input stream support

Steve Loughran created PARQUET-2151:
---------------------------------------

             Summary: parquet-hadoop to drop Hadoop 1 input stream support
                 Key: PARQUET-2151
                 URL: https://issues.apache.org/jira/browse/PARQUET-2151
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.13.0
            Reporter: Steve Loughran


Parquet uses reflection to load a Hadoop 2 input stream, falling back to a Hadoop 1-compatible client if it is not found.
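
The reflection-with-fallback binding follows a familiar pattern: probe for a Hadoop 2 class by name and, only if it loads, construct the Hadoop 2 wrapper reflectively. The Java sketch below is a simplified illustration, not parquet-mr's actual code. `StreamBindingProbe`, `Strategy` and `newInstanceOrNull` are hypothetical names. In practice the probed name would be something like `org.apache.hadoop.fs.ByteBufferReadable`, but the test probes JDK classes so that it runs without Hadoop on the classpath.

```java
import java.lang.reflect.Constructor;

/** Hypothetical, simplified sketch of "probe for a class by reflection, fall back if missing". */
final class StreamBindingProbe {

  enum Strategy { HADOOP2_BYTEBUFFER, HADOOP1_FALLBACK }

  private StreamBindingProbe() {}

  /** Picks the Hadoop 2 path only if the named class can be loaded. */
  static Strategy choose(String byteBufferReadableClassName) {
    try {
      Class.forName(byteBufferReadableClassName);
      return Strategy.HADOOP2_BYTEBUFFER;
    } catch (ClassNotFoundException e) {
      // Class absent from the classpath: behave like a Hadoop 1 client.
      return Strategy.HADOOP1_FALLBACK;
    }
  }

  /** Reflectively calls implClassName(arg); returns null if the class or constructor is unavailable. */
  static Object newInstanceOrNull(String implClassName, Class<?> argType, Object arg) {
    try {
      Constructor<?> ctor = Class.forName(implClassName).getConstructor(argType);
      return ctor.newInstance(arg);
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }
}
```

Every call site then has to handle both outcomes (a missing class, or a null instance), which is the indirection this issue proposes to remove.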

All Hadoop 2.0.2+ releases work with H2SeekableInputStream, so H1SeekableInputStream can be cut and the binding to H2SeekableInputStream reworked to avoid reflection. This would make it much easier to probe for and use ByteBuffer input, and would line the code up for more recent Hadoop releases.
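
With Hadoop 2.0.2+ as the minimum, the ByteBufferReadable interface is always on the classpath, so choosing between ByteBuffer reads and array reads can be an ordinary type check. The following is a minimal sketch under that assumption. `BufferReadable` is a hypothetical stand-in for Hadoop's `ByteBufferReadable` so the example compiles without Hadoop jars, and `DirectBinding` and `BufferReadableStream` are likewise illustrative. In parquet-mr the check would presumably be applied to the stream returned by `FSDataInputStream.getWrappedStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for org.apache.hadoop.fs.ByteBufferReadable. */
interface BufferReadable {
  int read(ByteBuffer dst) throws IOException;
}

/** Sketch: choose the read path with a plain type check instead of reflection. */
final class DirectBinding {
  enum ReadPath { BYTE_BUFFER, TEMP_ARRAY }

  private DirectBinding() {}

  static ReadPath choose(InputStream inner) {
    // With the interface guaranteed on the classpath, instanceof replaces Class.forName().
    return (inner instanceof BufferReadable) ? ReadPath.BYTE_BUFFER : ReadPath.TEMP_ARRAY;
  }
}

/** Hypothetical in-memory stream that also supports ByteBuffer reads. */
final class BufferReadableStream extends ByteArrayInputStream implements BufferReadable {
  BufferReadableStream(byte[] data) {
    super(data);
  }

  @Override
  public synchronized int read(ByteBuffer dst) {
    int n = Math.min(dst.remaining(), count - pos);
    if (n <= 0) {
      return dst.hasRemaining() ? -1 : 0; // -1 at end of stream, 0 if dst is already full
    }
    dst.put(buf, pos, n); // copy straight from the backing array, no temp buffer
    pos += n;
    return n;
  }
}
```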

One thing H1SeekableInputStream does do is read into a temporary array if the stream doesn't support ByteBuffer reads, that is, doesn't implement ByteBufferReadable.
But FSDataInputStream simply forwards read(ByteBuffer) to its inner stream, which only works if that stream also implements ByteBufferReadable. Filesystems whose streams don't (the cloud stores) can't be read through H2SeekableInputStream.read(ByteBuffer). If reading them through H2SeekableInputStream is desired, H2SeekableInputStream will need to dynamically downgrade to DelegatingSeekableInputStream's base methods if a call to FSDataInputStream.read(ByteBuffer) fails.
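
The dynamic downgrade could look roughly like this: attempt the ByteBuffer read, and on the first failure switch permanently to an array-based read. This sketch rests on stated assumptions. `ByteBufferSource` and `ForwardingStream` are hypothetical stand-ins for `ByteBufferReadable` and `FSDataInputStream`, and `DowngradingReader` is an illustrative name rather than a proposed class. The failure is assumed to surface as `UnsupportedOperationException`, as it does when `FSDataInputStream.read(ByteBuffer)` finds that its inner stream is not `ByteBufferReadable`.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for Hadoop's ByteBufferReadable interface. */
interface ByteBufferSource {
  int read(ByteBuffer dst) throws IOException;
}

/** Hypothetical stand-in for FSDataInputStream: forwards read(ByteBuffer) or throws. */
class ForwardingStream extends FilterInputStream implements ByteBufferSource {
  ForwardingStream(InputStream inner) {
    super(inner);
  }

  @Override
  public int read(ByteBuffer dst) throws IOException {
    if (in instanceof ByteBufferSource) {
      return ((ByteBufferSource) in).read(dst);
    }
    throw new UnsupportedOperationException("ByteBuffer read unsupported by " + in.getClass().getName());
  }
}

/** Sketch of a reader that tries ByteBuffer reads first and downgrades after the first failure. */
final class DowngradingReader {
  private final InputStream stream;
  private boolean byteBufferReads; // true until a ByteBuffer read is rejected

  DowngradingReader(InputStream stream) {
    this.stream = stream;
    this.byteBufferReads = stream instanceof ByteBufferSource;
  }

  boolean usingByteBufferReads() {
    return byteBufferReads;
  }

  int read(ByteBuffer dst) throws IOException {
    if (byteBufferReads) {
      try {
        return ((ByteBufferSource) stream).read(dst);
      } catch (UnsupportedOperationException e) {
        byteBufferReads = false; // remember the downgrade; don't retry on later reads
      }
    }
    return readViaArray(dst);
  }

  /** Array-based fallback (hypothetical; stands in for DelegatingSeekableInputStream's base methods). */
  private int readViaArray(ByteBuffer dst) throws IOException {
    if (dst.hasArray()) {
      // Heap buffer: read into its backing array, then advance the position.
      int n = stream.read(dst.array(), dst.arrayOffset() + dst.position(), dst.remaining());
      if (n > 0) {
        dst.position(dst.position() + n);
      }
      return n;
    }
    // Direct buffer: read into a temporary array and copy across.
    byte[] tmp = new byte[Math.min(dst.remaining(), 8192)];
    int n = stream.read(tmp, 0, tmp.length);
    if (n > 0) {
      dst.put(tmp, 0, n);
    }
    return n;
  }
}
```

Caching the downgrade avoids throwing and catching an exception on every read. The fallback is only safe because the rejected call consumed no bytes before failing.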

--
This message was sent by Atlassian Jira
(v8.20.7#820007)