Posted to dev@parquet.apache.org by "M. Justin (Jira)" <ji...@apache.org> on 2020/09/16 15:52:00 UTC

[jira] [Updated] (PARQUET-1912) ParquetReader.read(InputFile) always causes exception on build

     [ https://issues.apache.org/jira/browse/PARQUET-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

M. Justin updated PARQUET-1912:
-------------------------------
    Summary: ParquetReader.read(InputFile) always causes exception on build  (was: ParquetReader.read always causes exception on build)

> ParquetReader.read(InputFile) always causes exception on build
> --------------------------------------------------------------
>
>                 Key: PARQUET-1912
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1912
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.11.1
>            Reporter: M. Justin
>            Priority: Major
>
> The [{{ParquetReader.read(InputFile file)}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetReader.html#read-org.apache.parquet.io.InputFile-] static factory method in {{parquet-hadoop}} creates a builder from an {{InputFile}}.  This method always throws an {{IllegalArgumentException}} when {{.build()}} is subsequently called.
> {code:java}
>             java.nio.file.Path parquetFile = getParquetFile();
>             ParquetReader.read(HadoopInputFile.fromPath(new org.apache.hadoop.fs.Path(parquetFile.toUri()), new Configuration()))
>                     .build();
> {code}
> {noformat}
> java.lang.IllegalArgumentException: [BUG] Classes that extend Builder should override getReadSupport()
> 	at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
> 	at org.apache.parquet.hadoop.ParquetReader$Builder.getReadSupport(ParquetReader.java:310)
> 	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337)
> {noformat}
> The issue appears to be that the [{{build()}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetReader.Builder.html#build--] method enforces that a [{{ReadSupport}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/api/ReadSupport.html] value was set on the builder, but {{ParquetReader.read(InputFile file)}} doesn't accept a {{ReadSupport}}, nor is there a way to set one after the builder has been created.
> For context, my use case is reading Parquet files directly from Java.
> h2. Expected behavior
> I wouldn't expect a method to exist that always results in an exception being thrown. I would expect the {{ParquetReader.read(InputFile file)}} method to be fixed, replaced, or removed.
> h2. Workaround
> I am able to achieve my goal by using [{{ParquetFileReader.open(InputFile)}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetFileReader.html#open-org.apache.parquet.io.InputFile-] instead of {{ParquetReader.read(InputFile)}}.
> {code:java}
>             java.nio.file.Path parquetFile = getParquetFile();
>             ParquetFileReader reader = ParquetFileReader.open(
>                     HadoopInputFile.fromPath(new org.apache.hadoop.fs.Path(parquetFile.toUri()), new Configuration()));
> {code}
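
The failure mode described in the report can be modelled without Parquet on the classpath. The following is a minimal sketch, not the library's source: {{BuilderSketch}}, its nested {{ReadSupport}} and {{Builder}} stand-ins, {{StringBackedBuilder}}, {{readFromFileOnly()}} and {{tryBuild()}} are all hypothetical names. It assumes, based on the exception message and stack trace above, that the real builder keeps a {{ReadSupport}} field that stays null on the {{read(InputFile)}} path and that {{build()}} calls {{getReadSupport()}}.

```java
public class BuilderSketch {

    /** Hypothetical stand-in for ReadSupport; not the Parquet class. */
    interface ReadSupport<T> {
        T materialize(String rawRecord);
    }

    /** Simplified stand-in for ParquetReader.Builder (assumed structure). */
    static class Builder<T> {
        // Stays null when the builder is created from a file alone.
        private final ReadSupport<T> readSupport;

        // Models the read(InputFile) path: no ReadSupport is supplied.
        protected Builder() {
            this.readSupport = null;
        }

        // Models a factory that does take a ReadSupport.
        Builder(ReadSupport<T> readSupport) {
            this.readSupport = readSupport;
        }

        protected ReadSupport<T> getReadSupport() {
            if (readSupport == null) {
                throw new IllegalArgumentException(
                        "[BUG] Classes that extend Builder should override getReadSupport()");
            }
            return readSupport;
        }

        // build() needs a ReadSupport, so it fails unless one was set or overridden.
        ReadSupport<T> build() {
            return getReadSupport();
        }
    }

    /** A subclass that overrides getReadSupport(), as the message asks. */
    static class StringBackedBuilder extends Builder<String> {
        @Override
        protected ReadSupport<String> getReadSupport() {
            return raw -> raw;
        }
    }

    /** Models the static factory from the report: a plain Builder is returned. */
    static Builder<String> readFromFileOnly() {
        return new Builder<>();
    }

    /** Runs build() and reports the outcome as a string. */
    static String tryBuild(Builder<?> builder) {
        try {
            builder.build();
            return "ok";
        } catch (IllegalArgumentException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryBuild(readFromFileOnly()));
        System.out.println(tryBuild(new StringBackedBuilder()));
    }
}
```

In this model the plain builder can never succeed, because nothing on its public surface sets the field that {{build()}} checks. Only a subclass that overrides {{getReadSupport()}}, or a factory that accepts a {{ReadSupport}}, produces a usable builder.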


