Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/03/05 00:02:00 UTC

[jira] [Commented] (BEAM-11908) Deprecate .withProjection from ParquetIO

    [ https://issues.apache.org/jira/browse/BEAM-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295651#comment-17295651 ] 

Brian Hulette commented on BEAM-11908:
--------------------------------------

CC: [~heejong]

> Deprecate .withProjection from ParquetIO
> ----------------------------------------
>
>                 Key: BEAM-11908
>                 URL: https://issues.apache.org/jira/browse/BEAM-11908
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-parquet
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: P3
>
> There are multiple issues with the withProjection API:
> 1. The current API requires an extra encoderSchema that is not needed when projecting data in Parquet. The simplest way to get this with the Parquet API is by passing the projectionSchema like this:
> {quote}AvroReadSupport.setAvroReadSchema(conf, projectionSchema);
> AvroReadSupport.setRequestedProjection(conf, projectionSchema);
> {quote}
> We can offer an alternative method `withProjection(Configuration conf, List<String> fields)` so that users don't have to build their own projection schema, but historically we have let users rely on the upstream connector API. If we follow that approach, we can better document in ParquetIO how to project fields by relying on the Parquet APIs, and avoid maintaining this extra code on the Beam side.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)