Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2021/03/02 10:28:00 UTC

[jira] [Created] (BEAM-11908) Deprecate .withProjection from ParquetIO

Ismaël Mejía created BEAM-11908:
-----------------------------------

             Summary: Deprecate .withProjection from ParquetIO
                 Key: BEAM-11908
                 URL: https://issues.apache.org/jira/browse/BEAM-11908
             Project: Beam
          Issue Type: Improvement
          Components: io-java-parquet
            Reporter: Ismaël Mejía
            Assignee: Ismaël Mejía


There are multiple issues with the API of withProjection:

1. The current API requires an extra encoderSchema that is not needed when projecting data in Parquet. The simplest way to project with the Parquet API is to pass the projectionSchema as both the Avro read schema and the requested projection, like this:
{quote}AvroReadSupport.setAvroReadSchema(conf, projectionSchema);
AvroReadSupport.setRequestedProjection(conf, projectionSchema);
{quote}
We could offer an alternative method `withProjection(Configuration conf, List<String> fields)` so users don't have to build their own projection Schema, but historically we have let users rely on the upstream connector API. If we follow that approach, we can better document in ParquetIO how to project fields by relying on the Parquet APIs, and avoid maintaining this extra code on the Beam side.
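As a rough illustration of what "building their own projection Schema" could look like on the user side, here is a minimal sketch. It assumes Avro 1.8 or later, parquet-avro and hadoop-common on the classpath. The `ProjectionExample` class, the `project` helper and the `User` schema are hypothetical names made up for this example; they are not part of Beam or Parquet. Only the two `AvroReadSupport` calls come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroReadSupport;

public class ProjectionExample {

  // Hypothetical helper: returns a record schema that keeps only the named
  // top-level fields of the full schema, in the order they are requested.
  static Schema project(Schema full, List<String> fieldNames) {
    List<Schema.Field> kept = new ArrayList<>();
    for (String name : fieldNames) {
      Schema.Field field = full.getField(name);
      if (field == null) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
      // A Field instance cannot be attached to two schemas, so copy it.
      kept.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
    }
    return Schema.createRecord(
        full.getName(), full.getDoc(), full.getNamespace(), false, kept);
  }

  public static void main(String[] args) {
    // Made-up schema, used only for illustration.
    Schema full = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"email\",\"type\":\"string\"}]}");

    Schema projectionSchema = project(full, Arrays.asList("id", "name"));

    // The same schema is used as both the read schema and the projection,
    // so no separate encoder schema is involved.
    Configuration conf = new Configuration();
    AvroReadSupport.setAvroReadSchema(conf, projectionSchema);
    AvroReadSupport.setRequestedProjection(conf, projectionSchema);

    System.out.println(projectionSchema.toString(true));
  }
}
```

How the resulting Configuration would be handed to ParquetIO is left open here, since that is the part of the API this issue proposes to revisit.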

 


