Posted to commits@beam.apache.org by "Anton Kedin (JIRA)" <ji...@apache.org> on 2017/12/05 20:09:00 UTC

[jira] [Created] (BEAM-3292) Remove BeamRecordSqlType

Anton Kedin created BEAM-3292:
---------------------------------

             Summary: Remove BeamRecordSqlType
                 Key: BEAM-3292
                 URL: https://issues.apache.org/jira/browse/BEAM-3292
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql
            Reporter: Anton Kedin
            Assignee: Anton Kedin


[BeamRecordType|https://github.com/apache/beam/blob/39e66e953b0f8e16435acb038cad364acf2b3a57/sdks/java/core/src/main/java/org/apache/beam/sdk/values/BeamRecordType.java] is implemented as two ordered lists: the list of field names and the list of coders for those fields.

[BeamRecordSqlType|https://github.com/apache/beam/blob/2eb7de0fe6e96da9805fc827294da1e1329ff716/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamRecordSqlType.java] additionally has a list of [java.sql.Types|https://docs.oracle.com/javase/7/docs/api/java/sql/Types.html] ints to define types of those fields. It is used to map between Java types, Calcite types, and Beam Coders.
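To make the indirection concrete, here is a minimal sketch that uses only the JDK. It is not the BeamRecordSqlType implementation: the class SqlTypeMappingSketch, its field names, and the lookup table are hypothetical, and it maps java.sql.Types ints to Java classes rather than to Beam Coders.

```java
import java.sql.Types;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the BeamRecordSqlType implementation. */
final class SqlTypeMappingSketch {
  // Parallel, ordered lists: position i in one list describes field i in the other.
  static final List<String> FIELD_NAMES = Arrays.asList("id", "name", "created");
  static final List<Integer> FIELD_SQL_TYPES =
      Arrays.asList(Types.INTEGER, Types.VARCHAR, Types.TIMESTAMP);

  // Assumed lookup table from java.sql.Types ints to Java classes.
  private static final Map<Integer, Class<?>> JAVA_CLASS_BY_SQL_TYPE = new HashMap<>();

  static {
    JAVA_CLASS_BY_SQL_TYPE.put(Types.INTEGER, Integer.class);
    JAVA_CLASS_BY_SQL_TYPE.put(Types.VARCHAR, String.class);
    JAVA_CLASS_BY_SQL_TYPE.put(Types.TIMESTAMP, java.util.Date.class);
  }

  /** Resolves a java.sql.Types int to the Java class assumed to represent it. */
  static Class<?> javaClassFor(int sqlType) {
    Class<?> javaClass = JAVA_CLASS_BY_SQL_TYPE.get(sqlType);
    if (javaClass == null) {
      throw new IllegalArgumentException("Unsupported java.sql.Types value: " + sqlType);
    }
    return javaClass;
  }

  /** Finds the field's position by name, then resolves its type through the int code. */
  static Class<?> javaClassOfField(String fieldName) {
    int index = FIELD_NAMES.indexOf(fieldName);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown field: " + fieldName);
    }
    return javaClassFor(FIELD_SQL_TYPES.get(index));
  }

  public static void main(String[] args) {
    for (String field : FIELD_NAMES) {
      System.out.println(field + " -> " + javaClassOfField(field).getSimpleName());
    }
  }
}
```

Every type lookup here goes through the int code first, which is the extra hop this issue proposes to remove.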

This information is used only for that mapping, which in turn is used only to create records and to map back to Calcite types.

But because of this indirect mapping we cannot rely on the core BeamRecordType and are forced to have BeamRecordSqlType. This introduces additional complexity, for example when generating record types based on POJO classes.

If we could find another mechanism to map Calcite types and Java classes to Beam Coders, bypassing java.sql.Types, then we could just use the core BeamRecordType and remove the BeamRecordSqlType functionality.

One approach is to have a predefined set of coders which are then used like types, e.g.:
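{code:java}
public static class SqlCoders {
   // Shared constants, so they are declared static final.
   public static final Coder<Integer> INTEGER = VarIntCoder.of();
   public static final Coder<String> VARCHAR = StringUtf8Coder.of();
   public static final Coder<Date> TIMESTAMP = DateCoder.of();
}
{code}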

The problem with that approach is establishing coder identity. When a coder is serialized and then deserialized, it becomes a different instance, so we need a mechanism to determine the identity, or perhaps just the equality, of coders. If this is solved, then replacing java.sql.Types with predefined SQL coders like the ones above becomes trivial.
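The identity problem can be reproduced with plain Java serialization. The sketch below uses a hypothetical stand-in class, FakeIntCoder, which is not a Beam coder. It assumes the coder is stateless, so that class-based equality is sufficient; a coder with configuration would need to compare that state as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a stateless coder; NOT a Beam class. */
final class FakeIntCoder implements Serializable {
  private static final long serialVersionUID = 1L;
  private static final FakeIntCoder INSTANCE = new FakeIntCoder();

  private FakeIntCoder() {}

  static FakeIntCoder of() {
    return INSTANCE;
  }

  // Class-based equality: any two instances of this stateless coder are equal.
  @Override
  public boolean equals(Object other) {
    return other != null && getClass() == other.getClass();
  }

  @Override
  public int hashCode() {
    return getClass().hashCode();
  }
}

final class CoderIdentityDemo {
  /** Serializes the value to bytes and reads it back, as shipping it to a worker would. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    FakeIntCoder original = FakeIntCoder.of();
    FakeIntCoder copy = roundTrip(original);
    System.out.println("same instance: " + (original == copy));
    System.out.println("equal: " + original.equals(copy));
  }
}
```

Because the class defines no readResolve, the deserialized copy is a new object, so a reference comparison against a predefined constant fails while equals still succeeds. Comparing predefined coders by equality rather than by reference is one way the mapping could survive serialization.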

A few links on this:
 - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java#L56
 - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslator.java#L34
 - https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L391



