You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Irwin Alejandro Rodirguez Ramirez (Jira)" <ji...@apache.org> on 2021/10/13 17:51:00 UTC

[jira] [Assigned] (BEAM-10277) beam:coder:row:v1 implementations should respect encoding_position

     [ https://issues.apache.org/jira/browse/BEAM-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Irwin Alejandro Rodirguez Ramirez reassigned BEAM-10277:
--------------------------------------------------------

    Assignee: Irwin Alejandro Rodirguez Ramirez

> beam:coder:row:v1 implementations should respect encoding_position
> ------------------------------------------------------------------
>
>                 Key: BEAM-10277
>                 URL: https://issues.apache.org/jira/browse/BEAM-10277
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core, sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Irwin Alejandro Rodirguez Ramirez
>            Priority: P3
>              Labels: Clarified
>          Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> h3. Problem/Status
> The schema proto has an [encoding_position field|https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/model/pipeline/src/main/proto/schema.proto#L55] that is currently unused in every row coder implementation. The intention of this field is that it indicates an alternative order for the fields to be encoded in by [beam:coder:row:v1 implementations|https://github.com/apache/beam/blob/1e60f383fb39b9ff8d44edcbe5357da4c1e52378/model/pipeline/src/main/proto/beam_runner_api.proto#L937-L990]. Currently all the implementations ignore this field, and always encode the fields in the order that they appear in the schema.
> h3. Motivation
> The idea with the encoding position is that it will give runners a way to enforce schema compatibility (BEAM-9502), by re-ordering the way fields are encoded when the schema changes between two job submissions. Schema changes could be due to fields re-ordering, or field additions/deletions.
> h3. Code pointers
> The Python beam:coder:row:v1 implementation lives in [row_coder.py|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py]
> The Java implementation is a little more complicated, distributed between [SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java], [RowCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java], and [RowCoderGenerator|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java]. RowCoderGenerator contains the code relevant to this jira - it uses bytebuddy to generate Java code for the coder. We need it to generate code that puts fields in the order specified by encoding_position.
> h3. Testing
> Python and Java implementations should be tested with unit tests ([RowCoderTest|https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java], [row_coder_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder_test.py]). We should also test them for compatibility by adding test cases that exercise the encoding_position in [standard_coders.yaml|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml]. These tests will be executed by [CommonCoderTest|https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java] and [standard_coders_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/standard_coders_test.py]. There's some example code for generating a new test case [here|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L387-L400].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)