Posted to issues@beam.apache.org by "luke de feo (Jira)" <ji...@apache.org> on 2021/03/01 11:07:00 UTC

[jira] [Commented] (BEAM-9502) SchemaCoder is not update compatible

    [ https://issues.apache.org/jira/browse/BEAM-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292822#comment-17292822 ] 

luke de feo commented on BEAM-9502:
-----------------------------------

Hello, is there any update on this?

> SchemaCoder is not update compatible
> ------------------------------------
>
>                 Key: BEAM-9502
>                 URL: https://issues.apache.org/jira/browse/BEAM-9502
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-java-core
>            Reporter: Yaron Neuman
>            Priority: P3
>              Labels: Clarified, stale-assigned
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> See relevant [dev@ discussion|https://lists.apache.org/thread.html/r720682624251222c2f3140c785463c0ddfdd782fc88d5d2a48a464c7%40%3Cdev.beam.apache.org%3E]. Runners should consider schemas compatible if they have the same fields in the same order. They should also get the ability to re-order fields in equivalent schemas (same fields, possibly out of order) using the encoding_position field (BEAM-10277).
> Original Description:
> h2. SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail
> After fe4b7794, _Schema.equals_ compares only the UUIDs, for faster comparison.
> After 0b3b18c6, _SchemaCoder_ forces a random UUID when schema.uuid is null.
> Thus, when trying to update (--update) a Dataflow job with row schemas in user code, the compatibility check fails because SchemaCoder produces a new random UUID.
>  
> The user can set the UUID after creating the Schema, but not with Schema.Builder,
> and I'm afraid most users, who are not aware of the internal implementation, won't do that.
>  
> In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_,
> but I think a better solution would be to calculate the UUID based on the schema itself.
> Any thoughts?
> [~reuvenlax]
>  
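For illustration, the idea of calculating the UUID from the schema itself could look roughly like the sketch below. This is not Beam code: the class DeterministicSchemaUuid, the method uuidFor, and the representation of a schema as an ordered map of field name to type name are all assumptions made for the example. It relies only on java.util.UUID.nameUUIDFromBytes, which yields the same name-based UUID for the same input bytes, so two builds of the same schema would get equal UUIDs.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class DeterministicSchemaUuid {

    // Hypothetical helper, not part of the Beam API: derives a name-based
    // (type 3) UUID from the ordered field names and type names of a schema.
    // Assumes field and type names contain neither ':' nor ';'.
    public static UUID uuidFor(Map<String, String> fields) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            canonical.append(field.getKey())
                     .append(':')
                     .append(field.getValue())
                     .append(';');
        }
        byte[] bytes = canonical.toString().getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes);
    }

    public static void main(String[] args) {
        // Two independently built descriptions of the same schema.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "INT64");
        first.put("name", "STRING");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("id", "INT64");
        second.put("name", "STRING");

        // Same fields in the same order give the same UUID.
        System.out.println(uuidFor(first).equals(uuidFor(second)));
    }
}
```

Because the fields are serialized in iteration order, a schema with the same fields in a different order gets a different UUID in this sketch. That matches the "same fields in the same order" rule quoted above but does not cover re-ordered equivalent schemas.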



--
This message was sent by Atlassian Jira
(v8.3.4#803005)