Posted to issues@beam.apache.org by "Daniel Oliveira (Jira)" <ji...@apache.org> on 2022/03/24 23:35:00 UTC

[jira] [Commented] (BEAM-14153) Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & PyPortable

    [ https://issues.apache.org/jira/browse/BEAM-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512127#comment-17512127 ] 

Daniel Oliveira commented on BEAM-14153:
----------------------------------------

I'm looking at release-blocking bugs and trying to see if any can be safely removed from the list of release blockers.

What's the status on this? It does sound like a regression, which would make it a release blocker, but how soon is a fix coming? If a fix isn't coming soon, is it a major regression or something that can easily be worked around? The trigger sounds pretty specific, since it needs a reshuffled PCollection, so maybe we can just provide workaround instructions and mark this as a known issue?

> Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & PyPortable
> ---------------------------------------------------------------------------------------
>
>                 Key: BEAM-14153
>                 URL: https://issues.apache.org/jira/browse/BEAM-14153
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>    Affects Versions: 2.37.0
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P2
>             Fix For: 2.38.0
>
>
> Since first-class Iterable side inputs were implemented, passing a reshuffled PCollection directly to a side input causes a coder mismatch between encoding the reshuffle and decoding it on Dataflow and on Python Portable. In particular, the Row values are encoded without a length prefix, but the decoder is then asked to decode them with a length prefix that was never written.
> This is similar to the issue in BEAM-12438, which has been hacked around.
> In this instance it's likely more resilient to always length-prefix Row-encoded types, and to make that explicit in the pipeline proto. This should avoid issues with runners having odd behaviors with respect to row coders at this time, while not preventing them from introspecting row-encoded values should they choose. This may also allow us to avoid the hack for BEAM-12438, though that is something to be verified independently.
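
The mismatch described above can be illustrated with a minimal Go sketch. It is not the Beam Go SDK's coder code: the function names (encodeRaw, encodeLengthPrefixed, decodeLengthPrefixed) are hypothetical, and it assumes the length prefix is a varint byte count written ahead of the payload. It only shows why bytes written without a prefix cannot be safely read by a decoder that expects one.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// encodeRaw stands in for a coder that writes the payload bytes as-is,
// with no length prefix (hypothetical helper, not a Beam API).
func encodeRaw(payload []byte) []byte {
	return append([]byte(nil), payload...)
}

// encodeLengthPrefixed writes a varint byte count followed by the payload
// (assumed wire layout for this sketch).
func encodeLengthPrefixed(payload []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(payload)))
	return append(buf[:n], payload...)
}

// decodeLengthPrefixed reads a varint byte count and then returns that many
// bytes. If the writer never wrote a prefix, the first payload bytes are
// misread as the length.
func decodeLengthPrefixed(data []byte) ([]byte, error) {
	n, read := binary.Uvarint(data)
	if read <= 0 {
		return nil, errors.New("invalid length prefix")
	}
	rest := data[read:]
	if uint64(len(rest)) < n {
		return nil, errors.New("length prefix exceeds remaining bytes")
	}
	return rest[:n], nil
}
```

With the payload {0x05, 'a', 'b'}, decodeLengthPrefixed(encodeLengthPrefixed(payload)) returns the payload, while decodeLengthPrefixed(encodeRaw(payload)) reads 0x05 as a length of five and fails because only two bytes remain. In less lucky cases the misread length fits and the decoder silently returns the wrong bytes, which is why always writing the prefix, and declaring it in the pipeline proto, removes the ambiguity.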


