Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/11/04 20:57:00 UTC

[jira] [Commented] (BEAM-13081) Portable representation of "packed bitset indicating null fields" in beam Row format is not compatible with jvm representations

    [ https://issues.apache.org/jira/browse/BEAM-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438933#comment-17438933 ] 

Brian Hulette commented on BEAM-13081:
--------------------------------------

pr/15829 makes the Python RowCoder tolerate elided trailing 0s when decoding. To close out this Jira I think we need to do a few more things:
- Do the same for the Go RowCoder
- Make the Python and Go RowCoders elide trailing 0s when *encoding*
- Add a test to [standard_coders.yaml|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml] where trailing 0s in the null map are elided

When that is done we won't need the test added in pr/15829, so it could be removed.
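
For illustration, here is a minimal sketch of what "elide when encoding, tolerate when decoding" could look like. It is not the code from pr/15829 or from any Beam SDK: the function names are hypothetical, and it assumes that field i is stored in byte i // 8 at bit position i % 8 (the layout produced by Java's BitSet.toByteArray) and that a set bit means the field is null.

```python
from typing import List


def pack_null_bitmap(nulls: List[bool]) -> bytes:
    """Pack per-field null flags into bytes, dropping trailing zero bytes.

    Hypothetical helper. Assumes field i maps to bit (i % 8) of byte (i // 8).
    """
    packed = bytearray((len(nulls) + 7) // 8)
    for i, is_null in enumerate(nulls):
        if is_null:
            packed[i // 8] |= 1 << (i % 8)
    # Elide trailing all-zero bytes, mirroring what the JVM encoder emits.
    while packed and packed[-1] == 0:
        packed.pop()
    return bytes(packed)


def unpack_null_bitmap(data: bytes, num_fields: int) -> List[bool]:
    """Recover null flags, treating any missing trailing byte as zero."""
    return [
        i // 8 < len(data) and bool(data[i // 8] & (1 << (i % 8)))
        for i in range(num_fields)
    ]
```

With this pair, a bitmap round-trips whether or not the encoder kept its trailing zero bytes, so a tolerant decoder can be rolled out before every encoder starts eliding.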

> Portable representation of "packed bitset indicating null fields" in beam Row format is not compatible with jvm representations
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-13081
>                 URL: https://issues.apache.org/jira/browse/BEAM-13081
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language
>            Reporter: Steve Niemitz
>            Assignee: Steve Niemitz
>            Priority: P2
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The JVM RowCoder strips trailing 0s from the null-value bitmap, while both Python and Go expect all bits to be present in the encoded bitmap.  In some circumstances this causes index-out-of-range errors when a row encoded on the JVM is decoded in another language.
> For example, given a Row with 10 nullable fields, if the first 8 are null and the last two are set, the row will fail to decode in python, because the nullable bitmap will only have 1 byte, but the python coder expects 2.
> As discussed in the thread, the best solution here is probably to change the python (and go) coders to accept truncated nullable bitmaps.
>  
> More discussion here:
> [https://lists.apache.org/thread.html/r2f148e29902bda8bb0ff7106fffb8a5494295450827ad7fd17289383%40%3Cdev.beam.apache.org%3E]
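
The 10-field example from the description can be reproduced with a small sketch. The strict reader below is a hypothetical stand-in, not the actual Python RowCoder; it assumes field i maps to bit i % 8 of byte i // 8 and simply indexes into the bitmap as if every byte were present.

```python
from typing import List


def decode_strict(bitmap: bytes, num_fields: int) -> List[bool]:
    """Hypothetical strict reader: expects ceil(num_fields / 8) bytes."""
    # Indexing past the end of a truncated bitmap raises IndexError.
    return [
        bool(bitmap[i // 8] & (1 << (i % 8)))
        for i in range(num_fields)
    ]


# Fields 0-7 are null, fields 8-9 are set. The second byte would be
# all zeros, so an encoder that strips trailing zeros emits one byte.
truncated_bitmap = b"\xff"
full_bitmap = b"\xff\x00"

if __name__ == "__main__":
    print(decode_strict(full_bitmap, 10))
    try:
        decode_strict(truncated_bitmap, 10)
    except IndexError as err:
        print("truncated bitmap failed to decode:", err)
```

The full two-byte bitmap decodes normally, while the one-byte form fails as soon as the reader reaches field 8.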



--
This message was sent by Atlassian Jira
(v8.3.4#803005)