Posted to github@beam.apache.org by "feelbergood (via GitHub)" <gi...@apache.org> on 2023/05/26 22:01:46 UTC

[GitHub] [beam] feelbergood opened a new issue, #26918: [Bug]: StrUtf8Coder in Python can't deserialize what's encoded with Java StringUtf8Coder

feelbergood opened a new issue, #26918:
URL: https://github.com/apache/beam/issues/26918

   ### What happened?
   
   I tried to use the Python StrUtf8Coder to deserialize a string encoded with the Java StringUtf8Coder, but it didn't work. This is because Beam implements the coder differently in Java and Python.
   
   In Java, the byte length is encoded into the output bytes: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/StringUtf8Coder.java#L50 In Python, encoding simply calls value.encode('utf-8'): https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/coders.py#L426-L439
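To make the mismatch concrete, here is a minimal pure-Python sketch (no Beam dependency) of the two byte layouts described above. It assumes the Java length prefix is an unsigned LEB128 varint; the helpers `encode_varint`, `java_style_nested_encode` and `python_style_decode` are hypothetical names for illustration, not Beam APIs.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint (assumed VarInt format)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            # More bytes follow: set the continuation bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def java_style_nested_encode(value: str) -> bytes:
    """Length-prefixed UTF-8, i.e. the layout the report attributes to Java."""
    data = value.encode("utf-8")
    return encode_varint(len(data)) + data


def python_style_decode(payload: bytes) -> str:
    """Plain UTF-8 decode with no length handling, as the report describes for Python."""
    return payload.decode("utf-8")


payload = java_style_nested_encode("hello")
print(payload)                             # b'\x05hello'
print(repr(python_style_decode(payload)))  # '\x05hello'
```

Note that the plain decode does not raise: the length prefix is silently kept as a leading control character in the resulting string.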
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #26918: [Bug]: StrUtf8Coder in Python can't deserialize what's encoded with Java StringUtf8Coder

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26918:
URL: https://github.com/apache/beam/issues/26918#issuecomment-1577728814

   Thanks for reporting. I assume this has happened in an xlang pipeline?
   
   cc: @chamikaramj, do you have any advice on best practices here?
    


[GitHub] [beam] chamikaramj commented on issue #26918: [Bug]: StrUtf8Coder in Python can't deserialize what's encoded with Java StringUtf8Coder

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26918:
URL: https://github.com/apache/beam/issues/26918#issuecomment-1641132886

   Hi, thanks for reporting this.
   
   According to the reference encodings, the length should not be included for UTF-8 coders: https://github.com/apache/beam/blob/6152a70d64b08809940ec07c8df2d4f0168d49ec/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L85
   
   Also, looking at the Java implementation, the length should only be included if the coder is operating in a nested context: https://github.com/apache/beam/blob/6152a70d64b08809940ec07c8df2d4f0168d49ec/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/StringUtf8Coder.java#L83C3-L83C3
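The difference between the two contexts can be sketched as follows. This is a simplified model, not Beam's implementation: it assumes the only difference between the outer and nested contexts is an unsigned LEB128 length prefix counting UTF-8 bytes, and all function names (`write_varint`, `read_varint`, `encode_utf8`, `decode_utf8`) are hypothetical.

```python
import io
from typing import BinaryIO


def write_varint(stream: BinaryIO, n: int) -> None:
    """Write n as an unsigned LEB128 varint (assumed to match Beam's VarInt)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            stream.write(bytes([low7 | 0x80]))  # continuation bit set
        else:
            stream.write(bytes([low7]))
            return


def read_varint(stream: BinaryIO) -> int:
    """Read an unsigned LEB128 varint from the stream."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def encode_utf8(value: str, stream: BinaryIO, nested: bool) -> None:
    data = value.encode("utf-8")
    if nested:
        # Nested: more data may follow, so the reader needs the byte length.
        write_varint(stream, len(data))
    # Outer (whole stream): the value runs to the end, so no prefix is written.
    stream.write(data)


def decode_utf8(stream: BinaryIO, nested: bool) -> str:
    if nested:
        length = read_varint(stream)
        data = stream.read(length)
        if len(data) != length:
            raise EOFError("truncated string")
    else:
        data = stream.read()  # consume everything that is left
    return data.decode("utf-8")


# A hypothetical two-field record: the first field is followed by more data,
# so it is written nested; the last field owns the rest of the stream.
buf = io.BytesIO()
encode_utf8("caf\u00e9", buf, nested=True)
encode_utf8("end", buf, nested=False)
print(buf.getvalue())  # b'\x05caf\xc3\xa9end'

buf.seek(0)
first = decode_utf8(buf, nested=True)
second = decode_utf8(buf, nested=False)
print(len(first), second)  # 4 end
```

In this model the prefix is 5 because it counts encoded bytes, not characters. Bytes written in one context and read in the other will disagree by exactly that prefix.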
   
   Do you have a test where you ran into issues due to this?
   
