Posted to issues@beam.apache.org by "Heejong Lee (JIRA)" <ji...@apache.org> on 2019/04/04 19:05:00 UTC
[jira] [Created] (BEAM-7008) standardize UTF-8 string coder encodings
Heejong Lee created BEAM-7008:
---------------------------------
Summary: standardize UTF-8 string coder encodings
Key: BEAM-7008
URL: https://issues.apache.org/jira/browse/BEAM-7008
Project: Beam
Issue Type: Bug
Components: sdk-java-core, sdk-py-core
Reporter: Heejong Lee
Assignee: Heejong Lee
It looks like the UTF-8 string coders in the Java and Python SDKs use different encoding schemes. StringUtf8Coder in the Java SDK writes the varint-encoded length of the input string before the actual data bytes, whereas StrUtf8Coder in the Python SDK encodes the input string directly to a bytes value. We should unify the encoding scheme for UTF-8 strings across the SDKs and make it a standard coder.
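To make the mismatch concrete, here is a minimal sketch of the two schemes described above. It is not Beam's actual coder code: the function names are hypothetical, and it assumes the length prefix is a base-128 varint counting the UTF-8 encoded bytes.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low-order group first,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("varint sketch only handles non-negative integers")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more groups follow
        else:
            out.append(bits)         # final group
            return bytes(out)


def encode_raw(s: str) -> bytes:
    """Python-SDK-style scheme as described in this issue (hypothetical helper):
    the string is encoded directly to UTF-8 bytes, with no prefix."""
    return s.encode("utf-8")


def encode_length_prefixed(s: str) -> bytes:
    """Java-SDK-style scheme as described in this issue (hypothetical helper):
    varint length first, then the UTF-8 bytes.
    Assumption: the length counts encoded bytes, not characters."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


if __name__ == "__main__":
    for text in ("abc", "é"):
        print(repr(text), encode_raw(text), encode_length_prefixed(text))
```

Under these assumptions the two outputs differ by the length prefix, so bytes written by one scheme cannot be decoded by a reader expecting the other.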
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)