Posted to issues@beam.apache.org by "Heejong Lee (JIRA)" <ji...@apache.org> on 2019/04/04 19:05:00 UTC

[jira] [Created] (BEAM-7008) standardize UTF-8 string coder encodings

Heejong Lee created BEAM-7008:
---------------------------------

             Summary: standardize UTF-8 string coder encodings
                 Key: BEAM-7008
                 URL: https://issues.apache.org/jira/browse/BEAM-7008
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core, sdk-py-core
            Reporter: Heejong Lee
            Assignee: Heejong Lee


It looks like the UTF-8 string coders in the Java and Python SDKs use different encoding schemes. StringUtf8Coder in the Java SDK writes the varint-encoded length of the input string before the actual data bytes, whereas StrUtf8Coder in the Python SDK encodes the input string directly to a bytes value with no length prefix. We should unify the encoding scheme for UTF-8 strings across the SDKs and make it a standard coder.
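
To illustrate the mismatch, here is a minimal sketch of the two schemes in plain Python. It does not use the Beam SDK; the function names (encode_varint, encode_length_prefixed, encode_raw) are hypothetical, and it assumes the length prefix counts UTF-8 bytes and uses the common base-128 varint layout (7 bits per byte, low-order group first).

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low-order group first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def encode_length_prefixed(s: str) -> bytes:
    """Scheme described above for the Java coder: varint length, then data."""
    data = s.encode("utf-8")
    # Assumption: the prefix is the number of UTF-8 bytes, not characters.
    return encode_varint(len(data)) + data


def encode_raw(s: str) -> bytes:
    """Scheme described above for the Python coder: UTF-8 bytes only."""
    return s.encode("utf-8")


if __name__ == "__main__":
    text = "h\u00e9llo"  # 5 characters, 6 bytes in UTF-8
    print(encode_length_prefixed(text))
    print(encode_raw(text))
```

For the same input the two outputs differ by the leading length byte(s), so bytes produced under one scheme cannot be decoded correctly under the other.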


