Posted to issues@beam.apache.org by "Yueyang Qiu (Jira)" <ji...@apache.org> on 2020/01/25 00:33:00 UTC

[jira] [Assigned] (BEAM-9180) [ZetaSQL] Support 4-byte unicode in literal string unparsing

     [ https://issues.apache.org/jira/browse/BEAM-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yueyang Qiu reassigned BEAM-9180:
---------------------------------

    Assignee: Yueyang Qiu

> [ZetaSQL] Support 4-byte unicode in literal string unparsing
> ------------------------------------------------------------
>
>                 Key: BEAM-9180
>                 URL: https://issues.apache.org/jira/browse/BEAM-9180
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql-zetasql
>            Reporter: Kirill Kozlov
>            Assignee: Yueyang Qiu
>            Priority: Major
>
> When unparsing literal strings we need to escape special symbols (e.g. `\n`, `\r`, `\u0012`).
> ZetaSQL supports some 4-byte (8 hex digit) unicode via `\Uhhhhhhhh`.
> As of now (https://github.com/apache/beam/blob/8a35f408f640d04c38ad6e2a497d30410b3bff32/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L59), only 2-byte (4 hex digit) unicode is supported, escaped via `\u`.
>  
> More about escape sequences here (need to scroll down a little): 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical
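
A minimal sketch of the escaping described above, assuming the unparser walks the literal by Unicode code point rather than by Java `char`. The class name `LiteralEscapeSketch` and the method `escape` are hypothetical and are not taken from `BeamSqlUnparseContext`; the exact set of characters that must be escaped is also an assumption. Code points above U+FFFF are written as `\U` plus 8 hex digits; other non-printable or non-ASCII characters are written as `\u` plus 4 hex digits.

```java
// Hypothetical helper for illustration only; not the Beam implementation.
final class LiteralEscapeSketch {
  private LiteralEscapeSketch() {}

  /** Escapes the contents of a double-quoted string literal. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ) {
      // Read a whole code point so a surrogate pair is treated as one symbol.
      int cp = s.codePointAt(i);
      i += Character.charCount(cp);
      switch (cp) {
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\\': sb.append("\\\\"); break;
        case '"': sb.append("\\\""); break;
        default:
          if (Character.isSupplementaryCodePoint(cp)) {
            // Above U+FFFF: needs the 8 hex digit form.
            sb.append(String.format("\\U%08x", cp));
          } else if (cp < 0x20 || cp > 0x7e) {
            // Control or non-ASCII character in the BMP: 4 hex digit form.
            // Assumption: unpaired surrogates do not occur in the input;
            // they are not handled specially here.
            sb.append(String.format("\\u%04x", cp));
          } else {
            sb.appendCodePoint(cp);
          }
      }
    }
    return sb.toString();
  }
}
```

For example, `LiteralEscapeSketch.escape("\uD83D\uDE00")` (the single code point U+1F600) yields `\U0001f600` instead of two separate `\u` escapes for the surrogate halves.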



--
This message was sent by Atlassian Jira
(v8.3.4#803005)