Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2019/09/10 08:24:00 UTC
[jira] [Created] (SPARK-29034) String Constants with C-style Escapes

Yuming Wang created SPARK-29034:
-----------------------------------

             Summary: String Constants with C-style Escapes
                 Key: SPARK-29034
                 URL: https://issues.apache.org/jira/browse/SPARK-29034
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yuming Wang


PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter {{E}} (upper or lower case) just before the opening single quote, e.g., {{E'foo'}}. (When continuing an escape string constant across lines, write {{E}} only before the first opening quote.) Within an escape string, a backslash character ({{\}}) begins a C-like _backslash escape_ sequence, in which the combination of backslash and following character(s) represents a special byte value, as shown in [Table 4-1|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE].

*Table 4-1. Backslash Escape Sequences*
||Backslash Escape Sequence||Interpretation||
|{{\b}}|backspace|
|{{\f}}|form feed|
|{{\n}}|newline|
|{{\r}}|carriage return|
|{{\t}}|tab|
|{{\}}{{o}}, {{\}}{{oo}}, {{\}}{{ooo}} ({{o}} = 0 - 7)|octal byte value|
|{{\x}}{{h}}, {{\x}}{{hh}} ({{h}} = 0 - 9, A - F)|hexadecimal byte value|
|{{\u}}{{xxxx}}, {{\U}}{{xxxxxxxx}} ({{x}} = 0 - 9, A - F)|16 or 32-bit hexadecimal Unicode character value|

Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes ({{\\}}). Also, a single quote can be included in an escape string by writing {{\'}}, in addition to the normal way of {{''}}.
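
The rules in Table 4-1 can be sketched as a small decoder. This is only an illustration in Python, not PostgreSQL's or Spark's implementation: the function name {{decode_escape_body}} and the helper {{take}} are hypothetical, the input is assumed to be the text between the quotes of an {{E'...'}} constant, lowercase hex digits are accepted in addition to the A - F listed in the table, octal values above 255 are truncated to one byte as an assumption, and UTF-16 surrogate pairs are not combined.

```python
import string

# Single-character escapes from Table 4-1.
SIMPLE = {"b": b"\b", "f": b"\f", "n": b"\n", "r": b"\r", "t": b"\t"}
OCTAL = "01234567"
HEX = string.hexdigits  # 0-9, a-f, A-F


def take(body: str, start: int, allowed: str, max_len: int) -> str:
    """Return the longest run of allowed characters at start, capped at max_len."""
    end = start
    while end < len(body) and end - start < max_len and body[end] in allowed:
        end += 1
    return body[start:end]


def decode_escape_body(body: str) -> bytes:
    """Decode the text between the quotes of an E'...' constant into UTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body[i + 1:i + 2] == "'":
            out += b"'"  # the normal '' form of a single quote
            i += 2
            continue
        if ch != "\\" or i + 1 == len(body):
            out += ch.encode("utf-8")  # ordinary character
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE:
            out += SIMPLE[nxt]
            i += 2
        elif nxt in OCTAL:
            digits = take(body, i + 1, OCTAL, 3)  # \o, \oo, \ooo
            out.append(int(digits, 8) & 0xFF)  # assumption: keep the low byte
            i += 1 + len(digits)
        elif nxt == "x" and take(body, i + 2, HEX, 2):
            digits = take(body, i + 2, HEX, 2)  # \xh, \xhh
            out.append(int(digits, 16))
            i += 2 + len(digits)
        elif nxt in ("u", "U"):
            width = 4 if nxt == "u" else 8  # \uxxxx or \Uxxxxxxxx
            digits = take(body, i + 2, HEX, width)
            if len(digits) != width:
                raise ValueError("invalid Unicode escape")
            out += chr(int(digits, 16)).encode("utf-8")
            i += 2 + width
        else:
            out += nxt.encode("utf-8")  # any other character is taken literally
            i += 2
    return bytes(out)
```

For instance, {{decode_escape_body(r"Th\000omas")}} yields the bytes {{Th}}, a zero byte, then {{omas}}, and both {{\'}} and {{''}} decode to a single quote.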

It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. When the server encoding is UTF-8, then the Unicode escapes or the alternative Unicode escape syntax, explained in [Section 4.1.2.3|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE], should be used instead. (The alternative would be doing the UTF-8 encoding by hand and writing out the bytes, which would be very cumbersome.)

The Unicode escape syntax works fully only when the server encoding is {{UTF8}}. When other server encodings are used, only code points in the ASCII range (up to {{\u007F}}) can be specified. Both the 4-digit and the 8-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 8-digit form technically makes this unnecessary. (When surrogate pairs are used when the server encoding is {{UTF8}}, they are first combined into a single code point that is then encoded in UTF-8.)
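
The surrogate-pair step described above follows the standard UTF-16 arithmetic. A minimal Python sketch, with the hypothetical helper name {{combine_surrogates}}, shows how two 4-digit escapes such as {{\uD83D\uDE00}} would be merged into one code point before UTF-8 encoding:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Merge a UTF-16 high/low surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits of the offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE00)  # 0x1F600
utf8_bytes = chr(code_point).encode("utf-8")     # b"\xf0\x9f\x98\x80"
```

The 8-digit form {{\U0001F600}} names the same code point directly, which is why the pair form is technically unnecessary.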
 
 
[https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE]
 
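Example (PostgreSQL, assuming the default {{standard_conforming_strings = on}}). With the {{E}} prefix, {{\\}} is reduced to a single backslash before the cast, so {{bytea}} sees {{\000}} and stores a zero byte. Without the prefix, both backslashes reach {{bytea}}, which reads them as one literal backslash followed by {{000}}: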
{code:sql}
postgres=# SET bytea_output TO escape;
SET
postgres=# SELECT E'Th\\000omas'::bytea;
   bytea
------------
 Th\000omas
(1 row)

postgres=# SELECT 'Th\\000omas'::bytea;
    bytea
-------------
 Th\\000omas
(1 row)
{code}


