Posted to dev@hive.apache.org by "Nikhil Gupta (Jira)" <ji...@apache.org> on 2021/01/12 05:34:00 UTC

[jira] [Created] (HIVE-24621) TEXT and varchar datatype does not support unicode encoding in MSSQL

Nikhil Gupta created HIVE-24621:
-----------------------------------

             Summary: TEXT and varchar datatype does not support unicode encoding in MSSQL
                 Key: HIVE-24621
                 URL: https://issues.apache.org/jira/browse/HIVE-24621
             Project: Hive
          Issue Type: Bug
          Components: Standalone Metastore
    Affects Versions: 4.0.0
            Reporter: Nikhil Gupta
            Assignee: Nikhil Gupta


Why is Unicode required?
In the following example, the Chinese characters in the view definition are not preserved: they come back as '??'.
{noformat}
CREATE VIEW `test_view` AS select `test_tbl_char`.`col1` from `test_db5`.`test_tbl_char` where `test_tbl_char`.`col1`='你好'; 

show create table test_view;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE VIEW `test_view` AS select `test_tbl_char`.`col1` from `test_db5`.`test_tbl_char` where `test_tbl_char`.`col1`='??' |
+----------------------------------------------------+ {noformat}
 
This issue occurs because TBLS is defined as follows:
 
{noformat}
CREATE TABLE TBLS
(
 TBL_ID bigint NOT NULL,
 CREATE_TIME int NOT NULL,
 DB_ID bigint NULL,
 LAST_ACCESS_TIME int NOT NULL,
 OWNER nvarchar(767) NULL,
 OWNER_TYPE nvarchar(10) NULL,
 RETENTION int NOT NULL,
 SD_ID bigint NULL,
 TBL_NAME nvarchar(256) NULL,
 TBL_TYPE nvarchar(128) NULL,
 VIEW_EXPANDED_TEXT text NULL,
 VIEW_ORIGINAL_TEXT text NULL,
 IS_REWRITE_ENABLED bit NOT NULL DEFAULT 0,
 WRITE_ID bigint NOT NULL DEFAULT 0
);
{noformat}

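The text data type does not support Unicode encoding, irrespective of collation, so VIEW_EXPANDED_TEXT and VIEW_ORIGINAL_TEXT cannot hold the characters above.
The varchar data type does not support Unicode encoding prior to SQL Server 2019. From SQL Server 2019 onward, a UTF-8 enabled collation must also be defined before varchar can store Unicode characters.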
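
The '??' in the show create table output above is what a lossy conversion to a single-byte code page would produce. The following is a minimal Python sketch of that effect, not the metastore's actual code path. The code page cp1252 and the helper names store_non_unicode and store_unicode are assumptions made for illustration; the real code page depends on the database collation, which this report does not state.

```python
# Hypothetical illustration only. cp1252 is an ASSUMED code page; the
# report does not say which collation the metastore database uses.
ASSUMED_CODE_PAGE = "cp1252"


def store_non_unicode(value: str, code_page: str = ASSUMED_CODE_PAGE) -> str:
    """Mimic a non-Unicode column (text, or varchar without a UTF-8
    collation): characters missing from the code page become '?'."""
    return value.encode(code_page, errors="replace").decode(code_page)


def store_unicode(value: str) -> str:
    """Mimic a Unicode column such as nvarchar, which stores UTF-16."""
    return value.encode("utf-16-le").decode("utf-16-le")


view_text = "where `test_tbl_char`.`col1`='你好'"
print(store_non_unicode(view_text))  # the two Chinese characters are lost
print(store_unicode(view_text))      # the two Chinese characters survive
```

Each character that has no mapping in the assumed code page is replaced by a single '?', which matches the two question marks seen for the two-character literal in the view text.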



--
This message was sent by Atlassian Jira
(v8.3.4#803005)