Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/04/02 13:14:00 UTC

[jira] [Commented] (HIVE-18083) Support UTF8 in MySQL Metastore Backend

    [ https://issues.apache.org/jira/browse/HIVE-18083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073713#comment-17073713 ] 

David Mollitor commented on HIVE-18083:
---------------------------------------

If Hive is going to be a drop-in replacement for MySQL/MariaDB (or something approaching that), then it needs to support UTF-8:

{quote}
Certain objects within MySQL, including database, table, index, column, alias, view, stored procedure, partition, tablespace, resource group and other object names are known as identifiers.

...

# Permitted characters in quoted identifiers include the full Unicode Basic Multilingual Plane (BMP), except U+0000
{quote}

* https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
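To illustrate the gap: the metastore stores column names in {{COLUMNS_V2.COLUMN_NAME}}, which is declared {{latin1}}, so a BMP name with no Latin-1 representation cannot be stored there. The session below is a hypothetical sketch. The {{latin1_demo}} and {{utf8_demo}} tables are made up for illustration, and it assumes MySQL 5.7 or later with the default strict {{sql_mode}} and a UTF-8 client connection.

{code:sql}
-- Assumption: the client sends UTF-8, so the literals below arrive intact.
SET NAMES utf8mb4;

-- Hypothetical scratch tables mirroring a latin1 and a utf8 COLUMN_NAME.
CREATE TEMPORARY TABLE latin1_demo (
  name varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL
);
CREATE TEMPORARY TABLE utf8_demo (
  name varchar(767) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
);

-- '名前' ("name") is inside the BMP, so MySQL allows it in a quoted identifier,
-- but latin1 cannot represent it. Under a strict sql_mode this INSERT is
-- expected to fail with error 1366 (Incorrect string value); without strict
-- mode MySQL typically stores '?' substitutes and raises a warning instead.
INSERT INTO latin1_demo VALUES ('名前');

-- A utf8 (utf8mb3) column covers the whole BMP, so the same value fits.
INSERT INTO utf8_demo VALUES ('名前');
SELECT name, HEX(name) FROM utf8_demo;  -- HEX is E5908DE5898D
{code}

If the script is run with a client that aborts on the first error, the failing INSERT will stop it there; run the statements one at a time to see both outcomes.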

> Support UTF8 in MySQL Metastore Backend
> ---------------------------------------
>
>                 Key: HIVE-18083
>                 URL: https://issues.apache.org/jira/browse/HIVE-18083
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore, Standalone Metastore
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: David Mollitor
>            Priority: Major
>
> {code:sql|title=hive-schema-2.2.0.mysql.sql}
> CREATE TABLE IF NOT EXISTS `COLUMNS_V2` (
>   `CD_ID` bigint(20) NOT NULL,
>   `COMMENT` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
>   `COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>   `TYPE_NAME` varchar(4000) DEFAULT NULL,
>   `INTEGER_IDX` int(11) NOT NULL,
>   PRIMARY KEY (`CD_ID`,`COLUMN_NAME`),
>   KEY `COLUMNS_V2_N49` (`CD_ID`),
>   CONSTRAINT `COLUMNS_V2_FK1` FOREIGN KEY (`CD_ID`) REFERENCES `CDS` (`CD_ID`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> {code}
> Hive explicitly declares {{CHARACTER SET latin1 COLLATE latin1_bin}} on these columns in the schema. This explicit definition should either be removed, so that the columns fall back to the database administrator's defaults, or changed to {{CHARACTER SET utf8 COLLATE utf8_bin}}.
> This will allow Hive to support UTF-8 characters in MySQL backend databases, for the benefit of our international friends.
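For an existing metastore, the second option might be applied to {{COLUMNS_V2}} roughly as follows. This is a sketch, not an official Hive upgrade script, and it covers only this one table. It assumes MySQL 5.7 or later with InnoDB's default DYNAMIC row format: a utf8 {{varchar(767)}} key column can need up to 2301 bytes, which exceeds the 767-byte index limit of the older COMPACT format. Note also that the table itself declares {{DEFAULT CHARSET=latin1}}, and a column with no explicit {{CHARACTER SET}} clause inherits the table default, so the first option (removing the column-level clause) only falls back to the administrator's defaults if the table-level clause is changed or removed as well.

{code:sql}
-- Sketch only: assumes MySQL 5.7+ (InnoDB, DYNAMIC row format) and the
-- COLUMNS_V2 definition from hive-schema-2.2.0.mysql.sql.
-- Back up the metastore database before running DDL like this.
-- In MySQL 8.0, utf8 is an alias for utf8mb3, which covers the BMP.
ALTER TABLE `COLUMNS_V2`
  MODIFY `COMMENT` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  MODIFY `COLUMN_NAME` varchar(767) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;

-- Change the table default too, so columns added later without an explicit
-- CHARACTER SET clause no longer inherit latin1. Existing columns such as
-- TYPE_NAME are not converted by this statement.
ALTER TABLE `COLUMNS_V2` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

-- Inspect the resulting per-column character sets and collations.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'COLUMNS_V2';
{code}

Keeping the binary {{utf8_bin}} collation preserves the case-sensitive, byte-wise comparison that {{latin1_bin}} gives the {{PRIMARY KEY (CD_ID, COLUMN_NAME)}} today.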


