Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/04/02 13:14:00 UTC
[jira] [Commented] (HIVE-18083) Support UTF8 in MySQL Metastore Backend
[ https://issues.apache.org/jira/browse/HIVE-18083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073713#comment-17073713 ]
David Mollitor commented on HIVE-18083:
---------------------------------------
If Hive is going to be a drop-in replacement for MySQL/MariaDB (or something approaching that), then it needs to support UTF-8 in identifiers:
{quote}
Certain objects within MySQL, including database, table, index, column, alias, view, stored procedure, partition, tablespace, resource group and other object names are known as identifiers.
...
# Permitted characters in quoted identifiers include the full Unicode Basic Multilingual Plane (BMP), except U+0000
{quote}
* https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
> Support UTF8 in MySQL Metastore Backend
> ---------------------------------------
>
> Key: HIVE-18083
> URL: https://issues.apache.org/jira/browse/HIVE-18083
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Standalone Metastore
> Affects Versions: 3.0.0, 2.4.0
> Reporter: David Mollitor
> Priority: Major
>
> {code:sql|title=hive-schema-2.2.0.mysql.sql}
> CREATE TABLE IF NOT EXISTS `COLUMNS_V2` (
> `CD_ID` bigint(20) NOT NULL,
> `COMMENT` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
> `COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
> `TYPE_NAME` varchar(4000) DEFAULT NULL,
> `INTEGER_IDX` int(11) NOT NULL,
> PRIMARY KEY (`CD_ID`,`COLUMN_NAME`),
> KEY `COLUMNS_V2_N49` (`CD_ID`),
> CONSTRAINT `COLUMNS_V2_FK1` FOREIGN KEY (`CD_ID`) REFERENCES `CDS` (`CD_ID`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> {code}
> Hive explicitly defines {{CHARACTER SET latin1 COLLATE latin1_bin}} on these columns in its schema. This explicit definition should either be removed, so that the columns fall back on the database administrator's defaults, or changed to {{CHARACTER SET utf8 COLLATE utf8_bin}}.
> This will allow Hive to support UTF8 characters in MySQL backend databases for our international friends.
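> As a minimal sketch of the second option, assuming the {{COLUMNS_V2}} table above already exists in a deployed metastore, the two affected columns could be converted in place as shown here. This is illustrative only and is not an official Hive upgrade script. Two caveats: MySQL's {{utf8}} is the 3-byte {{utf8mb3}} encoding (the BMP only), and a {{varchar(767)}} utf8 column inside the primary key may fail with a "Specified key was too long" error on InnoDB row formats that limit index key prefixes to 767 bytes.
> {code:sql}
> -- Sketch only: convert the explicitly latin1 columns of COLUMNS_V2 to utf8.
> -- MODIFY restates the full column definition, so the length and
> -- nullability are repeated unchanged from the original schema.
> ALTER TABLE `COLUMNS_V2`
>   MODIFY `COMMENT` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
>   MODIFY `COLUMN_NAME` varchar(767) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
>
> -- Check the resulting character set and collation of each column.
> SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
>   FROM information_schema.COLUMNS
>  WHERE TABLE_SCHEMA = DATABASE()
>    AND TABLE_NAME = 'COLUMNS_V2';
> {code}
> Note that the table itself still declares {{DEFAULT CHARSET=latin1}}, so simply removing the column-level clauses would fall back to latin1 unless the table-level default is removed or changed as well.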