Posted to issues@hive.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2017/11/16 15:49:00 UTC

[jira] [Commented] (HIVE-18083) Support UTF8 in MySQL Metastore Backend

    [ https://issues.apache.org/jira/browse/HIVE-18083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255504#comment-16255504 ] 

Alan Gates commented on HIVE-18083:
-----------------------------------

Conceptually I like this, but I'm worried about compatibility and testing.  In particular, what do we do for the many users who already have a metastore whose tables are set to latin1?  We need to make sure that any new tables they create (e.g. the workload manager tables in Hive 3) match their existing tables.  I also suspect this will create a whole new class of bugs around table names, column names, etc.  We will need a plan to test it very thoroughly.
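
As a starting point for the compatibility question, the character sets an existing metastore actually uses can be read from MySQL's information_schema.  This is only a minimal sketch: the schema name 'metastore' is a placeholder assumption, not something taken from this issue, and should be replaced with the name of the local metastore database.

{code:sql|title=sketch: inspect character sets in an existing metastore}
-- Assumption: the metastore lives in a schema named 'metastore'.
-- Per-column character set and collation (non-text columns are skipped).
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'metastore'
  AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION;

-- Table-level default collation, which new columns inherit
-- when they declare no character set of their own.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'metastore'
ORDER BY TABLE_NAME;
{code}

If both queries report only latin1, new tables added by an upgrade script would presumably need to be created as latin1 as well in order to match.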

> Support UTF8 in MySQL Metastore Backend
> ---------------------------------------
>
>                 Key: HIVE-18083
>                 URL: https://issues.apache.org/jira/browse/HIVE-18083
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore, Standalone Metastore
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: BELUGA BEHR
>
> {code:sql|title=hive-schema-2.2.0.mysql.sql}
> CREATE TABLE IF NOT EXISTS `COLUMNS_V2` (
>   `CD_ID` bigint(20) NOT NULL,
>   `COMMENT` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
>   `COLUMN_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
>   `TYPE_NAME` varchar(4000) DEFAULT NULL,
>   `INTEGER_IDX` int(11) NOT NULL,
>   PRIMARY KEY (`CD_ID`,`COLUMN_NAME`),
>   KEY `COLUMNS_V2_N49` (`CD_ID`),
>   CONSTRAINT `COLUMNS_V2_FK1` FOREIGN KEY (`CD_ID`) REFERENCES `CDS` (`CD_ID`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> {code}
> Hive explicitly defines {{CHARACTER SET latin1 COLLATE latin1_bin}} in the schema design.  This explicit definition should either be removed, so that columns fall back on the database administrator's defaults, or changed to {{CHARACTER SET utf8 COLLATE utf8_bin}} so that utf8 is set explicitly.
> This will allow Hive to support UTF8 characters in MySQL backend databases for our international friends.
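> For illustration only, the second option could look roughly like the statement below for one existing column.  This is a sketch and not part of any Hive schema or upgrade script; it reuses the column definition from the schema above and changes only the character set and collation.
> {code:sql|title=sketch: switching one column to utf8}
> -- MODIFY with a new character set also converts the stored values.
> ALTER TABLE `COLUMNS_V2`
>   MODIFY `COMMENT` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
> {code}
> {{COLUMN_NAME}} is deliberately left out of the sketch: it is part of the primary key, and under utf8 (up to 3 bytes per character) a {{varchar(767)}} key column can exceed InnoDB's index key length limit, depending on the row format in use.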


