Posted to dev@sqoop.apache.org by "Eric Huang (JIRA)" <ji...@apache.org> on 2014/11/07 08:26:33 UTC

[jira] [Created] (SQOOP-1692) Confusion code occurred while importing data from MySQL into HBase

Eric Huang created SQOOP-1692:
---------------------------------

             Summary: Confusion code occurred while importing data from MySQL into HBase
                 Key: SQOOP-1692
                 URL: https://issues.apache.org/jira/browse/SQOOP-1692
             Project: Sqoop
          Issue Type: Bug
          Components: hbase-integration
    Affects Versions: 1.4.4
            Reporter: Eric Huang
             Fix For: 1.4.4


If the MySQL character set is latin1 (the default) and the tables contain Chinese characters, importing data from MySQL into HBase produces garbled characters (the "confusion code" in the summary). Some people have suggested that this happens because MySQL's "latin1" charset (which is similar to cp1252) is not the standard latin1 (ISO-8859-1): ISO-8859-1 treats the code points between 0x80 and 0x9f as “undefined”, whereas MySQL's latin1 assigns characters to them.

For details:
latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252, MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.
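
To make the difference concrete, here is a minimal Python sketch of the mapping described above. It is only an illustration of the charset mismatch, not code from Sqoop, HBase, or the MySQL JDBC driver, and it does not claim to show where in the import path the corruption actually happens. The helper names decode_mysql_latin1 and encode_mysql_latin1 are hypothetical. The sample character 国 is chosen because one of its UTF-8 bytes (0x9b) falls in the 0x80 to 0x9f range.

```python
# Illustration only: hypothetical helpers, not Sqoop or MySQL driver code.
# The five bytes that cp1252 leaves unassigned. Per the description above,
# MySQL maps each of them to the Unicode code point with the same value.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_mysql_latin1(data: bytes) -> str:
    """Decode bytes the way the text above describes MySQL's latin1."""
    chars = []
    for b in data:
        if b in CP1252_UNDEFINED:
            chars.append(chr(b))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([b]).decode("cp1252"))  # e.g. 0x80 -> Euro sign
    return "".join(chars)


def encode_mysql_latin1(text: str) -> bytes:
    """Inverse of decode_mysql_latin1 (raises on characters outside the set)."""
    out = bytearray()
    for ch in text:
        if ord(ch) in CP1252_UNDEFINED:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)


if __name__ == "__main__":
    raw = "国".encode("utf-8")  # UTF-8 bytes stored in a latin1 column
    as_mysql = decode_mysql_latin1(raw)
    as_iso = raw.decode("latin-1")  # strict ISO-8859-1 interpretation

    print([hex(ord(c)) for c in as_mysql])
    print([hex(ord(c)) for c in as_iso])

    # Undoing the decode with the matching charset recovers the original text.
    print(encode_mysql_latin1(as_mysql).decode("utf-8"))

    # Undoing it with strict ISO-8859-1 fails, because the character MySQL
    # produced for byte 0x9b does not exist in ISO-8859-1.
    try:
        as_mysql.encode("latin-1")
    except UnicodeEncodeError as err:
        print("ISO-8859-1 round trip failed:", err)
```

If a component in the import path reads the column with MySQL's latin1 mapping and a later component tries to recover the original bytes by assuming ISO-8859-1, any byte in the 0x80 to 0x9f range would be lost in this way, which is consistent with the explanation suggested above.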




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)