Posted to dev@sqoop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/10/08 10:45:26 UTC

[jira] [Commented] (SQOOP-2607) Direct import from Netezza and encoding

    [ https://issues.apache.org/jira/browse/SQOOP-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948296#comment-14948296 ] 

ASF GitHub Bot commented on SQOOP-2607:
---------------------------------------

GitHub user bonnetb opened a pull request:

    https://github.com/apache/sqoop/pull/9

    [SQOOP-2607] Add a table encoding parameter for Netezza direct import

    Direct import creates a Netezza external table that uses the 'internal'
    encoding for text columns, then loads that external table into HDFS by
    reading it as a UTF-8 encoded stream.
    However, if the table contains VARCHAR columns that are not UTF-8 (e.g.
    ISO-8859-x), the external table shares the encoding of the source table,
    and reading it as a UTF-8 encoded stream corrupts non-ASCII characters.
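
The mismatch described above can be reproduced outside Sqoop. The following is a minimal sketch, not code from the pull request: the class name EncodingMismatchDemo and the sample string are assumptions made for illustration, and it assumes the JVM provides the ISO-8859-15 charset. It encodes a string as ISO-8859-15, as the external table stream would be, and then decodes the same bytes as UTF-8, as the import mapper does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Sqoop.
public class EncodingMismatchDemo {

    // Decode raw bytes using the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");

        // "café": in ISO-8859-15 the 'é' is the single byte 0xE9.
        String original = "caf\u00e9";
        byte[] raw = original.getBytes(latin9);

        // Decoding with the charset that produced the bytes round-trips.
        String correct = decode(raw, latin9);

        // Decoding as UTF-8 fails on 0xE9, which is not a valid
        // stand-alone UTF-8 sequence; Java substitutes U+FFFD for it.
        String corrupted = decode(raw, StandardCharsets.UTF_8);

        System.out.println("round trip ok: " + original.equals(correct));
        System.out.println("utf-8 read ok: " + original.equals(corrupted));
    }
}
```

ASCII characters are unaffected because they are encoded identically in both charsets, which matches the observation that only non-ASCII characters are corrupted.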

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bonnetb/sqoop trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/sqoop/pull/9.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9
    
----
commit 8b8dc61106eaebcc5c0abb1e0ae9394069f77562
Author: Benjamin BONNET <be...@m4x.org>
Date:   2015-10-05T20:50:58Z

    Add a table encoding parameter for Netezza direct import
    
    Direct import creates a Netezza external table that uses the 'internal'
    encoding for text columns, then loads that external table into HDFS by
    reading it as a UTF-8 encoded stream.
    However, if the table contains VARCHAR columns that are not UTF-8 (e.g.
    ISO-8859-x), the external table shares the encoding of the source table,
    and reading it as a UTF-8 encoded stream corrupts non-ASCII characters.

----


> Direct import from Netezza and encoding
> ---------------------------------------
>
>                 Key: SQOOP-2607
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2607
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors
>    Affects Versions: 1.4.6
>            Reporter: Benjamin BONNET
>
> Hi,
> I encountered an encoding issue while importing a Netezza table containing ISO-8859-15 encoded VARCHAR columns. In direct mode, non-ASCII characters are corrupted. This does not occur in non-direct mode.
> Direct mode uses a Netezza "external table", i.e. it flushes the table into a stream using the "internal" encoding (in my case, ISO-8859-15).
> But the Sqoop import mapper reads this stream as a UTF-8 one.
> The problem does not occur in non-direct mode, since it uses the Netezza JDBC driver to map fields directly to Java types (no stream encoding involved).
> To fix the issue in my environment, I modified the Sqoop Netezza connector and added a parameter to specify the Netezza VARCHAR encoding. The default value will be UTF-8, of course. I will open a pull request on GitHub to propose that enhancement.
> Regards
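
The fix proposed in the issue, an encoding parameter that defaults to UTF-8, could look roughly like the sketch below. This is an illustration only and not the connector's actual code: the class ConfigurableEncodingReader, its methods, and the way the encoding value is passed in are all hypothetical, and it assumes the JVM provides the ISO-8859-15 charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Sqoop.
public class ConfigurableEncodingReader {

    // Default used when no encoding is configured, as proposed in the issue.
    static final String DEFAULT_ENCODING = "UTF-8";

    // Wrap the external table stream in a reader for the configured charset.
    static BufferedReader open(InputStream in, String encoding) {
        String name = (encoding == null || encoding.isEmpty())
                ? DEFAULT_ENCODING
                : encoding;
        return new BufferedReader(new InputStreamReader(in, Charset.forName(name)));
    }

    // Convenience method for the demo: read the first line of a byte buffer.
    static String readFirstLine(byte[] data, String encoding) throws IOException {
        try (BufferedReader reader = open(new ByteArrayInputStream(data), encoding)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated external table output: one record encoded as ISO-8859-15.
        byte[] record = "caf\u00e9,1\n".getBytes(Charset.forName("ISO-8859-15"));

        // With the matching encoding configured, the record is read intact.
        String line = readFirstLine(record, "ISO-8859-15");
        System.out.println("read intact: " + "caf\u00e9,1".equals(line));
    }
}
```

Falling back to UTF-8 when the parameter is unset keeps the existing behaviour for tables that already store UTF-8 text.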


