Posted to dev@sqoop.apache.org by "Tom Harrison (JIRA)" <ji...@apache.org> on 2017/08/11 20:47:00 UTC

[jira] [Commented] (SQOOP-2056) Support for Mysql Sqoop Metastore

    [ https://issues.apache.org/jira/browse/SQOOP-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124015#comment-16124015 ] 

Tom Harrison commented on SQOOP-2056:
-------------------------------------

Just as a note for others tracking down this issue...

As of 2017, Cloudera still recommends Sqoop 1, and there are several easily found documents on using an external database other than HSQLDB for the metastore. All of them miss the key observations made by the OP, which I'll expand on here (mostly for others who may be tracking down a problem).

Sqoop 1 will fail when there is a concurrency issue during updates to the metastore.  It correctly returns a non-zero exit code and logs the error.  However, a failure during an incremental import *can lead to Hadoop data corruption*: if the "next value" data is not updated, the _subsequent run_ of Sqoop will re-import the same records from the database, duplicating data.
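
To make the failure mode concrete, here is a toy simulation in Python. It is only a sketch of the logic described above, not Sqoop code: the function name, the integer check column, and the in-memory "target" list are all made up for illustration.

```python
def incremental_import(source_rows, last_value, metastore_update_ok=True):
    """Toy model of one incremental append run (hypothetical, not Sqoop code).

    Returns (imported_rows, new_last_value). When the metastore update
    fails, the rows are still written but last_value is left unchanged.
    """
    imported = [r for r in source_rows if r > last_value]
    if imported and metastore_update_ok:
        last_value = max(imported)
    return imported, last_value


source = [1, 2, 3, 4, 5]   # check-column values in the source table
target = []                # stands in for the data appended in Hadoop
last_value = 0             # stands in for the "next value" kept in the metastore

# Run 1: the rows are imported, but the metastore update fails.
rows, last_value = incremental_import(source, last_value, metastore_update_ok=False)
target.extend(rows)

# Run 2: the stale last_value makes the same rows qualify again.
rows, last_value = incremental_import(source, last_value)
target.extend(rows)

duplicates = len(target) - len(set(target))
print(duplicates)
```

After the second run every source row is in the target twice, even though each run on its own behaved as designed.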

Our case was with Sqoop 1.4.6 on EMR Hadoop, doing an append-mode import, with the Sqoop metastore on PostgreSQL (the same issue as reported here with MySQL).  Concurrent updates from Sqoop jobs running in parallel sporadically resulted in an IOException from the metastore DB.
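
One possible stopgap, which is not something we have described doing above and is offered only as a sketch, is to stop saved jobs from touching the metastore at the same time by running them under an exclusive file lock. This assumes all jobs are launched from the same host (POSIX only), and it gives up the parallelism entirely. The lock path and the helper names are hypothetical; `sqoop job --exec` is the usual way to run a saved job.

```python
import fcntl
import subprocess


def run_serialized(cmd, lock_path="/tmp/sqoop-metastore.lock"):
    """Run cmd while holding an exclusive file lock; return its exit code.

    lock_path is a hypothetical location; any path shared by all
    launchers on the same host would do.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def exec_saved_job(job_name):
    """Run a saved Sqoop job under the lock (job_name is hypothetical)."""
    return run_serialized(["sqoop", "job", "--exec", job_name])
```

A non-zero return value should still be treated as "the stored last value may be stale", so anything downstream should stop rather than blindly re-run the job.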

We'll try to put together a small patch if we get a chance.

> Support for Mysql Sqoop Metastore
> ---------------------------------
>
>                 Key: SQOOP-2056
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2056
>             Project: Sqoop
>          Issue Type: New Feature
>    Affects Versions: 1.4.5
>            Reporter: Karthic Hariharan
>         Attachments: sqoop-patch.txt
>
>
> We would love to see sqoop metastore supported for Mysql.
> At the moment sqoop metastore can be set up only with HSQLdb. Even though you can fake a mysql database to look like a HSQLdb (see http://bit.ly/1tz2J5u), that does not translate to compatibility with all of sqoop's features.
> Some of the incompatibilities are:
> * The metastore client assumes all connections to the metastore are in serializable transaction isolation, so when a sqoop job is executed it never really finishes because it's trying to run a transaction within a transaction.
> * Incremental loads using the last modified timestamp don't work because the sqoop job tries to get the current time from the database, which is a different sql command for Hsqldb and mysql.


