Posted to dev@sqoop.apache.org by "Juan Carlos Araya (Jira)" <ji...@apache.org> on 2020/05/22 13:45:00 UTC
[jira] [Commented] (SQOOP-3471) While doing sqoop-export mapper
progress goes back causing duplicated data
[ https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114049#comment-17114049 ]
Juan Carlos Araya commented on SQOOP-3471:
------------------------------------------
I am seeing the same issue: if a mapper fails, it starts again from 0%, which causes duplicates. I tried the export twice. The first time I expected 271 million records and ended up with 384M. The second time I again expected 271M records, and the count has already reached 284M.
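To illustrate how a restarted mapper could leave extra rows behind, here is a rough sketch in Python. It is only a simplified model, not Sqoop's actual code: it assumes each mapper commits every records.per.statement * statements.per.transaction rows (the values from the command quoted below), that committed rows survive a task failure, and that the retried attempt re-inserts the mapper's whole input. The function names and the even split of rows across 8 mappers are hypothetical.

```python
# Simplified model of a retried export mapper (assumptions, not Sqoop internals).
ROWS_PER_STATEMENT = 50_000        # -Dsqoop.export.records.per.statement
STATEMENTS_PER_TRANSACTION = 100   # -Dsqoop.export.statements.per.transaction
ROWS_PER_COMMIT = ROWS_PER_STATEMENT * STATEMENTS_PER_TRANSACTION  # 5,000,000


def rows_after_retry(mapper_rows, failed_at_row, rows_per_commit=ROWS_PER_COMMIT):
    """Rows this mapper leaves in the table after one failure and one full retry."""
    # Only whole transactions committed before the failure stay in the table.
    committed_before_failure = (failed_at_row // rows_per_commit) * rows_per_commit
    # The retry starts from 0% and inserts the mapper's full input again.
    return committed_before_failure + mapper_rows


def duplicates(mapper_rows, failed_at_row, rows_per_commit=ROWS_PER_COMMIT):
    """Extra rows caused by the failed attempt."""
    return rows_after_retry(mapper_rows, failed_at_row, rows_per_commit) - mapper_rows


if __name__ == "__main__":
    # Hypothetical even split of the 237,371,726 source rows over 8 mappers.
    mapper_rows = 237_371_726 // 8
    # A mapper that dies after 12,000,000 rows has already committed 2 transactions.
    print(duplicates(mapper_rows, failed_at_row=12_000_000))  # prints 10000000
```

Under these assumptions, each failed attempt can add up to almost a full mapper's worth of duplicate rows, depending on how far it got before failing.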
> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
> Key: SQOOP-3471
> URL: https://issues.apache.org/jira/browse/SQOOP-3471
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Ruben Agudo
> Priority: Major
> Attachments: image-2020-04-21-10-36-15-108.png
>
>
> We are running the sqoop-export tool in Qubole, to export some data from S3 back to an SQL Server Database.
> Our issue is that sometimes one of the mappers in the map phase seems to fail or restart. Basically, we see the progress going backwards, as in the following image:
> !image-2020-04-21-10-36-15-108.png!
> This is causing duplicates in our destination table. I'm a bit lost because in the documentation it says that *"If an export map task fails due to these or other reasons, it will cause the export job to fail."* and this is not the behaviour we are seeing.
> Unfortunately, we can't reproduce it consistently.
> The command that we are running is:
> sqoop export
> -Dsqoop.export.records.per.statement=50000
> -Dsqoop.export.statements.per.transaction=100
> -Dsqoop.throwOnError=1
> --connection-manager org.apache.sqoop.manager.SQLServerManager
> --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
> --connect connectionString
> --table config.table
> --export-dir config.source
> --input-fields-terminated-by ,
> --num-mappers 8
> --columns theColumnsToCopy
> --batch
> --schema theSchema
> I removed the details that I can't share for privacy reasons.
> And the table we want to export contains 237,371,726 records.
> What could be the cause of the mapper going back in progress? And, if that happens, is it possible to make the sqoop export fail?
> Also, if this isn't the correct channel for this, please let me know.
> Thanks!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)