Posted to dev@sqoop.apache.org by "Ruben Agudo (Jira)" <ji...@apache.org> on 2020/04/21 09:25:00 UTC

[jira] [Updated] (SQOOP-3471) While doing sqoop-export mapper progress goes back causing duplicated data

     [ https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Agudo updated SQOOP-3471:
-------------------------------
    Description: 
We are running the sqoop-export tool in Qubole, to export some data from S3 back to an SQL Server Database.

Our issue is that sometimes one of the mappers in the map phase seems to fail or restart: we see its progress go backwards, as in the following image:

!image-2020-04-21-10-36-15-108.png!

This is causing duplicates in our destination table. I'm a bit lost because in the documentation it says that *"If an export map task fails due to these or other reasons, it will cause the export job to fail."* and this is not the behaviour we are seeing.

Unfortunately, we can't reproduce it consistently.

The command that we are running is:

sqoop export \
 -Dsqoop.export.records.per.statement=50000 \
 -Dsqoop.export.statements.per.transaction=100 \
 -Dsqoop.throwOnError=1 \
 --connection-manager org.apache.sqoop.manager.SQLServerManager \
 --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
 --connect connectionString \
 --table config.table \
 --export-dir config.source \
 --input-fields-terminated-by , \
 --num-mappers 8 \
 --columns theColumnsToCopy \
 --batch \
 --schema theSchema

I have replaced the values that I can't share for privacy reasons with placeholders.

What could cause a mapper's progress to go backwards? And, if that happens, is it possible to make the sqoop export fail?
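
To make the suspected mechanism concrete, here is a minimal simulation of how a restarted map task could leave duplicates behind. It is a toy model, not Sqoop code: the function name `run_attempt` and the in-memory `table` are made up for illustration, and it assumes that a mapper commits in batches and that a retried attempt re-reads its whole input split from the beginning. The numbers are scaled down; with the settings in the command above, one transaction would nominally cover up to 50000 * 100 rows.

```python
# Toy model, not Sqoop code: one export mapper writing to an in-memory "table".
# Assumptions: rows are committed in batches, and a retried attempt re-reads
# its whole input split from the beginning.

def run_attempt(split, table, rows_per_commit, fail_after_rows=None):
    """Insert rows from `split` into `table`, committing every `rows_per_commit` rows.

    If `fail_after_rows` is set, the attempt dies after reading that many rows:
    uncommitted rows are discarded (rollback), committed rows stay in the table.
    Returns True only when the attempt finishes the whole split.
    """
    pending = []
    for count, row in enumerate(split, start=1):
        pending.append(row)
        if len(pending) == rows_per_commit:
            table.extend(pending)  # commit the current transaction
            pending = []
        if fail_after_rows is not None and count >= fail_after_rows:
            return False  # crash: whatever is still in `pending` is lost
    table.extend(pending)  # final commit for the remaining rows
    return True


split = list(range(10))  # the rows assigned to this mapper
table = []               # the destination table

# First attempt fails after reading 7 rows; 6 of them were already committed.
ok = run_attempt(split, table, rows_per_commit=3, fail_after_rows=7)
if not ok:
    # The retry starts again from the first row (progress appears to go back).
    run_attempt(split, table, rows_per_commit=3)

duplicates = len(table) - len(set(table))
print(len(table), duplicates)
```

Running this prints `16 6`: the 10 rows of the split plus the 6 rows that the failed attempt had already committed. If this is what happens in our job, the export would still finish successfully as long as the retried attempt succeeds, which would match what we are seeing.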

Also, if this isn't the correct channel for this, please let me know.

Thanks!

> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
>                 Key: SQOOP-3471
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3471
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ruben Agudo
>            Priority: Major
>         Attachments: image-2020-04-21-10-36-15-108.png


