Posted to dev@dolphinscheduler.apache.org by "huangli1@changan.com.cn" <hu...@changan.com.cn> on 2020/06/03 02:16:12 UTC
[enhancement] Sqoop component optimization
Hi all,
ds-dev has added the Sqoop component, and the component needs some enhancements.
Some points to optimize:
• Sqoop data import and export do not support Hadoop-level custom parameters, that is, -D parameters, for example:
– MR task name
– MR map and reduce memory, number of tasks, etc.
• The split-by field is not supported. If -m is greater than 1 and the primary key of the relational database table is not auto-incrementing, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a split-by field, so split-by needs to be supported.
• Custom parameters cannot be set. For example, when importing from MySQL, some tables can use --direct to speed up the import.
Ideas:
• The Sqoop task name is currently generic; it should become a required parameter on the Sqoop page.
• Add an input box for Hadoop custom parameters, for setting MR parameters such as memory.
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and other parameters used in specific situations.
• Add an option button to choose between a custom script and the template script, following the design of the DataX node.
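As a rough sketch of the ideas above (not the existing implementation), the generated import command could be assembled roughly as follows. The class name, method name, JDBC URL, table, and paths are all hypothetical and only for illustration. The sketch assumes that the Hadoop -D options go directly after the tool name, ahead of the Sqoop-specific arguments, which is where Sqoop expects generic Hadoop arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is NOT DolphinScheduler's actual Sqoop task code.
class SqoopCommandSketch {

    // Builds a "sqoop import" command line from the inputs proposed for the Sqoop page.
    static String buildImportCommand(String jobName,
                                     Map<String, String> hadoopParams,
                                     String connect,
                                     String table,
                                     String targetDir,
                                     String splitBy,
                                     int mappers,
                                     List<String> sqoopParams) {
        List<String> args = new ArrayList<>();
        args.add("sqoop");
        args.add("import");

        // Hadoop-level (-D) parameters are placed before the tool-specific arguments.
        // The required task name is passed as the MR job name.
        args.add("-D");
        args.add("mapreduce.job.name=" + jobName);
        for (Map.Entry<String, String> entry : hadoopParams.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }

        args.add("--connect");
        args.add(connect);
        args.add("--table");
        args.add(table);
        args.add("--target-dir");
        args.add(targetDir);

        // Only emit --split-by when the user filled in the field.
        if (splitBy != null && !splitBy.isEmpty()) {
            args.add("--split-by");
            args.add(splitBy);
        }
        args.add("-m");
        args.add(String.valueOf(mappers));

        // Task-level custom parameters (e.g. --direct, --fetch-size) are appended as given.
        // Note: values are not shell-quoted in this sketch.
        args.addAll(sqoopParams);

        return String.join(" ", args);
    }

    public static void main(String[] unused) {
        Map<String, String> hadoopParams = new LinkedHashMap<>();
        hadoopParams.put("mapreduce.map.memory.mb", "4096");

        String command = buildImportCommand(
                "orders_import",                      // hypothetical task name
                hadoopParams,
                "jdbc:mysql://db.example.com/sales",  // hypothetical JDBC URL
                "orders",
                "/data/orders",
                "order_id",
                4,
                Arrays.asList("--direct", "--fetch-size", "1000"));
        System.out.println(command);
    }
}
```

Keeping the Hadoop parameters and the Sqoop parameters as two separate inputs matches the two input boxes proposed above, and lets the page validate them independently.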
If these ideas are feasible, I will implement them.
Best
Eights-Li 黄立
huangli1@changan.com.cn