Posted to dev@dolphinscheduler.apache.org by "huangli1@changan.com.cn" <hu...@changan.com.cn> on 2020/06/03 02:16:12 UTC

[Enhancement] Sqoop component optimization

Hi all, 
    The Sqoop component has been added to ds-dev, but it still needs some enhancements.
    Points to optimize:
    • Sqoop data import and export do not support Hadoop-level custom parameters (that is, -D parameters), for example:
        – the MR job name
        – MR map and reduce memory and task counts, etc.
    • The split-by field is not supported. When -m is greater than 1 and the primary key of the relational database table is not
        auto-incrementing, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a split-by field,
        so split-by needs to be supported.
    • Custom Sqoop parameters are not supported. For example, when importing from MySQL, --direct can be added for some tables to speed up the import.
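To illustrate why the choice of split column matters, here is a simplified Python sketch of how a numeric split-by column can be divided into one range per mapper (-m). Sqoop first queries the minimum and maximum of the split column; this sketch assumes those values are already known. The function name split_where_clauses is hypothetical, and this is not Sqoop's actual splitter, which handles more column types and edge cases.

```python
def split_where_clauses(column: str, min_val: int, max_val: int, num_mappers: int) -> list:
    """Hypothetical helper: divide [min_val, max_val] into contiguous, disjoint ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")

    span = max_val - min_val + 1
    # Ceiling division so the ranges together cover every value in [min_val, max_val].
    step = -(-span // num_mappers)

    clauses = []
    lo = min_val
    while lo <= max_val:
        # Half-open range [lo, hi): a row matches exactly one clause.
        hi = min(lo + step, max_val + 1)
        clauses.append(f"{column} >= {lo} AND {column} < {hi}")
        lo = hi
    return clauses
```

Each mapper runs the same query with one of these WHERE clauses. Because the ranges are disjoint, each row is read once only if the split column is a stable, suitable key, which is why being able to choose --split-by explicitly is useful.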

    Ideas:
    • The Sqoop task name is currently generic; make it a required parameter on the Sqoop task page.
    • Add an input box for custom Hadoop parameters, used to set MR memory and similar options.
    • Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and other parameters needed only in specific situations.
    • Add an option to choose between a custom script and a template script, following the design of the DataX node.
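A minimal sketch of how the proposed task parameters could be assembled into a Sqoop import command line, written in Python for brevity. The function build_sqoop_import_args and its parameter names are hypothetical, not DolphinScheduler's API, and the connection string is a placeholder. The sketch relies on a documented Sqoop rule: generic Hadoop arguments such as -D must come before tool-specific arguments like --connect. mapreduce.job.name and mapreduce.map.memory.mb are standard Hadoop properties.

```python
from typing import Dict, List, Optional


def build_sqoop_import_args(
    job_name: str,
    hadoop_params: Dict[str, str],
    connect: str,
    table: str,
    target_dir: str,
    split_by: Optional[str] = None,
    num_mappers: int = 1,
    custom_args: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: build a `sqoop import` argv from task parameters."""
    if not job_name:
        # The task name is required instead of falling back to a generic default.
        raise ValueError("job_name is required")

    args = ["sqoop", "import"]
    # Generic Hadoop -D arguments must precede all tool-specific arguments.
    args += ["-D", f"mapreduce.job.name={job_name}"]
    for key, value in hadoop_params.items():
        args += ["-D", f"{key}={value}"]

    # Standard tool-specific import arguments.
    args += ["--connect", connect, "--table", table,
             "--target-dir", target_dir, "-m", str(num_mappers)]
    if split_by:
        # Explicit split column for parallel (-m > 1) imports.
        args += ["--split-by", split_by]

    # Free-form task-level arguments such as --direct or --fetch-size.
    if custom_args:
        args += custom_args
    return args


if __name__ == "__main__":
    import shlex

    argv = build_sqoop_import_args(
        "orders_import",
        {"mapreduce.map.memory.mb": "2048"},
        "jdbc:mysql://db.example:3306/shop",  # hypothetical connection string
        "orders",
        "/data/orders",
        split_by="order_id",
        num_mappers=4,
        custom_args=["--direct", "--fetch-size", "1000"],
    )
    print(shlex.join(argv))
```

Returning a list rather than a single string lets the caller pass it directly to a process launcher without shell quoting; shlex.join is only used here to display it.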
    
    If these ideas are feasible, I will implement them.


Best 


Eights-Li  黄立
huangli1@changan.com.cn