Posted to dev@dolphinscheduler.apache.org by liu hu <Na...@hotmail.com> on 2021/12/28 07:40:59 UTC

[Proposal] Big data component configuration and environment isolation

hi guys,
     I suggest isolating the configuration and environment of Hive, Kerberos, HDFS/S3, Spark, and Flink to improve the user experience and scalability of scheduling tasks, and to better support tasks on YARN/K8s environments. At present, DolphinScheduler is strongly coupled to Hive and Hadoop: the Hive data source, Kerberos authentication, HDFS/S3 storage, and big data tasks such as Spark and Flink all need their configuration written into the configuration file in advance, and the dependent environment also has to be loaded in advance to prevent dependency conflicts.
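     For context, these settings live today in the global common.properties, which each service loads once at startup. A rough excerpt (key names from the current common.properties; values are illustrative):

         # common.properties -- global, read when the service starts
         resource.storage.type=HDFS
         hadoop.security.authentication.startup.state=true
         java.security.krb5.conf.path=/opt/krb5.conf
         login.user.keytab.username=hdfs-mycluster@ESZ.COM
         login.user.keytab.path=/opt/hdfs.headless.keytab

     Because these settings are process-wide, a single worker cannot serve two Kerberos realms or two HDFS clusters at the same time; that is the coupling this proposal wants to remove.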
    As a first step, I plan to externalize the Hive and Kerberos configuration and isolate their environment. See the following issue/PR for details:
    https://github.com/apache/dolphinscheduler/issues/7623
    https://github.com/apache/dolphinscheduler/pull/7624
     After that, the HDFS/S3 storage layer will be reworked, and finally the most important part: the K8s cluster configuration and the Spark and Flink configurations.
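     To make the first step concrete, here is a rough sketch of the direction (hypothetical class and field names, not the actual code in the PR): the Kerberos settings move out of common.properties into the data source's own connection parameters, so each Hive data source can log in with its own credentials:

         // Hypothetical sketch of per-datasource Kerberos login;
         // see PR #7624 for the real implementation.
         import org.apache.hadoop.conf.Configuration;
         import org.apache.hadoop.security.UserGroupInformation;

         public class HiveKerberosParams {
             String krb5ConfPath;  // per-datasource krb5.conf
             String principal;     // e.g. hive/host@REALM
             String keytabPath;    // per-datasource keytab

             // Authenticate with this data source's credentials instead of
             // the single keytab loaded at worker startup.
             void login() throws java.io.IOException {
                 System.setProperty("java.security.krb5.conf", krb5ConfPath);
                 Configuration conf = new Configuration();
                 conf.set("hadoop.security.authentication", "kerberos");
                 UserGroupInformation.setConfiguration(conf);
                 UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
             }
         }

     The same pattern would later apply to the HDFS/S3 storage layer and to the Spark/Flink/K8s cluster configurations.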

Narcasserun

Re: [Proposal] Big data component configuration and environment isolation

Posted by Lidong Dai <li...@apache.org>.
Great job, great idea.

I think we can discuss this topic on the mailing list. I totally agree with
what you said.

If anybody has more suggestions, please leave a word.


Best Regards



---------------
Apache DolphinScheduler PMC Chair
LidongDai
lidongdai@apache.org
LinkedIn: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------

