Posted to dev@dolphinscheduler.apache.org by Lidong Dai <li...@apache.org> on 2022/01/05 15:37:00 UTC

Re: [Proposal] Big data component configuration and environment isolation

great job, great idea

I think we can discuss this topic on the mailing list; I totally agree
with what you said.

If anybody has more suggestions, please leave a word


Best Regards



---------------
Apache DolphinScheduler PMC Chair
LidongDai
lidongdai@apache.org
Linkedin: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------


On Tue, Dec 28, 2021 at 3:41 PM liu hu <Na...@hotmail.com> wrote:

> hi guys,
>      I suggest isolating the configuration and environment of Hive,
> Kerberos, HDFS / S3, Spark, and Flink to improve the user experience and
> scalability of scheduled tasks, and to better support tasks in YARN / K8s
> environments. At present, DolphinScheduler is strongly coupled with Hive
> and Hadoop: the Hive data source, Kerberos authentication, HDFS / S3
> storage, big data tasks such as Spark and Flink, and so on. Their
> configurations need to be written into a configuration file in advance,
> and the dependent environments also need to be loaded in advance to
> prevent dependency conflicts.
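>      As a rough illustration, this is the kind of static configuration
> that coupling forces today (a minimal sketch; the key names follow a
> typical common.properties, and the exact keys and values here are
> assumptions, not quoted from any release):
>
>     # conf/common.properties (illustrative; keys may differ per version)
>     resource.storage.type=HDFS
>     fs.defaultFS=hdfs://mycluster:8020
>     hadoop.security.authentication.startup.state=true
>     java.security.krb5.conf.path=/opt/krb5.conf
>     login.user.keytab.username=hdfs-mycluster@ESZ.COM
>     login.user.keytab.path=/opt/hdfs.headless.keytab
>
> Every worker has to carry this file and the matching client jars before
> any Hive or Kerberos task can run.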
>     As a first step, I plan to externalize the Hive and Kerberos
> configuration and isolate the environment. See the issue / PR for details:
>     https://github.com/apache/dolphinscheduler/issues/7623
>     https://github.com/apache/dolphinscheduler/pull/7624
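>      To sketch what environment isolation could look like (hypothetical
> Java, not code from the PR): each component's client jars could be loaded
> in a dedicated classloader so that, for example, Hive's and Spark's
> transitive dependencies never clash on one worker:
>
>     // Hypothetical sketch: one URLClassLoader per component environment.
>     import java.io.File;
>     import java.net.URL;
>     import java.net.URLClassLoader;
>     import java.util.ArrayList;
>     import java.util.List;
>
>     public final class ComponentEnv {
>         // Load all jars under envDir in an isolated classloader with a
>         // null parent, so worker classes do not leak into the task.
>         public static ClassLoader load(File envDir) throws Exception {
>             List<URL> urls = new ArrayList<>();
>             File[] jars = envDir.listFiles((d, n) -> n.endsWith(".jar"));
>             if (jars != null) {
>                 for (File jar : jars) {
>                     urls.add(jar.toURI().toURL());
>                 }
>             }
>             return new URLClassLoader(urls.toArray(new URL[0]), null);
>         }
>     }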
>      Then the storage part (HDFS / S3) will be reworked, and finally the
> most important part: the K8s cluster configuration and the Flink and
> Spark configuration.
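>      For the storage step, one possible shape (all names here are made up
> for illustration, not an existing DolphinScheduler API) is a small storage
> interface with HDFS and S3 implementations selected by configuration:
>
>     // Hypothetical sketch of a pluggable resource storage SPI.
>     import java.io.IOException;
>     import java.io.InputStream;
>
>     public interface ResourceStorage {
>         void upload(String localPath, String remotePath) throws IOException;
>         InputStream download(String remotePath) throws IOException;
>         boolean exists(String remotePath) throws IOException;
>     }
>     // e.g. HdfsStorage / S3Storage implementations registered under
>     // resource.storage.type, so the scheduler core never touches
>     // Hadoop or AWS classes directly.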
>
> Narcasserun
>