Posted to dev@dolphinscheduler.apache.org by Lidong Dai <li...@apache.org> on 2022/01/05 15:37:00 UTC
Re: [Proposal] Big data component configuration and environment isolation
Great job, great idea.
I think we can discuss this topic on the mailing list; I totally agree with
what you said.
If anybody has more suggestions, please leave a word.
Best Regards
---------------
Apache DolphinScheduler PMC Chair
LidongDai
lidongdai@apache.org
Linkedin: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------
On Tue, Dec 28, 2021 at 3:41 PM liu hu <Na...@hotmail.com> wrote:
> hi guys,
> I suggest isolating the configuration and environment of Hive,
> Kerberos, HDFS/S3, Spark, and Flink to improve the user experience and
> scalability of scheduled tasks, and to better support tasks in YARN/K8s
> environments. At present, DolphinScheduler is strongly coupled with Hive
> and Hadoop, e.g. the Hive data source, Kerberos authentication, HDFS/S3
> storage, and the big data tasks Spark, Flink, etc. Their configurations
> need to be written into the configuration file in advance, and the
> dependent environment also needs to be loaded in advance to prevent
> dependency conflicts.
> As a first step, I plan to externalize the Hive and Kerberos
> configuration and isolate the environment. See the issue/PR for details:
> https://github.com/apache/dolphinscheduler/issues/7623
> https://github.com/apache/dolphinscheduler/pull/7624
> Then the HDFS/S3 storage part will be reworked, and finally the most
> important part: the K8s cluster configuration and the Flink and Spark
> configuration.
>
> Narcasserun
>
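For readers unfamiliar with the coupling liu hu describes, the sketch below illustrates the kind of static, deploy-time settings that currently live in DolphinScheduler's `common.properties` and that the proposal would externalize per task or per data source. This is an illustrative fragment, not an exact copy: the key names follow common.properties conventions but may differ between DolphinScheduler versions.

```properties
# Kerberos is switched on globally at startup, for every worker and task
hadoop.security.authentication.startup.state=true
java.security.krb5.conf.path=/opt/krb5.conf
login.user.keytab.username=hdfs-user@EXAMPLE.COM
login.user.keytab.path=/opt/hdfs.headless.keytab

# Resource storage is pinned to one HDFS (or S3) cluster at deploy time
resource.storage.type=HDFS
fs.defaultFS=hdfs://mycluster:8020
```

Because these values are read once at startup and shared by all tasks, one scheduler instance cannot easily talk to two differently secured Hadoop clusters; moving such settings into per-task or per-datasource configuration (as issue #7623 / PR #7624 begin to do for Hive and Kerberos) removes that restriction.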