You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by eric xiao <xi...@gmail.com> on 2023/04/21 19:20:12 UTC

[DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

Hi there devs,

I would like to start a discussion thread for FLINK-31873[1].

We are in the processing of enabling Flink reactive mode as the default
scheduling mode. While reading configuration docs [2] (I believe it was
also mentioned during one of the training sessions during Flink Forward
2022), one can/should replace all setParallelism calls with
setMaxParallelism when migrating to reactive mode.

This currently isn't possible on a sink in a Flink pipeline as we do not
expose a setMaxParallelism on the DataStreamSink class [3]. The underlying
Transformation class does have both a setMaxParallelism and setParallelism
function defined [4], but only setParallelism is offered in the
DataStreamSink class.

I believe adding setMaxParallelism would be beneficial for not just flink
reactive mode, both modes of running of a flink pipeline (non reactive
mode, flink auto scaling).

Best,

Eric Xiao

[1] https://issues.apache.org/jira/browse/FLINK-31873
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
[3]
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
[4]
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285

Re: [DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

Posted by eric xiao <xi...@gmail.com>.
Wow, thank you all for the responses! I have opened a PR [1] to address
this ticket, would appreciate any feedback - still relatively new to the
codebase, so please let me know if I have overlooked anything obvious 😃.

[1] https://github.com/apache/flink/pull/22438

On Tue, Apr 25, 2023 at 1:34 PM Maximilian Michels <mx...@apache.org> wrote:

> +1
>
> On Tue, Apr 25, 2023 at 5:24 PM David Morávek <dm...@apache.org> wrote:
> >
> > Hi Eric,
> >
> > this sounds reasonable, there are definitely cases where you need to
> limit
> > sink parallelism for example not to overload the storage or limit the
> > number of output files
> >
> > +1
> >
> > Best,
> > D.
> >
> > On Sun, Apr 23, 2023 at 1:09 PM Weihua Hu <hu...@gmail.com>
> wrote:
> >
> > > Hi, Eric
> > >
> > > Thanks for bringing this discussion.
> > > I think it's reasonable to add ''setMaxParallelism" for DataStreamSink.
> > >
> > > +1
> > >
> > > Best,
> > > Weihua
> > >
> > >
> > > On Sat, Apr 22, 2023 at 3:20 AM eric xiao <xi...@gmail.com>
> wrote:
> > >
> > > > Hi there devs,
> > > >
> > > > I would like to start a discussion thread for FLINK-31873[1].
> > > >
> > > > We are in the processing of enabling Flink reactive mode as the
> default
> > > > scheduling mode. While reading configuration docs [2] (I believe it
> was
> > > > also mentioned during one of the training sessions during Flink
> Forward
> > > > 2022), one can/should replace all setParallelism calls with
> > > > setMaxParallelism when migrating to reactive mode.
> > > >
> > > > This currently isn't possible on a sink in a Flink pipeline as we do
> not
> > > > expose a setMaxParallelism on the DataStreamSink class [3]. The
> > > underlying
> > > > Transformation class does have both a setMaxParallelism and
> > > setParallelism
> > > > function defined [4], but only setParallelism is offered in the
> > > > DataStreamSink class.
> > > >
> > > > I believe adding setMaxParallelism would be beneficial for not just
> flink
> > > > reactive mode, both modes of running of a flink pipeline (non
> reactive
> > > > mode, flink auto scaling).
> > > >
> > > > Best,
> > > >
> > > > Eric Xiao
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-31873
> > > > [2]
> > > >
> > > >
> > >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
> > > > [3]
> > > >
> > > >
> > >
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
> > > > [4]
> > > >
> > > >
> > >
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285
> > > >
> > >
>

Re: [DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

Posted by Maximilian Michels <mx...@apache.org>.
+1

On Tue, Apr 25, 2023 at 5:24 PM David Morávek <dm...@apache.org> wrote:
>
> Hi Eric,
>
> this sounds reasonable, there are definitely cases where you need to limit
> sink parallelism for example not to overload the storage or limit the
> number of output files
>
> +1
>
> Best,
> D.
>
> On Sun, Apr 23, 2023 at 1:09 PM Weihua Hu <hu...@gmail.com> wrote:
>
> > Hi, Eric
> >
> > Thanks for bringing this discussion.
> > I think it's reasonable to add ''setMaxParallelism" for DataStreamSink.
> >
> > +1
> >
> > Best,
> > Weihua
> >
> >
> > On Sat, Apr 22, 2023 at 3:20 AM eric xiao <xi...@gmail.com> wrote:
> >
> > > Hi there devs,
> > >
> > > I would like to start a discussion thread for FLINK-31873[1].
> > >
> > > We are in the processing of enabling Flink reactive mode as the default
> > > scheduling mode. While reading configuration docs [2] (I believe it was
> > > also mentioned during one of the training sessions during Flink Forward
> > > 2022), one can/should replace all setParallelism calls with
> > > setMaxParallelism when migrating to reactive mode.
> > >
> > > This currently isn't possible on a sink in a Flink pipeline as we do not
> > > expose a setMaxParallelism on the DataStreamSink class [3]. The
> > underlying
> > > Transformation class does have both a setMaxParallelism and
> > setParallelism
> > > function defined [4], but only setParallelism is offered in the
> > > DataStreamSink class.
> > >
> > > I believe adding setMaxParallelism would be beneficial for not just flink
> > > reactive mode, both modes of running of a flink pipeline (non reactive
> > > mode, flink auto scaling).
> > >
> > > Best,
> > >
> > > Eric Xiao
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-31873
> > > [2]
> > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
> > > [3]
> > >
> > >
> > https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
> > > [4]
> > >
> > >
> > https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285
> > >
> >

Re: [DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

Posted by David Morávek <dm...@apache.org>.
Hi Eric,

this sounds reasonable, there are definitely cases where you need to limit
sink parallelism for example not to overload the storage or limit the
number of output files

+1

Best,
D.

On Sun, Apr 23, 2023 at 1:09 PM Weihua Hu <hu...@gmail.com> wrote:

> Hi, Eric
>
> Thanks for bringing this discussion.
> I think it's reasonable to add ''setMaxParallelism" for DataStreamSink.
>
> +1
>
> Best,
> Weihua
>
>
> On Sat, Apr 22, 2023 at 3:20 AM eric xiao <xi...@gmail.com> wrote:
>
> > Hi there devs,
> >
> > I would like to start a discussion thread for FLINK-31873[1].
> >
> > We are in the processing of enabling Flink reactive mode as the default
> > scheduling mode. While reading configuration docs [2] (I believe it was
> > also mentioned during one of the training sessions during Flink Forward
> > 2022), one can/should replace all setParallelism calls with
> > setMaxParallelism when migrating to reactive mode.
> >
> > This currently isn't possible on a sink in a Flink pipeline as we do not
> > expose a setMaxParallelism on the DataStreamSink class [3]. The
> underlying
> > Transformation class does have both a setMaxParallelism and
> setParallelism
> > function defined [4], but only setParallelism is offered in the
> > DataStreamSink class.
> >
> > I believe adding setMaxParallelism would be beneficial for not just flink
> > reactive mode, both modes of running of a flink pipeline (non reactive
> > mode, flink auto scaling).
> >
> > Best,
> >
> > Eric Xiao
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-31873
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
> > [3]
> >
> >
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
> > [4]
> >
> >
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285
> >
>

Re: [DISCUSS] FLINK-31873: Add setMaxParallelism to the DataStreamSink Class

Posted by Weihua Hu <hu...@gmail.com>.
Hi, Eric

Thanks for bringing this discussion.
I think it's reasonable to add ''setMaxParallelism" for DataStreamSink.

+1

Best,
Weihua


On Sat, Apr 22, 2023 at 3:20 AM eric xiao <xi...@gmail.com> wrote:

> Hi there devs,
>
> I would like to start a discussion thread for FLINK-31873[1].
>
> We are in the processing of enabling Flink reactive mode as the default
> scheduling mode. While reading configuration docs [2] (I believe it was
> also mentioned during one of the training sessions during Flink Forward
> 2022), one can/should replace all setParallelism calls with
> setMaxParallelism when migrating to reactive mode.
>
> This currently isn't possible on a sink in a Flink pipeline as we do not
> expose a setMaxParallelism on the DataStreamSink class [3]. The underlying
> Transformation class does have both a setMaxParallelism and setParallelism
> function defined [4], but only setParallelism is offered in the
> DataStreamSink class.
>
> I believe adding setMaxParallelism would be beneficial for not just flink
> reactive mode, both modes of running of a flink pipeline (non reactive
> mode, flink auto scaling).
>
> Best,
>
> Eric Xiao
>
> [1] https://issues.apache.org/jira/browse/FLINK-31873
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/elastic_scaling/#configuration
> [3]
>
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSink.java
> [4]
>
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/dag/Transformation.java#L248-L285
>