Posted to dev@flink.apache.org by admin <17...@163.com> on 2020/09/09 17:18:56 UTC

[DISCUSS] Support source/sink parallelism config in Flink sql

Hi devs:
Currently, Flink SQL does not support configuring source/sink parallelism. As a result, resources are wasted in some cases and insufficient in others.
I think it is necessary to introduce source/sink parallelism configuration in SQL.
From my side, I have a solution for this feature: add a parallelism option to the 'WITH' properties of the DDL.

Before 1.11, we can get the parallelism and then set it in StreamTableSink#consumeDataStream or StreamTableSource#getDataStream.
After 1.11, we can get the parallelism from catalogTable and then set it on the transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink.
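To make the post-1.11 idea concrete, here is a minimal Python sketch of the planner-side logic, assuming the option key is simply `parallelism` (the proposal does not name it). `CatalogTable`, `Transformation`, `parse_parallelism` and `apply_parallelism` are hypothetical stand-ins, not Flink's actual classes; the sketch only shows reading the option from the table's WITH properties, validating it, and overriding the default parallelism of the source/sink node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed option key; the proposal only says "a parallelism option in WITH".
PARALLELISM_KEY = "parallelism"


@dataclass
class CatalogTable:
    """Hypothetical stand-in for a catalog table: only its WITH options."""
    options: Dict[str, str] = field(default_factory=dict)


@dataclass
class Transformation:
    """Hypothetical stand-in for the source/sink transformation."""
    name: str
    parallelism: int


def parse_parallelism(table: CatalogTable) -> Optional[int]:
    """Return the configured parallelism, or None if the option is absent."""
    raw = table.options.get(PARALLELISM_KEY)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"'{PARALLELISM_KEY}' must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"'{PARALLELISM_KEY}' must be positive, got {value}")
    return value


def apply_parallelism(table: CatalogTable,
                      transformation: Transformation) -> Transformation:
    """Override the transformation's parallelism if the table sets one;
    otherwise keep the default the planner already assigned."""
    configured = parse_parallelism(table)
    if configured is not None:
        transformation.parallelism = configured
    return transformation


if __name__ == "__main__":
    # Options as they might arrive from:
    #   CREATE TABLE t (...) WITH ('connector' = 'print', 'parallelism' = '4')
    table = CatalogTable({"connector": "print", "parallelism": "4"})
    sink = apply_parallelism(table, Transformation("sink", parallelism=8))
    print(sink.parallelism)  # 4
```

Leaving the option unset keeps the planner's default, so existing DDL would behave as before.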

What do you think?

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Posted by Jingsong Li <ji...@gmail.com>.

Hi,

I have started a discussion about improving the new TableSource and
TableSink interfaces:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-146-Improve-new-TableSource-and-TableSink-interfaces-td45161.html
It covers the parallelism setting. You are welcome to join the discussion;
I look forward to your comments.

Best,
Jingsong

On Mon, Sep 21, 2020 at 11:03 AM Jark Wu <im...@gmail.com> wrote:

> Since FLIP-95, the parallelism is decoupled from the runtime class
> (DataStream/SourceFunction), so we need an API to tell the planner what
> the parallelism of the source/sink is.
>
> This is indeed the purpose of a previous discussion, "[DISCUSS] Introduce
> SupportsParallelismReport and SupportsStatisticsReport".
> We can continue the discussion there.
>
> Best,
> Jark
>
> [1]:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-SupportsParallelismReport-and-SupportsStatisticsReport-for-Hive-and-Filesystem-td43531.html


-- 
Best, Jingsong Lee

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Posted by Jark Wu <im...@gmail.com>.

Since FLIP-95, the parallelism is decoupled from the runtime class
(DataStream/SourceFunction), so we need an API to tell the planner what
the parallelism of the source/sink is.

This is indeed the purpose of a previous discussion, "[DISCUSS] Introduce
SupportsParallelismReport and SupportsStatisticsReport".
We can continue the discussion there.
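A rough Python sketch of what such an API could look like, assuming an optional "ability" interface that the planner checks with an isinstance test. `SupportsParallelismReport` is named after the linked proposal, but `report_parallelism`, `SplitCountSource`, `PlainSource` and `decide_parallelism` are hypothetical, and none of this is Flink's actual API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class DynamicTableSource:
    """Stand-in for a FLIP-95-style source: it no longer hands the planner
    a DataStream, so there is no runtime object to read parallelism from."""


class SupportsParallelismReport(ABC):
    """Hypothetical ability interface named after the proposal; the method
    name below is an assumption."""

    @abstractmethod
    def report_parallelism(self) -> Optional[int]:
        """Return the desired parallelism, or None to let the planner decide."""


class SplitCountSource(DynamicTableSource, SupportsParallelismReport):
    """Illustrative connector: one task per input split, capped at a maximum."""

    def __init__(self, num_splits: int, max_parallelism: int) -> None:
        self.num_splits = num_splits
        self.max_parallelism = max_parallelism

    def report_parallelism(self) -> Optional[int]:
        # At least 1 task, at most max_parallelism tasks.
        return max(1, min(self.num_splits, self.max_parallelism))


class PlainSource(DynamicTableSource):
    """Connector that does not implement the ability."""


def decide_parallelism(source: DynamicTableSource, default: int) -> int:
    """Planner side: consult the source only if it opts in to reporting."""
    if isinstance(source, SupportsParallelismReport):
        reported = source.report_parallelism()
        if reported is not None:
            return reported
    return default


if __name__ == "__main__":
    print(decide_parallelism(SplitCountSource(3, max_parallelism=10), 8))  # 3
    print(decide_parallelism(PlainSource(), 8))  # 8
```

Because the interface is opt-in, connectors that do not implement it simply fall back to the planner's default parallelism.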

Best,
Jark

[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-SupportsParallelismReport-and-SupportsStatisticsReport-for-Hive-and-Filesystem-td43531.html

On Sun, 20 Sep 2020 at 23:14, 刘大龙 <ld...@zju.edu.cn> wrote:

>
> +1

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Posted by 刘大龙 <ld...@zju.edu.cn>.

+1

> -----Original Message-----
> From: "Benchao Li" <li...@apache.org>
> Sent: 2020-09-20 16:28:20 (Sunday)
> To: dev <de...@flink.apache.org>
> Cc: 
> Subject: Re: [DISCUSS] Support source/sink parallelism config in Flink sql
> 
> Hi admin,
> 
> Thanks for bringing up this discussion.
> IMHO, it's a valuable feature. We have also added this feature to our
> internal SQL engine, and our approach is very similar to your proposal.
> 
> Regarding the implementation, one drawback is that we would have to modify
> each connector to support this property.
> Let's wait for others' opinions on whether this is a valid proposal. If so,
> we can then discuss the implementation in detail.
> 
> -- 
> 
> Best,
> Benchao Li

Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Posted by Benchao Li <li...@apache.org>.

Hi admin,

Thanks for bringing up this discussion.
IMHO, it's a valuable feature. We have also added this feature to our
internal SQL engine, and our approach is very similar to your proposal.

Regarding the implementation, one drawback is that we would have to modify
each connector to support this property.
Let's wait for others' opinions on whether this is a valid proposal. If so,
we can then discuss the implementation in detail.

admin <17...@163.com> wrote on Thu, Sep 10, 2020 at 1:19 AM:

> Hi devs:
> Currently, Flink SQL does not support configuring source/sink parallelism.
> As a result, resources are wasted in some cases and insufficient in others.
> I think it is necessary to introduce source/sink parallelism configuration
> in SQL.
> From my side, I have a solution for this feature: add a parallelism option
> to the 'WITH' properties of the DDL.
>
> Before 1.11, we can get the parallelism and then set it in
> StreamTableSink#consumeDataStream or StreamTableSource#getDataStream.
> After 1.11, we can get the parallelism from catalogTable and then set it on
> the transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink.
>
> What do you think?

-- 

Best,
Benchao Li