Posted to dev@flink.apache.org by Martijn Visser <ma...@apache.org> on 2022/08/01 13:42:48 UTC

Re: Re: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27

Hi,

A PR has already been submitted to port the JDBC connector to the new
interfaces. Can we make sure that this FLIP is finalized, so that you and
other maintainers can work on getting the PRs correct and eventually merged?

Best regards,

Martijn

On Mon, 4 Jul 2022 at 16:38, Martijn Visser <martijnvisser@apache.org> wrote:

> Hi Roc,
>
> Thanks for the FLIP and opening the discussion. I have a couple of initial
> questions/remarks:
>
> * The FLIP contains information for both Source and Sink, but nothing
> explicitly on the Lookup functionality. I'm assuming we also want to have
> that implementation covered while porting this to the new interfaces.
> * The FLIP mentions porting to both the new Source and the new Sink API,
> but the FLIP only contains detailed information on the Source. Are you
> planning to add that to the FLIP before casting a vote? Because the
> discussion should definitely be resolved for both the Source and the Sink.
>
> Best regards,
>
> Martijn
>
> On Sat, 2 Jul 2022 at 06:35, Roc Marshal <fl...@126.com> wrote:
>
>> Hi, Weike.
>>
>> Thank you for your reply.
>> As you said, storing too many splits in the SourceEnumerator will increase
>> the load on the JM.
>> What do you think about introducing a split capacity in the
>> SourceEnumerator to limit the total number, together with a reject or
>> callback mechanism for the case where the timely generation strategy
>> produces too many splits?
>> Looking forward to a better solution.
>>
>> Best regards,
>> Roc Marshal
>>
>> On 2022/07/01 07:58:22 Dong Weike wrote:
>> > Hi,
>> >
>> > Thank you for bringing this up, and I am +1 for this feature.
>> >
>> > IMO, one important thing that I would like to mention is that an
>> > improperly designed FLIP-27 connector could impose very severe memory
>> > pressure on the JobManager, especially when there is an enormous number of
>> > splits for the source tables, e.g. when there are billions of records to
>> > read. Frankly speaking, we have been haunted by this problem for a long
>> > time when using the Flink CDC Connectors to read large tables.
>> >
>> > Therefore, in order to prevent the JobManager from experiencing frequent
>> > OOM faults, JdbcSourceEnumerator should avoid saving too many
>> > JdbcSourceSplits in the unassigned list. It would be better if all the
>> > splits were computed on the fly.
>> >
>> > Best,
>> > Weike
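
A minimal sketch of the on-the-fly idea above, assuming each split is a key
range over a numeric column. RangeSplit and LazyRangeSplitSource are
hypothetical names used only for illustration; they are not part of the FLIP
or of the Flink API. The helper generates splits lazily and never buffers
more than a fixed number of unassigned splits, so enumerator memory stays
bounded regardless of table size.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical stand-in for a JDBC split: the key range [start, end). */
final class RangeSplit {
    final long start;
    final long end;

    RangeSplit(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

/**
 * Hypothetical enumerator-side helper (not Flink API): computes splits lazily
 * and keeps at most `capacity` unassigned splits in memory at any time.
 */
final class LazyRangeSplitSource {
    private final long max;      // exclusive upper bound of the key range
    private final long step;     // number of keys covered by one split
    private final int capacity;  // maximum number of buffered, unassigned splits
    private long next;           // start of the first split not generated yet
    private final Deque<RangeSplit> unassigned = new ArrayDeque<>();

    LazyRangeSplitSource(long min, long max, long step, int capacity) {
        if (step <= 0 || capacity <= 0 || min > max) {
            throw new IllegalArgumentException("invalid range, step, or capacity");
        }
        this.next = min;
        this.max = max;
        this.step = step;
        this.capacity = capacity;
    }

    /** Tops up the buffer without ever exceeding the capacity. */
    private void refill() {
        while (unassigned.size() < capacity && next < max) {
            long end = (max - next > step) ? next + step : max;
            unassigned.add(new RangeSplit(next, end));
            next = end;
        }
    }

    /** Returns the next split, or empty once the whole range has been handed out. */
    Optional<RangeSplit> poll() {
        refill();
        return Optional.ofNullable(unassigned.poll());
    }

    /** Number of splits currently held in memory. */
    int buffered() {
        return unassigned.size();
    }
}
```

In a real FLIP-27 enumerator, something like poll() would presumably be
called when a reader requests a split, and an empty result would be the point
at which the reader is told that no more splits are coming. Checkpointing
would then only need to record `next` plus the few buffered splits.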
>> >
>> > -----Original Message-----
>> > From: Lijie Wang <wa...@gmail.com>
>> > Sent: July 1, 2022, 10:25 AM
>> > To: dev@flink.apache.org
>> > Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27
>> >
>> > Hi Roc,
>> >
>> > Thanks for driving the discussion.
>> >
>> > Could you describe in detail what the JdbcSourceSplit represents? It
>> > looks like something is wrong with the comment on JdbcSourceSplit in the
>> > FLIP (it is described as "A {@link SourceSplit} that represents a file, or
>> > a region of a file....").
>> >
>> > Best,
>> > Lijie
>> >
>> >
>> > On Thu, Jun 30, 2022 at 21:41, Roc Marshal <fl...@126.com> wrote:
>> >
>> > > Hi, Boto.
>> > >     Thanks for your reply.
>> > >
>> > >    +1 from me on defining a watermark strategy for the ‘streaming’ &
>> > > table source. I'm not sure if FLIP-202[1] is suitable for a separate
>> > > discussion, but I think your proposal is very helpful to the new
>> > > source. It would be great if the new source could be compatible with
>> > > this abstraction.
>> > >
>> > >    In addition, do we need to support an abstraction for the following
>> > > special bounded scenario?
>> > >    The number of JdbcSourceSplits is fixed, but the time needed to
>> > > generate all of them is not, because it depends on the user-defined
>> > > implementation. Once the end condition of the split generation process
>> > > is met, no more JdbcSourceSplits will be generated. After all
>> > > JdbcSourceSplits have been processed, the JdbcSourceSplitEnumerator
>> > > will notify the reader that there are no more JdbcSourceSplits.
>> > >
>> > > - [1]
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
>> > >
>> > > Best regards,
>> > > Roc Marshal
>> > >
>> > > On 2022/06/30 09:02:23 João Boto wrote:
>> > > > Hi,
>> > > >
>> > > > On the source, we could improve the JdbcParameterValuesProvider so
>> > > > that it can be defined as one or more queries, or something more
>> > > > dynamic.
>> > > > Most of the time, if your job is dynamic or has some condition to be
>> > > > met (based on data in the table), you have to create a connection and
>> > > > get that info from the database.
>> > > >
>> > > > If we are going to create/allow a "streaming" JDBC source, we should
>> > > > be able to define a watermark and get new data from the table using
>> > > > that watermark.
>> > > >
>> > > >
>> > > > For the sink (although it could also apply to the source), it would
>> > > > be great to be able to set your own implementation of the connection
>> > > > type. For example, if you are connecting to ClickHouse, you could set
>> > > > an implementation based on "BalancedClickhouseDataSource" (this[1]
>> > > > implementation has an example), or set an extended version of an
>> > > > implementation for debugging purposes.
>> > > >
>> > > > Regards
>> > > >
>> > > >
>> > > > [1]
>> > > > https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
>> > > >
>> > > > On 2022/06/27 13:09:51 Roc Marshal wrote:
>> > > > > Hi, all,
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > I would like to open a discussion on porting the JDBC Source to the
>> > > > > new Source API (FLIP-27[1]).
>> > > > >
>> > > > > Martijn Visser, Jing Ge and I had a preliminary discussion on the
>> > > > > JIRA ticket FLINK-25420[2] and planned to start the discussion with
>> > > > > the source part first.
>> > > > >
>> > > > >
>> > > > >
>> > > > > Please let me know:
>> > > > >
>> > > > > - Any issues you have encountered with the old JDBC source;
>> > > > > - Any new features or design you would like to see;
>> > > > > - Any other suggestions.
>> > > > >
>> > > > >
>> > > > >
>> > > > > You can find more details in FLIP-239[3].
>> > > > >
>> > > > > Looking forward to your feedback.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>> > > > >
>> > > > > [2] https://issues.apache.org/jira/browse/FLINK-25420
>> > > > >
>> > > > > [3]
>> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Roc Marshal
>> > > >
>> >
>>
>