Posted to user@flink.apache.org by mohan radhakrishnan <ra...@gmail.com> on 2022/02/04 12:55:36 UTC
CDC using Query
Hi,
When I was looking into CDC, I realized Flink can use a Kafka connector to stream data into Flink. The idea is to send the data forward to Kafka and consume it using Kafka Streams.
Are there source DLQs or additional mechanisms to detect failures to read from the DB?
We don't want to use Debezium; our CDC is query-based.
What mechanisms does Flink have that a Kafka Connect worker does not? Kafka Connect workers can go down, and source data can be lost.
Does the idea of sending the data forward to Kafka and consuming it with Kafka Streams make sense? Can Flink's checkpointing feature help? I plan to use Kafka Streams for exactly-once delivery and changelog topics.
Could you point out relevant material to read?
Thanks,
Mohan
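[Query-based CDC, as mentioned in the question above, usually means polling the source table on a tracking column. A minimal stdlib sketch of that pattern; the table, columns, and function names are hypothetical, not taken from this thread:]

```python
import sqlite3

def poll_changes(conn, last_seen_id):
    """Query-based CDC: fetch rows added since the last poll.

    Uses a monotonically increasing id as the tracking column. Rows
    updated in place without bumping the tracking column are missed,
    which is the classic weakness of query-based CDC."""
    cur = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    )
    rows = cur.fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
rows, last = poll_changes(conn, 0)       # first poll sees both existing rows
conn.execute("INSERT INTO events (payload) VALUES (?)", ("c",))
rows2, last = poll_changes(conn, last)   # second poll sees only the new row
```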
Re: CDC using Query
Posted by Martijn Visser <ma...@ververica.com>.
Hi Mohan,
I don't know the specifics of the single Kafka Connect worker.
The Flink CDC connector is NOT a Kafka connector. As explained before,
there is no Kafka involved when using this connector. As is also mentioned
in the same readme, it indeed provides exactly-once processing.
Best regards,
Martijn
Martijn Visser | Product Manager
martijn@ververica.com
<https://www.ververica.com/>
Follow us @VervericaData
--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
Re: CDC using Query
Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
OK, I may not have understood the answer to my previous question.
When I watch https://www.youtube.com/watch?v=IOZ2Um6e430 at 20:14, he
starts to talk about this.
Is he talking about a single Kafka Connect worker or a cluster? He
mentions that it is 'at-least-once'.
So is Flink's version an improvement? Does Flink's Kafka connector
guarantee 'exactly-once' where a Connect cluster does not?
Please bear with me.
This will have other consequences too, as our MQ may need an MQ connector
(probably from Flink or Confluent).
Different connectors may have different guarantees.
Thanks.
> 3. Delivering to kafka from flink is not exactly once. Is that right ?
>
>
> No, both Flink CDC Connector and Flink Kafka Connector provide exactly
> once implementation.
>
Re: CDC using Query
Posted by Martijn Visser <ma...@ververica.com>.
Hi,
The readme for the Flink CDC connectors [1] says that Oracle database
versions 11, 12, and 19 are supported with Oracle Driver 19.3.0.0.
Best regards,
Martijn
[1] https://github.com/ververica/flink-cdc-connectors/blob/master/README.md
Re: CDC using Query
Posted by mohan radhakrishnan <ra...@gmail.com>.
Thanks, I looked at it. Our primary DBs are Oracle and MySQL. The Flink CDC
Connector uses Debezium, I think. So Ververica doesn't have a Flink CDC
connector for Oracle?
Re: CDC using Query
Posted by Leonard Xu <xb...@gmail.com>.
Hello, mohan
> 1. Does flink have any support to track any missed source Jdbc CDC records ?
The Flink CDC Connector provides exactly-once semantics, which means it won't miss records. Tip: the Flink JDBC Connector only
scans the database once; it cannot continuously read a CDC stream.
> 2. What is the equivalent of Kafka consumer groups ?
Each database has a different CDC mechanism: for MySQL/MariaDB it's the serverId used to mark a replica, and for PostgreSQL it's the replication slot name.
> 3. Delivering to kafka from flink is not exactly once. Is that right ?
No, both the Flink CDC Connector and the Flink Kafka Connector provide an exactly-once implementation.
BTW, if your destination is Elasticsearch, the quick start demo[1] may help you.
Best,
Leonard
[1] https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/mysql-postgres-tutorial.html
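[For illustration, exactly-once delivery to Kafka can be configured on the Flink Kafka SQL sink roughly as sketched below. The topic, fields, and addresses are assumptions, not taken from this thread, and exactly-once additionally requires checkpointing to be enabled on the job:]

```sql
-- Sketch of a Kafka sink table with exactly-once delivery in Flink SQL.
-- All names and values here are illustrative assumptions.
CREATE TABLE notifications_sink (
  id INT,
  body STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'notifications',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'sink.delivery-guarantee' = 'exactly-once',
  'sink.transactional-id-prefix' = 'cdc-pipeline'
);
```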
Re: CDC using Query
Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
I have some specific questions and would appreciate some pointers.
1. Does Flink have any support to track missed source JDBC CDC records?
2. What is the equivalent of Kafka consumer groups?
3. Delivering to Kafka from Flink is not exactly-once. Is that right?
Thanks
Re: CDC using Query
Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
So the JDBC source connector is Kafka's, and the transformation is
done by Flink (Flink SQL)? But that connector can miss records, I thought.
I started looking at Flink for this and other use cases.
Can I see the alternative to Spring Cloud Stream (Kafka Streams)? Since I
am learning Flink, Kafka Streams' changelog topics, exactly-once
delivery, and DLQs seemed good for our critical push notifications.
We also need an Elasticsearch sink.
Thanks
Re: CDC using Query
Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Mohan,
I don't know much about Kafka Connect, so I will not talk about its
features and differences from Flink. Flink on its own does not have the
capability to read a CDC stream directly from a DB. However, there is the
flink-cdc-connectors[1] project, which embeds the standalone Debezium
engine inside a Flink source and can process the DB changelog with all the
processing guarantees that Flink provides.
As for the idea of processing further with Kafka Streams: why not
process the data with Flink? What do you miss in Flink?
Best,
Dawid
[1] https://github.com/ververica/flink-cdc-connectors
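[For illustration, a CDC source table using the flink-cdc-connectors project mentioned above might be declared in Flink SQL roughly as sketched here. The database, table, credentials, and server-id value are assumptions, not from this thread, and the exact options depend on the connector version:]

```sql
-- Sketch of a MySQL CDC source table; all names and values are
-- illustrative assumptions.
CREATE TABLE orders_cdc (
  id INT,
  amount DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'flink',
  'password' = 'secret',
  'database-name' = 'shop',
  'table-name' = 'orders',
  'server-id' = '5401'  -- marks this reader as a MySQL replica
);
```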