Posted to user@flink.apache.org by mohan radhakrishnan <ra...@gmail.com> on 2022/02/04 12:55:36 UTC

CDC using Query

Hi,
     When I was looking into CDC I realized that Flink uses a Kafka connector
to stream into Flink. The idea is to send it forward to Kafka and consume it
using Kafka Streams.

Are there source DLQs or additional mechanisms to detect failures to read
from the DB ?

We don't want to use Debezium and our CDC is based on queries.

What mechanisms does Flink have that a Kafka Connect worker does not ?
Kafka Connect workers can go down and source data can be lost.

Does the idea of sending it forward to Kafka and consuming it using Kafka
Streams make sense ? Can the checkpointing feature of Flink help ? I plan
to use Kafka Streams for 'exactly-once delivery' and changelog topics.

Could you point out relevant material to read ?

Thanks,
Mohan
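[Editor's note: query-based CDC of the kind described above usually means polling with an incremental predicate on a timestamp or version column. A minimal, self-contained sketch of that idea follows; all names are hypothetical, and a real poller would issue the query as SQL over JDBC.]

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of query-based CDC: each poll fetches rows whose
 *  update timestamp is newer than the last one seen. All names are
 *  hypothetical; a real poller would run this as SQL over JDBC. */
public class QueryCdcSketch {

    /** Stand-in for a table row with an updated_at column. */
    public record Row(int id, String value, long updatedAt) {}

    /** One poll: SELECT * FROM t WHERE updated_at > :lastSeen. */
    public static List<Row> pollSince(List<Row> table, long lastSeen) {
        List<Row> changed = new ArrayList<>();
        for (Row r : table) {
            if (r.updatedAt() > lastSeen) {
                changed.add(r);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
                new Row(1, "a", 100L),
                new Row(2, "b", 250L),
                new Row(3, "c", 300L));
        // The previous poll ended at ts=100, so only rows 2 and 3 are new.
        List<Row> delta = pollSince(table, 100L);
        System.out.println(delta.size()); // 2
    }
}
```

The limits discussed later in the thread follow directly from this shape: a row updated twice between polls surfaces only its final state, deletes never match the predicate, and an update written with a timestamp at or below lastSeen is silently missed. Log-based CDC (e.g. Debezium) avoids these by reading the database's changelog instead.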

Re: CDC using Query

Posted by Martijn Visser <ma...@ververica.com>.
Hi Mohan,

I don't know the specifics about the single Kafka Connect worker.

The Flink CDC connector is NOT a Kafka connector. As explained before,
there is no Kafka involved when using this connector. As is also mentioned
in the same readme, it does provide exactly-once processing.

Best regards,

Martijn

On Fri, 11 Feb 2022 at 13:05, mohan radhakrishnan <
radhakrishnan.mohan@gmail.com> wrote:

> Hello,
>               Ok. I may not have understood the answer to my previous
> question.
> When I listen to https://www.youtube.com/watch?v=IOZ2Um6e430 at 20:14 he
> starts to talk about this.
> Is he talking about a single Kafka Connect worker or a cluster ? He
> mentions that it is 'atleast-once'.
> So Flink's version is an improvement ? So Flink's Kafka Connector in a
> Connect cluster guarantees 'Exactly-once' ?
> Please bear with me.
>
> This will have other consequences too as our MQ may need a MQ connector.(
> Probably from Flink or Confluent  )
> Different connectors may have different guarantees.
>
> Thanks.
>
>> 3. Delivering to kafka from flink is not exactly once. Is that right ?
>>
>>
>> No, both Flink CDC Connector and Flink Kafka Connector provide exactly
>> once implementation.
>>

Martijn Visser | Product Manager

martijn@ververica.com

<https://www.ververica.com/>


Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

Re: CDC using Query

Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
              Ok, I may not have understood the answer to my previous
question.
When I listen to https://www.youtube.com/watch?v=IOZ2Um6e430 at 20:14 he
starts to talk about this.
Is he talking about a single Kafka Connect worker or a cluster ? He
mentions that it is 'at-least-once'.
So Flink's version is an improvement ? Does Flink's Kafka connector in a
Connect cluster guarantee 'exactly-once' ?
Please bear with me.

This will have other consequences too, as our MQ may need an MQ connector
(probably from Flink or Confluent).
Different connectors may have different guarantees.

Thanks.
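[Editor's note: independently of which connector is used, an at-least-once source can be made effectively exactly-once for the consumer by deduplicating on a stable record key. A minimal sketch; names are hypothetical, and a production system would bound the size of the seen-key set.]

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: turn at-least-once delivery (duplicates possible on restart)
 *  into effectively-once processing by remembering which record keys
 *  were already applied. All names are hypothetical. */
public class DedupConsumerSketch {

    private final Set<String> seen = new HashSet<>();
    private final List<String> applied = new ArrayList<>();

    /** Apply a record only if its key has not been seen before. */
    public void onRecord(String key, String payload) {
        if (seen.add(key)) {
            applied.add(payload);
        }
    }

    public List<String> applied() {
        return applied;
    }

    public static void main(String[] args) {
        DedupConsumerSketch c = new DedupConsumerSketch();
        c.onRecord("order-1", "create order 1");
        c.onRecord("order-1", "create order 1"); // redelivery: ignored
        c.onRecord("order-2", "create order 2");
        System.out.println(c.applied().size()); // 2
    }
}
```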

> 3. Delivering to kafka from flink is not exactly once. Is that right ?
>
>
> No, both Flink CDC Connector and Flink Kafka Connector provide exactly
> once implementation.

Re: CDC using Query

Posted by Martijn Visser <ma...@ververica.com>.
Hi,

The readme for the Flink CDC connectors [1] says that Oracle database
versions 11, 12, and 19 are supported with Oracle driver 19.3.0.0.

Best regards,

Martijn

[1] https://github.com/ververica/flink-cdc-connectors/blob/master/README.md

On Fri, 11 Feb 2022 at 08:37, mohan radhakrishnan <
radhakrishnan.mohan@gmail.com> wrote:

> Thanks. I looked at it. Our primary DB is Oracle and MySql. Flink CDC
> Connector uses Debezium. I think. So ververica doesn't have a Flink CDC
> Connector for Oracle ?

Re: CDC using Query

Posted by mohan radhakrishnan <ra...@gmail.com>.
Thanks, I looked at it. Our primary DBs are Oracle and MySQL. The Flink CDC
connector uses Debezium, I think. So Ververica doesn't have a Flink CDC
connector for Oracle ?


Re: CDC using Query

Posted by Leonard Xu <xb...@gmail.com>.
Hello, mohan

> 1. Does flink have any support to track any missed source Jdbc CDC records ? 

The Flink CDC connector provides exactly-once semantics, which means it
won't miss records. Tip: the Flink JDBC connector only scans the database
once; it cannot continuously read a CDC stream.

> 2. What is the equivalent of Kafka consumer groups ?

Different databases have different CDC mechanisms: for MySQL/MariaDB it's the serverId used to mark a replica, and for PostgreSQL it's the replication slot name.


> 3. Delivering to kafka from flink is not exactly once. Is that right ?

No, both the Flink CDC connector and the Flink Kafka connector provide an exactly-once implementation.

BTW, if your destination is Elasticsearch, the quick-start demo [1] may help you.

Best,
Leonard

[1] https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/mysql-postgres-tutorial.html



Re: CDC using Query

Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
             I have some specific questions; I'd appreciate some pointers.
1. Does Flink have any support to track missed source JDBC CDC records ?
2. What is the equivalent of Kafka consumer groups ?
3. Delivering to Kafka from Flink is not exactly once. Is that right ?

Thanks


Re: CDC using Query

Posted by mohan radhakrishnan <ra...@gmail.com>.
Hello,
               So the JDBC source connector is Kafka's, and the transformation
is done by Flink (Flink SQL) ? But that connector can miss records, I thought.
I started looking at Flink for this and other use cases.
Can I see the alternative to Spring Cloud Stream (Kafka Streams) ? Since I
am learning Flink, Kafka Streams' changelog topics, exactly-once
delivery, and DLQs seemed good for our critical push notifications.

We also need an Elasticsearch sink.

Thanks
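[Editor's note: the changelog-topic idea mentioned above boils down to this: per key, only the latest value matters, and a null value (tombstone) deletes the key, which is exactly what Kafka log compaction preserves. A minimal sketch of replaying such a changelog into state; names are hypothetical.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of changelog semantics: replaying a (key, value) stream yields
 *  the latest value per key; a null value is a tombstone that deletes
 *  the key. All names are hypothetical. */
public class ChangelogSketch {

    public static Map<String, String> replay(String[][] changelog) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] kv : changelog) {
            if (kv[1] == null) {
                state.remove(kv[0]); // tombstone
            } else {
                state.put(kv[0], kv[1]);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        String[][] log = {
                {"user-1", "alice"},
                {"user-2", "bob"},
                {"user-1", "alice-updated"},
                {"user-2", null}}; // delete user-2
        System.out.println(replay(log)); // {user-1=alice-updated}
    }
}
```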


Re: CDC using Query

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Mohan,

I don't know much about Kafka Connect, so I will not talk about its
features and differences from Flink. Flink on its own does not have the
capability to read a CDC stream directly from a DB. However, there is the
flink-cdc-connectors [1] project, which embeds the standalone Debezium
engine inside Flink's source and can process a DB changelog with all the
processing guarantees that Flink provides.

As for the idea of processing further with Kafka Streams: why not
process the data with Flink? What do you miss in Flink?

Best,

Dawid

[1] https://github.com/ververica/flink-cdc-connectors
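[Editor's note: the processing guarantees mentioned above rest on checkpointing: source offsets and operator state are snapshotted, and after a failure the job rewinds to the last checkpoint and replays, so no input is lost. A toy illustration of that rewind-and-replay idea follows; all names are hypothetical, and real Flink additionally coordinates transactional sinks so replayed output is not duplicated.]

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of checkpoint-and-replay: the read offset is snapshotted,
 *  and recovery rewinds to it, so a crash loses no input. Output emitted
 *  after the last checkpoint is rolled back here, mimicking what a
 *  transactional sink achieves. All names are hypothetical. */
public class CheckpointSketch {

    private int offset = 0;              // next input position to read
    private int checkpointedOffset = 0;  // offset stored in last snapshot
    private final List<String> output = new ArrayList<>();

    /** Read one record from the input and emit it. */
    public void processOne(List<String> input) {
        output.add(input.get(offset));
        offset++;
    }

    public void checkpoint() {
        checkpointedOffset = offset;
    }

    /** Simulated crash recovery: rewind to the last checkpoint and
     *  discard uncommitted output emitted since then. */
    public void recover() {
        offset = checkpointedOffset;
        while (output.size() > checkpointedOffset) {
            output.remove(output.size() - 1);
        }
    }

    public List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d");
        CheckpointSketch job = new CheckpointSketch();
        job.processOne(input);
        job.processOne(input);
        job.checkpoint();                 // snapshot taken after "b"
        job.processOne(input);            // "c" processed, then we crash
        job.recover();                    // rewind to the checkpoint
        job.processOne(input);            // "c" replayed, not lost
        job.processOne(input);
        System.out.println(job.output()); // [a, b, c, d]
    }
}
```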
