Posted to user@flink.apache.org by Gabriele Di Bernardo <ga...@me.com> on 2017/07/18 19:14:29 UTC

Kafka control source in addition to Kafka data source

Hello everyone,

I am a Flink newcomer and I would like to implement a Flink application with two Kafka sources: one for the data stream to be processed and the other one for control purposes. The application should be able to read from the control stream and then apply the control operation to the data coming from the data stream. To be clearer, I would like to have something like: if the application reads from the control source a control operation with identifier 22, then it should apply a certain transformation to all the incoming data values that are marked with id 22.

I would like to ask you whether having two Kafka sources (one for the data and another for control purposes) is actually good practice. I would also like to ask whether you have any advice or suggestions on how to keep a queue of such active control operations.

Thank you so much. 

Best,


Gabriele


Re: Kafka control source in addition to Kafka data source

Posted by Gabriele Di Bernardo <ga...@me.com>.
Hi Konstantin, 

Thank you so much for your answer. Yes, I think this is exactly what I need.

Thank you.

Best,


Gabriele 

> On 18 Jul 2017, at 21:27, Konstantin Knauf <ko...@tngtech.com> wrote:


Re: Kafka control source in addition to Kafka data source

Posted by Konstantin Knauf <ko...@tngtech.com>.
Hi Gabriele,

I think this is actually a quite common pattern. Generally, you can
`connect` the two streams and then use a `CoFlatMapFunction`. A
`CoFlatMapFunction` allows you to keep shared (checkpointed) state
between the two streams. It has two callbacks, `flatMap1` and `flatMap2`,
which are called whenever a record from the respective stream arrives.
On each element of the control stream you update a data structure
(state) that holds which transformation to apply to which kind of
element. On each element of the data stream you apply the correct
transformation based on the current state of the operator.

Does this make sense to you? It would help if you shared a little bit
more about the use case. In particular, it would be relevant whether
both streams share a common key on which they can be partitioned.

Cheers,

Konstantin

On 18.07.2017 21:14, Gabriele Di Bernardo wrote:

-- 
Konstantin Knauf * konstantin.knauf@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082