Posted to user@flink.apache.org by Jessy Ping <te...@gmail.com> on 2021/03/16 11:37:21 UTC

Editing job graph at runtime

Hi Team,
Is it possible to edit the job graph at runtime? Suppose I want to add a new
sink to the Flink application at runtime, depending on specific parameters
in the incoming events. Can I edit the job graph of a running Flink
application?

Thanks
Jessy

Re: Editing job graph at runtime

Posted by Arvid Heise <ar...@apache.org>.
Hi,

Option 2 is to implement your own Source/Sink. Currently, we have the old,
discouraged interfaces alongside the new ones.

For the source, you want to read [1]. There is a KafkaSource already in 1.12
that we consider beta; you can replace it with the 1.13 version after the
1.13 release (it should be compatible). If that is too new for you, or you
are using 1.11 or earlier, you may also just implement your own
SourceFunction [2], quite possibly by copying the existing
KafkaSourceFunction and adjusting it. Inside your source function, you would
read from your main/control topic and pick up the additional topics as you
go. Your emitted values would then probably also contain the source topic
along with your payload. How you model your payload data types depends on
your use case; in general, dynamic types are harder to use, as you will
experience lots of runtime errors (misspelled field names, mismatched field
types, etc.). I can outline the options once you provide more details.
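
To make that concrete, here is a rough, untested sketch of such a source
function (class, topic, and field names are made up). It deliberately leaves
out everything the real Kafka connector handles for you (checkpointing,
offset management, exactly-once, parallelism) and only shows the
control-topic mechanics:

import java.time.Duration;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Emits (source topic, payload) pairs and grows its topic subscription
// whenever the control topic names a new topic to read.
public class DynamicTopicSource implements SourceFunction<Tuple2<String, String>> {

    private final Properties consumerProps; // bootstrap.servers, group.id, deserializers
    private final String controlTopic;
    private volatile boolean running = true;

    public DynamicTopicSource(Properties consumerProps, String controlTopic) {
        this.consumerProps = consumerProps;
        this.controlTopic = controlTopic;
    }

    @Override
    public void run(SourceContext<Tuple2<String, String>> ctx) throws Exception {
        Set<String> topics = new HashSet<>();
        topics.add(controlTopic);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(topics);
            while (running) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                    if (record.topic().equals(controlTopic)) {
                        // the control message payload is the name of a new topic to read
                        if (topics.add(record.value())) {
                            consumer.subscribe(topics); // takes effect on the next poll
                        }
                    } else {
                        // tag each event with its source topic so downstream can route it
                        synchronized (ctx.getCheckpointLock()) {
                            ctx.collect(Tuple2.of(record.topic(), record.value()));
                        }
                    }
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}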

With the sink, it's similar, although we still lack good documentation for
the new interface. I'd recommend having a look at the FLIP [3], which is
quite non-technical. Again, if it doesn't work for you, you can also go with
the old SinkFunctions. The implementation idea is similar: you start with an
existing sink (by copying or delegating) and write to additional topics as
you go. Again, dynamic types (different types for output topic1 and topic2?)
will cause you quite a few runtime bugs.
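
Correspondingly, an untested sketch of a sink function (again with made-up
names) that routes each record to the topic carried inside the record; note
that without flushing on checkpoints this is at-least-once at best:

import java.util.Properties;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Expects (target topic, payload) pairs; the upstream operator decides the topic.
public class DynamicTopicSink extends RichSinkFunction<Tuple2<String, String>> {

    private final Properties producerProps;
    private transient KafkaProducer<String, String> producer;

    public DynamicTopicSink(Properties producerProps) {
        this.producerProps = producerProps;
    }

    @Override
    public void open(Configuration parameters) {
        producer = new KafkaProducer<>(producerProps);
    }

    @Override
    public void invoke(Tuple2<String, String> value, Context context) {
        // f0 = target topic, f1 = payload
        producer.send(new ProducerRecord<>(value.f0, value.f1));
    }

    @Override
    public void close() {
        if (producer != null) {
            producer.close();
        }
    }
}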

If all types are the same, there are better options than going with custom
sources and sinks. Implementing them is an advanced topic, so I usually
discourage it unless you already have lots of experience. I can help in any
case, but then I also need to understand your use case better. (In general,
if you ask a technical question, you get a technical answer. If you ask on a
conceptual level, we may also offer architectural advice.)

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/sources.html
[2]
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API

Re: Editing job graph at runtime

Posted by Jessy Ping <te...@gmail.com>.
Hi Arvid,

Thanks for the reply.

I am currently exploring Flink's features, and we have certain use cases
where new producers will be added to the system dynamically; we don't want
to restart the application frequently.

It would be helpful if you could explain option 2 in detail.

Thanks & Regards
Jessy

Re: Editing job graph at runtime

Posted by Ejaskhan S <ia...@gmail.com>.
Thanks, Arvid, for the reply.

Can you please elaborate a little bit on option 2, if possible?

We are looking for a similar option. Currently, we are proceeding with
option 1.

Thanks, Jessy, for the question.

Re: Editing job graph at runtime

Posted by Arvid Heise <ar...@apache.org>.
Hi Jessy,

> Can I add a new sink into the execution graph at runtime, for example a
> new Kafka producer, without restarting the current application or using
> option 1?
>

No, there is currently no way to add a sink without a restart. Could you
elaborate on why a restart is not an option for you?

You can use option 2, which means that you implement one source and one sink
that dynamically read from or write to different topics, possibly by
wrapping the existing source and sink. This is a rather complex task that I
would not recommend to a new Flink user.

If you have a known set of possible sink topics, another option would be to
add all sinks from the get-go and only route messages dynamically with side
outputs. However, I'm not aware of such a pattern for sources, although with
the new source interface it should be possible.
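
To illustrate the side-output routing, here is a self-contained toy example
(the tags, the routing rule, and the print() calls standing in for pre-wired
Kafka sinks are all made up):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputRouting {

    // one OutputTag per known target; the anonymous subclass ({}) keeps type info
    static final OutputTag<String> TOPIC_A = new OutputTag<String>("topic-a") {};
    static final OutputTag<String> TOPIC_B = new OutputTag<String>("topic-b") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a:payload1", "b:payload2");

        SingleOutputStreamOperator<String> routed = events.process(
            new ProcessFunction<String, String>() {
                @Override
                public void processElement(String value, Context ctx, Collector<String> out) {
                    // route on the event content; everything else stays on the main output
                    if (value.startsWith("a:")) {
                        ctx.output(TOPIC_A, value);
                    } else if (value.startsWith("b:")) {
                        ctx.output(TOPIC_B, value);
                    } else {
                        out.collect(value);
                    }
                }
            });

        // in a real job, each side output gets its own pre-wired Kafka sink
        routed.getSideOutput(TOPIC_A).print("topic-a");
        routed.getSideOutput(TOPIC_B).print("topic-b");

        env.execute("side-output routing sketch");
    }
}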

Re: Editing job graph at runtime

Posted by Jessy Ping <te...@gmail.com>.
Hi Team,

Can you provide your thoughts on this? It would be helpful.

Thanks
Jessy

Re: Editing job graph at runtime

Posted by Jessy Ping <te...@gmail.com>.
Hi Timo/Team,
Thanks for the reply.

Just take the example from the following pseudocode. Suppose this is the
current application logic.

firstInputStream = addSource(...)   // Kafka consumer C1
secondInputStream = addSource(...)  // Kafka consumer C2

outputStream = firstInputStream.keyBy(a -> a.key)
    .connect(secondInputStream.keyBy(b -> b.key))
    .coProcessFunction(....)
// Logic determines whether a new sink should be added to the application.
// If not, the event is produced to the existing sink(s). If a new sink is
// required, produce the events to the existing sinks plus the new one.
sink1 = addSink(outputStream)  // Kafka producer P1
.
.
.
sinkN = addSink(outputStream)  // Kafka producer PN

Questions
--> Can I add a new sink into the execution graph at runtime, for example a
new Kafka producer, without restarting the current application or using
option 1?

--> (Option 2) What do you mean by adding a custom sink in the
coProcessFunction, and how will it change the execution graph?

Thanks
Jessy

Re: Editing job graph at runtime

Posted by Timo Walther <tw...@apache.org>.
Hi Jessy,

to be precise, the JobGraph is not used at runtime. It is translated
into an ExecutionGraph.

Nevertheless, such patterns are possible, but they require a bit of manual
implementation.

Option 1) You stop the job with a savepoint and restart the application
with slightly different parameters. If the pipeline has not changed much,
the old state can be remapped to the slightly modified job graph. This is
the easiest solution, but with the downside of maybe a couple of seconds of
downtime.

Option 2) You introduce a dedicated control stream (e.g. by using the
connect() DataStream API [1]). Either you implement a custom sink in the
main stream of the CoProcessFunction, or you enrich every record in the
main stream with sink parameters that are read by your custom sink
implementation.
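
As a rough, untested sketch of the control-stream variant (all names are
made up; a real job would use broadcast state so every parallel instance
sees the same control messages, plus a custom sink that interprets the
emitted (topic, payload) pairs):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class ControlStreamSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> mainStream = env.fromElements("event1", "event2");
        // each control message names a new target topic for the output
        DataStream<String> controlStream = env.fromElements("extra-topic");

        mainStream.connect(controlStream)
            .process(new CoProcessFunction<String, String, Tuple2<String, String>>() {
                // operator-local view of the active targets; use broadcast
                // state for a consistent view across parallel instances
                private final Set<String> targets =
                    new HashSet<>(Arrays.asList("default-topic"));

                @Override
                public void processElement1(String event, Context ctx,
                        Collector<Tuple2<String, String>> out) {
                    // enrich every main-stream record with its target topic(s)
                    for (String topic : targets) {
                        out.collect(Tuple2.of(topic, event));
                    }
                }

                @Override
                public void processElement2(String control, Context ctx,
                        Collector<Tuple2<String, String>> out) {
                    targets.add(control); // a control message registers a new target
                }
            })
            .print();

        env.execute("control stream sketch");
    }
}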

I hope this helps.

Regards,
Timo


[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/overview/#connect
