Posted to user@flink.apache.org by Bariša Obradović <bb...@gmail.com> on 2022/06/01 14:38:39 UTC

Is there an HA solution to run a Flink job with multiple sources

Hi,
we are running a Flink job with multiple Kafka sources connected to
different Kafka servers.

The problem we are facing is that when one of the Kafka clusters is down,
the Flink job starts restarting.
Is there any way for Flink to pause processing of the Kafka that is down,
yet continue processing from the other sources?

Cheers,
Barisa

Re: Is there an HA solution to run a Flink job with multiple sources

Posted by Bariša Obradović <bb...@gmail.com>.
Hi,
our use case is that the data sources are independent: we use Flink to
ingest data from Kafka sources, do a bit of filtering, and then write it
to S3.
Since we ingest from multiple independent Kafka sources, we consider them
all optional. Even if just one Kafka cluster is up and running, we would
like to process its data.

We use a single Flink job because we find it easier to manage fewer Flink
jobs, and that way we also use fewer resources.
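
Roughly, the wiring looks something like the sketch below, using the stock
KafkaSource connector and a FileSink writing to S3. The broker addresses,
topic, group id, and bucket path are placeholders; with the stock connector,
a cluster that is completely unreachable surfaces as source failures, which
is presumably what triggers the restarts described in the original post.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiKafkaToS3Job {

    // One source per independent Kafka cluster.
    private static KafkaSource<String> kafkaSource(String bootstrapServers, String topic) {
        return KafkaSource.<String>builder()
                .setBootstrapServers(bootstrapServers)
                .setTopics(topic)
                .setGroupId("flink-ingest")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);

        DataStream<String> a = env.fromSource(
                kafkaSource("kafka-a:9092", "events"), WatermarkStrategy.noWatermarks(), "kafka-a");
        DataStream<String> b = env.fromSource(
                kafkaSource("kafka-b:9092", "events"), WatermarkStrategy.noWatermarks(), "kafka-b");

        // The streams are independent; they are unioned only so a single job
        // can filter them and write everything to the same S3 sink.
        a.union(b)
                .filter(line -> !line.isEmpty())
                .sinkTo(FileSink
                        .forRowFormat(new Path("s3://my-bucket/ingest"),
                                new SimpleStringEncoder<String>("UTF-8"))
                        .build());

        env.execute("multi-kafka-to-s3");
    }
}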

So far, the idea from Xuyang seems doable to me; I'll explore
subclassing the existing Kafka source and making sure that the source can
keep functioning even while its Kafka cluster is down.
In essence, we would like to treat the situation where Kafka is down the
same as if Kafka were up but had no data.
The caveat I can think of is that we should add metrics and logs for when
Kafka is down, so we can still be aware of it if we need to be.
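
As a very rough sketch of what I have in mind (not the eventual
implementation): instead of subclassing the new KafkaSource directly, the
class below wraps the plain kafka-clients consumer in the legacy
SourceFunction API, swallows connection failures, emits nothing while the
broker is unreachable, and exposes the outage through a log line and a
counter metric. It does not checkpoint offsets, so on restart it falls back
to the consumer group's committed position; all names are illustrative.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OptionalKafkaSource extends RichSourceFunction<String> {

    private static final Logger LOG = LoggerFactory.getLogger(OptionalKafkaSource.class);

    // Must contain bootstrap.servers, group.id and key/value deserializers.
    private final Properties consumerProps;
    private final String topic;
    private volatile boolean running = true;
    private transient Counter brokerDownCounter;

    public OptionalKafkaSource(Properties consumerProps, String topic) {
        this.consumerProps = consumerProps;
        this.topic = topic;
    }

    @Override
    public void open(Configuration parameters) {
        // Metric so an unreachable broker is still visible on dashboards.
        brokerDownCounter = getRuntimeContext().getMetricGroup().counter("kafkaBrokerDown");
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(Collections.singletonList(topic));
                while (running) {
                    // An unreachable broker mostly shows up as empty polls, so the
                    // source keeps running; hard failures fall into the catch below.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        synchronized (ctx.getCheckpointLock()) {
                            ctx.collect(record.value());
                        }
                    }
                }
            } catch (Exception e) {
                // Treat a failing Kafka like an empty topic: log, count, back off, retry.
                LOG.warn("Kafka at {} unavailable, retrying",
                        consumerProps.get("bootstrap.servers"), e);
                brokerDownCounter.inc();
                Thread.sleep(5_000);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}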

Cheers,
Barisa

Re: Is there an HA solution to run a Flink job with multiple sources

Posted by Alexander Fedulov <al...@ververica.com>.
Hi Bariša,

The way I see it, either:
- you need data from all sources because you are doing some conjoint
processing, in which case stopping the pipeline is usually the right
thing to do, or
- the streams consumed from the different servers are not combined and
hence could be processed in independent Flink jobs.
Could you explain where specifically your situation does not fit one of
those two scenarios?

Best,
Alexander Fedulov


Re: Is there an HA solution to run a Flink job with multiple sources

Posted by Jing Ge <ji...@ververica.com>.
Hi Bariša,

Could you share the reason why your data processing pipeline should keep
running when one Kafka source is down?
It sounds as if every one of the Kafka sources is optional for the data
processing logic, because any of them could be the one that is down.

Best regards,
Jing

Re: Is there an HA solution to run a Flink job with multiple sources

Posted by Xuyang <xy...@163.com>.
I think you can try to use a custom source for this, so that even when one of the Kafka clusters is down the source operator keeps running (it just does nothing). The only trouble is that you need to manage the checkpointing and a few other things yourself. The good news is that you can copy the implementation of the existing Kafka source and change a small amount of code.
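
For the checkpointing part, a skeleton of what managing the offsets yourself
could look like in a hand-rolled SourceFunction-based source (as opposed to
copying the new KafkaSource, where the splits already carry the offsets):
snapshot a partition-to-offset map into operator ListState and restore it on
recovery. The Kafka polling loop is only indicated by a comment, and all
names are illustrative.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class CheckpointedOptionalKafkaSource extends RichSourceFunction<String>
        implements CheckpointedFunction {

    // partition -> next offset to read; only mutated under the checkpoint lock.
    private final Map<Integer, Long> offsets = new HashMap<>();
    private transient ListState<Map<Integer, Long>> offsetState;
    private volatile boolean running = true;

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        offsetState.clear();
        offsetState.add(new HashMap<>(offsets));
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        offsetState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>(
                        "kafka-offsets",
                        TypeInformation.of(new TypeHint<Map<Integer, Long>>() {})));
        for (Map<Integer, Long> restored : offsetState.get()) {
            offsets.putAll(restored);
        }
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Poll Kafka here as in a normal consumer loop, seek to the restored
            // offsets after (re)connecting, and for every record:
            //   synchronized (ctx.getCheckpointLock()) {
            //       ctx.collect(record.value());
            //       offsets.put(partition, recordOffset + 1);
            //   }
            Thread.sleep(1_000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}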


