Posted to dev@kafka.apache.org by Gunnar Morling <gu...@googlemail.com.INVALID> on 2021/11/26 08:08:37 UTC

Re: Handling retriable exceptions during Connect source task start

Hi all,

We have run into a similar situation in Debezium again, where it would be
desirable to retry after an exception during Task::start().

Is there any objection to implementing retry support for Task::start() in
Kafka Connect? Would it require a KIP?

Thanks,

--Gunnar


On Mon, Aug 9, 2021 at 10:47 AM Gunnar Morling <gunnar.morling@googlemail.com> wrote:

> Hi,
>
> To put the question differently: would there be interest in a pull request
> that implements retries when a RetriableException is thrown from the
> Task::start() method?
>
> Thanks,
>
> --Gunnar
>
>
> On Thu, Aug 5, 2021 at 10:27 PM Sergei Morozov <mo...@tut.by> wrote:
>
>> Hi,
>>
>> I'm trying to address an issue in Debezium (DBZ-3823
>> <https://issues.redhat.com/browse/DBZ-3823>) where a source connector task
>> cannot recover from a retriable exception.
>>
>> The root cause is that the task interacts with the source database during
>> SourceTask#start, but Kafka Connect doesn't treat retriable exceptions
>> thrown at this stage as retriable. KIP-298
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect>,
>> which originally introduced handling of retriable exceptions, doesn't
>> describe how exceptions during task start are handled, so it's unclear to
>> me whether they are disallowed by design or were just out of scope for
>> the KIP.
>>
>> My current working solution
>> <https://github.com/debezium/debezium/pull/2572> relies on the internal
>> Debezium implementation of the task restart, which introduces certain
>> risks (the details are in the PR description).
>>
>> The question is: are retriable exceptions during start disallowed by
>> design, so that a task must not throw them from start, or are they just
>> not currently supported by the Connect framework, so that I need to
>> implement proper error handling in the connector?
>>
>> Thanks!
>>
>> --
>> Sergei Morozov
>>
>

Re: Handling retriable exceptions during Connect source task start

Posted by Chris Egerton <ch...@confluent.io.INVALID>.
Hi Gunnar,

I think there's some risk in introducing this retry behavior if we end up
invoking Connector::start or Task::start on the same object multiple times.
Unexpected behavior may result, such as double-allocation of resources that
are initialized in the start method and which are meant to be released in
the stop method. An alternative could be to invoke stop on the object to
allow it to perform best-effort cleanup, then initialize an entirely new
Connector or Task instance, and invoke its start method.
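
A minimal sketch of the stop-then-recreate approach described above. It is
not how Connect works today: StartableTask and RetriableStartException are
local stand-ins for Connect's Task interface and RetriableException so that
the example compiles without Kafka on the classpath, and FreshInstanceStarter,
maxAttempts and backoffMs are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for org.apache.kafka.connect.errors.RetriableException (assumption:
// the real framework would catch that class instead).
class RetriableStartException extends RuntimeException {
    RetriableStartException(String message) { super(message); }
}

// Stand-in for the start/stop part of Connect's Task interface.
interface StartableTask {
    void start(Map<String, String> props);
    void stop();
}

// Hypothetical helper, not part of Kafka Connect.
final class FreshInstanceStarter {
    private FreshInstanceStarter() {}

    // Tries to start a task up to maxAttempts times. A failed instance is
    // stopped (best effort) and discarded; each attempt uses a new instance,
    // so start() is never invoked twice on the same object.
    static StartableTask startWithRetries(Supplier<StartableTask> factory,
                                          Map<String, String> props,
                                          int maxAttempts,
                                          long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RetriableStartException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            StartableTask task = factory.get();
            try {
                task.start(props);
                return task;
            } catch (RetriableStartException e) {
                last = e;
                try {
                    task.stop(); // let the failed instance release resources
                } catch (RuntimeException cleanupFailure) {
                    e.addSuppressed(cleanupFailure);
                }
                if (attempt < maxAttempts) {
                    pause(backoffMs);
                }
            }
        }
        throw last; // every attempt failed with a retriable exception
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

Non-retriable exceptions propagate immediately in this sketch, which keeps
the existing failure behavior for everything except the retriable case.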

As far as a KIP goes, I think one might be a good idea just to ensure that
the desired behavior is agreed upon and then protected as part of the
contract for the Connector/Task API. Otherwise, if this is implemented
without a KIP, we might just as easily roll things back or change the scope
of what constitutes a retriable exception without a KIP, which might be
frustrating for connector developers.

As a final note, if the approach proposed above (invoking stop on the
failed object, then reallocating a new one and invoking start on it) seems
reasonable, we might also consider using that kind of technique for a
general "automatic restart" feature that catches anything that currently
causes a connector or task to fail and tries to bring it back to life.

Cheers,

Chris

On Sun, Nov 28, 2021 at 10:26 PM Luke Chen <sh...@gmail.com> wrote:

> Hi Gunnar and Sergei,
> I think it would be good to have retriable exception handling during
> task#start.
>
> > are retriable exceptions during start disallowed by
> > design, and the task must not throw retriable exceptions during start, or
> > it's just currently not supported by the Connect framework and I just need
> > to implement proper error handling in the connector?
>
> > Would it require a KIP?
>
> Sorry, I'm not sure whether this is by design or simply not supported yet
> and still needs to be implemented.
> But I guess if you want to implement the error handling in the connector,
> you could either leverage the existing retry configuration or create a new
> one. Either way, I think it needs a small KIP since, as you mentioned,
> task#start is not covered in KIP-298. Besides, I think having a KIP first is
> good, to make sure you're on the right track before you get your hands
> dirty. A KIP discussion would also get more attention, I think. :)
>
> Thank you.
> Luke
>

Re: Handling retriable exceptions during Connect source task start

Posted by Luke Chen <sh...@gmail.com>.
Hi Gunnar and Sergei,
I think it would be good to have retriable exception handling during
task#start.

> are retriable exceptions during start disallowed by
> design, and the task must not throw retriable exceptions during start, or
> it's just currently not supported by the Connect framework and I just need
> to implement proper error handling in the connector?

> Would it require a KIP?

Sorry, I'm not sure whether this is by design or simply not supported yet
and still needs to be implemented.
But I guess if you want to implement the error handling in the connector,
you could either leverage the existing retry configuration or create a new
one. Either way, I think it needs a small KIP since, as you mentioned,
task#start is not covered in KIP-298. Besides, I think having a KIP first is
good, to make sure you're on the right track before you get your hands
dirty. A KIP discussion would also get more attention, I think. :)
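
A rough sketch of what connector-side handling could look like, retrying
inside the task's own start logic until a time budget runs out. The key names
errors.retry.timeout and errors.retry.delay.max.ms are the ones KIP-298
defines, but whether a task actually sees them in its start properties is an
assumption here (the connector would likely have to forward them). The names
TransientStartFailure and InStartRetry, and the 100 ms initial delay, are
hypothetical and only serve the illustration.

```java
import java.util.Map;

// Local stand-in for org.apache.kafka.connect.errors.RetriableException.
class TransientStartFailure extends RuntimeException {
    TransientStartFailure(String message) { super(message); }
}

// Hypothetical connector-side helper; not part of Kafka Connect or Debezium.
final class InStartRetry {
    private InStartRetry() {}

    // Reads a numeric setting, falling back to a default when the key is absent.
    static long longSetting(Map<String, String> props, String key, long fallback) {
        String raw = props.get(key);
        return raw == null ? fallback : Long.parseLong(raw.trim());
    }

    // Runs the action until it succeeds or timeoutMs has elapsed, doubling the
    // wait between attempts up to maxDelayMs. Returns the number of attempts.
    // A timeout of 0 means the first failure is rethrown without retrying.
    static int runWithRetries(Runnable action, long timeoutMs, long maxDelayMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        long delayMs = Math.min(100L, maxDelayMs); // initial delay is an arbitrary choice
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.run();
                return attempts;
            } catch (TransientStartFailure e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // time budget exhausted, give up
                }
                pause(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

A task could wrap its database connection setup in runWithRetries, using
longSetting to read the two values from the properties passed to start.
Unlike framework-level support, this keeps retrying on the same task
instance, so the wrapped action has to be safe to run more than once.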

Thank you.
Luke
