Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/12/09 17:02:42 UTC

[GitHub] [kafka] blcksrx opened a new pull request #11588: KAFKA-13485: Restart connectors after RetriableException raised from Task::start()

blcksrx opened a new pull request #11588:
URL: https://github.com/apache/kafka/pull/11588


   If a `RetriableException` is raised from `Task::start()`, no attempt is made to start that connector again; the restart functionality is currently implemented only for exceptions raised from `poll()`/`put()`. Triggering restarts upon failures during `start()` as well would be desirable, in order to ride out temporary failure conditions such as a network hiccup. If a connector establishes a database connection during `start()`, for instance, such a failure currently requires a manual restart of the affected tasks.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] C0urante commented on pull request #11588: KAFKA-13485: Restart connectors after RetriableException raised from Task::start()

Posted by GitBox <gi...@apache.org>.
C0urante commented on pull request #11588:
URL: https://github.com/apache/kafka/pull/11588#issuecomment-997964564


   @blcksrx did you catch the [mailing list thread](https://www.mail-archive.com/dev@kafka.apache.org/msg120391.html) about this? I left this comment on there:
   
   > I think there's some risk of introducing this retry behavior if we end up invoking Connector::start or Task::start on the same object multiple times. Unexpected behavior may result, such as double-allocation of resources that are initialized in the start method and which are meant to be released in the stop method. An alternative could be to invoke stop on the object to allow it to perform best-effort cleanup, then initialize an entirely new Connector or Task instance, and invoke its start method.
   
   It's worth keeping in mind that some connectors may throw `RetriableException`s from `start` right now but not handle this case properly. If we add this behavior now and someone upgrades their worker to a version with this change, that kind of connector being restarted in a loop may end up crippling their worker.
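
A minimal sketch of the stop-then-reinitialize alternative described in the quoted comment, assuming a bounded number of attempts. The names `StartableTask`, `RetriableStartException`, and `FreshInstanceStarter` are hypothetical stand-ins for illustration and are not part of the Kafka Connect API or of this pull request:

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a retriable failure; not the Connect RetriableException.
class RetriableStartException extends RuntimeException {
    RetriableStartException(String message) { super(message); }
}

// Hypothetical stand-in for the start/stop part of a task's lifecycle.
interface StartableTask {
    void start(Map<String, String> props);
    void stop();
}

final class FreshInstanceStarter {
    private FreshInstanceStarter() {}

    /**
     * Tries to start a task at most maxAttempts times. start() is never
     * called twice on the same object: after a retriable failure the failed
     * instance is stopped (best effort) and a new one is created.
     */
    static StartableTask startWithFreshInstances(
            Supplier<StartableTask> factory,
            Map<String, String> props,
            int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RetriableStartException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            StartableTask task = factory.get(); // fresh instance per attempt
            try {
                task.start(props);
                return task;
            } catch (RetriableStartException e) {
                last = e;
                try {
                    task.stop(); // let the failed instance release what it allocated
                } catch (RuntimeException cleanupFailure) {
                    e.addSuppressed(cleanupFailure);
                }
            }
        }
        throw last; // attempts exhausted; surface the last retriable failure
    }
}
```

Capping the attempts is one way to keep a connector that always throws from `start` from looping forever; non-retriable exceptions propagate on the first occurrence.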
   
   
   
   On a separate note, with these changes, what would happen if a task were stuck in a retry loop, but then scheduled for shutdown (because of rebalance, deletion of the connector, reconfiguration, etc.)? If the answer is "the task will keep retrying until `start` either fails with a non-retriable error or succeeds" then we may want to refine the logic a little bit in order to avoid accruing zombie tasks in that situation.
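
One way to avoid the zombie-task situation raised above is to make the retry loop observe a stop signal, including while it is backing off. The following is a sketch under that assumption; `CancellableStartLoop` and `TransientStartFailure` are hypothetical names, not classes from Kafka Connect or from this pull request:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a retriable failure raised from start().
class TransientStartFailure extends RuntimeException {
    TransientStartFailure(String message) { super(message); }
}

final class CancellableStartLoop {
    // Counted down once when shutdown is requested (rebalance, deletion, reconfiguration, ...).
    private final CountDownLatch stopRequested = new CountDownLatch(1);

    /** Called by the (hypothetical) worker when the task is scheduled for shutdown. */
    void requestStop() {
        stopRequested.countDown();
    }

    /**
     * Retries startAction until it succeeds or a stop is requested.
     * Returns true if the action succeeded, false if the loop was abandoned.
     * Non-retriable exceptions propagate to the caller.
     */
    boolean runUntilStartedOrStopped(Runnable startAction, long backoffMs) {
        while (stopRequested.getCount() > 0) {
            try {
                startAction.run();
                return true;
            } catch (TransientStartFailure e) {
                try {
                    // await() returns true as soon as requestStop() is called,
                    // so a shutdown cuts the backoff short instead of waiting it out.
                    if (stopRequested.await(backoffMs, TimeUnit.MILLISECONDS)) {
                        return false;
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

Waiting on the latch rather than sleeping means the loop gives up promptly once a stop is requested, instead of continuing until `start` either succeeds or fails with a non-retriable error.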





[GitHub] [kafka] blcksrx commented on pull request #11588: KAFKA-13485: Restart connectors after RetriableException raised from Task::start()

Posted by GitBox <gi...@apache.org>.
blcksrx commented on pull request #11588:
URL: https://github.com/apache/kafka/pull/11588#issuecomment-998130538


   You are absolutely right. Maybe it's best to also consider a retryMaxTimeout alongside that. It also makes sense to invoke stop in that case, but I'm not sure about the re-initialisation.
   In addition, in the case of rebalancing it tries again, and on that point you are right.

