You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/07 00:45:00 UTC

[jira] [Commented] (KAFKA-9481) Improve TaskMigratedException handling on Stream thread

    [ https://issues.apache.org/jira/browse/KAFKA-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032042#comment-17032042 ] 

ASF GitHub Bot commented on KAFKA-9481:
---------------------------------------

guozhangwang commented on pull request #8058: KAFKA-9481: Graceful handling TaskMigrated and TaskCorrupted
URL: https://github.com/apache/kafka/pull/8058
 
 
   1. Removed task field from TaskMigrated; the only caller that encodes a task id from StreamTask actually do not throw so we only log it. To handle it on StreamThread we just always enforce rebalance (and we would call onPartitionsLost to remove all tasks as dirty).
   2. Added TaskCorruptedException with a set of task-ids. The first scenario of this is the restoreConsumer.poll which throws InvalidOffset indicating that the logs are truncated / compacted. To handle it on StreamThread we first close the corresponding tasks as dirty (if EOS is enabled we would also wipe out the state stores), and then revive them into the CREATED state.
   3. Re-enabled the unit test.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Improve TaskMigratedException handling on Stream thread
> -------------------------------------------------------
>
>                 Key: KAFKA-9481
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9481
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>
> Today we handle TaskMigratedException as one-task at a time, when 1) producer got fenced, 2) consumer got fenced, 3) adding records to closed tasks.
> When 1) and 2) happens, all tasks hosted by that thread should have migrated; and for 3) it only happens when we are closing a task but not clearing its corresponding record buffer.
> So a better exception handling is first better fixing 3) to also clear the record buffer when closing a task (clean or dirty), and then for 1/2) we can always treat it as all-tasks-are-migrated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)