
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/03/09 17:09:26 UTC

[GitHub] [druid] jerchung opened a new issue #9486: Constant pauses and resumes for Kafka Indexing Service Tasks on empty topics when intermediateHandoffPeriod is configured

URL: https://github.com/apache/druid/issues/9486
 
 
   For Kafka Supervisor Specs that are configured with an `intermediateHandoffPeriod`, tasks can end up being paused and resumed constantly if the partitions assigned to the task receive no events within the `intermediateHandoffPeriod`.
   
   ### Affected Version
   
   0.17.0
   
   ### Description
   The run loop of the `SeekableStreamIndexTaskRunner` has a check that publishes a checkpoint for the assigned metadata once the current time has passed `nextCheckpointTime`.
   
   https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L766
   
   This `nextCheckpointTime` is set by the [`resetNextCheckpointTime` method](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1759) at the [initialization of the task](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L272), and [every time the task is resumed](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L1707).
   
   However, when the latest offsets in the assigned partitions match the start offsets of the task (i.e. when the assigned partitions do not receive any events), the task is resumed but the `resetNextCheckpointTime` method is never called. This means that `nextCheckpointTime` stays at the value set at initialization, and the [checkpoint interval check](https://github.com/apache/druid/blob/a6776648112917b72c077ba3ac0cb7f61993a2d0/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java#L765) passes on every iteration, causing the task to pause and resume itself for checkpointing over and over again.
   
   I would imagine a naive fix would be to ensure that the checkpoint time is still moved forward even when the endOffsets are the same as the starting offsets, but I'm not familiar enough with the code to understand the ramifications of such a change.
   
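The behavior described above, and the effect of the naive fix, can be illustrated with a small stand-alone model. This is only a sketch and not Druid's code: the class `CheckpointLoopSketch`, its `tick`, `resume`, and `simulate` methods, the `resetOnEveryResume` flag, the one-second tick, and the 10-second handoff period are all hypothetical names and values chosen for illustration. The only parts taken from the report are the idea of a `nextCheckpointTime` that is set at initialization and that is not reset on resume when the end offsets equal the start offsets.

```java
import java.time.Duration;

// Hypothetical, simplified model of the loop described in this issue.
// It is NOT the SeekableStreamIndexTaskRunner implementation.
final class CheckpointLoopSketch {
    private final long handoffPeriodMillis;
    private final boolean resetOnEveryResume; // true models the "naive fix"
    private long nextCheckpointTime;
    private int pauseCount = 0;

    CheckpointLoopSketch(Duration intermediateHandoffPeriod, long startMillis, boolean resetOnEveryResume) {
        this.handoffPeriodMillis = intermediateHandoffPeriod.toMillis();
        this.resetOnEveryResume = resetOnEveryResume;
        // Set once at initialization, as described in the report.
        this.nextCheckpointTime = startMillis + handoffPeriodMillis;
    }

    // One iteration of the modeled run loop at time nowMillis.
    void tick(long nowMillis, long startOffset, long latestOffset) {
        if (nowMillis > nextCheckpointTime) {
            pauseCount++; // the task pauses itself to checkpoint
            resume(nowMillis, startOffset, latestOffset);
        }
    }

    // Models the resume path: the checkpoint time only moves forward when
    // the offsets advanced, unless the naive fix is enabled.
    private void resume(long nowMillis, long startOffset, long endOffset) {
        boolean offsetsAdvanced = endOffset != startOffset;
        if (offsetsAdvanced || resetOnEveryResume) {
            nextCheckpointTime = nowMillis + handoffPeriodMillis;
        }
    }

    // Simulates an empty topic (latest offset == start offset), ticking once
    // per simulated second, and returns how often the task paused.
    static int simulate(boolean resetOnEveryResume, int seconds) {
        CheckpointLoopSketch task =
            new CheckpointLoopSketch(Duration.ofSeconds(10), 0L, resetOnEveryResume);
        for (int i = 1; i <= seconds; i++) {
            task.tick(i * 1000L, 0L, 0L);
        }
        return task.pauseCount;
    }

    public static void main(String[] args) {
        System.out.println("pauses without reset: " + simulate(false, 60));
        System.out.println("pauses with reset:    " + simulate(true, 60));
    }
}
```

In this model, once the first handoff period has elapsed, the variant without the reset pauses on every subsequent tick, while the variant that always moves the checkpoint time forward pauses roughly once per period. Whether always resetting is safe in the real task runner is exactly the open question raised above.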
