Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/02/15 21:40:52 UTC

[GitHub] jihoonson commented on issue #6001: Segment publishing order should be preserved in kafka indexing service

URL: https://github.com/apache/incubator-druid/issues/6001#issuecomment-464211404
 
 
   @gianm that sounds like a possible option, but I'd rather not block stream ingestion if we can avoid it. I think we can avoid blocking by adding a new signal that the supervisor sends to tasks to tell them to start publishing segments. The modified algorithm I have in mind is:
   
   1. Checkpointing is initiated by the task or the supervisor.
   2. The supervisor sends a `pause` request to all tasks in the same taskGroup.
   3. Tasks send their current offsets to the supervisor.
   4. The supervisor finds the maximum offset for each partition and sends these offsets to all tasks in the taskGroup.
   5. Tasks resume and read up to the updated endOffsets. They generate and push segments, but _**don't publish them immediately**_. Instead, each task adds the set of pushed segments to its own queue and sends a publish request to the supervisor.
   6. The supervisor adds the publish request to a queue. Each publish request contains a taskGroup ID.
   7. If this is the first request, or if the previous publish has finished, the supervisor pops a request from the queue and signals the tasks of that taskGroup to start publishing.
   8. The tasks pop a set of segments from their queue, publish them, and tell the supervisor that publishing has finished.
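Steps 2-4 boil down to taking, for each partition, the maximum offset reported by the paused tasks of a taskGroup. A minimal sketch, assuming each task reports its current offsets as a plain `{partition: offset}` map; `compute_end_offsets` is a hypothetical name for illustration, not Druid's actual supervisor code:

```python
from typing import Dict, List

# Assumed representation: a paused task reports its current offsets as
# {partition_id: offset} (step 3). This is illustrative, not Druid's API.
Offsets = Dict[int, int]


def compute_end_offsets(reported: List[Offsets]) -> Offsets:
    """Return the per-partition maximum offset across all tasks in a taskGroup.

    The supervisor would send this map back to every task (step 4) so that
    all replicas resume and stop reading at the same endOffsets.
    """
    if not reported:
        raise ValueError("no offsets reported by the taskGroup")
    end_offsets: Offsets = {}
    for task_offsets in reported:
        for partition, offset in task_offsets.items():
            # Keep the furthest offset any replica has reached for this partition.
            end_offsets[partition] = max(offset, end_offsets.get(partition, offset))
    return end_offsets


if __name__ == "__main__":
    replica_a = {0: 120, 1: 87}
    replica_b = {0: 118, 1: 93}
    print(compute_end_offsets([replica_a, replica_b]))  # {0: 120, 1: 93}
```

Taking the maximum means no replica is asked to stop at an offset it has already read past.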
   
   So, steps 1-4 are the same as the current algorithm, while steps 5-8 guarantee the publish order across taskGroups. What do you think?
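The ordering guarantee in steps 6-8 amounts to a FIFO queue on the supervisor that lets only one taskGroup publish at a time. A minimal sketch, assuming a hypothetical `signal_publish(task_group_id)` callback stands in for the proposed "start publishing" signal, and that a taskGroup's completion arrives as a single event (how replica completions are aggregated is left out). The class and method names are illustrative only, not an existing Druid API:

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class PublishCoordinator:
    """Hypothetical supervisor-side queue that serializes publishes across taskGroups."""

    def __init__(self, signal_publish: Callable[[int], None]) -> None:
        # Assumed callback standing in for the new supervisor -> task signal.
        self._signal_publish = signal_publish
        self._pending: Deque[int] = deque()  # queued taskGroup IDs, in arrival order
        self._in_flight: Optional[int] = None  # taskGroup currently publishing

    def on_publish_request(self, task_group_id: int) -> None:
        # Step 6: enqueue the request, which carries the taskGroup ID.
        self._pending.append(task_group_id)
        self._maybe_start_next()

    def on_publish_finished(self, task_group_id: int) -> None:
        # End of step 8: the tasks report completion, so the next request can start.
        if task_group_id != self._in_flight:
            raise ValueError(f"unexpected completion from taskGroup {task_group_id}")
        self._in_flight = None
        self._maybe_start_next()

    def _maybe_start_next(self) -> None:
        # Step 7: signal the next taskGroup only if nothing is publishing right now.
        if self._in_flight is None and self._pending:
            self._in_flight = self._pending.popleft()
            self._signal_publish(self._in_flight)


if __name__ == "__main__":
    signalled: List[int] = []
    coordinator = PublishCoordinator(signalled.append)
    coordinator.on_publish_request(1)   # taskGroup 1 is signalled immediately
    coordinator.on_publish_request(0)   # queued behind taskGroup 1
    coordinator.on_publish_request(1)   # taskGroup 1's next segment set, queued
    coordinator.on_publish_finished(1)  # now taskGroup 0 is signalled
    coordinator.on_publish_finished(0)  # then taskGroup 1 again
    print(signalled)  # [1, 0, 1]
```

Because tasks keep reading while their pushed segments wait in the queue, ingestion is not blocked; only the metadata publish is serialized.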
