Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/08 00:12:46 UTC

[GitHub] jihoonson opened a new issue #6124: KafkaIndexTask can delete published segments on restart

URL: https://github.com/apache/incubator-druid/issues/6124
 
 
   This can happen in the following scenario.
   
   1. A Kafka index task starts publishing segments.
   2. The task succeeds in publishing the segments but is stopped immediately afterward (e.g., by a machine restart).
   3. On restart, the task restores all sequences it kept in memory before it was stopped.
   4. After reading some more events from Kafka, the task tries to publish segments again. Because the restored sequences still contain the already-published segments, the new publish attempt includes them.
   5. Since those segments are already stored in the metadata store, the publish fails.
   6. The set of published segments in the metadata store differs from the set the task is trying to publish, because the task has read more data in the meantime.
   7. The task concludes that the publish genuinely failed and removes the already-published segments from deep storage.
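   The steps above can be sketched as follows. This is a minimal illustration of the faulty failure check, not actual Druid code; all names (`metastore`, `attempted`, `looksLikeRealFailure`) are hypothetical:
   
   ```java
   import java.util.HashSet;
   import java.util.Set;
   
   // Hypothetical sketch of the scenario described above (not Druid's real code).
   public class PublishCheckSketch {
       public static void main(String[] args) {
           // Segments already committed to the metadata store before the restart.
           Set<String> metastore = new HashSet<>(Set.of("seg_1", "seg_2"));
   
           // After restoring and reading more Kafka events, the restored
           // sequences make the task attempt to publish the old segments
           // plus a new one.
           Set<String> attempted = Set.of("seg_1", "seg_2", "seg_3");
   
           // The insert fails: seg_1 and seg_2 already exist in the metastore.
           boolean insertFailed = attempted.stream().anyMatch(metastore::contains);
   
           // Failure-recovery check: "does the metastore contain exactly the
           // segments we tried to publish?" The sets differ (seg_3 is missing),
           // so the task wrongly concludes the publish truly failed and deletes
           // ALL attempted segments from deep storage -- including the live
           // seg_1 and seg_2.
           boolean looksLikeRealFailure = insertFailed && !metastore.equals(attempted);
           System.out.println(looksLikeRealFailure); // prints "true"
       }
   }
   ```
   
   The check conflates "another process published our exact segments" (safe to treat as success) with "our publish partially overlaps what is already committed" (the case here), which is what makes the deletion destructive.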
