Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/31 08:29:24 UTC

[GitHub] dyanarose opened a new issue #6968: Kafka Indexing task pausing forever if no data received in intermediateHandoffPeriod

URL: https://github.com/apache/incubator-druid/issues/6968
 
 
   Druid: 0.13.0
   Kafka: 1.1.1
   
   I've been running a number of tests locally with Kafka indexing, and I believe something similar to what this commit was meant to fix is still occurring: https://github.com/apache/incubator-druid/commit/638f50cb52c248f4408975d5fc7762cc9ce82d8e
   
   I've set `intermediateHandoffPeriod` to a low value (PT5M) while testing, to see what handoffs and shards will look like.
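
For anyone trying to reproduce this, here is a rough sketch of where the setting sits. It assumes `intermediateHandoffPeriod` belongs in the `tuningConfig` of the Kafka supervisor spec; the helper name `tuning_config` is made up for illustration, and this is not my full spec (the `dataSchema` and `ioConfig` sections are left out).

```python
import json


def tuning_config(handoff_period="PT5M"):
    """Return a hypothetical tuningConfig fragment for a Kafka supervisor spec.

    Only the field discussed in this issue is set; every other tuning
    option is left at its default by omitting it.
    """
    return {
        "type": "kafka",
        # ISO 8601 period; PT5M is the low test value used in this report.
        "intermediateHandoffPeriod": handoff_period,
    }


if __name__ == "__main__":
    # Print the fragment as it would appear inside the supervisor spec JSON.
    print(json.dumps({"tuningConfig": tuning_config()}, indent=2))
```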
   
   If I send data more often than every 5 minutes, the task continues indexing. However, if no data arrives during an intermediate handoff period, I see this logged by the task:
   
   ```
   2019-01-30T14:18:16,501 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_testTopic_881471c82f88076_caeelnij]: CheckPointDataSourceMetadataAction{supervisorId='testTopic', baseSequenceName='index_kafka_testTopic_881471c82f88076', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='testTopic', partitionOffsetMap={0=3313}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='testTopic', partitionOffsetMap={0=3313}}}}
   2019-01-30T14:18:16,501 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_testTopic_881471c82f88076_caeelnij] to overlord: [CheckPointDataSourceMetadataAction{supervisorId='testTopic', baseSequenceName='index_kafka_testTopic_881471c82f88076', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='testTopic', partitionOffsetMap={0=3313}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='testTopic', partitionOffsetMap={0=3313}}}}].
   2019-01-30T14:18:16,507 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed
   2019-01-30T14:18:16,512 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed
   2019-01-30T14:18:16,520 WARN [qtp873134840-79] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Ignoring duplicate request, end offsets already set for sequences [[SequenceMetadata{sequenceName='index_kafka_testTopic_881471c82f88076_2', sequenceId=2, startOffsets={0=3313}, endOffsets={0=9223372036854775807}, assignments=[0], sentinel=false, checkpointed=false}]]
   ```
   After this, any further data sent to the stream is not indexed. If the supervisor is suspended and resumed, the task starts reading and indexing the Kafka stream from the last checkpoint, and no data loss is seen.
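
The suspend/resume workaround can be scripted against the Overlord. This is a minimal sketch, assuming the Overlord is reachable at `http://localhost:8090` and that the supervisor suspend and resume endpoints are exposed under `/druid/indexer/v1/supervisor/<supervisorId>/`. The function names are my own, so adjust the host and supervisor ID for your cluster.

```python
import urllib.parse
import urllib.request

# Assumption: Overlord address for a local test cluster.
OVERLORD = "http://localhost:8090"


def supervisor_request(supervisor_id, action, base_url=OVERLORD):
    """Build (but do not send) a POST request that suspends or resumes a supervisor."""
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    quoted_id = urllib.parse.quote(supervisor_id, safe="")
    url = f"{base_url}/druid/indexer/v1/supervisor/{quoted_id}/{action}"
    # An empty body with method POST; the endpoints take no payload here.
    return urllib.request.Request(url, data=b"", method="POST")


def bounce_supervisor(supervisor_id, base_url=OVERLORD):
    """Suspend, then resume, the supervisor; return the two HTTP status codes."""
    statuses = []
    for action in ("suspend", "resume"):
        req = supervisor_request(supervisor_id, action, base_url)
        with urllib.request.urlopen(req) as resp:
            statuses.append(resp.status)
    return statuses
```

Calling `bounce_supervisor("testTopic")` would send both requests in order. This only works around the stuck task and does not address why it stays paused.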
