Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/24 05:16:01 UTC

[GitHub] [druid] maytasm opened a new issue #9763: Kinesis supervisor is showing unhealthy and task are not running when one or more partition is empty

maytasm opened a new issue #9763:
URL: https://github.com/apache/druid/issues/9763


   
   ### Affected Version
   0.18.0
   
   ### Description
   
   There is no problem when an index task is running and polling from a Kafka/Kinesis stream with one or more empty shards (as tested in KafkaIndexTaskTest.java and KinesisIndexTaskTest.java). The problem for Kinesis described in the title occurs when we try to get the sequence number in SeekableStreamSupervisor#getOffsetFromStorageForPartition and Kinesis has one or more empty shards (as tested in KinesisRecordSupplierTest.java and SeekableStreamSupervisorStateTest.java). More specifically, this happens under the following conditions:
   
   - we don't have a startingOffset (first run, or we had some previous failures and reset the sequences) and don't have an offset in the metadata store, so we retrieve the latest or earliest Kinesis sequence number
   
   - we don't have a startingOffset (first run, or we had some previous failures and reset the sequences) and we have an offset in the metadata store, but skipSequenceNumberAvailabilityCheck=false
   
   Currently, in SeekableStreamSupervisor#getOffsetFromStorageForPartition, each time we use a ShardIterator to get records, we get back a new iterator to continue reading where we left off. However, a valid ShardIterator is returned whether or not we have already reached the end of the stream: as long as the shard is open, any call to GetRecords with a valid (unexpired) ShardIterator returns a valid, non-null NextShardIterator. Hence, for an empty shard we keep getting new ShardIterators until we time out and throw an ISE, which results in the Kinesis supervisor showing as unhealthy.
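
The behaviour described above can be reproduced with a small simulation. This is only a sketch: `FakeGetRecordsResult`, `FakeEmptyOpenShard` and `IteratorLoopDemo` are hypothetical stand-ins, not the AWS SDK or Druid classes, and a fixed attempt budget takes the place of the real wall-clock timeout so the example stays deterministic.

```java
// Hypothetical stand-in for a Kinesis GetRecords response; not the AWS SDK type.
final class FakeGetRecordsResult {
    final int recordCount;           // number of records returned by this call
    final String nextShardIterator;  // non-null for as long as the shard is open

    FakeGetRecordsResult(int recordCount, String nextShardIterator) {
        this.recordCount = recordCount;
        this.nextShardIterator = nextShardIterator;
    }
}

// Hypothetical open shard that contains no records.
final class FakeEmptyOpenShard {
    private int calls = 0;

    FakeGetRecordsResult getRecords(String shardIterator) {
        calls++;
        // An open shard hands back a fresh iterator even when there is nothing to read.
        return new FakeGetRecordsResult(0, "iterator-" + calls);
    }

    int calls() {
        return calls;
    }
}

final class IteratorLoopDemo {
    /**
     * Polls until a record shows up, the iterator becomes null, or the attempt
     * budget runs out. Returns true if a record was found and false if the
     * iterator became null. Throws IllegalStateException when the budget is
     * exhausted, which stands in for the timeout described in the issue.
     */
    static boolean pollForFirstRecord(FakeEmptyOpenShard shard, int maxAttempts) {
        String iterator = "iterator-0";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (iterator == null) {
                return false;
            }
            FakeGetRecordsResult result = shard.getRecords(iterator);
            if (result.recordCount > 0) {
                return true;
            }
            // Never null for an open shard, so this alone cannot end the loop.
            iterator = result.nextShardIterator;
        }
        throw new IllegalStateException(
            "Timed out fetching sequence number after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        FakeEmptyOpenShard shard = new FakeEmptyOpenShard();
        try {
            pollForFirstRecord(shard, 5);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " (" + shard.calls() + " calls made)");
        }
    }
}
```

With an empty but open shard, every attempt is spent and the loop can only end through the exception.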
   
   We should detect when a Kinesis shard is empty rather than relying on the timeout.
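
One possible way to do this, which the issue itself does not prescribe, is to look at the MillisBehindLatest field of the GetRecords response: Kinesis reports zero there when the read position has caught up with the tip of the shard. The sketch below classifies a single response on that basis. `ShardReadResult` and `EmptyShardCheck` are hypothetical names that model only the three response fields used here; they are not the AWS SDK or Druid types.

```java
// Hypothetical model of the GetRecords response fields this check needs.
final class ShardReadResult {
    final int recordCount;           // number of records in the response
    final long millisBehindLatest;   // 0 means the read position is at the tip of the shard
    final String nextShardIterator;  // null only once a closed shard has been fully read

    ShardReadResult(int recordCount, long millisBehindLatest, String nextShardIterator) {
        this.recordCount = recordCount;
        this.millisBehindLatest = millisBehindLatest;
        this.nextShardIterator = nextShardIterator;
    }
}

final class EmptyShardCheck {
    enum Outcome { HAS_RECORDS, SHARD_CLOSED, CAUGHT_UP_EMPTY, KEEP_POLLING }

    /** Decides what a polling loop should do after one GetRecords call. */
    static Outcome classify(ShardReadResult result) {
        if (result.recordCount > 0) {
            return Outcome.HAS_RECORDS;      // a sequence number is available
        }
        if (result.nextShardIterator == null) {
            return Outcome.SHARD_CLOSED;     // nothing further can be read
        }
        if (result.millisBehindLatest == 0) {
            return Outcome.CAUGHT_UP_EMPTY;  // at the tip with no records: stop polling
        }
        return Outcome.KEEP_POLLING;         // empty response, but not yet at the tip
    }

    public static void main(String[] args) {
        System.out.println(classify(new ShardReadResult(0, 0L, "iterator-1")));
    }
}
```

A polling loop could stop as soon as it sees `CAUGHT_UP_EMPTY` instead of waiting for the timeout. The `KEEP_POLLING` case matters because a GetRecords call can return no records while still being behind the tip, so an empty response on its own is not proof that the shard is empty.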
   

