You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/11/01 21:22:57 UTC

[GitHub] [incubator-druid] SEKIRO-J commented on issue #8748: Use earliest offset on kafka newly discovered partitions

SEKIRO-J commented on issue #8748: Use earliest offset on kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#issuecomment-548955564
 
 
   > Does this really solve the original bug? It adds the partition to partitionIds, so it's detected, which is an improvement; the fact that this isn't happening now seems to be a regression. But this patch does not seem to do anything to force the system to read from the earliest offset for this newly-discovered partition. I skimmed through the tests and didn't see a test for this case either.
   
   
   The root cause for the original bug was:
   newly added partition in kafka never recognized as "newly added", so it won't be using `useEarliestOffset==true`, the reason why this is happening is:
   https://github.com/apache/incubator-druid/blob/53094e159db4b04f0b5b356e064b011d66bad21b/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L448 never gets updated, so https://github.com/apache/incubator-druid/blob/53094e159db4b04f0b5b356e064b011d66bad21b/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L1884 `initialPartitionDiscovery ` is always true and `subsequentlyDiscoveredPartitions` never updates.
   So all the "newly discovered partitions" are regarded as "partitions when the supervisor first run", which reads `earliestOffset` from config. 
   
   The code for using `useEarliestOffset==true` for newly discovered partitions already exists:
   https://github.com/apache/incubator-druid/blob/53094e159db4b04f0b5b356e064b011d66bad21b/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L2860,
   So the solution to this bug just needs to update `partitionIds`, so that `subsequentlyDiscoveredPartitions ` can be updated.
   
   
   I have a test case to test this behavior, but due to some reasons, like "metadata.max.age.ms" is set to 10s and immutable, and other system environment things. The regression test could be flaky,  so I just wrote a unit test to test the function I added. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org