Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/29 22:55:43 UTC

[GitHub] jihoonson opened a new issue #6952: ServerSelector should always return the best segment

jihoonson opened a new issue #6952: ServerSelector should always return the best segment 
URL: https://github.com/apache/incubator-druid/issues/6952
 
 
   In Druid, `DataSegment` is a container for segment metadata. The `DataSegment` class is immutable, but segment metadata is effectively mutable in that some fields (like `size` or `loadSpec`) are filled in later than others. Once they are set, they never change.
   
   `ServerSelector` picks the servers that process a segment. Since both realtime tasks and historicals can serve the same segment, the `DataSegment` objects they announce can carry different amounts of information. Usually, a segment announced by a realtime task doesn't have `size` and `loadSpec`. So, we can say that the segment announced by historicals is _better_ than the one announced by realtime tasks, since it has more information.
   
   To handle this difference, `ServerSelector` currently updates its `segment` reference whenever a new server announces the segment, regardless of the server's type. This works eventually, because once all realtime tasks have finished it ends up holding only the reference announced by historicals, but it would be better to make this happen earlier.
   
   To do this, `ServerSelector` should always return the best segment: it should return the segment from historicals if any historical is serving the segment. Otherwise, it should return the segment from realtime tasks. This means it should also handle the case where all historicals disappear.
   
   I also find it strange that `ServerSelector` always returns a non-null `DataSegment` even when all servers have disappeared. It should probably return null in that case.
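
The selection rule proposed above (prefer a historical's segment, fall back to a realtime task's, return null once no server is left) could be sketched roughly as follows. This is only an illustration under simplifying assumptions: `ServerKind`, `SegmentInfo`, and `BestSegmentSelector` are hypothetical stand-ins, not Druid's actual classes, and the real `ServerSelector` tracks more state than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; not Druid's real types.
enum ServerKind { HISTORICAL, REALTIME }

final class SegmentInfo {
    final String id;
    final long size;            // 0 when the announcer does not know the size yet
    final boolean hasLoadSpec;  // false when loadSpec has not been filled in

    SegmentInfo(String id, long size, boolean hasLoadSpec) {
        this.id = id;
        this.size = size;
        this.hasLoadSpec = hasLoadSpec;
    }
}

final class BestSegmentSelector {
    // Segment metadata as announced by each server, keyed by server name
    // and split by server type.
    private final Map<String, SegmentInfo> historicals = new HashMap<>();
    private final Map<String, SegmentInfo> realtimes = new HashMap<>();

    synchronized void addServer(String serverName, ServerKind kind, SegmentInfo segment) {
        if (kind == ServerKind.HISTORICAL) {
            historicals.put(serverName, segment);
        } else {
            realtimes.put(serverName, segment);
        }
    }

    synchronized void removeServer(String serverName) {
        historicals.remove(serverName);
        realtimes.remove(serverName);
    }

    // Returns the "best" segment: one announced by a historical if any
    // historical still serves it, otherwise one announced by a realtime
    // task, otherwise null because no server is left.
    synchronized SegmentInfo getSegment() {
        if (!historicals.isEmpty()) {
            return historicals.values().iterator().next();
        }
        if (!realtimes.isEmpty()) {
            return realtimes.values().iterator().next();
        }
        return null;
    }
}
```

Keeping the announcements per server type, instead of a single overwritten `segment` reference, is what lets this sketch fall back to the realtime segment when all historicals disappear.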
   
   Finally, I'm not sure how important this is. I see some people are trying to solve this problem (https://github.com/apache/incubator-druid/pull/6901#issuecomment-456919401, the commit `970b9cfcaadaa309a9dc5890ed1361ce0a9b3650`). However, `segment` in `ServerSelector` is used only in 1) `CachingClusteredClient`, to get the segment ID or shardSpec, which are always immutable, and 2) some APIs (`ClientInfoResource` and `BrokerQueryResource`).
   
   @drcrallen, would you tell me more details about how this can be used elsewhere?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
