Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/30 21:05:09 UTC

[GitHub] [druid] a2l007 commented on issue #9791: Introduce extensibility support for Datasources

a2l007 commented on issue #9791:
URL: https://github.com/apache/druid/issues/9791#issuecomment-622111329


   @gianm Thanks for taking a look.
   
   Both the Broker and the Historical would need to be aware of `MultiDataSource`. Brokers need this information to choose the custom segment selection strategy, and Historicals need it to identify interval-specific timeline entries from multiple timelines. I'm not sure I can use `SegmentWrangler` in its current form, since `MultiDataSource` is aimed at table datasources, and table datasources have to go through `CachingClusteredClient`.
   
   A `SegmentWrangler` implementation could be created with a custom `QuerySegmentWalker`, but that walker would likely end up very similar to `CachingClusteredClient`, especially its `getQueryRunnerForIntervals` method.
   
   > Are there use cases you have in mind other than redesigning UnionDataSource?
   
   Apart from the UnionDataSource redesign, another useful MultiDataSource implementation would be a priority list datasource that accepts an ordered list of tables.
   For example, if the tables are `table1` and `table2`, the priority list datasource would attempt to serve every query from `table1` and would use `table2` only for intervals with missing segments in `table1`.
   One of our internal use cases for a priority list datasource is validating data corrections on production tables. Testing data fixes by reindexing the prod table directly isn't ideal, so it's better to have an experimental table where the fixes can be validated before they are applied to the prod table.
   A priority list datasource could serve this use case, with the experimental table listed first, followed by the prod table.
   Another potential use case for a priority list datasource is representing updates to existing data when combined with realtime ingestion.
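   To make the priority list behavior concrete, here is a minimal, hypothetical Java sketch of the per-interval selection rule described above. It is not Druid's API and not the proposed implementation. Time is simplified to integer buckets (for example, days) instead of real segment timelines, and the class `PriorityListSketch`, the record `Assignment`, and the method `assign` are illustrative names only. For each bucket, the first table in priority order that has a segment wins, and consecutive buckets served by the same table are merged into one range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: time is modeled as integer buckets, not Druid intervals/timelines.
public class PriorityListSketch {

  /** A half-open range of buckets [start, end) served by one table. */
  record Assignment(String table, int start, int end) {}

  /**
   * For each bucket in [queryStart, queryEnd), picks the first table (in insertion
   * order of the map) that has a segment for that bucket, then merges consecutive
   * buckets served by the same table. Buckets no table covers are skipped.
   */
  static List<Assignment> assign(LinkedHashMap<String, Set<Integer>> tablesInPriorityOrder,
                                 int queryStart, int queryEnd) {
    List<Assignment> result = new ArrayList<>();
    String currentTable = null;
    int rangeStart = queryStart;
    for (int bucket = queryStart; bucket < queryEnd; bucket++) {
      String chosen = null;
      for (Map.Entry<String, Set<Integer>> e : tablesInPriorityOrder.entrySet()) {
        if (e.getValue().contains(bucket)) {
          chosen = e.getKey(); // highest-priority table with data for this bucket
          break;
        }
      }
      if (!Objects.equals(chosen, currentTable)) {
        if (currentTable != null) {
          result.add(new Assignment(currentTable, rangeStart, bucket)); // close previous range
        }
        currentTable = chosen;
        rangeStart = bucket;
      }
    }
    if (currentTable != null) {
      result.add(new Assignment(currentTable, rangeStart, queryEnd));
    }
    return result;
  }

  public static void main(String[] args) {
    // "table1" plays the experimental table, "table2" the prod table.
    LinkedHashMap<String, Set<Integer>> tables = new LinkedHashMap<>();
    tables.put("table1", Set.of(0, 1, 3));
    tables.put("table2", Set.of(0, 1, 2, 3, 4));
    for (Assignment a : assign(tables, 0, 5)) {
      System.out.println(a.table() + " [" + a.start() + ", " + a.end() + ")");
    }
    // prints:
    // table1 [0, 2)
    // table2 [2, 3)
    // table1 [3, 4)
    // table2 [4, 5)
  }
}
```

   A `LinkedHashMap` is used only because its iteration order preserves insertion order, which stands in for the ordered table list; a real implementation would presumably work against segment timelines on the Broker rather than in-memory sets.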


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


