You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/20 21:00:31 UTC

[GitHub] [incubator-druid] leventov opened a new pull request #7306: Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegments

leventov opened a new pull request #7306: Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegments
URL: https://github.com/apache/incubator-druid/pull/7306
 
 
   ### Description
   There are several related sets of changes:
   #### Reconciles terminology and method naming to 'used/unused segments'
   This naming stems from the `used=0/1` column in the database table. Other names that used to be used in the codebase are
    - "database segments"
    - "database available segments"
    - "available segments"
   
   You can see how the third one, "available segments", has arose, but the result is totally misleading and confusing with segments which are available aka served by some data nodes.
   
   #### Don't use terms 'enable/disable data source'
   "Enable data source" actually means "try to mark as 'used' all segments in the database that belong to a data source". "Disable data source" actually means "try to mark as unused all segments in the database that belong to a data source". Enable/disable terminology might make users think that Druid maintains a separate toggle with the notion of individual segment's `used` flag, which is false. Disabling a data source and then enabling it again loses information about subset of segments that may had been unused in a data source.
   
   Similarly, uses terms 'Used/unused data source' instead of 'enabled/disabled data source' to differentiate between data sources to which at least one used segment belongs and to which none used segments belong.
   
   **REST API change:** GET `/druid/coordinator/v1/metadata/datasources`'s flag `includeDisabled` is renamed to `includeUnused`. `includeDisabled` is still accepted, but a warning is emitted in a log. See changes in `DataSourcesResource` class.
   
   FYI added a method `JettyUtils.getQueryParam()` to facilitate such parameter renames.
   
   #### Rename `MetadataSegmentManager` to `MetadataSegments`
   
   To make less aliasing and confusion with `SegmentsManager`, a class used on Historical nodes. Also renamed residual occurrences in variable names of "DatabaseSegmentManager", which is the former name of `MetadataSegmentManager`.
   
   Also, renamed almost all methods in this class for clarity and communicating their action and cost:
    - `enableDataSource` -> `tryMarkAsUsedAllSegmentsInDataSource`
    - `enableSegment` -> `tryMarkSegmentAsUsed`
    - `removeDataSource` -> `tryMarkAsUnusedAllSegmentsInDataSource`
    - `removeSegment` -> `tryMarkSegmentAsUnused`
    - `getDataSource` -> `prepareImmutableDataSourceWithUsedSegments`
    - `getDataSources` -> `prepareImmutableDataSourcesWithAllUsedSegments`
    - `iterateAllSegments` -> `iterateAllUsedSegments`
    - `getAllDataSourceNames` -> `retrieveAllDataSourceNames`
   
   Renaming methods which names start with "get" to names with "prepare" and "retrieve" immediately allowed to identify several places in code where those expensive operations used to be called needlessly and optimize them (e. g, in `DataSourcesResource`).
   
   #### Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error
   
   POST `/druid/coordinator/v1/datasources/{dataSourceName}`
   POST `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
   DELETE `/druid/coordinator/v1/datasources/{dataSourceName}`
   DELETE `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
   
   Now return a server error (500 response code) instead of an empty response if the attempt to mark (all) the corresponding segments in the database as used or unused has failed. It also appears that Druid's consoles (both old and new) actually expected error responses. See changes in `DataSourcesResource.java`, `use-data-source-0.0.1.js` (formerly called `enable-0.0.1.js`), and `datasource-view.tsx`.
   
   See https://github.com/apache/incubator-druid/pull/452#discussion_r266044930.
   
   Because of this change in behaviour of REST API, the PR is labelled `Design Review`.
   
   Also fixed the documentation of
   POST `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
   
   That used to say "Enables (read: marks as 'used') all segments of datasource which are not overshadowed by others." However, actually no action is taken to make that "not overshadowed by others" part to happen.
   
   #### Misc
   
    - Optimized how `DruidCoordinatorLogger` emits segment metrics, where it used to needlessly regroup all segments in streams, see https://github.com/apache/incubator-druid/pull/4320#discussion_r266606780
    - Optimized `PartitionHolder` and probably fixed a bug in `VersionedIntervalTimeline.remove()`
    - Fixed needlessly expensive `ImmutableDruidDataSource.hashCode()`, see https://github.com/apache/incubator-druid/pull/6901#discussion_r265741002.
   
   ### Future work
   
    - Rename `MetadataRuleManager` to `MetadataRules` to align with `MetadataSegments`.
    - Actually make POST `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}` to mark as used only non-overshadowed segments?
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org