Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/01 01:34:45 UTC

[GitHub] [incubator-druid] jon-wei commented on a change in pull request #7595: Optimize overshadowed segments computation

jon-wei commented on a change in pull request #7595: Optimize overshadowed segments computation
URL: https://github.com/apache/incubator-druid/pull/7595#discussion_r289587119
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
 ##########
 @@ -502,18 +507,18 @@ public boolean removeSegment(SegmentId segmentId)
   {
     try {
       final boolean removed = removeSegmentFromTable(segmentId.toString());
-      Optional.ofNullable(dataSources).ifPresent(
-          m ->
-              m.computeIfPresent(
-                  segmentId.getDataSource(),
-                  (dsName, dataSource) -> {
-                    dataSource.removeSegment(segmentId);
-                    // Returning null from the lambda here makes the ConcurrentHashMap to remove the current entry.
-                    //noinspection ReturnOfNull
-                    return dataSource.isEmpty() ? null : dataSource;
-                  }
-              )
-      );
+      if (dataSourcesSnapshot != null) {
+        final Map<String, ImmutableDruidDataSource> dataSourcesMap = dataSourcesSnapshot.getDataSourcesMap();
 
 Review comment:
   The addition of the overshadowed-segments computation makes snapshot invalidation/update an expensive operation (suppose a user issues many single-segment remove calls in rapid succession). I'm thinking the following:
   - For the 0.15.0 release, remove the behavior where the snapshot is updated outside of poll(). These updates were primarily for user experience (if someone removes a segment, they immediately see that reflected in MetadataResource API calls), but the coordinator loop was operating on datasource/available-segment snapshots that are not updated by removeDatasource/removeSegment, so removing that update behavior would not break the coordinator loop.
   - Adding a system to reduce the performance impact of repeated invalidating operations is too risky this close to the 0.15.0 release IMO, and I think the potential performance degradation outweighs the user experience benefit of updating the snapshot outside of poll().
   - After 0.15.0, if we want the snapshot to more rapidly reflect changes caused by operations outside of the scheduled poll, I think it makes sense to look into that after the on-demand polling changes from https://github.com/apache/incubator-druid/pull/7653 are merged.
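
   A minimal sketch of the first bullet, for illustration only. The class, fields, and methods below are hypothetical stand-ins, not the actual SQLMetadataSegmentManager code: removeSegment() touches only the (simulated) metadata store, and the snapshot, where the expensive overshadowed computation would live, is rebuilt only in poll().

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the segment manager; names are assumptions, not Druid APIs.
class PollOnlySnapshotManager {
  // Simulates the metadata store table of used segments.
  private final Set<String> metadataStore = new HashSet<>();
  // Immutable snapshot served to readers; replaced only inside poll().
  private volatile Set<String> snapshot = Collections.emptySet();
  // Counts snapshot rebuilds, i.e. how often the expensive step runs.
  private int rebuildCount = 0;

  synchronized void addSegment(String segmentId) {
    metadataStore.add(segmentId);
  }

  // Removes the segment from the store only; the snapshot is left as-is until the next poll().
  synchronized boolean removeSegment(String segmentId) {
    return metadataStore.remove(segmentId);
  }

  // The only place the snapshot is rebuilt. In the real code, the
  // overshadowed-segments computation would happen here (assumption).
  synchronized void poll() {
    snapshot = Collections.unmodifiableSet(new HashSet<>(metadataStore));
    rebuildCount++;
  }

  Set<String> getSnapshot() {
    return snapshot;
  }

  synchronized int getRebuildCount() {
    return rebuildCount;
  }
}
```

   With this shape, a burst of single-segment removes costs no snapshot rebuilds; the trade-off is that readers see the removed segments until the next poll().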
   
   Does that sound reasonable? @surekhasaharan @leventov @gianm 
