Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/11 03:50:08 UTC

[GitHub] [druid] imply-cheddar commented on a diff in pull request #12392: Make tombstones ingestible by having them return an empty result set.

imply-cheddar commented on code in PR #12392:
URL: https://github.com/apache/druid/pull/12392#discussion_r846921111


##########
core/src/main/java/org/apache/druid/data/input/InputEntity.java:
##########
@@ -136,4 +136,13 @@ public void close()
   {
     return Predicates.alwaysFalse();
   }
+
+  /**
+   * This is required so that an empty iterator is created for the reader corresponding to a tombstone
+   * @return false if the entity does not come from a tombstone segment, true otherwise
+   */
+  default boolean isFromTombstone()

Review Comment:
   Please do not add this to the interface.  `DruidSegmentReader` and `DruidSegmentInputFormat` pretty much only know how to deal with DruidInputEntity objects anyway (it's gonna have a really hard time reading an ORC-based one, for example).  As such, callers can just do an `instanceof` check and then call the concrete method directly.
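
   For illustration, a minimal sketch of that `instanceof` approach. It assumes the concrete class is `DruidSegmentInputEntity` and that `isFromTombstone()` lives there as a concrete method; the helper name is hypothetical:

       // Sketch only: check the concrete entity type instead of widening the InputEntity interface.
       private static boolean isTombstoneEntity(InputEntity entity)
       {
         return entity instanceof DruidSegmentInputEntity
                && ((DruidSegmentInputEntity) entity).isFromTombstone();
       }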



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -550,16 +552,18 @@ private String createIndexTaskSpecId(int i)
         segmentProvider,
         lockGranularityInUse
     );
+
     final Map<DataSegment, File> segmentFileMap = pair.lhs;
     final List<TimelineObjectHolder<String, DataSegment>> timelineSegments = pair.rhs;
 
     if (timelineSegments.size() == 0) {
       return Collections.emptyList();
     }
 
-    // find metadata for interval
+    // find metadata for intervals with real data segments
     // queryableIndexAndSegments is sorted by the interval of the dataSegment
-    final List<NonnullPair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(
+    // Note that this list will contain null QueryableIndex values for tombstones
+    final List<Pair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(

Review Comment:
   I haven't checked all of the call sites, so just asking, but from looking at this diff it appears as though you adjusted all of the places that know about this list to filter out the null values.  What's the difference between doing that and just filtering them out once here?
   
   It would've been a lot nicer if there were just noop versions of these classes that could be used so that this logic doesn't have to change at all, but without doing that, filtering seems to be necessary.  But why filter locally in all of the places instead of filtering just once and reusing the result?
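
   A rough sketch of the "filter once and reuse" idea, using the `Pair.lhs` convention from the diff above; the helper name and where it would live are hypothetical (uses java.util.List and java.util.stream.Collectors):

       // Sketch only: drop tombstone entries (null QueryableIndex) once, right after loading,
       // so every downstream call site can assume pair.lhs is non-null.
       private static List<Pair<QueryableIndex, DataSegment>> withoutTombstones(
           List<Pair<QueryableIndex, DataSegment>> queryableIndexAndSegments
       )
       {
         return queryableIndexAndSegments.stream()
                                         .filter(pair -> pair.lhs != null)
                                         .collect(Collectors.toList());
       }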



##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java:
##########
@@ -105,6 +105,29 @@
   @Override
   protected CloseableIterator<Map<String, Object>> intermediateRowIterator() throws IOException
   {
+    if (source.isFromTombstone()) {

Review Comment:
   Maybe don't do this here?  `DruidSegmentInputFormat` can check whether the `InputEntity` is a tombstone and return a `DruidTombstoneSegmentReader` that only knows how to return empty results.
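
   A sketch of that shape, assuming the `InputFormat#createReader(InputRowSchema, InputEntity, File)` signature; `DruidTombstoneSegmentReader` is the proposed empty reader and `makeSegmentReader` stands in for however the existing `DruidSegmentReader` is constructed:

       @Override
       public InputEntityReader createReader(InputRowSchema schema, InputEntity source, File tmpDir)
       {
         if (source instanceof DruidSegmentInputEntity
             && ((DruidSegmentInputEntity) source).isFromTombstone()) {
           // Tombstones carry no rows, so hand back a reader whose iterator is always empty.
           return new DruidTombstoneSegmentReader(source);
         }
         // Non-tombstone entities keep using the existing DruidSegmentReader path, unchanged.
         return makeSegmentReader(schema, source, tmpDir);
       }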



##########
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java:
##########
@@ -426,6 +426,17 @@ private CoordinatorStats doRun(
         if (config.getIoConfig() != null) {
           dropExisting = config.getIoConfig().isDropExisting();
         }
+        // force dropExisting to true if dropExisting is not set,
+    // dropExisting is false,
+        // or all segments to compact are tombstones.
+        // Otherwise, auto-compaction will cycle indefinitely since segments won't be compacted and their
+        // compaction state won't be set

Review Comment:
   This comment is basically documentation, but I fear that it is missing enough words to fully explain why we are forcing this setting even when the cluster wasn't configured to do so.  Someone without context will likely get lost in the "if dropExisting is not set, dropExisting is false, all segments are tombstones" list, wondering what it means.  Maybe rewrite it as prose explaining the situation this is attempting to resolve.  Something like:
   
   > if dropExisting is not set, dropExisting is false, and all segments are tombstones, then it is possible for auto-compaction to cycle indefinitely because XYZ.  Thus, we force dropExisting to be true to work around this.
   
   Then also throw in an extra note that this is working around a limitation in compaction, and that we should look at how to make compaction able to make progress even in cases like the one described here; when/if that is done, this workaround could be revisited.
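
   For concreteness, one hypothetical shape of that re-worded comment next to the forced setting (`allSegmentsAreTombstones` is an illustrative flag, not a name from the diff, and the exact condition is only a guess at the intent):

       if (!dropExisting && allSegmentsAreTombstones) {
         // Compacting an interval that contains only tombstones produces no new segments, so no
         // compaction state is ever recorded for it and auto-compaction would select the same
         // interval again on every run, cycling indefinitely. Force dropExisting to true so the
         // tombstones are dropped and the cycle ends.
         // NOTE: this works around a limitation in compaction itself; if compaction later learns
         // to make progress on tombstone-only intervals, this override can be revisited.
         dropExisting = true;
       }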
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
