Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/11 01:13:48 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #10012: Set the core partition set size properly for batch ingestion with dynamic partitioning

jihoonson commented on a change in pull request #10012:
URL: https://github.com/apache/druid/pull/10012#discussion_r438489638



##########
File path: core/src/main/java/org/apache/druid/timeline/partition/BuildingNumberedShardSpec.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline.partition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.data.input.InputRow;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * This is a special shardSpec which is temporarily used during batch ingestion. In Druid, there is a concept
+ * of core partition set which is a set of segments atomically becoming queryable together in Brokers. The core
+ * partition set is represented as a range of partitionIds. For {@link NumberedShardSpec}, the core partition set
+ * is [0, {@link NumberedShardSpec#partitions}).
+ *
+ * The NumberedShardSpec is used for dynamic partitioning which is based on the number of rows in each segment.
+ * In streaming ingestion, the core partition set size cannot be determined since it's impossible to know how many
+ * segments will be created per time chunk. However, in batch ingestion with time chunk locking, the core partition
+ * set is the set of segments created by an initial task or an overwriting task. Since the core partition set is
+ * determined when the task publishes segments at the end, the task postpones creating proper NumberedShardSpec
+ * until the end.
+ *
+ * This shardSpec is used for such use case. A non-appending batch task can use this shardSpec until it publishes
+ * segments at last. When it publishes segments, it should convert the shardSpec of those segments to NumberedShardSpec.
+ * See {@code SegmentPublisherHelper#annotateShardSpec} for converting to NumberedShardSpec. Note that, when
+ * the segment lock is used, the Overlord coordinates the segment allocation and this class is never used. See
+ * {@link PartialShardSpec} for that case.

Review comment:
       With segment locking, any implementation of `OverwriteShardSpec` can be used when overwriting segments. I rephrased the last sentence.
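
For readers following the Javadoc above, the publish-time conversion it describes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Building`, `Numbered`, and `annotate` are hypothetical stand-ins for `BuildingNumberedShardSpec`, `NumberedShardSpec`, and the `SegmentPublisherHelper#annotateShardSpec` step. It also assumes that the core partition set size is simply the number of segments the task produced for a single time chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only. The nested classes are simplified stand-ins,
 * not Druid's real shardSpec implementations.
 */
class CorePartitionSketch
{
  /** Stand-in for the temporary shardSpec: only the partition ID is known. */
  static final class Building
  {
    final int partitionId;

    Building(int partitionId)
    {
      this.partitionId = partitionId;
    }
  }

  /** Stand-in for the final shardSpec: partition ID plus core partition set size. */
  static final class Numbered
  {
    final int partitionId;
    final int partitions;

    Numbered(int partitionId, int partitions)
    {
      this.partitionId = partitionId;
      this.partitions = partitions;
    }

    /** The core partition set is the range [0, partitions). */
    boolean isInCorePartitionSet()
    {
      return partitionId >= 0 && partitionId < partitions;
    }
  }

  /**
   * Hypothetical publish-time step (assumption): once every segment of one
   * time chunk is known, stamp each with the total count as the core size.
   */
  static List<Numbered> annotate(List<Building> building)
  {
    final int corePartitionSetSize = building.size();
    final List<Numbered> result = new ArrayList<>();
    for (Building spec : building) {
      result.add(new Numbered(spec.partitionId, corePartitionSetSize));
    }
    return result;
  }

  public static void main(String[] args)
  {
    final List<Building> created = Arrays.asList(new Building(0), new Building(1), new Building(2));
    for (Numbered spec : annotate(created)) {
      System.out.println(spec.partitionId + " of " + spec.partitions);
    }
  }
}
```

The size is deferred until `annotate` runs because a task using dynamic partitioning does not know how many segments it will create until it has read all of its input.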

##########
File path: core/src/main/java/org/apache/druid/timeline/partition/PartialShardSpec.java
##########
@@ -29,7 +29,10 @@
 
 /**
  * Class to contain all information of a {@link ShardSpec} except for the partition ID.
- * This class is mainly used by the indexing tasks to allocate new segments using the Overlord.
+ * This class is used when the segment allocation is coordinated by the Overlord; when appending segments to an
+ * existing datasource (either streaming ingestion or batch append) or when using segment locking.
+ * The ingestion tasks send all information required for allocating a new segment using this class and the Overlord
+ * determins the partition ID to create a new segment.

Review comment:
       Fixed, thanks.
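
As a rough illustration of the flow the updated Javadoc describes, the sketch below has a task hand over a spec that lacks only the partition ID, and a central allocator fill it in. All names here (`Partial`, `Allocator`, `allocate`, `complete`) are hypothetical and simplified. They are not the signatures of `PartialShardSpec` or of the Overlord's allocation code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not Druid's real allocation code. */
class OverlordAllocationSketch
{
  /** Hypothetical partial spec: carries everything except the partition ID. */
  static final class Partial
  {
    final String type;

    Partial(String type)
    {
      this.type = type;
    }

    /** Completes the spec once the allocator has chosen a partition ID. */
    String complete(int partitionId)
    {
      return type + "[partitionId=" + partitionId + "]";
    }
  }

  /** Hypothetical allocator playing the Overlord's role: one counter per time chunk. */
  static final class Allocator
  {
    private final Map<String, Integer> allocatedPerChunk = new HashMap<>();

    /** Picks the next unused partition ID for the time chunk and completes the spec. */
    synchronized String allocate(String timeChunk, Partial partial)
    {
      // merge() returns the updated count, so subtract 1 to get a 0-based ID.
      final int partitionId = allocatedPerChunk.merge(timeChunk, 1, Integer::sum) - 1;
      return partial.complete(partitionId);
    }
  }

  public static void main(String[] args)
  {
    final Allocator allocator = new Allocator();
    final Partial partial = new Partial("numbered");
    System.out.println(allocator.allocate("2020-06-11", partial));
    System.out.println(allocator.allocate("2020-06-11", partial));
  }
}
```

Keeping the counter in one place is what lets several appending tasks write to the same time chunk without choosing the same partition ID.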



