Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/22 02:02:18 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #10288: Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

jihoonson commented on a change in pull request #10288:
URL: https://github.com/apache/druid/pull/10288#discussion_r492438732



##########
File path: core/src/main/java/org/apache/druid/timeline/partition/HashPartitioner.java
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline.partition;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.data.input.Rows;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+/**
+ * This class is used for hash partitioning during ingestion. The {@link ShardSpecLookup} returned from
+ * {@link #createHashLookup} determines which hash bucket a given input row belongs to.
+ *
+ * Note: this class must be used only for ingestion. For segment pruning at query time,
+ * {@link HashBasedNumberedShardSpec#partitionFunction} should be used instead.

Review comment:
       Hmm, the reason I added this note was that the partition function could previously be null, and `HashPartitioner` falls back to a default partition function when it's null, whereas queries do not. So, if it were used in a query, segment pruning could be done based on a wrong default. However, now that the partition function can never be null in ingestion tasks, `HashPartitioner` no longer has to be used only in ingestion, even though it is for now. I just removed this note.
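
       A minimal sketch of the fallback described above, with hypothetical names standing in for the real classes: at ingestion time a null partition function is replaced by a default, while query-time pruning refuses to guess and skips pruning when the function stored in the segment is unknown.

       ```java
       // Hypothetical sketch; HashFn and both helpers are illustrative, not Druid API.
       class PartitionFunctionResolution
       {
         enum HashFn { MURMUR3_32_ABS }

         // Ingestion: fall back to the historical default when none is configured.
         static HashFn forIngestion(HashFn configured)  // configured may be null
         {
           return configured == null ? HashFn.MURMUR3_32_ABS : configured;
         }

         // Query time: pruning is only safe when the function that bucketed the
         // segment is known, so a missing function means "do not prune".
         static boolean canPruneSegment(HashFn storedInSegment)
         {
           return storedInSegment != null;
         }
       }
       ```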

##########
File path: docs/querying/query-context.md
##########
@@ -58,6 +58,7 @@ These parameters apply to all query types.
 |parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
+|segmentPruning|`true`|Enable segment pruning on the Broker. Segment pruning can be applied only to segments partitioned by hash or range.|

Review comment:
       Good point. I renamed it to `secondaryPartitionPruning` since we might want to add tertiary partitioning in the future. Also updated the description.
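
       For reference, a minimal sketch of toggling the renamed flag through a query context map; the key name follows the rename above, and the surrounding class is purely illustrative.

       ```java
       import java.util.HashMap;
       import java.util.Map;

       // Illustrative only: builds a query context that disables secondary
       // partition pruning (it defaults to true).
       class SecondaryPartitionPruningContextExample
       {
         public static void main(String[] args)
         {
           Map<String, Object> context = new HashMap<>();
           context.put("secondaryPartitionPruning", false);
           System.out.println(context); // {secondaryPartitionPruning=false}
         }
       }
       ```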

##########
File path: core/src/main/java/org/apache/druid/timeline/partition/HashPartitionFunction.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline.partition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import com.google.common.hash.Hashing;
+import org.apache.druid.java.util.common.StringUtils;
+
+/**
+ * An enum of supported hash partition functions. This enum should be extended whenever we want to support a new
+ * function for hash partitioning. The chosen function is part of {@link HashBasedNumberedShardSpec}, which is stored
+ * in the metadata store.

Review comment:
       Sounds good. Added.
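
       As a reader's sketch of what such an enum might look like, assuming murmur3_32 (via Guava's `Hashing`) as the single supported function: the enum name and the exact bucket math here are assumptions, and the real class additionally carries Jackson (de)serialization so the chosen function can be stored in the metadata store as part of the shard spec.

       ```java
       import com.google.common.hash.Hashing;

       // Sketch of an extensible enum of hash partition functions; see assumptions above.
       enum HashPartitionFunctionSketch
       {
         MURMUR3_32_ABS {
           @Override
           public int hash(byte[] serializedRow, int numBuckets)
           {
             // Math.abs maps the signed 32-bit murmur3 output into [0, numBuckets).
             return Math.abs(Hashing.murmur3_32().hashBytes(serializedRow).asInt() % numBuckets);
           }
         };

         // New functions should be added as new enum constants implementing this.
         public abstract int hash(byte[] serializedRow, int numBuckets);
       }
       ```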




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org