You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/04 16:45:40 UTC

[GitHub] [incubator-pinot] sajjad-moradi commented on a change in pull request #6869: Add segment size rule to Recommendation Engine

sajjad-moradi commented on a change in pull request #6869:
URL: https://github.com/apache/incubator-pinot/pull/6869#discussion_r625940064



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/io/InputManager.java
##########
@@ -169,6 +172,33 @@ private void regulateCardinalityForAll() {
     });
   }
 

Review comment:
       Done.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/RulesToExecute.java
##########
@@ -45,6 +46,8 @@
   public static class RuleFactory {
     public static AbstractRule getRule(Rule rule, InputManager inputManager, ConfigManager outputManager) {
       switch (rule) {
+        case SegmentSizeRule:

Review comment:
       Not in the switch statement. The order of rules is defined in `Rule` enum. In `RecommenderDriver.run()`, it iterates over `Rule.values()` which returns the enum values in the defined order.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/RulesToExecute.java
##########
@@ -160,9 +169,14 @@ public boolean isRecommendRealtimeProvisioning() {
     return _recommendRealtimeProvisioning;
   }
 
+  public boolean isRecommendSegmentSize() {
+    return _recommendSegmentSize;
+  }
+
   // Be careful with the sequence, each rule can execute individually
   // but a rule may depend on its previous rule when they both fired
   public enum Rule {
+    SegmentSizeRule, // This rule must be the first rule. It provides segment count, segment size, numRows in segments which are used in other rules

Review comment:
       Sure.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/io/params/SegmentSizeRuleParams.java
##########
@@ -0,0 +1,83 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.io.params;
+
+import com.fasterxml.jackson.annotation.JsonSetter;
+import com.fasterxml.jackson.annotation.Nulls;
+
+import static org.apache.pinot.controller.recommender.rules.io.params.RecommenderConstants.SegmentSizeRule.*;
+
+
+/**
+ * Parameters used in SegmentSizeRule
+ */
+public class SegmentSizeRuleParams {
+
+  // Desired segment size in MB
+  private int desiredSegmentSizeMb = DEFAULT_DESIRED_SEGMENT_SIZE_MB;

Review comment:
       Good suggestion. Sure, it's deleted now.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/impl/SegmentSizeRule.java
##########
@@ -0,0 +1,142 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.impl;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.controller.recommender.exceptions.InvalidInputException;
+import org.apache.pinot.controller.recommender.io.ConfigManager;
+import org.apache.pinot.controller.recommender.io.InputManager;
+import org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator;
+import org.apache.pinot.controller.recommender.rules.AbstractRule;
+import org.apache.pinot.controller.recommender.rules.io.configs.SegmentSizeRecommendations;
+import org.apache.pinot.controller.recommender.rules.io.params.SegmentSizeRuleParams;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+
+import static org.apache.pinot.controller.recommender.rules.io.params.RecommenderConstants.SegmentSizeRule.NOT_PROVIDED;
+
+
+/**
+ * This rule generates a segment based on the provided schema characteristics and then recommends the followings
+ * using the size and number of records in the generated segments:
+ *   - number of segments
+ *   - number of records in each segment
+ *   - size of each segment
+ */
+public class SegmentSizeRule extends AbstractRule {
+
+  static final int MEGA_BYTE = 1024 * 1024;
+
+  public SegmentSizeRule(InputManager input, ConfigManager output) {
+    super(input, output);
+  }
+
+  @Override
+  public void run()
+      throws InvalidInputException {
+
+    if (_input.getTableType().equalsIgnoreCase("REALTIME")) {
+      // no need to estimate segment size & optimal number of segments for realtime only tables;
+      // RT Provisioning Rule will have a comprehensive analysis on that
+      return;
+    }
+
+    long segmentSize;
+    int numRows;
+    SegmentSizeRuleParams segmentSizeRuleParams = _input.getSegmentSizeRuleParams();
+    if (segmentSizeRuleParams.getActualSegmentSizeMB() == NOT_PROVIDED
+        && segmentSizeRuleParams.getNumRowsInActualSegment() == NOT_PROVIDED) {
+
+      // generate a segment
+      TableConfig tableConfig = createTableConfig(_input.getSchema());
+      int numRowsInGeneratedSegment = segmentSizeRuleParams.getNumRowsInGeneratedSegment();
+      File generatedSegmentDir =
+          new MemoryEstimator.SegmentGenerator(_input._schemaWithMetaData, _input._schema, tableConfig,
+              numRowsInGeneratedSegment, true).generate();
+      segmentSize = FileUtils.sizeOfDirectory(generatedSegmentDir);
+      numRows = numRowsInGeneratedSegment;
+
+      // cleanup
+      try {
+        FileUtils.deleteDirectory(generatedSegmentDir);
+      } catch (IOException e) {
+        throw new RuntimeException("Cannot delete the generated segment directory", e);
+      }
+    } else {
+      segmentSize = segmentSizeRuleParams.getActualSegmentSizeMB() * MEGA_BYTE;
+      numRows = segmentSizeRuleParams.getNumRowsInActualSegment();
+    }
+
+    // estimate optimal segment count & size parameters
+    SegmentSizeRecommendations params =
+        estimate(segmentSize, segmentSizeRuleParams.getDesiredSegmentSizeMb() * MEGA_BYTE, numRows,
+            _input.getNumRecordsPerPush());
+
+    // wire the recommendations
+    _output.setSegmentSizeRecommendations(params);
+    _input.capCardinalities((int) params.getNumRows());

Review comment:
       Done.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/impl/SegmentSizeRule.java
##########
@@ -0,0 +1,142 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.impl;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.controller.recommender.exceptions.InvalidInputException;
+import org.apache.pinot.controller.recommender.io.ConfigManager;
+import org.apache.pinot.controller.recommender.io.InputManager;
+import org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator;
+import org.apache.pinot.controller.recommender.rules.AbstractRule;
+import org.apache.pinot.controller.recommender.rules.io.configs.SegmentSizeRecommendations;
+import org.apache.pinot.controller.recommender.rules.io.params.SegmentSizeRuleParams;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+
+import static org.apache.pinot.controller.recommender.rules.io.params.RecommenderConstants.SegmentSizeRule.NOT_PROVIDED;
+
+
+/**
+ * This rule generates a segment based on the provided schema characteristics and then recommends the followings
+ * using the size and number of records in the generated segments:
+ *   - number of segments
+ *   - number of records in each segment
+ *   - size of each segment

Review comment:
       I just added that. Let me know if it's missing something.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/io/configs/SegmentSizeRecommendations.java
##########
@@ -0,0 +1,64 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.io.configs;
+
+
+/**
+ * The recommendations proposed by SegmentSizeRule
+ */
+public class SegmentSizeRecommendations {
+
+  private long numRows;
+  private long numSegments;
+  private long segmentSize;

Review comment:
       The size is in bytes and it's being used in PartitionRule. If I change that, then I have to divide by MEGA_BYTE factor when it's being estimated and then multiply by MEGA_BYTE for partition rule. This round trip rounding reduces its accuracy. So if you don't hate the name, let me keep it as is.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/io/configs/SegmentSizeRecommendations.java
##########
@@ -0,0 +1,64 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.io.configs;
+
+
+/**
+ * The recommendations proposed by SegmentSizeRule
+ */
+public class SegmentSizeRecommendations {
+
+  private long numRows;

Review comment:
       Done.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/io/params/RecommenderConstants.java
##########
@@ -89,6 +90,13 @@
     public static final int[] DEFAULT_NUM_HOSTS = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
   }
 
+  public static class SegmentSizeRule {
+    public static final int DEFAULT_NUM_SEGMENTS = 1;
+    public static final int DEFAULT_DESIRED_SEGMENT_SIZE_MB = 500;
+    public static final int DEFAULT_NUM_ROWS_IN_GENERATED_SEGMENT = 50_000;
+    public static final int NOT_PROVIDED = -1;

Review comment:
       Please refer the next comment.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/io/params/SegmentSizeRuleParams.java
##########
@@ -0,0 +1,83 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.controller.recommender.rules.io.params;
+
+import com.fasterxml.jackson.annotation.JsonSetter;
+import com.fasterxml.jackson.annotation.Nulls;
+
+import static org.apache.pinot.controller.recommender.rules.io.params.RecommenderConstants.SegmentSizeRule.*;
+
+
+/**
+ * Parameters used in SegmentSizeRule
+ */
+public class SegmentSizeRuleParams {
+
+  // Desired segment size in MB
+  private int desiredSegmentSizeMb = DEFAULT_DESIRED_SEGMENT_SIZE_MB;
+
+  // Number for rows in the generated segment
+  private int numRowsInGeneratedSegment = DEFAULT_NUM_ROWS_IN_GENERATED_SEGMENT;
+
+  // Actual segment size in MB
+  private int actualSegmentSizeMB = NOT_PROVIDED;
+
+  // Number of rows in the actual segment
+  private int numRowsInActualSegment = NOT_PROVIDED;
+

Review comment:
       Doesn't it read better: `numRowsInActualSegment NOT PROVIDED`, instead of `numRowsInActualSegment = 0`?

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/impl/SegmentSizeRule.java
##########
@@ -85,7 +85,12 @@ public void run()
   }
 
   /**
-   * Estimate optimal segment size parameters
+   * Estimate segment size parameters by extrapolation based on the number of records and size of the generated segment.
+   * The linear extrapolation used here is not optimal because of columnar way of storing data and usage of different
+   * indices. Another way would be to iteratively generate new segments with expected number of rows until the ideal
+   * segment is found, but that's costly because of the time it takes to generate segments. Although the extrapolation
+   * approach seems to be less accurate, it is chosen due to its performance.

Review comment:
       I actually used JMH to measure the time and find a good candidate that doesn't take much time. It depends on the cardinalities and number of columns, but in average for numRows = 50K  and a 14-15 columns it takes one second. Generating an actual size segment is definitely taking more time and I don't think it's going to be a good user experience.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org