Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/30 22:28:06 UTC

[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Jackie-Jiang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604464226



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       (nit)
   ```suggestion
       THREAD_CPU_TIME_NS("threadCpuTimeNs");
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       I still suggest associating an explicit id with each key instead of relying on the ordinal of the enum. The convention should be to always use the next higher id when adding a new key.
   An id is more flexible than the ordinal for the following reasons:
   - The ordinal implicitly uses the key's position in the declaration as its id, so if someone accidentally reorders the keys, compatibility breaks
   - With an id, we can remove a key in a backward-compatible way over two releases if necessary. With the ordinal, we have to keep a placeholder so that the ordinals of the other keys don't change
   
   @mqliang @siddharthteotia @mcvsubbu Thoughts?
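
A minimal sketch of the id-based approach described above, under stated assumptions: the id values, the shortened key list, the retired id 3, and the `getId`/`getById` helpers are all hypothetical and only illustrate the idea, not the final shape of the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the ids and the shortened key list are illustrative only.
enum MetadataKey {
  UNKNOWN(0, "unknown"),
  TABLE(1, "table"),
  EXCEPTION(2, "Exception"),
  // Assumption for illustration: id 3 belonged to a key that has been removed.
  // The id is retired and never reused, and no placeholder constant is needed.
  NUM_DOCS_SCANNED(4, "numDocsScanned");

  private static final Map<Integer, MetadataKey> ID_TO_KEY = new HashMap<>();

  static {
    for (MetadataKey key : values()) {
      // Fail fast at class-load time if two keys share an id.
      if (ID_TO_KEY.put(key._id, key) != null) {
        throw new IllegalStateException("Duplicate metadata key id: " + key._id);
      }
    }
  }

  private final int _id;
  private final String _name;

  MetadataKey(int id, String name) {
    _id = id;
    _name = name;
  }

  public int getId() {
    return _id;
  }

  public String getName() {
    return _name;
  }

  // Returns null for an id that this version does not know about.
  public static MetadataKey getById(int id) {
    return ID_TO_KEY.get(id);
  }
}
```

In this sketch the serialized value is `getId()`, so reordering the constants changes their ordinals but not what goes on the wire.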

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       You don't need `Optional` here. Instead, either:
   - Throw an exception for an invalid id (I suggest this way)
   - Return `null` for an invalid id
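
A rough sketch of the first option (throwing for an invalid id), assuming a shortened key list and an `IllegalArgumentException` as the exception type; both choices are illustrative, as is the cached `VALUES` array. It also rejects negative values, which a check against `values().length` alone does not cover.

```java
// Hypothetical sketch: shortened key list, exception type chosen for illustration.
enum MetadataKey {
  UNKNOWN("unknown"),
  TABLE("table"),
  EXCEPTION("Exception");

  // Cache the array once; every call to values() returns a fresh copy.
  private static final MetadataKey[] VALUES = values();

  private final String _name;

  MetadataKey(String name) {
    _name = name;
  }

  public String getName() {
    return _name;
  }

  // Throws for anything outside [0, VALUES.length) instead of returning Optional or null.
  public static MetadataKey getByOrdinal(int ordinal) {
    if (ordinal < 0 || ordinal >= VALUES.length) {
      throw new IllegalArgumentException("Invalid metadata key ordinal: " + ordinal);
    }
    return VALUES[ordinal];
  }
}
```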

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       ```suggestion
     enum MetadataKey {
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));

Review comment:
       `getOrDefault(name, null)` is the same as `get(name)`
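
A small sketch that checks this equivalence on a plain `HashMap`; the class name, helper method, and sample entries are made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class, not part of the PR.
class GetOrDefaultDemo {
  // True when getOrDefault(key, null) and get(key) return the same result.
  static boolean lookupsAgree(Map<String, Integer> map, String key) {
    return Objects.equals(map.getOrDefault(key, null), map.get(key));
  }

  public static void main(String[] args) {
    Map<String, Integer> nameToId = new HashMap<>();
    nameToId.put("numDocsScanned", 3);
    // One key that is present and one that is absent.
    System.out.println(lookupsAgree(nameToId, "numDocsScanned"));
    System.out.println(lookupsAgree(nameToId, "noSuchKey"));
  }
}
```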

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  private static int _version = VERSION_3;

Review comment:
       This should not be hardcoded; it should be read from the config
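
One possible shape for this, as a sketch only: the property name `datatable.version`, the use of `java.util.Properties`, and the `resolveVersion` helper are assumptions made for the example and are not Pinot's actual configuration API. The default mirrors the `VERSION_3` value in the diff.

```java
import java.util.Properties;

// Hypothetical sketch: the property name and this helper class are not from the PR.
class DataTableVersionConfig {
  static final int VERSION_2 = 2;
  static final int VERSION_3 = 3;
  // Assumed property name, used only in this example.
  static final String VERSION_KEY = "datatable.version";
  static final int DEFAULT_VERSION = VERSION_3;

  // Reads the version from the config, falling back to the default when it is unset.
  static int resolveVersion(Properties config) {
    String raw = config.getProperty(VERSION_KEY);
    if (raw == null) {
      return DEFAULT_VERSION;
    }
    int version = Integer.parseInt(raw.trim());
    if (version != VERSION_2 && version != VERSION_3) {
      throw new IllegalArgumentException("Unsupported data table version: " + version);
    }
    return version;
  }
}
```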

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {

Review comment:
       Same here: no need for `Optional`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       Rename to `BaseDataTable`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       Suggest making 2 builders, one for v2 and one for v3. You can extract the common logic into a base class, or just duplicate the code, because we will deprecate v2 in the next release once v3 is well tested

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -96,6 +99,17 @@ public DataTableBuilder(DataSchema dataSchema) {
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);

Review comment:
       This won't be correct for v3, because we want to fix the float value size there (it should be 4 bytes, but v2 uses 8 bytes)
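
To illustrate why one shared offset computation cannot serve both versions, here is a sketch with a version-aware float width. The `ColumnOffsets` class, its reduced `ColumnType` enum, and the extra `version` parameter are hypothetical; only the float sizes (8 bytes in v2, 4 bytes intended for v3) come from the comment above, and the other widths are assumed to be the plain primitive sizes.

```java
// Hypothetical sketch: class, enum, and method signatures are not from the PR.
class ColumnOffsets {
  enum ColumnType { INT, LONG, FLOAT, DOUBLE }

  static final int VERSION_2 = 2;
  static final int VERSION_3 = 3;

  // Width of one value; FLOAT is the only type whose width depends on the version.
  static int sizeInBytes(ColumnType type, int version) {
    switch (type) {
      case INT:
        return Integer.BYTES;
      case LONG:
        return Long.BYTES;
      case FLOAT:
        return version == VERSION_2 ? Double.BYTES : Float.BYTES;
      case DOUBLE:
        return Double.BYTES;
      default:
        throw new IllegalStateException("Unsupported column type: " + type);
    }
  }

  // Fills columnOffsets with each column's start offset and returns the row size in bytes.
  static int computeColumnOffsets(ColumnType[] types, int[] columnOffsets, int version) {
    int offset = 0;
    for (int i = 0; i < types.length; i++) {
      columnOffsets[i] = offset;
      offset += sizeInBytes(types[i], version);
    }
    return offset;
  }
}
```

With columns `INT, FLOAT, DOUBLE`, the two versions disagree on both the offset of the last column and the row size, so a v3 reader that reuses the v2 offsets would read from the wrong positions.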

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       Put this block following the map definition for better readability



