You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/25 20:03:43 UTC

[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r601741649



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";

Review comment:
       Why do we need this? 

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -209,24 +190,22 @@ public DataTableImplV2(ByteBuffer byteBuffer)
     }
   }
 
-  private static String decodeString(DataInputStream dataInputStream)
-      throws IOException {
-    int length = dataInputStream.readInt();
-    if (length == 0) {
-      return StringUtils.EMPTY;
-    } else {
-      byte[] buffer = new byte[length];
-      int numBytesRead = dataInputStream.read(buffer);
-      assert numBytesRead == length;
-      return StringUtil.decodeUtf8(buffer);
-    }
-  }
-
   @Override
   public void addException(ProcessingException processingException) {
     _metadata.put(EXCEPTION_METADATA_KEY + processingException.getErrorCode(), processingException.getMessage());
   }
 
+  @Override
+  public Map<Integer, String> getExceptions() {
+    Map<Integer, String> exceptions = new HashMap<>();
+    for (String key : _metadata.keySet()) {
+      if (key.startsWith(DataTable.EXCEPTION_METADATA_KEY)) {
+        exceptions.put(Integer.parseInt(key.substring(9)), _metadata.get(key));

Review comment:
       what is `9`? Can we have a `static final int` and an example here?

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest

Review comment:
       Add an UNKNOWN as the first one (=0). You will then be able to code better around things easier when exceptions are thrown in valueOf() methods.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -263,6 +263,14 @@ public void finishRow()
   }
 
   public DataTable build() {
+    return new DataTableImplV3(_numRows, _dataSchema, _reverseDictionaryMap,
+        _fixedSizeDataByteArrayOutputStream.toByteArray(), _variableSizeDataByteArrayOutputStream.toByteArray());
+  }
+
+  // buildV2() is only used in V2V3Compatibility test

Review comment:
       It may be better to make it configurable on the server to generate either version? (default the config to V3).
   
   Alternative is that we insist that the brokers be upgraded before picking up this feature. That may be ok, but may cause other constraints (e.g. if someone needs a fix on the server that is urgently needed in production, forcing them to pull a newer version of the server).
   

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which has value of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which has value of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY
+  );
+
+  // MetadataKeys contains all trailer keys which is also metadata key.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(
+      TrailerKeys.TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+      TrailerKeys.EXCEPTION_METADATA_KEY,
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.NUM_GROUPS_LIMIT_REACHED_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.TRACE_INFO_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // TrailerKeyToMetadataKeyMap is used to convert enum key to metadata key(string).
+  Map<TrailerKeys, String> TrailerKeyToMetadataKeyMap = ImmutableMap.<TrailerKeys, String>builder()

Review comment:
       Will BiMap work better?

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which has value of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which has value of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY
+  );
+
+  // MetadataKeys contains all trailer keys which is also metadata key.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(

Review comment:
       Instead of duplicating keys, can we just union the set?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org