Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/29 05:53:05 UTC

[GitHub] [incubator-pinot] mqliang opened a new pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

mqliang opened a new pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   ## Description
   
   This PR:
   * Add a positional data section at the tail of the data table and bump the data table version to V3
   * Data in the positional data section is stored as key/value pairs, and the data is positional (the value of a given key is locatable even after serialization), so keys are represented as an `enum` and values as `String`s
   * Currently we only have one KV pair (response_serialization_cost) in the positional data section. If we add more KV pairs, we can add utility functions such as `getOffsetForValueOfGivenKey()` to locate the value of a given key (see the sketch after this list)
   * Measure data table serialization cost on the server and put the cost in the positional data section
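
   For illustration only, here is a minimal sketch of such a positional lookup. `getOffsetForValueOfGivenKey()` is a hypothetical helper (mentioned above as a possible future utility, not implemented in this PR), and the sketch assumes each entry is encoded as a 4-byte key ordinal followed by a fixed-width 8-byte big-endian long value:

   ```java
   import java.nio.ByteBuffer;

   final class PositionalSectionSketch {
     // Hypothetical: scan the serialized section for the given key ordinal and
     // return the byte offset (within the section) where its value starts, or -1.
     static int getOffsetForValueOfGivenKey(byte[] sectionBytes, int keyOrdinal) {
       ByteBuffer buf = ByteBuffer.wrap(sectionBytes);
       int numEntries = buf.getInt();
       for (int i = 0; i < numEntries; i++) {
         if (buf.getInt() == keyOrdinal) {
           return buf.position(); // the value starts right after the ordinal
         }
         buf.position(buf.position() + Long.BYTES); // skip this entry's fixed-width value
       }
       return -1;
     }
   }
   ```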
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release.
   
   If you have a series of commits adding or enabling a feature, then
   add this section only in the final commit that marks the feature complete.
   Refer to earlier release notes for examples of such text.
   
   ## Documentation
   If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604531296



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603636428



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't add a new key with the same id/name as an existing key; duplicate names are not allowed.
+   *  - Don't change the names of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),

Review comment:
       Since the server will always send the aggregated cost (execution + data table serialization + whatever we add and instrument in the future), the name of the key should be changed. Right now it refers only to the execution part. I suggest changing it to simply **`threadCpuTimeNs`** to indicate that this reflects the entire CPU time, measured in nanoseconds, on the server. 
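
       For illustration, a minimal sketch (variable names assumed; this is not code from the PR) of the aggregated value the server would send under that single key:

       ```java
       // Hypothetical: the server sums every instrumented cost into one metadata entry.
       long threadCpuTimeNs = executionThreadCpuTimeNs + responseSerializationCpuTimeNs;
       dataTable.getMetadata().put("threadCpuTimeNs", String.valueOf(threadCpuTimeNs));
       ```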






[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809727115


   @siddharthteotia @mcvsubbu Force-updated to delete all stale changes. PR is ready for review.
   
   > We can emit both costs separately. So have 2 server gauges. Similarly, log them in the QueryScheduler separately. But the serialized cost in the DataTable should be a single value (sum total of both exec cpu time cost and serialization cpu time cost)
   
   This implementation is cumbersome -- we would either need to serialize the data table twice, or hack the bytes to do some content replacement: we first add `executionThreadCpuTimeNs` into metadata when executing the query, then add `serializationCpuTimeNs` into metadata when serializing the data table, and log and emit both costs separately. Then we sum them together, remove them from metadata, add a `totalThreadCpuTimeNs` into metadata, and serialize the data table again to send back to the broker. 
   
   My implementation logs and emits a single summed metric, which is much simpler and easier to read: in `datatable.toBytes()`, before serializing metadata, update the value of "executionThreadCpuTimeNs" to account for data table serialization time. 
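   
   For reference, the patch-in-place mechanics look like this (condensed from the `DataTableImplV3.toBytes()` diff quoted elsewhere in this thread):
   
   ```java
   // Write a placeholder, measure serialization on this thread, then patch the
   // fixed-width 8-byte value in place at the offset recorded during serialization.
   _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
   ThreadTimer threadTimer = new ThreadTimer();
   threadTimer.start();
   byte[] bytes = toBytesInternal();
   long serializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
   System.arraycopy(Longs.toByteArray(serializationCpuTimeNs), 0, bytes,
       _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
   ```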




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603459656



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. On deserialization, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all metadata into the trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from the trailer.
+     * In V3, metadata is part of _trailer when the DataTable is serialized into bytes. On deserialization,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as a bunch of KV pairs in metadata; all exceptions have keys of the form "Exception"+errCode.
+     * To interpret V2 bytes as a V3 object, extract exceptions from metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();
+  }
+
+  /**
+   * Serialize the trailer section to bytes.
+   * The serialized bytes look like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3, ...]
+   * For each KV pair:
+   * - if the value is int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
+   * - if the value is string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
+   */
+  private byte[] serializeTrailer()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    int offset = 0;
+    dataOutputStream.writeInt(_trailer.size());
+    offset += Integer.BYTES;
+    for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
+      TrailerKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      offset += Integer.BYTES;
+      if (key == TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY) {
+        _responseSerializationCpuTimeNsValueOffset += offset;
+      }
+      if (IntValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Ints.toByteArray(Integer.parseInt(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else if (LongValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Longs.toByteArray(Long.parseLong(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+        offset += Integer.BYTES + valueBytes.length;
+      }
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<TrailerKeys, String> deserializeTrailer(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numEntries = dataInputStream.readInt();
+      Map<TrailerKeys, String> trailer = new TreeMap<>();
+      for (int i = 0; i < numEntries; i++) {
+        int ordinal = dataInputStream.readInt();
+        TrailerKeys key = TrailerKeys.values()[ordinal];

Review comment:
       We should NEVER throw an exception on unknown keys. Please change it so that we always ignore unknown keys. (This has nothing to do with key removal, and let us not bring up key removal again. Key removal is not allowed. Period.)
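
       A minimal sketch of what ignoring unknown keys could look like (the skip-on-unknown branch is the suggested behavior, not the PR's current code; it assumes the value of an unknown key is length-prefixed, as string values are -- with the fixed-width int/long encoding an old reader has no way to know how many bytes to skip):

       ```java
       private Map<TrailerKeys, String> deserializeTrailerTolerantly(byte[] bytes)
           throws IOException {
         try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
           int numEntries = in.readInt();
           Map<TrailerKeys, String> trailer = new TreeMap<>();
           for (int i = 0; i < numEntries; i++) {
             int ordinal = in.readInt();
             if (ordinal < 0 || ordinal >= TrailerKeys.values().length) {
               // Unknown key written by a newer server: skip its (length-prefixed) value.
               in.skipBytes(in.readInt());
               continue;
             }
             TrailerKeys key = TrailerKeys.values()[ordinal];
             String value;
             if (IntValueTrailerKeys.contains(key)) {
               value = String.valueOf(in.readInt());
             } else if (LongValueTrailerKeys.contains(key)) {
               value = String.valueOf(in.readLong());
             } else {
               byte[] valueBytes = new byte[in.readInt()];
               in.readFully(valueBytes);
               value = StringUtil.decodeUtf8(valueBytes);
             }
             trailer.put(key, value);
           }
           return trailer;
         }
       }
       ```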






[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804528996


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=h1) Report
   > Merging [#6710](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=desc) (cfa16d8) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.39%`.
   > The diff coverage is `62.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6710/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6710      +/-   ##
   ==========================================
   - Coverage   66.44%   66.05%   -0.40%     
   ==========================================
     Files        1075     1398     +323     
     Lines       54773    68193   +13420     
     Branches     8168     9857    +1689     
   ==========================================
   + Hits        36396    45043    +8647     
   - Misses      15700    19947    +4247     
   - Partials     2677     3203     +526     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `66.05% <62.64%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1288 more](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=footer). Last update [27b61fe...cfa16d8](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806028103


   @mcvsubbu Just found a defect in using enum values as keys and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   * We are able to add a new key to the enum without bumping the version
   * We are able to omit a key from the trailer without bumping the version
   * **However, we are unable to remove a key from the enum (if the key is no longer used in a future version)**
   
   Namely, say we now have three keys:
   ```
   // old version:
   enum {
       key1,
       key2,
       key3,
   }
   ```
   Now suppose we remove key2 from the enum since it is no longer used.
   ```
   // new version
   enum {
       key1,
       key3,
   }
   ```
   Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   
   So a better solution is to use strings as keys and encode the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   
   However, if we do it this way, it is equivalent to just moving the metadata section to the end of the data table, and it does not make much sense to bump the version just for rearranging sections in the data table.
   
   Let's take a step back to what we want to solve:
   * we want to add serialization_cost to the data table, but serialization_cost is not available before serialization 
   * we want to keep backward compatibility
   
   To add serialization_cost to the data table after serialization, we basically have two options:
   * append it to the end of the bytes 
   * write a temporary value for serialization_cost during serialization and, after serialization is done, replace it with the actual value
   
   So, here is another approach:
   * don't add a trailer section
   * put serialization_cost into metadata
   * when we serialize metadata, in V2 we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding the value this way makes replacement after serialization impossible, since `String.valueOf("1000").length() != String.valueOf("100000").length()`. 
   * In V3, keep all the existing logic; however, if the value is a long, we should encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. In the function `serializeMetadata()`, we can keep a variable that records the start offset of the serialization_cost value: 
   
   ```java
   // Fixed-up version of the sketch above (the original pseudocode misspelled the
   // key name and advanced the offset by the key length instead of the value length).
   ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
   DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
   int offset = 0;
   int serializationCostValueStartOffset = -1;
   
   for (Map.Entry<String, String> entry : metadata.entrySet()) {
     byte[] keyBytes = StringUtil.encodeUtf8(entry.getKey());
     dataOutputStream.writeInt(keyBytes.length);
     dataOutputStream.write(keyBytes);
     offset += Integer.BYTES + keyBytes.length;
   
     if (entry.getKey().equals("serialization_cost")) {
       // Fixed-width big-endian long, written without a length prefix, so the
       // 8 bytes can be patched in place after serialization.
       serializationCostValueStartOffset = offset;
       byte[] valueBytes = Longs.toByteArray(Long.parseLong(entry.getValue()));
       dataOutputStream.write(valueBytes);
       offset += valueBytes.length;
     } else {
       byte[] valueBytes = StringUtil.encodeUtf8(entry.getValue());
       dataOutputStream.writeInt(valueBytes.length);
       dataOutputStream.write(valueBytes);
       offset += Integer.BYTES + valueBytes.length;
     }
   }
   byte[] bytes = byteArrayOutputStream.toByteArray();
   ```
   
   So after serialization, we are able to replace the value of serialization_cost (`toBigEndian(longValue)` is always 8 bytes, which makes replacement possible):
   ```java
   int offset = metadataStartOffset + serializationCostValueStartOffset;
   System.arraycopy(Longs.toByteArray(actualValue), 0, bytes, offset, Long.BYTES);
   ```




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604527967



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys whose values are of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED);
+    // _longValueMetadataKeys contains all metadata keys whose values are of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       done






[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604486636



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       We plan to remove all V2 logic after the next release, so we can keep refactors and beautifications to a minimum. Please do only what is necessary, because all of the V2 logic will disappear and someone looking at the code will wonder why we have a base class.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604531810



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys whose values are of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED);
+    // _longValueMetadataKeys contains all metadata keys whose values are of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.get(name));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has a value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has a value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name (string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       The code was put here by IntelliJ reformatting. I'd suggest keeping it here: if someone changes this file and runs IntelliJ reformat before committing, it will be moved here anyway.






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603696893



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where we represent metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't add a new key with the same id/name as an existing key; duplicate names are not allowed.
+   *  - Don't change the names of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys whose values are of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED);
+    // _longValueMetadataKeys contains all metadata keys whose values are of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {

Review comment:
       Got it. Thanks for clarifying. We can continue to use _name as defined in the above implementation. 






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809727115


   @siddharthteotia @mcvsubbu Force-updated to delete all stale changes. PR is ready for review.
   
   > We can emit both costs separately. So have 2 server gauges. Similarly, log them in the QueryScheduler separately. But the serialized cost in the DataTable should be a single value (sum total of both exec cpu time cost and serialization cpu time cost)
   
   This implementation is cumbersome. We first add `executionThreadCpuTimeNs` into metadata when executing the query, then add `serializationCpuTimeNs` into metadata when serializing the data table, and log and emit both costs separately. Then we sum them together, remove them from metadata, add a `totalThreadCpuTimeNs` into metadata, and serialize the data table again to send back to the broker. 
   
   My implementation logs and emits a single summed metric, which is much simpler and easier to read: in `datatable.toBytes()`, before serializing metadata, update the value of "executionThreadCpuTimeNs" to account for data table serialization time. 




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604379681



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       I think all these places are constructing an empty data table on the server, right?
   I think we should replace these with DataTableUtils.buildEmptyDataTable() to properly build an empty data table. Secondly, since DataTableUtils internally uses DataTableBuilder, which is aware of the version, it will build an empty table based on V2 or V3.
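
       A sketch of the suggested call site (only `DataTableUtils.buildEmptyDataTable()` is named in the review; its argument here is assumed for illustration):

       ```java
       // Hypothetical: let the version-aware builder pick V2 or V3 instead of
       // hard-coding the DataTableImplV3 constructor. The argument is assumed.
       DataTable dataTable = DataTableUtils.buildEmptyDataTable(queryRequest);
       dataTable.addException(processingException); // attach the scheduling-timeout error as before
       ```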






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603634110



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate id/name is not allowed.
+   *  - Don't change id/name of existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN(0, "unknown"),
+    TABLE_KEY(1, "table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY(2, "Exception"),
+    NUM_DOCS_SCANNED_METADATA_KEY(3, "numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY(4, "numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY(5, "numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED(6, "numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED(7, "numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED(8, "numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED(9, "numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS(10, "minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS_METADATA_KEY(11, "totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED_KEY(12, "numGroupsLimitReached"),
+    TIME_USED_MS_METADATA_KEY(13, "timeUsedMs"),
+    TRACE_INFO_METADATA_KEY(14, "traceInfo"),
+    REQUEST_ID_METADATA_KEY(15, "requestId"),
+    NUM_RESIZES_METADATA_KEY(16, "numResizes"),
+    RESIZE_TIME_MS_METADATA_KEY(17, "resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY(18, "executionThreadCpuTimeNs"),
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY(19, "responseSerializationCpuTimeNs"),
+    ;
+
+    private static final Map<Integer, MetadataKeys> _idToEnumKeyMap = new HashMap<>();
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys whose values are of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet.of(
+        MetadataKeys.NUM_SEGMENTS_QUERIED,
+        MetadataKeys.NUM_SEGMENTS_PROCESSED,
+        MetadataKeys.NUM_SEGMENTS_MATCHED,
+        MetadataKeys.NUM_RESIZES_METADATA_KEY,
+        MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED
+    );
+    // _longValueTrailerKeys contains all metadata keys whose values are of long type.
+    private static final Set<MetadataKeys> _longValueTrailerKeys = ImmutableSet.of(

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604360620



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";
+    public static final int DEFAULT_CURRENT_DATA_TABLE_VERSION = 3;

Review comment:
       Got it. Thanks for confirming






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605222747



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       How about we use the enum for now? We can discuss more if we decide to associate an explicit id with each key later on; as long as we associate the first key with 0, the second with 1, the third with 2, and so on, the bytes sent on the wire will not change. We can address it in a separate PR; it's just a code-level change and will not change any payloads.
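   
   A quick sketch of why the wire bytes stay the same (key names are illustrative):
   
   ```java
   import java.io.DataOutputStream;
   import java.io.IOException;
   
   enum KeySketch { UNKNOWN, TABLE, EXCEPTION } // implicit ordinals 0, 1, 2
   
   class OrdinalWireSketch {
     static void writeKey(DataOutputStream out, KeySketch key) throws IOException {
       // An explicit id field holding the same 0, 1, 2, ... values would
       // serialize to exactly the same bytes as the implicit ordinal does.
       out.writeInt(key.ordinal());
     }
   }
   ```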






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806232651


   @mcvsubbu Ready for another round of review. 
   
   commit of "implement datatable V3":
   
   * Add DataTableImplV3, compared with V2:
       * V3 has a trailer section, at the end of datatable
       * V3 don't have metadata sections, all KV pairs are put into trailer section
       * V3 has an exceptions section in the middle of datatable. V2 use meta data to store exceptions (use 
         `"Exception"+errCode` as key). In V3, all key are enum value, which must be defined statically, we can not use  
         `"Exception"+errCode` to create new keys, so use a dedicate section to store exceptions
    
   * Although metadata section has been removed in V3, there are many existing code use `dataTable.getMetadata().get("key")/dataTable.getMetadata().set("key", "value")` to set/get metadata KV pairs, to provide the same interface with V2, V3 also implement the `getMetadata()` method. When serialize, move all metadata into trailer section; when deserialize, move all metadata KV pair trailer section to matedata map.
   * When serialize the trailer section, for each KV pairs:
      *  if value is int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
      *  if value is string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
     
   To make review easier, will @you at where V3 is different with V2.
   
   commit of "add responseSerializationCpuTimeNs measurement":
      * put a temporary value of serialization_cost when serialization, after serialization is done, replace it as the actual value.
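   
   A self-contained sketch of that per-entry encoding (key names and the int/long split are illustrative; `DataOutputStream` writes big-endian by definition):
   
   ```java
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   
   final class TrailerEncodingSketch {
     enum Key { UNKNOWN, NUM_SEGMENTS_QUERIED, TIME_USED_MS, TRACE_INFO }
   
     // int/long values: [keyOrdinal, bigEndianRepresentationOfValue]
     static void writeLongEntry(DataOutputStream out, Key key, long value) throws IOException {
       out.writeInt(key.ordinal());
       out.writeLong(value);
     }
   
     // String values: [keyOrdinal, valueLength, Utf8EncodedValue]
     static void writeStringEntry(DataOutputStream out, Key key, String value) throws IOException {
       byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
       out.writeInt(key.ordinal());
       out.writeInt(utf8.length);
       out.write(utf8);
     }
   }
   ```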




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603688817



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata, we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because V2 API of getMetadata returns a Map<String, String> and there are
+  // a lot of existing code using string as key to access metadata.
+  // TODO: remove this and change all metadata accessing code use MetadataKeys.
+  private final Map<String, String> _metadataV2;

Review comment:
       So this is needed to comply with the existing interface **`Map<String, String> getMetadata();`**, right, and to keep its callers happy for now?
   
   I don't think there is any way of resolving this TODO in a clean manner. All the callers of the getMetadata API would have to be changed, and the code would become conditional/ugly since we would have to support both V2 and V3 while the structure returned by the API differs.
   
   So I suggest not worrying about it at all and, for internal in-memory processing of metadata get/put, always using `Map<String, String> _metadata`.
   
   Here is what we can do
   
   - Remove `private final Map<MetadataKeys, String> _metadata;`
   - Replace `private final Map<String, String> _metadataV2` with `private final Map<String, String> _metadata`;
   - getMetadata() will continue to return _metadata
   - You are anyway converting the String keys to the MetadataKeys enum before serialization. Don't copy all KV pairs into a second map; just convert from String to MetadataKeys enum in the `serializeMetadata` function itself, right before you generate the wire payload (see the sketch below).
   - During deserialization on the broker, you deserialize the MetadataKeys-based format and convert it back into a String-to-String map for processing. 
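   
   A sketch of that shape, reusing the `MetadataKeys` enum from the diff above (the `getByName` lookup helper is illustrative, and for brevity every value is written as a UTF-8 string here; the real encoding special-cases int/long values):
   
   ```java
   private byte[] serializeMetadata()
       throws IOException {
     ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
     DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
     dataOutputStream.writeInt(_metadata.size());
     for (Map.Entry<String, String> entry : _metadata.entrySet()) {
       // Translate the String key to its enum only at the wire boundary;
       // no second Map<MetadataKeys, String> is ever materialized.
       MetadataKeys key = MetadataKeys.getByName(entry.getKey());
       byte[] utf8 = entry.getValue().getBytes(StandardCharsets.UTF_8);
       dataOutputStream.writeInt(key.ordinal());
       dataOutputStream.writeInt(utf8.length);
       dataOutputStream.write(utf8);
     }
     return byteArrayOutputStream.toByteArray();
   }
   ```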






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804596719


   With this PR, we can resolve a couple of TODOs introduced in PR https://github.com/apache/incubator-pinot/pull/6680/
   
   - Expose the serialization time through an API at the DataTable level and log it in [QueryScheduler](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR222). You need to serialize before the logging line; currently it is after.
   - Revisit [this](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR255). The execution CPU time is not yet serialized as part of the metadata. Maybe we can just remove line 258. 




[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804528996


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=h1) Report
   > Merging [#6710](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=desc) (636ec0e) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/8dbb70ba08daf90f5e9067fcec545203ffefe215?el=desc) (8dbb70b) will **decrease** coverage by `8.00%`.
   > The diff coverage is `83.86%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6710/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6710      +/-   ##
   ==========================================
   - Coverage   73.83%   65.82%   -8.01%     
   ==========================================
     Files        1396     1405       +9     
     Lines       67765    68161     +396     
     Branches     9807     9853      +46     
   ==========================================
   - Hits        50035    44870    -5165     
   - Misses      14485    20100    +5615     
   + Partials     3245     3191      -54     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.82% <83.86%> (-0.18%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...org/apache/pinot/common/utils/CommonConstants.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvQ29tbW9uQ29uc3RhbnRzLmphdmE=) | `21.15% <ø> (-13.47%)` | :arrow_down: |
   | [...e/pinot/core/common/datatable/DataTableImplV2.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMi5qYXZh) | `0.00% <0.00%> (-89.46%)` | :arrow_down: |
   | [...core/operator/blocks/IntermediateResultsBlock.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9ibG9ja3MvSW50ZXJtZWRpYXRlUmVzdWx0c0Jsb2NrLmphdmE=) | `76.21% <0.00%> (-5.41%)` | :arrow_down: |
   | [...core/query/executor/ServerQueryExecutorV1Impl.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9leGVjdXRvci9TZXJ2ZXJRdWVyeUV4ZWN1dG9yVjFJbXBsLmphdmE=) | `46.19% <0.00%> (-33.70%)` | :arrow_down: |
   | [...e/pinot/core/transport/InstanceRequestHandler.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvSW5zdGFuY2VSZXF1ZXN0SGFuZGxlci5qYXZh) | `55.88% <0.00%> (-22.06%)` | :arrow_down: |
   | [...pinot/server/starter/helix/HelixServerStarter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvc3RhcnRlci9oZWxpeC9IZWxpeFNlcnZlclN0YXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-51.99%)` | :arrow_down: |
   | [...e/pinot/core/query/reduce/BrokerReduceService.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZWR1Y2UvQnJva2VyUmVkdWNlU2VydmljZS5qYXZh) | `68.54% <33.33%> (-25.81%)` | :arrow_down: |
   | [...che/pinot/core/query/scheduler/QueryScheduler.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9zY2hlZHVsZXIvUXVlcnlTY2hlZHVsZXIuamF2YQ==) | `68.96% <66.66%> (-13.09%)` | :arrow_down: |
   | [...pinot/core/common/datatable/DataTableImplBase.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxCYXNlLmphdmE=) | `76.53% <76.53%> (ø)` | |
   | [.../pinot/core/common/datatable/DataTableBuilder.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUJ1aWxkZXIuamF2YQ==) | `86.72% <85.71%> (-0.32%)` | :arrow_down: |
   | ... and [366 more](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=footer). Last update [8dbb70b...636ec0e](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604447733



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION_3);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "threadCpuTimeNs" to account data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // TODO: currently log/emit a total thread cpu time for query execution time and data table serialization time.
+    //  Figure out a way to log/emit separately. Probably via providing an API on the DataTable to get/set query
+    //  context, which is supposed to be used at server side only.
+    long threadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(THREAD_CPU_TIME_NS.getName(), String.valueOf(threadCpuTimeNs));
+
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+ * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]

Review comment:
       Oh, actually the length of the metadata section is written outside of this function; it's written by the caller. So the description `[numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]` here is correct. I have added comments at the caller to highlight the length-writing logic.
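   
   For reference, the caller-side framing (this mirrors the tail of `toBytes()` in the diff above):
   
   ```java
   // serializeMetadata() itself produces [numEntries, bytesOfKV1, bytesOfKV2, ...];
   // the caller prefixes it with the overall section length.
   byte[] metadataBytes = serializeMetadata();
   dataOutputStream.writeInt(metadataBytes.length);
   dataOutputStream.write(metadataBytes);
   ```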

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,322 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+
+public abstract class DataTableImplBase implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public DataTableImplBase(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,11 +94,16 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    _version = VERSION_3;

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603639040



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {

Review comment:
       The name is not needed. A Java enum constant already has a built-in name (`name()` returns the constant's identifier).
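   
   For context, a sketch of the difference under discussion:
   
   ```java
   enum MetadataKeysSketch {
     NUM_DOCS_SCANNED("numDocsScanned");
   
     private final String _name;
   
     MetadataKeysSketch(String name) {
       _name = name;
     }
   
     String getName() {
       return _name; // "numDocsScanned" -- the camelCase name used on the wire
     }
     // The built-in name() would return "NUM_DOCS_SCANNED", the constant's identifier.
   }
   ```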






[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603634009



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableFactory.java
##########
@@ -32,7 +34,9 @@ public static DataTable getDataTable(ByteBuffer byteBuffer)
     int version = byteBuffer.getInt();
     switch (version) {
       case 2:
-        return new DataTableImplV2(byteBuffer);
+        return convertDataTableImplV2ToV3(new DataTableImplV2(byteBuffer));
+      case 3:

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604385506



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()

Review comment:
       Let's also add test cases for
   
   - a V3 data table sent by the server that is empty
   - a V3 data table sent by the server that has a metadata length of 0 (see the sketch below)
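   
   A sketch of the first case, mirroring the serde round-trip style used in this test class (the assertions on the empty table's shape are illustrative):
   
   ```java
   @Test
   public void testEmptyV3DataTable()
       throws IOException {
     // An empty table: no schema, no rows, no dictionary.
     DataTable emptyDataTable = new DataTableImplV3();
     DataTable newDataTable = DataTableFactory.getDataTable(emptyDataTable.toBytes());
     Assert.assertEquals(newDataTable.getNumberOfRows(), 0, ERROR_MESSAGE);
     Assert.assertNull(newDataTable.getDataSchema(), ERROR_MESSAGE);
   }
   ```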






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r600923547



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)

Review comment:
       @mcvsubbu V3 has a dedicated exceptions section to store exceptions. The reason is that in V3 all keys are enum values, which must be defined statically; we cannot use "Exception"+errCode to create new keys






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603655876



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       V2 constructs the key for an exception using "Exception"+errCode, e.g. Exception404, Exception500, and then puts it into the metadata. In V2 that's OK since V2 metadata keys are Strings, but in V3 it's impossible -- all keys in V3 must be defined statically in the enum. We cannot construct a key dynamically (see the sketch below)
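   
   A sketch of the contrast (the V3 line matches `addException` in this PR; the V2 line shows the old pattern):
   
   ```java
   // V2: dynamic String keys in the metadata map
   _metadata.put("Exception" + processingException.getErrorCode(), processingException.getMessage());
   
   // V3: enum keys are fixed, so exceptions get their own errorCode -> errorMessage map
   _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
   ```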






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604375714



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,322 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+
+public abstract class DataTableImplBase implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public DataTableImplBase(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,

Review comment:
       Please add javadoc

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,11 +94,16 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    _version = VERSION_3;

Review comment:
       This is not really needed since you already define it at line 82.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604475304



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  private static int _version = VERSION_3;

Review comment:
       We have a `setCurrentDataTableVersion` static function to set versions, which is called in `HelixServerStarter`






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603647761



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,9 +107,17 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    CURRENT_VERSION = VERSION_3;

Review comment:
       This doesn't look clean. CURRENT_VERSION should not be assigned in the constructor; it is a static final set to the current version number (3 for this PR).
   
   Have a variable **`private static int _version = VERSION_3`** and assign it in the function **`setCurrentDataTableVersion(int version)`** so that it can be overridden with the configured value.
   
   Now in the builder, check the version: if _version is VERSION_2, construct DataTableImplV2; else if _version is VERSION_3, construct DataTableImplV3.
   
   This way you also remove CURRENT_VERSION and just keep VERSION_2, VERSION_3 and _version.
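   
   A minimal sketch of that shape (names taken from this discussion; the PR's final code may differ):
   
   ```java
   public class DataTableBuilder {
     public static final int VERSION_2 = 2;
     public static final int VERSION_3 = 3;
   
     // Defaults to the latest version; overridden once at startup with the
     // configured value instead of being reassigned in every constructor call.
     private static int _version = VERSION_3;
   
     public static void setCurrentDataTableVersion(int version) {
       if (version != VERSION_2 && version != VERSION_3) {
         throw new IllegalArgumentException("Unsupported data table version: " + version);
       }
       _version = version;
     }
   }
   ```
   
   build() then simply branches on _version to construct DataTableImplV2 or DataTableImplV3.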
   






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809612014


   > With this PR, we should resolve a couple of TODOs introduced in PR #6680
   > 
   > * Expose the serialization time through an API at the DataTable level and log it in [QueryScheduler](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR222). You need to serialize before the logging line. Currently it is after.
   > * Revisit [this](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR255). The execution cpu time is not yet serialized as part of metadata. Maybe we can just remove line 258.
   
   We can emit both costs separately, so have two server gauges. Similarly, log them in the QueryScheduler separately. But the serialized cost in the DataTable should be a single value (the sum of the execution CPU time cost and the serialization CPU time cost).
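   
   For illustration, the combined value could be computed along these lines (a fragment; the variable and key names are assumptions, not the PR's exact code):
   
   ```java
   // Emit the two gauges separately, but serialize one combined value.
   long totalThreadCpuTimeNs = executionThreadCpuTimeNs + serializationCpuTimeNs;
   dataTable.getMetadata().put("threadCpuTimeNs", String.valueOf(totalThreadCpuTimeNs));
   ```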




[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806259380


   > > @siddharthteotia, @mqliang and I met, and agreed on the following (I have added some extras, so take a look)
   > > 
   > > * We will move the metadata to the trailer, retaining the other elements in the same order.
   > > * We will encode the trailer as: trailer = (int, int, blob)+
   > > * The first int is the enum ordinal, the second int is the length of the blob, and the third part is the UTF-8 encoding of a string, or an int/long as dictated by the enum. If int/long, then we will encode it in network byte order (big-endian). The alternative is to convert it to a string.
   > 
   > Not sure which option @siddharthteotia agrees with, but the alternatives are something like:
   > `7, 8, "12609856"` (8 byte string for a number)
   > vs
   > `7, 4, 12609856` (4-byte integer for a number)
   > 
   > Maybe we can decide based on what looks easier in code.
   
   @mcvsubbu I agree with the big-endian approach for the case where the value/blob part itself is a fixed-width int or long
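   
   A sketch of one (ordinal, length, blob) entry under that agreement; note that Java's DataOutputStream already writes ints and longs in network byte order (big-endian), so no extra conversion is needed (the class and method names here are illustrative, not the PR's):
   
   ```java
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   
   final class TrailerEntryWriter {
     // String-valued entry: (ordinal, byte length, UTF-8 bytes).
     static void writeStringEntry(DataOutputStream out, int ordinal, String value)
         throws IOException {
       byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
       out.writeInt(ordinal);
       out.writeInt(bytes.length);
       out.write(bytes);
     }
   
     // Long-valued entry: (ordinal, 8, big-endian long).
     static void writeLongEntry(DataOutputStream out, int ordinal, long value)
         throws IOException {
       out.writeInt(ordinal);
       out.writeInt(Long.BYTES);
       out.writeLong(value); // big-endian per DataOutputStream's contract
     }
   }
   ```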




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603647963



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -263,7 +271,9 @@ public void finishRow()
   }
 
   public DataTable build() {
-    return new DataTableImplV2(_numRows, _dataSchema, _reverseDictionaryMap,
-        _fixedSizeDataByteArrayOutputStream.toByteArray(), _variableSizeDataByteArrayOutputStream.toByteArray());
+    return CURRENT_VERSION == VERSION_2 ? new DataTableImplV2(_numRows, _dataSchema, _reverseDictionaryMap,

Review comment:
       See the previous comment - Use _version






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603667157



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {

Review comment:
       The built-in `enumKey.name()` will return a string exactly matching the constant name:
   For example: 
   ```
   System.out.println(MetadataKeys.TOTAL_DOCS.name());
   ```
   will output:
   ```
   TOTAL_DOCS
   ```
   
   However, the corresponding string should be "totalDocs". If we want to use the built-in `enumKey.name()`, we would have to define the metadata key constant itself as `totalDocs`. I am fine with that.
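   
   A toy illustration of the two options (hypothetical enums, not the PR's code):
   
   ```java
   // Option 1: rely on the built-in name(); the wire string equals the constant.
   enum KeyA {
     TOTAL_DOCS // KeyA.TOTAL_DOCS.name() -> "TOTAL_DOCS"
   }
   
   // Option 2: keep the camelCase wire name via an explicit field.
   enum KeyB {
     TOTAL_DOCS("totalDocs");
   
     private final String _name;
   
     KeyB(String name) {
       _name = name;
     }
   
     public String getName() {
       return _name; // -> "totalDocs"
     }
   }
   ```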






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599202507



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -61,12 +65,15 @@
   private final byte[] _variableSizeDataBytes;
   private final ByteBuffer _variableSizeData;
   private final Map<String, String> _metadata;
+  // Only V3 has _positionalData
+  private final String[] _positionalData;

Review comment:
       Also update the javadocs in class DataTableBuilder because that's where the structure of the file is listed






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606007420



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;

Review comment:
       (Code style) Avoid using static imports. Same for other non-test files.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;

Review comment:
       (nit)
   ```suggestion
         _name = name;
         _valueType = valueType;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  protected byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(dictionaryMap.size());
+    for (Map.Entry<String, Map<Integer, String>> dictionaryMapEntry : dictionaryMap.entrySet()) {
+      String columnName = dictionaryMapEntry.getKey();
+      Map<Integer, String> dictionary = dictionaryMapEntry.getValue();
+      byte[] bytes = StringUtil.encodeUtf8(columnName);
+      dataOutputStream.writeInt(bytes.length);
+      dataOutputStream.write(bytes);
+      dataOutputStream.writeInt(dictionary.size());
+
+      for (Map.Entry<Integer, String> dictionaryEntry : dictionary.entrySet()) {
+        dataOutputStream.writeInt(dictionaryEntry.getKey());
+        byte[] valueBytes = StringUtil.encodeUtf8(dictionaryEntry.getValue());
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Helper method to deserialize dictionary map.
+   */
+  protected Map<String, Map<Integer, String>> deserializeDictionaryMap(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numDictionaries = dataInputStream.readInt();
+      Map<String, Map<Integer, String>> dictionaryMap = new HashMap<>(numDictionaries);
+
+      for (int i = 0; i < numDictionaries; i++) {
+        String column = decodeString(dataInputStream);
+        int dictionarySize = dataInputStream.readInt();
+        Map<Integer, String> dictionary = new HashMap<>(dictionarySize);
+        for (int j = 0; j < dictionarySize; j++) {
+          int key = dataInputStream.readInt();
+          String value = decodeString(dataInputStream);
+          dictionary.put(key, value);
+        }
+        dictionaryMap.put(column, dictionary);
+      }
+
+      return dictionaryMap;
+    }
+  }
+
+  public Map<String, String> getMetadata() {

Review comment:
       Put the @Override annotation on the methods that implement the interface.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal or null if the key does not exist.
+    public static MetadataKey getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKey.values().length) {
+        return null;
+      }
+      return MetadataKey.values()[ordinal];
+    }
+
+    // getByName returns an enum key for a given name or null if the key does not exist.
+    public static MetadataKey getByName(String name) {
+      return _nameToEnumKeyMap.getOrDefault(name, null);
+    }

Review comment:
       ```suggestion
       @Nullable
       public static MetadataKey getByName(String name) {
         return _nameToEnumKeyMap.get(name);
       }
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal or null if the key does not exist.
+    public static MetadataKey getByOrdinal(int ordinal) {

Review comment:
       ```suggestion
       @Nullable
       public static MetadataKey getByOrdinal(int ordinal) {
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  protected byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)

Review comment:
       No need to have the argument. It always serializes the `_dictionaryMap`
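   
   i.e., something like the following (body elided; it is the same per-column loop as above, just iterating the field):
   
   ```java
   protected byte[] serializeDictionaryMap()
       throws IOException {
     ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
     DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
     dataOutputStream.writeInt(_dictionaryMap.size());
     // ... same per-column / per-entry loop as above, reading _dictionaryMap ...
     return byteArrayOutputStream.toByteArray();
   }
   ```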






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603648893



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableFactory.java
##########
@@ -31,8 +35,10 @@ public static DataTable getDataTable(ByteBuffer byteBuffer)
       throws IOException {
     int version = byteBuffer.getInt();
     switch (version) {
-      case 2:
-        return new DataTableImplV2(byteBuffer);
+      case VERSION_2:
+        return convertDataTableImplV2ToV3(new DataTableImplV2(byteBuffer));

Review comment:
       Since the implementations of V2 and V3 are decoupled, with the common parts abstracted into DataTableUtils, why do we need a converter?






[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-810385539


   I have labelled it as backward-incompat and release-notes. Please add appropriate check-in comments mentioning that this change will be backward-incompatible if servers are upgraded first, so brokers must be upgraded before servers.
   Also mention that the compatibility of the protocols will not be retained beyond 0.8.0 (or the next version that is released), and create an issue so that we remove all V2 protocol code after 0.8.0 is released.




[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809610152


   > @mcvsubbu, @siddharthteotia and I met offline; we want to keep this PR focused on bumping up to V3 and moving the metadata to the end of the data table, also using the enum ordinal as the key when serializing. And make it configurable to send V2/V3 data on the server side (instance config).
   > 
   > @Jackie-Jiang In terms of addressing the TODOs in DataTableBuilder (fix float data length, one String->Int map for the whole table instead of one per column), we will address them separately (bumping up to V4).
   
   Yes, the original enum-based approach is much simpler: the enum name() gives the string key and the enum ordinal gives the id. The approach of using an explicit id and name is not needed since we want the enum structure to only ever grow - users should not be allowed to remove enums or change enum names; we should only be adding new enums to the end.
   
   For the existing TODOs, the main reason for not addressing them in this PR is to keep it as simple as possible. It is easier to review, and once this change goes to production, if there are any issues we don't have to debug multiple independent changes. We can always address those TODOs in a follow-up, and if they demand a version change, we can bump the version again.
   
   Regarding introducing an instance config for the version, we agreed that CURRENT_VERSION will be 3 and the server will always send the data table with this version. @mcvsubbu suggested that it can be helpful to force the server to send the old version V2 in case issues with V3 force a rollback while we still want to run the latest code.
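   
   A hedged sketch of that startup wiring, assuming a PinotConfiguration-style `getProperty(String, int)`; the config key name below is an assumption for illustration, not necessarily the one the PR defines:
   
   ```java
   // In HelixServerStarter (or a similar server startup path):
   int dataTableVersion = serverConf.getProperty(
       "pinot.server.instance.currentDataTableVersion", // assumed key name
       DataTableBuilder.VERSION_3);
   DataTableBuilder.setCurrentDataTableVersion(dataTableVersion);
   ```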




[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605324977



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),

Review comment:
       Is UNKNOWN really needed? Can we get rid of it?






[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605993528



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs");
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKey contains all metadata keys which has value of int type.
+    private static final Set<MetadataKey> _intValueMetadataKey = ImmutableSet
+        .of(MetadataKey.NUM_SEGMENTS_QUERIED, MetadataKey.NUM_SEGMENTS_PROCESSED, MetadataKey.NUM_SEGMENTS_MATCHED,
+            MetadataKey.NUM_RESIZES, MetadataKey.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKey.NUM_RESIZES);
+    // _longValueMetadataKey contains all metadata keys which has value of long type.

Review comment:
       Looks good, but wondering if we can use ColumnDataType (which is widely used already) instead of defining a new enum that more or less means the same thing? I think the ordinal position of values in ColumnDataType is already fixed (from a serialization/deserialization point of view), but for safety we can add a comment there saying not to change the ordinal positions.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603637907



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate id/name is not allowed.
+   *  - Don't change id/name of existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN(0, "unknown"),
+    TABLE_KEY(1, "table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY(2, "Exception"),
+    NUM_DOCS_SCANNED_METADATA_KEY(3, "numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY(4, "numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY(5, "numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED(6, "numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED(7, "numSegmentsProcessed"),

Review comment:
       I believe it's left over from history. In V2, some of the metadata key string literals have the suffix and some do not; namely, the following five do not have the suffix:
   ```
     String NUM_SEGMENTS_QUERIED = "numSegmentsQueried";
     String NUM_SEGMENTS_PROCESSED = "numSegmentsProcessed";
     String NUM_SEGMENTS_MATCHED = "numSegmentsMatched";
     String NUM_CONSUMING_SEGMENTS_PROCESSED = "numConsumingSegmentsProcessed";
     String MIN_CONSUMING_FRESHNESS_TIME_MS = "minConsumingFreshnessTimeMs";
   ``` 
   So I copied them to keep consistency. Now I have removed the suffix from all the keys in V3. In V3, all the keys are in the MetadataKeys enum, so users will know they are metadata keys even if they do not have a suffix.






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603762884



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -27,19 +27,18 @@
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Map.Entry;
-import org.apache.commons.lang3.StringUtils;
 import org.apache.pinot.common.response.ProcessingException;
 import org.apache.pinot.common.utils.DataSchema;
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
-import org.apache.pinot.core.common.ObjectSerDeUtils;
-import org.apache.pinot.spi.utils.ByteArray;
-import org.apache.pinot.spi.utils.BytesUtils;
 
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_2;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.deserializeDictionaryMap;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.serializeDictionaryMap;
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
 
+public class DataTableImplV2 extends DataTableImplBase implements DataTable {

Review comment:
       I think this works but doesn't look correct from an OOP point of view. Ideally the abstract base class should implement the interface to make a proper logical hierarchy:
   DataTable -> DataTableImplBase (or AbstractDataTableImpl) -> DataTableImplV2, DataTableImplV3
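   
   In code, the suggested shape is roughly (interface and class bodies elided for brevity):
   
   ```java
   interface DataTable { /* query-result accessors elided */ }
   
   // The abstract base implements the interface once and holds the shared
   // state and helpers (dictionary ser/de, metadata map, ...).
   abstract class BaseDataTable implements DataTable {
   }
   
   class DataTableImplV2 extends BaseDataTable { /* V2 wire format */ }
   
   class DataTableImplV3 extends BaseDataTable { /* V3 wire format + trailing metadata */ }
   ```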






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603638827



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),

Review comment:
       Why are we using the constructor-based pattern? The getName function of the enum will automatically return the constant name as a string in camel case, which is exactly what you are passing to the constructor. So we don't need constructor-based creation of enum constants. You can simply define the constant and the Java enum implementation will take care of the name.






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806320191


   Test failed due to flaky networking:
   ```
   Failed to execute goal on project pinot-orc: Could not resolve dependencies for project org.apache.pinot:pinot-orc:jar:0.7.0-SNAPSHOT: Could not transfer artifact org.apache.hive:hive-storage-api:jar:2.7.1 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/org/apache/hive/hive-storage-api/2.7.1/hive-storage-api-2.7.1.jar: Connection reset -> [Help 1]
   ```
   Closing and re-opening to trigger a re-run.




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603689214



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata, we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because V2 API of getMetadata returns a Map<String, String> and there are
+  // a lot of existing code using string as key to access metadata.
+  // TODO: remove this and change all metadata accessing code use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);

Review comment:
       See my comment above - Lines 222 to 225 can be moved into deserializeMetadata and you don't need to do a copy. Just maintain one map.
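   
   A minimal sketch of that, assuming deserializeMetadata takes over the enum-to-string conversion so that only one map needs to be maintained (shape is illustrative, not the final code):
   
   ```java
   // Sketch: return the string-keyed map directly from deserialization,
   // so the V2-style Map<String, String> is the only in-memory representation.
   private Map<String, String> deserializeMetadata(byte[] bytes)
       throws IOException {
     try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
         DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
       int numEntries = dataInputStream.readInt();
       Map<String, String> metadata = new HashMap<>(numEntries);
       for (int i = 0; i < numEntries; i++) {
         MetadataKeys key = MetadataKeys.values()[dataInputStream.readInt()];
         String value;
         if (MetadataKeys.isIntValueMetadataKey(key)) {
           value = String.valueOf(dataInputStream.readInt());
         } else if (MetadataKeys.isLongValueMetadataKey(key)) {
           value = String.valueOf(dataInputStream.readLong());
         } else {
           value = DataTableUtils.decodeString(dataInputStream);
         }
         // The enum ordinal on the wire is converted to its string name exactly once, here.
         metadata.put(key.getName(), value);
       }
       return metadata;
     }
   }
   ```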




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603678900



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * DataTable V3 implementation.
+ * The layout of the serialized V3 data table looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata() API returns a Map<String, String> and there
+  // is a lot of existing code that uses strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (Broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account for data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pairs from _metadataV2 into _metadata.
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if the value type is String, encode it as: [keyID, valueLength, Utf8EncodedValue].
+   * - if the value type is int, encode it as: [keyID, bigEndianRepresentationOfIntValue]
+   * - if the value type is long, encode it as: [keyID, bigEndianRepresentationOfLongValue]
+   */
+  private byte[] serializeMetadata()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(_metadata.size());
+
+    for (Map.Entry<MetadataKeys, String> entry : _metadata.entrySet()) {
+      MetadataKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      if (MetadataKeys.isIntValueMetadataKey(key)) {
+        dataOutputStream.write(Ints.toByteArray(Integer.parseInt(value)));
+      } else if (MetadataKeys.isLongValueMetadataKey(key)) {
+        dataOutputStream.write(Longs.toByteArray(Long.parseLong(value)));
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<MetadataKeys, String> deserializeMetadata(byte[] bytes)

Review comment:
       (nit) Add a comment here highlighting something like  ---
   
   Even though the wire format uses UTF-8 for string/bytes and big-endian for numeric values, the in-memory representation is string-based for processing the metadata before serialization (by the server, as it adds statistics to the metadata) and after deserialization (by the broker, as it receives a DataTable from each server and aggregates the values). This goes back to the point of how the V3 implementation has kept the existing consumers of the `Map<String, String> getMetadata()` API happy by converting internally.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r601741649



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";

Review comment:
       Why do we need this? 

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -209,24 +190,22 @@ public DataTableImplV2(ByteBuffer byteBuffer)
     }
   }
 
-  private static String decodeString(DataInputStream dataInputStream)
-      throws IOException {
-    int length = dataInputStream.readInt();
-    if (length == 0) {
-      return StringUtils.EMPTY;
-    } else {
-      byte[] buffer = new byte[length];
-      int numBytesRead = dataInputStream.read(buffer);
-      assert numBytesRead == length;
-      return StringUtil.decodeUtf8(buffer);
-    }
-  }
-
   @Override
   public void addException(ProcessingException processingException) {
     _metadata.put(EXCEPTION_METADATA_KEY + processingException.getErrorCode(), processingException.getMessage());
   }
 
+  @Override
+  public Map<Integer, String> getExceptions() {
+    Map<Integer, String> exceptions = new HashMap<>();
+    for (String key : _metadata.keySet()) {
+      if (key.startsWith(DataTable.EXCEPTION_METADATA_KEY)) {
+        exceptions.put(Integer.parseInt(key.substring(9)), _metadata.get(key));

Review comment:
       what is `9`? Can we have a `static final int` and an example here?
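   
   A sketch of the suggested change, assuming the prefix constant's value is "Exception" (9 characters), so metadata keys look like "Exception180":
   
   ```java
   // Length of the exception-key prefix; replaces the magic number 9.
   private static final int EXCEPTION_KEY_PREFIX_LENGTH = DataTable.EXCEPTION_METADATA_KEY.length();
   
   // e.g. key "Exception180" -> errorCode 180
   exceptions.put(Integer.parseInt(key.substring(EXCEPTION_KEY_PREFIX_LENGTH)), _metadata.get(key));
   ```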

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* TrailerKeys is used in V3, where we put all metadata in the trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in the future we may add trailer keys which are not metadata keys.
+   *
+   * NOTE:
+   * if you add a new key to the TrailerKeys enum
+   *  - you need to add its corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap as well.
+   *  - if it happens to be a metadata key, add it to MetadataKeys as well.
+   *  - if it has a long/int typed value, add it to LongValueTrailerKeys/IntValueTrailerKeys as well.
+   *
+   * ATTENTION:
+   *  - Always add new keys to the end of the enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest

Review comment:
       Add an UNKNOWN as the first one (=0). You will then be able to code around things more easily when exceptions are thrown in valueOf() methods.
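   
   A sketch of that idea (enum names from this PR; fromOrdinal is a hypothetical helper):
   
   ```java
   enum TrailerKeys {
     UNKNOWN,   // = 0, reserved; never serialized, fallback for unrecognized ids
     TABLE_KEY, // = 1
     EXCEPTION_METADATA_KEY,
     // ... existing keys, new ones always appended at the end ...
   }
   
   // Hypothetical helper: map an out-of-range id to UNKNOWN instead of letting
   // values()/valueOf() throw during deserialization.
   static TrailerKeys fromOrdinal(int id) {
     TrailerKeys[] keys = TrailerKeys.values();
     return (id > 0 && id < keys.length) ? keys[id] : TrailerKeys.UNKNOWN;
   }
   ```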

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -263,6 +263,14 @@ public void finishRow()
   }
 
   public DataTable build() {
+    return new DataTableImplV3(_numRows, _dataSchema, _reverseDictionaryMap,
+        _fixedSizeDataByteArrayOutputStream.toByteArray(), _variableSizeDataByteArrayOutputStream.toByteArray());
+  }
+
+  // buildV2() is only used in V2V3Compatibility test

Review comment:
       It may be better to make it configurable on the server to generate either version (default the config to V3).
   
   The alternative is to insist that the brokers be upgraded before picking up this feature. That may be OK, but it may introduce other constraints (e.g. if someone urgently needs a server fix in production, it forces them to pull a newer version of the server).
   

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* TrailerKeys is used in V3, where we put all metadata in the trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in the future we may add trailer keys which are not metadata keys.
+   *
+   * NOTE:
+   * if you add a new key to the TrailerKeys enum
+   *  - you need to add its corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap as well.
+   *  - if it happens to be a metadata key, add it to MetadataKeys as well.
+   *  - if it has a long/int typed value, add it to LongValueTrailerKeys/IntValueTrailerKeys as well.
+   *
+   * ATTENTION:
+   *  - Always add new keys to the end of the enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which have values of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which have values of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED
+  );
+
+  // MetadataKeys contains all trailer keys which are also metadata keys.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(
+      TrailerKeys.TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+      TrailerKeys.EXCEPTION_METADATA_KEY,
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.NUM_GROUPS_LIMIT_REACHED_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.TRACE_INFO_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // TrailerKeyToMetadataKeyMap is used to convert an enum key to a metadata key (string).
+  Map<TrailerKeys, String> TrailerKeyToMetadataKeyMap = ImmutableMap.<TrailerKeys, String>builder()

Review comment:
       Will BiMap work better?
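   
   For illustration, a sketch with Guava's ImmutableBiMap; the string values are assumed to match the existing *_METADATA_KEY constants:
   
   ```java
   import com.google.common.collect.BiMap;
   import com.google.common.collect.ImmutableBiMap;
   
   BiMap<TrailerKeys, String> TrailerKeyToMetadataKeyMap = ImmutableBiMap.<TrailerKeys, String>builder()
       .put(TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY, "numDocsScanned")
       .put(TrailerKeys.TOTAL_DOCS_METADATA_KEY, "totalDocs")
       // ... one .put(...) per remaining key ...
       .build();
   
   // The reverse view comes for free, replacing the hand-maintained MetadataKeyToTrailerKeyMap:
   TrailerKeys key = TrailerKeyToMetadataKeyMap.inverse().get("numDocsScanned");
   ```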

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* TrailerKeys is used in V3, where we put all metadata in the trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in the future we may add trailer keys which are not metadata keys.
+   *
+   * NOTE:
+   * if you add a new key to the TrailerKeys enum
+   *  - you need to add its corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap as well.
+   *  - if it happens to be a metadata key, add it to MetadataKeys as well.
+   *  - if it has a long/int typed value, add it to LongValueTrailerKeys/IntValueTrailerKeys as well.
+   *
+   * ATTENTION:
+   *  - Always add new keys to the end of the enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which have values of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which have values of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED
+  );
+
+  // MetadataKeys contains all trailer keys which are also metadata keys.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(

Review comment:
       Instead of duplicating keys, can we just union the set?
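   
   A sketch of the union with Guava's Sets.union, assuming the remaining string-valued keys get their own small set (StringValueTrailerKeys is a hypothetical name):
   
   ```java
   // Keys that are neither long- nor int-valued, per the sets above.
   Set<TrailerKeys> StringValueTrailerKeys = ImmutableSet.of(
       TrailerKeys.TABLE_KEY,
       TrailerKeys.EXCEPTION_METADATA_KEY,
       TrailerKeys.NUM_GROUPS_LIMIT_REACHED_KEY,
       TrailerKeys.TRACE_INFO_METADATA_KEY);
   
   // MetadataKeys is then derived instead of re-listing every key by hand.
   Set<TrailerKeys> MetadataKeys =
       Sets.union(Sets.union(LongValueTrailerKeys, IntValueTrailerKeys), StringValueTrailerKeys);
   ```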




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804528996


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=h1) Report
   > Merging [#6710](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=desc) (6eee51b) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.38%`.
   > The diff coverage is `62.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6710/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6710      +/-   ##
   ==========================================
   - Coverage   66.44%   66.06%   -0.39%     
   ==========================================
     Files        1075     1398     +323     
     Lines       54773    68158   +13385     
     Branches     8168     9852    +1684     
   ==========================================
   + Hits        36396    45029    +8633     
   - Misses      15700    19932    +4232     
   - Partials     2677     3197     +520     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `66.06% <62.64%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1287 more](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=footer). Last update [27b61fe...6eee51b](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804596719


   With this PR, we should resolve a couple of TODOs introduced in PR https://github.com/apache/incubator-pinot/pull/6680/
   
   - Expose the serialization time through an API at the DataTable level and log it in [QueryScheduler](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR222). You need to serialize before the logging line. Currently it is after.
   - Revisit [this](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR255). The execution CPU time is not yet serialized as part of the metadata. Maybe we can just remove line 258.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603649214



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableFactory.java
##########
@@ -31,8 +35,10 @@ public static DataTable getDataTable(ByteBuffer byteBuffer)
       throws IOException {
     int version = byteBuffer.getInt();
     switch (version) {
-      case 2:
-        return new DataTableImplV2(byteBuffer);
+      case VERSION_2:
+        return convertDataTableImplV2ToV3(new DataTableImplV2(byteBuffer));

Review comment:
       We should simply call `new DataTableImplV2(byteBuffer)` or `new DataTableImplV3(byteBuffer)` depending on the version
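   
   i.e. roughly (a sketch, assuming DataTableImplV2 keeps its deserializing constructor):
   
   ```java
   public static DataTable getDataTable(ByteBuffer byteBuffer)
       throws IOException {
     int version = byteBuffer.getInt();
     switch (version) {
       case VERSION_2:
         return new DataTableImplV2(byteBuffer);  // no V2 -> V3 conversion needed
       case VERSION_3:
         return new DataTableImplV3(byteBuffer);
       default:
         throw new UnsupportedOperationException("Unsupported data table version: " + version);
     }
   }
   ```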




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604397992



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       Let's discuss this to see what we need to do here. Might want to clean up the existing code first to always build the empty data table in the same manner.
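   
   One possible shape of that cleanup, with the version choice kept in a single place (getCurrentVersion() is a hypothetical accessor):
   
   ```java
   // Hypothetical helper: every caller builds empty tables through this,
   // instead of new-ing a concrete implementation directly.
   static DataTable buildEmptyDataTable() {
     return DataTableBuilder.getCurrentVersion() == DataTableBuilder.VERSION_2  // hypothetical accessor
         ? new DataTableImplV2()
         : new DataTableImplV3();
   }
   ```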




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603692307



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableUtils.java
##########
@@ -233,4 +244,170 @@ private static DataTable buildEmptyDataTableForDistinctQuery(QueryContext queryC
     dataTableBuilder.finishRow();
     return dataTableBuilder.build();
   }
+
+  /**
+   * Helper method to decode string.
+   */
+  public static String decodeString(DataInputStream dataInputStream)
+      throws IOException {
+    int length = dataInputStream.readInt();
+    if (length == 0) {
+      return StringUtils.EMPTY;
+    } else {
+      byte[] buffer = new byte[length];
+      int numBytesRead = dataInputStream.read(buffer);
+      assert numBytesRead == length;
+      return StringUtil.decodeUtf8(buffer);
+    }
+  }
+
+  /**
+   * Helper method to decode int.
+   */
+  public static int decodeInt(DataInputStream dataInputStream)
+      throws IOException {
+    int length = Integer.BYTES;
+    byte[] buffer = new byte[length];
+    int numBytesRead = dataInputStream.read(buffer);
+    assert numBytesRead == length;
+    return Ints.fromByteArray(buffer);
+  }
+
+  /**
+   * Helper method to decode long.
+   */
+  public static long decodeLong(DataInputStream dataInputStream)
+      throws IOException {
+    int length = Long.BYTES;
+    byte[] buffer = new byte[length];
+    int numBytesRead = dataInputStream.read(buffer);
+    assert numBytesRead == length;
+    return Longs.fromByteArray(buffer);
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  public static byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(dictionaryMap.size());
+    for (Map.Entry<String, Map<Integer, String>> dictionaryMapEntry : dictionaryMap.entrySet()) {
+      String columnName = dictionaryMapEntry.getKey();
+      Map<Integer, String> dictionary = dictionaryMapEntry.getValue();
+      byte[] bytes = StringUtil.encodeUtf8(columnName);
+      dataOutputStream.writeInt(bytes.length);
+      dataOutputStream.write(bytes);
+      dataOutputStream.writeInt(dictionary.size());
+
+      for (Map.Entry<Integer, String> dictionaryEntry : dictionary.entrySet()) {
+        dataOutputStream.writeInt(dictionaryEntry.getKey());
+        byte[] valueBytes = StringUtil.encodeUtf8(dictionaryEntry.getValue());
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Helper method to deserialize dictionary map.
+   */
+  public static Map<String, Map<Integer, String>> deserializeDictionaryMap(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numDictionaries = dataInputStream.readInt();
+      Map<String, Map<Integer, String>> dictionaryMap = new HashMap<>(numDictionaries);
+
+      for (int i = 0; i < numDictionaries; i++) {
+        String column = decodeString(dataInputStream);
+        int dictionarySize = dataInputStream.readInt();
+        Map<Integer, String> dictionary = new HashMap<>(dictionarySize);
+        for (int j = 0; j < dictionarySize; j++) {
+          int key = dataInputStream.readInt();
+          String value = decodeString(dataInputStream);
+          dictionary.put(key, value);
+        }
+        dictionaryMap.put(column, dictionary);
+      }
+
+      return dictionaryMap;
+    }
+  }
+
+  /**

Review comment:
       We don't need to do this conversion. The factory class should read the version as the first 4 bytes from the incoming ByteBuffer and accordingly create DataTableImplV2 or DataTableImplV3




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606092021



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;

Review comment:
       Will do it in a follow-up PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809610152


   > @mcvsubbu @siddharthteotia and I met offline; we want to keep this PR focused on bumping up to V3 and moving metadata to the end of the data table, and on using the enum ordinal as the key when serializing. And make it configurable to send V2/V3 data on the server side (instance config).
   > 
   > @Jackie-Jiang In terms of addressing the TODOs in DataTableBuilder (fix float data length, one String->Int map for the whole table instead of one per column), we will address them separately (bumping up to V4).
   
   Yes, the original enum-based approach is much simpler: the enum getName() gives the camel-case string key and the enum ordinal gives the id. The approach of using an explicit id and name is not needed since we want the enum structure to only ever grow - users should not be allowed to remove enums or rename them; new enums should only be added at the end.
   
   For the existing TODOs, the main reason for not addressing them in this PR is to keep it as simple as possible. It is easier to review, and once this change goes to production, if there are any issues we don't have to debug multiple independent changes. We can always address those TODOs in a follow-up, and if they demand a version change, we can bump the version again.
   
   Regarding introducing an instance config for the version, we agreed that CURRENT_VERSION will be 3 and the server will always send the data table with this version. @mcvsubbu suggested that it can be helpful to force the server to send the old V2 format in case issues with V3 force a rollback while we still want to run the latest code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599240463



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()

Review comment:
       This is actually not doing the serialization to the main output stream opened by the caller toBytes().
   This function, like the other serialization functions, first writes to a temporary output stream and then converts it to a byte array, which is returned to the caller and written to the main stream. I think the reason for doing that is that we don't know upfront the length of the byte[] to allocate.
   
   However, for this footer we can probably do it differently, and it might be faster:
   
   - Write a loop to go over each entry and keep a running sum of the size
   - At the end of the loop, allocate a byte array of that size
   - Start another loop, go over each entry again, and fill out the pre-allocated byte array
   - Return the filled byte array (see the sketch after this comment)
   
   This will avoid the unnecessary creation of streams at lines 400, 401 and the subsequent writes to them followed by conversion to a byte array. We can write directly to the byte array. I think this can be faster.
   For the other serialization functions which follow this approach, we can fix them later outside this PR if need be.
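   
   A sketch of that two-pass version, written against the final revision's serializeMetadata naming; strings are encoded twice here for brevity, a real version could cache the encoded byte arrays from the first pass:
   
   ```java
   private byte[] serializeMetadata() {
     // Pass 1: compute the exact serialized size.
     int size = Integer.BYTES;  // numEntries
     for (Map.Entry<MetadataKeys, String> entry : _metadata.entrySet()) {
       MetadataKeys key = entry.getKey();
       size += Integer.BYTES;  // keyId
       if (MetadataKeys.isIntValueMetadataKey(key)) {
         size += Integer.BYTES;
       } else if (MetadataKeys.isLongValueMetadataKey(key)) {
         size += Long.BYTES;
       } else {
         size += Integer.BYTES + StringUtil.encodeUtf8(entry.getValue()).length;
       }
     }
     // Pass 2: fill a pre-allocated buffer; ByteBuffer is big-endian by default,
     // matching the wire format. No intermediate streams are created.
     ByteBuffer buffer = ByteBuffer.allocate(size);
     buffer.putInt(_metadata.size());
     for (Map.Entry<MetadataKeys, String> entry : _metadata.entrySet()) {
       MetadataKeys key = entry.getKey();
       buffer.putInt(key.ordinal());
       if (MetadataKeys.isIntValueMetadataKey(key)) {
         buffer.putInt(Integer.parseInt(entry.getValue()));
       } else if (MetadataKeys.isLongValueMetadataKey(key)) {
         buffer.putLong(Long.parseLong(entry.getValue()));
       } else {
         byte[] valueBytes = StringUtil.encodeUtf8(entry.getValue());
         buffer.putInt(valueBytes.length);
         buffer.put(valueBytes);
       }
     }
     return buffer.array();
   }
   ```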




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603658312



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),

Review comment:
       This is a good suggestion, but we should be careful about the strings that:
   - the broker uses in returning metadata
   - the broker/server use in logs.
   It is better if we don't change those strings.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603488335



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,9 +107,17 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    CURRENT_VERSION = VERSION_3;

Review comment:
       Yes, it's configurable now. There is another constructor `public DataTableBuilder(DataSchema dataSchema, int version) {}`
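   
   For example (hypothetical call site; assumes the version constants are visible to the caller):
   
   ```java
   // Force the V2 wire format, e.g. during a rollback window:
   DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema, DataTableBuilder.VERSION_2);
   ```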





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599240463



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()

Review comment:
       This is actually not doing the serialization to the main output stream opened by the caller `toBytes()`.
   This function, like the other serialization functions, first writes to a temporary output stream and then converts it to a byte array, which is returned to the caller and written to the main stream. I think the reason for doing that is that we don't know up front the length of the byte[] to allocate.
   
   However, for this footer we can probably do something different, and it might be faster (a sketch follows below):
   
   - Write a loop to go over each entry and keep a running sum of the size
   - At the end of the loop, allocate a byte array of that size
   - Start another loop, go over each entry again, and fill out the pre-allocated byte array
   - Return the filled byte array
   
   This will prevent the unnecessary creation of streams at lines 400 and 401, writing to them, and then converting to a byte array. We can write directly to the byte array. I think this can be faster. 
   For the other serialization functions that follow this approach, we can fix them later, outside this PR, if need be
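   
   A minimal sketch of the suggested two-pass approach (the `Map<Integer, byte[]>` entry shape is an assumption for illustration; the real footer encoding is defined elsewhere in this PR). Note that `ByteBuffer` defaults to big-endian order, matching `DataOutputStream`:
   ```
   import java.nio.ByteBuffer;
   import java.util.Map;
   
   public final class FooterSerializerSketch {
     // Pass 1 sizes the entries; pass 2 fills a pre-allocated array directly,
     // avoiding the intermediate ByteArrayOutputStream/DataOutputStream.
     static byte[] serializeEntries(Map<Integer, byte[]> entries) {
       int size = Integer.BYTES; // number of entries
       for (byte[] value : entries.values()) {
         size += Integer.BYTES + Integer.BYTES + value.length; // key id + value length + value bytes
       }
       ByteBuffer buffer = ByteBuffer.allocate(size);
       buffer.putInt(entries.size());
       for (Map.Entry<Integer, byte[]> entry : entries.entrySet()) {
         buffer.putInt(entry.getKey());
         buffer.putInt(entry.getValue().length);
         buffer.put(entry.getValue());
       }
       return buffer.array();
     }
   }
   ```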





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599200668



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -33,12 +33,15 @@
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
 import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
+public class DataTableImplV2V3 implements DataTable {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  public static final int DEFAULT_VERSION = VERSION_3;

Review comment:
       Change this to CURRENT_VERSION?





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604529060



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       +1 for keeping the current logic. Another drawback of having two builders is that every caller needs to decide whether to call the v2 or v3 builder based on instance config, which is ugly.





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604417165



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -161,13 +163,15 @@ public void stop() {
           queryRequest.getBrokerId(), e);
       // For not handled exceptions
       serverMetrics.addMeteredGlobalValue(ServerMeter.UNCAUGHT_EXCEPTIONS, 1);
-      dataTable = new DataTableImplV2();
+      dataTable = new DataTableImplV3();

Review comment:
       Please address this as per the approach discussed in https://github.com/apache/incubator-pinot/pull/6710/#discussion_r604379681





[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809089192


   Test failed due to a flaky issue; close and re-open to trigger a re-run.



[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603655493



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;

Review comment:
       Remove this. Use the VERSION_3 constant already defined in DataTableBuilder.





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604531148



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       +1 for keeping the current logic. Another drawback of having two builders is that every caller needs to decide whether to call V2 or V3 based on instance config, which is ugly.





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605862022



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs");
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKey contains all metadata keys which has value of int type.
+    private static final Set<MetadataKey> _intValueMetadataKey = ImmutableSet
+        .of(MetadataKey.NUM_SEGMENTS_QUERIED, MetadataKey.NUM_SEGMENTS_PROCESSED, MetadataKey.NUM_SEGMENTS_MATCHED,
+            MetadataKey.NUM_RESIZES, MetadataKey.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKey.NUM_RESIZES);
+    // _longValueMetadataKey contains all metadata keys which has value of long type.

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    super();
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -50,51 +46,19 @@
   // VARIABLE_SIZE_DATA (START|SIZE)
   private static final int HEADER_SIZE = Integer.BYTES * 13;
 
-  private final int _numRows;
-  private final int _numColumns;
-  private final DataSchema _dataSchema;
-  private final int[] _columnOffsets;
-  private final int _rowSizeInBytes;
-  private final Map<String, Map<Integer, String>> _dictionaryMap;
-  private final byte[] _fixedSizeDataBytes;
-  private final ByteBuffer _fixedSizeData;
-  private final byte[] _variableSizeDataBytes;
-  private final ByteBuffer _variableSizeData;
-  private final Map<String, String> _metadata;
-
   /**
    * Construct data table with results. (Server side)
    */
   public DataTableImplV2(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
       byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
-    _numRows = numRows;
-    _numColumns = dataSchema.size();
-    _dataSchema = dataSchema;
-    _columnOffsets = new int[_numColumns];
-    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
-    _dictionaryMap = dictionaryMap;
-    _fixedSizeDataBytes = fixedSizeDataBytes;
-    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
-    _variableSizeDataBytes = variableSizeDataBytes;
-    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
-    _metadata = new HashMap<>();
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
   }
 
   /**
    * Construct empty data table. (Server side)
    */
   public DataTableImplV2() {
-    _numRows = 0;
-    _numColumns = 0;
-    _dataSchema = null;
-    _columnOffsets = null;
-    _rowSizeInBytes = 0;
-    _dictionaryMap = null;
-    _fixedSizeDataBytes = null;
-    _fixedSizeData = null;
-    _variableSizeDataBytes = null;
-    _variableSizeData = null;
-    _metadata = new HashMap<>();
+    super();

Review comment:
       Done. `super()` is redundant, but a default constructor is needed here since we also have the non-default constructor `DataTableImplV2(ByteBuffer byteBuffer)`.
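   
   For reference, a minimal illustration of the Java rule in play here:
   ```
   public class Example {
     // Declaring any constructor suppresses the implicit no-arg constructor,
     // so the default one must be written out explicitly.
     public Example() {
       // empty-table path
     }
   
     public Example(java.nio.ByteBuffer byteBuffer) {
       // deserialization path
     }
   }
   ```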





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605260402



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       Yes, we can argue both ways here, but my preference would be an enum with an implicit ordinal as opposed to an id-based one. I agree the latter gives more flexibility to the user, but I don't think we need it. So a simple enum with ordinal as id, along with clear javadoc highlighting the rules for updating the enum, is preferable imo. 
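   
   A minimal sketch of that shape (names are illustrative; the actual enum lives in DataTable.java):
   ```
   // The ordinal doubles as the wire id; the string name stays stable for
   // broker responses and logs. Rules: never reorder, never remove,
   // always append new keys at the end.
   public enum MetadataKey {
     UNKNOWN("unknown"),
     NUM_DOCS_SCANNED("numDocsScanned"),
     TIME_USED_MS("timeUsedMs"); // append new keys here, at the end
   
     private final String _name;
   
     MetadataKey(String name) {
       _name = name;
     }
   
     public String getName() {
       return _name;
     }
   
     // The wire format stores ordinal(); decode with a bounds check.
     public static MetadataKey getById(int id) {
       MetadataKey[] values = values();
       return (id >= 0 && id < values.length) ? values[id] : UNKNOWN;
     }
   }
   ```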





[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809929578


   Integration tests failed due to a flaky issue:
   ```
   Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project pinot-integration-tests: There was a timeout in the fork -> [Help 1]
   ```
   
   Close and re-open to trigger a re-run.



[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603744212



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata, we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because V2 API of getMetadata returns a Map<String, String> and there are
+  // a lot of existing code using string as key to access metadata.
+  // TODO: remove this and change all metadata accessing code use MetadataKeys.
+  private final Map<String, String> _metadataV2;

Review comment:
       Discussed this offline. The approach I suggested above is the one we agreed to.





[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603499342



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableFactory.java
##########
@@ -32,7 +34,9 @@ public static DataTable getDataTable(ByteBuffer byteBuffer)
     int version = byteBuffer.getInt();
     switch (version) {
       case 2:
-        return new DataTableImplV2(byteBuffer);
+        return convertDataTableImplV2ToV3(new DataTableImplV2(byteBuffer));
+      case 3:

Review comment:
       Can we use the VERSION_* constants already defined?
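   
   For example (a sketch of the suggestion; `VERSION_2`/`VERSION_3` are the `public static final int` constants from DataTableBuilder, which keeps them valid switch labels):
   ```
   switch (version) {
     case DataTableBuilder.VERSION_2:
       return convertDataTableImplV2ToV3(new DataTableImplV2(byteBuffer));
     case DataTableBuilder.VERSION_3:
       return new DataTableImplV3(byteBuffer);
     default:
       throw new UnsupportedOperationException("Unsupported data table version: " + version);
   }
   ```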





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603708603



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -161,13 +163,15 @@ public void stop() {
           queryRequest.getBrokerId(), e);
       // For not handled exceptions
       serverMetrics.addMeteredGlobalValue(ServerMeter.UNCAUGHT_EXCEPTIONS, 1);
-      dataTable = new DataTableImplV2();
+      dataTable = new DataTableImplV3();
       dataTable.addException(QueryException.getException(QueryException.INTERNAL_ERROR, e));
     }
     long requestId = queryRequest.getRequestId();
     Map<String, String> dataTableMetadata = dataTable.getMetadata();
     dataTableMetadata.put(DataTable.REQUEST_ID_METADATA_KEY, Long.toString(requestId));
 
+    byte[] responseBytes = serializeDataTable(queryRequest, dataTable);
+

Review comment:
       I think it is useful for debugging and understanding how time is being spent by emitting and logging each CPU time cost metric separately. We should send an aggregated value to the broker though. 
   
   I understand this gets convoluted, but maybe this is what we can do.
   
   Option 1 (simpler)
   
   - Log, emit (server table gauge) and send to the broker separately. This simplifies everything. The broker (in the follow-up PR) will anyway sum up the costs from each server; it might as well do a per-server summation first before aggregating across servers.
   
   Option 2 (somewhat hacky)
   
   If we want to log and emit (server table gauge) separately but want to send a single value to the broker (see the sketch after this comment):
   
   - Provide an API on the DataTable to expose the data table serialization cost. 
   - The API will be implemented by DataTableImplV3 to return the actual measured cost; DataTableImplV2 can simply return -1.
   - Call that API here after line 173 to get the serialization CPU time cost. Use that to emit the metric to the server table gauge and then log it at line 225. 
   - Subtract the previous value from line 193, then emit the latter and log it.
   
   I prefer option 1.
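   
   A rough sketch of the Option 2 API shape (the method name is hypothetical, not part of the PR):
   ```
   // Sketch of the accessor. V2 keeps the default; DataTableImplV3 would
   // override it with the value measured inside toBytes().
   public interface DataTableWithSerializationCost {
     default long getResponseSerializationCpuTimeNs() {
       return -1L; // cost not measured
     }
   }
   ```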






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603677439



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata, we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because V2 API of getMetadata returns a Map<String, String> and there are
+  // a lot of existing code using string as key to access metadata.
+  // TODO: remove this and change all metadata accessing code use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pair in _metadataV2 into _metadata
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV2, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if the value type is String, encode it as: [keyID, valueLength, Utf8EncodedValue].
+   * - if the value type is int, encode it as: [keyID, bigEndianRepresentationOfIntValue]
+   * - if the value type is long, encode it as: [keyID, bigEndianRepresentationOfLongValue]

Review comment:
       (nit) Add a comment here highlighting that, unlike V2, numeric metadata values in V3 are not encoded as UTF-8 strings in the wire format; instead, a big-endian binary representation is used.
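   
   For example (using the Guava `Ints`/`Longs` helpers already imported in this file, which use big-endian byte order):
   ```
   import com.google.common.primitives.Ints;
   import com.google.common.primitives.Longs;
   
   public class EndiannessDemo {
     public static void main(String[] args) {
       // V2 writes numbers as UTF-8 strings: "42" -> {0x34, 0x32}.
       // V3 writes the big-endian binary form instead:
       byte[] intBytes = Ints.toByteArray(42);    // {0x00, 0x00, 0x00, 0x2A}
       byte[] longBytes = Longs.toByteArray(42L); // 8 bytes, big-endian
       System.out.println(Ints.fromByteArray(intBytes)); // 42
     }
   }
   ```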





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r601806385



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -209,24 +190,22 @@ public DataTableImplV2(ByteBuffer byteBuffer)
     }
   }
 
-  private static String decodeString(DataInputStream dataInputStream)
-      throws IOException {
-    int length = dataInputStream.readInt();
-    if (length == 0) {
-      return StringUtils.EMPTY;
-    } else {
-      byte[] buffer = new byte[length];
-      int numBytesRead = dataInputStream.read(buffer);
-      assert numBytesRead == length;
-      return StringUtil.decodeUtf8(buffer);
-    }
-  }
-
   @Override
   public void addException(ProcessingException processingException) {
     _metadata.put(EXCEPTION_METADATA_KEY + processingException.getErrorCode(), processingException.getMessage());
   }
 
+  @Override
+  public Map<Integer, String> getExceptions() {
+    Map<Integer, String> exceptions = new HashMap<>();
+    for (String key : _metadata.keySet()) {
+      if (key.startsWith(DataTable.EXCEPTION_METADATA_KEY)) {
+        exceptions.put(Integer.parseInt(key.substring(9)), _metadata.get(key));

Review comment:
       In V2, all exceptions were added into the metadata using the key `"Exception"+errCode`. `"Exception".length() == 9`, so `Integer.parseInt(key.substring(9))` extracts the error code from the key. See code here: https://github.com/apache/incubator-pinot/blob/98d569db6deaab6ece09c08ece5e29a33eceab0f/pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java#L225-L228
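   
   A small illustration of that key scheme (the error code is made up):
   ```
   String key = "Exception" + 404;                     // EXCEPTION_METADATA_KEY + errCode
   // "Exception".length() == 9, so substring(9) leaves just the error code.
   int errorCode = Integer.parseInt(key.substring(9)); // 404
   ```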





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603763658



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableUtils.java
##########
@@ -233,4 +243,98 @@ private static DataTable buildEmptyDataTableForDistinctQuery(QueryContext queryC
     dataTableBuilder.finishRow();
     return dataTableBuilder.build();
   }
+
+  /**
+   * Helper method to decode string.
+   */
+  public static String decodeString(DataInputStream dataInputStream)
+      throws IOException {
+    int length = dataInputStream.readInt();
+    if (length == 0) {
+      return StringUtils.EMPTY;
+    } else {
+      byte[] buffer = new byte[length];
+      int numBytesRead = dataInputStream.read(buffer);
+      assert numBytesRead == length;
+      return StringUtil.decodeUtf8(buffer);
+    }
+  }
+
+  /**
+   * Helper method to decode int.
+   */
+  public static int decodeInt(DataInputStream dataInputStream)
+      throws IOException {
+    int length = Integer.BYTES;
+    byte[] buffer = new byte[length];
+    int numBytesRead = dataInputStream.read(buffer);
+    assert numBytesRead == length;
+    return Ints.fromByteArray(buffer);

Review comment:
       Does this convert from big endian to little endian?





[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806261871


   > With the addition of new data structure in this PR, there are essentially two places in DataTable where the key-value / name-value style structure is located.
   > 
   > * First is the existing DataTable metadata which is also a series of key-value pairs where key is string and value is some statistic/metric. This is towards the beginning of the byte stream
   > * Second is the structure introduced in this PR which is written as a footer.
   > 
   > Since we are anyway bumping up the version, how about we move the existing metadata of key-value pairs to the end of file to keep consistency in the format. So, all the metadata stuff (aka key-value pairs) + new positional stuff can be a file footer.
   
   
   
   > With this PR, we should resolve a couple of TODOs introduced in PR #6680
   > 
   > * Expose the serialization time through an API at the DataTable level and log it in [QueryScheduler](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR222). You need to serialize before the logging line. Currently it is after.
   > * Revisit [this](https://github.com/apache/incubator-pinot/pull/6710/files#diff-2bff83abd3f6e831acfe4b6d31a022f228710def4eea47db3929c6d90b3147ecR255). The execution cpu time is not yet serialized as part of metadata. May be we can just remove line 258.
   
   @mqliang, please make sure to address these TODOs.



[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605325603



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,399 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKey.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeInt;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeLong;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends BaseDataTable {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _errCodeToExceptionMap stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _errCodeToExceptionMap;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _errCodeToExceptionMap = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();

Review comment:
       call to super() is redundant.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603806901



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where metadata is presented as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't add a new key with the same id/name as an existing key; duplicate names are not allowed.
+   *  - Don't change the names of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where metadata is presented as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't add a new key with the same id/name as an existing key; duplicate names are not allowed.
+   *  - Don't change the names of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys whose values are of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED);
+    // _longValueMetadataKeys contains all metadata keys whose values are of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {

Review comment:
       not needed, since the built-in `enumKey.name()` cannot satisfy our requirement

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableUtils.java
##########
@@ -233,4 +243,98 @@ private static DataTable buildEmptyDataTableForDistinctQuery(QueryContext queryC
     dataTableBuilder.finishRow();
     return dataTableBuilder.build();
   }
+
+  /**
+   * Helper method to decode string.
+   */
+  public static String decodeString(DataInputStream dataInputStream)
+      throws IOException {
+    int length = dataInputStream.readInt();
+    if (length == 0) {
+      return StringUtils.EMPTY;
+    } else {
+      byte[] buffer = new byte[length];
+      int numBytesRead = dataInputStream.read(buffer);
+      assert numBytesRead == length;
+      return StringUtil.decodeUtf8(buffer);
+    }
+  }
+
+  /**
+   * Helper method to decode int.
+   */
+  public static int decodeInt(DataInputStream dataInputStream)
+      throws IOException {
+    int length = Integer.BYTES;
+    byte[] buffer = new byte[length];
+    int numBytesRead = dataInputStream.read(buffer);
+    assert numBytesRead == length;
+    return Ints.fromByteArray(buffer);

Review comment:
       Copied from the Javadoc of `Ints.toByteArray()`:
   ```
   Returns a big-endian representation of value in a 4-element byte array; equivalent to ByteBuffer.allocate(4).putInt(value).array()
   ```
   And from the Javadoc of `Ints.fromByteArray()`:
   ```
   Returns the int value whose big-endian representation is stored in the first 4 bytes of bytes; equivalent to ByteBuffer.wrap(bytes).getInt()
   ```
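
   For illustration, a minimal standalone sketch (class name hypothetical; assumes Guava on the classpath) confirming that both helpers use big-endian byte order, i.e. no endianness conversion happens:
   ```
   import com.google.common.primitives.Ints;

   public class EndiannessSketch {
     public static void main(String[] args) {
       // Ints.toByteArray writes the most significant byte first (big-endian).
       byte[] bytes = Ints.toByteArray(1);
       System.out.println(java.util.Arrays.toString(bytes)); // prints [0, 0, 0, 1]
       // Ints.fromByteArray reads the same big-endian layout back unchanged.
       System.out.println(Ints.fromByteArray(bytes)); // prints 1
     }
   }
   ```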

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata API returns a Map<String, String>, and there is
+  // a lot of existing code that uses strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account for data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pair in _metadataV2 into _metadata
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.

Review comment:
       done
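
   For background, a hedged sketch of how per-thread CPU time can be measured with the JDK's ThreadMXBean; Pinot's ThreadTimer is assumed to wrap something similar, though its exact implementation may differ:
   ```
   import java.lang.management.ManagementFactory;
   import java.lang.management.ThreadMXBean;

   class ThreadCpuTimerSketch {
     private static final ThreadMXBean MX_BEAN = ManagementFactory.getThreadMXBean();
     private long _startNs;

     void start() {
       // CPU time consumed by the current thread so far, in nanoseconds.
       _startNs = MX_BEAN.getCurrentThreadCpuTime();
     }

     long stopAndGetThreadTimeNs() {
       return MX_BEAN.getCurrentThreadCpuTime() - _startNs;
     }
   }
   ```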

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata API returns a Map<String, String>, and there is
+  // a lot of existing code that uses strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -27,19 +27,18 @@
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Map.Entry;
-import org.apache.commons.lang3.StringUtils;
 import org.apache.pinot.common.response.ProcessingException;
 import org.apache.pinot.common.utils.DataSchema;
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
-import org.apache.pinot.core.common.ObjectSerDeUtils;
-import org.apache.pinot.spi.utils.ByteArray;
-import org.apache.pinot.spi.utils.BytesUtils;
 
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_2;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.deserializeDictionaryMap;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.serializeDictionaryMap;
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
 
+public class DataTableImplV2 extends DataTableImplBase implements DataTable {

Review comment:
       Done. DataTableImplBase is now declared abstract and implements the DataTable interface.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";
+    public static final int DEFAULT_CURRENT_DATA_TABLE_VERSION = 3;

Review comment:
       This will cause a cyclic dependency issue, so it is hardcoded as 3 for the moment.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableUtils.java
##########
@@ -233,4 +243,98 @@ private static DataTable buildEmptyDataTableForDistinctQuery(QueryContext queryC
     dataTableBuilder.finishRow();
     return dataTableBuilder.build();
   }
+
+  /**

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata API returns a Map<String, String>, and there is
+  // a lot of existing code that uses strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account for data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pair in _metadataV2 into _metadata
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if the value type is String, encode it as: [keyID, valueLength, Utf8EncodedValue].
+   * - if the value type is int, encode it as: [keyID, bigEndianRepresentationOfIntValue]
+   * - if the value type is long, encode it as: [keyID, bigEndianRepresentationOfLongValue]

Review comment:
       done
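
   To make the wire format concrete, a standalone sketch (hypothetical helper, not the PR's actual code) of encoding a single String-valued entry as [numEntries][keyID][valueLength][Utf8EncodedValue]; note that DataOutputStream writes ints big-endian, matching the format described above:
   ```
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;

   class MetadataSerDeSketch {
     static byte[] serializeOneStringEntry(int keyId, String value) throws IOException {
       ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
       DataOutputStream out = new DataOutputStream(byteStream);
       out.writeInt(1);                                        // numEntries
       out.writeInt(keyId);                                    // keyID
       byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
       out.writeInt(utf8.length);                              // valueLength
       out.write(utf8);                                        // Utf8EncodedValue
       return byteStream.toByteArray();
     }
   }
   ```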




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599244164



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -167,6 +178,18 @@ public DataTableImplV2(ByteBuffer byteBuffer)
       _variableSizeDataBytes = null;
       _variableSizeData = null;
     }
+
+    // Read positional data.
+    String[] positionalData = null;
+    if (version == VERSION_3 && byteBuffer.hasRemaining()) {
+      int positionalDataStart = variableSizeDataStart + variableSizeDataLength;
+      int positionalDataLength = byteBuffer.remaining();
+      byteBuffer.position(positionalDataStart);

Review comment:
       Since we are using `byteBuffer.remaining()` to compute the length of the positional data, it implies we are treating it as a **footer** with a specific format (name-value pairs as defined in the enum) even though we are not calling it that. So, technically, no other structure can come after this, as we would fail to distinguish the positional data from whatever comes after it. I don't think we should limit that flexibility. Even if we call this a footer, let's please write the length of the footer as well before line 348




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807233774


   > @mqliang @mcvsubbu I'm suggesting putting integer ids so that we can deprecate keys if needed by skipping the id, similar to the `thrift` convention. Using ordinal to index across enum is not as flexible. We can also put the name as another field of the enum.
   
   I think we are saying the same thing: have a set of integers, in no specific order, that we can skip if not recognized.
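
   A hedged sketch of the thrift-style convention being discussed: each key carries an explicit integer id, so deprecated ids can simply be skipped and unrecognized ids ignored on the read path. Names and ids here are illustrative, not the PR's final definitions.
   ```
   enum MetadataKey {
     NUM_DOCS_SCANNED(1, "numDocsScanned"),
     TIME_USED_MS(2, "timeUsedMs"); // id 3 could later be skipped to deprecate a key

     private final int _id;
     private final String _name;

     MetadataKey(int id, String name) {
       _id = id;
       _name = name;
     }

     // Returns null for an unrecognized id; the caller simply skips that entry.
     static MetadataKey getById(int id) {
       for (MetadataKey key : values()) {
         if (key._id == id) {
           return key;
         }
       }
       return null;
     }
   }
   ```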


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603654939



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       This is the only difference between the V2 and V3 protocols, right?
   
    ```
   // VERSION
     // NUM_ROWS
     // NUM_COLUMNS
     // DICTIONARY_MAP (START|SIZE)
     // METADATA (START|SIZE) -> removed in V3 and moves to trailer/footer/end
     // DATA_SCHEMA (START|SIZE)
     // FIXED_SIZE_DATA (START|SIZE)
     // VARIABLE_SIZE_DATA (START|SIZE)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603032926



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -167,6 +178,18 @@ public DataTableImplV2(ByteBuffer byteBuffer)
       _variableSizeDataBytes = null;
       _variableSizeData = null;
     }
+
+    // Read positional data.
+    String[] positionalData = null;
+    if (version == VERSION_3 && byteBuffer.hasRemaining()) {
+      int positionalDataStart = variableSizeDataStart + variableSizeDataLength;
+      int positionalDataLength = byteBuffer.remaining();
+      byteBuffer.position(positionalDataStart);

Review comment:
       Done. The length of the footer is now written into the header.
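
       For illustration, a hedged sketch of what the reader can do once the footer's start offset and length are in the header (hypothetical names, not the exact PR code):
   ```
   import java.nio.ByteBuffer;

   // Hedged sketch: with the footer's start offset and length read from the
   // fixed-size header, the reader seeks to the footer directly instead of
   // relying on byteBuffer.hasRemaining()/remaining() as in the earlier revision.
   final class FooterReaderSketch {
     static byte[] readFooter(ByteBuffer byteBuffer, int footerStart, int footerLength) {
       byte[] footerBytes = new byte[footerLength];
       byteBuffer.position(footerStart); // jump to the start of the footer
       byteBuffer.get(footerBytes);      // read exactly footerLength bytes
       return footerBytes;
     }
   }
   ```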




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606091778



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  protected byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(dictionaryMap.size());
+    for (Map.Entry<String, Map<Integer, String>> dictionaryMapEntry : dictionaryMap.entrySet()) {
+      String columnName = dictionaryMapEntry.getKey();
+      Map<Integer, String> dictionary = dictionaryMapEntry.getValue();
+      byte[] bytes = StringUtil.encodeUtf8(columnName);
+      dataOutputStream.writeInt(bytes.length);
+      dataOutputStream.write(bytes);
+      dataOutputStream.writeInt(dictionary.size());
+
+      for (Map.Entry<Integer, String> dictionaryEntry : dictionary.entrySet()) {
+        dataOutputStream.writeInt(dictionaryEntry.getKey());
+        byte[] valueBytes = StringUtil.encodeUtf8(dictionaryEntry.getValue());
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Helper method to deserialize dictionary map.
+   */
+  protected Map<String, Map<Integer, String>> deserializeDictionaryMap(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numDictionaries = dataInputStream.readInt();
+      Map<String, Map<Integer, String>> dictionaryMap = new HashMap<>(numDictionaries);
+
+      for (int i = 0; i < numDictionaries; i++) {
+        String column = decodeString(dataInputStream);
+        int dictionarySize = dataInputStream.readInt();
+        Map<Integer, String> dictionary = new HashMap<>(dictionarySize);
+        for (int j = 0; j < dictionarySize; j++) {
+          int key = dataInputStream.readInt();
+          String value = decodeString(dataInputStream);
+          dictionary.put(key, value);
+        }
+        dictionaryMap.put(column, dictionary);
+      }
+
+      return dictionaryMap;
+    }
+  }
+
+  public Map<String, String> getMetadata() {

Review comment:
       @mqliang, can you please address this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605219147



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()
+      throws IOException {
+    DataSchema.ColumnDataType[] columnDataTypes = DataSchema.ColumnDataType.values();
+    int numColumns = columnDataTypes.length;
+    String[] columnNames = new String[numColumns];
+    for (int i = 0; i < numColumns; i++) {
+      columnNames[i] = columnDataTypes[i].name();
+    }
 
     int[] ints = new int[NUM_ROWS];
     long[] longs = new long[NUM_ROWS];
     float[] floats = new float[NUM_ROWS];
     double[] doubles = new double[NUM_ROWS];
     String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
     Object[] objects = new Object[NUM_ROWS];
     int[][] intArrays = new int[NUM_ROWS][];
     long[][] longArrays = new long[NUM_ROWS][];
     float[][] floatArrays = new float[NUM_ROWS][];
     double[][] doubleArrays = new double[NUM_ROWS][];
     String[][] stringArrays = new String[NUM_ROWS][];
 
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
+    DataTableBuilder dataTableBuilderV2 = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV2, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    // Verify V3 broker can deserialize data table send by V2 server
+    DataTable dataTableV2 = dataTableBuilderV2.build(); // create a V2 data table
+    // Deserialize data table bytes as V3
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);

Review comment:
       Done. I have updated the comments.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603693228



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;

Review comment:
       Can all of the above be moved to a base class as protected fields?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807918035


   > Everything except for the actual result data can be called metadata IMO. I don't like the term "trailer" because it is not a common term in data world, which can cause confusion. Also, we are not really putting it at the end of the data table, it is in front of the actual result data.
   
   @Jackie-Jiang No, we now put it at the end of the data table. In V3, the layout of the data table looks like:
   
   ```
   version number
   num_rows
   num_columns
   exception_start_offset, exception_length
   dictionary_map_start_offset, dictionary_map_length
   data_schema_start_offset, data_scheme_length
   fixed_size_data_start_offset, fixed_size_length
   variable_size_data_start_offset, variable_size_length
   trailer_start_offset, trailer_length
   exception_bytes
   dictionary_map_bytes
   data_schema_bytes
   fixed_size_data_bytes
   variable_size_data_bytes
   trailer_bytes
   ```
   But I agree with naming it metadata.
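
   As a rough illustration of the write order this layout implies, here is a hedged sketch (made-up names, simplified; the actual DataTableImplV3 code differs). The header is 15 integers: version, numRows, numColumns, plus six (offset, length) pairs:
   ```
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;

   // Hedged sketch of the V3 write order: header first (offsets computed up
   // front), then each section's bytes, with the trailer written last.
   final class V3LayoutSketch {
     static byte[] serialize(int numRows, int numColumns, byte[] exceptions, byte[] dictionaryMap,
         byte[] dataSchema, byte[] fixedSizeData, byte[] variableSizeData, byte[] trailer)
         throws IOException {
       byte[][] sections =
           {exceptions, dictionaryMap, dataSchema, fixedSizeData, variableSizeData, trailer};
       int headerSize = Integer.BYTES * 15; // 3 ints + 6 (offset, length) pairs
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       DataOutputStream dos = new DataOutputStream(out);
       dos.writeInt(3); // version number
       dos.writeInt(numRows);
       dos.writeInt(numColumns);
       int offset = headerSize;
       for (byte[] section : sections) {
         dos.writeInt(offset);         // section start offset
         dos.writeInt(section.length); // section length
         offset += section.length;
       }
       for (byte[] section : sections) {
         dos.write(section);           // section bytes, in header order
       }
       return out.toByteArray();
     }
   }
   ```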


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599306832



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -33,12 +33,15 @@
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
 import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
+public class DataTableImplV2V3 implements DataTable {

Review comment:
       I named it DataTableImplV2V3 since V2 and V3 share a lot of common logic. If V2 and V3 diverge significantly, as you suggest:
   > Since we are anyway bumping up the version, how about we move the existing metadata of key-value pairs to the end of file to keep consistency in the format. So, all the metadata stuff (aka key-value pairs) + new positional stuff can be a file footer.
   
   If we do that, I vote for putting V2 logic into DataTableImplV2 and V3 logic into DataTableImplV3, and extracting common logic (e.g. serializing/deserializing metadata/dictionaryMap) into DataTableUtils.java.
   
   > move the existing metadata of key-value pairs to the end of file 
   
   Actually, I considered that. I also considered making metadata a `String[]` instead of a `Map<String, String>`, making all metadata keys enum values, and making "serialization_cpu_times_ns" part of the metadata. In other words, "serialization_cpu_times_ns" becomes part of the metadata, and the footer section contains only metadata. In this way:
   * all metadata is positional, so we can replace metadata values even after the data table is serialized (a `Map<String, String>` is not positional because the iteration order of a HashMap is not deterministic, while the iteration order of an array is);
   * metadata was previously a `Map<String, String>`, where we need to write the keys (strings) to the byte buffer. With a `String[]`, we don't write the enum constant itself, just the value (length + bytes) at the ordinal/position of the constant, so less data is transferred between server and broker (see the sketch after this comment).
   
   But if we change it in this way, as I previously stated, I vote to keep the current DataTableImplV2.java as it is and create a DataTableImplV3.java for all the V3 logic (extracting common logic into DataTableUtils.java). Otherwise, putting all the V2/V3 logic in the same file will make the code hard to read.
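
   A hedged sketch of that positional serialization (hypothetical enum subset and wire format, not the final code): each entry is written as (key ordinal, value length, value bytes), so the key string never goes over the wire and the order is deterministic:
   ```
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.util.Map;

   // Hedged sketch: iterate enum constants in declaration order and write
   // (ordinal, valueLength, valueBytes) per entry, so the key name itself is
   // never serialized and the iteration order is deterministic.
   final class PositionalMetadataSketch {
     enum Key { UNKNOWN, THREAD_CPU_TIME_NS, RESPONSE_SERIALIZATION_CPU_TIME_NS } // illustrative subset

     static byte[] serialize(Map<Key, String> metadata) throws IOException {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       DataOutputStream dos = new DataOutputStream(out);
       for (Key key : Key.values()) {   // deterministic, unlike HashMap iteration
         String value = metadata.get(key);
         if (value == null) {
           continue;                    // absent keys are simply skipped
         }
         dos.writeInt(key.ordinal());   // the ordinal stands in for the key string
         byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
         dos.writeInt(valueBytes.length);
         dos.write(valueBytes);
       }
       return out.toByteArray();
     }
   }
   ```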
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603474243



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate id/name is not allowed.
+   *  - Don't change id/name of existing keys.

Review comment:
       Also add: `"Don't remove existing keys."`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804592233


   With the addition of the new data structure in this PR, there are essentially two places in the DataTable where a key-value / name-value style structure lives:
   
   - First is the existing DataTable metadata, which is also a series of key-value pairs where the key is a string and the value is some statistic/metric.
   - Second is the structure introduced in this PR, which is written as a file footer.
   
   Since we are bumping up the version anyway, how about we move the existing key-value metadata to the end of the file to keep the format consistent? Then all the metadata stuff (aka key-value pairs or positional stuff) can be a file footer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809076119


   @Jackie-Jiang @siddharthteotia @mcvsubbu Comments addressed; this is ready for review now. I split the change into 5 commits:
   * 1st commit:
      * Rename TrailerKeys to MetadataKeys
      * Associate an ID/Name with each MetadataKeys entry
      * Convert V2->V3 instead of constructing V3 from V2 bytes
      * ASCII layout of the V3 datatable
      * Address a TODO in DataTableBuilder: store bytes data in the variable-size data section instead of as String
     
   * 2nd commit: Address a TODO in DataTableBuilder: fix the float size issue in DataTableBuilder
   * 3rd commit: Address a TODO in DataTableBuilder: use one Map to map a String to an Integer for all columns in V3 (see the sketch after this comment).
   * 4th commit: Fix a bug in BrokerReduceService that was breaking the integration test.
   * 5th commit: Log `responseSerializationCpuTimeNs` at QueryScheduler and emit a broker gauge; put "executionThreadCpuTimeNs" and "responseSerializationCpuTimeNs" into the metadata so that they can be sent to the broker
   
   
   There is still one more TODO in DataTableBuilder: given a data schema, write all values one by one instead of positioning by rowId and colId (to save time). It will not change the serialized byte layout of the data table; it is just an implementation optimization, which means it does not require a version bump and can be done in a separate PR. I created issue https://github.com/apache/incubator-pinot/issues/6720 to track this. A preliminary benchmark shows that the optimization is quite speculative -- there is no improvement from writing all values one by one without positioning by rowId and colId; for more details, see the benchmark results at: https://github.com/apache/incubator-pinot/issues/6720
   
   There is one more thing that needs to be done: change the interface of `DataTable.getMetadata()` to return a `Map<MetadataKeys, String>` instead of a `Map<String, String>`. This PR is already quite large, so I want to address that in a separate PR.
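
   A hedged sketch of the shared-dictionary idea in the 3rd commit (hypothetical names, not the actual DataTableBuilder change): a single String-to-id map serves every column, instead of one dictionary per column:
   ```
   import java.util.HashMap;
   import java.util.Map;

   // Hedged sketch: one shared String -> id dictionary for all columns,
   // replacing the per-column dictionaries.
   final class SharedDictionarySketch {
     private final Map<String, Integer> _stringToId = new HashMap<>();

     // Returns the existing id for a value, assigning the next id on first use.
     int idFor(String value) {
       return _stringToId.computeIfAbsent(value, v -> _stringToId.size());
     }
   }
   ```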


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605866370



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,399 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKey.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeInt;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeLong;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends BaseDataTable {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _errCodeToExceptionMap stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _errCodeToExceptionMap;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _errCodeToExceptionMap = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604531255



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605325237



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs");
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKey contains all metadata keys which has value of int type.
+    private static final Set<MetadataKey> _intValueMetadataKey = ImmutableSet
+        .of(MetadataKey.NUM_SEGMENTS_QUERIED, MetadataKey.NUM_SEGMENTS_PROCESSED, MetadataKey.NUM_SEGMENTS_MATCHED,
+            MetadataKey.NUM_RESIZES, MetadataKey.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKey.NUM_RESIZES);
+    // _longValueMetadataKey contains all metadata keys which has value of long type.

Review comment:
       Why do we need _intValueMetadataKey and _longValueMetadataKey? Instead of maintaining two static sets to decide which key's value is long and which is int, can we add a member variable `_type` to each enum constant? That would also allow replacing the `isIntValueMetadataKey()` and `isLongValueMetadataKey()` functions with a single `getType()`.
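
       A hedged sketch of that suggestion (illustrative subset of the enum, not the final code):
   ```
   // Hedged sketch: carry the value type on each enum constant instead of
   // maintaining separate static sets of int-valued and long-valued keys.
   enum MetadataKey {
     NUM_SEGMENTS_QUERIED("numSegmentsQueried", ValueType.INT),
     THREAD_CPU_TIME_NS("threadCpuTimeNs", ValueType.LONG),
     TRACE_INFO("traceInfo", ValueType.STRING); // illustrative subset only

     enum ValueType { INT, LONG, STRING }

     private final String _name;
     private final ValueType _valueType;

     MetadataKey(String name, ValueType valueType) {
       _name = name;
       _valueType = valueType;
     }

     public String getName() {
       return _name;
     }

     // Replaces isIntValueMetadataKey()/isLongValueMetadataKey().
     public ValueType getValueType() {
       return _valueType;
     }
   }
   ```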




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603021060



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604527883



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603746111



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       Got it. Although it is possible to squeeze the exceptions into the metadata section by mapping multiple exception codes to the same ordinal, the approach of having a separate exceptions section looks cleaner. Discussed offline with @mqliang and @mcvsubbu; we all agreed on this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r600925212



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2, since a lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();

Review comment:
       @mcvsubbu Before serializing _trailer, we need to copy all KV pairs from metadata into the trailer.
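
       As a rough illustration of that copy (not the exact PR code; the assumption here is that every metadata key name corresponds to a TrailerKeys enum constant of the same name):

       private Map<TrailerKeys, String> putAllMetaDataIntoTrailer() {
         for (Map.Entry<String, String> entry : _metadata.entrySet()) {
           // Assumed mapping: metadata key name == enum constant name.
           _trailer.put(TrailerKeys.valueOf(entry.getKey()), entry.getValue());
         }
         return _trailer;
       }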

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)

Review comment:
       @mcvsubbu V3 has a dedicated exceptions section to store exceptions. The reason is that in V3 all keys are enum values, which must be defined statically, so we cannot use "Exception"+errCode to create new keys.
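
       To make the constraint concrete, the trailer keys would be a statically defined enum along these lines (the first constant appears in the diff; the others are illustrative, not the PR's exact list). Since ordinals go on the wire, new keys can only ever be appended at the end:

       public enum TrailerKeys {
         // Ordinal 0 is written on the wire for this key, so its position
         // must never change once released.
         RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
         // Hypothetical examples of metadata keys folded into the trailer.
         NUM_DOCS_SCANNED_METADATA_KEY,
         TOTAL_DOCS_METADATA_KEY
       }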

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2, since a lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * In V3, metadata is actually part of _trailer when the DataTable is serialized into bytes. When deserializing,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as a bunch of KV pairs in metadata; each exception has a key of "Exception"+errCode.
+     * To interpret V2 bytes as a V3 object, extract the exceptions from metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();
+  }
+
+  /**
+   * Serialize trailer section to bytes.
+   * The format of the bytes looks like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if value is int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
+   * - if value is string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
+   */
+  private byte[] serializeTrailer()

Review comment:
       @mcvsubbu This is the code that serializes the trailer.
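
       For the archive, a condensed sketch of the documented format: [numEntries] followed by one encoded KV pair per entry. This is illustrative, not the exact PR code; the bookkeeping for _responseSerializationCpuTimeNsValueOffset is omitted, and which keys carry long vs. string values is an assumption:

       private byte[] serializeTrailer()
           throws IOException {
         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
         DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
         dataOutputStream.writeInt(_trailer.size());
         for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
           dataOutputStream.writeInt(entry.getKey().ordinal());
           if (entry.getKey() == TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY) {
             // Fixed-width big-endian long: the value stays locatable after
             // serialization, so toBytes() can patch it in place later.
             dataOutputStream.writeLong(Long.parseLong(entry.getValue()));
           } else {
             byte[] valueBytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
             dataOutputStream.writeInt(valueBytes.length);
             dataOutputStream.write(valueBytes);
           }
         }
         return byteArrayOutputStream.toByteArray();
       }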

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2, since a lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * In V3, metadata is actually part of _trailer when the DataTable is serialized into bytes. When deserializing,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as a bunch of KV pairs in metadata; each exception has a key of "Exception"+errCode.
+     * To interpret V2 bytes as a V3 object, extract the exceptions from metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();

Review comment:
       @mcvsubbu V2 stores exceptions as a bunch of KV pairs in metadata; each exception has a key of "Exception"+errCode. To interpret V2 bytes as a V3 object, we extract the exceptions from metadata and put them into _exceptions.
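
       In code, that extraction could look roughly like this (a sketch assuming the "Exception" key prefix described above; whether the entries are also removed from _metadata is an assumption, and it needs java.util.Iterator in addition to the imports shown):

       private Map<Integer, String> extractExceptionsFormV2Metadata() {
         Map<Integer, String> exceptions = new HashMap<>();
         Iterator<Map.Entry<String, String>> iterator = _metadata.entrySet().iterator();
         while (iterator.hasNext()) {
           Map.Entry<String, String> entry = iterator.next();
           if (entry.getKey().startsWith("Exception")) {
             // V2 key layout: "Exception" + errCode, e.g. "Exception200".
             int errCode = Integer.parseInt(entry.getKey().substring("Exception".length()));
             exceptions.put(errCode, entry.getValue());
             iterator.remove();  // assumed: keep exceptions out of plain metadata
           }
         }
         return exceptions;
       }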

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2, since a lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;

Review comment:
       All metadata KV pairs are stored in the trailer in V3; however, to provide the same interface as V2, V3 also implements the `Map<String, String> getMetadata()` method. We need to copy KV pairs between _metadata and _trailer during serialization/deserialization.
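
       The deserialization-side copy could then be as simple as mapping each enum key back to its string name (a sketch under that assumption; any trailer keys that should not surface through getMetadata() would need filtering, which is elided here):

       private Map<String, String> extractMetadataFormTrailer() {
         Map<String, String> metadata = new HashMap<>();
         for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
           // Assumed inverse of the serialization-side mapping.
           metadata.put(entry.getKey().name(), entry.getValue());
         }
         return metadata;
       }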

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2, since a lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * In V3, metadata is actually part of _trailer when the DataTable is serialized into bytes. When deserializing,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)

Review comment:
       @mcvsubbu This constructor is used to deserialize V2 bytes into a V3 DataTable object.
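
       For context, a caller-side sketch of how the two constructors could be selected after reading the leading version int (the factory name and shape here are assumptions, not the PR's actual entry point):

       public static DataTable getDataTable(ByteBuffer byteBuffer)
           throws IOException {
         int version = byteBuffer.getInt();
         switch (version) {
           case 2:
             return new DataTableImplV3(byteBuffer, /* isV2 */ true);
           case 3:
             return new DataTableImplV3(byteBuffer);
           default:
             throw new IllegalStateException("Unsupported data table version: " + version);
         }
       }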

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores metadata as KV pairs. In V3, metadata is part of _trailer when the DataTable is
+   * serialized into bytes. On deserialization, we extract metadata from _trailer into this _metadata map
+   * to provide the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata via
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value.
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all metadata into the trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from the trailer.
+     * In V3, metadata is part of _trailer when the DataTable is serialized into bytes. On deserialization,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();

Review comment:
       @mcvsubbu After deserializing _trailer, we need to copy all metadata KV pairs from _trailer into _metadata.
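
       A minimal sketch of that copy step, assuming each trailer key exposes its V2-compatible string name via a getName()-style accessor (as the MetadataKeys enum later in this thread does); illustrative, not the PR's exact method body:

   ```
   private Map<String, String> extractMetadataFromTrailer() {
     Map<String, String> metadata = new HashMap<>();
     for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
       // Key the map by the V2-compatible string name so existing
       // dataTable.getMetadata().get("key") callers keep working.
       metadata.put(entry.getKey().getName(), entry.getValue());
     }
     return metadata;
   }
   ```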

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as KV pairs in the metadata; each exception has a key of "Exception" + errCode.
+     * To interpret V2 bytes as a V3 object, extract the exceptions from the metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();
+  }
+
+  /**
+   * Serialize the trailer section to bytes.
+   * The format of the bytes is:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3, ...]
+   * For each KV pair:
+   * - if the value is an int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
+   * - if the value is a string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
+   */
+  private byte[] serializeTrailer()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    int offset = 0;
+    dataOutputStream.writeInt(_trailer.size());
+    offset += Integer.BYTES;
+    for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
+      TrailerKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      offset += Integer.BYTES;
+      if (key == TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY) {
+        _responseSerializationCpuTimeNsValueOffset += offset;
+      }
+      if (IntValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Ints.toByteArray(Integer.parseInt(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else if (LongValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Longs.toByteArray(Long.parseLong(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+        offset += Integer.BYTES + valueBytes.length;
+      }
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<TrailerKeys, String> deserializeTrailer(byte[] bytes)

Review comment:
       @mcvsubbu This is the code to deserialize the trailer.
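
       Based on the format documented on serializeTrailer() above, a minimal sketch of the matching deserializer (assuming the same ordinal scheme and the IntValueTrailerKeys/LongValueTrailerKeys sets used on the write path; not necessarily the PR's exact body):

   ```
   private Map<TrailerKeys, String> deserializeTrailer(byte[] bytes)
       throws IOException {
     Map<TrailerKeys, String> trailer = new TreeMap<>();
     try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
       int numEntries = in.readInt();
       for (int i = 0; i < numEntries; i++) {
         TrailerKeys key = TrailerKeys.values()[in.readInt()];  // ordinal -> enum key
         if (IntValueTrailerKeys.contains(key)) {
           trailer.put(key, String.valueOf(in.readInt()));      // fixed 4-byte big-endian
         } else if (LongValueTrailerKeys.contains(key)) {
           trailer.put(key, String.valueOf(in.readLong()));     // fixed 8-byte big-endian
         } else {
           byte[] valueBytes = new byte[in.readInt()];          // length-prefixed UTF-8
           in.readFully(valueBytes);
           trailer.put(key, new String(valueBytes, StandardCharsets.UTF_8));
         }
       }
     }
     return trailer;
   }
   ```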






[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804528996


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=h1) Report
   > Merging [#6710](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=desc) (0461950) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.50%`.
   > The diff coverage is `62.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6710/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6710      +/-   ##
   ==========================================
   - Coverage   66.44%   65.94%   -0.51%     
   ==========================================
     Files        1075     1398     +323     
     Lines       54773    68215   +13442     
     Branches     8168     9852    +1684     
   ==========================================
   + Hits        36396    44985    +8589     
   - Misses      15700    20036    +4336     
   - Partials     2677     3194     +517     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.94% <62.64%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1287 more](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=footer). Last update [27b61fe...0461950](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606091669



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;

Review comment:
       @mqliang, can you please fix the static imports? They are in quite a few places.
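
       (For readers following the thread: the request is to replace wildcard static imports with explicit ones, e.g.:)

   ```
   // Wildcard form being flagged:
   import static org.apache.pinot.core.common.datatable.DataTableUtils.*;

   // Explicit form requested instead:
   import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
   ```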






[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805058323


   Any reason we are restricting the trailer (or footer) to have only key-value pairs? We don't need to place that restriction as long as the length is also encoded up front. It can be any serialized object, right?
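
   As a sketch, length-prefixed framing would let the footer carry an arbitrary serialized object (serializeAnyObject below is hypothetical):

   ```
   // Writer side: a 4-byte length prefix followed by an opaque payload.
   byte[] payload = serializeAnyObject(obj);  // hypothetical serializer
   dataOutputStream.writeInt(payload.length);
   dataOutputStream.write(payload);

   // Reader side: the prefix says how many bytes to consume,
   // regardless of what the payload encodes.
   byte[] received = new byte[dataInputStream.readInt()];
   dataInputStream.readFully(received);
   ```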




[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807891631


   > I named it TrailerKeys since the trailer may not only contain metadata KV pairs, but also other data in the future -- a metadata key must be a trailer key, but the opposite is not necessarily true. I am OK with renaming it to MetadataKeys if we can accept calling all the data we put into this section in the future "metadata". CC @mcvsubbu for more input.
   
   @mqliang Everything except for the actual result data can be called metadata IMO. I don't like the term "trailer" because it is not a common term in the data world, which can cause confusion. Also, we are not really putting it at the end of the data table; it is in front of the actual result data.




[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804592233


   With the addition of the new data structure in this PR, there are essentially two places in the DataTable where a key-value / name-value style structure is located.
   
   - First is the existing DataTable metadata, which is also a series of key-value pairs where the key is a string and the value is some statistic/metric. This is towards the beginning of the byte stream.
   - Second is the structure introduced in this PR, which is written as a footer.
   
   Since we are bumping up the version anyway, how about we move the existing metadata key-value pairs to the end of the file to keep the format consistent? Then all the metadata (aka key-value pairs) plus the new positional data can form a file footer, as sketched below.
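
   For illustration, the proposed layout would look roughly like this (section names follow the header comment in the diff; a sketch, not the final byte layout):

   ```
   +--------------------------------------------------+
   | HEADER: version, numRows, numColumns,            |
   |         START|SIZE offsets for each section      |
   | EXCEPTIONS                                       |
   | DICTIONARY_MAP                                   |
   | DATA_SCHEMA                                      |
   | FIXED_SIZE_DATA                                  |
   | VARIABLE_SIZE_DATA                               |
   | FOOTER: metadata key-value pairs plus the new    |
   |         positional entries (patched in place     |
   |         after serialization)                     |
   +--------------------------------------------------+
   ```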




[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806028103


   @mcvsubbu Just found a defect in using enum ordinals as keys and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   * We are able to add a new key to the enum without bumping up the version
   * We are able to omit a key from the trailer without bumping up the version
   * **However, we are unable to remove a key from the enum (if the key is no longer used in a future version)**
   
   Namely, say we now have three keys:
   ```
   // old version:
   enum TrailerKeys {
       key1,
       key2,
       key3,
   }
   ```
   Now suppose we remove key2 from the enum since it is no longer used.
   ```
   // new version
   enum TrailerKeys {
       key1,
       key3,
   }
   ```
   Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   
   So a better solution is to use strings as keys and encode the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   
   However, doing it this way is equivalent to just moving the metadata section to the end of the data table, and it does not make much sense to bump up the version just to rearrange sections in the data table.
   
   Let's take a step back and look at what we want to solve:
   * we want to add serialization_cost to the data table, but serialization_cost is not available before serialization
   * we want to keep backward compatibility
   
   To add serialization_cost to the data table after serialization, we basically have two options:
   * append it to the end of the bytes
   * write a temporary value for serialization_cost during serialization, then replace it with the actual value once serialization is done
   
   So, here is another approach:
   * don't add a trailer section
   * put serialization_cost into the metadata
   * when we serialize metadata in V2, we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding it this way makes value replacement after serialization impossible, since `String.valueOf(1000).length() != String.valueOf(100000).length()`.
   * in V3, keep all the existing logic, but if the value is a long, encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. Then, in the `serializeMetadata()` function, we can have a variable that records the start offset of the serialization_cost value:
   ```
   // Sketch; assumes the usual java.io, java.nio.charset, and java.util imports.
   byte[] bytes;  // the serialized metadata section, built below
   int serializationCostValueStartOffset = -1;

   ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
   DataOutputStream out = new DataOutputStream(byteArrayOutputStream);
   int offset = 0;
   for (Map.Entry<String, String> entry : metadata.entrySet()) {
     byte[] keyBytes = entry.getKey().getBytes(StandardCharsets.UTF_8);
     out.writeInt(keyBytes.length);
     out.write(keyBytes);
     offset += Integer.BYTES + keyBytes.length;

     if (entry.getKey().equals("serializationCost")) {
       // Record where the fixed-width value starts so it can be patched later.
       serializationCostValueStartOffset = offset;
       out.writeLong(Long.parseLong(entry.getValue()));  // big-endian, always 8 bytes
       offset += Long.BYTES;
     } else {
       byte[] valueBytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
       out.writeInt(valueBytes.length);
       out.write(valueBytes);
       offset += Integer.BYTES + valueBytes.length;
     }
   }
   bytes = byteArrayOutputStream.toByteArray();
   ```
   So after serialization, we are able to replace the value of serialization_cost in place (a big-endian long is always 8 bytes, which makes the replacement possible):
   ```
   int offset = metadataStartOffset + serializationCostValueStartOffset;
   System.arraycopy(Longs.toByteArray(actualValue), 0, bytes, offset, Long.BYTES);
   ```




[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807167614


   @mqliang @mcvsubbu I'm suggesting putting integer ids so that we can deprecate keys if needed by skipping the id, similar to the `thrift` convention. Using ordinal to index across enum is not as flexible. We can also put the name as another field of the enum. 
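
   A sketch of what that could look like, with an explicit id field instead of relying on ordinal() (the ids and key names here are illustrative):

   ```
   import java.util.HashMap;
   import java.util.Map;

   // Illustrative only: explicit ids decouple the wire format from enum order,
   // so a retired id can simply be skipped (thrift-style) without shifting others.
   public enum MetadataKeys {
     NUM_DOCS_SCANNED(1, "numDocsScanned"),
     // id 2 was retired; never reuse it, just skip it.
     TOTAL_DOCS(3, "totalDocs");

     private static final Map<Integer, MetadataKeys> BY_ID = new HashMap<>();

     static {
       for (MetadataKeys key : values()) {
         BY_ID.put(key._id, key);
       }
     }

     private final int _id;
     private final String _name;

     MetadataKeys(int id, String name) {
       _id = id;
       _name = name;
     }

     public int getId() {
       return _id;
     }

     public String getName() {
       return _name;
     }

     // Returns null for unknown or retired ids so readers can skip the entry.
     public static MetadataKeys getById(int id) {
       return BY_ID.get(id);
     }
   }
   ```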




[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805274402


   > @siddharthteotia , @mqliang and I met, and agreed on the following (I have added some extras, so take a look)
   > 
   > * We will move the metadata to the trailer, retaining the other elements in the same order.
   > * We will encode the trailer as (int, int, blob)+
   > * The first int is the enum ordinal, the second int is the length of the blob, and the third part is the UTF-8 encoding of a string, or an int/long as dictated by the enum. If int/long, we will encode it in network byte order (big-endian). The alternative is to convert it to a string.
   
   I think (int, int, bytes/blob in utf-8) is preferable as opposed to converting to string
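
   Concretely, a single string-valued entry under that scheme would serialize as (hypothetical key ordinal 14, value "1234"):

   ```
   int32 14       // enum ordinal of the key
   int32 4        // length of the UTF-8 blob
   bytes "1234"   // UTF-8 encoded value
   ```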




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603634457



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys enum is used in V3, where we present metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't add a new key with the same id/name as an existing key. Duplicate ids/names are not allowed.
+   *  - Don't change the id/name of existing keys.

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603653571



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of a serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       Why are we treating EXCEPTIONS differently in V3 vs. V2? Nothing changes here, right? They continue to be part of the metadata, and along with the rest of the metadata key-values, they move to the end of the byte stream.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604449557



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -315,7 +313,7 @@ private boolean forceLog(long schedulerWaitMs, long numDocsScanned) {
    */
   protected ListenableFuture<byte[]> immediateErrorResponse(ServerQueryRequest queryRequest,
       ProcessingException error) {
-    DataTable result = new DataTableImplV2();
+    DataTable result = new DataTableImplV3();

Review comment:
       done






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604531810



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys enum is used in V3, where we present metadata as Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       The code was put here by IntelliJ reformatting. I'd suggest keeping it here: if someone changes this file and runs the IntelliJ reformatter before committing, it will be moved here anyway.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       +1 for keeping the current logic. Another drawback of having two builders is that every caller would need to decide whether to call V2 or V3 based on the instance config, which is ugly.
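       A minimal sketch of that single-builder shape (the version switch mirrors the `setCurrentDataTableVersion()` discussed later in this thread; `buildV2`/`buildV3` are illustrative placeholders, not the actual method names):
   ```
   public class DataTableBuilder {
     public static final int VERSION_2 = 2;
     public static final int VERSION_3 = 3;
     // Set once from the instance config at server startup.
     private static int _version = VERSION_3;

     public static void setCurrentDataTableVersion(int version) {
       _version = version;
     }

     public DataTable build() {
       // Callers never branch on the version; the builder decides in one place.
       return _version == VERSION_2 ? buildV2() : buildV3();
     }

     private DataTable buildV2() { return new DataTableImplV2(); }
     private DataTable buildV3() { return new DataTableImplV3(); }
   }
   ```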





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603022112



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {

Review comment:
       done





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603636428



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS("executionThreadCpuTimeNs"),

Review comment:
       Since the server will always send the aggregated value (execution + data table serialization + whatever we add and instrument in the future), the name of the key should be changed. Right now it refers only to the execution part. I suggest changing it to simply **`threadCpuTimeNs`** to indicate that it reflects the entire CPU time measured on the server.
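       A rough sketch of measuring and aggregating the two costs under that one key (plain JDK `ThreadMXBean`; the metadata map stands in for the DataTable metadata, and the query/serialization calls are placeholders):
   ```
   import java.lang.management.ManagementFactory;
   import java.lang.management.ThreadMXBean;
   import java.util.HashMap;
   import java.util.Map;

   public class ThreadCpuTimeSketch {
     // Assumes thread CPU time measurement is supported and enabled on this JVM.
     private static final ThreadMXBean MX_BEAN = ManagementFactory.getThreadMXBean();

     public static void main(String[] args) {
       long executionStart = MX_BEAN.getCurrentThreadCpuTime();
       runQuery();                                   // placeholder for query execution
       long executionCpuNs = MX_BEAN.getCurrentThreadCpuTime() - executionStart;

       long serializationStart = MX_BEAN.getCurrentThreadCpuTime();
       serializeDataTable();                         // placeholder for serialization
       long serializationCpuNs = MX_BEAN.getCurrentThreadCpuTime() - serializationStart;

       // One aggregated key, as suggested above.
       Map<String, String> metadata = new HashMap<>();
       metadata.put("threadCpuTimeNs", String.valueOf(executionCpuNs + serializationCpuNs));
       System.out.println(metadata);
     }

     private static void runQuery() { }
     private static void serializeDataTable() { }
   }
   ```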





[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804592233


   With the addition of the new data structure in this PR, there are essentially two places in the DataTable where a key-value / name-value style structure is located.
   
   - First is the existing DataTable metadata, which is also a series of key-value pairs where the key is a string and the value is some statistic/metric. This sits towards the beginning of the byte stream.
   - Second is the structure introduced in this PR, which is written as a footer.
   
   Since we are bumping up the version anyway, how about we move the existing metadata key-value pairs to the end of the byte stream to keep the format consistent? That way, all the metadata (the key-value pairs and the new positional data) can live in a single footer.



[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604381281



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -161,13 +163,15 @@ public void stop() {
           queryRequest.getBrokerId(), e);
       // For not handled exceptions
       serverMetrics.addMeteredGlobalValue(ServerMeter.UNCAUGHT_EXCEPTIONS, 1);
-      dataTable = new DataTableImplV2();
+      dataTable = new DataTableImplV3();

Review comment:
       I think we should use DataTableUtils.buildEmptyDataTable()





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603653571



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       Why are we treating EXCEPTIONS differently in V3 vs. V2? Nothing changes here, right? They continue to be part of the metadata, and along with the rest of the metadata key-values they move to the end of the byte stream in V3. We don't have to handle them any differently in V3.





[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805294188


   > > @siddharthteotia , @mqliang and I met, and agreed on the following (I have added some extras, so take a look)
   > > 
   > > * We will move the metadata to the trailer, retaining the other elements in the same order.
   > > * We will encode the trailer as
   > > * trailer = (int, int, blob)+
   > > * The first int is the enum ordinal, the second int is the length of the blob, and the third part is the UTF-8 encoding of a string, or an int/long as dictated by the enum. If int/long, we will encode it in network byte order (big-endian). The alternative is to convert it to a string.
   > 
   > I think (int, int, bytes/blob in utf-8) is preferable as opposed to converting to string
   
   Correct, but if the "blob" is an int or long value, then UTF-8 would mean long -> String -> UTF-8, right? Alternatively, toBigEndian(longValue).
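   A sketch of the agreed `(int, int, blob)+` entry encoding (a `DataOutputStream` already writes int/long in network byte order, so `toBigEndian(longValue)` falls out of `writeLong`; the method names here are illustrative):
   ```
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;

   public class TrailerEncodingSketch {
     static void writeStringEntry(DataOutputStream out, int ordinal, String value)
         throws IOException {
       byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
       out.writeInt(ordinal);      // first int: enum ordinal
       out.writeInt(bytes.length); // second int: blob length
       out.write(bytes);           // blob: UTF-8 encoded string
     }

     static void writeLongEntry(DataOutputStream out, int ordinal, long value)
         throws IOException {
       out.writeInt(ordinal);
       out.writeInt(Long.BYTES);   // fixed 8-byte blob
       out.writeLong(value);       // big-endian; no long -> String -> UTF-8 detour
     }
   }
   ```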



[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809126203


   Update: pushed two more commits:
   * 1st commit: bug fix
   * 2nd commit: encode all int/long values in big-endian representation. Previously only "responseSerializationCpuTimeNs" was encoded as big-endian.



[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604551363



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       done





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604446232



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();

Review comment:
       done





[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603658485



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";
+    public static final int DEFAULT_CURRENT_DATA_TABLE_VERSION = 3;

Review comment:
       +1





[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806028103


   @mcvsubbu Just found a defect in using the enum ordinal as the key and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   * We can add a new key to the enum without breaking backward compatibility
   * We can omit a key from the trailer without breaking backward compatibility
   * **However, we cannot remove a key from the enum (if the key is no longer used in a future version)**
   
   Namely, say we now have three keys:
   ```
   // old version:
   enum {
       key1,
       key2,
       key3,
   }
   ```
   Now suppose we remove key2 from the enum since it is no longer used.
   ```
   // new version
   enum {
       key1,
       key3,
   }
   ```
   Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   
   So a better solution is to use the string as the key and encode the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   
   However, doing it this way is equivalent to just moving the metadata section to the end of the data table, and it does not make much sense to bump up a version just to rearrange sections in the data table.
   
   Let's take a step back and look at what we want to solve:
   * we want to add serialization_cost to the data table, but serialization_cost is not available before serialization.
   * we want to keep backward compatibility
   
   To add serialization_cost to the data table after serialization, we basically have two options:
   * append it to the end of the bytes.
   * write a placeholder value for serialization_cost during serialization; once serialization is done, replace it with the actual value.
   
   So, here is another approach:
   * don't add a trailer section
   * put serialization_cost into the metadata
   * we serialize the metadata; in V2 we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding the value this way makes replacement after serialization impossible, since `String.valueOf("1000").length() != String.valueOf("100000").length()`.
   * in V3, keep all the existing logic, but if the value is a long, encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. Then, in `serializeMetadata()`, we can keep a variable recording the start offset of serialization_cost, as in the sketch below.
   
   ```
   // Sketch in Java (fixing the pseudocode): serialize the metadata while
   // recording where the fixed-width serialization_cost placeholder starts.
   ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
   DataOutputStream out = new DataOutputStream(byteStream);

   int offset = 0;
   int serializationCostValueStartOffset = -1;
   for (Map.Entry<String, String> entry : metadata.entrySet()) {
     byte[] keyBytes = entry.getKey().getBytes(StandardCharsets.UTF_8);
     out.writeInt(keyBytes.length);
     out.write(keyBytes);
     offset += Integer.BYTES + keyBytes.length;

     byte[] valueBytes;
     if (entry.getKey().equals("serialization_cost")) {
       // Skip past the 4-byte length prefix: the recorded offset must point at
       // the value bytes themselves, which get overwritten after serialization.
       serializationCostValueStartOffset = offset + Integer.BYTES;
       valueBytes = Longs.toByteArray(0L); // 8-byte big-endian placeholder (Guava)
     } else {
       valueBytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
     }
     out.writeInt(valueBytes.length);
     out.write(valueBytes);
     offset += Integer.BYTES + valueBytes.length;
   }
   byte[] bytes = byteStream.toByteArray();
   ```
   
   So after serialization, we can replace the value of serialization_cost in place (`toBigEndian(longValue)` is always 8 bytes, which makes the replacement possible):
   ```
   int patchOffset = metadataStartOffset + serializationCostValueStartOffset;
   // Overwrite the 8-byte big-endian placeholder with the measured cost.
   System.arraycopy(Longs.toByteArray(actualValue), 0, bytes, patchOffset, Long.BYTES);
   ```



[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603476549



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate id/name is not allowed.
+   *  - Don't change id/name of existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN(0, "unknown"),
+    TABLE_KEY(1, "table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY(2, "Exception"),
+    NUM_DOCS_SCANNED_METADATA_KEY(3, "numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY(4, "numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY(5, "numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED(6, "numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED(7, "numSegmentsProcessed"),

Review comment:
       Why does this one not have METADATA_KEY as a suffix?





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599235536



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(_positionalData.length);
+    for (String entry : _positionalData) {
+      byte[] bytes = StringUtil.encodeUtf8(entry);

Review comment:
       Some comments on the format here would be useful. We don't write the enum constant itself, just the value (length + bytes) at the ordinal/position of the constant. Correct?
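       Roughly, the format in question looks like this (a sketch paraphrasing the patch above, with plain JDK UTF-8 encoding standing in for `StringUtil.encodeUtf8`):
   ```
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;

   public class PositionalDataSketch {
     static byte[] serializePositionalData(String[] positionalData) throws IOException {
       ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
       DataOutputStream out = new DataOutputStream(byteArrayOutputStream);
       out.writeInt(positionalData.length);   // number of slots
       for (String entry : positionalData) {  // array index == enum ordinal
         byte[] bytes = entry.getBytes(StandardCharsets.UTF_8);
         out.writeInt(bytes.length);          // length
         out.write(bytes);                    // bytes; the key itself is never written
       }
       return byteArrayOutputStream.toByteArray();
     }
   }
   ```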





[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603479201



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";

Review comment:
       This was technically never used in V2. It was only added in the previous PR but was never sent to the broker as part of the DataTable metadata. So, can we remove it from here?





[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807704383


   @mcvsubbu Here is what Jackie means by assigning an ID to each enum key:
   ```
   enum TrailerKeys {
       UNKNOWN(0, "unknown"),
       TABLE_KEY(1, "table"), // NOTE: this key is only used in PrioritySchedulerTest
       EXCEPTION_METADATA_KEY(2, "Exception"),
       NUM_DOCS_SCANNED_METADATA_KEY(3, "numDocsScanned"),
       NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY(4, "numEntriesScannedInFilter"),
       NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY(5, "numEntriesScannedPostFilter"),
       NUM_SEGMENTS_QUERIED(6, "numSegmentsQueried"),
       NUM_SEGMENTS_PROCESSED(7, "numSegmentsProcessed"),
       NUM_SEGMENTS_MATCHED(8, "numSegmentsMatched"),
       NUM_CONSUMING_SEGMENTS_PROCESSED(9, "numConsumingSegmentsProcessed"),
       MIN_CONSUMING_FRESHNESS_TIME_MS(10, "minConsumingFreshnessTimeMs"),
       TOTAL_DOCS_METADATA_KEY(11, "totalDocs"),
       NUM_GROUPS_LIMIT_REACHED_KEY(12, "numGroupsLimitReached"),
       TIME_USED_MS_METADATA_KEY(13, "timeUsedMs"),
       TRACE_INFO_METADATA_KEY(14, "traceInfo"),
       REQUEST_ID_METADATA_KEY(15, "requestId"),
       NUM_RESIZES_METADATA_KEY(16, "numResizes"),
       RESIZE_TIME_MS_METADATA_KEY(17, "resizeTimeMs"),
       EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY(18, "executionThreadCpuTimeNs"),
       RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY(19, "responseSerializationCpuTimeNs"),
       ;
   
       private static final Map<Integer, TrailerKeys> _map = new HashMap<>();
       static {
         for (TrailerKeys key : TrailerKeys.values()) {
           if (_map.put(key.getId(), key) != null) {
             throw new IllegalArgumentException("duplicate id: " + key.getId());
           }
         }
       }
   
       private final String _name;
       private final int _id;
   
       TrailerKeys(int id, String name) {
         this._id = id;
         this._name = name;
       }
   
       public String getName() {
         return _name;
       }
   
       public int getId() {
         return _id;
       }
   
       public static TrailerKeys getById(int id) {
         return _map.get(id);
       }
     }
   ```



[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604477092



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -96,6 +99,17 @@ public DataTableBuilder(DataSchema dataSchema) {
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);

Review comment:
       As discussed offline, we want this change to focus on the metadata change. I will send a separate PR to bump the version up to V4, dedicated to addressing all the TODOs in DataTableBuilder, including:
   * fix the float value size issue
   * store bytes as variable size data instead of String
   * use one map of "String->Int" for all columns, instead of one map per column





[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-808451682


   > @Jackie-Jiang Here is trailer. A common term used when encoding decoding network packets. Footer is used more in documents, but acceptable.
   > https://en.wikipedia.org/wiki/Trailer_(computing)#:~:text=In%20information%20technology%2C%20a%20trailer,simply%20mark%20the%20block's%20end.
   
   @mcvsubbu I actually read this doc before suggesting the rename to metadata. As per the wiki, a trailer is used to store metadata, and it is commonly used in network packets for packet-related info.



[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604397992



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       Let's discuss this to see what we need to do here. We might want to clean up the existing code first so that empty data tables are always built in the same manner. We have two options:
   
   - Add a static method to DataTableBuilder -- something like DataTableBuilder.getDefaultTable(); it internally has the version, so it will return either `new DataTableImplV2()` or `new DataTableImplV3()`
   - Clean up the existing code by always using DataTableUtils.buildEmptyDataTable in these situations
   
   For option 2, I am not sure why the existing code (not this PR) has mixed semantics for constructing an empty data table. Several places directly call the constructor, which sets everything to null, whereas in one place we call `DataTableUtils.buildEmptyDataTable(queryContext)` to return an empty data table with a properly initialized schema. A sketch of option 1 follows.
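       A sketch of option 1 (`getDefaultTable` and the version field are assumed names, not the final API): a single static factory so every call site builds the empty table the same way.
   ```
   public final class DataTableBuilderFactorySketch {
     private static int _version = 3; // set once from the instance config

     public static DataTable getDefaultTable() {
       // The V2-vs-V3 decision lives in exactly one place.
       return _version == 2 ? new DataTableImplV2() : new DataTableImplV3();
     }
   }
   ```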





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604397222



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()
+      throws IOException {
+    DataSchema.ColumnDataType[] columnDataTypes = DataSchema.ColumnDataType.values();
+    int numColumns = columnDataTypes.length;
+    String[] columnNames = new String[numColumns];
+    for (int i = 0; i < numColumns; i++) {
+      columnNames[i] = columnDataTypes[i].name();
+    }
 
     int[] ints = new int[NUM_ROWS];
     long[] longs = new long[NUM_ROWS];
     float[] floats = new float[NUM_ROWS];
     double[] doubles = new double[NUM_ROWS];
     String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
     Object[] objects = new Object[NUM_ROWS];
     int[][] intArrays = new int[NUM_ROWS][];
     long[][] longArrays = new long[NUM_ROWS][];
     float[][] floatArrays = new float[NUM_ROWS][];
     double[][] doubleArrays = new double[NUM_ROWS][];
     String[][] stringArrays = new String[NUM_ROWS][];
 
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
+    DataTableBuilder dataTableBuilderV2 = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV2, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    // Verify V3 broker can deserialize data table send by V2 server
+    DataTable dataTableV2 = dataTableBuilderV2.build(); // create a V2 data table
+    // Deserialize data table bytes as V3
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);

Review comment:
       Oh, that's from the previous implementation, where we had a converter to convert V2 to V3. Will change the comments here.





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603632394



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,9 +107,17 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    CURRENT_VERSION = VERSION_3;

Review comment:
       Now DataTableBuilder has a static function `setCurrentDataTableVersion()`, so the server can be configured to send either a V2 or a V3 data table to the broker (see the config sketch below).
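       A sketch of the wiring at server startup (the config key and default come from CommonConstants in this PR; the enclosing class and the config accessor are assumptions):
   ```
   public static void initDataTableVersion(PinotConfiguration serverConf) {
     int version = serverConf.getProperty(
         CommonConstants.Server.CONFIG_OF_CURRENT_DATA_TABLE_VERSION,
         CommonConstants.Server.DEFAULT_CURRENT_DATA_TABLE_VERSION);
     DataTableBuilder.setCurrentDataTableVersion(version);
   }
   ```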





[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604415996



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()

Review comment:
       > v3 data table sent by server has metadata length as 0
   
       That's impossible, since `toBytes()` in V3 always adds a `threadCpuTimeNs` KV pair to the metadata, so for V3 the metadata contains at least one KV pair. We can add tests: an empty data table (numRows = 0); a data table whose metadata only contains the `threadCpuTimeNs` KV; a data table whose metadata has multiple KV pairs; etc. (see the sketch below).
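       A sketch of the empty-table case (TestNG, matching this test class; assumes `DataTable#getMetadata()` is exposed as a `Map<String, String>`):
   ```
   @Test
   public void testEmptyV3DataTableMetadata()
       throws IOException {
     DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_3);
     DataTable emptyDataTable = new DataTableImplV3(); // numRows == 0
     DataTable newDataTable = DataTableFactory.getDataTable(emptyDataTable.toBytes());
     Assert.assertEquals(newDataTable.getNumberOfRows(), 0, ERROR_MESSAGE);
     // V3 toBytes() always adds the threadCpuTimeNs pair, so metadata is non-empty.
     Assert.assertTrue(newDataTable.getMetadata().containsKey("threadCpuTimeNs"));
   }
   ```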





[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806292277


   > Since we are adding a new data table version, please use this opportunity to address the TODOs within the `DataTableBuilder`.
   > For the `TrailerKeys` enum, let's put an id for each key instead of using the ordinal of the enum. This way it is much easier to manage as long as we don't reuse the ids. Also suggest renaming it to `MetadataKeys`
   
   By "Id" do you mean strings? Why is that any more advantages than adding an enum? In fact, enum is better since we can declare all enums in one place and add whatever comments there to not remove an enum



[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   



[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806161505


   > @mcvsubbu Just found a defect in using the enum ordinal as the key and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   > 
   > * We are able to add a new key to the enum without bumping up the version
   > * We are able to omit a key from the trailer without bumping up the version
   > * **However, we are unable to remove a key from the enum (if the key is no longer used in a future version)**
   > 
   > Namely, say we now have three keys:
   > 
   > ```
   > // old version:
   > enum {
   >     key1,
   >     key2,
   >     key3,
   > }
   > ```
   > 
   > Now suppose we remove key2 from the enum since it's no longer used.
   > 
   > ```
   > // new version
   > enum {
   >     key1,
   >     key3,
   > }
   > ```
   > 
   > Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   > 
   > So a better solution is to use a string as the key and encode the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   > 
   > However, doing it this way is equivalent to just moving the metadata section to the end of the datatable, and it does not make much sense to bump up the version just to rearrange sections in the datatable.
   > 
   > Let's take a step back to what we want to solve:
   > 
   > * we want to add serialization_cost to the datatable, but serialization_cost is not available before serialization.
   > * we want to keep backward compatibility
   > 
   > To add serialization_cost to the datatable after serialization, we basically have two options:
   > 
   > * append it to the end of the bytes.
   > * write a placeholder value for serialization_cost during serialization and, after serialization is done, replace it with the actual value.
   > 
   > So, here is another approach:
   > 
   > * don't add a trailer section
   > * put serialization_cost into metadata
   > * when we serialize metadata in V2, we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding it this way makes value replacement after serialization impossible, since `String.valueOf("1000").length() != String.valueOf("100000").length()`.
   > * in V3, keep all existing logic; however, if the value is a long, encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. And in the function `serializeMetadata()`, we can keep a variable to record the start offset of serialization_cost.
   > 
   > ```
   > byte[] bytes;
   > int serialization_cost_value_start_offset;
   > 
   > offset = 0;
   > for (String key : metadata.keySet()) {
   >       keybytes = to-utf8(key);
   >       bytes.append(keybytes.length)
   >       bytes.append(keybytes)
   > 
   >       offset += 4;
   >       offset += keybytes.length
   > 
   >       if (key.equals("serialization_cost")) {
   >             serialization_cost_value_start_offset = offset;
   >             valuebytes = toBigEndian(value);
   >             bytes.append(valuebytes)
   >             offset += 8;
   >       } else {
   >             valuebytes = to-utf8(value);
   >             bytes.append(valuebytes.length)
   >             bytes.append(valuebytes)
   >             offset += 4
   >             offset += valuebytes.length
   >       }
   > }
   > ```
   > 
   > So after serialization, we are able to replace the value of serialization_cost (`toBigEndian(longValue)` is always 8 bytes, which makes replacement possible):
   > 
   > ```
   > offset = metadataStartOffset+serialization_cost_value_start_offset
   > bytes[offset:offset+8] = toBigEndian(actualValue)
   > ```
   
   - Removing enums will break the protocol and is not allowed. We need to state this clearly in the comments.
   - We should use a trailer instead of hacking the length. This will be applicable to streaming use cases as well.
   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603641283



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";
+    public static final int DEFAULT_CURRENT_DATA_TABLE_VERSION = 3;

Review comment:
   I suggest changing the `private static int CURRENT_VERSION = VERSION_3` defined in DataTableBuilder to public and using that here instead of hardcoding the value 3.
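   
   A sketch of the suggested change (assuming the constant stays in DataTableBuilder):
   
   ```
   // In DataTableBuilder (was private):
   public static int CURRENT_VERSION = VERSION_3;
   
   // In CommonConstants, instead of hardcoding 3:
   public static final int DEFAULT_CURRENT_DATA_TABLE_VERSION = DataTableBuilder.CURRENT_VERSION;
   ```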






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603676518



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header (52 bytes):            |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata() API returns a Map<String, String> and there
+  // is a lot of existing code using strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeMetadata(metadataBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account for data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pair in _metadataV2 into _metadata
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if the value type is String, encode it as: [keyID, valueLength, Utf8EncodedValue].
+   * - if the value type is int, encode it as: [keyID, bigEndianRepresentationOfIntValue]
+   * - if the value type is long, encode it as: [keyID, bigEndianRepresentationOfLongValue]
+   */
+  private byte[] serializeMetadata()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(_metadata.size());
+
+    for (Map.Entry<MetadataKeys, String> entry : _metadata.entrySet()) {
+      MetadataKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      if (MetadataKeys.isIntValueMetadataKey(key)) {
+        dataOutputStream.write(Ints.toByteArray(Integer.parseInt(value)));
+      } else if (MetadataKeys.isLongValueMetadataKey(key)) {
+        dataOutputStream.write(Longs.toByteArray(Long.parseLong(value)));
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<MetadataKeys, String> deserializeMetadata(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numEntries = dataInputStream.readInt();
+      Map<MetadataKeys, String> metadata = new HashMap<>();
+      for (int i = 0; i < numEntries; i++) {
+        int keyId = dataInputStream.readInt();
+        Optional<MetadataKeys> opt = MetadataKeys.getByOrdinal(keyId);
+        // Ignore unknown keys.
+        if (!opt.isPresent()) {
+          continue;
+        }
+        MetadataKeys key = opt.get();
+        if (MetadataKeys.isIntValueMetadataKey(key)) {
+          String value = String.valueOf(decodeInt(dataInputStream));
+          metadata.put(key, value);
+        } else if (MetadataKeys.isLongValueMetadataKey(key)) {
+          String value = String.valueOf(decodeLong(dataInputStream));
+          metadata.put(key, value);
+        } else {
+          String value = String.valueOf(decodeString(dataInputStream));
+          metadata.put(key, value);
+        }
+      }
+      return metadata;
+    }
+  }
+
+  private byte[] serializeExceptions()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(_exceptions.size());
+
+    for (Map.Entry<Integer, String> entry : _exceptions.entrySet()) {
+      int key = entry.getKey();
+      String value = entry.getValue();
+      byte[] valueBytes = StringUtil.encodeUtf8(value);
+      dataOutputStream.writeInt(key);
+      dataOutputStream.writeInt(valueBytes.length);
+      dataOutputStream.write(valueBytes);
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<Integer, String> deserializeExceptions(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numExceptions = dataInputStream.readInt();
+      Map<Integer, String> exceptions = new HashMap<>(numExceptions);
+      for (int i = 0; i < numExceptions; i++) {
+        int errCode = dataInputStream.readInt();
+        String errMessage = decodeString(dataInputStream);
+        exceptions.put(errCode, errMessage);
+      }
+      return exceptions;
+    }
+  }
+
+  @Override
+  public Map<String, String> getMetadata() {
+    return _metadataV2;
+  }
+
+  @Override
+  public DataSchema getDataSchema() {
+    return _dataSchema;
+  }
+
+  @Override

Review comment:
       Can we create a base class and move all of the following APIs to the base class which can be extended by both DataTableImplV2 and DataTableImplV3 ?






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603021955



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -263,6 +263,14 @@ public void finishRow()
   }
 
   public DataTable build() {
+    return new DataTableImplV3(_numRows, _dataSchema, _reverseDictionaryMap,
+        _fixedSizeDataByteArrayOutputStream.toByteArray(), _variableSizeDataByteArrayOutputStream.toByteArray());
+  }
+
+  // buildV2() is only used in V2V3Compatibility test

Review comment:
   Done. DataTableBuilder now accepts a "version" parameter, so the caller can decide which version to generate (defaults to V3).
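   
   For reference, a sketch of how a caller switches versions (static setter as in the updated DataTableSerDeTest):
   
   ```
   // Build a V2 data table for the compatibility test; V3 is the default.
   DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
   DataTable dataTableV2 = new DataTableBuilder(dataSchema).build();
   ```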








[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603707132



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header (52 bytes):            |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata; we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because the V2 getMetadata() API returns a Map<String, String> and there
+  // is a lot of existing code using strings as keys to access metadata.
+  // TODO: remove this and change all metadata-accessing code to use MetadataKeys.
+  private final Map<String, String> _metadataV2;

Review comment:
       > I don't think there is any way of resolving this TODO in a clean manner. All the callers of getMetadata API will have to be changed and the code will become conditional/ugly since we will have to support both V2 and V3 
   
   We don't need ugly conditional code to switch between V2/V3 all over the place. Just change all the code to use the new `Map<MetadataKey, String>` API.
   
   * An old server uses the old codebase and sends V2 bytes on the wire
   * A new server uses the new codebase, calls the new `Map<MetadataKey, String>` API to set/get metadata, and sends V2 or V3 bytes on the wire based on the instance config
   * A new broker uses the new codebase and calls the new `Map<MetadataKey, String>` API to set/get metadata. When it receives data table bytes from the wire, it deserializes them as V2 or V3 depending on the version, as sketched below.
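   
   For illustration, a minimal sketch of the broker-side version dispatch (assuming `DataTableFactory` reads the leading version int that `toBytes()` writes first):
   
   ```
   import java.io.IOException;
   import java.nio.ByteBuffer;
   
   public class DataTableFactory {
     public static DataTable getDataTable(byte[] bytes)
         throws IOException {
       ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
       // The first int on the wire is the data table version.
       int version = byteBuffer.getInt();
       switch (version) {
         case 2:
           return new DataTableImplV2(byteBuffer);
         case 3:
           return new DataTableImplV3(byteBuffer);
         default:
           throw new UnsupportedOperationException("Unsupported data table version: " + version);
       }
     }
   }
   ```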






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-807254112


   > I'm suggesting putting integer ids so that we can deprecate keys if needed by skipping the id, similar to the thrift convention.
   
   Gotcha. Associate an integer ID with each enum value. During serialization, use the ID as the key, not the ordinal. This way we can remove a key from the enum if it's not used anymore.
   
   > Also suggest renaming it to MetadataKeys
   
   I named it TrailerKeys since the trailer may contain not only metadata KV pairs but also other data in the future -- a metadata key must be a trailer key, but the opposite is not necessarily true. I am OK with renaming it to MetadataKeys if we accept calling all data we put into this section metadata. CC @mcvsubbu for more input.
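   
   For illustration, a minimal sketch of such an id-based enum (names hypothetical, not the actual Pinot code):
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   // Thrift-style enum: each key carries an explicit, never-reused id, so a
   // deprecated key can be removed without shifting the ids of the others.
   enum TrailerKeys {
     NUM_DOCS_SCANNED(1),
     // id 2 belonged to a removed key -- never reuse it
     TOTAL_DOCS(3);
   
     private static final Map<Integer, TrailerKeys> BY_ID = new HashMap<>();
   
     static {
       for (TrailerKeys key : values()) {
         BY_ID.put(key._id, key);
       }
     }
   
     private final int _id;
   
     TrailerKeys(int id) {
       _id = id;
     }
   
     public int getId() {
       return _id;
     }
   
     // Returns null for unknown ids, so keys added by newer senders are skipped.
     public static TrailerKeys getById(int id) {
       return BY_ID.get(id);
     }
   }
   ```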




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604447933



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       done






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606099782



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  protected byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)

Review comment:
       done

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +131,267 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
+
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
     DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()
+      throws IOException {
+    DataSchema.ColumnDataType[] columnDataTypes = DataSchema.ColumnDataType.values();
+    int numColumns = columnDataTypes.length;
+    String[] columnNames = new String[numColumns];
+    for (int i = 0; i < numColumns; i++) {
+      columnNames[i] = columnDataTypes[i].name();
+    }
+
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+
+    // Verify V3 broker can deserialize data table (has data, but has no metadata) sent by V2 server
+    DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
+    DataTableBuilder dataTableBuilderV2WithDataOnly = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV2WithDataOnly, columnDataTypes, numColumns, ints, longs, floats,
+        doubles, strings, bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTableV2 = dataTableBuilderV2WithDataOnly.build(); // create a V2 data table
+    DataTable newDataTable =
+        DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    Assert.assertEquals(newDataTable.getMetadata().size(), 0);
+
+    // Verify V3 broker can deserialize data table (has data and metadata) sent by V2 server
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV2.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
+
+    // Verify V3 broker can deserialize data table (only has metadata) sent by V2 server
+    DataTableBuilder dataTableBuilderV2WithMetadataDataOnly = new DataTableBuilder(dataSchema);
+    dataTableV2 = dataTableBuilderV2WithMetadataDataOnly.build(); // create a V2 data table
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV2.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), 0, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
+
+    // Verify V3 broker can deserialize data table (has data, but has no metadata) sent by V3 server.
+    DataTableBuilder.setCurrentDataTableVersion(VERSION_3);
+    DataTableBuilder dataTableBuilderV3WithDataOnly = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV3WithDataOnly, columnDataTypes, numColumns, ints, longs, floats,
+        doubles, strings, bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    DataTable dataTableV3 = dataTableBuilderV3WithDataOnly.build(); // create a V3 data table
+    // Deserialize data table bytes as V3
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    // DataTable V3 serialization logic will add an extra THREAD_CPU_TIME_NS KV pair into metadata
+    Assert.assertEquals(newDataTable.getMetadata().size(), 1);
+    Assert.assertTrue(newDataTable.getMetadata().containsKey(THREAD_CPU_TIME_NS.getName()));
+
+    // Verify V3 broker can deserialize data table (has data and metadata) sent by V3 server
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV3.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes()); // Broker deserialize data table bytes as V3
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    newDataTable.getMetadata().remove(THREAD_CPU_TIME_NS.getName());
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
 
+    // Verify V3 broker can deserialize data table (only has metadata) sent by V3 server
+    DataTableBuilder dataTableBuilderV3WithMetadataDataOnly = new DataTableBuilder(dataSchema);
+    dataTableV3 = dataTableBuilderV3WithMetadataDataOnly.build(); // create a V3 data table
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV3.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes()); // Broker deserialize data table bytes as V3
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);

Review comment:
       fixed in https://github.com/apache/incubator-pinot/pull/6738

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;
+    }
+
+    // getByOrdinal returns the enum key for a given ordinal, or null if no key with that ordinal exists.
+    public static MetadataKey getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKey.values().length) {
+        return null;
+      }
+      return MetadataKey.values()[ordinal];
+    }
+
+    // getByName returns an enum key for a given name or null if the key does not exist.
+    public static MetadataKey getByName(String name) {
+      return _nameToEnumKeyMap.getOrDefault(name, null);
+    }

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;
+    }
+
+    // getByOrdinal returns the enum key for a given ordinal, or null if no key with that ordinal exists.
+    public static MetadataKey getByOrdinal(int ordinal) {

Review comment:
       done

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;

Review comment:
       done






[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806028103


   @mcvsubbu Just found a defect in using the enum ordinal as the key and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   * We are able to add a new key to the enum without bumping up the version
   * We are able to omit a key from the trailer without bumping up the version
   * **However, we are unable to remove a key from the enum (if the key is no longer used in a future version)**
   
   Namely, say we now have three keys:
   ```
   // old version:
   enum {
       key1,
       key2,
       key3,
   }
   ```
   Now suppose we remove key2 from the enum since it's no longer used.
   ```
   // new version
   enum {
       key1,
       key3,
   }
   ```
   Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   
   So a better solution is using a string as the key and encoding the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   
   However, if we do it this way, it's equivalent to just moving the metadata section to the end of the data table, and it does not make much sense to bump up a version just for rearranging sections in the data table.
   
   Let's take a step back to what we want to solve:
   * we want to add serialization_cost to the data table, but serialization_cost is not available before serialization
   * we want to keep backward compatibility
   
   To add serialization_cost to the data table after serialization, we basically have two options:
   * append it to the end of the bytes
   * put a placeholder value of serialization_cost in during serialization, and once serialization is done, replace it with the actual value
   
   So, here is another approach:
   * don't add a trailer section
   * put serialization_cost into metadata
   * we serialize metadata; in V2 we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding it this way makes value replacement after serialization impossible, since `String.valueOf("1000").length() != String.valueOf("100000").length()`.
   * In V3, keep all the existing logic. However, if the value is a long, we should encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. And in the function `serializeMetadata()`, we can have a variable to record the start offset of serialization_cost.
   
   ```
   byte[] bytes;
   int serialization_cost_value_start_offset;
   
   offset = 0;
   for (String key: metadata.keySet()) {
         keybytes[] = to-utf8(key);
         bytes.append(keybytes.length())
         bytes.append(keybytes)
   
         offset += 4;
         offset += keybytes.length
   
         valuebytes[]
         if (key.equals("serialization_cost")) {
               serialization_cost_value_start_offset = offset;
               valuebytes = toBigEndian(value);
         } else {
               valuebytes = to-utf8(value);
         }
   
         bytes.append(valuebytes.length())
         bytes.append(valuebytes)
   
         offset += 4;
         offset += valuebytes.length
   }
   ```
   
   So after serialization, we are able to replace the value of serialization_cost (`toBigEndian(longValue)` is always 8 bytes, which makes replacement possible):
   ```
   offset = metadataStartOffset+serialization_cost_value_start_offset
   bytes[offset:offset+8] = toBigEndian(actualValue)
   ```
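   
   A minimal Java sketch of that fixed-width patch (the helper class and names here are illustrative, not the actual API in this PR):
   ```
   import java.nio.ByteBuffer;
   
   // Illustrative sketch: patch the actual 8-byte big-endian long into an
   // already-serialized payload, given the absolute offset recorded during
   // serialization. ByteBuffer writes big-endian (network byte order) by default.
   final class SerializationCostPatcher {
     static void patchLong(byte[] serialized, int absoluteOffset, long actualValue) {
       ByteBuffer.wrap(serialized, absoluteOffset, Long.BYTES).putLong(actualValue);
     }
   }
   ```
   Calling `patchLong(bytes, metadataStartOffset + serialization_cost_value_start_offset, actualValue)` overwrites exactly 8 bytes, which is what makes the in-place replacement safe.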




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603765482



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableUtils.java
##########
@@ -233,4 +243,98 @@ private static DataTable buildEmptyDataTableForDistinctQuery(QueryContext queryC
     dataTableBuilder.finishRow();
     return dataTableBuilder.build();
   }
+
+  /**

Review comment:
       These functions should also be moved to the base class.






[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806244858


   Since we are adding a new data table version, please use this opportunity to address the TODOs within the `DataTableBuilder`.
   For the `TrailerKeys` enum, let's put an id for each key instead of using the ordinal of the enum. This way it is much easier to manage, as long as we don't reuse the ids. Also, I suggest renaming it to `MetadataKeys`.
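   
   Something along these lines (the ids and keys below are just for illustration):
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   // Sketch: each key carries an explicit id that is written on the wire, so
   // retiring a key later cannot shift the ids of the keys that remain.
   enum MetadataKeys {
     NUM_DOCS_SCANNED(1),
     TIME_USED_MS(2),
     THREAD_CPU_TIME_NS(3);
   
     private static final Map<Integer, MetadataKeys> ID_TO_KEY = new HashMap<>();
   
     static {
       for (MetadataKeys key : values()) {
         ID_TO_KEY.put(key._id, key);
       }
     }
   
     private final int _id;
   
     MetadataKeys(int id) {
       _id = id;
     }
   
     public int getId() {
       return _id;
     }
   
     // Returns null for ids this build does not know about (e.g. from a newer server).
     public static MetadataKeys getById(int id) {
       return ID_TO_KEY.get(id);
     }
   }
   ```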


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606099731



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;

Review comment:
       fixed in https://github.com/apache/incubator-pinot/pull/6738






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606091546



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +82,76 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  enum MetadataValueType {
+    INT, LONG, STRING
+  }
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown", MetadataValueType.STRING),
+    TABLE("table", MetadataValueType.STRING), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter", MetadataValueType.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter", MetadataValueType.LONG),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried", MetadataValueType.INT),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed", MetadataValueType.INT),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched", MetadataValueType.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed", MetadataValueType.INT),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs", MetadataValueType.LONG),
+    TOTAL_DOCS("totalDocs", MetadataValueType.LONG),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached", MetadataValueType.STRING),
+    TIME_USED_MS("timeUsedMs", MetadataValueType.LONG),
+    TRACE_INFO("traceInfo", MetadataValueType.STRING),
+    REQUEST_ID("requestId", MetadataValueType.LONG),
+    NUM_RESIZES("numResizes", MetadataValueType.INT),
+    RESIZE_TIME_MS("resizeTimeMs", MetadataValueType.LONG),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs", MetadataValueType.LONG);
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    private final String _name;
+    private final MetadataValueType _valueType;
+
+    MetadataKey(String name, MetadataValueType valueType) {
+      this._name = name;
+      this._valueType = valueType;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal or null if the key does not exist.
+    public static MetadataKey getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKey.values().length) {
+        return null;
+      }
+      return MetadataKey.values()[ordinal];
+    }
+
+    // getByName returns an enum key for a given name or null if the key does not exist.
+    public static MetadataKey getByName(String name) {
+      return _nameToEnumKeyMap.getOrDefault(name, null);
+    }

Review comment:
       @mqliang , can you please address this?






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606070048



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs");
+
+    private static final Map<String, MetadataKey> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKey contains all metadata keys which has value of int type.
+    private static final Set<MetadataKey> _intValueMetadataKey = ImmutableSet
+        .of(MetadataKey.NUM_SEGMENTS_QUERIED, MetadataKey.NUM_SEGMENTS_PROCESSED, MetadataKey.NUM_SEGMENTS_MATCHED,
+            MetadataKey.NUM_RESIZES, MetadataKey.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKey.NUM_RESIZES);
+    // _longValueMetadataKey contains all metadata keys which has value of long type.

Review comment:
       That's a good idea. I see Jackie has some related work to unify the usage of ColumnDataType: https://github.com/apache/incubator-pinot/pull/6728; he mentioned that we will consider merging DataType and ColumnDataType in the future. So let's address it separately.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604475304



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  private static int _version = VERSION_3;

Review comment:
       We have a `setCurrentDataTableVersion()` static function to set the version, which is called in `HelixServerStarter`.
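   
   Roughly this shape (a simplified sketch of the switch, not the exact code):
   ```
   public class DataTableBuilder {
     public static final int VERSION_2 = 2;
     public static final int VERSION_3 = 3;
     // Default to V3 so a broker-first upgrade keeps working; servers can be
     // pinned back to V2 until the migration completes.
     private static int _version = VERSION_3;
   
     public static void setCurrentDataTableVersion(int version) {
       if (version != VERSION_2 && version != VERSION_3) {
         throw new IllegalArgumentException("Unsupported data table version: " + version);
       }
       _version = version;
     }
   }
   ```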






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604418937



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -315,7 +313,7 @@ private boolean forceLog(long schedulerWaitMs, long numDocsScanned) {
    */
   protected ListenableFuture<byte[]> immediateErrorResponse(ServerQueryRequest queryRequest,
       ProcessingException error) {
-    DataTable result = new DataTableImplV2();
+    DataTable result = new DataTableImplV3();

Review comment:
       Fix this as per https://github.com/apache/incubator-pinot/pull/6710/files#r604379681

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -315,7 +313,7 @@ private boolean forceLog(long schedulerWaitMs, long numDocsScanned) {
    */
   protected ListenableFuture<byte[]> immediateErrorResponse(ServerQueryRequest queryRequest,
       ProcessingException error) {
-    DataTable result = new DataTableImplV2();
+    DataTable result = new DataTableImplV3();

Review comment:
       Please fix this as per https://github.com/apache/incubator-pinot/pull/6710/files#r604379681






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599201245



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -61,12 +65,15 @@
   private final byte[] _variableSizeDataBytes;
   private final ByteBuffer _variableSizeData;
   private final Map<String, String> _metadata;
+  // Only V3 has _positionalData
+  private final String[] _positionalData;

Review comment:
       I would suggest calling this a footer and adding some comments on the structure of the footer. Please give an example as well.
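   
   One possible shape for such a comment (the layout below is illustrative, based on the length-prefixed UTF-8 encoding in `serializePositionalData`):
   ```
   // Footer / positional data layout:
   //   numEntries              (int)
   //   length of entry 0       (int)
   //   UTF-8 bytes of entry 0
   //   length of entry 1       (int)
   //   UTF-8 bytes of entry 1
   //   ...
   // Entry i holds the value for the i-th key of the key enum, so each value
   // stays locatable by position even after serialization.
   ```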






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603655876



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       V2 constructs the key for an exception using "Exception"+errCode, e.g. Exception404, Exception500, and then puts it into metadata. In V2 that's OK since V2 metadata keys are Strings, but in V3 it's impossible -- all keys in V3 must be defined statically in the enum. We cannot construct a key dynamically.
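   
   A side-by-side sketch of the two approaches (simplified fragments):
   ```
   // V2 style: fabricate a metadata key per error code on the fly.
   metadata.put("Exception" + processingException.getErrorCode(),
       processingException.getMessage());
   
   // V3 style: exceptions get their own errorCode -> message map (and their own
   // serialized section), since V3 metadata keys are statically declared.
   Map<Integer, String> exceptions = new HashMap<>();
   exceptions.put(processingException.getErrorCode(), processingException.getMessage());
   ```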






[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605325369



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    super();
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;

Review comment:
       This block of code, including the call to super(), is redundant as Java will do this automatically.
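   
   A stripped-down illustration (field names borrowed from the class above):
   ```
   public abstract class BaseDataTable {
     protected int _numRows;               // defaults to 0
     protected byte[] _fixedSizeDataBytes; // defaults to null
   
     public BaseDataTable() {
       // An implicit super() call happens here anyway, and the fields above
       // already hold their defaults (0 / null), so the explicit assignments
       // add nothing.
     }
   }
   ```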






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605222747



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       How about we use the enum as-is for the moment. We can discuss more if we decide to associate an id with each key later on; as long as we associate the first key with 0, the second with 1, the third with 2, and so on, the bytes sent on the wire will not change. We can address it in a separate PR; it's just a code-level change and will not change any payloads sent on the wire.






[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605812512



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,85 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKey is used in V3, where we present metadata as Map<MetadataKey, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKey {
+    UNKNOWN("unknown"),

Review comment:
       I have found it useful to have one enum value reserved that is never used. It is never sent by the sender, but the receiver, if needed, can fall back to this value if it encounters a value that it does not know about. In that case, the special-case handling is restricted to the layer that first scans the enums, and the layers above don't need to worry about default cases.
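   
   For instance, at the decoding boundary (a sketch, assuming the reserved value is the `UNKNOWN` constant and that this method lives inside the key enum):
   ```
   // Map unrecognized ordinals to UNKNOWN once, at the first scan of the
   // enums; the layers above then never see a null or out-of-range key.
   public static MetadataKey getByOrdinalOrUnknown(int ordinal) {
     MetadataKey[] keys = MetadataKey.values();
     if (ordinal < 0 || ordinal >= keys.length) {
       return MetadataKey.UNKNOWN;
     }
     return keys[ordinal];
   }
   ```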






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599312636



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -33,12 +33,15 @@
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
 import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
+public class DataTableImplV2V3 implements DataTable {

Review comment:
       Let's discuss the approach of moving the metadata to the end of the payload again. I think we are both inclined towards doing that, since all the metadata (existing + new) will be together in the footer.
   
   Coming to naming, my initial suggestion of not including the version was indeed because the versions share the logic. So tomorrow, if we move to V4 and it still shares a lot of common logic, we can continue to retain the name DataTableImpl (and not DataTableImplV2V3V4), as everything will be in the same file as long as it is readable.
   
   I agree that moving the metadata is a change which will make some code unreadable if we try to keep everything in the same file. So yes, if we go down this path, I agree we should create a new class.
   
   






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806259732


   > @mcvsubbu Just found a defect in using the enum ordinal as the key and encoding the trailer as `(int, int, bytes/blob in utf-8)`:
   > 
   > * We are able to add a new key to the enum without bumping up the version
   > * We are able to omit a key from the trailer without bumping up the version
   > * **However, we are unable to remove a key from the enum (if the key is no longer used in a future version)**
   > 
   > Namely, say we now have three keys:
   > 
   > ```
   > // old version:
   > enum {
   >     key1,
   >     key2,
   >     key3,
   > }
   > ```
   > 
   > Now suppose we remove key2 from the enum since it's no longer used.
   > 
   > ```
   > // new version
   > enum {
   >     key1,
   >     key3,
   > }
   > ```
   > 
   > Then, when a new broker receives bytes from an old server, it will interpret the value of key2 as the value of key3.
   > 
   > So a better solution is using a string as the key and encoding the trailer as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`, which is exactly how we encode metadata in V2.
   > 
   > However, if we do it this way, it's equivalent to just moving the metadata section to the end of the data table, and it does not make much sense to bump up a version just for rearranging sections in the data table.
   > 
   > Let's take a step back to what we want to solve:
   > 
   > * we want to add serialization_cost to the data table, but serialization_cost is not available before serialization
   > * we want to keep backward compatibility
   > 
   > To add serialization_cost to the data table after serialization, we basically have two options:
   > 
   > * append it to the end of the bytes
   > * put a placeholder value of serialization_cost in during serialization, and once serialization is done, replace it with the actual value
   > 
   > So, here is another approach:
   > 
   > * don't add a trailer section
   > * put serialization_cost into metadata
   > * we serialize metadata; in V2 we encode it as `(int of key length, bytes of key in utf-8, int of value length, bytes of value in utf-8)`. Encoding it this way makes value replacement after serialization impossible, since `String.valueOf("1000").length() != String.valueOf("100000").length()`.
   > * In V3, keep all the existing logic. However, if the value is a long, we should encode it as `(int of key length, bytes of key in utf-8, toBigEndian(longValue))`. And in the function `serializeMetadata()`, we can have a variable to record the start offset of serialization_cost.
   > 
   > ```
   > byte[] bytes;
   > int serialization_cost_value_start_offset;
   > 
   > offset = 0;
   > for (String key: metadata.keySet()) {
   >       keybytes[] = to-utf8(key);
   >       bytes.append(keybytes.length())
   >       bytes.append(keybytes)
   > 
   >       offset += 4;
   >       offset += keybytes.length
   > 
   >       if (key.equals("serialization_cost")) {
   >             serialization_cost_value_start_offset = offset;
   >             valuebytes = toBigEndian(value);
   >             bytes.append(valuebytes)
   >             offset += 8;
   >       } else {
   >             valuebytes = to-utf8(value);
   >             bytes.append(valuebytes.length())
   >             bytes.append(valuebytes)
   >             offset += 4
   >             offset += valuebytes.length
   >       }
   > }
   > ```
   > 
   > So after serialization, we are able to replace the value of serialization_cost (`toBigEndian(longValue)` is always 8 bytes, which makes replacement possible):
   > 
   > ```
   > offset = metadataStartOffset+serialization_cost_value_start_offset
   > bytes[offset:offset+8] = toBigEndian(actualValue)
   > ```
   
   @mqliang @mcvsubbu I don't think we should worry about or even allow removal of enum values. It complicates the design, plus it's something that is typically not allowed.




[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805121867


   @mcvsubbu 
   > Any reason we are restricting the trailer (or footer) to have only key-value pairs? We don't need to place that restriction as long as the length is also encoded up front. It can be any serialized object, right?
   
   You are right, it can be any serialized object, but restricting it to only contain KV pairs has the following benefits:
   
   * Any object can be added as a KV pair, just: (key, serialized_object). So it's easy to add a new section to the footer in the future.
   * For all KV pairs in the footer, we put their keys in an enum, so when we serialize the footer, the order of KV pairs is deterministic. This makes all KV pairs positional/locatable, so we are able to replace the value of a given key in the footer even after serialization.
   * If we want to add a new object into the data table and we are OK with putting it as a KV pair into the footer, we don't need to bump up the version. Here is the pseudocode for serializing/deserializing the footer:
   ```
   enum footerkeys {
   	k0,
   	k1,
   	k2,
   }
   
   String[] footerkeysToStr = new String[]{
   	"k0",
   	"k1",
   	"k2",
   }
   
   function byte[] serializeFooter() {
   	byte[] bytes;
   	for (key in footerkeys) {
   	    String data = encode_to_str(value_of_key(key));
   	    bytes = append(bytes, len(data));
   	    bytes = append(bytes, data.toBytes());
   	}
   	return bytes;
   }
   
   function String[] deSerializeFooter(byte[] bytes) {
   	String[] values = new String[len(footerkeys)];
   	for (int i = 0; i < len(footerkeys); i++) {
   	    int data_len = bytes.nextInt();
   	    values[i] = bytes.nextBytesOfLen(data_len);
   	}
   	return values;
   }
   
   // If values[i] is a complex object instead of a string, we can deserialize it even further:
   String[] footerKVpairs = deSerializeFooter(bytes);
   Object_i = deserialize(footerKVpairs[i].toBytes());
   ```
   So, if we want to add a new object to the footer, we add it as a KV pair, and as long as we add the key as the last one in the enum, an old broker will just ignore the extra one (it's backward-compatible).
   
   If we make the footer contain not only KV pairs but also other arbitrary serializable objects:
   ```
   +------------------------------------+
   |     
   |    serializable object 1
   |
   +------------------------------------
   |
   |    serializable object 2
   |
   +------------------------------------
   |
   |    KV pairs
   |
   +------------------------------------
   
   ```
   It's not extensible: if we want to add a serializable_object_3 between serializable_object_2 and KV_pairs, we need to bump up the version (and if we bump the version, we can also add it to the middle of the data table, not necessarily in the footer).
   
   That's the reason I prefer that the footer only contain KV pairs: if we want to add a simple new section to the data table without bumping up the version, we add it as a KV pair to the footer. If we want to add a very complex new section or rearrange the current sections, we add it into the middle of the data table and bump up the version.
    




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806158108


   > @siddharthteotia , @mqliang and I met, and agreed on the following (I have added some extras, so take a look)
   > 
   > * We will move the metadata to the trailer, retain the other elements in the same order.
   > * We will encode the trailer as
   > * = (int, int, blob)+
   > * The first int is the enum ordinal, second int is the length of the blob, the third part is utf8 encoding of a string, or int/long as dictated by the enum. If int/long, then we will encode in network byte order (big-endian). Alternative is to convert it to a string.
   
   Not sure which option @siddharthteotia agrees with, but the alternatives are something like:
   `7, 8, "12609856"` (8 byte string for a number)
   vs
   `7, 4, 12609856`  (4-byte integer for a number)
   
   Maybe we can decide based on what looks easier in code.
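   
   In code, the two options for a single trailer entry look roughly like this (a sketch; `out` is assumed to be a `DataOutputStream`, which already writes big-endian, and `keyOrdinal`/`value` are assumed to be in scope):
   ```
   // Option 1: number rendered as a UTF-8 string ("12609856" -> 8 bytes).
   byte[] valueBytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
   out.writeInt(keyOrdinal);
   out.writeInt(valueBytes.length);
   out.write(valueBytes);
   
   // Option 2: number written in network byte order (always 4 bytes for an int).
   out.writeInt(keyOrdinal);
   out.writeInt(Integer.BYTES);
   out.writeInt(value);
   ```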




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603479529



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -46,8 +52,120 @@
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
 
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate id/name is not allowed.
+   *  - Don't change id/name of existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN(0, "unknown"),
+    TABLE_KEY(1, "table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY(2, "Exception"),
+    NUM_DOCS_SCANNED_METADATA_KEY(3, "numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY(4, "numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY(5, "numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED(6, "numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED(7, "numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED(8, "numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED(9, "numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS(10, "minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS_METADATA_KEY(11, "totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED_KEY(12, "numGroupsLimitReached"),
+    TIME_USED_MS_METADATA_KEY(13, "timeUsedMs"),
+    TRACE_INFO_METADATA_KEY(14, "traceInfo"),
+    REQUEST_ID_METADATA_KEY(15, "requestId"),
+    NUM_RESIZES_METADATA_KEY(16, "numResizes"),
+    RESIZE_TIME_MS_METADATA_KEY(17, "resizeTimeMs"),
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY(18, "executionThreadCpuTimeNs"),
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY(19, "responseSerializationCpuTimeNs"),
+    ;
+
+    private static final Map<Integer, MetadataKeys> _idToEnumKeyMap = new HashMap<>();
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueTrailerKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet.of(
+        MetadataKeys.NUM_SEGMENTS_QUERIED,
+        MetadataKeys.NUM_SEGMENTS_PROCESSED,
+        MetadataKeys.NUM_SEGMENTS_MATCHED,
+        MetadataKeys.NUM_RESIZES_METADATA_KEY,
+        MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+        MetadataKeys.NUM_RESIZES_METADATA_KEY
+    );
+    // _longValueTrailerKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueTrailerKeys = ImmutableSet.of(

Review comment:
       let us keep trailer/metadata consistent.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,9 +107,17 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    CURRENT_VERSION = VERSION_3;

Review comment:
       I think we should make this configurable until we migrate to V3 and remove the V2 code. The default should be V3 (so that if anyone upgrades the broker first, they are fine).






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599302999



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(_positionalData.length);
+    for (String entry : _positionalData) {
+      byte[] bytes = StringUtil.encodeUtf8(entry);

Review comment:
       yes, your understanding is correct.






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r600939560



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. Metadata is actually a part of _trailer in V3 when serialize DataTable
+   * into bytes. When deserialize, we extract metadata from _trailer into this _metadata map to provide the same
+   * interface with V2. There are many code use
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datable.getTailerData(key)/datable.setTailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the value of "responseSerializationCpuTimeNs" as actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all metadata into the trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * Metadata is actually part of _trailer in V3 when the DataTable is serialized into bytes. When
+     * deserializing, we extract metadata from _trailer into this _metadata map to provide the same
+     * interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)

Review comment:
       It might be cleaner if you add a data table v2-to-v3 converter instead of constructing v3 directly from the v2 buffer.
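       For illustration, a minimal converter sketch might look like the following (the V2 getters here are assumptions for the sketch, not the actual DataTableImplV2 API):
   ```java
   // Sketch only: convert a deserialized V2 data table into a V3 instance.
   public static DataTableImplV3 fromV2(DataTableImplV2 v2) {
     DataTableImplV3 v3 = new DataTableImplV3(v2.getNumberOfRows(), v2.getDataSchema(),
         v2.getDictionaryMap(), v2.getFixedSizeDataBytes(), v2.getVariableSizeDataBytes());
     for (Map.Entry<String, String> entry : v2.getMetadata().entrySet()) {
       String key = entry.getKey();
       if (key.startsWith("Exception")) {
         // V2 stores exceptions as "Exception"+errCode KV pairs in metadata.
         v3.getExceptions().put(Integer.parseInt(key.substring("Exception".length())), entry.getValue());
       } else {
         v3.getMetadata().put(key, entry.getValue());
       }
     }
     return v3;
   }
   ```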






[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806330983


   > By "Id" do you mean strings? Why is that any more advantages than adding an enum? In fact, enum is better since we can declare all enums in one place and add whatever comments there to not remove an enum
   
   @mcvsubbu @Jackie-Jiang I think Jackie means to associate a string with each enum ordinal, using the pattern from Effective Java:
   ```
   enum MyEnum {
       ENUM_1("A"),
       ENUM_2("B");
   
       private String name;
   
       private static final Map<String,MyEnum> ENUM_MAP;
   
       MyEnum (String name) {
           this.name = name;
       }
   
       public String getName() {
           return this.name;
       }
   
       // Build an immutable map of String name to enum pairs.
       // Any Map impl can be used.
   
       static {
           Map<String,MyEnum> map = new ConcurrentHashMap<String, MyEnum>();
           for (MyEnum instance : MyEnum.values()) {
               map.put(instance.getName().toLowerCase(),instance);
           }
           ENUM_MAP = Collections.unmodifiableMap(map);
       }
   
       public static MyEnum get (String name) {
           return ENUM_MAP.get(name.toLowerCase());
       }
   }
   
   ```
   
   This way, we are able to convert between enum ordinal and string. My current implementation uses two helper maps and two helper functions outside of the enum definition (`TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap` and `trailerKeyToMetadataKey()/metaDataKeyToTrailerKey()`) to do the conversion. Using the Effective Java pattern puts all the helper functions/data structures inside the enum definition.
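   With this pattern, usage becomes a direct lookup, e.g. `MyEnum key = MyEnum.get("a"); String name = key.getName();`, while `key.ordinal()` can still be written on the wire.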




[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806145868


   @Jackie-Jiang 
   > High level question: why do we need this new field? We should be able to use the metadata field for this
   
   We want to measure the CPU time spent serializing the data table (AKA `serialization_cost`) on each server, and send it back to the broker. Here is the dilemma: we only know the CPU time after serialization completes, but if serialization is already complete, how can we make `serialization_cost` part of the payload (it's a chicken-and-egg problem)?
   
   To add `serialization_cost` to the serialized bytes of the data table, we basically have two options (we don't want to serialize twice):
   * append it to the end of the bytes.
   * write a placeholder value for `serialization_cost` during serialization and, once serialization is done, replace it with the actual value.
   
   No matter which option we adopt, we need to bump up the version.
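   Concretely, option 2 condenses to the following (this is essentially what `toBytes()` in this PR does):
   ```java
   // Write a -1 placeholder so the trailer reserves 8 bytes for the value.
   _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
   ThreadTimer threadTimer = new ThreadTimer();
   threadTimer.start();
   byte[] bytes = toBytesInternal();  // records the placeholder's byte offset during serialization
   long costNs = threadTimer.stopAndGetThreadTimeNs();
   // Patch the measured cost over the reserved 8 bytes.
   System.arraycopy(Longs.toByteArray(costNs), 0, bytes, _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
   ```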




[GitHub] [incubator-pinot] codecov-io commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-804528996


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=h1) Report
   > Merging [#6710](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=desc) (c0b42f3) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.49%`.
   > The diff coverage is `62.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6710/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6710      +/-   ##
   ==========================================
   - Coverage   66.44%   65.95%   -0.50%     
   ==========================================
     Files        1075     1391     +316     
     Lines       54773    67555   +12782     
     Branches     8168     9788    +1620     
   ==========================================
   + Hits        36396    44554    +8158     
   - Misses      15700    19829    +4129     
   - Partials     2677     3172     +495     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.95% <62.64%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1281 more](https://codecov.io/gh/apache/incubator-pinot/pull/6710/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=footer). Last update [27b61fe...c0b42f3](https://codecov.io/gh/apache/incubator-pinot/pull/6710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-808329095


   @Jackie-Jiang Here is "trailer", a common term used when encoding/decoding network packets. "Footer" is used more for documents, but is acceptable.
   https://en.wikipedia.org/wiki/Trailer_(computing)#:~:text=In%20information%20technology%2C%20a%20trailer,simply%20mark%20the%20block's%20end.




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603052774



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. Metadata is actually part of _trailer in V3 when the DataTable
+   * is serialized into bytes. When deserializing, we extract metadata from _trailer into this _metadata map
+   * to provide the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata via
+   *  dataTable.getTrailerData(key)/dataTable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the placeholder value of "responseSerializationCpuTimeNs" with the actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all metadata into the trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * Metadata is actually part of _trailer in V3 when the DataTable is serialized into bytes. When
+     * deserializing, we extract metadata from _trailer into this _metadata map to provide the same
+     * interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as a bunch of KV pairs in metadata; all exceptions have keys of the form
+     * "Exception"+errCode. To interpret V2 bytes as a V3 object, extract exceptions from metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();
+  }
+
+  /**
+   * Serialize trailer section to bytes.
+   * The serialized bytes are laid out as:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3, ...]
+   * For each KV pair:
+   * - if the value is an int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
+   * - if the value is a string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
+   */
+  private byte[] serializeTrailer()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    int offset = 0;
+    dataOutputStream.writeInt(_trailer.size());
+    offset += Integer.BYTES;
+    for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
+      TrailerKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      offset += Integer.BYTES;
+      if (key == TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY) {
+        _responseSerializationCpuTimeNsValueOffset += offset;
+      }
+      if (IntValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Ints.toByteArray(Integer.parseInt(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else if (LongValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Longs.toByteArray(Long.parseLong(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+        offset += Integer.BYTES + valueBytes.length;
+      }
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<TrailerKeys, String> deserializeTrailer(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numEntries = dataInputStream.readInt();
+      Map<TrailerKeys, String> trailer = new TreeMap<>();
+      for (int i = 0; i < numEntries; i++) {
+        int ordinal = dataInputStream.readInt();
+        TrailerKeys key = TrailerKeys.values()[ordinal];

Review comment:
       Now, this will throw an exception for unknown keys. It's OK to throw here since we don't allow key removal. We can change it to ignore unknown keys if we ever allow removing keys.
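       If we ever do allow key removal, a tolerant lookup could map out-of-range ordinals to a sentinel instead of throwing (sketch only; assumes an UNKNOWN constant like the one proposed for MetadataKeys). Note that actually skipping an unknown key's value would additionally require the wire format to carry a length for int/long values, which the current encoding omits:
   ```java
   // Sketch only: tolerant ordinal lookup instead of TrailerKeys.values()[ordinal].
   private static TrailerKeys keyForOrdinal(int ordinal) {
     TrailerKeys[] keys = TrailerKeys.values();
     return (ordinal >= 0 && ordinal < keys.length) ? keys[ordinal] : TrailerKeys.UNKNOWN;
   }
   ```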






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606092101



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +131,267 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
+
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
     DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()
+      throws IOException {
+    DataSchema.ColumnDataType[] columnDataTypes = DataSchema.ColumnDataType.values();
+    int numColumns = columnDataTypes.length;
+    String[] columnNames = new String[numColumns];
+    for (int i = 0; i < numColumns; i++) {
+      columnNames[i] = columnDataTypes[i].name();
+    }
+
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+
+    // Verify V3 broker can deserialize data table (has data, but has no metadata) send by V2 server
+    DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
+    DataTableBuilder dataTableBuilderV2WithDataOnly = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV2WithDataOnly, columnDataTypes, numColumns, ints, longs, floats,
+        doubles, strings, bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTableV2 = dataTableBuilderV2WithDataOnly.build(); // create a V2 data table
+    DataTable newDataTable =
+        DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    Assert.assertEquals(newDataTable.getMetadata().size(), 0);
+
+    // Verify V3 broker can deserialize data table (has data and metadata) send by V2 server
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV2.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
+
+    // Verify V3 broker can deserialize data table (only has metadata) send by V2 server
+    DataTableBuilder dataTableBuilderV2WithMetadataDataOnly = new DataTableBuilder(dataSchema);
+    dataTableV2 = dataTableBuilderV2WithMetadataDataOnly.build(); // create a V2 data table
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV2.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), 0, 0);
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
+
+    // Verify V3 broker can deserialize (has data, but has no metadata) send by V3 server.
+    DataTableBuilder.setCurrentDataTableVersion(VERSION_3);
+    DataTableBuilder dataTableBuilderV3WithDataOnly = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV3WithDataOnly, columnDataTypes, numColumns, ints, longs, floats,
+        doubles, strings, bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    DataTable dataTableV3 = dataTableBuilderV3WithDataOnly.build(); // create a V3 data table
+    // Deserialize data table bytes as V3
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    // DataTable V3 serialization logic will add an extra THREAD_CPU_TIME_NS KV pair into metadata
+    Assert.assertEquals(newDataTable.getMetadata().size(), 1);
+    Assert.assertTrue(newDataTable.getMetadata().containsKey(THREAD_CPU_TIME_NS.getName()));
+
+    // Verify V3 broker can deserialize data table (has data and metadata) send by V3 server
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV3.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes()); // Broker deserialize data table bytes as V3
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+    newDataTable.getMetadata().remove(THREAD_CPU_TIME_NS.getName());
+    Assert.assertEquals(newDataTable.getMetadata(), EXPECTED_METADATA);
 
+    // Verify V3 broker can deserialize data table (only has metadata) send by V3 server
+    DataTableBuilder dataTableBuilderV3WithMetadataDataOnly = new DataTableBuilder(dataSchema);
+    dataTableV3 = dataTableBuilderV3WithMetadataDataOnly.build(); // create a V2 data table
+    for (String key : EXPECTED_METADATA.keySet()) {
+      dataTableV3.getMetadata().put(key, EXPECTED_METADATA.get(key));
+    }
+    newDataTable = DataTableFactory.getDataTable(dataTableV3.toBytes()); // Broker deserialize data table bytes as V2
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);

Review comment:
       @mqliang, can you please fix the typo? This should be V3.






[GitHub] [incubator-pinot] mqliang edited a comment on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809076119


   @Jackie-Jiang @siddharthteotia @mcvsubbu Comments addressed and the PR is ready for review now. I split the change into 5 commits:
   * 1st commit:
      * Rename TrailerKeys to MetadataKeys
      * Associate an ID/name with each MetadataKeys entry
      * Convert V2 to V3 instead of constructing V3 directly from V2 bytes
      * ASCII layout of the V3 data table
      * Address a TODO in DataTableBuilder: store bytes data in the variable size data section instead of as String
     
   * 2nd commit: Address a TODO in DataTableBuilder: fix the float size issue in DataTableBuilder
   * 3rd commit: Address a TODO in DataTableBuilder: use one Map to map a String to an Integer for all columns in V3.
   * 4th commit: Fix a bug in BrokerReduceService, which was breaking the integration test.
   * 5th commit: Log `responseSerializationCpuTimeNs` in QueryScheduler and emit a broker gauge; put "executionThreadCpuTimeNs" and "responseSerializationCpuTimeNs" into metadata so that they can be sent to the broker
   
   
   There is still one more TODO in DataTableBuilder: given a data schema, write all values one by one instead of positioning by rowId and colId (to save time). This will not change the serialized byte layout of the data table; it's purely an implementation optimization, which means it does not require a version bump and can be done in a separate PR. I created https://github.com/apache/incubator-pinot/issues/6720 to track this. A preliminary benchmark shows the optimization is quite speculative -- there is no improvement from writing all values one by one without positioning by rowId and colId; for more details, see the benchmark results at https://github.com/apache/incubator-pinot/issues/6720
   
   There is one more thing that needs to be done: change `DataTable.getMetadata()` to return a `Map<MetadataKeys, String>` instead of a `Map<String, String>`. This PR is already quite large, so I want to address it in a separate PR.
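   For reference, that follow-up would amount to roughly this change on the `DataTable` interface (sketch only, not part of this PR):
   ```java
   public interface DataTable {
     // ... existing data accessors unchanged ...
   
     // Currently returns Map<String, String>; the follow-up would switch to type-safe keys:
     Map<MetadataKeys, String> getMetadata();
   }
   ```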




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604388515



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()
+      throws IOException {
+    DataSchema.ColumnDataType[] columnDataTypes = DataSchema.ColumnDataType.values();
+    int numColumns = columnDataTypes.length;
+    String[] columnNames = new String[numColumns];
+    for (int i = 0; i < numColumns; i++) {
+      columnNames[i] = columnDataTypes[i].name();
+    }
 
     int[] ints = new int[NUM_ROWS];
     long[] longs = new long[NUM_ROWS];
     float[] floats = new float[NUM_ROWS];
     double[] doubles = new double[NUM_ROWS];
     String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
     Object[] objects = new Object[NUM_ROWS];
     int[][] intArrays = new int[NUM_ROWS][];
     long[][] longArrays = new long[NUM_ROWS][];
     float[][] floatArrays = new float[NUM_ROWS][];
     double[][] doubleArrays = new double[NUM_ROWS][];
     String[][] stringArrays = new String[NUM_ROWS][];
 
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    DataTableBuilder.setCurrentDataTableVersion(DataTableBuilder.VERSION_2);
+    DataTableBuilder dataTableBuilderV2 = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilderV2, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    // Verify V3 broker can deserialize data table send by V2 server
+    DataTable dataTableV2 = dataTableBuilderV2.build(); // create a V2 data table
+    // Deserialize data table bytes as V3
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTableV2.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);

Review comment:
       Not sure I follow this test
   
   - the server is constructing a v2 data table and serializing it
   - the broker will use DataTableFactory to get the data table. How can the broker get it as v3 when the version # will indicate 2 and DataTableFactory will accordingly create a DataTableImplV2?
   
   






[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605165539



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java
##########
@@ -96,22 +99,130 @@ public void testAllDataTypes()
     for (int i = 0; i < numColumns; i++) {
       columnNames[i] = columnDataTypes[i].name();
     }
-    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
 
+    int[] ints = new int[NUM_ROWS];
+    long[] longs = new long[NUM_ROWS];
+    float[] floats = new float[NUM_ROWS];
+    double[] doubles = new double[NUM_ROWS];
+    String[] strings = new String[NUM_ROWS];
+    byte[][] bytes = new byte[NUM_ROWS][];
+    Object[] objects = new Object[NUM_ROWS];
+    int[][] intArrays = new int[NUM_ROWS][];
+    long[][] longArrays = new long[NUM_ROWS][];
+    float[][] floatArrays = new float[NUM_ROWS][];
+    double[][] doubleArrays = new double[NUM_ROWS][];
+    String[][] stringArrays = new String[NUM_ROWS][];
+
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
     DataTableBuilder dataTableBuilder = new DataTableBuilder(dataSchema);
+    fillDataTableWithRandomData(dataTableBuilder, columnDataTypes, numColumns, ints, longs, floats, doubles, strings,
+        bytes, objects, intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+
+    DataTable dataTable = dataTableBuilder.build();
+    DataTable newDataTable = DataTableFactory.getDataTable(dataTable.toBytes());
+    Assert.assertEquals(newDataTable.getDataSchema(), dataSchema, ERROR_MESSAGE);
+    Assert.assertEquals(newDataTable.getNumberOfRows(), NUM_ROWS, ERROR_MESSAGE);
+    verifyDataIsSame(newDataTable, columnDataTypes, numColumns, ints, longs, floats, doubles, strings, bytes, objects,
+        intArrays, longArrays, floatArrays, doubleArrays, stringArrays);
+  }
+
+  @Test
+  public void testV2V3Compatibility()

Review comment:
       @mqliang the idea here is to make sure that the receiver handles things as gracefully as possible even if the sender does something weird (say, someone introduces a bug, or somehow the next rev of the protocol does something funky). See the Robustness principle: https://en.wikipedia.org/wiki/Robustness_principle






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603710319



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,88 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't add new key which has same id/name with existing keys. Duplicate name is not allowed.
+   *  - Don't change name of existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),

Review comment:
       No longer needed based on the comment https://github.com/apache/incubator-pinot/pull/6710/files#r603639040






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-806330983


   > By "Id" do you mean strings? Why is that any more advantages than adding an enum? In fact, enum is better since we can declare all enums in one place and add whatever comments there to not remove an enum
   
   @mcvsubbu @Jackie-Jiang I think Jackie means to associate a string with each enum ordinal, using the pattern from Effective Java:
   ```
   import java.util.Collections;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   enum MyEnum {
       ENUM_1("A"),
       ENUM_2("B");

       private final String name;

       // Immutable map from lower-cased name to enum constant, built once.
       private static final Map<String, MyEnum> ENUM_MAP;

       MyEnum(String name) {
           this.name = name;
       }

       public String getName() {
           return this.name;
       }

       static {
           Map<String, MyEnum> map = new ConcurrentHashMap<>();
           for (MyEnum instance : MyEnum.values()) {
               map.put(instance.getName().toLowerCase(), instance);
           }
           ENUM_MAP = Collections.unmodifiableMap(map);
       }

       public static MyEnum get(String name) {
           return ENUM_MAP.get(name.toLowerCase());
       }
   }
   ```
   
   This way, we are able to convert between the enum ordinal and its string name. My current implementation uses two helper maps and two helper functions outside of the enum definition (`TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap` and `trailerKeyToMetadataKey()/metaDataKeyToTrailerKey()`) to do the conversion. Using the Effective Java pattern puts all the helper functions/data structures inside the enum definition.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604528494



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       Oh, the code block was placed here by IntelliJ's reformatter. I'd suggest keeping it as is: if someone later changes this file and runs IntelliJ reformatting before committing, the block will be moved back here anyway.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       Oh, the code block was placed here by IntelliJ's reformatter. I'd suggest keeping it as is: if someone later changes this file and runs IntelliJ reformatting before committing, the block will be moved back here anyway.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604372863



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";

Review comment:
       Note that by default the protocol version is the latest (3). The config can be used to downgrade the protocol to 2, without having to roll back the server deployment, in case there are any issues.
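
    For example, a single config change would pin the protocol back to V2 (property name taken from the diff above; treating "2" as the accepted value is an assumption about how the config is parsed):
    ```
    # Hypothetical rollback knob: keep serving with the V2 DataTable protocol
    pinot.server.instance.currentDataTableVersion=2
    ```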




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia merged pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia merged pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604551363



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805246552


   @siddharthteotia, @mqliang and I met, and agreed on the following (I have added some extras, so take a look):
   - We will move the metadata to the trailer, retaining the other elements in the same order.
   - We will encode the trailer as <totalTrailerLen><trailerPart>
   - <trailerPart> = (int, int, blob)+
   - The first int is the enum ordinal, the second int is the length of the blob, and the third part is the UTF-8 encoding of a string, or an int/long as dictated by the enum. If int/long, we will encode it in network byte order (big-endian); the alternative is to convert it to a string. (A minimal sketch of this framing follows.)
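
   A sketch of that framing under the stated assumptions: values are carried as UTF-8 strings (DataOutputStream already writes ints in network byte order), and any enum type stands in for the eventual trailer-key enum. This is illustrative, not the merged implementation:
   ```
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.util.Map;

   public class TrailerEncoder {
     // <totalTrailerLen><trailerPart>, where <trailerPart> = (int, int, blob)+
     static byte[] serializeTrailer(Map<? extends Enum<?>, String> trailer)
         throws IOException {
       ByteArrayOutputStream body = new ByteArrayOutputStream();
       DataOutputStream bodyOut = new DataOutputStream(body);
       for (Map.Entry<? extends Enum<?>, String> entry : trailer.entrySet()) {
         byte[] blob = entry.getValue().getBytes(StandardCharsets.UTF_8);
         bodyOut.writeInt(entry.getKey().ordinal()); // first int: enum ordinal
         bodyOut.writeInt(blob.length);              // second int: blob length
         bodyOut.write(blob);                        // blob bytes
       }
       ByteArrayOutputStream framed = new ByteArrayOutputStream();
       DataOutputStream framedOut = new DataOutputStream(framed);
       framedOut.writeInt(body.size()); // <totalTrailerLen>
       body.writeTo(framed);            // <trailerPart>
       return framed.toByteArray();
     }
   }
   ```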


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603744773



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
##########
@@ -161,13 +163,15 @@ public void stop() {
           queryRequest.getBrokerId(), e);
       // For not handled exceptions
       serverMetrics.addMeteredGlobalValue(ServerMeter.UNCAUGHT_EXCEPTIONS, 1);
-      dataTable = new DataTableImplV2();
+      dataTable = new DataTableImplV3();
       dataTable.addException(QueryException.getException(QueryException.INTERNAL_ERROR, e));
     }
     long requestId = queryRequest.getRequestId();
     Map<String, String> dataTableMetadata = dataTable.getMetadata();
     dataTableMetadata.put(DataTable.REQUEST_ID_METADATA_KEY, Long.toString(requestId));
 
+    byte[] responseBytes = serializeDataTable(queryRequest, dataTable);
+

Review comment:
       Discussed offline. For now we agreed to emit, log and send a single metric. Will add a TODO to see how the metrics can be separated out on the server for emission and logging. The protocol won't change since we will always send the aggregated metric.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604379681



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       This seems incorrect: if the protocol config is set to V2, we should not be constructing a V3 data table.
   
    I think all these places are constructing an empty data table on the server, right?
    
    I think we should replace these with DataTableUtils.buildEmptyDataTable() to properly build an empty data table. Secondly, DataTableUtils internally uses DataTableBuilder, which is aware of the version, so it will build an empty table based on V2 or V3.
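
    A hedged sketch of the suggested replacement (the exact buildEmptyDataTable signature is an assumption based on this comment, not the merged code):
    ```
    // Version-agnostic construction: DataTableUtils delegates to the
    // version-aware DataTableBuilder, so V2 vs. V3 is decided by config.
    // The queryContext argument is a placeholder for whatever is in scope.
    DataTable dataTable = DataTableUtils.buildEmptyDataTable(queryContext);
    dataTable.addException(QueryException.getException(QueryException.INTERNAL_ERROR, e));
    ```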




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603021339



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which has value of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which has value of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY
+  );
+
+  // MetadataKeys contains all trailer keys which is also metadata key.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(

Review comment:
       This is not needed anymore, after renaming Trailer to Metadata.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -45,9 +51,140 @@
   String NUM_RESIZES_METADATA_KEY = "numResizes";
   String RESIZE_TIME_MS_METADATA_KEY = "resizeTimeMs";
   String EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY = "executionThreadCpuTimeNs";
+  String RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY = "responseSerializationCpuTimeNs";
+
+  /* The TrailerKeys is used in V3, where we put all metadata as part of trailer and use enum keys as metadata keys.
+   * Currently all trailer keys are metadata keys, but in future we may add trailer key which is not a metadata key.
+   *
+   * NOTE:
+   * if you add a new key in TrailerKeys enum
+   *  - you need add it's corresponding string to TrailerKeyToMetadataKeyMap/MetadataKeyToTrailerKeyMap also.
+   *  - if it happen to be a metadata key, add it into MetadataKeys also.
+   *  - if it has a long/int type value, add it into LongValueTrailerKeys/LongValueTrailerKeys also.
+   *
+   * ATTENTION:
+   *  - Always add new key to the end of enum.
+   *  - Don't remove existing keys.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum TrailerKeys {
+    TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION_METADATA_KEY,
+    NUM_DOCS_SCANNED_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+    NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+    NUM_SEGMENTS_QUERIED,
+    NUM_SEGMENTS_PROCESSED,
+    NUM_SEGMENTS_MATCHED,
+    NUM_CONSUMING_SEGMENTS_PROCESSED,
+    MIN_CONSUMING_FRESHNESS_TIME_MS,
+    TOTAL_DOCS_METADATA_KEY,
+    NUM_GROUPS_LIMIT_REACHED_KEY,
+    TIME_USED_MS_METADATA_KEY,
+    TRACE_INFO_METADATA_KEY,
+    REQUEST_ID_METADATA_KEY,
+    NUM_RESIZES_METADATA_KEY,
+    RESIZE_TIME_MS_METADATA_KEY,
+    EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+    RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
+  }
+
+  // LongValueTrailerKeys contains all trailer keys which has value of long type.
+  Set<TrailerKeys> LongValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // IntValueTrailerKeys contains all trailer keys which has value of int type.
+  Set<TrailerKeys> IntValueTrailerKeys = ImmutableSet.of(
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY
+  );
+
+  // MetadataKeys contains all trailer keys which is also metadata key.
+  Set<TrailerKeys> MetadataKeys = ImmutableSet.of(
+      TrailerKeys.TABLE_KEY, // NOTE: this key is only used in PrioritySchedulerTest
+      TrailerKeys.EXCEPTION_METADATA_KEY,
+      TrailerKeys.NUM_DOCS_SCANNED_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_IN_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_ENTRIES_SCANNED_POST_FILTER_METADATA_KEY,
+      TrailerKeys.NUM_SEGMENTS_QUERIED,
+      TrailerKeys.NUM_SEGMENTS_PROCESSED,
+      TrailerKeys.NUM_SEGMENTS_MATCHED,
+      TrailerKeys.NUM_CONSUMING_SEGMENTS_PROCESSED,
+      TrailerKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+      TrailerKeys.TOTAL_DOCS_METADATA_KEY,
+      TrailerKeys.NUM_GROUPS_LIMIT_REACHED_KEY,
+      TrailerKeys.TIME_USED_MS_METADATA_KEY,
+      TrailerKeys.TRACE_INFO_METADATA_KEY,
+      TrailerKeys.REQUEST_ID_METADATA_KEY,
+      TrailerKeys.NUM_RESIZES_METADATA_KEY,
+      TrailerKeys.RESIZE_TIME_MS_METADATA_KEY,
+      TrailerKeys.EXECUTION_THREAD_CPU_TIME_NS_METADATA_KEY,
+      TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY
+  );
+
+  // TrailerKeyToMetadataKeyMap is used to convert enum key to metadata key(string).
+  Map<TrailerKeys, String> TrailerKeyToMetadataKeyMap = ImmutableMap.<TrailerKeys, String>builder()

Review comment:
       This is not needed anymore, after renaming Trailer to Metadata.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603691048



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  private final Map<MetadataKeys, String> _metadata;
+  // _metadataV2 is just a V2 presentation of _metadata, we copy KV pairs between _metadata and _metadataV2 during
+  // serialization/deserialization. This is because V2 API of getMetadata returns a Map<String, String> and there are
+  // a lot of existing code using string as key to access metadata.
+  // TODO: remove this and change all metadata accessing code use MetadataKeys.
+  private final Map<String, String> _metadataV2;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _metadataV2 = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+
+    _metadataV2 = new HashMap<>();
+    for (MetadataKeys key : _metadata.keySet()) {
+      _metadataV2.put(key.getName(), _metadata.get(key));
+    }
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "executionThreadCpuTimeNs" to account data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    long executionThreadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(EXECUTION_THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(EXECUTION_THREAD_CPU_TIME_NS.getName(), String.valueOf(executionThreadCpuTimeNs));
+    // Copy all KV pair in _metadataV2 into _metadata
+    for (String key : _metadataV2.keySet()) {
+      Optional<MetadataKeys> opt = MetadataKeys.getByName(key);
+      if (!opt.isPresent()) {
+        continue;
+      }
+      _metadata.put(opt.get(), _metadataV2.get(key));
+    }
+    // Write metadata length and bytes.

Review comment:
       See the comment above. The logic of lines 323 to 330 can be moved into serializeMetadata() itself, and the copy won't be needed.
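
    A sketch of that restructuring, under the same assumptions as the diff (the _metadataV2 field and the Optional-returning MetadataKeys.getByName mirror the code above; the entry-count prefix is illustrative):
    ```
    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.EnumMap;
    import java.util.Map;

    // Resolve the V2 string keys inside serializeMetadata() so that toBytes()
    // no longer needs a separate copy loop before writing the metadata section.
    private byte[] serializeMetadata()
        throws IOException {
      Map<MetadataKeys, String> resolved = new EnumMap<>(MetadataKeys.class);
      for (Map.Entry<String, String> entry : _metadataV2.entrySet()) {
        MetadataKeys.getByName(entry.getKey())
            .ifPresent(key -> resolved.put(key, entry.getValue()));
      }
      ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
      DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
      dataOutputStream.writeInt(resolved.size());
      for (Map.Entry<MetadataKeys, String> entry : resolved.entrySet()) {
        byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
        dataOutputStream.writeInt(entry.getKey().ordinal());
        dataOutputStream.writeInt(value.length);
        dataOutputStream.write(value);
      }
      return byteArrayOutputStream.toByteArray();
    }
    ```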




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604464226



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       (nit)
   ```suggestion
       THREAD_CPU_TIME_NS("threadCpuTimeNs");
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       I still suggest associating an id with each key instead of using the ordinal of the enum. The convention would then be to always increase the id when adding new keys.
    An id is more flexible than an ordinal for the following reasons:
    - Ordinal effectively uses the key's position as its id. If by any chance people accidentally change the order of the keys, it will break
    - With an id, we can remove keys in a backward-compatible way over two releases if necessary. With ordinal, we have to keep a placeholder so that the ordinals of the other keys don't change
    
    @mqliang @siddharthteotia @mcvsubbu Thoughts? (A sketch of the id-based variant follows.)
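
    A minimal sketch of the id-based variant (ids and names below are illustrative, not the ones in the PR):
    ```
    import java.util.HashMap;
    import java.util.Map;

    enum MetadataKey {
      // Ids are explicit and only ever increase; reordering the declarations
      // (or retiring a key) no longer changes what goes on the wire.
      NUM_DOCS_SCANNED(3, "numDocsScanned"),
      TOTAL_DOCS(11, "totalDocs"),
      THREAD_CPU_TIME_NS(18, "threadCpuTimeNs");

      private static final Map<Integer, MetadataKey> ID_TO_KEY = new HashMap<>();

      static {
        for (MetadataKey key : values()) {
          ID_TO_KEY.put(key._id, key);
        }
      }

      private final int _id;
      private final String _name;

      MetadataKey(int id, String name) {
        _id = id;
        _name = name;
      }

      public int getId() {
        return _id;
      }

      public String getName() {
        return _name;
      }

      // Returns null for an unknown id so the receiver can skip keys added by
      // a newer sender.
      public static MetadataKey getById(int id) {
        return ID_TO_KEY.get(id);
      }
    }
    ```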

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       You don't need `Optional` here; either:
    - Throw an exception for an invalid id (suggest this way)
    - Return `null` for an invalid id

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       ```suggestion
     enum MetadataKey {
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));

Review comment:
       `getOrDefault(name, null)` is the same as `get(name)`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  private static int _version = VERSION_3;

Review comment:
       This should not be hardcoded; it should come from the config.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      _name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal < 0 || ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {

Review comment:
       Same here

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       Rename to `BaseDataTable`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       Suggest making 2 builders, one for v2 and one for v3. You can extract the common logic into a base class, or just duplicate the code, because we will deprecate v2 in the next release once v3 is well tested.
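       A minimal sketch of the suggested split (class and method names here are hypothetical): shared bookkeeping lives in a base class, and each wire version gets a thin builder on top.

       ```java
       abstract class BaseDataTableBuilder {
         protected final DataSchema _dataSchema;

         protected BaseDataTableBuilder(DataSchema dataSchema) {
           _dataSchema = dataSchema;
         }

         // Version-specific serialization decisions (e.g. float width) would hang off this.
         abstract int version();
       }

       class DataTableBuilderV2 extends BaseDataTableBuilder {
         DataTableBuilderV2(DataSchema dataSchema) {
           super(dataSchema);
         }

         @Override
         int version() {
           return 2;
         }
       }

       class DataTableBuilderV3 extends BaseDataTableBuilder {
         DataTableBuilderV3(DataSchema dataSchema) {
           super(dataSchema);
         }

         @Override
         int version() {
           return 3;
         }
       }
       ```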

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -96,6 +99,17 @@ public DataTableBuilder(DataSchema dataSchema) {
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);

Review comment:
       This won't be correct, because we want to fix the float value size (it should be 4 bytes, but v2 uses 8).
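       A sketch of the version-aware sizing this implies (the helper name and shape are assumptions; the real fix would live in the offset computation):

       ```java
       // Fixed-slot width per column type, by data table version (illustrative only).
       static int fixedSlotBytes(DataSchema.ColumnDataType type, int version) {
         switch (type) {
           case INT:
             return Integer.BYTES;
           case LONG:
             return Long.BYTES;
           case FLOAT:
             // V2 historically reserved 8 bytes per float; the version shipping the fix would use 4.
             return version >= 3 ? Float.BYTES : Double.BYTES;
           case DOUBLE:
             return Double.BYTES;
           default:
             // Assumed: variable-size types keep a fixed-size reference into the variable-size region.
             return Integer.BYTES * 2;
         }
       }
       ```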

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where metadata is presented as a Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      _name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal < 0 || ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       Put this block following the map definition for better readability
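       For illustration, the suggested ordering (the initializer body is assumed here, since the diff is truncated at the `static` block):

       ```java
       private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();

       static {
         for (MetadataKeys key : MetadataKeys.values()) {
           _nameToEnumKeyMap.put(key.getName(), key);
         }
       }
       ```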






[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r606099691



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/BaseDataTable.java
##########
@@ -0,0 +1,283 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class BaseDataTable implements DataTable {
+  protected int _numRows;
+  protected int _numColumns;
+  protected DataSchema _dataSchema;
+  protected int[] _columnOffsets;
+  protected int _rowSizeInBytes;
+  protected Map<String, Map<Integer, String>> _dictionaryMap;
+  protected byte[] _fixedSizeDataBytes;
+  protected ByteBuffer _fixedSizeData;
+  protected byte[] _variableSizeDataBytes;
+  protected ByteBuffer _variableSizeData;
+  protected Map<String, String> _metadata;
+
+  public BaseDataTable(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public BaseDataTable() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _metadata = new HashMap<>();
+  }
+
+  /**
+   * Helper method to serialize dictionary map.
+   */
+  protected byte[] serializeDictionaryMap(Map<String, Map<Integer, String>> dictionaryMap)
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+
+    dataOutputStream.writeInt(dictionaryMap.size());
+    for (Map.Entry<String, Map<Integer, String>> dictionaryMapEntry : dictionaryMap.entrySet()) {
+      String columnName = dictionaryMapEntry.getKey();
+      Map<Integer, String> dictionary = dictionaryMapEntry.getValue();
+      byte[] bytes = StringUtil.encodeUtf8(columnName);
+      dataOutputStream.writeInt(bytes.length);
+      dataOutputStream.write(bytes);
+      dataOutputStream.writeInt(dictionary.size());
+
+      for (Map.Entry<Integer, String> dictionaryEntry : dictionary.entrySet()) {
+        dataOutputStream.writeInt(dictionaryEntry.getKey());
+        byte[] valueBytes = StringUtil.encodeUtf8(dictionaryEntry.getValue());
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+      }
+    }
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Helper method to deserialize dictionary map.
+   */
+  protected Map<String, Map<Integer, String>> deserializeDictionaryMap(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numDictionaries = dataInputStream.readInt();
+      Map<String, Map<Integer, String>> dictionaryMap = new HashMap<>(numDictionaries);
+
+      for (int i = 0; i < numDictionaries; i++) {
+        String column = decodeString(dataInputStream);
+        int dictionarySize = dataInputStream.readInt();
+        Map<Integer, String> dictionary = new HashMap<>(dictionarySize);
+        for (int j = 0; j < dictionarySize; j++) {
+          int key = dataInputStream.readInt();
+          String value = decodeString(dataInputStream);
+          dictionary.put(key, value);
+        }
+        dictionaryMap.put(column, dictionary);
+      }
+
+      return dictionaryMap;
+    }
+  }
+
+  public Map<String, String> getMetadata() {

Review comment:
       fixed in https://github.com/apache/incubator-pinot/pull/6738






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603654939



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       This is the only difference between the V2 and V3 protocols, right?
   
    ```
   // VERSION
     // NUM_ROWS
     // NUM_COLUMNS
     // DICTIONARY_MAP (START|SIZE)
     // METADATA (START|SIZE) -> in V3, this moves to trailer/footer/end
     // DATA_SCHEMA (START|SIZE)
     // FIXED_SIZE_DATA (START|SIZE)
     // VARIABLE_SIZE_DATA (START|SIZE)
   ```
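    For comparison, the V3 header order in this PR (as listed in the DataTableImplV3 class comment further down) is:

    ```
    // VERSION
    // NUM_ROWS
    // NUM_COLUMNS
    // EXCEPTIONS (START|SIZE)
    // DICTIONARY_MAP (START|SIZE)
    // DATA_SCHEMA (START|SIZE)
    // FIXED_SIZE_DATA (START|SIZE)
    // VARIABLE_SIZE_DATA (START|SIZE)
    // TRAILER (START|SIZE)  -> metadata now lives in the trailer at the end
    ```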






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599226381



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -61,12 +65,15 @@
   private final byte[] _variableSizeDataBytes;
   private final ByteBuffer _variableSizeData;
   private final Map<String, String> _metadata;
+  // Only V3 has _positionalData
+  private final String[] _positionalData;
 
   /**
    * Construct data table with results. (Server side)
    */
-  public DataTableImplV2(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
-      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+  public DataTableImplV2V3(int version, int numRows, DataSchema dataSchema,

Review comment:
       Are we passing the version number to the constructor so that we can do backward-compatibility tests between V2 and V3? Other than tests, I don't see why the server should decide a version. It should always write the data table with CURRENT_VERSION.






[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605166183



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* MetadataKeys is used in V3, where metadata is presented as a Map<MetadataKeys, String>.
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       @Jackie-Jiang, I prefer enums. We can add a unit test that asserts (A < B < C ...) to catch any re-orders.
   If we have to manually insert a value, then duplicate values become possible (by mistake), and that can also cause problems.
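       A minimal sketch of such a test (TestNG-style; the class and method names are hypothetical). Pinning each key to its expected ordinal makes any re-order or mid-list insertion fail the build:

       ```java
       import org.testng.Assert;
       import org.testng.annotations.Test;

       public class MetadataKeysOrdinalTest {

         @Test
         public void testOrdinalsAreStable() {
           // One assertion per key; append a new line whenever a key is added at the end.
           Assert.assertEquals(DataTable.MetadataKeys.UNKNOWN.ordinal(), 0);
           Assert.assertEquals(DataTable.MetadataKeys.TABLE.ordinal(), 1);
           Assert.assertEquals(DataTable.MetadataKeys.EXCEPTION.ordinal(), 2);
           Assert.assertEquals(DataTable.MetadataKeys.NUM_DOCS_SCANNED.ordinal(), 3);
           // ... and so on for the remaining keys ...
         }
       }
       ```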






[GitHub] [incubator-pinot] mqliang commented on pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-809594670


   @mcvsubbu @siddharthteotia and I met offline; we want to keep this PR focused on bumping up to v3, moving metadata to the end of the data table, and using the enum ordinal as the key when serializing. We will also make it configurable to send V2/V3 data on the server side (instance config).
   
   @Jackie-Jiang In terms of addressing the TODOs in DataTableBuilder (fixing the float data length, using one String->Int map for the whole table instead of one per column), we will address them separately (bumping up to V4).




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r601850279



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {

Review comment:
       Can you please add a picture (ascii) of the layout as comments here? It will help clarify a lot of things. Thanks.
   
    You can use http://www.luismg.com/protocol/ or, if that is cumbersome, then just generate one manually, but let us identify the items in the order in which they appear.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. On deserialization, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();

Review comment:
       Instead of starting the ThreadTimer here, you can start and stop it inside the toBytesInternal() method. We don't need the timer to cover serializing the trailer itself. Instead, just save the value in a local variable and add it to the trailer. That way, you can avoid having the variable `_responseSerializationCpuTimeNsValueOffset`.
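       A minimal sketch of the suggested shape (abridged; section writing and the trailer-length bookkeeping in the header are elided):

       ```java
       private byte[] toBytesInternal()
           throws IOException {
         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
         DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
         ThreadTimer threadTimer = new ThreadTimer();
         threadTimer.start();
         // ... write the header and every section except the trailer ...
         // Stop the timer, keep the value in a local variable, and put it into the
         // trailer map before the trailer itself is serialized -- no byte offset
         // needs to be patched afterwards.
         long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
         _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
             String.valueOf(responseSerializationCpuTimeNs));
         dataOutputStream.write(serializeTrailer());
         return byteArrayOutputStream.toByteArray();
       }
       ```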

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. In V3, metadata is actually part of _trailer when the DataTable is
+   * serialized into bytes. On deserialization, we extract metadata from _trailer into this _metadata map to provide
+   * the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the value of "responseSerializationCpuTimeNs" as actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from trailer.
+     * In V3, metadata is actually part of _trailer when the DataTable is serialized into bytes. On deserialization,
+     * we extract metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFormTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as a bunch of KV pairs in metadata; all exceptions have a key of "Exception"+errCode.
+     * To interpret V2 bytes as a V3 object, we extract exceptions from the metadata.
+     */
+    _exceptions = extractExceptionsFormV2Metadata();

Review comment:
       ```suggestion
       _exceptions = extractExceptionsFromV2Metadata();
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. Metadata is actually a part of _trailer in V3 when serialize DataTable
+   * into bytes. When deserialize, we extract metadata from _trailer into this _metadata map to provide the same
+   * interface with V2. There are many code use
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata by
+   *  datable.getTailerData(key)/datable.setTailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+    byte[] bytes = toBytesInternal();
+    _responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // Replace the value of "responseSerializationCpuTimeNs" as actual value
+    System.arraycopy(Longs.toByteArray(_responseSerializationCpuTimeNs), 0, bytes,
+        _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
+    return bytes;
+  }
+
+  private byte[] toBytesInternal()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes;
+    exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+      dataOffset += _variableSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write trailer data (START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    // Put all meta data into trailer.
+    _trailer = putAllMetaDataIntoTrailer();
+    _responseSerializationCpuTimeNsValueOffset = dataOffset;
+    byte[] trailerBytes = serializeTrailer();
+    dataOutputStream.writeInt(trailerBytes.length);
+
+    // Write actual data.
+    dataOutputStream.write(exceptionsBytes);
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+    dataOutputStream.write(trailerBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+    int trailerStart = byteBuffer.getInt();
+    int trailerLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read trailer.
+    byte[] trailerBytes = new byte[trailerLength];
+    byteBuffer.position(trailerStart);
+    byteBuffer.get(trailerBytes);
+    _trailer = deserializeTrailer(trailerBytes);
+
+    /**
+     * Extract metadata from the trailer.
+     * Metadata is actually part of _trailer in V3 when serializing the DataTable into bytes. When deserializing,
+     * we extract the metadata from _trailer into this _metadata map to provide the same interface as V2.
+     */
+    _metadata = extractMetadataFromTrailer();
+  }
+
+  /**
+   * Construct data table from V2 byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer, boolean isV2)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int metadataStart = byteBuffer.getInt();
+    int metadataLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read metadata.
+    byte[] metadataBytes = new byte[metadataLength];
+    byteBuffer.position(metadataStart);
+    byteBuffer.get(metadataBytes);
+    _metadata = deserializeV2Metadata(metadataBytes);
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    _trailer = null;
+    /**
+     * V2 stores exceptions as KV pairs in metadata; each exception has a key of the form "Exception" + errCode.
+     * To interpret V2 bytes as a V3 object, extract the exceptions from the metadata.
+     */
+    _exceptions = extractExceptionsFromV2Metadata();
+  }
+
+  /**
+   * Serialize trailer section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV1, bytesOfKV2, bytesOfKV3]
+   * For each KV pair:
+   * - if value is int/long, encode it as: [keyOrdinal, bigEndianRepresentationOfValue]
+   * - if value is string, encode it as: [keyOrdinal, valueLength, Utf8EncodedValue]
+   */
+  private byte[] serializeTrailer()
+      throws IOException {
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    int offset = 0;
+    dataOutputStream.writeInt(_trailer.size());
+    offset += Integer.BYTES;
+    for (Map.Entry<TrailerKeys, String> entry : _trailer.entrySet()) {
+      TrailerKeys key = entry.getKey();
+      String value = entry.getValue();
+      dataOutputStream.writeInt(key.ordinal());
+      offset += Integer.BYTES;
+      if (key == TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY) {
+        _responseSerializationCpuTimeNsValueOffset += offset;
+      }
+      if (IntValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Ints.toByteArray(Integer.parseInt(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else if (LongValueTrailerKeys.contains(key)) {
+        byte[] valueBytes = Longs.toByteArray(Long.parseLong(value));
+        dataOutputStream.write(valueBytes);
+        offset += valueBytes.length;
+      } else {
+        byte[] valueBytes = StringUtil.encodeUtf8(value);
+        dataOutputStream.writeInt(valueBytes.length);
+        dataOutputStream.write(valueBytes);
+        offset += Integer.BYTES + valueBytes.length;
+      }
+    }
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  private Map<TrailerKeys, String> deserializeTrailer(byte[] bytes)
+      throws IOException {
+    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
+        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
+      int numEntries = dataInputStream.readInt();
+      Map<TrailerKeys, String> trailer = new TreeMap<>();
+      for (int i = 0; i < numEntries; i++) {
+        int ordinal = dataInputStream.readInt();
+        TrailerKeys key = TrailerKeys.values()[ordinal];

Review comment:
       You should check whether the ordinal is in range, and skip any unknown keys.
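
   A minimal sketch of the suggested guard, assuming the loop shape from the diff above (names as in the patch; bailing out with `break` is one possible policy, since the value encoding of an unknown key cannot be determined):

   ```java
   for (int i = 0; i < numEntries; i++) {
     int ordinal = dataInputStream.readInt();
     // Guard against ordinals written by a newer sender whose enum has extra keys.
     if (ordinal < 0 || ordinal >= TrailerKeys.values().length) {
       // The value encoding (int/long/string) of an unknown key cannot be
       // determined, so stop reading rather than misinterpret the bytes.
       break;
     }
     TrailerKeys key = TrailerKeys.values()[ordinal];
     // ... read the value exactly as in the existing code ...
   }
   ```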






[GitHub] [incubator-pinot] mqliang closed pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603518332



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()

Review comment:
       Let's not worry about spending time on that optimization. It can always be done later; it is not a must-have for this change.






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599199108



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -33,12 +33,15 @@
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
 import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
+public class DataTableImplV2V3 implements DataTable {

Review comment:
       (nit) Suggest not including the version name in the class name. It should just be DataTableImpl. Tomorrow, if we bump the version up to 4, the name would become DataTableImplV2V3V4, which is undesirable.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605218640



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,11 +94,16 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    _version = VERSION_3;
     _dataSchema = dataSchema;
     _columnOffsets = new int[dataSchema.size()];
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
   }
 
+  public static void setCurrentDataTableVersion(int version) {
+    _version = version;

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604408686



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -138,7 +138,7 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
       String errorMessage = String
           .format("Query scheduling took %dms (longer than query timeout of %dms)", querySchedulingTimeMs,
               queryTimeoutMs);
-      DataTable dataTable = new DataTableImplV2();
+      DataTable dataTable = new DataTableImplV3();

Review comment:
       Discussed this offline with @mqliang. For now we decided to go with option 1. Added a TODO there to follow up with a PR that unifies the way of constructing an empty data table everywhere.






[GitHub] [incubator-pinot] amrishlal commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
amrishlal commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r605325492



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -50,51 +46,19 @@
   // VARIABLE_SIZE_DATA (START|SIZE)
   private static final int HEADER_SIZE = Integer.BYTES * 13;
 
-  private final int _numRows;
-  private final int _numColumns;
-  private final DataSchema _dataSchema;
-  private final int[] _columnOffsets;
-  private final int _rowSizeInBytes;
-  private final Map<String, Map<Integer, String>> _dictionaryMap;
-  private final byte[] _fixedSizeDataBytes;
-  private final ByteBuffer _fixedSizeData;
-  private final byte[] _variableSizeDataBytes;
-  private final ByteBuffer _variableSizeData;
-  private final Map<String, String> _metadata;
-
   /**
    * Construct data table with results. (Server side)
    */
   public DataTableImplV2(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
       byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
-    _numRows = numRows;
-    _numColumns = dataSchema.size();
-    _dataSchema = dataSchema;
-    _columnOffsets = new int[_numColumns];
-    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
-    _dictionaryMap = dictionaryMap;
-    _fixedSizeDataBytes = fixedSizeDataBytes;
-    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
-    _variableSizeDataBytes = variableSizeDataBytes;
-    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
-    _metadata = new HashMap<>();
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
   }
 
   /**
    * Construct empty data table. (Server side)
    */
   public DataTableImplV2() {
-    _numRows = 0;
-    _numColumns = 0;
-    _dataSchema = null;
-    _columnOffsets = null;
-    _rowSizeInBytes = 0;
-    _dictionaryMap = null;
-    _fixedSizeDataBytes = null;
-    _fixedSizeData = null;
-    _variableSizeDataBytes = null;
-    _variableSizeData = null;
-    _metadata = new HashMap<>();
+    super();

Review comment:
       This constructor can be removed. super() is redundant.






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599306832



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -33,12 +33,15 @@
 import org.apache.pinot.common.utils.DataTable;
 import org.apache.pinot.common.utils.StringUtil;
 import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
 
-public class DataTableImplV2 implements DataTable {
-  private static final int VERSION = 2;
+public class DataTableImplV2V3 implements DataTable {

Review comment:
       I named it DataTableImplV2V3 since V2 and V3 share a lot of common logic. If V2 and V3 diverge significantly, as you suggest:
   > Since we are anyway bumping up the version, how about we move the existing metadata of key-value pairs to the end of file to keep consistency in the format. So, all the metadata stuff (aka key-value pairs) + new positional stuff can be a file footer.
   
   If we do that, I vote for putting the V2 logic into DataTableImplV2 and the V3 logic into DataTableImplV3, and extracting the common logic (e.g. serialization/deserialization of metadata/dictionaryMap) into DataTableUtils.java.
   
   > move the existing metadata of key-value pairs to the end of file 
   
   Actually I considered that. I also considered making metadata a `String[]` instead of a `Map<String, String>`, making all metadata keys enum values, and making "serialization_cpu_times_ns" part of the metadata. In other words, "serialization_cpu_times_ns" is part of the metadata, and the footer section contains only metadata. In this way (see the sketch after this comment):
   * All metadata is positional, so we can replace values in the metadata even after the data table is serialized. (The bytes of a `Map<String, String>` are not positional, because the iteration order of a hashmap is not deterministic, whereas the iteration order of an array is.)
   * Metadata was previously a `Map<String, String>`, where we need to write the keys (type string) to the byte buffer. With a `String[]`, we don't write the enum constant itself, just the value (length + bytes) corresponding to the ordinal/position of the constant, so less data is transferred between server and broker.
   
   But if we change it that way, as I previously stated, I vote to keep the current DataTableImplV2.java as it is and create a DataTableImplV3.java for all the V3 logic (extracting the common logic into DataTableUtils.java). Otherwise, putting all the V2/V3 logic in the same file will make the code hard to read.
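
   A minimal sketch of the positional encoding described above (`MetadataKeys` and its entries are hypothetical, for illustration only -- not the PR's enum):

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;

   // Hypothetical keys, for illustration only.
   enum MetadataKeys { THREAD_CPU_TIME_NS, NUM_DOCS_SCANNED }

   class PositionalMetadata {
     // Enum ordinals iterate in a fixed, deterministic order (unlike a HashMap),
     // so the byte offset of every value in the output is computable up front.
     static byte[] serialize(String[] values) throws IOException {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       DataOutputStream dos = new DataOutputStream(out);
       for (MetadataKeys key : MetadataKeys.values()) {
         byte[] v = values[key.ordinal()].getBytes(StandardCharsets.UTF_8);
         dos.writeInt(v.length);  // only length + bytes on the wire; no key string
         dos.write(v);
       }
       return out.toByteArray();
     }
   }
   ```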
   






[GitHub] [incubator-pinot] mqliang closed pull request #6710: Add a trailer section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710


   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603649866



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2.java
##########
@@ -36,6 +35,10 @@
 import org.apache.pinot.spi.utils.ByteArray;
 import org.apache.pinot.spi.utils.BytesUtils;
 
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.deserializeDictionaryMap;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.serializeDictionaryMap;
+
 
 public class DataTableImplV2 implements DataTable {
   private static final int VERSION = 2;

Review comment:
       This now becomes redundant and we can remove it. Use the VERSION_2 constant already defined in DataTableBuilder.






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603654939



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,594 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.EXECUTION_THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |

Review comment:
       This is the only difference between the V2 and V3 protocols, right? So handling exceptions should be the same as before.
   
    ```
   // VERSION
     // NUM_ROWS
     // NUM_COLUMNS
     // DICTIONARY_MAP (START|SIZE)
     // METADATA (START|SIZE) -> in V3, this moves to trailer/footer/end
     // DATA_SCHEMA (START|SIZE)
     // FIXED_SIZE_DATA (START|SIZE)
     // VARIABLE_SIZE_DATA (START|SIZE)
   ```






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r599310756



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV2V3.java
##########
@@ -344,6 +395,20 @@ public void addException(ProcessingException processingException) {
     return byteArrayOutputStream.toByteArray();
   }
 
+  private byte[] serializePositionalData()

Review comment:
       I will write a benchmark to compare these two serialization approaches. If the proposed approach is better, I will send a PR to address it. Created an issue to track this: https://github.com/apache/incubator-pinot/issues/6714






[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r603028236



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,702 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.TreeMap;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.*;
+
+
+public class DataTableImplV3 implements DataTable {
+  private static final int VERSION = 3;
+
+  // VERSION
+  // NUM_ROWS
+  // NUM_COLUMNS
+  // EXCEPTIONS (START|SIZE)
+  // DICTIONARY_MAP (START|SIZE)
+  // DATA_SCHEMA (START|SIZE)
+  // FIXED_SIZE_DATA (START|SIZE)
+  // VARIABLE_SIZE_DATA (START|SIZE)
+  // TRAILER (START|SIZE)
+  private static final int HEADER_SIZE = Integer.BYTES * 15;
+
+  private final int _numRows;
+  private final int _numColumns;
+  private final DataSchema _dataSchema;
+  private final int[] _columnOffsets;
+  private final int _rowSizeInBytes;
+  private final Map<String, Map<Integer, String>> _dictionaryMap;
+  private final byte[] _fixedSizeDataBytes;
+  private final ByteBuffer _fixedSizeData;
+  private final byte[] _variableSizeDataBytes;
+  private final ByteBuffer _variableSizeData;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+  /**
+   * _metadata stores KV pairs for metadata. Metadata is actually part of _trailer in V3 when serializing the
+   * DataTable into bytes. When deserializing, we extract the metadata from _trailer into this _metadata map to
+   * provide the same interface as V2. A lot of code uses
+   * datatable.getMetadata().get("key")/datatable.getMetadata().put("key", "value") to get/set metadata.
+   * TODO(@mqliang): revise this if we decide to get/set metadata via
+   *  datatable.getTrailerData(key)/datatable.setTrailer(key, value).
+   */
+  private final Map<String, String> _metadata;
+  private Map<TrailerKeys, String> _trailer;
+
+  private long _responseSerializationCpuTimeNs;
+  private int _responseSerializationCpuTimeNsValueOffset;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    _numRows = numRows;
+    _numColumns = dataSchema.size();
+    _dataSchema = dataSchema;
+    _columnOffsets = new int[_numColumns];
+    _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
+    _dictionaryMap = dictionaryMap;
+    _fixedSizeDataBytes = fixedSizeDataBytes;
+    _fixedSizeData = ByteBuffer.wrap(fixedSizeDataBytes);
+    _variableSizeDataBytes = variableSizeDataBytes;
+    _variableSizeData = ByteBuffer.wrap(variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    _numRows = 0;
+    _numColumns = 0;
+    _dataSchema = null;
+    _columnOffsets = null;
+    _rowSizeInBytes = 0;
+    _dictionaryMap = null;
+    _fixedSizeDataBytes = null;
+    _fixedSizeData = null;
+    _variableSizeDataBytes = null;
+    _variableSizeData = null;
+    _exceptions = new HashMap<>();
+    _metadata = new HashMap<>();
+    _trailer = new TreeMap<>();
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    _trailer.put(TrailerKeys.RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY, String.valueOf(-1));
+    ThreadTimer threadTimer = new ThreadTimer();

Review comment:
       > Instead of starting threadtimer here you can start and end the timer inside the toBytesInternal() method. We don't need to capture the timer for serializing the trailer. Instead, just save the value in a local variable, and add it to the trailer. That way, you can avoid having the variable _responseSerializationCpuTimeNsValueOffset
   
   That implementation is not possible: we need to write the trailer section's (or metadata section's, as suggested by @Jackie-Jiang, whatever we name it) start offset and length into the header, so trailer serialization happens before we write the actual data bytes (exception bytes, data schema bytes, fixed size data bytes, variable size data bytes) into the data output stream. If we implemented it that way, only the time to serialize each section would be accounted for; the time spent writing the data bytes into the data output stream would be ignored.
   
   Instead, in my current implementation: when writing the trailer section's start offset and length into the header, increase the length by the length of the serialization_cost KV pair, and append the bytes of the serialization_cost KV pair to the end of the data output stream. A sketch of this in-place patching idea follows below.
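
   A minimal sketch of how the recorded offset can be used for the in-place patch (variable names as in the diff; this is an illustration, not the exact PR code):

   ```java
   // After all sections are written, measure the elapsed thread time and patch
   // the 8-byte placeholder at the offset recorded during trailer serialization.
   byte[] bytes = byteArrayOutputStream.toByteArray();
   long serializationCostNs = threadTimer.stopAndGetThreadTimeNs();
   System.arraycopy(Longs.toByteArray(serializationCostNs), 0, bytes,
       _responseSerializationCpuTimeNsValueOffset, Long.BYTES);
   return bytes;
   ```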






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6710: Add a positional data section to data table and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#issuecomment-805198697


   > With the addition of new data structure in this PR, there are essentially two places in DataTable where the key-value / name-value style structure is located.
   > 
   > * First is the existing DataTable metadata which is also a series of key-value pairs where key is string and value is some statistic/metric. This is towards the beginning of the byte stream
   > * Second is the structure introduced in this PR which is written as a footer.
   > 
   > Since we are anyway bumping up the version, how about we move the existing metadata of key-value pairs to the end of file to keep consistency in the format. So, all the metadata stuff (aka key-value pairs) + new positional stuff can be a file footer.
   
   KV pair might be misleading here. Within a KV pair, the value part is indeed an arbitrary serialized object. The KV concept in the footer is just to give it some structure. So, we can keep growing the footer by adding a key to the enum and then appending the corresponding serialized bytes to the payload (see the sketch below).
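
   A minimal sketch of what growing the footer looks like (NEW_STAT_KEY is hypothetical):

   ```java
   public enum TrailerKeys {
     // Existing key from this PR:
     RESPONSE_SERIALIZATION_CPU_TIME_NS_METADATA_KEY,
     // Hypothetical future key: appended at the end so existing ordinals stay
     // stable; the payload simply gains one more [ordinal, valueBytes] entry.
     NEW_STAT_KEY
   }
   ```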




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6710: DataTable V3 implementation and measure data table serialization cost on server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604258773



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |

Review comment:
       ```suggestion
    * 	| 13 integers of header:                           |
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
##########
@@ -321,6 +321,9 @@
     public static final String CONFIG_OF_ENABLE_THREAD_CPU_TIME_MEASUREMENT =
         "pinot.server.instance.enableThreadCpuTimeMeasurement";
     public static final boolean DEFAULT_ENABLE_THREAD_CPU_TIME_MEASUREMENT = false;
+
+    public static final String CONFIG_OF_CURRENT_DATA_TABLE_VERSION = "pinot.server.instance.currentDataTableVersion";

Review comment:
       We can retain this config forever, to be used for upgrading the protocol. 
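
   A sketch of how such a config could be wired at server startup (the constant comes from the diff above; `serverConf` and the exact read API are assumptions):

   ```java
   // Pin the outgoing DataTable version during a rolling upgrade: keep emitting
   // V2 until all brokers understand V3, then flip the config.
   int version = serverConf.getProperty(
       CommonConstants.Server.CONFIG_OF_CURRENT_DATA_TABLE_VERSION, DataTableBuilder.VERSION_3);
   DataTableBuilder.setCurrentDataTableVersion(version);
   ```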

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -91,11 +94,16 @@
   private ByteBuffer _currentRowDataByteBuffer;
 
   public DataTableBuilder(DataSchema dataSchema) {
+    _version = VERSION_3;
     _dataSchema = dataSchema;
     _columnOffsets = new int[dataSchema.size()];
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);
   }
 
+  public static void setCurrentDataTableVersion(int version) {
+    _version = version;

Review comment:
       Throw an exception if it is not one of the supported versions; see the sketch below.
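
   A minimal sketch of the suggested validation, assuming the VERSION_2/VERSION_3 constants already defined in DataTableBuilder:

   ```java
   public static void setCurrentDataTableVersion(int version) {
     if (version != VERSION_2 && version != VERSION_3) {
       throw new IllegalArgumentException("Unsupported data table version: " + version);
     }
     _version = version;
   }
   ```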

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * Datatable V3 implementation.
+ * The layout of serialized V3 datatable looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 bytes of header:                           |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+
+  /**
+   * Construct data table with results. (Server side)
+   */
+  public DataTableImplV3(int numRows, DataSchema dataSchema, Map<String, Map<Integer, String>> dictionaryMap,
+      byte[] fixedSizeDataBytes, byte[] variableSizeDataBytes) {
+    super(numRows, dataSchema, dictionaryMap, fixedSizeDataBytes, variableSizeDataBytes);
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct empty data table. (Server side)
+   */
+  public DataTableImplV3() {
+    super();
+    _exceptions = new HashMap<>();
+  }
+
+  /**
+   * Construct data table from byte array. (broker side)
+   */
+  public DataTableImplV3(ByteBuffer byteBuffer)
+      throws IOException {
+    // Read header.
+    _numRows = byteBuffer.getInt();
+    _numColumns = byteBuffer.getInt();
+    int exceptionsStart = byteBuffer.getInt();
+    int exceptionsLength = byteBuffer.getInt();
+    int dictionaryMapStart = byteBuffer.getInt();
+    int dictionaryMapLength = byteBuffer.getInt();
+    int dataSchemaStart = byteBuffer.getInt();
+    int dataSchemaLength = byteBuffer.getInt();
+    int fixedSizeDataStart = byteBuffer.getInt();
+    int fixedSizeDataLength = byteBuffer.getInt();
+    int variableSizeDataStart = byteBuffer.getInt();
+    int variableSizeDataLength = byteBuffer.getInt();
+
+    // Read exceptions.
+    if (exceptionsLength != 0) {
+      byte[] exceptionsBytes = new byte[exceptionsLength];
+      byteBuffer.position(exceptionsStart);
+      byteBuffer.get(exceptionsBytes);
+      _exceptions = deserializeExceptions(exceptionsBytes);
+    } else {
+      _exceptions = new HashMap<>();
+    }
+
+    // Read dictionary.
+    if (dictionaryMapLength != 0) {
+      byte[] dictionaryMapBytes = new byte[dictionaryMapLength];
+      byteBuffer.position(dictionaryMapStart);
+      byteBuffer.get(dictionaryMapBytes);
+      _dictionaryMap = deserializeDictionaryMap(dictionaryMapBytes);
+    } else {
+      _dictionaryMap = null;
+    }
+
+    // Read data schema.
+    if (dataSchemaLength != 0) {
+      byte[] schemaBytes = new byte[dataSchemaLength];
+      byteBuffer.position(dataSchemaStart);
+      byteBuffer.get(schemaBytes);
+      _dataSchema = DataSchema.fromBytes(schemaBytes);
+      _columnOffsets = new int[_dataSchema.size()];
+      _rowSizeInBytes = DataTableUtils.computeColumnOffsets(_dataSchema, _columnOffsets);
+    } else {
+      _dataSchema = null;
+      _columnOffsets = null;
+      _rowSizeInBytes = 0;
+    }
+
+    // Read fixed size data.
+    if (fixedSizeDataLength != 0) {
+      _fixedSizeDataBytes = new byte[fixedSizeDataLength];
+      byteBuffer.position(fixedSizeDataStart);
+      byteBuffer.get(_fixedSizeDataBytes);
+      _fixedSizeData = ByteBuffer.wrap(_fixedSizeDataBytes);
+    } else {
+      _fixedSizeDataBytes = null;
+      _fixedSizeData = null;
+    }
+
+    // Read variable size data.
+    if (variableSizeDataLength != 0) {
+      _variableSizeDataBytes = new byte[variableSizeDataLength];
+      byteBuffer.position(variableSizeDataStart);
+      byteBuffer.get(_variableSizeDataBytes);
+      _variableSizeData = ByteBuffer.wrap(_variableSizeDataBytes);
+    } else {
+      _variableSizeDataBytes = null;
+      _variableSizeData = null;
+    }
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];
+    byteBuffer.get(trailerBytes);
+    _metadata = deserializeMetadata(trailerBytes);
+  }
+
+  @Override
+  public void addException(ProcessingException processingException) {
+    _exceptions.put(processingException.getErrorCode(), processingException.getMessage());
+  }
+
+  @Override
+  public Map<Integer, String> getExceptions() {
+    return _exceptions;
+  }
+
+  @Override
+  public byte[] toBytes()
+      throws IOException {
+    ThreadTimer threadTimer = new ThreadTimer();
+    threadTimer.start();
+
+    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
+    DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream);
+    dataOutputStream.writeInt(VERSION_3);
+    dataOutputStream.writeInt(_numRows);
+    dataOutputStream.writeInt(_numColumns);
+    int dataOffset = HEADER_SIZE;
+
+    // Write exceptions section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] exceptionsBytes = serializeExceptions();
+    dataOutputStream.writeInt(exceptionsBytes.length);
+    dataOffset += exceptionsBytes.length;
+
+    // Write dictionary map section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dictionaryMapBytes = null;
+    if (_dictionaryMap != null) {
+      dictionaryMapBytes = serializeDictionaryMap(_dictionaryMap);
+      dataOutputStream.writeInt(dictionaryMapBytes.length);
+      dataOffset += dictionaryMapBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write data schema section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    byte[] dataSchemaBytes = null;
+    if (_dataSchema != null) {
+      dataSchemaBytes = _dataSchema.toBytes();
+      dataOutputStream.writeInt(dataSchemaBytes.length);
+      dataOffset += dataSchemaBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write fixed size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.writeInt(_fixedSizeDataBytes.length);
+      dataOffset += _fixedSizeDataBytes.length;
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write variable size data section offset(START|SIZE).
+    dataOutputStream.writeInt(dataOffset);
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.writeInt(_variableSizeDataBytes.length);
+    } else {
+      dataOutputStream.writeInt(0);
+    }
+
+    // Write actual data.
+    // Write exceptions bytes.
+    dataOutputStream.write(exceptionsBytes);
+    // Write dictionary map bytes.
+    if (dictionaryMapBytes != null) {
+      dataOutputStream.write(dictionaryMapBytes);
+    }
+    // Write data schema bytes.
+    if (dataSchemaBytes != null) {
+      dataOutputStream.write(dataSchemaBytes);
+    }
+    // Write fixed size data bytes.
+    if (_fixedSizeDataBytes != null) {
+      dataOutputStream.write(_fixedSizeDataBytes);
+    }
+    // Write variable size data bytes.
+    if (_variableSizeDataBytes != null) {
+      dataOutputStream.write(_variableSizeDataBytes);
+    }
+
+    // Update the value of "threadCpuTimeNs" to account for data table serialization time.
+    long responseSerializationCpuTimeNs = threadTimer.stopAndGetThreadTimeNs();
+    // TODO: currently we log/emit a single total thread CPU time covering both query execution and data table
+    //  serialization. Figure out a way to log/emit them separately, probably via providing an API on the DataTable
+    //  to get/set query context, which is supposed to be used on the server side only.
+    long threadCpuTimeNs =
+        Long.parseLong(getMetadata().getOrDefault(THREAD_CPU_TIME_NS.getName(), "0")) + responseSerializationCpuTimeNs;
+    getMetadata().put(THREAD_CPU_TIME_NS.getName(), String.valueOf(threadCpuTimeNs));
+
+    // Write metadata length and bytes.
+    byte[] metadataBytes = serializeMetadata();
+    dataOutputStream.writeInt(metadataBytes.length);
+    dataOutputStream.write(metadataBytes);
+
+    return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * Serialize metadata section to bytes.
+   * Format of the bytes looks like:
+   * [numEntries, bytesOfKV2, bytesOfKV2, bytesOfKV3]

Review comment:
       This description is wrong. The format is:
   - length of metadata section
   - actual metadata
   Metadata entries can be one of two types -- fixed length (int/long) or variable length.
   A fixed length entry is encoded as: (enumOrdinal, metadata value)
   A variable length entry is encoded as: (enumOrdinal, metadata length, metadata value)
   All integer values (including ordinals and lengths) are encoded in BigEndian format.
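
   For readers following the thread, here is a minimal, self-contained sketch of the entry
   encoding described above. The `Key` enum, its `_intValued` flag, and the `serialize`
   helper are illustrative names only, not the PR's actual API:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class MetadataEncodingSketch {
  // Hypothetical key enum: each key declares whether its value is fixed length (long).
  enum Key {
    THREAD_CPU_TIME_NS(true), TABLE(false);

    final boolean _intValued;

    Key(boolean intValued) {
      _intValued = intValued;
    }
  }

  // DataOutputStream writes all primitives in big-endian order, which matches the
  // BigEndian requirement described above.
  static byte[] serialize(Map<Key, String> metadata)
      throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(out);
    for (Map.Entry<Key, String> entry : metadata.entrySet()) {
      dos.writeInt(entry.getKey().ordinal());            // (enumOrdinal, ...)
      if (entry.getKey()._intValued) {
        dos.writeLong(Long.parseLong(entry.getValue())); // fixed length: value only
      } else {
        byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
        dos.writeInt(value.length);                      // var length: length, then value
        dos.write(value);
      }
    }
    return out.toByteArray();
  }
}
```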

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.common.datatable;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import org.apache.pinot.common.response.ProcessingException;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.query.request.context.ThreadTimer;
+
+import static org.apache.pinot.common.utils.DataTable.MetadataKeys.THREAD_CPU_TIME_NS;
+import static org.apache.pinot.core.common.datatable.DataTableBuilder.VERSION_3;
+
+
+/**
+ * DataTable V3 implementation.
+ * The layout of a serialized V3 data table looks like:
+ * 	+-----------------------------------------------+
+ * 	| 13 integers of header:                        |
+ * 	| VERSION                                       |
+ * 	| NUM_ROWS                                      |
+ * 	| NUM_COLUMNS                                   |
+ * 	| EXCEPTIONS SECTION START OFFSET               |
+ * 	| EXCEPTIONS SECTION LENGTH                     |
+ * 	| DICTIONARY_MAP SECTION START OFFSET           |
+ * 	| DICTIONARY_MAP SECTION LENGTH                 |
+ * 	| DATA_SCHEMA SECTION START OFFSET              |
+ * 	| DATA_SCHEMA SECTION LENGTH                    |
+ * 	| FIXED_SIZE_DATA SECTION START OFFSET          |
+ * 	| FIXED_SIZE_DATA SECTION LENGTH                |
+ * 	| VARIABLE_SIZE_DATA SECTION START OFFSET       |
+ * 	| VARIABLE_SIZE_DATA SECTION LENGTH             |
+ * 	+-----------------------------------------------+
+ * 	| EXCEPTIONS SECTION                            |
+ * 	+-----------------------------------------------+
+ * 	| DICTIONARY_MAP SECTION                        |
+ * 	+-----------------------------------------------+
+ * 	| DATA_SCHEMA SECTION                           |
+ * 	+-----------------------------------------------+
+ * 	| FIXED_SIZE_DATA SECTION                       |
+ * 	+-----------------------------------------------+
+ * 	| VARIABLE_SIZE_DATA SECTION                    |
+ * 	+-----------------------------------------------+
+ * 	| METADATA LENGTH                               |
+ * 	| METADATA SECTION                              |
+ * 	+-----------------------------------------------+
+ */
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;
+
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();

Review comment:
       Handle the case where metadataLength is 0.
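
   A minimal sketch of the guard being requested, assuming `_metadata` should default to an
   empty map when the section is absent (mirroring how `_exceptions` is handled earlier in
   the constructor):

```java
    // Read metadata; an empty section should not allocate or parse anything.
    int metadataLength = byteBuffer.getInt();
    if (metadataLength != 0) {
      byte[] metadataBytes = new byte[metadataLength];
      byteBuffer.get(metadataBytes);
      _metadata = deserializeMetadata(metadataBytes);
    } else {
      // Assumption: default to an empty map, as the other optional sections do.
      _metadata = new HashMap<>();
    }
```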

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+public class DataTableImplV3 extends DataTableImplBase {
+  private static final int HEADER_SIZE = Integer.BYTES * 13;
+  // _exceptions stores exceptions as a map of errorCode->errorMessage
+  private final Map<Integer, String> _exceptions;

Review comment:
       ```suggestion
     private final Map<Integer, String> _errCodeToExceptionMap;
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java
##########
@@ -0,0 +1,397 @@
+    // Read metadata.
+    int metadataLength = byteBuffer.getInt();
+    byte[] trailerBytes = new byte[metadataLength];

Review comment:
       ```suggestion
       byte[] metadataBytes = new byte[metadataLength];
   ```
   Let us keep the naming consistent.
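
   Tying the format comment and the naming suggestion together, the matching decoder might
   look roughly like this. It reuses the hypothetical `Key` enum from the earlier sketch and
   is only an illustration of the described wire format, not the PR's implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class MetadataDecodingSketch {
  static Map<MetadataEncodingSketch.Key, String> deserialize(byte[] metadataBytes)
      throws IOException {
    Map<MetadataEncodingSketch.Key, String> metadata = new HashMap<>();
    DataInputStream dis = new DataInputStream(new ByteArrayInputStream(metadataBytes));
    while (dis.available() > 0) {
      // (enumOrdinal, ...) -- assumes enum ordinals stay stable across versions.
      MetadataEncodingSketch.Key key = MetadataEncodingSketch.Key.values()[dis.readInt()];
      if (key._intValued) {
        metadata.put(key, Long.toString(dis.readLong()));  // fixed length value
      } else {
        byte[] value = new byte[dis.readInt()];            // var length: length prefix
        dis.readFully(value);                              //             then value
        metadata.put(key, new String(value, StandardCharsets.UTF_8));
      }
    }
    return metadata;
  }
}
```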




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org