Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/21 23:46:31 UTC
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

Jackie-Jiang commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903149184


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   For MV reads, let's add APIs that return the values directly without requiring a caller-supplied buffer, e.g. `int[] getIntMV(int docId, T context)`. Such an API avoids unnecessary copying of the array, and also covers the case where `maxNumValuesPerMVEntry` is not available.
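
To illustrate the suggestion, here is a minimal sketch of how a buffer-free overload could sit next to the buffer-based one. `MvIntReader`, `InMemoryMvIntReader` and `MvReadDemo` are hypothetical stand-ins written for this example; they are not the real `ForwardIndexReader` and say nothing about how Pinot stores MV entries.

```java
// Hypothetical, simplified stand-in for ForwardIndexReader (assumption: not the real Pinot interface).
interface MvIntReader<T> {
  // Buffer-based read, as in the diff: copies the entry into valueBuffer and returns the number of values.
  int getIntMV(int docId, int[] valueBuffer, T context);

  // Buffer-free read, as suggested in the review: returns an array sized to the entry.
  int[] getIntMV(int docId, T context);
}

// Toy in-memory reader backed by an int[][] (assumption: for illustration only).
final class InMemoryMvIntReader implements MvIntReader<Void> {
  private final int[][] _values;

  InMemoryMvIntReader(int[][] values) {
    _values = values;
  }

  @Override
  public int getIntMV(int docId, int[] valueBuffer, Void context) {
    int[] entry = _values[docId];
    // The caller must have sized valueBuffer to at least the largest entry.
    System.arraycopy(entry, 0, valueBuffer, 0, entry.length);
    return entry.length;
  }

  @Override
  public int[] getIntMV(int docId, Void context) {
    // Returns a right-sized copy; the caller needs no maxNumValuesPerMVEntry.
    return _values[docId].clone();
  }
}

final class MvReadDemo {
  // Batch read that fills values[i] directly, with no intermediate buffer and no second copy.
  static void readValuesMV(MvIntReader<Void> reader, int[] docIds, int length, int[][] values) {
    for (int i = 0; i < length; i++) {
      values[i] = reader.getIntMV(docIds[i], null);
    }
  }
}
```

With the buffer-based variant, a batch read has to copy each entry twice (once into the shared buffer, once into a right-sized result array); the buffer-free variant in this sketch produces the result array in a single step.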



##########
pinot-core/src/main/java/org/apache/pinot/core/common/DataFetcher.java:
##########
@@ -713,8 +730,51 @@ void readStringValuesMV(TransformEvaluator evaluator, int[] docIds, int length,
 
     public void readNumValuesMV(int[] docIds, int length, int[] numValuesBuffer) {
       Tracing.activeRecording().setInputDataType(_dataType, _singleValue);
-      for (int i = 0; i < length; i++) {
-        numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+      if (_dictionary != null) {
+        for (int i = 0; i < length; i++) {
+          numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+        }
+      } else {
+        switch (_reader.getStoredType()) {
+          case INT:
+            int[] intValueBuffer = new int[_maxNumValuesPerMVEntry];
+            for (int i = 0; i < length; i++) {
+              numValuesBuffer[i] = _reader.getIntMV(docIds[i], intValueBuffer, getReaderContext());

Review Comment:
   This adds a lot of overhead because we don't actually need to read the values here. Let's add an API `int getNumValuesMV(int docId, T context)` to `ForwardIndexReader` that simply returns the number of values in the MV entry without reading any of its content.
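
A rough sketch of why a dedicated count API can be cheaper, under the assumption that the reader keeps per-document start offsets alongside a flat value array. `OffsetMvIntReader` is a hypothetical toy class; the layout is chosen for illustration and is not Pinot's actual forward index format. The `T context` parameter from the proposed signature is omitted for brevity.

```java
// Hypothetical toy reader (assumption: flat value array plus start offsets, not Pinot's real layout).
final class OffsetMvIntReader {
  private final int[] _offsets;    // length numDocs + 1; entry i spans [_offsets[i], _offsets[i + 1])
  private final int[] _flatValues; // all MV entries concatenated in doc id order

  OffsetMvIntReader(int[][] values) {
    _offsets = new int[values.length + 1];
    for (int i = 0; i < values.length; i++) {
      _offsets[i + 1] = _offsets[i] + values[i].length;
    }
    _flatValues = new int[_offsets[values.length]];
    for (int i = 0; i < values.length; i++) {
      System.arraycopy(values[i], 0, _flatValues, _offsets[i], values[i].length);
    }
  }

  // Count-only lookup: two offset reads, no access to the values themselves.
  int getNumValuesMV(int docId) {
    return _offsets[docId + 1] - _offsets[docId];
  }

  // Buffer-based value read, kept for comparison with the count-only path.
  int getIntMV(int docId, int[] valueBuffer) {
    int numValues = getNumValuesMV(docId);
    System.arraycopy(_flatValues, _offsets[docId], valueBuffer, 0, numValues);
    return numValues;
  }

  // Batch count, mirroring the shape of readNumValuesMV in the diff without allocating a value buffer.
  void readNumValuesMV(int[] docIds, int length, int[] numValuesBuffer) {
    for (int i = 0; i < length; i++) {
      numValuesBuffer[i] = getNumValuesMV(docIds[i]);
    }
  }
}
```

In this sketch the count path needs neither a per-type `switch` nor a buffer of size `_maxNumValuesPerMVEntry`, which is the overhead the comment points at.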



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -47,7 +48,7 @@
    * Returns the data type of the values in the forward index. Returns {@link DataType#INT} for dictionary-encoded
    * forward index.
    */
-  DataType getValueType();
+  DataType getStoredType();

Review Comment:
   Can we move this change into a separate PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

