You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/10/19 10:57:15 UTC

[GitHub] [pinot] richardstartin opened a new pull request #7595: MV fwd index + MV `BYTES`

richardstartin opened a new pull request #7595:
URL: https://github.com/apache/pinot/pull/7595


   ## Description
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735047516



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/AbstractColumnStatisticsCollector.java
##########
@@ -72,6 +73,10 @@ public int getMaxNumberOfMultiValues() {
     return _maxNumberOfMultiValues;
   }
 
+  public int getMaxLengthOfMultiValues() {

Review comment:
       This seems not used

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
    * @param chunkSize Size of chunk
    * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
    * @param version version of File
-   * @throws FileNotFoundException
+   * @throws IOException if the file isn't found or can't be mapped
    */
   protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
-      throws FileNotFoundException {
+      throws IOException {
     Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+    _file = file;
+    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+    _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
     _chunkSize = chunkSize;
     _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
-    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
-    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
-    _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
-    _dataFile = new RandomAccessFile(file, "rw").getChannel();
+    _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+    _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
+    writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);

Review comment:
       Since we directly write the header into the file, probably more readable if we directly pass in the `_dataChannel` and write into it. No need to keep the member variable `_header`

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
    *
    */
   protected void writeChunk() {
-    int sizeToWrite;
+    int sizeWritten;
     _chunkBuffer.flip();
 
-    try {
-      sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
-      _dataFile.write(_compressedBuffer, _dataOffset);
-      _compressedBuffer.clear();
+    int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+    // compress directly in to the mapped output rather keep a large buffer to compress into
+    try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
+        maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {

Review comment:
       I'd suggest directly mapping the file channel to get the `ByteBuffer` instead of getting a `PinotDataBuffer` then create a view out of it.

##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
   default int getStringMV(int docId, String[] valueBuffer, T context) {
     throw new UnsupportedOperationException();
   }
+
+  /**
+   * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+   * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+   * entry.
+   *
+   * @param docId Document id
+   * @param valueBuffer Value buffer
+   * @param context Reader context
+   * @return Number of values within the multi-value entry
+   */
+  default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+    throw new UnsupportedOperationException();
+  }
+
+  default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {

Review comment:
       Remove this?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
-    // If buffer filled, then compress and write to file.
-    if (_chunkHeaderOffset == _chunkHeaderSize) {
-      writeChunk();
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+  public void putStrings(String[] values) {
+    // the entire String[] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the strings into the data buffer as if it's a single string,
+    // but with its own embedded header so offsets to strings within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i].getBytes(UTF_8);
+      _chunkBuffer.putInt(h, utf8.length);
+      _chunkBuffer.put(utf8);
+      bodySize += utf8.length;
     }
+    _chunkDataOffSet += headerSize + bodySize;
+    // go back to write the number of strings embedded in the big string
+    _chunkBuffer.putInt(headerPosition, values.length);
+
+    writeChunkIfNecessary();
   }
 
-  @Override
-  public void close()
-      throws IOException {
+  public void putByteArrays(byte[][] values) {
+    // the entire byte[][] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the byte[]s into the data buffer as if it's a single byte[],
+    // but with its own embedded header so offsets to byte[]s within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i];

Review comment:
       (nit) rename the variable

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {

Review comment:
       (nit) the value type should already be the stored type

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];
+              Arrays.fill((String[]) columnValueToIndex, value);
+            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+              int length = ((byte[][]) columnValueToIndex).length;
+              columnValueToIndex = new byte[length][];
+              Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+            } else {
+              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+            }
+          }
+          switch (forwardIndexCreator.getValueType()) {
+            case INT:
+              if (columnValueToIndex instanceof int[]) {

Review comment:
       This will always be `Object[]`. We won't pass primitive array to this class

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);
+    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+        numDocsPerChunk, totalMaxLength, writerVersion);
+    _valueType = valueType;
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+    int overheadPerEntry =
+        lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+  }
+
+  @Override
+  public boolean isDictionaryEncoded() {
+    return false;
+  }
+
+  @Override
+  public boolean isSingleValue() {
+    return false;
+  }
+
+  @Override
+  public DataType getValueType() {
+    return _valueType;
+  }
+
+  @Override
+  public void putIntMV(final int[] values) {
+
+    byte[] bytes = new byte[Integer.BYTES
+        + values.length * Integer.BYTES]; //numValues, bytes required to store the content
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the content of each element
+    for (final int value : values) {
+      byteBuffer.putInt(value);
+    }
+    _indexWriter.putBytes(bytes);
+  }
+
+  @Override
+  public void putLongMV(final long[] values) {
+
+    byte[] bytes = new byte[Integer.BYTES
+        + values.length * Long.BYTES]; //numValues, bytes required to store the content
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the content of each element
+    for (final long value : values) {
+      byteBuffer.putLong(value);
+    }
+    _indexWriter.putBytes(bytes);
+  }
+
+  @Override
+  public void putFloatMV(final float[] values) {
+
+    byte[] bytes = new byte[Integer.BYTES
+        + values.length * Float.BYTES]; //numValues, bytes required to store the content
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the content of each element
+    for (final float value : values) {
+      byteBuffer.putFloat(value);
+    }
+    _indexWriter.putBytes(bytes);
+  }
+
+  @Override
+  public void putDoubleMV(final double[] values) {
+
+    byte[] bytes = new byte[Integer.BYTES
+        + values.length * Long.BYTES]; //numValues, bytes required to store the content

Review comment:
       (nit) `Double.BYTES`

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);
+    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;

Review comment:
       Include the length integer to the total length?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];

Review comment:
       We want to put a single-element MV here. Same for BYTES

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -40,4 +40,9 @@ public int compress(ByteBuffer inUncompressed, ByteBuffer outCompressed)
     outCompressed.flip();
     return compressedSize;
   }
+
+  @Override
+  public int maxCompressedSize(int uncompressedSize) {
+    return 2 * uncompressedSize;

Review comment:
       Can you put some comments on how this value is calculated?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);

Review comment:
       (nit) unnecessary?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -734,10 +834,11 @@ public static void removeColumnMetadataInfo(PropertiesConfiguration properties,
    * @param deriveNumDocsPerChunk true if varbyte writer should auto-derive the number of rows per chunk
    * @param writerVersion version to use for the raw index writer
    * @return raw index creator
-   * @throws IOException
    */
-  public static ForwardIndexCreator getRawIndexCreatorForColumn(File file, ChunkCompressionType compressionType,
-      String column, DataType dataType, int totalDocs, int lengthOfLongestEntry, boolean deriveNumDocsPerChunk,
+  public static ForwardIndexCreator getRawIndexCreatorForSVColumn(File file,

Review comment:
       (nit) Wrong format

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -756,6 +857,41 @@ public static ForwardIndexCreator getRawIndexCreatorForColumn(File file, ChunkCo
     }
   }
 
+  /**
+   * Helper method to build the raw index creator for the column.
+   * Assumes that column to be indexed is single valued.
+   *
+   * @param file Output index file
+   * @param column Column name
+   * @param totalDocs Total number of documents to index
+   * @param deriveNumDocsPerChunk true if varbyte writer should auto-derive the number of rows
+   *     per chunk
+   * @param writerVersion version to use for the raw index writer
+   * @param maxRowLengthInBytes the length of the longest row in bytes
+   * @return raw index creator
+   */
+  public static ForwardIndexCreator getRawIndexCreatorForMVColumn(File file, ChunkCompressionType compressionType,
+      String column, DataType dataType, final int totalDocs,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, int writerVersion,
+      int maxRowLengthInBytes)
+      throws IOException {
+    switch (dataType.getStoredType()) {
+      case INT:
+      case LONG:
+      case FLOAT:
+      case DOUBLE:
+        return new MultiValueFixedByteRawIndexCreator(file, compressionType, column, totalDocs, dataType,

Review comment:
       (nit) The `dataType` here is already the stored type; we don't need to pass the size info for fixed length type because it can be inferred from the type

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BytesColumnPredIndexStatsCollector.java
##########
@@ -42,16 +43,32 @@ public BytesColumnPredIndexStatsCollector(String column, StatsCollectorConfig st
 
   @Override
   public void collect(Object entry) {
-    ByteArray value = new ByteArray((byte[]) entry);
-    addressSorted(value);
-    updatePartition(value);
-    _values.add(value);
-
-    int length = value.length();
-    _minLength = Math.min(_minLength, length);
-    _maxLength = Math.max(_maxLength, length);
-
-    _totalNumberOfEntries++;
+    if (entry instanceof Object[]) {
+      Object[] values = (Object[]) entry;
+      int rowLength = 0;
+      for (Object obj : values) {
+        ByteArray value = new ByteArray((byte[]) obj);
+        _values.add(value);
+        int length = value.length();
+        _minLength = Math.min(_minLength, length);
+        _maxLength = Math.max(_maxLength, length);
+        rowLength += length;

Review comment:
       Should we count the actual encoded bytes length? We need to add (1 + length) integers to this. Same for `STRING` type

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
     int numDocs = _originalSegmentMetadata.getTotalDocs();
     int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
     try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
-        .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+        .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,

Review comment:
       It already checks for SV on line 129. Maybe add a TODO to support MV?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
  * The layout of the file is as follows:
  * <p> Header Section: </p>
  * <ul>
- *   <li> Integer: File format version. </li>
- *   <li> Integer: Total number of chunks. </li>
- *   <li> Integer: Number of docs per chunk. </li>
- *   <li> Integer: Length of longest entry (in bytes). </li>
- *   <li> Integer: Total number of docs (version 2 onwards). </li>
- *   <li> Integer: Compression type enum value (version 2 onwards). </li>
- *   <li> Integer: Start offset of data header (version 2 onwards). </li>
- *   <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- *   Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>

Review comment:
       This should be reverted

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the length in bytes of the largest row
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxRowLengthInBytes)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);

Review comment:
       (Major) I don't think this is correct. This will always have one doc per chunk, which can cause very small chunks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735154847



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
-    // If buffer filled, then compress and write to file.
-    if (_chunkHeaderOffset == _chunkHeaderSize) {
-      writeChunk();
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+  public void putStrings(String[] values) {
+    // the entire String[] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the strings into the data buffer as if it's a single string,
+    // but with its own embedded header so offsets to strings within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i].getBytes(UTF_8);
+      _chunkBuffer.putInt(h, utf8.length);
+      _chunkBuffer.put(utf8);
+      bodySize += utf8.length;
     }
+    _chunkDataOffSet += headerSize + bodySize;
+    // go back to write the number of strings embedded in the big string
+    _chunkBuffer.putInt(headerPosition, values.length);
+
+    writeChunkIfNecessary();
   }
 
-  @Override
-  public void close()
-      throws IOException {
+  public void putByteArrays(byte[][] values) {
+    // the entire byte[][] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the byte[]s into the data buffer as if it's a single byte[],
+    // but with its own embedded header so offsets to byte[]s within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i];

Review comment:
       This is for the BYTES MV type, and I believe this piece of code is copy pasted from the STRING MV. For BYTES MV the value is not utf8, and this can cause confusion




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735165884



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the length in bytes of the largest row
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxRowLengthInBytes)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);

Review comment:
       These are two separate issues. The maximum chunk length estimate needs to include the length terminators, but this calculation does not mask that bug.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (76ca7e3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.92%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.56%   14.64%   -56.93%     
   + Complexity     3881       80     -3801     
   =============================================
     Files          1559     1516       -43     
     Lines         79053    77499     -1554     
     Branches      11706    11547      -159     
   =============================================
   - Hits          56575    11346    -45229     
   - Misses        18669    65337    +46668     
   + Partials       3809      816     -2993     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.64% <0.00%> (-0.07%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1254 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...76ca7e3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732254388



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -95,6 +96,62 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency

Review comment:
       @kishoreg I moved the construction of the MV `STRING`/`BYTES` into here to avoid excessive allocation and multiple passes. 
   
   This points to how to make this class a bit more memory efficient in general - we only need to guarantee fixed capacity for the offsets, which are fixed width, and we can write the variable length body page by page.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947836855


   I had to force derivation of `numDocs` for variable length data because there's no good solution to the buffer size problem given the following constraints:
   
   * There is a fixed number of documents per chunk
   * We don't want to OOM if there is a very large row in a segment, and applying an arbitrary multiplier amplifies this risk
   * Compression is applied at a chunk level, not intrachunk
   * The compression libraries all require a single buffer
   
   When there is a very large row (> 1MB) we end up with 1 doc per chunk in the segment. The only good solution is to evolve the forward index format to allow variable numbers of docs per chunk for variable length data, but we can do that later if this becomes a problem,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e9ccb61) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `6.46%`.
   > The diff coverage is `75.42%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.59%   65.13%   -6.47%     
   - Complexity     3882     3940      +58     
   ============================================
     Files          1559     1516      -43     
     Lines         79025    77470    -1555     
     Branches      11702    11544     -158     
   ============================================
   - Hits          56579    50460    -6119     
   - Misses        18639    23428    +4789     
   + Partials       3807     3582     -225     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.59% <75.42%> (+<0.01%)` | :arrow_up: |
   | unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [378 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732255082



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);
+    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;

Review comment:
       This is currently a dangerous overestimate, we need a way to use sublinear memory for the body before we can merge this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732256498



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxTotalContentLength max total content length
+   * @param maxElements max number of elements
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+        maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLength max length for each entry
+   * @param maxElements max number of elements
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+   *     chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType,
+      int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+        numDocsPerChunk, totalMaxLength,
+        writerVersion);
+    _valueType = valueType;
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+    int overheadPerEntry =
+        lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+  }
+
+  @Override
+  public boolean isDictionaryEncoded() {
+    return false;
+  }
+
+  @Override
+  public boolean isSingleValue() {
+    return false;
+  }
+
+  @Override
+  public DataType getValueType() {
+    return _valueType;
+  }
+
+  @Override
+  public void putStringMV(final String[] values) {
+    int totalBytes = 0;
+    for (int i = 0; i < values.length; i++) {
+      final String value = values[i];
+      int length = value.getBytes().length;
+      totalBytes += length;
+    }
+    byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+        + totalBytes]; //numValues, length array, concatenated bytes
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the length of each element
+    for (final String value : values) {
+      byteBuffer.putInt(value.getBytes().length);
+    }
+    //write the content of each element
+    //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+    // as needed
+    for (final String value : values) {
+      byteBuffer.put(value.getBytes());

Review comment:
       This code has gone, but I have used this method in the new place. Thanks for the pointer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732134552



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxTotalContentLength max total content length
+   * @param maxElements max number of elements
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+        maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLength max length for each entry
+   * @param maxElements max number of elements
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+   *     chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType,
+      int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+        numDocsPerChunk, totalMaxLength,
+        writerVersion);
+    _valueType = valueType;
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+    int overheadPerEntry =
+        lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+  }
+
+  @Override
+  public boolean isDictionaryEncoded() {
+    return false;
+  }
+
+  @Override
+  public boolean isSingleValue() {
+    return false;
+  }
+
+  @Override
+  public DataType getValueType() {
+    return _valueType;
+  }
+
+  @Override
+  public void putStringMV(final String[] values) {
+    int totalBytes = 0;
+    for (int i = 0; i < values.length; i++) {
+      final String value = values[i];
+      int length = value.getBytes().length;
+      totalBytes += length;
+    }
+    byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+        + totalBytes]; //numValues, length array, concatenated bytes
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the length of each element
+    for (final String value : values) {
+      byteBuffer.putInt(value.getBytes().length);
+    }
+    //write the content of each element
+    //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+    // as needed
+    for (final String value : values) {
+      byteBuffer.put(value.getBytes());

Review comment:
       ```suggestion
         byteBuffer.put(StringUtils.encodeUtf8(value));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735155046



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];

Review comment:
       This is the convention for MV default value, where we put a single element array of default value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735157476



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the length in bytes of the largest row
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxRowLengthInBytes)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);

Review comment:
       This `totalMaxLength` is used as the longest entry size, and with the current code the lower bound of it is (1M + 4). Given this longest entry size, we will always get one doc per chunk.
   The proper way to decide the `numDocsPerChunk` should be calling `getNumDocsPerChunk(maxRowLength)` (`maxRowLength` should count the length metadata size). The problem here is that we didn't count the length metadata size in the `maxRowLength` during stats collection.
   Because we always over-allocate the row length to 1M, we hide the bug of not tracking the correct row length. If a very large value is encountered (>1M), the bug should be revealed, and the buffer allocated will be smaller than the actual decompressed data.
   
   Since we already come up with the new format in #7616 which does not require these stats, we can probably ignore this issue for now and directly use the new format to support the MV types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (95c9aa3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.91%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.56%   14.65%   -56.92%     
   + Complexity     3881       80     -3801     
   =============================================
     Files          1559     1517       -42     
     Lines         79053    77484     -1569     
     Branches      11706    11544      -162     
   =============================================
   - Hits          56575    11354    -45221     
   - Misses        18669    65319    +46650     
   + Partials       3809      811     -2998     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.65% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1256 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...95c9aa3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090310



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the length in bytes of the largest row
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxRowLengthInBytes)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);

Review comment:
       No, it will produce the _maximum_ of `maxRowLengthInBytes` and `MAX_TARGET_CHUNK_SIZE`.  This is a deficiency of the format which makes us choose between large chunks (in bytes) and small chunks (in numbers of documents) because the number of documents is fixed. This intentionally limits the size of the chunk, and is the motivation for the upcoming work to improve the chunk format.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089808



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -40,4 +40,9 @@ public int compress(ByteBuffer inUncompressed, ByteBuffer outCompressed)
     outCompressed.flip();
     return compressedSize;
   }
+
+  @Override
+  public int maxCompressedSize(int uncompressedSize) {
+    return 2 * uncompressedSize;

Review comment:
       You're commenting on outdated code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e9ccb61) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `56.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.59%   14.64%   -56.96%     
   + Complexity     3882       80     -3802     
   =============================================
     Files          1559     1516       -43     
     Lines         79025    77470     -1555     
     Branches      11702    11544      -158     
   =============================================
   - Hits          56579    11345    -45234     
   - Misses        18639    65312    +46673     
   + Partials       3807      813     -2994     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1253 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7210086) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `3.01%`.
   > The diff coverage is `77.47%`.
   
   > :exclamation: Current head 7210086 differs from pull request most recent head d8bd2ad. Consider uploading reports for the commit d8bd2ad to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.59%   68.58%   -3.02%     
   + Complexity     3882     3857      -25     
   ============================================
     Files          1559     1167     -392     
     Lines         79025    57068   -21957     
     Branches      11702     8752    -2950     
   ============================================
   - Hits          56579    39138   -17441     
   + Misses        18639    15152    -3487     
   + Partials       3807     2778    -1029     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.58% <77.47%> (-0.01%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...reaming/StreamingSelectionOnlyCombineOperator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9zdHJlYW1pbmcvU3RyZWFtaW5nU2VsZWN0aW9uT25seUNvbWJpbmVPcGVyYXRvci5qYXZh) | `0.00% <0.00%> (-70.46%)` | :arrow_down: |
   | [...ore/startree/executor/StarTreeGroupByExecutor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zdGFydHJlZS9leGVjdXRvci9TdGFyVHJlZUdyb3VwQnlFeGVjdXRvci5qYXZh) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
   | ... and [642 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...d8bd2ad](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (814c87a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.42%`.
   > The diff coverage is `73.39%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.56%   65.13%   -6.43%     
   - Complexity     3881     3934      +53     
   ============================================
     Files          1559     1516      -43     
     Lines         79053    77482    -1571     
     Branches      11706    11548     -158     
   ============================================
   - Hits          56575    50470    -6105     
   - Misses        18669    23426    +4757     
   + Partials       3809     3586     -223     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.60% <73.39%> (+0.02%)` | :arrow_up: |
   | unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.60% <62.63%> (-5.06%)` | :arrow_down: |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [366 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...814c87a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091419



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];

Review comment:
       I guess this is an optimisation. I can address this in a cleanup.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (349c152) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `40.68%`.
   > The diff coverage is `0.85%`.
   
   > :exclamation: Current head 349c152 differs from pull request most recent head e9ccb61. Consider uploading reports for the commit e9ccb61 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.59%   30.91%   -40.69%     
   =============================================
     Files          1559     1553        -6     
     Lines         79025    78976       -49     
     Branches      11702    11706        +4     
   =============================================
   - Hits          56579    24416    -32163     
   - Misses        18639    52487    +33848     
   + Partials       3807     2073     -1734     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.38% <0.85%> (-0.12%)` | :arrow_down: |
   | integration2 | `27.75% <0.57%> (-0.14%)` | :arrow_down: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [1068 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (814c87a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.91%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.56%   14.64%   -56.92%     
   + Complexity     3881       80     -3801     
   =============================================
     Files          1559     1516       -43     
     Lines         79053    77482     -1571     
     Branches      11706    11548      -158     
   =============================================
   - Hits          56575    11350    -45225     
   - Misses        18669    65318    +46649     
   + Partials       3809      814     -2995     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
   | [...impl/stats/BytesColumnPredIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CeXRlc0NvbHVtblByZWRJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | ... and [1244 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...814c87a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732255437



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,131 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxTotalContentLength max total content length
+   * @param maxElements max number of elements
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+        maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLength max length for each entry
+   * @param maxElements max number of elements
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+   *     chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType,
+      int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;

Review comment:
       This is currently a dangerous overestimate, we need a way to use sublinear memory for the body before we can merge this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7affdf5) into [master](https://codecov.io/gh/apache/pinot/commit/1bd899c9ba45676d1ac25979274391431bdf5ce9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1bd899c) will **decrease** coverage by `40.62%`.
   > The diff coverage is `0.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.56%   30.94%   -40.63%     
   =============================================
     Files          1560     1554        -6     
     Lines         79035    78986       -49     
     Branches      11702    11706        +4     
   =============================================
   - Hits          56565    24442    -32123     
   - Misses        18660    52460    +33800     
   + Partials       3810     2084     -1726     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.17% <0.85%> (-0.10%)` | :arrow_down: |
   | integration2 | `27.79% <0.56%> (+0.02%)` | :arrow_up: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [1063 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1bd899c...7affdf5](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8d02f6a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `2.99%`.
   > The diff coverage is `75.42%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.56%   68.57%   -3.00%     
   + Complexity     3881     3857      -24     
   ============================================
     Files          1559     1168     -391     
     Lines         79053    57038   -22015     
     Branches      11706     8748    -2958     
   ============================================
   - Hits          56575    39111   -17464     
   + Misses        18669    15155    -3514     
   + Partials       3809     2772    -1037     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.57% <75.42%> (-0.01%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [624 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...8d02f6a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734373632



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
     int numDocs = _originalSegmentMetadata.getTotalDocs();
     int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
     try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
-        .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+        .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,

Review comment:
       Can you clarify what your concern is?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947486189


   Can the other PR wait for this and then rebase? The work done here is intended to prevent OOM, and the common commits can’t be merged without the rest of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732254388



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -95,6 +96,62 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency

Review comment:
       @kishoreg I moved the construction of the MV `STRING`/`BYTES` into here to avoid excessive allocation and multiple passes. 
   
   This points to how to make this class a bit more memory efficient in general - we only need to guarantee fixed capacity for the offsets, which are fixed width, and we can write the variable length body page by page.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (95c9aa3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.40%`.
   > The diff coverage is `75.42%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.56%   65.15%   -6.41%     
   - Complexity     3881     3942      +61     
   ============================================
     Files          1559     1517      -42     
     Lines         79053    77484    -1569     
     Branches      11706    11544     -162     
   ============================================
   - Hits          56575    50488    -6087     
   - Misses        18669    23413    +4744     
   + Partials       3809     3583     -226     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.61% <75.42%> (+0.03%)` | :arrow_up: |
   | unittests2 | `14.65% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [387 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...95c9aa3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089763



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
  * The layout of the file is as follows:
  * <p> Header Section: </p>
  * <ul>
- *   <li> Integer: File format version. </li>
- *   <li> Integer: Total number of chunks. </li>
- *   <li> Integer: Number of docs per chunk. </li>
- *   <li> Integer: Length of longest entry (in bytes). </li>
- *   <li> Integer: Total number of docs (version 2 onwards). </li>
- *   <li> Integer: Compression type enum value (version 2 onwards). </li>
- *   <li> Integer: Start offset of data header (version 2 onwards). </li>
- *   <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- *   Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>

Review comment:
       It already has been reverted.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735435633



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732959430



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/VarByteChunkMVForwardIndexReader.java
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers.forward;
+
+import java.nio.ByteBuffer;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+/**
+ * Chunk-based single-value raw (non-dictionary-encoded) forward index reader for values of
+ * variable
+ * length data type
+ * (STRING, BYTES).
+ * <p>For data layout, please refer to the documentation for {@link VarByteChunkSVForwardIndexWriter}
+ */
+public final class VarByteChunkMVForwardIndexReader extends BaseChunkSVForwardIndexReader {
+
+  private static final int ROW_OFFSET_SIZE = VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+
+  private final int _maxChunkSize;
+
+  public VarByteChunkMVForwardIndexReader(PinotDataBuffer dataBuffer, DataType valueType) {
+    super(dataBuffer, valueType);
+    _maxChunkSize = _numDocsPerChunk * (ROW_OFFSET_SIZE + _lengthOfLongestEntry);
+  }
+
+  @Nullable
+  @Override
+  public ChunkReaderContext createContext() {
+    if (_isCompressed) {
+      return new ChunkReaderContext(_maxChunkSize);
+    } else {
+      return null;
+    }
+  }
+
+  @Override
+  public int getStringMV(final int docId, final String[] valueBuffer,
+      final ChunkReaderContext context) {
+    byte[] compressedBytes;
+    if (_isCompressed) {
+      compressedBytes = getBytesCompressed(docId, context);
+    } else {
+      compressedBytes = getBytesUncompressed(docId);
+    }
+    ByteBuffer byteBuffer = ByteBuffer.wrap(compressedBytes);
+    int numValues = byteBuffer.getInt();
+    int contentOffset = (numValues + 1) * Integer.BYTES;
+    for (int i = 0; i < numValues; i++) {
+      int length = byteBuffer.getInt((i + 1) * Integer.BYTES);
+      byte[] bytes = new byte[length];
+      byteBuffer.position(contentOffset);
+      byteBuffer.get(bytes, 0, length);
+      valueBuffer[i] = new String(bytes);

Review comment:
       TODO: This doesn't specify encoding




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090701



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BytesColumnPredIndexStatsCollector.java
##########
@@ -42,16 +43,32 @@ public BytesColumnPredIndexStatsCollector(String column, StatsCollectorConfig st
 
   @Override
   public void collect(Object entry) {
-    ByteArray value = new ByteArray((byte[]) entry);
-    addressSorted(value);
-    updatePartition(value);
-    _values.add(value);
-
-    int length = value.length();
-    _minLength = Math.min(_minLength, length);
-    _maxLength = Math.max(_maxLength, length);
-
-    _totalNumberOfEntries++;
+    if (entry instanceof Object[]) {
+      Object[] values = (Object[]) entry;
+      int rowLength = 0;
+      for (Object obj : values) {
+        ByteArray value = new ByteArray((byte[]) obj);
+        _values.add(value);
+        int length = value.length();
+        _minLength = Math.min(_minLength, length);
+        _maxLength = Math.max(_maxLength, length);
+        rowLength += length;

Review comment:
       That seems wrong to me, this is the length of the data and it's not known whether it would be length prefixed (+4) or null terminated (+1) here and adding either would prevent the other.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (377c7ae) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.46%`.
   > The diff coverage is `76.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.56%   65.10%   -6.47%     
   - Complexity     3881     3935      +54     
   ============================================
     Files          1559     1516      -43     
     Lines         79053    77499    -1554     
     Branches      11706    11549     -157     
   ============================================
   - Hits          56575    50455    -6120     
   - Misses        18669    23451    +4782     
   + Partials       3809     3593     -216     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.57% <76.00%> (-0.01%)` | :arrow_down: |
   | unittests2 | `14.63% <0.00%> (-0.08%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
   | ... and [378 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...377c7ae](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174f00b) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `55.61%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     70.32%   14.70%   -55.62%     
   + Complexity     3882       80     -3802     
   =============================================
     Files          1552     1509       -43     
     Lines         79012    77443     -1569     
     Branches      11705    11547      -158     
   =============================================
   - Hits          55562    11390    -44172     
   - Misses        19644    65228    +45584     
   + Partials       3806      825     -2981     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.70% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
   | [...impl/stats/BytesColumnPredIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CeXRlc0NvbHVtblByZWRJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | ... and [1216 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...174f00b](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947738475


   > This PR has a conflict with #7604 -- we need to figure out the sequencing of these two (duplicate commits for the FWD index).
   
   Hi @atris. Let's get this one in first and then rebase text index support on top of this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732869620



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
    *
    */
   protected void writeChunk() {
-    int sizeToWrite;
+    int sizeWritten;
     _chunkBuffer.flip();
 
-    try {
-      sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
-      _dataFile.write(_compressedBuffer, _dataOffset);
-      _compressedBuffer.clear();
+    int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+    // compress directly in to the mapped output rather keep a large buffer to compress into
+    try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,

Review comment:
       this was supposed to be BIG_ENDIAN




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734093576



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
     int numDocs = _originalSegmentMetadata.getTotalDocs();
     int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
     try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
-        .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+        .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,

Review comment:
       The converter should fail gracefully upfront for MV columns?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
  * The layout of the file is as follows:
  * <p> Header Section: </p>
  * <ul>
- *   <li> Integer: File format version. </li>
- *   <li> Integer: Total number of chunks. </li>
- *   <li> Integer: Number of docs per chunk. </li>
- *   <li> Integer: Length of longest entry (in bytes). </li>
- *   <li> Integer: Total number of docs (version 2 onwards). </li>
- *   <li> Integer: Compression type enum value (version 2 onwards). </li>
- *   <li> Integer: Start offset of data header (version 2 onwards). </li>
- *   <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- *   Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>

Review comment:
       Why lose the indentation? Is this yet another style check difference?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -151,8 +154,11 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
       int dataHeaderStart = offset + Integer.BYTES;
       _header.putInt(dataHeaderStart);
     }
+  }
 
-    return headerSize;
+  private static int headerSize(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {

Review comment:
       IIRC, the reason why I had put the headerSize calculation so that in future if someone changes the header they don't miss to update the headerSize (easier to miss if a separate method). Any reason you prefer the latter?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947438669


   This PR has a conflict with #7604 -- we need to figure out the sequencing of these two (duplicate commits for the FWD index).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732869947



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
    * @param chunkSize Size of chunk
    * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
    * @param version version of File
-   * @throws FileNotFoundException
+   * @throws IOException if the file isn't found or can't be mapped
    */
   protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
-      throws FileNotFoundException {
+      throws IOException {
     Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+    _file = file;
+    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+    _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
     _chunkSize = chunkSize;
     _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
-    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
-    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
-    _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
-    _dataFile = new RandomAccessFile(file, "rw").getChannel();
+    _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+    _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);

Review comment:
       we don't need it, we just compress directly into the output file




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732871439



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,119 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];
+              Arrays.fill((String[]) columnValueToIndex, value);
+            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+              int length = ((byte[][]) columnValueToIndex).length;
+              columnValueToIndex = new byte[length][];
+              Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+            } else {
+              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+            }
+          }
+          switch (forwardIndexCreator.getValueType()) {
+            case INT:
+              if (columnValueToIndex instanceof int[]) {
+                forwardIndexCreator.putIntMV((int[]) columnValueToIndex);
+              } else if (columnValueToIndex instanceof Object[]) {
+                int[] array = new int[((Object[]) columnValueToIndex).length];
+                for (int i = 0; i < array.length; i++) {
+                  array[i] = (Integer) ((Object[]) columnValueToIndex)[i];
+                }
+                forwardIndexCreator.putIntMV(array);
+              } else {
+                //TODO: is this possible?

Review comment:
       OK I will get rid of these, I hadn't addressed this yet




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732881547



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/ColumnStatistics.java
##########
@@ -82,6 +82,13 @@ default boolean isFixedLength() {
    */
   int getMaxNumberOfMultiValues();
 
+  /**
+   * @return the length of the largest row in bytes for variable length types
+   */
+  default int getMaxRowLengthInBytes() {
+    return -1;
+  }

Review comment:
       @kishoreg I've only implemented this for MV bytes and strings, I don't think it's necessary otherwise?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (76ca7e3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.38%`.
   > The diff coverage is `75.42%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     71.56%   65.17%   -6.39%     
   - Complexity     3881     3942      +61     
   ============================================
     Files          1559     1516      -43     
     Lines         79053    77499    -1554     
     Branches      11706    11547     -159     
   ============================================
   - Hits          56575    50513    -6062     
   - Misses        18669    23400    +4731     
   + Partials       3809     3586     -223     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.66% <75.42%> (+0.08%)` | :arrow_up: |
   | unittests2 | `14.64% <0.00%> (-0.07%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [377 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...76ca7e3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090738



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);

Review comment:
       @kishoreg can you explain why you included this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091164



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
   default int getStringMV(int docId, String[] valueBuffer, T context) {
     throw new UnsupportedOperationException();
   }
+
+  /**
+   * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+   * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+   * entry.
+   *
+   * @param docId Document id
+   * @param valueBuffer Value buffer
+   * @param context Reader context
+   * @return Number of values within the multi-value entry
+   */
+  default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+    throw new UnsupportedOperationException();
+  }
+
+  default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {

Review comment:
       I will remove it in a follow up




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d8bd2ad) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `40.58%`.
   > The diff coverage is `0.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7595       +/-   ##
   =============================================
   - Coverage     71.59%   31.01%   -40.59%     
   =============================================
     Files          1559     1553        -6     
     Lines         79025    79022        -3     
     Branches      11702    11710        +8     
   =============================================
   - Hits          56579    24508    -32071     
   - Misses        18639    52417    +33778     
   + Partials       3807     2097     -1710     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.44% <0.85%> (-0.06%)` | :arrow_down: |
   | integration2 | `27.80% <0.57%> (-0.09%)` | :arrow_down: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
   | [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | ... and [1070 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...d8bd2ad](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg merged pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
kishoreg merged pull request #7595:
URL: https://github.com/apache/pinot/pull/7595


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3883a9e) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `5.19%`.
   > The diff coverage is `68.31%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     70.32%   65.12%   -5.20%     
   - Complexity     3882     3943      +61     
   ============================================
     Files          1552     1509      -43     
     Lines         79012    77451    -1561     
     Branches      11705    11553     -152     
   ============================================
   - Hits          55562    50438    -5124     
   - Misses        19644    23424    +3780     
   + Partials       3806     3589     -217     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `?` | |
   | unittests1 | `68.62% <68.31%> (+0.05%)` | :arrow_up: |
   | unittests2 | `14.61% <0.00%> (-0.08%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `58.46% <58.46%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.79% <67.24%> (-4.87%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `81.06% <88.23%> (+0.69%)` | :arrow_up: |
   | ... and [333 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...3883a9e](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734372096



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
  * The layout of the file is as follows:
  * <p> Header Section: </p>
  * <ul>
- *   <li> Integer: File format version. </li>
- *   <li> Integer: Total number of chunks. </li>
- *   <li> Integer: Number of docs per chunk. </li>
- *   <li> Integer: Length of longest entry (in bytes). </li>
- *   <li> Integer: Total number of docs (version 2 onwards). </li>
- *   <li> Integer: Compression type enum value (version 2 onwards). </li>
- *   <li> Integer: Start offset of data header (version 2 onwards). </li>
- *   <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- *   Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>

Review comment:
       Blame @kishoreg's IDE settings :-) I fixed this in the third commit on the PR but it came back after rebasing, will fix again




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734373235



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -151,8 +154,11 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
       int dataHeaderStart = offset + Integer.BYTES;
       _header.putInt(dataHeaderStart);
     }
+  }
 
-    return headerSize;
+  private static int headerSize(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {

Review comment:
       I wanted the `_header` buffer to be final, and even if this was the original intention, `headerSize` did not depend on the what was put in to the buffer, so the intent must have been lost at some point. Ultimately, if this number and what's put in to the header get out of sync, tests will fail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174f00b) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `5.13%`.
   > The diff coverage is `73.39%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7595      +/-   ##
   ============================================
   - Coverage     70.32%   65.18%   -5.14%     
   - Complexity     3882     3933      +51     
   ============================================
     Files          1552     1509      -43     
     Lines         79012    77443    -1569     
     Branches      11705    11547     -158     
   ============================================
   - Hits          55562    50479    -5083     
   - Misses        19644    23365    +3721     
   + Partials       3806     3599     -207     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `?` | |
   | unittests1 | `68.58% <73.39%> (+0.01%)` | :arrow_up: |
   | unittests2 | `14.70% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
   | [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
   | [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
   | [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.60% <62.63%> (-5.06%)` | :arrow_down: |
   | [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
   | [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
   | ... and [332 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...174f00b](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735450047



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the length in bytes of the largest row
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxRowLengthInBytes)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);

Review comment:
       I've just reread this comment while addressing comments on the follow up branch and your maths is off. The lower bound is indeed 1M + 4, but it only leads to 1 doc per chunk when `maxRowLengthInBytes` > 512KB. That's the purpose of the expression; prevent a single large value leading to large buffers or large chunks on disk, and being able to make a better decision here is the motivation for the enhancement proposal.
   
   So there really are two issues here: one (which I've addressed) is not considering the size of length prefixes, and the other (which requires a format change) is how to do chunk sizing properly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091466



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];
+              Arrays.fill((String[]) columnValueToIndex, value);
+            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+              int length = ((byte[][]) columnValueToIndex).length;
+              columnValueToIndex = new byte[length][];
+              Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+            } else {
+              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+            }
+          }
+          switch (forwardIndexCreator.getValueType()) {
+            case INT:
+              if (columnValueToIndex instanceof int[]) {

Review comment:
       Can address this in a follow up




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732164505



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxTotalContentLength max total content length
+   * @param maxElements max number of elements
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+        maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLength max length for each entry
+   * @param maxElements max number of elements
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+   *     chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType,
+      int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+      throws IOException {
+    //we will prepend the actual content with numElements and length array containing length of each element
+    int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+        numDocsPerChunk, totalMaxLength,
+        writerVersion);
+    _valueType = valueType;
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+    int overheadPerEntry =
+        lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+  }
+
+  @Override
+  public boolean isDictionaryEncoded() {
+    return false;
+  }
+
+  @Override
+  public boolean isSingleValue() {
+    return false;
+  }
+
+  @Override
+  public DataType getValueType() {
+    return _valueType;
+  }
+
+  @Override
+  public void putStringMV(final String[] values) {
+    int totalBytes = 0;
+    for (int i = 0; i < values.length; i++) {
+      final String value = values[i];
+      int length = value.getBytes().length;
+      totalBytes += length;
+    }
+    byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+        + totalBytes]; //numValues, length array, concatenated bytes
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the length of each element
+    for (final String value : values) {
+      byteBuffer.putInt(value.getBytes().length);
+    }
+    //write the content of each element
+    //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+    // as needed
+    for (final String value : values) {
+      byteBuffer.put(value.getBytes());

Review comment:
       I didn't write this code but thanks for pointing me at that class, which looks like it can be modernised a little bit 👍🏻 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732991368



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);
+    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+        numDocsPerChunk, totalMaxLength, writerVersion);
+    _valueType = valueType;
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+    int overheadPerEntry =
+        lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+  }
+
+  @Override
+  public boolean isDictionaryEncoded() {
+    return false;
+  }
+
+  @Override
+  public boolean isSingleValue() {
+    return false;
+  }
+
+  @Override
+  public DataType getValueType() {
+    return _valueType;
+  }
+
+  @Override
+  public void putIntMV(final int[] values) {
+
+    byte[] bytes = new byte[Integer.BYTES
+        + values.length * Integer.BYTES]; //numValues, bytes required to store the content
+    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+    //write the length
+    byteBuffer.putInt(values.length);
+    //write the content of each element
+    for (final int value : values) {
+      byteBuffer.putInt(value);
+    }
+    _indexWriter.putBytes(bytes);

Review comment:
       This should not require allocation of a temporary buffer, this could just be implemented as an MV pattern on `_indexWriter`, just as was done to eliminate the much larger buffers for `byte[][]` and `String[]`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on a change in pull request #7595: MV fwd index + MV `BYTES`

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732861201



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
    * @param chunkSize Size of chunk
    * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
    * @param version version of File
-   * @throws FileNotFoundException
+   * @throws IOException if the file isn't found or can't be mapped
    */
   protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
       int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
-      throws FileNotFoundException {
+      throws IOException {
     Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+    _file = file;
+    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+    _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
     _chunkSize = chunkSize;
     _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
-    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
-    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
-    _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
-    _dataFile = new RandomAccessFile(file, "rw").getChannel();
+    _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+    _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);

Review comment:
       where is the compressedBuffer getting initialized or we dont need that anymore?

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,119 @@ public void indexRow(GenericRow row)
           }
         }
       } else {
-        // MV column (always dictionary encoded)
-        int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-        forwardIndexCreator.putDictIdMV(dictIds);
-        DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
-        if (invertedIndexCreator != null) {
-          invertedIndexCreator.add(dictIds, dictIds.length);
+        if (dictionaryCreator != null) {
+          //dictionary encoded
+          int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+          forwardIndexCreator.putDictIdMV(dictIds);
+          DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+              .get(columnName);
+          if (invertedIndexCreator != null) {
+            invertedIndexCreator.add(dictIds, dictIds.length);
+          }
+        } else {
+          // for text index on raw columns, check the config to determine if actual raw value should
+          // be stored or not
+          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+            Object value = _columnProperties.get(columnName)
+                .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+            if (value == null) {
+              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+            }
+            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+              value = String.valueOf(value);
+              int length = ((String[]) columnValueToIndex).length;
+              columnValueToIndex = new String[length];
+              Arrays.fill((String[]) columnValueToIndex, value);
+            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+              int length = ((byte[][]) columnValueToIndex).length;
+              columnValueToIndex = new byte[length][];
+              Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+            } else {
+              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+            }
+          }
+          switch (forwardIndexCreator.getValueType()) {
+            case INT:
+              if (columnValueToIndex instanceof int[]) {
+                forwardIndexCreator.putIntMV((int[]) columnValueToIndex);
+              } else if (columnValueToIndex instanceof Object[]) {
+                int[] array = new int[((Object[]) columnValueToIndex).length];
+                for (int i = 0; i < array.length; i++) {
+                  array[i] = (Integer) ((Object[]) columnValueToIndex)[i];
+                }
+                forwardIndexCreator.putIntMV(array);
+              } else {
+                //TODO: is this possible?

Review comment:
       I looked at the code and it should not enter this path

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
    *
    */
   protected void writeChunk() {
-    int sizeToWrite;
+    int sizeWritten;
     _chunkBuffer.flip();
 
-    try {
-      sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
-      _dataFile.write(_compressedBuffer, _dataOffset);
-      _compressedBuffer.clear();
+    int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+    // compress directly in to the mapped output rather keep a large buffer to compress into
+    try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,

Review comment:
       I think the existing ones were written using BIG_ENDIAN, we might need to up the version if we are changing the byteorder and handle both cases for backwards compatibility.
   
   Might be better to keep the same byteorder as the existing one.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090988



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
   default int getStringMV(int docId, String[] valueBuffer, T context) {
     throw new UnsupportedOperationException();
   }
+
+  /**
+   * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+   * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+   * entry.
+   *
+   * @param docId Document id
+   * @param valueBuffer Value buffer
+   * @param context Reader context
+   * @return Number of values within the multi-value entry
+   */
+  default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+    throw new UnsupportedOperationException();
+  }
+
+  default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {

Review comment:
       Yes I hadn't noticed this from the initial commits @kishoreg made.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091052



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+  private final VarByteChunkSVForwardIndexWriter _indexWriter;
+  private final DataType _valueType;
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column,
+      int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements)
+      throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+        maxNumberOfMultiValueElements, false,
+        BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+  }
+
+  /**
+   * Create a var-byte raw index creator for the given column
+   *
+   * @param baseIndexDir Index directory
+   * @param compressionType Type of compression to use
+   * @param column Name of column to index
+   * @param totalDocs Total number of documents to index
+   * @param valueType Type of the values
+   * @param maxLengthOfEachEntry length of longest entry (in bytes)
+   * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+   * @param writerVersion writer format version
+   */
+  public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+      String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+      int writerVersion)
+      throws IOException {
+    File file = new File(baseIndexDir,
+        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    FileUtils.deleteQuietly(file);
+    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;

Review comment:
       Good catch. I will add this in a follow up.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089918



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
    *
    */
   protected void writeChunk() {
-    int sizeToWrite;
+    int sizeWritten;
     _chunkBuffer.flip();
 
-    try {
-      sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
-      _dataFile.write(_compressedBuffer, _dataOffset);
-      _compressedBuffer.clear();
+    int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+    // compress directly in to the mapped output rather keep a large buffer to compress into
+    try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
+        maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {

Review comment:
       Could do, and did at some point, but I don't think this is important. This way it's more succinct to use `AutoCloseable`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089959



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
-    // If buffer filled, then compress and write to file.
-    if (_chunkHeaderOffset == _chunkHeaderSize) {
-      writeChunk();
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+  public void putStrings(String[] values) {
+    // the entire String[] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the strings into the data buffer as if it's a single string,
+    // but with its own embedded header so offsets to strings within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i].getBytes(UTF_8);
+      _chunkBuffer.putInt(h, utf8.length);
+      _chunkBuffer.put(utf8);
+      bodySize += utf8.length;
     }
+    _chunkDataOffSet += headerSize + bodySize;
+    // go back to write the number of strings embedded in the big string
+    _chunkBuffer.putInt(headerPosition, values.length);
+
+    writeChunkIfNecessary();
   }
 
-  @Override
-  public void close()
-      throws IOException {
+  public void putByteArrays(byte[][] values) {
+    // the entire byte[][] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the byte[]s into the data buffer as if it's a single byte[],
+    // but with its own embedded header so offsets to byte[]s within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i];

Review comment:
       Why?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org