Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/10/19 10:57:15 UTC
[GitHub] [pinot] richardstartin opened a new pull request #7595: MV fwd index + MV `BYTES`
richardstartin opened a new pull request #7595:
URL: https://github.com/apache/pinot/pull/7595
## Description
## Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
* [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
* [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes? Things to consider:
- New configuration options
- Deprecation of configurations
- Signature changes to public methods/interfaces
- New plugins added or old plugins removed
* [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
## Release Notes
## Documentation
[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735047516
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/AbstractColumnStatisticsCollector.java
##########
@@ -72,6 +73,10 @@ public int getMaxNumberOfMultiValues() {
return _maxNumberOfMultiValues;
}
+ public int getMaxLengthOfMultiValues() {
Review comment:
This method does not appear to be used.
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
* @param chunkSize Size of chunk
* @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
* @param version version of File
- * @throws FileNotFoundException
+ * @throws IOException if the file isn't found or can't be mapped
*/
protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
- throws FileNotFoundException {
+ throws IOException {
Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+ _file = file;
+ _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+ _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
_chunkSize = chunkSize;
_chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
- _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
- _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
_chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
- _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
- _dataFile = new RandomAccessFile(file, "rw").getChannel();
+ _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+ _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
+ writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
Review comment:
Since we write the header directly into the file, it would probably be more readable to pass in the `_dataChannel` and write into it. There is no need to keep the member variable `_header`.
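One possible reading of this suggestion, as a minimal sketch rather than the actual Pinot code: the header is assembled in a small heap buffer and written through the channel with positional writes, so no mapped `_header` field has to be kept. The class name `HeaderWriteSketch`, the method signature, and the three-field header are hypothetical and only illustrate the idea.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class HeaderWriteSketch {
  /**
   * Writes an illustrative three-int header at the start of the channel and
   * returns the offset at which chunk data would begin.
   */
  static int writeHeader(FileChannel channel, int version, int numChunks, int numDocsPerChunk)
      throws IOException {
    ByteBuffer header = ByteBuffer.allocate(3 * Integer.BYTES);
    header.putInt(version).putInt(numChunks).putInt(numDocsPerChunk);
    header.flip();
    long position = 0;
    // Positional writes may be partial, so loop until the buffer is drained.
    while (header.hasRemaining()) {
      position += channel.write(header, position);
    }
    return (int) position;
  }
}
```

The real header also contains the chunk offsets, which are only known as chunks are written, so the actual writer would still need a way to go back and fill them in.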
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
*
*/
protected void writeChunk() {
- int sizeToWrite;
+ int sizeWritten;
_chunkBuffer.flip();
- try {
- sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
- _dataFile.write(_compressedBuffer, _dataOffset);
- _compressedBuffer.clear();
+ int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+ // compress directly in to the mapped output rather keep a large buffer to compress into
+ try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
+ maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
Review comment:
I'd suggest mapping the file channel directly to get the `ByteBuffer`, instead of getting a `PinotDataBuffer` and then creating a view out of it.
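A minimal sketch of what mapping the channel directly could look like, using only `java.nio`. The class `DirectMapSketch` and its method are hypothetical, and a plain copy stands in for the compressor writing into the mapped region.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class DirectMapSketch {
  /**
   * Maps chunk.length bytes of the file starting at dataOffset and copies the
   * chunk into the mapping. Mapping READ_WRITE past the current end of the
   * file grows the file to cover the mapped region.
   */
  static int writeChunk(FileChannel channel, long dataOffset, byte[] chunk) throws IOException {
    MappedByteBuffer out = channel.map(FileChannel.MapMode.READ_WRITE, dataOffset, chunk.length);
    out.put(chunk);
    return chunk.length;
  }
}
```

The channel has to be open for both reading and writing for a `READ_WRITE` mapping, which matches the `"rw"` mode used for the `RandomAccessFile` in the diff.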
##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
default int getStringMV(int docId, String[] valueBuffer, T context) {
throw new UnsupportedOperationException();
}
+
+ /**
+ * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+ * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+ * entry.
+ *
+ * @param docId Document id
+ * @param valueBuffer Value buffer
+ * @param context Reader context
+ * @return Number of values within the multi-value entry
+ */
+ default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+ throw new UnsupportedOperationException();
+ }
+
+ default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {
Review comment:
Remove this?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
_chunkBuffer.put(value);
_chunkDataOffSet += value.length;
- // If buffer filled, then compress and write to file.
- if (_chunkHeaderOffset == _chunkHeaderSize) {
- writeChunk();
+ writeChunkIfNecessary();
+ }
+
+ // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+ public void putStrings(String[] values) {
+ // the entire String[] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the strings into the data buffer as if it's a single string,
+ // but with its own embedded header so offsets to strings within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i].getBytes(UTF_8);
+ _chunkBuffer.putInt(h, utf8.length);
+ _chunkBuffer.put(utf8);
+ bodySize += utf8.length;
}
+ _chunkDataOffSet += headerSize + bodySize;
+ // go back to write the number of strings embedded in the big string
+ _chunkBuffer.putInt(headerPosition, values.length);
+
+ writeChunkIfNecessary();
}
- @Override
- public void close()
- throws IOException {
+ public void putByteArrays(byte[][] values) {
+ // the entire byte[][] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the byte[]s into the data buffer as if it's a single byte[],
+ // but with its own embedded header so offsets to byte[]s within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i];
Review comment:
(nit) rename the variable
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
Review comment:
(nit) the value type should already be the stored type
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
+ Arrays.fill((String[]) columnValueToIndex, value);
+ } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+ int length = ((byte[][]) columnValueToIndex).length;
+ columnValueToIndex = new byte[length][];
+ Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+ } else {
+ throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+ }
+ }
+ switch (forwardIndexCreator.getValueType()) {
+ case INT:
+ if (columnValueToIndex instanceof int[]) {
Review comment:
This will always be `Object[]`. We never pass a primitive array to this class.
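If the value always arrives as `Object[]`, the `instanceof int[]` branch can go and the boxed elements can be unboxed in one place. A rough sketch, assuming the elements of an INT column are boxed `Integer` values; the helper `BoxedArraySketch.toIntArray` is hypothetical and not part of the PR.

```java
class BoxedArraySketch {
  /** Unboxes an Object[] of Integer values into an int[] of the same length. */
  static int[] toIntArray(Object[] values) {
    int[] result = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      // Throws ClassCastException if an element is not an Integer.
      result[i] = (Integer) values[i];
    }
    return result;
  }
}
```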
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
+ int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+ numDocsPerChunk, totalMaxLength, writerVersion);
+ _valueType = valueType;
+ }
+
+ @VisibleForTesting
+ public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+ int overheadPerEntry =
+ lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+ }
+
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
+ @Override
+ public boolean isSingleValue() {
+ return false;
+ }
+
+ @Override
+ public DataType getValueType() {
+ return _valueType;
+ }
+
+ @Override
+ public void putIntMV(final int[] values) {
+
+ byte[] bytes = new byte[Integer.BYTES
+ + values.length * Integer.BYTES]; //numValues, bytes required to store the content
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the content of each element
+ for (final int value : values) {
+ byteBuffer.putInt(value);
+ }
+ _indexWriter.putBytes(bytes);
+ }
+
+ @Override
+ public void putLongMV(final long[] values) {
+
+ byte[] bytes = new byte[Integer.BYTES
+ + values.length * Long.BYTES]; //numValues, bytes required to store the content
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the content of each element
+ for (final long value : values) {
+ byteBuffer.putLong(value);
+ }
+ _indexWriter.putBytes(bytes);
+ }
+
+ @Override
+ public void putFloatMV(final float[] values) {
+
+ byte[] bytes = new byte[Integer.BYTES
+ + values.length * Float.BYTES]; //numValues, bytes required to store the content
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the content of each element
+ for (final float value : values) {
+ byteBuffer.putFloat(value);
+ }
+ _indexWriter.putBytes(bytes);
+ }
+
+ @Override
+ public void putDoubleMV(final double[] values) {
+
+ byte[] bytes = new byte[Integer.BYTES
+ + values.length * Long.BYTES]; //numValues, bytes required to store the content
Review comment:
(nit) `Double.BYTES`
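`Long.BYTES` and `Double.BYTES` are both 8, so the buffer size is the same either way; the nit is about readability. A minimal sketch of the layout with `Double.BYTES`, following the shape of `putDoubleMV` in the diff (length prefix, then the values). The class `DoubleMvEncodingSketch` and its `decode` helper are hypothetical.

```java
import java.nio.ByteBuffer;

class DoubleMvEncodingSketch {
  /** Encodes the values as [numValues:int][value:double]*. */
  static byte[] encode(double[] values) {
    byte[] bytes = new byte[Integer.BYTES + values.length * Double.BYTES];
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    buffer.putInt(values.length);
    for (double value : values) {
      buffer.putDouble(value);
    }
    return bytes;
  }

  /** Reads back what encode produced. */
  static double[] decode(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    double[] values = new double[buffer.getInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = buffer.getDouble();
    }
    return values;
  }
}
```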
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
+ int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
Review comment:
Include the length integer in the total length?
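The `put*MV` methods in this class prefix each row with an `int` holding the number of values, so the longest encoded row is four bytes larger than `maxNumberOfMultiValueElements * maxLengthOfEachEntry`. A small sketch of the adjusted computation; the class and method names are hypothetical, and the overflow checks are an addition that the diff does not have.

```java
class MaxRowLengthSketch {
  /**
   * Upper bound on the encoded size of one multi-value row:
   * one int for the value count plus the values themselves.
   */
  static int maxEncodedRowLength(int maxNumberOfMultiValueElements, int maxLengthOfEachEntry) {
    int body = Math.multiplyExact(maxNumberOfMultiValueElements, maxLengthOfEachEntry);
    return Math.addExact(Integer.BYTES, body);
  }
}
```

For example, three `long` values of eight bytes each give 4 + 3 * 8 = 28 bytes rather than 24.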
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
Review comment:
We want to put a single-element MV here. Same for BYTES
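A minimal sketch of the single-element variant, assuming the placeholder is stored once per row rather than once per original value. The class `SingleElementMvSketch` and its methods are hypothetical, and the explicit UTF-8 charset is an assumption; the diff calls `getBytes()` without a charset.

```java
import java.nio.charset.StandardCharsets;

class SingleElementMvSketch {
  /** Placeholder row for a STRING multi-value column: exactly one element. */
  static String[] placeholderForString(Object rawValue) {
    return new String[] {String.valueOf(rawValue)};
  }

  /** Placeholder row for a BYTES multi-value column: exactly one element. */
  static byte[][] placeholderForBytes(Object rawValue) {
    return new byte[][] {String.valueOf(rawValue).getBytes(StandardCharsets.UTF_8)};
  }
}
```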
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -40,4 +40,9 @@ public int compress(ByteBuffer inUncompressed, ByteBuffer outCompressed)
outCompressed.flip();
return compressedSize;
}
+
+ @Override
+ public int maxCompressedSize(int uncompressedSize) {
+ return 2 * uncompressedSize;
Review comment:
Can you add a comment explaining how this value is calculated?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
Review comment:
(nit) unnecessary?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -734,10 +834,11 @@ public static void removeColumnMetadataInfo(PropertiesConfiguration properties,
* @param deriveNumDocsPerChunk true if varbyte writer should auto-derive the number of rows per chunk
* @param writerVersion version to use for the raw index writer
* @return raw index creator
- * @throws IOException
*/
- public static ForwardIndexCreator getRawIndexCreatorForColumn(File file, ChunkCompressionType compressionType,
- String column, DataType dataType, int totalDocs, int lengthOfLongestEntry, boolean deriveNumDocsPerChunk,
+ public static ForwardIndexCreator getRawIndexCreatorForSVColumn(File file,
Review comment:
(nit) Wrong format
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -756,6 +857,41 @@ public static ForwardIndexCreator getRawIndexCreatorForColumn(File file, ChunkCo
}
}
+ /**
+ * Helper method to build the raw index creator for the column.
+ * Assumes that column to be indexed is single valued.
+ *
+ * @param file Output index file
+ * @param column Column name
+ * @param totalDocs Total number of documents to index
+ * @param deriveNumDocsPerChunk true if varbyte writer should auto-derive the number of rows
+ * per chunk
+ * @param writerVersion version to use for the raw index writer
+ * @param maxRowLengthInBytes the length of the longest row in bytes
+ * @return raw index creator
+ */
+ public static ForwardIndexCreator getRawIndexCreatorForMVColumn(File file, ChunkCompressionType compressionType,
+ String column, DataType dataType, final int totalDocs,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, int writerVersion,
+ int maxRowLengthInBytes)
+ throws IOException {
+ switch (dataType.getStoredType()) {
+ case INT:
+ case LONG:
+ case FLOAT:
+ case DOUBLE:
+ return new MultiValueFixedByteRawIndexCreator(file, compressionType, column, totalDocs, dataType,
Review comment:
(nit) The `dataType` here is already the stored type; we don't need to pass the size info for fixed length type because it can be inferred from the type
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BytesColumnPredIndexStatsCollector.java
##########
@@ -42,16 +43,32 @@ public BytesColumnPredIndexStatsCollector(String column, StatsCollectorConfig st
@Override
public void collect(Object entry) {
- ByteArray value = new ByteArray((byte[]) entry);
- addressSorted(value);
- updatePartition(value);
- _values.add(value);
-
- int length = value.length();
- _minLength = Math.min(_minLength, length);
- _maxLength = Math.max(_maxLength, length);
-
- _totalNumberOfEntries++;
+ if (entry instanceof Object[]) {
+ Object[] values = (Object[]) entry;
+ int rowLength = 0;
+ for (Object obj : values) {
+ ByteArray value = new ByteArray((byte[]) obj);
+ _values.add(value);
+ int length = value.length();
+ _minLength = Math.min(_minLength, length);
+ _maxLength = Math.max(_maxLength, length);
+ rowLength += length;
Review comment:
Should we count the actual encoded length in bytes? We would need to add (1 + number of values) integers to this: one for the element count and one length per element. Same for the `STRING` type.
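A minimal sketch of the accounting this comment asks for, assuming the row layout used by `putStrings`/`putByteArrays` in this PR (an int element count, one int length per element, then the concatenated values). The class and method names are hypothetical and are not part of the PR.

```java
import java.nio.charset.StandardCharsets;

final class EncodedRowLength {
  private EncodedRowLength() {
  }

  /**
   * Encoded size of one multi-value row, assuming the layout
   * [int numValues][int length] x numValues [value bytes...].
   */
  static int encodedLength(byte[][] values) {
    // (1 + numValues) integers of header
    int length = Integer.BYTES * (1 + values.length);
    for (byte[] value : values) {
      length += value.length;
    }
    return length;
  }

  public static void main(String[] args) {
    byte[][] row = {
        "ab".getBytes(StandardCharsets.UTF_8),
        "cde".getBytes(StandardCharsets.UTF_8)
    };
    // 3 header ints (12 bytes) + 5 value bytes
    System.out.println(encodedLength(row)); // prints 17
  }
}
```

A stats collector that tracks only the sum of value lengths would report 5 for this row, which is 12 bytes short of what the writer has to fit in the chunk buffer.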
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
int numDocs = _originalSegmentMetadata.getTotalDocs();
int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
- .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+ .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
Review comment:
It already checks for SV on line 129. Maybe add a TODO to support MV?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
* The layout of the file is as follows:
* <p> Header Section: </p>
* <ul>
- * <li> Integer: File format version. </li>
- * <li> Integer: Total number of chunks. </li>
- * <li> Integer: Number of docs per chunk. </li>
- * <li> Integer: Length of longest entry (in bytes). </li>
- * <li> Integer: Total number of docs (version 2 onwards). </li>
- * <li> Integer: Compression type enum value (version 2 onwards). </li>
- * <li> Integer: Start offset of data header (version 2 onwards). </li>
- * <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- * Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>
Review comment:
This should be reverted
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the length in bytes of the largest row
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxRowLengthInBytes)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
Review comment:
(Major) I don't think this is correct. This will always have one doc per chunk, which can cause very small chunks
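To illustrate the concern, here is a sketch of the arithmetic. It assumes the number of docs per chunk is derived by dividing the target chunk size by the maximum entry length, with a floor of one; that derivation is not visible in the quoted hunk, so `docsPerChunk` below is a hypothetical helper.

```java
final class DocsPerChunkSketch {
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

  private DocsPerChunkSketch() {
  }

  /** The expression under review: the per-row maximum is padded up to the chunk target. */
  static int totalMaxLengthAsWritten(int maxRowLengthInBytes) {
    return Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
  }

  /** Assumed derivation: how many maximum-sized entries fit in the target, at least one. */
  static int docsPerChunk(int maxEntryLength) {
    return Math.max(TARGET_MAX_CHUNK_SIZE / maxEntryLength, 1);
  }

  public static void main(String[] args) {
    int maxRowLengthInBytes = 100;
    // Padded maximum is always larger than the target, so only one doc fits.
    System.out.println(docsPerChunk(totalMaxLengthAsWritten(maxRowLengthInBytes))); // prints 1
    // Using the real row maximum lets many rows share a chunk.
    System.out.println(docsPerChunk(Integer.BYTES + maxRowLengthInBytes)); // prints 10082
  }
}
```

Under that assumption, the `Math.max` makes every entry look at least 1 MB long, so the quotient is zero and the floor of one doc per chunk always applies.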
[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735154847
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
_chunkBuffer.put(value);
_chunkDataOffSet += value.length;
- // If buffer filled, then compress and write to file.
- if (_chunkHeaderOffset == _chunkHeaderSize) {
- writeChunk();
+ writeChunkIfNecessary();
+ }
+
+ // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+ public void putStrings(String[] values) {
+ // the entire String[] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the strings into the data buffer as if it's a single string,
+ // but with its own embedded header so offsets to strings within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i].getBytes(UTF_8);
+ _chunkBuffer.putInt(h, utf8.length);
+ _chunkBuffer.put(utf8);
+ bodySize += utf8.length;
}
+ _chunkDataOffSet += headerSize + bodySize;
+ // go back to write the number of strings embedded in the big string
+ _chunkBuffer.putInt(headerPosition, values.length);
+
+ writeChunkIfNecessary();
}
- @Override
- public void close()
- throws IOException {
+ public void putByteArrays(byte[][] values) {
+ // the entire byte[][] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the byte[]s into the data buffer as if it's a single byte[],
+ // but with its own embedded header so offsets to byte[]s within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i];
Review comment:
This is for the BYTES MV type, and I believe this piece of code is copy-pasted from the STRING MV code. For BYTES MV the value is not UTF-8, so the variable name `utf8` can cause confusion.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735165884
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the length in bytes of the largest row
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxRowLengthInBytes)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
Review comment:
These are two separate issues. The maximum chunk length estimate needs to include the length terminators, but this calculation does not mask that bug.
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (76ca7e3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.92%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.56% 14.64% -56.93%
+ Complexity 3881 80 -3801
=============================================
Files 1559 1516 -43
Lines 79053 77499 -1554
Branches 11706 11547 -159
=============================================
- Hits 56575 11346 -45229
- Misses 18669 65337 +46668
+ Partials 3809 816 -2993
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `?` | |
| unittests2 | `14.64% <0.00%> (-0.07%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ... and [1254 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...76ca7e3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732254388
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -95,6 +96,62 @@ public void putBytes(byte[] value) {
_chunkBuffer.put(value);
_chunkDataOffSet += value.length;
+ writeChunkIfNecessary();
+ }
+
+ // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
Review comment:
@kishoreg I moved the construction of the MV `STRING`/`BYTES` into here to avoid excessive allocation and multiple passes.
This points to how to make this class a bit more memory efficient in general - we only need to guarantee fixed capacity for the offsets, which are fixed width, and we can write the variable length body page by page.
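For readers following along, a self-contained sketch of the row layout described here: the count and per-element lengths go into a fixed-width header, and the variable-length bodies are written directly into the same buffer without building an intermediate array. It mirrors the shape of `putStrings` in the diff but is not the PR's code; `MultiValueRowCodec`, `encode`, and `decode` are hypothetical names, and a plain heap `ByteBuffer` stands in for the chunk buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MultiValueRowCodec {
  private MultiValueRowCodec() {
  }

  /** Writes [count][len_0..len_n-1][bytes_0..bytes_n-1] at rowStart; returns bytes written. */
  static int encode(ByteBuffer buffer, int rowStart, String[] values) {
    buffer.putInt(rowStart, values.length);
    int lengthSlot = rowStart + Integer.BYTES;
    int bodyPosition = rowStart + Integer.BYTES * (1 + values.length);
    for (String value : values) {
      byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
      // fixed-width length goes into the header, body is appended after it
      buffer.putInt(lengthSlot, utf8.length);
      lengthSlot += Integer.BYTES;
      buffer.position(bodyPosition);
      buffer.put(utf8);
      bodyPosition += utf8.length;
    }
    return bodyPosition - rowStart;
  }

  /** Reads back a row written by encode. */
  static String[] decode(ByteBuffer buffer, int rowStart) {
    int count = buffer.getInt(rowStart);
    String[] values = new String[count];
    int lengthSlot = rowStart + Integer.BYTES;
    int bodyPosition = rowStart + Integer.BYTES * (1 + count);
    for (int i = 0; i < count; i++) {
      int length = buffer.getInt(lengthSlot);
      lengthSlot += Integer.BYTES;
      byte[] bytes = new byte[length];
      buffer.position(bodyPosition);
      buffer.get(bytes);
      bodyPosition += length;
      values[i] = new String(bytes, StandardCharsets.UTF_8);
    }
    return values;
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(64);
    int written = encode(buffer, 0, new String[] {"ab", "cde"});
    System.out.println(written); // prints 17
    System.out.println(String.join(",", decode(buffer, 0))); // prints ab,cde
  }
}
```

Only the header needs a known size up front (it depends on the element count alone), which is the property the comment relies on for writing the body incrementally.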
[GitHub] [pinot] richardstartin commented on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947836855
I had to force derivation of `numDocs` for variable length data because there's no good solution to the buffer size problem given the following constraints:
* There is a fixed number of documents per chunk
* We don't want to OOM if there is a very large row in a segment, and applying an arbitrary multiplier amplifies this risk
* Compression is applied at a chunk level, not intrachunk
* The compression libraries all require a single buffer
When there is a very large row (> 1MB) we end up with 1 doc per chunk in the segment. The only good solution is to evolve the forward index format to allow variable numbers of docs per chunk for variable length data, but we can do that later if this becomes a problem.
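A rough sketch of the trade-off behind these constraints, comparing a fixed docs-per-chunk multiplier with a derived one. The constants and method names are illustrative assumptions, not the PR's code: a 1 MB target chunk size, a 4-byte row offset per document, and a chunk buffer sized as docs per chunk times the largest encoded row.

```java
final class ChunkBufferBound {
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
  static final int ROW_OFFSET_SIZE = Integer.BYTES;

  private ChunkBufferBound() {
  }

  /** Derived docs per chunk: as many maximum-sized rows as fit the target, at least one. */
  static int derivedDocsPerChunk(int maxRowLengthInBytes) {
    return Math.max(TARGET_MAX_CHUNK_SIZE / (maxRowLengthInBytes + ROW_OFFSET_SIZE), 1);
  }

  /** Buffer needed to hold docsPerChunk rows of the maximum size, plus their offsets. */
  static long chunkBufferSize(int docsPerChunk, int maxRowLengthInBytes) {
    return (long) docsPerChunk * (maxRowLengthInBytes + ROW_OFFSET_SIZE);
  }

  public static void main(String[] args) {
    int largeRow = 2 * 1024 * 1024; // a single 2 MB row somewhere in the segment

    // Fixed multiplier (1000 docs per chunk is an arbitrary example): about 2 GB of buffer.
    System.out.println(chunkBufferSize(1000, largeRow)); // prints 2097156000

    // Derived: one doc per chunk, buffer stays close to the size of the largest row.
    int docs = derivedDocsPerChunk(largeRow);
    System.out.println(docs); // prints 1
    System.out.println(chunkBufferSize(docs, largeRow)); // prints 2097156
  }
}
```

With the derived count the buffer never exceeds roughly the larger of the target size and the largest row, at the cost of degenerating to one doc per chunk whenever a row is larger than the target.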
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e9ccb61) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `6.46%`.
> The diff coverage is `75.42%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.59% 65.13% -6.47%
- Complexity 3882 3940 +58
============================================
Files 1559 1516 -43
Lines 79025 77470 -1555
Branches 11702 11544 -158
============================================
- Hits 56579 50460 -6119
- Misses 18639 23428 +4789
+ Partials 3807 3582 -225
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.59% <75.42%> (+<0.01%)` | :arrow_up: |
| unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [378 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732255082
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
+ int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
Review comment:
This is currently a dangerous overestimate; we need a way to use sublinear memory for the body before we can merge this.
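To make the size of the overestimate concrete, here is a minimal sketch that compares the worst-case bound with the bytes a chunk of rows actually needs. The class and method names are hypothetical and not part of Pinot; the element width of 4 bytes assumes an `INT` column. It also uses `long` arithmetic, since an `int` product like the one in the reviewed line can overflow for large inputs.

```java
// Hypothetical illustration only; none of these names exist in Pinot.
final class RowSizeEstimate {

  // Worst-case bound in the style of the reviewed line: every row is assumed
  // to hold the maximum number of elements. Computed in long to avoid int overflow.
  static long worstCaseBytesPerRow(int maxElements, int bytesPerElement) {
    return (long) maxElements * bytesPerElement;
  }

  // Bytes reserved for a chunk if each row is sized at the worst case.
  static long reservedBytes(int numRows, int maxElements, int bytesPerElement) {
    return numRows * worstCaseBytesPerRow(maxElements, bytesPerElement);
  }

  // Bytes the rows actually need, given the element count of each row.
  static long actualBytes(int[] elementCounts, int bytesPerElement) {
    long total = 0;
    for (int count : elementCounts) {
      total += (long) count * bytesPerElement;
    }
    return total;
  }

  public static void main(String[] args) {
    // One outlier row with 1000 elements forces the bound up for every row.
    int[] elementCounts = {1, 2, 1000, 3};
    int maxElements = 1000;
    System.out.println("reserved: " + reservedBytes(elementCounts.length, maxElements, Integer.BYTES));
    System.out.println("actual:   " + actualBytes(elementCounts, Integer.BYTES));
  }
}
```

A single outlier row sets the bound for every row, so the reserved size grows with the product of the row count and the largest row rather than with the data actually written.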
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732256498
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxTotalContentLength max total content length
+ * @param maxElements max number of elements
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+ maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLength max length for each entry
+ * @param maxElements max number of elements
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+ * chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType,
+ int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+ numDocsPerChunk, totalMaxLength,
+ writerVersion);
+ _valueType = valueType;
+ }
+
+ @VisibleForTesting
+ public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+ int overheadPerEntry =
+ lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+ }
+
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
+ @Override
+ public boolean isSingleValue() {
+ return false;
+ }
+
+ @Override
+ public DataType getValueType() {
+ return _valueType;
+ }
+
+ @Override
+ public void putStringMV(final String[] values) {
+ int totalBytes = 0;
+ for (int i = 0; i < values.length; i++) {
+ final String value = values[i];
+ int length = value.getBytes().length;
+ totalBytes += length;
+ }
+ byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+ + totalBytes]; //numValues, length array, concatenated bytes
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the length of each element
+ for (final String value : values) {
+ byteBuffer.putInt(value.getBytes().length);
+ }
+ //write the content of each element
+ //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+ // as needed
+ for (final String value : values) {
+ byteBuffer.put(value.getBytes());
Review comment:
This code is gone, but I have used this method in the new location. Thanks for the pointer.
[GitHub] [pinot] mcvsubbu commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732134552
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxTotalContentLength max total content length
+ * @param maxElements max number of elements
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+ maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLength max length for each entry
+ * @param maxElements max number of elements
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+ * chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType,
+ int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+ numDocsPerChunk, totalMaxLength,
+ writerVersion);
+ _valueType = valueType;
+ }
+
+ @VisibleForTesting
+ public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+ int overheadPerEntry =
+ lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+ }
+
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
+ @Override
+ public boolean isSingleValue() {
+ return false;
+ }
+
+ @Override
+ public DataType getValueType() {
+ return _valueType;
+ }
+
+ @Override
+ public void putStringMV(final String[] values) {
+ int totalBytes = 0;
+ for (int i = 0; i < values.length; i++) {
+ final String value = values[i];
+ int length = value.getBytes().length;
+ totalBytes += length;
+ }
+ byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+ + totalBytes]; //numValues, length array, concatenated bytes
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the length of each element
+ for (final String value : values) {
+ byteBuffer.putInt(value.getBytes().length);
+ }
+ //write the content of each element
+ //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+ // as needed
+ for (final String value : values) {
+ byteBuffer.put(value.getBytes());
Review comment:
```suggestion
byteBuffer.put(StringUtils.encodeUtf8(value));
```
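For context on the suggestion: `String.getBytes()` without arguments uses the JVM's default charset, which can differ between environments, so an explicit UTF-8 encoding keeps the written bytes stable. The sketch below is an illustration rather than the PR's code. It uses the JDK's `StandardCharsets.UTF_8` as a stand-in for the suggested Pinot helper, and the class name `MvStringEncoding` is hypothetical. Encoding each value once also avoids the three separate `getBytes()` passes in the quoted method.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the code from the PR.
final class MvStringEncoding {

  // Layout follows the quoted method: numValues, then the length of each
  // value, then the concatenated value bytes.
  static byte[] encode(String[] values) {
    // Encode every value exactly once, with an explicit charset.
    byte[][] encoded = new byte[values.length][];
    int totalBytes = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      totalBytes += encoded[i].length;
    }
    ByteBuffer buffer =
        ByteBuffer.allocate(Integer.BYTES + Integer.BYTES * values.length + totalBytes);
    buffer.putInt(values.length);
    for (byte[] bytes : encoded) {
      buffer.putInt(bytes.length);
    }
    for (byte[] bytes : encoded) {
      buffer.put(bytes);
    }
    return buffer.array();
  }

  public static void main(String[] args) {
    byte[] row = encode(new String[] {"a", "\u00e9"});
    System.out.println("encoded row length: " + row.length);
  }
}
```

The trade-off is one extra `byte[][]` per row in exchange for encoding each string once instead of three times.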
[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735155046
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
Review comment:
This is the convention for an MV default value: we store a single-element array containing the default value.
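A minimal sketch of that convention, assuming a `STRING` column. The class and method names are hypothetical, and `"n/a"` is only a placeholder, not the actual value of `FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE`:

```java
import java.util.Arrays;

// Hypothetical illustration only; not the code from the PR.
final class MvDefaultValue {

  // Builds a one-element array holding the default value, whatever the
  // length of the original multi-value entry was.
  static String[] singleElementDefault(String defaultValue) {
    return new String[] {defaultValue};
  }

  public static void main(String[] args) {
    String[] original = {"first", "second", "third"};
    String[] stored = singleElementDefault("n/a");
    System.out.println(Arrays.toString(original) + " is stored as " + Arrays.toString(stored));
  }
}
```

By contrast, the quoted diff allocates `new String[length]` with the length of the original entry.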
[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735157476
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the length in bytes of the largest row
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxRowLengthInBytes)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
Review comment:
This `totalMaxLength` is used as the longest entry size, and with the current code its lower bound is (1M + 4). Given this longest entry size, we will always get one doc per chunk.
The proper way to decide `numDocsPerChunk` is to call `getNumDocsPerChunk(maxRowLength)`, where `maxRowLength` includes the size of the length metadata. The problem here is that we do not count the length metadata size in `maxRowLength` during stats collection.
Because we always over-allocate the row length to at least 1M, we hide the bug of not tracking the correct row length. If a very large value (>1M) is encountered, the bug will surface: the allocated buffer will be smaller than the actual decompressed data.
Since we have already come up with a new format in #7616 that does not require these stats, we can probably ignore this issue for now and use the new format directly to support the MV types.
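The arithmetic behind the first two points can be sketched as follows. This is an illustration, not the PR's code: `getNumDocsPerChunk` mirrors the helper quoted earlier in this thread, `ROW_OFFSET_SIZE` is assumed to be 4 bytes (standing in for `VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE`), and `rowLengthWithMetadata` is a hypothetical helper.

```java
// Hypothetical illustration only; constants marked as assumptions below.
final class ChunkSizing {

  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
  // Assumption: one int offset per row in the chunk header.
  static final int ROW_OFFSET_SIZE = Integer.BYTES;

  // Same formula as the getNumDocsPerChunk helper quoted earlier in the thread.
  static int getNumDocsPerChunk(int lengthOfLongestEntry) {
    int overheadPerEntry = lengthOfLongestEntry + ROW_OFFSET_SIZE;
    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
  }

  // Row length counting the length metadata: numElements, then one length
  // int per element, then the content bytes.
  static int rowLengthWithMetadata(int numElements, int contentBytes) {
    return Integer.BYTES + numElements * Integer.BYTES + contentBytes;
  }

  public static void main(String[] args) {
    // Reviewed code: the longest entry size is at least 1M + 4.
    int reviewed = Integer.BYTES + Math.max(100, TARGET_MAX_CHUNK_SIZE);
    System.out.println("docs per chunk (reviewed): " + getNumDocsPerChunk(reviewed));

    // Metadata-aware row length: 5 elements with 100 content bytes in total.
    int tracked = rowLengthWithMetadata(5, 100);
    System.out.println("docs per chunk (tracked):  " + getNumDocsPerChunk(tracked));
  }
}
```

With the 1M floor, the integer division yields 0 and `Math.max` clamps it to 1, so every chunk holds a single doc regardless of how small the rows are.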
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (95c9aa3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.91%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.56% 14.65% -56.92%
+ Complexity 3881 80 -3801
=============================================
Files 1559 1517 -42
Lines 79053 77484 -1569
Branches 11706 11544 -162
=============================================
- Hits 56575 11354 -45221
- Misses 18669 65319 +46650
+ Partials 3809 811 -2998
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `?` | |
| unittests2 | `14.65% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ... and [1256 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...95c9aa3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090310
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) multi-value column of variable length
+ * data type (STRING, BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the length in bytes of the largest row
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxRowLengthInBytes)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param writerVersion writer format version
+ * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
Review comment:
No, it will produce the _maximum_ of `maxRowLengthInBytes` and `TARGET_MAX_CHUNK_SIZE`. This is a deficiency of the format: because the number of documents per chunk is fixed, we have to choose between large chunks (in bytes) and small chunks (in number of documents). The code here intentionally limits the size of the chunk, and this trade-off is the motivation for the upcoming work to improve the chunk format.
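To make the point about the maximum concrete, the expression from the diff can be isolated in a small standalone sketch. `ChunkSizing` and `totalMaxLength` are hypothetical names used only for illustration and are not part of Pinot; only the expression itself and the 1 MiB constant are taken from the code under review.

```java
// Illustrative sketch only: ChunkSizing and totalMaxLength are hypothetical names, not Pinot API.
final class ChunkSizing {
  // Same value as the constant in the diff under review (1 MiB).
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

  // Same expression as in the constructor: a 4-byte prefix plus whichever is
  // larger, the largest row or the 1 MiB target.
  static int totalMaxLength(int maxRowLengthInBytes) {
    return Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
  }

  public static void main(String[] args) {
    // A small row does not shrink the result below the 1 MiB target.
    System.out.println(totalMaxLength(512)); // prints 1048580
    // A row larger than the target pushes the result up to the row size.
    System.out.println(totalMaxLength(4 * 1024 * 1024)); // prints 4194308
  }
}
```

With `Math.max`, the result never drops below the target plus the prefix, so a column of short rows still yields roughly 1 MiB, and a single oversized row raises the value rather than being truncated.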
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089808
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -40,4 +40,9 @@ public int compress(ByteBuffer inUncompressed, ByteBuffer outCompressed)
outCompressed.flip();
return compressedSize;
}
+
+ @Override
+ public int maxCompressedSize(int uncompressedSize) {
+ return 2 * uncompressedSize;
Review comment:
You're commenting on outdated code.
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e9ccb61) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `56.95%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.59% 14.64% -56.96%
+ Complexity 3882 80 -3802
=============================================
Files 1559 1516 -43
Lines 79025 77470 -1555
Branches 11702 11544 -158
=============================================
- Hits 56579 11345 -45234
- Misses 18639 65312 +46673
+ Partials 3807 813 -2994
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `?` | |
| unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ... and [1253 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7210086) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `3.01%`.
> The diff coverage is `77.47%`.
> :exclamation: Current head 7210086 differs from pull request most recent head d8bd2ad. Consider uploading reports for the commit d8bd2ad to get more accurate results
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.59% 68.58% -3.02%
+ Complexity 3882 3857 -25
============================================
Files 1559 1167 -392
Lines 79025 57068 -21957
Branches 11702 8752 -2950
============================================
- Hits 56579 39138 -17441
+ Misses 18639 15152 -3487
+ Partials 3807 2778 -1029
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.58% <77.47%> (-0.01%)` | :arrow_down: |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...reaming/StreamingSelectionOnlyCombineOperator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9zdHJlYW1pbmcvU3RyZWFtaW5nU2VsZWN0aW9uT25seUNvbWJpbmVPcGVyYXRvci5qYXZh) | `0.00% <0.00%> (-70.46%)` | :arrow_down: |
| [...ore/startree/executor/StarTreeGroupByExecutor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zdGFydHJlZS9leGVjdXRvci9TdGFyVHJlZUdyb3VwQnlFeGVjdXRvci5qYXZh) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
| ... and [642 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...d8bd2ad](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (814c87a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.42%`.
> The diff coverage is `73.39%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.56% 65.13% -6.43%
- Complexity 3881 3934 +53
============================================
Files 1559 1516 -43
Lines 79053 77482 -1571
Branches 11706 11548 -158
============================================
- Hits 56575 50470 -6105
- Misses 18669 23426 +4757
+ Partials 3809 3586 -223
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.60% <73.39%> (+0.02%)` | :arrow_up: |
| unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.60% <62.63%> (-5.06%)` | :arrow_down: |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [366 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...814c87a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091419
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
Review comment:
I guess this is an optimisation. I can address this in a cleanup.
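The quoted hunk stops right after the replacement array is allocated, so what follows is not visible here. As a minimal sketch only, assuming the new array is then filled with the configured placeholder so the forward index never stores the original text, the substitution could look like the following. `TextIndexPlaceholder` and `replaceWithPlaceholder` are hypothetical names for illustration and are not taken from the PR.

```java
import java.util.Arrays;

final class TextIndexPlaceholder {

  // Hypothetical helper: returns an array with the same length as the input,
  // where every slot holds the placeholder instead of the original raw value.
  static String[] replaceWithPlaceholder(String[] values, Object placeholder) {
    // Same conversion as in the hunk: the configured value is turned into a String once.
    String rawValue = String.valueOf(placeholder);
    String[] replaced = new String[values.length];
    // Every slot shares the same String reference, so no per-element copy is made.
    Arrays.fill(replaced, rawValue);
    return replaced;
  }

  public static void main(String[] args) {
    String[] original = {"first long document", "second long document"};
    String[] stored = replaceWithPlaceholder(original, "n");
    System.out.println(Arrays.toString(stored));
  }
}
```

Keeping the element count unchanged means the number of values per document stays the same even though the stored content shrinks to the placeholder.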
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (349c152) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `40.68%`.
> The diff coverage is `0.85%`.
> :exclamation: Current head 349c152 differs from pull request most recent head e9ccb61. Consider uploading reports for the commit e9ccb61 to get more accurate results
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.59% 30.91% -40.69%
=============================================
Files 1559 1553 -6
Lines 79025 78976 -49
Branches 11702 11706 +4
=============================================
- Hits 56579 24416 -32163
- Misses 18639 52487 +33848
+ Partials 3807 2073 -1734
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `29.38% <0.85%> (-0.12%)` | :arrow_down: |
| integration2 | `27.75% <0.57%> (-0.14%)` | :arrow_down: |
| unittests1 | `?` | |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| ... and [1068 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...e9ccb61](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (814c87a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `56.91%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.56% 14.64% -56.92%
+ Complexity 3881 80 -3801
=============================================
Files 1559 1516 -43
Lines 79053 77482 -1571
Branches 11706 11548 -158
=============================================
- Hits 56575 11350 -45225
- Misses 18669 65318 +46649
+ Partials 3809 814 -2995
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `?` | |
| unittests2 | `14.64% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
| [...impl/stats/BytesColumnPredIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CeXRlc0NvbHVtblByZWRJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | |
| ... and [1244 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...814c87a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732255437
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,131 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) multi-value column of variable length
+ * data type (STRING, BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxTotalContentLength max total content length
+ * @param maxElements max number of elements
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+ maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLength max length for each entry
+ * @param maxElements max number of elements
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+ * chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType,
+ int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
Review comment:
This is currently a dangerous overestimate; we need a way to use sublinear memory for the body before we can merge this.
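To make the size of the overestimate concrete, here is a rough sketch that evaluates the same formula as the diff (an element count, one length per element, then the element bodies) and compares it with the bytes a typical row would need under that layout. The class name, method names, and sample sizes are illustrative assumptions, not part of the PR. The sketch uses `long` arithmetic because the `int` expression in the diff can also overflow for large inputs.

```java
final class MvChunkSizeEstimate {

  // Same formula as in the diff, widened to long so the product cannot overflow.
  static long worstCaseRowBytes(int maxLength, int maxElements) {
    return (long) Integer.BYTES
        + (long) maxElements * Integer.BYTES
        + (long) maxLength * maxElements;
  }

  // Bytes a concrete row needs under the same layout:
  // element count, one length per element, then the element bodies.
  static long actualRowBytes(byte[][] row) {
    long size = Integer.BYTES + (long) row.length * Integer.BYTES;
    for (byte[] element : row) {
      size += element.length;
    }
    return size;
  }

  public static void main(String[] args) {
    // Assumed column statistics: one outlier row with 1,000 elements and one
    // 10,000-byte element somewhere in the column.
    int maxLength = 10_000;
    int maxElements = 1_000;
    // Assumed typical row: three elements of 20 bytes each.
    byte[][] typicalRow = new byte[3][20];

    System.out.println("worst case per row: " + worstCaseRowBytes(maxLength, maxElements));
    System.out.println("typical row:        " + actualRowBytes(typicalRow));

    // The int expression from the diff wraps around once the product exceeds Integer.MAX_VALUE.
    int bigLength = 1_000_000;
    int bigElements = 10_000;
    int asInt = Integer.BYTES + bigElements * Integer.BYTES + bigLength * bigElements;
    System.out.println("int result:  " + asInt);
    System.out.println("long result: " + worstCaseRowBytes(bigLength, bigElements));
  }
}
```

Because the bound multiplies the two maxima, a single long element and a single wide row anywhere in the column are enough to inflate the per-row allocation, even when no individual row comes close to it.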
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7affdf5) into [master](https://codecov.io/gh/apache/pinot/commit/1bd899c9ba45676d1ac25979274391431bdf5ce9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1bd899c) will **decrease** coverage by `40.62%`.
> The diff coverage is `0.85%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.56% 30.94% -40.63%
=============================================
Files 1560 1554 -6
Lines 79035 78986 -49
Branches 11702 11706 +4
=============================================
- Hits 56565 24442 -32123
- Misses 18660 52460 +33800
+ Partials 3810 2084 -1726
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `29.17% <0.85%> (-0.10%)` | :arrow_down: |
| integration2 | `27.79% <0.56%> (+0.02%)` | :arrow_up: |
| unittests1 | `?` | |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| ... and [1063 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1bd899c...7affdf5](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8d02f6a) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `2.99%`.
> The diff coverage is `75.42%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.56% 68.57% -3.00%
+ Complexity 3881 3857 -24
============================================
Files 1559 1168 -391
Lines 79053 57038 -22015
Branches 11706 8748 -2958
============================================
- Hits 56575 39111 -17464
+ Misses 18669 15155 -3514
+ Partials 3809 2772 -1037
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.57% <75.42%> (-0.01%)` | :arrow_down: |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [624 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...8d02f6a](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734373632
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
int numDocs = _originalSegmentMetadata.getTotalDocs();
int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
- .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+ .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
Review comment:
Can you clarify what your concern is?
[GitHub] [pinot] richardstartin commented on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947486189
Can the other PR wait for this and then rebase? The work done here is intended to prevent OOM, and the common commits can’t be merged without the rest of this PR.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732254388
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -95,6 +96,62 @@ public void putBytes(byte[] value) {
_chunkBuffer.put(value);
_chunkDataOffSet += value.length;
+ writeChunkIfNecessary();
+ }
+
+ // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
Review comment:
@kishoreg I moved the construction of the MV `STRING`/`BYTES` into here to avoid excessive allocation and multiple passes.
This points to how to make this class a bit more memory efficient in general - we only need to guarantee fixed capacity for the offsets, which are fixed width, and we can write the variable length body page by page.
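As a rough illustration of that idea (a hypothetical sketch, not the code in this PR or Pinot's actual writer), the class below reserves fixed capacity only for the per-document offsets and streams the variable-length body through one small reusable page. The names `PagedChunkSketch`, `putBytes` and `finishChunk`, the `int` offsets, and the in-memory stream standing in for the index file are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: fixed-capacity offsets, body written page by page.
final class PagedChunkSketch {
  // Offsets are fixed width (one int per doc), so their capacity is known up front.
  private final ByteBuffer _offsets;
  // Small reusable page; the body never needs a worst-case sized buffer.
  private final ByteBuffer _page;
  // Stands in for the output file in this sketch.
  private final ByteArrayOutputStream _body = new ByteArrayOutputStream();
  private int _bodyLength;

  PagedChunkSketch(int docsPerChunk, int pageSize) {
    _offsets = ByteBuffer.allocate(docsPerChunk * Integer.BYTES);
    _page = ByteBuffer.allocate(pageSize);
  }

  // Callers must call finishChunk() after docsPerChunk values; otherwise the
  // offsets buffer overflows with a BufferOverflowException.
  void putBytes(byte[] value) {
    _offsets.putInt(_bodyLength); // start of this value within the chunk body
    int written = 0;
    while (written < value.length) {
      if (!_page.hasRemaining()) {
        flushPage();
      }
      int n = Math.min(_page.remaining(), value.length - written);
      _page.put(value, written, n);
      written += n;
    }
    _bodyLength += value.length;
  }

  private void flushPage() {
    _body.write(_page.array(), 0, _page.position());
    _page.clear();
  }

  // Returns the chunk as offsets followed by body, then resets for the next chunk.
  byte[] finishChunk() {
    flushPage();
    int offsetBytes = _offsets.position();
    ByteBuffer out = ByteBuffer.allocate(offsetBytes + _body.size());
    out.put(_offsets.array(), 0, offsetBytes);
    out.put(_body.toByteArray());
    _offsets.clear();
    _body.reset();
    _bodyLength = 0;
    return out.array();
  }
}
```

In a real writer the flushed pages would presumably go through the chunk compressor and on to the file channel rather than into a byte array; the sketch only shows that memory scales with the page size and the number of docs per chunk, not with the longest entry.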
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (95c9aa3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.40%`.
> The diff coverage is `75.42%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.56% 65.15% -6.41%
- Complexity 3881 3942 +61
============================================
Files 1559 1517 -42
Lines 79053 77484 -1569
Branches 11706 11544 -162
============================================
- Hits 56575 50488 -6087
- Misses 18669 23413 +4744
+ Partials 3809 3583 -226
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.61% <75.42%> (+0.03%)` | :arrow_up: |
| unittests2 | `14.65% <0.00%> (-0.06%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [387 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...95c9aa3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089763
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
* The layout of the file is as follows:
* <p> Header Section: </p>
* <ul>
- * <li> Integer: File format version. </li>
- * <li> Integer: Total number of chunks. </li>
- * <li> Integer: Number of docs per chunk. </li>
- * <li> Integer: Length of longest entry (in bytes). </li>
- * <li> Integer: Total number of docs (version 2 onwards). </li>
- * <li> Integer: Compression type enum value (version 2 onwards). </li>
- * <li> Integer: Start offset of data header (version 2 onwards). </li>
- * <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- * Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>
Review comment:
It has already been reverted.
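For readers following the header layout quoted in the Javadoc above, here is a minimal sketch of how those fields could be laid out in a buffer. It is an illustration only: `ChunkHeaderSketch` and `writeHeader` are hypothetical names, the sketch covers only version 2 onwards, it assumes the data header starts immediately after the seven integer fields, and it uses `ByteBuffer`'s default byte order. None of this is taken from the actual writer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the header layout listed in the Javadoc (version 2 onwards).
final class ChunkHeaderSketch {
  // version, numChunks, docsPerChunk, longestEntry, totalDocs, compressionType, dataHeaderStart
  static final int FIXED_HEADER_INTS = 7;

  static ByteBuffer writeHeader(int version, int numChunks, int docsPerChunk,
      int lengthOfLongestEntry, int totalDocs, int compressionTypeValue) {
    if (version < 2) {
      throw new IllegalArgumentException("sketch only covers version 2 onwards");
    }
    // Integer chunk offsets up to version 2, long chunk offsets from version 3.
    int offsetWidth = version >= 3 ? Long.BYTES : Integer.BYTES;
    // Assumption: chunk offsets begin right after the fixed integer fields.
    int dataHeaderStart = FIXED_HEADER_INTS * Integer.BYTES;

    ByteBuffer header = ByteBuffer.allocate(dataHeaderStart + numChunks * offsetWidth);
    header.putInt(version);
    header.putInt(numChunks);
    header.putInt(docsPerChunk);
    header.putInt(lengthOfLongestEntry);
    header.putInt(totalDocs);
    header.putInt(compressionTypeValue);
    header.putInt(dataHeaderStart);
    // The buffer is now positioned at the slot for the first chunk offset.
    return header;
  }
}
```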
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735435633
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
Review comment:
Done.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732959430
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/VarByteChunkMVForwardIndexReader.java
##########
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers.forward;
+
+import java.nio.ByteBuffer;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+/**
+ * Chunk-based single-value raw (non-dictionary-encoded) forward index reader for values of
+ * variable
+ * length data type
+ * (STRING, BYTES).
+ * <p>For data layout, please refer to the documentation for {@link VarByteChunkSVForwardIndexWriter}
+ */
+public final class VarByteChunkMVForwardIndexReader extends BaseChunkSVForwardIndexReader {
+
+ private static final int ROW_OFFSET_SIZE = VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+
+ private final int _maxChunkSize;
+
+ public VarByteChunkMVForwardIndexReader(PinotDataBuffer dataBuffer, DataType valueType) {
+ super(dataBuffer, valueType);
+ _maxChunkSize = _numDocsPerChunk * (ROW_OFFSET_SIZE + _lengthOfLongestEntry);
+ }
+
+ @Nullable
+ @Override
+ public ChunkReaderContext createContext() {
+ if (_isCompressed) {
+ return new ChunkReaderContext(_maxChunkSize);
+ } else {
+ return null;
+ }
+ }
+
+ @Override
+ public int getStringMV(final int docId, final String[] valueBuffer,
+ final ChunkReaderContext context) {
+ byte[] compressedBytes;
+ if (_isCompressed) {
+ compressedBytes = getBytesCompressed(docId, context);
+ } else {
+ compressedBytes = getBytesUncompressed(docId);
+ }
+ ByteBuffer byteBuffer = ByteBuffer.wrap(compressedBytes);
+ int numValues = byteBuffer.getInt();
+ int contentOffset = (numValues + 1) * Integer.BYTES;
+ for (int i = 0; i < numValues; i++) {
+ int length = byteBuffer.getInt((i + 1) * Integer.BYTES);
+ byte[] bytes = new byte[length];
+ byteBuffer.position(contentOffset);
+ byteBuffer.get(bytes, 0, length);
+ valueBuffer[i] = new String(bytes);
Review comment:
TODO: This doesn't specify encoding
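For illustration, a minimal sketch of decoding with an explicit charset. It assumes the row layout implied by the reader above (value count, then one length per value, then the concatenated bytes) and assumes UTF-8 is the intended encoding; the class and method names are hypothetical and not part of the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8MvDecodeSketch {
  // Assumed layout: [numValues][length_0]..[length_n-1][bytes_0]..[bytes_n-1]
  static byte[] encode(String[] values) {
    byte[][] encoded = new byte[values.length][];
    int size = (values.length + 1) * Integer.BYTES;
    for (int i = 0; i < values.length; i++) {
      // Explicit charset: the result does not depend on the JVM default
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      size += encoded[i].length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(size);
    buffer.putInt(values.length);
    for (byte[] bytes : encoded) {
      buffer.putInt(bytes.length);
    }
    for (byte[] bytes : encoded) {
      buffer.put(bytes);
    }
    return buffer.array();
  }

  static String[] decode(byte[] row) {
    ByteBuffer buffer = ByteBuffer.wrap(row);
    int numValues = buffer.getInt();
    int contentOffset = (numValues + 1) * Integer.BYTES;
    String[] values = new String[numValues];
    for (int i = 0; i < numValues; i++) {
      int length = buffer.getInt((i + 1) * Integer.BYTES);
      // Decode with the same explicit charset used when writing
      values[i] = new String(row, contentOffset, length, StandardCharsets.UTF_8);
      contentOffset += length;
    }
    return values;
  }

  public static void main(String[] args) {
    String[] decoded = decode(encode(new String[] {"h\u00e9llo", "", "\u65e5\u672c"}));
    System.out.println(decoded.length);
  }
}
```

Without the charset argument, `new String(bytes)` uses the platform default, so the same segment could decode differently on differently configured hosts.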
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090701
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BytesColumnPredIndexStatsCollector.java
##########
@@ -42,16 +43,32 @@ public BytesColumnPredIndexStatsCollector(String column, StatsCollectorConfig st
@Override
public void collect(Object entry) {
- ByteArray value = new ByteArray((byte[]) entry);
- addressSorted(value);
- updatePartition(value);
- _values.add(value);
-
- int length = value.length();
- _minLength = Math.min(_minLength, length);
- _maxLength = Math.max(_maxLength, length);
-
- _totalNumberOfEntries++;
+ if (entry instanceof Object[]) {
+ Object[] values = (Object[]) entry;
+ int rowLength = 0;
+ for (Object obj : values) {
+ ByteArray value = new ByteArray((byte[]) obj);
+ _values.add(value);
+ int length = value.length();
+ _minLength = Math.min(_minLength, length);
+ _maxLength = Math.max(_maxLength, length);
+ rowLength += length;
Review comment:
That seems wrong to me: this is the length of the data itself, and at this point it's not known whether the values will be length-prefixed (+4) or null-terminated (+1), so adding either overhead here would rule out the other.
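A small sketch of the distinction being drawn, under the assumption that the stats collector should report only raw data length and that each storage format adds its own per-value overhead later. The class and method names are hypothetical.

```java
final class RowLengthSketch {
  // Raw data length of one multi-value row: what a stats collector can know
  static int rawLength(byte[][] values) {
    int total = 0;
    for (byte[] value : values) {
      total += value.length;
    }
    return total;
  }

  // Storage size if a writer prefixes every value with a 4-byte length
  static int lengthPrefixedSize(byte[][] values) {
    return rawLength(values) + values.length * Integer.BYTES;
  }

  // Storage size if a writer terminates every value with a single null byte
  static int nullTerminatedSize(byte[][] values) {
    return rawLength(values) + values.length;
  }

  public static void main(String[] args) {
    byte[][] row = {new byte[3], new byte[5]};
    System.out.println(rawLength(row) + " " + lengthPrefixedSize(row) + " " + nullTerminatedSize(row));
  }
}
```

Keeping the overhead out of the collected statistic leaves both encodings available to whichever writer consumes it.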
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (377c7ae) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.46%`.
> The diff coverage is `76.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.56% 65.10% -6.47%
- Complexity 3881 3935 +54
============================================
Files 1559 1516 -43
Lines 79053 77499 -1554
Branches 11706 11549 -157
============================================
- Hits 56575 50455 -6120
- Misses 18669 23451 +4782
+ Partials 3809 3593 -216
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.57% <76.00%> (-0.01%)` | :arrow_down: |
| unittests2 | `14.63% <0.00%> (-0.08%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
| ... and [378 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...377c7ae](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174f00b) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `55.61%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 70.32% 14.70% -55.62%
+ Complexity 3882 80 -3802
=============================================
Files 1552 1509 -43
Lines 79012 77443 -1569
Branches 11705 11547 -158
=============================================
- Hits 55562 11390 -44172
- Misses 19644 65228 +45584
+ Partials 3806 825 -2981
```
| Flag | Coverage Δ | |
|---|---|---|
| integration2 | `?` | |
| unittests1 | `?` | |
| unittests2 | `14.70% <0.00%> (+<0.01%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-79.75%)` | :arrow_down: |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `0.00% <0.00%> (-80.37%)` | :arrow_down: |
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
| [...impl/stats/BytesColumnPredIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CeXRlc0NvbHVtblByZWRJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `0.00% <0.00%> (ø)` | |
| ... and [1216 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...174f00b](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] kishoreg commented on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947738475
> This PR has a conflict with #7604 -- we need to figure out the sequencing of these two (duplicate commits for the FWD index).
Hi @atris. Let's get this one in first and then rebase text index support on top of this.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732869620
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
*
*/
protected void writeChunk() {
- int sizeToWrite;
+ int sizeWritten;
_chunkBuffer.flip();
- try {
- sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
- _dataFile.write(_compressedBuffer, _dataOffset);
- _compressedBuffer.clear();
+ int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+ // compress directly into the mapped output rather than keep a large buffer to compress into
+ try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
Review comment:
This was supposed to be BIG_ENDIAN.
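To illustrate why the byte order matters, here is a sketch using plain `java.nio` buffers rather than `PinotDataBuffer`. It shows what a big-endian reader sees when the writer used a different order; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {
  // Writes an int in the given order, then reads the same bytes back as big-endian
  static int writeThenReadBigEndian(int value, ByteOrder writeOrder) {
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(writeOrder);
    buffer.putInt(0, value);
    return buffer.order(ByteOrder.BIG_ENDIAN).getInt(0);
  }

  public static void main(String[] args) {
    // Matching orders round-trip; mismatched orders yield the byte-reversed value
    System.out.println(writeThenReadBigEndian(1, ByteOrder.BIG_ENDIAN));
    System.out.println(writeThenReadBigEndian(1, ByteOrder.LITTLE_ENDIAN));
  }
}
```

A reader that assumes big-endian offsets and lengths would misinterpret anything written through a little-endian mapping.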
[GitHub] [pinot] mayankshriv commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734093576
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/minion/RawIndexConverter.java
##########
@@ -207,7 +207,7 @@ private void convertColumn(FieldSpec fieldSpec)
int numDocs = _originalSegmentMetadata.getTotalDocs();
int lengthOfLongestEntry = _originalSegmentMetadata.getColumnMetadataFor(columnName).getColumnMaxLength();
try (ForwardIndexCreator rawIndexCreator = SegmentColumnarIndexCreator
- .getRawIndexCreatorForColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
+ .getRawIndexCreatorForSVColumn(_convertedIndexDir, ChunkCompressionType.SNAPPY, columnName, storedType, numDocs,
Review comment:
Should the converter fail gracefully upfront for MV columns?
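One way such an upfront check could look, as a sketch only. `ColumnSpec` is a hypothetical stand-in for the real field spec, and the exception type and message are assumptions rather than what the PR does.

```java
final class MvColumnGuardSketch {
  // Hypothetical stand-in for a column's field spec
  static final class ColumnSpec {
    final String name;
    final boolean singleValue;

    ColumnSpec(String name, boolean singleValue) {
      this.name = name;
      this.singleValue = singleValue;
    }
  }

  // Rejects multi-value columns before any conversion work starts
  static void checkConvertible(ColumnSpec spec) {
    if (!spec.singleValue) {
      throw new UnsupportedOperationException(
          "Raw index conversion is not supported for multi-value column: " + spec.name);
    }
  }

  public static void main(String[] args) {
    checkConvertible(new ColumnSpec("svColumn", true));
    System.out.println("svColumn accepted");
  }
}
```

Checking before the creator is opened means no partially written index files are left behind for an unsupported column.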
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
* The layout of the file is as follows:
* <p> Header Section: </p>
* <ul>
- * <li> Integer: File format version. </li>
- * <li> Integer: Total number of chunks. </li>
- * <li> Integer: Number of docs per chunk. </li>
- * <li> Integer: Length of longest entry (in bytes). </li>
- * <li> Integer: Total number of docs (version 2 onwards). </li>
- * <li> Integer: Compression type enum value (version 2 onwards). </li>
- * <li> Integer: Start offset of data header (version 2 onwards). </li>
- * <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- * Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>
Review comment:
Why lose the indentation? Is this yet another style check difference?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -151,8 +154,11 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
int dataHeaderStart = offset + Integer.BYTES;
_header.putInt(dataHeaderStart);
}
+ }
- return headerSize;
+ private static int headerSize(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {
Review comment:
IIRC, I put the headerSize calculation inside writeHeader so that anyone who changes the header in future doesn't forget to update the headerSize (that is easier to miss in a separate method). Any reason you prefer the latter?
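A sketch of how a separate size calculation can still be kept honest against the header writer. It assumes the version 2+ layout listed in the Javadoc above (seven integer fields followed by one offset entry per chunk); the class name, parameter names and the consistency check are illustrative assumptions.

```java
import java.nio.ByteBuffer;

final class HeaderSizeSketch {
  // Assumed: seven fixed integer fields precede the chunk offset array (version 2+)
  static final int FIXED_FIELDS = 7;

  static int numChunks(int totalDocs, int numDocsPerChunk) {
    return (totalDocs + numDocsPerChunk - 1) / numDocsPerChunk; // ceiling division
  }

  static int headerSize(int totalDocs, int numDocsPerChunk, int chunkOffsetEntrySize) {
    return FIXED_FIELDS * Integer.BYTES + numChunks(totalDocs, numDocsPerChunk) * chunkOffsetEntrySize;
  }

  static ByteBuffer writeHeader(int version, int totalDocs, int numDocsPerChunk, int longestEntry,
      int compressionType, int chunkOffsetEntrySize) {
    int chunks = numChunks(totalDocs, numDocsPerChunk);
    ByteBuffer header = ByteBuffer.allocate(headerSize(totalDocs, numDocsPerChunk, chunkOffsetEntrySize));
    header.putInt(version);
    header.putInt(chunks);
    header.putInt(numDocsPerChunk);
    header.putInt(longestEntry);
    header.putInt(totalDocs);
    header.putInt(compressionType);
    header.putInt(header.position() + Integer.BYTES); // start of the chunk offset array
    // Guard against the two methods drifting apart when the header changes
    if (header.remaining() != chunks * chunkOffsetEntrySize) {
      throw new IllegalStateException("headerSize is out of sync with writeHeader");
    }
    return header;
  }

  public static void main(String[] args) {
    System.out.println(headerSize(10, 4, Long.BYTES));
  }
}
```

A check of this kind addresses the concern about a forgotten update: adding a field to `writeHeader` without adjusting `headerSize` fails fast instead of silently shifting the data offset.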
[GitHub] [pinot] atris commented on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
atris commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-947438669
This PR has a conflict with #7604 -- we need to figure out the sequencing of these two (duplicate commits for the FWD index).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732869947
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
* @param chunkSize Size of chunk
* @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
* @param version version of File
- * @throws FileNotFoundException
+ * @throws IOException if the file isn't found or can't be mapped
*/
protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
- throws FileNotFoundException {
+ throws IOException {
Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+ _file = file;
+ _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+ _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
_chunkSize = chunkSize;
_chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
- _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
- _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
_chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
- _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
- _dataFile = new RandomAccessFile(file, "rw").getChannel();
+ _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+ _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
Review comment:
We don't need it; we just compress directly into the output file.
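For readers following the diff above, here is a minimal sketch of the general pattern it hints at: memory-map only the header region of the output file, then write each chunk straight to the channel after the header. This is not the Pinot writer. The class `MappedHeaderSketch`, the header layout (chunk count followed by one `long` offset per chunk), and the omission of the compression step are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch, not the actual BaseChunkSVForwardIndexWriter.
final class MappedHeaderSketch {

  // Writes a header (chunk count + one offset per chunk) through a mapped buffer,
  // and writes each chunk directly to the channel after the header.
  // Returns the file offset at which each chunk starts.
  static long[] write(File file, byte[][] chunks) throws IOException {
    // Assumed layout: 4-byte chunk count, then an 8-byte offset per chunk.
    int headerSize = Integer.BYTES + chunks.length * Long.BYTES;
    long[] offsets = new long[chunks.length];
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
         FileChannel channel = raf.getChannel()) {
      // Mapping in READ_WRITE mode extends the file to cover the header region.
      MappedByteBuffer header = channel.map(FileChannel.MapMode.READ_WRITE, 0, headerSize);
      header.putInt(chunks.length);
      long position = headerSize;
      for (int i = 0; i < chunks.length; i++) {
        offsets[i] = position;
        header.putLong(position);
        // A real writer would compress here; this sketch writes the bytes as-is.
        ByteBuffer chunk = ByteBuffer.wrap(chunks[i]);
        while (chunk.hasRemaining()) {
          position += channel.write(chunk, position);
        }
      }
      header.force();
    }
    return offsets;
  }
}
```

Because the header size is known up front, the data offset can be computed before any chunk is written, which is what lets the header be filled in incrementally without a separate staging buffer.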
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732871439
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,119 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
+ Arrays.fill((String[]) columnValueToIndex, value);
+ } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+ int length = ((byte[][]) columnValueToIndex).length;
+ columnValueToIndex = new byte[length][];
+ Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+ } else {
+ throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+ }
+ }
+ switch (forwardIndexCreator.getValueType()) {
+ case INT:
+ if (columnValueToIndex instanceof int[]) {
+ forwardIndexCreator.putIntMV((int[]) columnValueToIndex);
+ } else if (columnValueToIndex instanceof Object[]) {
+ int[] array = new int[((Object[]) columnValueToIndex).length];
+ for (int i = 0; i < array.length; i++) {
+ array[i] = (Integer) ((Object[]) columnValueToIndex)[i];
+ }
+ forwardIndexCreator.putIntMV(array);
+ } else {
+ //TODO: is this possible?
Review comment:
OK, I will get rid of these; I hadn't addressed this yet.
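As a rough illustration of the `INT` branch quoted in the diff above, the conversion from either a primitive `int[]` or a boxed `Object[]` can be pulled into one helper. The class name `IntMVConversionSketch` and the choice to throw on any other input type are assumptions for this sketch, not what the PR does.

```java
// Hypothetical helper, not part of SegmentColumnarIndexCreator.
final class IntMVConversionSketch {

  // Returns the multi-value entry as a primitive int[].
  static int[] toIntArray(Object values) {
    if (values instanceof int[]) {
      // Already primitive: return as-is without copying.
      return (int[]) values;
    }
    if (values instanceof Object[]) {
      // Boxed values: unbox element by element.
      Object[] boxed = (Object[]) values;
      int[] array = new int[boxed.length];
      for (int i = 0; i < array.length; i++) {
        array[i] = (Integer) boxed[i];
      }
      return array;
    }
    // Assumption: any other representation is rejected rather than left as a TODO.
    throw new IllegalArgumentException("Unsupported INT MV representation: " + values.getClass());
  }
}
```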
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732881547
##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/ColumnStatistics.java
##########
@@ -82,6 +82,13 @@ default boolean isFixedLength() {
*/
int getMaxNumberOfMultiValues();
+ /**
+ * @return the length of the largest row in bytes for variable length types
+ */
+ default int getMaxRowLengthInBytes() {
+ return -1;
+ }
Review comment:
@kishoreg I've only implemented this for MV bytes and strings; I don't think it's necessary otherwise?
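A small sketch of how a default method with a `-1` sentinel can be overridden only where it matters, as the comment above describes. The types `ColumnStatsSketch` and `BytesMVStatsSketch` are hypothetical stand-ins, and defining the row length as the sum of the element lengths is an assumption; the real collector may account for it differently.

```java
// Hypothetical stand-in for the ColumnStatistics interface in the diff above.
interface ColumnStatsSketch {
  int getMaxNumberOfMultiValues();

  // -1 signals "not tracked"; only variable-length MV collectors override this.
  default int getMaxRowLengthInBytes() {
    return -1;
  }
}

// Hypothetical collector for a multi-value BYTES column.
final class BytesMVStatsSketch implements ColumnStatsSketch {
  private int _maxNumValues;
  private int _maxRowLength;

  // Records one row; the row length is assumed to be the sum of its element lengths.
  void collect(byte[][] row) {
    int rowLength = 0;
    for (byte[] value : row) {
      rowLength += value.length;
    }
    _maxNumValues = Math.max(_maxNumValues, row.length);
    _maxRowLength = Math.max(_maxRowLength, rowLength);
  }

  @Override
  public int getMaxNumberOfMultiValues() {
    return _maxNumValues;
  }

  @Override
  public int getMaxRowLengthInBytes() {
    return _maxRowLength;
  }
}
```

Callers that need the value would have to treat `-1` as "unknown" rather than as a length.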
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (76ca7e3) into [master](https://codecov.io/gh/apache/pinot/commit/4246e0f2b1dfb6ed387d584002d70a226f6fcd91?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4246e0f) will **decrease** coverage by `6.38%`.
> The diff coverage is `75.42%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
============================================
- Coverage 71.56% 65.17% -6.39%
- Complexity 3881 3942 +61
============================================
Files 1559 1516 -43
Lines 79053 77499 -1554
Branches 11706 11547 -159
============================================
- Hits 56575 50513 -6062
- Misses 18669 23400 +4731
+ Partials 3809 3586 -223
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `68.66% <75.42%> (+0.08%)` | :arrow_up: |
| unittests2 | `14.64% <0.00%> (-0.07%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (-56.61%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `100.00% <ø> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `51.38% <51.38%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.65% <63.04%> (-5.02%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [377 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4246e0f...76ca7e3](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090738
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
Review comment:
@kishoreg can you explain why you included this?
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091164
##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
default int getStringMV(int docId, String[] valueBuffer, T context) {
throw new UnsupportedOperationException();
}
+
+ /**
+ * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+ * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+ * entry.
+ *
+ * @param docId Document id
+ * @param valueBuffer Value buffer
+ * @param context Reader context
+ * @return Number of values within the multi-value entry
+ */
+ default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+ throw new UnsupportedOperationException();
+ }
+
+ default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {
Review comment:
I will remove it in a follow-up.
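The `getBytesMV` contract in the Javadoc quoted above (the caller supplies a buffer large enough for the entry; the reader fills it and returns the count) might be exercised as in the sketch below. The interface `BytesMVReaderSketch` drops the reader-context parameter for brevity, and `InMemoryBytesMVReader` is a hypothetical in-memory implementation, not a Pinot class.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for ForwardIndexReader (no reader context).
interface BytesMVReaderSketch {
  default int getBytesMV(int docId, byte[][] valueBuffer) {
    throw new UnsupportedOperationException();
  }
}

// Hypothetical reader backed by an in-memory array of multi-value entries.
final class InMemoryBytesMVReader implements BytesMVReaderSketch {
  private final byte[][][] _docs;

  InMemoryBytesMVReader(byte[][][] docs) {
    _docs = docs;
  }

  @Override
  public int getBytesMV(int docId, byte[][] valueBuffer) {
    byte[][] values = _docs[docId];
    // The caller must size valueBuffer for the largest entry it will read.
    System.arraycopy(values, 0, valueBuffer, 0, values.length);
    return values.length;
  }

  // Convenience: read one entry into a reusable buffer and trim it to the returned count.
  byte[][] read(int docId, byte[][] reusableBuffer) {
    int numValues = getBytesMV(docId, reusableBuffer);
    return Arrays.copyOf(reusableBuffer, numValues);
  }
}
```

Only the first `n` slots of the buffer are meaningful after a call that returns `n`; later slots may still hold values from a previous document.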
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d8bd2ad) into [master](https://codecov.io/gh/apache/pinot/commit/6fef2108098dfae4173b104aa5e5e221cc89dc9e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6fef210) will **decrease** coverage by `40.58%`.
> The diff coverage is `0.85%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.59% 31.01% -40.59%
=============================================
Files 1559 1553 -6
Lines 79025 79022 -3
Branches 11702 11710 +8
=============================================
- Hits 56579 24508 -32071
- Misses 18639 52417 +33778
+ Partials 3807 2097 -1710
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `29.44% <0.85%> (-0.06%)` | :arrow_down: |
| integration2 | `27.80% <0.57%> (-0.09%)` | :arrow_down: |
| unittests1 | `?` | |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...ot/segment/local/io/compression/LZ4Compressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9MWjRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...nt/local/io/compression/PassThroughCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9QYXNzVGhyb3VnaENvbXByZXNzb3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...segment/local/io/compression/SnappyCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9TbmFwcHlDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../io/writer/impl/BaseChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
| [...riter/impl/FixedByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9GaXhlZEJ5dGVDaHVua1NWRm9yd2FyZEluZGV4V3JpdGVyLmphdmE=) | `0.00% <ø> (-100.00%)` | :arrow_down: |
| [.../writer/impl/VarByteChunkSVForwardIndexWriter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby93cml0ZXIvaW1wbC9WYXJCeXRlQ2h1bmtTVkZvcndhcmRJbmRleFdyaXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (-86.67%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| ... and [1070 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6fef210...d8bd2ad](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] kishoreg merged pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
kishoreg merged pull request #7595:
URL: https://github.com/apache/pinot/pull/7595
[GitHub] [pinot] codecov-commenter commented on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3883a9e) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `5.19%`.
> The diff coverage is `68.31%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@             Coverage Diff              @@
##             master    #7595      +/-   ##
============================================
- Coverage     70.32%   65.12%   -5.20%
- Complexity     3882     3943      +61
============================================
  Files          1552     1509      -43
  Lines         79012    77451    -1561
  Branches      11705    11553     -152
============================================
- Hits          55562    50438    -5124
- Misses        19644    23424    +3780
+ Partials       3806     3589     -217
```
| Flag | Coverage Δ | |
|---|---|---|
| integration2 | `?` | |
| unittests1 | `68.62% <68.31%> (+0.05%)` | :arrow_up: |
| unittests2 | `14.61% <0.00%> (-0.08%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `58.46% <58.46%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.79% <67.24%> (-4.87%)` | :arrow_down: |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| [...a/org/apache/pinot/common/utils/PinotDataType.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUGlub3REYXRhVHlwZS5qYXZh) | `81.06% <88.23%> (+0.69%)` | :arrow_up: |
| ... and [333 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...3883a9e](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734372096
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -33,27 +33,29 @@
* The layout of the file is as follows:
* <p> Header Section: </p>
* <ul>
- * <li> Integer: File format version. </li>
- * <li> Integer: Total number of chunks. </li>
- * <li> Integer: Number of docs per chunk. </li>
- * <li> Integer: Length of longest entry (in bytes). </li>
- * <li> Integer: Total number of docs (version 2 onwards). </li>
- * <li> Integer: Compression type enum value (version 2 onwards). </li>
- * <li> Integer: Start offset of data header (version 2 onwards). </li>
- * <li> Integer array: Integer offsets for all chunks in the data (upto version 2),
- * Long array: Long offsets for all chunks in the data (version 3 onwards) </li>
+ * <li> Integer: File format version. </li>
Review comment:
Blame @kishoreg's IDE settings :-) I fixed this in the third commit on the PR, but it came back after rebasing; will fix again.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r734373235
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -151,8 +154,11 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
int dataHeaderStart = offset + Integer.BYTES;
_header.putInt(dataHeaderStart);
}
+ }
- return headerSize;
+ private static int headerSize(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {
Review comment:
I wanted the `_header` buffer to be final. Even if returning the size from `writeHeader` was the original intention, `headerSize` did not depend on what was put into the buffer, so that intent must have been lost at some point. Ultimately, if this number and what's put into the header get out of sync, tests will fail.
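As a rough illustration of the point above, the header size can be computed as a pure function of the constructor arguments, which lets the buffer be allocated once and held in a `final` field. This is only a sketch: the signature matches the one in the diff, but the body is an assumption derived from the header layout in the quoted Javadoc (seven `Integer` fields from version 2 onwards, followed by one offset per chunk), and `HeaderSizeSketch` is a hypothetical class, not Pinot code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the actual BaseChunkSVForwardIndexWriter.
final class HeaderSizeSketch {
  // Assumption: version, numChunks, numDocsPerChunk, longest entry, totalDocs,
  // compression type and data header start, as listed in the quoted Javadoc.
  private static final int FIXED_HEADER_FIELDS = 7;

  // Can be final because its size is known before anything is written to it.
  private final ByteBuffer _header;

  HeaderSizeSketch(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {
    _header = ByteBuffer.allocate(headerSize(totalDocs, numDocsPerChunk, headerEntryChunkOffsetSize));
  }

  static int headerSize(int totalDocs, int numDocsPerChunk, int headerEntryChunkOffsetSize) {
    // Ceiling division: a partially filled last chunk still needs an offset entry.
    int numChunks = (totalDocs + numDocsPerChunk - 1) / numDocsPerChunk;
    return FIXED_HEADER_FIELDS * Integer.BYTES + numChunks * headerEntryChunkOffsetSize;
  }

  int capacity() {
    return _header.capacity();
  }

  public static void main(String[] args) {
    // 1000 docs, 100 docs per chunk, 8-byte (long) chunk offsets.
    System.out.println(headerSize(1000, 100, Long.BYTES));
  }
}
```

Because the size depends only on the arguments, a mismatch between this number and the bytes actually written shows up as a test failure rather than as a silently resized buffer.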
[GitHub] [pinot] codecov-commenter edited a comment on pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#issuecomment-946634547
# [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#7595](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174f00b) into [master](https://codecov.io/gh/apache/pinot/commit/85e0d9e4f32df0cbcfbaa03cd123164506bb7139?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85e0d9e) will **decrease** coverage by `5.13%`.
> The diff coverage is `73.39%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7595/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@             Coverage Diff              @@
##             master    #7595      +/-   ##
============================================
- Coverage     70.32%   65.18%   -5.14%
- Complexity     3882     3933      +51
============================================
  Files          1552     1509      -43
  Lines         79012    77443    -1569
  Branches      11705    11547     -158
============================================
- Hits          55562    50479    -5083
- Misses        19644    23365    +3721
+ Partials       3806     3599     -207
```
| Flag | Coverage Δ | |
|---|---|---|
| integration2 | `?` | |
| unittests1 | `68.58% <73.39%> (+0.01%)` | :arrow_up: |
| unittests2 | `14.70% <0.00%> (+<0.01%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...rg/apache/pinot/core/minion/RawIndexConverter.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9taW5pb24vUmF3SW5kZXhDb252ZXJ0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/pinot/segment/spi/V1Constants.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `14.28% <ø> (ø)` | |
| [...segment/spi/index/creator/ForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L2NyZWF0b3IvRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...t/segment/spi/index/reader/ForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9Gb3J3YXJkSW5kZXhSZWFkZXIuamF2YQ==) | `5.88% <0.00%> (-0.79%)` | :arrow_down: |
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `78.38% <50.00%> (-1.36%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.50% <50.00%> (-2.24%)` | :arrow_down: |
| [...ders/forward/VarByteChunkMVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9WYXJCeXRlQ2h1bmtNVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `58.33% <58.33%> (ø)` | |
| [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `81.60% <62.63%> (-5.06%)` | :arrow_down: |
| [...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZVZhckJ5dGVSYXdJbmRleENyZWF0b3IuamF2YQ==) | `75.00% <75.00%> (ø)` | |
| [...r/impl/fwd/MultiValueFixedByteRawIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZUZpeGVkQnl0ZVJhd0luZGV4Q3JlYXRvci5qYXZh) | `84.44% <84.44%> (ø)` | |
| ... and [332 more](https://codecov.io/gh/apache/pinot/pull/7595/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [85e0d9e...174f00b](https://codecov.io/gh/apache/pinot/pull/7595?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735450047
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,122 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import java.io.File;
+import java.io.IOException;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the length in bytes of the largest row
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxRowLengthInBytes)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION, maxRowLengthInBytes);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxRowLengthInBytes the size in bytes of the largest row, the chunk size cannot be smaller than this
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, int writerVersion, int maxRowLengthInBytes)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, TARGET_MAX_CHUNK_SIZE);
Review comment:
I've just reread this comment while addressing comments on the follow-up branch, and your maths is off. The lower bound is indeed 1M + 4, but it only leads to 1 doc per chunk when `maxRowLengthInBytes` > 512KB. That's the purpose of the expression: to prevent a single large value from leading to large buffers or large chunks on disk. Being able to make a better decision here is the motivation for the enhancement proposal.
So there really are two issues here: one (which I've addressed) is not considering the size of length prefixes, and the other (which requires a format change) is how to do chunk sizing properly.
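The 512KB threshold falls out of integer division against the 1MB target. The sketch below assumes docs per chunk are derived the way the `getNumDocsPerChunk` helper from an earlier revision of this PR does it (target chunk size divided by the longest row plus a per-row offset). `ChunkSizingSketch` and `ROW_OFFSET_SIZE` are hypothetical stand-ins, and the 4-byte offset is an assumption.

```java
// Hypothetical sketch of the docs-per-chunk arithmetic; not Pinot code.
final class ChunkSizingSketch {
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
  // Assumed per-row offset overhead in the chunk header (stand-in for the writer's constant).
  static final int ROW_OFFSET_SIZE = Integer.BYTES;

  static int numDocsPerChunk(int maxRowLengthInBytes) {
    int overheadPerEntry = maxRowLengthInBytes + ROW_OFFSET_SIZE;
    // Integer division yields 0 or 1 once a row needs more than half the target,
    // and the max(..., 1) floor keeps at least one doc per chunk.
    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
  }

  public static void main(String[] args) {
    int[] rowSizes = {100, 512 * 1024 - 4, 512 * 1024, 2 * 1024 * 1024};
    for (int rowSize : rowSizes) {
      System.out.println(rowSize + " bytes -> " + numDocsPerChunk(rowSize) + " docs per chunk");
    }
  }
}
```

Under these assumptions, small rows pack thousands of docs into a chunk, while any row just over half the target drops the chunk to a single doc, which is why a single large value dominates the sizing decision.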
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091466
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,107 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
+ Arrays.fill((String[]) columnValueToIndex, value);
+ } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+ int length = ((byte[][]) columnValueToIndex).length;
+ columnValueToIndex = new byte[length][];
+ Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+ } else {
+ throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+ }
+ }
+ switch (forwardIndexCreator.getValueType()) {
+ case INT:
+ if (columnValueToIndex instanceof int[]) {
Review comment:
Can address this in a follow-up.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732164505
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -0,0 +1,215 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.BaseChunkSVForwardIndexReader.ChunkReaderContext;
+import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueVarByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxTotalContentLength max total content length
+ * @param maxElements max number of elements
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, int maxTotalContentLength, int maxElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxTotalContentLength,
+ maxElements, false, BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLength max length for each entry
+ * @param maxElements max number of elements
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per
+ * chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType,
+ int maxLength, int maxElements, boolean deriveNumDocsPerChunk, int writerVersion)
+ throws IOException {
+ //we will prepend the actual content with numElements and length array containing length of each element
+ int totalMaxLength = Integer.BYTES + maxElements * Integer.BYTES + maxLength * maxElements;
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+ numDocsPerChunk, totalMaxLength,
+ writerVersion);
+ _valueType = valueType;
+ }
+
+ @VisibleForTesting
+ public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+ int overheadPerEntry =
+ lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+ }
+
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
+ @Override
+ public boolean isSingleValue() {
+ return false;
+ }
+
+ @Override
+ public DataType getValueType() {
+ return _valueType;
+ }
+
+ @Override
+ public void putStringMV(final String[] values) {
+ int totalBytes = 0;
+ for (int i = 0; i < values.length; i++) {
+ final String value = values[i];
+ int length = value.getBytes().length;
+ totalBytes += length;
+ }
+ byte[] bytes = new byte[Integer.BYTES + Integer.BYTES * values.length
+ + totalBytes]; //numValues, length array, concatenated bytes
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the length of each element
+ for (final String value : values) {
+ byteBuffer.putInt(value.getBytes().length);
+ }
+ //write the content of each element
+ //todo:maybe there is a smart way to avoid 3 loops but at the cost of allocating more memory upfront and resize
+ // as needed
+ for (final String value : values) {
+ byteBuffer.put(value.getBytes());
Review comment:
I didn't write this code but thanks for pointing me at that class, which looks like it can be modernised a little bit 👍🏻
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732991368
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
+ int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, totalDocs,
+ numDocsPerChunk, totalMaxLength, writerVersion);
+ _valueType = valueType;
+ }
+
+ @VisibleForTesting
+ public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
+ int overheadPerEntry =
+ lengthOfLongestEntry + VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+ }
+
+ @Override
+ public boolean isDictionaryEncoded() {
+ return false;
+ }
+
+ @Override
+ public boolean isSingleValue() {
+ return false;
+ }
+
+ @Override
+ public DataType getValueType() {
+ return _valueType;
+ }
+
+ @Override
+ public void putIntMV(final int[] values) {
+
+ byte[] bytes = new byte[Integer.BYTES
+ + values.length * Integer.BYTES]; //numValues, bytes required to store the content
+ ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
+ //write the length
+ byteBuffer.putInt(values.length);
+ //write the content of each element
+ for (final int value : values) {
+ byteBuffer.putInt(value);
+ }
+ _indexWriter.putBytes(bytes);
Review comment:
This should not require allocating a temporary buffer. It could be implemented as an MV pattern on `_indexWriter`, just as was done to eliminate the much larger buffers for `byte[][]` and `String[]`.
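A minimal sketch of the idea, assuming a writer that owns the chunk buffer. `IntMvChunkSketch`, `putInts` and `readInts` are hypothetical names for illustration, not Pinot's actual API; the sketch also leaves out the chunk header offsets and compression that the real writer handles:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the chunk writer; not Pinot's actual class.
final class IntMvChunkSketch {
  private final ByteBuffer chunk;

  IntMvChunkSketch(int capacity) {
    chunk = ByteBuffer.allocate(capacity);
  }

  // Writes [numValues][v0][v1]... straight into the chunk buffer,
  // so no intermediate byte[] is allocated per row.
  void putInts(int[] values) {
    chunk.putInt(values.length);
    for (int value : values) {
      chunk.putInt(value);
    }
  }

  // Number of bytes written to the chunk so far.
  int bytesWritten() {
    return chunk.position();
  }

  // Reads back the row that starts at the given byte offset.
  int[] readInts(int offset) {
    int numValues = chunk.getInt(offset);
    int[] out = new int[numValues];
    for (int i = 0; i < numValues; i++) {
      out[i] = chunk.getInt(offset + Integer.BYTES * (i + 1));
    }
    return out;
  }
}
```

The per-row cost is then just the `putInt` calls against the buffer the writer already holds.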
[GitHub] [pinot] kishoreg commented on a change in pull request #7595: MV fwd index + MV `BYTES`
Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r732861201
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -66,19 +68,21 @@
* @param chunkSize Size of chunk
* @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
* @param version version of File
- * @throws FileNotFoundException
+ * @throws IOException if the file isn't found or can't be mapped
*/
protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compressionType, int totalDocs,
int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
- throws FileNotFoundException {
+ throws IOException {
Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
+ _file = file;
+ _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+ _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
_chunkSize = chunkSize;
_chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
- _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
- _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
_chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
- _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);
- _dataFile = new RandomAccessFile(file, "rw").getChannel();
+ _dataChannel = new RandomAccessFile(file, "rw").getChannel();
+ _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
Review comment:
Where is the compressedBuffer getting initialized, or do we not need it anymore?
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -452,12 +457,119 @@ public void indexRow(GenericRow row)
}
}
} else {
- // MV column (always dictionary encoded)
- int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
- forwardIndexCreator.putDictIdMV(dictIds);
- DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
- if (invertedIndexCreator != null) {
- invertedIndexCreator.add(dictIds, dictIds.length);
+ if (dictionaryCreator != null) {
+ //dictionary encoded
+ int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
+ forwardIndexCreator.putDictIdMV(dictIds);
+ DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap
+ .get(columnName);
+ if (invertedIndexCreator != null) {
+ invertedIndexCreator.add(dictIds, dictIds.length);
+ }
+ } else {
+ // for text index on raw columns, check the config to determine if actual raw value should
+ // be stored or not
+ if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
+ Object value = _columnProperties.get(columnName)
+ .get(FieldConfig.TEXT_INDEX_RAW_VALUE);
+ if (value == null) {
+ value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
+ }
+ if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
+ value = String.valueOf(value);
+ int length = ((String[]) columnValueToIndex).length;
+ columnValueToIndex = new String[length];
+ Arrays.fill((String[]) columnValueToIndex, value);
+ } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
+ int length = ((byte[][]) columnValueToIndex).length;
+ columnValueToIndex = new byte[length][];
+ Arrays.fill((byte[][]) columnValueToIndex, String.valueOf(value).getBytes());
+ } else {
+ throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
+ }
+ }
+ switch (forwardIndexCreator.getValueType()) {
+ case INT:
+ if (columnValueToIndex instanceof int[]) {
+ forwardIndexCreator.putIntMV((int[]) columnValueToIndex);
+ } else if (columnValueToIndex instanceof Object[]) {
+ int[] array = new int[((Object[]) columnValueToIndex).length];
+ for (int i = 0; i < array.length; i++) {
+ array[i] = (Integer) ((Object[]) columnValueToIndex)[i];
+ }
+ forwardIndexCreator.putIntMV(array);
+ } else {
+ //TODO: is this possible?
Review comment:
I looked at the code, and it should not enter this path.
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
*
*/
protected void writeChunk() {
- int sizeToWrite;
+ int sizeWritten;
_chunkBuffer.flip();
- try {
- sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
- _dataFile.write(_compressedBuffer, _dataOffset);
- _compressedBuffer.clear();
+ int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+ // compress directly into the mapped output rather than keeping a large buffer to compress into
+ try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
Review comment:
I think the existing ones were written using BIG_ENDIAN. If we are changing the byte order, we might need to bump the version and handle both cases for backwards compatibility.
It might be better to keep the same byte order as the existing one.
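A minimal sketch of why the byte order matters here, using only `java.nio`. The class and method names are hypothetical and not part of Pinot. The same four bytes decode to a different int when the reader assumes a different order from the writer:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class ByteOrderSketch {
  // Encodes one int with the given byte order and returns the raw bytes.
  static byte[] encode(int value, ByteOrder order) {
    return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
  }

  // Decodes the first four bytes as an int using the given byte order.
  static int decode(byte[] bytes, ByteOrder order) {
    return ByteBuffer.wrap(bytes).order(order).getInt();
  }
}
```

For example, the value 1 written as big-endian reads back as 16777216 (0x01000000) when decoded as little-endian, which is why existing segments would need a version check if the order changed.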
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735090988
##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java
##########
@@ -242,4 +242,23 @@ default int getDoubleMV(int docId, double[] valueBuffer, T context) {
default int getStringMV(int docId, String[] valueBuffer, T context) {
throw new UnsupportedOperationException();
}
+
+ /**
+ * Reads the bytes type multi-value at the given document id into the passed in value buffer (the buffer size must
+ * be enough to hold all the values for the multi-value entry) and returns the number of values within the multi-value
+ * entry.
+ *
+ * @param docId Document id
+ * @param valueBuffer Value buffer
+ * @param context Reader context
+ * @return Number of values within the multi-value entry
+ */
+ default int getBytesMV(int docId, byte[][] valueBuffer, T context) {
+ throw new UnsupportedOperationException();
+ }
+
+ default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {
Review comment:
Yes, I hadn't noticed this from the initial commits @kishoreg made.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735091052
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -0,0 +1,181 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.fwd;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.BaseChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
+import org.apache.pinot.segment.spi.V1Constants.Indexes;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Forward index creator for raw (non-dictionary-encoded) single-value column of variable length
+ * data type (STRING,
+ * BYTES).
+ */
+public class MultiValueFixedByteRawIndexCreator implements ForwardIndexCreator {
+
+ private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
+ private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
+
+ private final VarByteChunkSVForwardIndexWriter _indexWriter;
+ private final DataType _valueType;
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column,
+ int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements)
+ throws IOException {
+ this(baseIndexDir, compressionType, column, totalDocs, valueType, maxLengthOfEachEntry,
+ maxNumberOfMultiValueElements, false,
+ BaseChunkSVForwardIndexWriter.DEFAULT_VERSION);
+ }
+
+ /**
+ * Create a var-byte raw index creator for the given column
+ *
+ * @param baseIndexDir Index directory
+ * @param compressionType Type of compression to use
+ * @param column Name of column to index
+ * @param totalDocs Total number of documents to index
+ * @param valueType Type of the values
+ * @param maxLengthOfEachEntry length of longest entry (in bytes)
+ * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
+ * @param writerVersion writer format version
+ */
+ public MultiValueFixedByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType,
+ String column, int totalDocs, DataType valueType, final int maxLengthOfEachEntry,
+ final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
+ int writerVersion)
+ throws IOException {
+ File file = new File(baseIndexDir,
+ column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+ FileUtils.deleteQuietly(file);
+ int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
Review comment:
Good catch. I will add this in a follow up.
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089918
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -166,13 +172,15 @@ private int writeHeader(ChunkCompressionType compressionType, int totalDocs, int
*
*/
protected void writeChunk() {
- int sizeToWrite;
+ int sizeWritten;
_chunkBuffer.flip();
- try {
- sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
- _dataFile.write(_compressedBuffer, _dataOffset);
- _compressedBuffer.clear();
+ int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
+ // compress directly into the mapped output rather than keeping a large buffer to compress into
+ try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
+ maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
Review comment:
Could do, and did at some point, but I don't think this is important. This way it's more succinct to use `AutoCloseable`.
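To illustrate the `AutoCloseable` point, here is a minimal sketch with a hypothetical `Region` class standing in for the mapped buffer (it is not `PinotDataBuffer`, and it only records that `close()` ran). With try-with-resources the release happens without an explicit `finally` block:

```java
// Hypothetical example for illustration only.
final class AutoCloseSketch {
  // Stand-in for a mapped region; tracks bytes written and whether it was released.
  static final class Region implements AutoCloseable {
    int written;
    boolean closed;

    void write(int numBytes) {
      written += numBytes;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Writes into the region inside try-with-resources and returns it
  // so the caller can check that it was released.
  static Region writeAndRelease(int numBytes) {
    Region region = new Region();
    try (Region r = region) {
      r.write(numBytes);
    }
    return region;
  }
}
```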
[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089959
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
_chunkBuffer.put(value);
_chunkDataOffSet += value.length;
- // If buffer filled, then compress and write to file.
- if (_chunkHeaderOffset == _chunkHeaderSize) {
- writeChunk();
+ writeChunkIfNecessary();
+ }
+
+ // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+ public void putStrings(String[] values) {
+ // the entire String[] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the strings into the data buffer as if it's a single string,
+ // but with its own embedded header so offsets to strings within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i].getBytes(UTF_8);
+ _chunkBuffer.putInt(h, utf8.length);
+ _chunkBuffer.put(utf8);
+ bodySize += utf8.length;
}
+ _chunkDataOffSet += headerSize + bodySize;
+ // go back to write the number of strings embedded in the big string
+ _chunkBuffer.putInt(headerPosition, values.length);
+
+ writeChunkIfNecessary();
}
- @Override
- public void close()
- throws IOException {
+ public void putByteArrays(byte[][] values) {
+ // the entire byte[][] will be encoded as a single string, write the header here
+ _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+ _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+ // write all the byte[]s into the data buffer as if it's a single byte[],
+ // but with its own embedded header so offsets to byte[]s within the body
+ // can be located
+ int headerPosition = _chunkDataOffSet;
+ int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+ int bodyPosition = headerPosition + headerSize;
+ _chunkBuffer.position(bodyPosition);
+ int bodySize = 0;
+ for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+ byte[] utf8 = values[i];
Review comment:
Why?