Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/20 20:11:34 UTC

[GitHub] [pinot] siddharthteotia commented on a change in pull request #7930: Fix performance problem of base chunk forward index write

siddharthteotia commented on a change in pull request #7930:
URL: https://github.com/apache/pinot/pull/7930#discussion_r772640243



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkSVForwardIndexWriter.java
##########
@@ -74,15 +71,13 @@ protected BaseChunkSVForwardIndexWriter(File file, ChunkCompressionType compress
       int numDocsPerChunk, int chunkSize, int sizeOfEntry, int version)
       throws IOException {
     Preconditions.checkArgument(version == DEFAULT_VERSION || version == CURRENT_VERSION);
-    _file = file;
-    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
-    _dataOffset = headerSize(totalDocs, numDocsPerChunk, _headerEntryChunkOffsetSize);
     _chunkSize = chunkSize;
     _chunkCompressor = ChunkCompressorFactory.getCompressor(compressionType);
+    _headerEntryChunkOffsetSize = getHeaderEntryChunkOffsetSize(version);
+    _dataOffset = writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
     _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
-    _dataChannel = new RandomAccessFile(file, "rw").getChannel();
-    _header = _dataChannel.map(FileChannel.MapMode.READ_WRITE, 0, _dataOffset);
-    writeHeader(compressionType, totalDocs, numDocsPerChunk, sizeOfEntry, version);
+    _compressedBuffer = ByteBuffer.allocateDirect(chunkSize * 2);

Review comment:
      @sajjad-moradi @richardstartin, I think 2x was being used to account for the worst case where compression is negative, which is possible (although unlikely) imo. Unless we are 100% sure that our compression codecs (LZ4, Snappy, ZSTD) will never produce negative compression for any kind of data, chunkSize is fine; otherwise, I suggest moving it back to 2x (as in the old code), or at least something larger than chunkSize.
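
A small illustration of the concern above (this is a hypothetical demo, not Pinot code): compressing data that is already high-entropy can *expand* it, because the codec must still emit framing overhead around bytes it cannot shrink. The example below uses the JDK's `Deflater` (rather than LZ4/Snappy/ZSTD, to stay dependency-free) and sizes the output buffer at 2x the input, mirroring the old code's bound; the class and method names are illustrative only.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionBound {
    // Compresses the input and returns the compressed size.
    // The output buffer is sized at 2x the input, mirroring the
    // old BaseChunkSVForwardIndexWriter bound, so the loop cannot
    // stall even when compression is negative.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length * 2];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out, total, out.length - total);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] random = new byte[4096];
        new Random(42).nextBytes(random); // effectively incompressible payload
        int size = compressedSize(random);
        // For random input the compressed size exceeds the raw size,
        // so a buffer of exactly chunkSize would have been too small.
        System.out.println("input=4096 compressed=" + size);
    }
}
```

In practice, instead of a fixed 2x, real codecs expose an exact worst-case bound for a given input length (e.g. LZ4's `maxCompressedLength`), which is the safest way to size the destination buffer.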




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org