You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/18 20:02:27 UTC

[GitHub] [pinot] sajjad-moradi opened a new issue #7929: Performance problem in segment build

sajjad-moradi opened a new issue #7929:
URL: https://github.com/apache/pinot/issues/7929


   We found out that during segment completion, segment build takes a long time. For one table it takes more than half an hour to build an immutable segment while it used to take around one minute to do so. After some investigation, it turned out that the root cause is in this PR: https://github.com/apache/pinot/pull/7595
   More specifically the issue is with refactoring of `BaseChunkSVForwardIndexWriter` where we used to have one separate in-memory byte buffer to compress each chunk and then write its content to the index file:
   ```java
         sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
         _dataFile.write(_compressedBuffer, _dataOffset);
         _compressedBuffer.clear();
   ```
   After writing the chunk, the bytebuffer gets cleared and the same object will be reused in the next writeChunk call.
   Now after refactoring, the reusable byte buffer is gone and in every writeChunk call, small part of the index file gets memory mapped into a new MappedByteBuffer and and the chunk data is compressed to that mapped byte buffer which in turn automatically gets written into the index file.
   ```java
       int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
       try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
           maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
         ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
         sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
       } 
   ```
   This may look better as it doesn't need an extra byte buffer for compression, but since the size of the chunk is very small - 1000 * data type size (8 bytes for long) - memory mapping degrades the performance [1].
   We experimented a bit with the segments of the problematic table and turned out that even with SSD it takes more than 30% time to build the segment. For HDD, it's much worse and it takes more than 30x (one minute for using interim byte buffer vs more than 30 minutes for memory mapping).
   
   [1] From Oracle documentation:
   For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.
   https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,%20long,%20long)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] mcvsubbu commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

mcvsubbu commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997323550


   If you want to allocate  memory in the realtime context, please use the class RealtimeIndexOffheapMemoryManager . It will make sure to allocate memory as per the configuration set by the admin -- memory mapped, or direct, etc. It is also accounted for, and so can be used for provisioning purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin edited a comment on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin edited a comment on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997279739


   @sajjad-moradi the purpose of the change was to reduce the amount of memory required for variable length data when some of the values are very long, which results in very large buffers. An impact on build time was expected, but the impact on small fixed size chunks wasn’t considered. Memory mapping 4-8KB at a time doesn’t make sense and will result in very high syscall overhead, so this is less than ideal. We now have the V4 raw format for variable length data, so the benefits (reduced memory consumption) of the change are no longer important, so it makes sense to me to reintroduce the compressed chunk buffer. However, some of the changes in that PR were important, so the change should be limited to reintroducing the compression buffer and removing the memory mapping per chunk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997279739


   @sajjad-moradi the purpose of the change was to reduce the amount of memory required for variable length data when some of the values are very long, which results in very large buffers. An impact on build time was expected, but the impact on small fixed size chunks wasn’t considered. Memory mapping 4-8KB at a time doesn’t make sense and will result in very high syscall overhead, so this is less than ideal. We now have the V4 raw format for variable length data, so the benefits (reduced memory consumption) of the change are no longer relevant important, so it makes sense to me to reintroduce the compressed chunk buffer. However, some of the changes in that PR were important, so the change should be limited to reintroducing the compression buffer and removing the memory mapping per chunk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997332098

>If you want to allocate memory in the realtime context, please use the class RealtimeIndexOffheapMemoryManager . It will make sure to allocate memory as per the configuration set by the admin -- memory mapped, or direct, etc. It is also accounted for, and so can be used for provisioning purpose.

Before my change `ByteBuffer.allocateDirect` was used, but I'll make a note of this.

> Yes, we should not modify general logic for outlier use cases. In this case, we had a production issue, and had to revert the deployment, and spend multiple days trying to reproduce the problem, narrowing down the commit and then identifying a problem within that commit.

I apologise for the inconvenience caused by the change, but it's anything but an outlier use case. New user experience has to be considered - if a new user stores text (web scraping, XML, JSON, typically no control over max length) in a raw index, they will find it requires a lot of memory. How do we prevent that new user from having a bad experience at the same time as avoiding finding out about inadvertent regressions in your production environment? Just not making changes would be a good way to avoid regressions in your production environment, but it leaves new users whose workloads are slightly different to yours out in the cold. To avoid a repeat incident, firstly, we need better performance tests so that unconsidered codepaths can't regress without anyone knowing. Secondly, I repeat the suggestion to decouple fixed width and variable width storage, because the implementation has a downside either way so long as it's shared.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] mcvsubbu commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

mcvsubbu commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997323678


   > For some context about the change and what this will go back to, if you happen to have an outlier record in a set of JSON records, of say 1MB (which isn’t that large) compared to 10KB on average, you need ~1GB for the buffer, then ~2GB for the compression buffer. Saving 2GB per raw index build is a really nice improvement, if that’s how you’re using Pinot. Perhaps the problem is sharing the logic, which isn’t particularly complicated, between fixed and variable length data sources?
   
   Yes, we should not modify general logic for outlier use cases. In this case, we had a production issue, and had to revert the deployment, and spend multiple days trying to reproduce the problem, narrowing down the commit and then identifying a problem within that commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997278061


   @kishoreg @richardstartin @Jackie-Jiang @mcvsubbu
   I'm going to create a PR to basically revert the code changes of `BaseChunkSVForwardIndexWriter`. Please take a look as this has blocked our deployment in Linkedin.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-998145052


   Yeah let's merge the PR and then I'll create an issue to handle the case of large buffer size


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-998168662


   @richardstartin is V4 currently being used or is to going to be used in future?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-998146866


   V4 solves the buffer size problem


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997280445


   Yes, as I mentioned earlier, only BaseChunkSVForwardIndexWriter needs to be reverted, not all the changes in that PR. The fix PR is linked here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-1006095816


   The PR is merged. Closing the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997282170


   For some context about the change and what this will go back to, if you happen to have an outlier  record in a set of JSON records, of say 1MB (which isn’t that large) compared to 10KB on average, you need ~1GB for the buffer, then ~2GB for the compression buffer. Saving 2GB per raw index build is a really nice improvement, if that’s how you’re using Pinot. Perhaps the problem is sharing the logic, which isn’t particularly complicated, between fixed and variable length data sources?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997447817


   I think the best path forward is to merge the PR with the amendment to use the compression bound instead of 2x chunk size and then find a way to decouple fixed and variable byte raw index versions so that V4 can become the default for raw text without affecting the raw fixed width index.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] mcvsubbu commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

mcvsubbu commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-997435881


   > > If you want to allocate memory in the realtime context, please use the class RealtimeIndexOffheapMemoryManager . It will make sure to allocate memory as per the configuration set by the admin -- memory mapped, or direct, etc. It is also accounted for, and so can be used for provisioning purpose.
   > 
   > Before my change `ByteBuffer.allocateDirect` was used, but I'll make a note of this.
   
   I meant if you wanted to change anything in the realtime mem allocation path. Code has evolved, and more and more special casing has been added since the time the memory manager was introduced. Not all changes have followed the approach. Perhaps because it did not matter either way. As we moved forward to make changes, we should keep that in mind. Now that you have done so, I am fine, thanks.
    
   > 
   > > Yes, we should not modify general logic for outlier use cases. In this case, we had a production issue, and had to revert the deployment, and spend multiple days trying to reproduce the problem, narrowing down the commit and then identifying a problem within that commit.
   > 
   > I apologise for the inconvenience caused by the change, but it's anything but an outlier use case. New user experience has to be considered - if a new user stores text (web scraping, XML, JSON, typically no control over max length) in a raw index, they will find it requires a lot of memory. How do we prevent that new user from having a bad experience at the same time as avoiding finding out about inadvertent regressions in your production environment? Just not making changes would be a good way to avoid regressions in your production environment, but it leaves new users whose workloads are slightly different to yours out in the cold. To avoid a repeat incident, firstly, we need better performance tests so that unconsidered codepaths can't regress without anyone knowing. Secondly, I repeat the suggestion to decouple fixed width and variable width storage, because the implementation has a downside either way so long as it's shared.
   
   Perhaps we differ a bit here. A new user of today is just starting off.  We must definitely be cognizant of revenue-impacting production use cases already running on the system. Today's new user will be tomorrow's production user, and will not be happy if a new release breaks something in production :-) So, we should be careful if we consider new use cases to be the most common ones.
   
   That being said, there should be a balance. I am definitely not suggesting not making any changes, and I believe you know that :-).  And some of these (especially performance bugs) are hard to catch before we go live. We do have performance testing as well in our pipeline, but this was not caught. Exactly one out of 150+ use cases happened to break :-) , so I would go as far to claim that it is impossible to guarantee that no regressions happen. We all commit code and do the best we can to be correct, and learn along the way as well.
   
   If you can please work with @sajjad-moradi  and arrive at a common solution that will be appreciated, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] richardstartin commented on issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

richardstartin commented on issue #7929:
URL: https://github.com/apache/pinot/issues/7929#issuecomment-998186011


   @sajjad-moradi yes, but it requires field level config. I will open an issue to discuss making it default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] sajjad-moradi closed issue #7929: Performance problem in segment build

Posted by GitBox <gi...@apache.org>.

sajjad-moradi closed issue #7929:
URL: https://github.com/apache/pinot/issues/7929


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org