You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/21 05:12:36 UTC

[GitHub] [pinot] richardstartin edited a comment on pull request #7927: Fix realtime ingestion when an entire batch of messages is filtered out

richardstartin edited a comment on pull request #7927:
URL: https://github.com/apache/pinot/pull/7927#issuecomment-998475535


   > On a separate note, I saw that originally there were some optimizations piggy-backed to this PR. We should avoid doing that. Each PR should only focus on one feature, one bug fix, etc. Any optimization or refactoring should go in a separate PR. That might take a little longer for the author, but it benefits all of us in the long run.
   
   There were no optimisations in this PR, there were some fixes for concurrency bugs (e.g. it's incorrect to use an increment operator on a volatile variable) so I'm not sure what you're referring to. 
   
   > One example of this issue happened last week when we spent a long time finding the root cause for the chunk index writer bug. The problematic optimization was added in a PR with title Add MV raw forward index and MV BYTES data type. 
   
   I'm not sure this is the best place to be discussing this but you are implying that the change you mentioned was unrelated to the PR it was made in, but it wasn't. The purpose of the change, as has been discussed, is that the particular class creates very large buffers. It was included in that PR because its use for MV BYTES columns exacerbates this by multiplying what is already an overestimate of the buffer size by the maximum number of elements in the column. So the change was a mitigation to the worst case made worse by that PR.
   
   > The table having performance problem didn't have any multi-value columns or byte data type. So we couldn't easily tell if this commit is related or not. And there were a lot of commits we needed to examine. If we had a separate PR for that optimization, we could've found the root cause much easier!
   
   Looking at commits to figure out what changed isn't an efficient diagnostic technique. Had you instead used a profiler you would have found the process was spending a large amount of time in the syscalls `mmap` and `munmap`; looking at the git blame for each Pinot stack frame above those syscalls would have found the culprit in O(stack depth) time rather than O(lines of code changed). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org