Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/11/24 22:26:53 UTC

[GitHub] [incubator-pinot] JohnTortugo opened a new pull request #6287: Parallelize segment index init and building

JohnTortugo opened a new pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287


   I'm from a Microsoft team working with LinkedIn engineers to improve Pinot performance. In an earlier conversation, @mayankshriv mentioned that segment creation is slow and asked whether we could look into improving it. CPU profiling of the segment creation code, using synthetic data, produced the following flame graph:
   
   ![CPU Hotspots](https://cesarshare.blob.core.windows.net/pinot-investigation/SegmentCreation-Flames.png) [Full resolution image](https://cesarshare.blob.core.windows.net/pinot-investigation/SegmentCreation-Flames.png)
   
   Basically, segment creation is made of two roughly equal parts: init & build. Both methods are very similar in the sense that they have a main loop that iterates over rows in the input data doing some transformation on each of them. This PR introduces two main changes:
   
   1) A new Pinot-perf benchmark used for benchmarking segment creation performance.
   
   2) The parallelization of the init->gatherStats and build methods mentioned above. The main loop of each method was parallelized using a technique called [DSWP](https://liberty.princeton.edu/Publications/micro05_dswp.pdf) and we made use of [Disruptor RingBuffer](https://github.com/LMAX-Exchange/disruptor) to implement thread communication.
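
To make the pipelining idea above concrete, here is a minimal sketch of a two-stage decoupled loop: one thread reads rows while a second thread does the per-row work. This is not the code in this PR. It swaps the Disruptor RingBuffer for a JDK `ArrayBlockingQueue` so that it runs without extra dependencies, and the names `PipelineSketch`, `sumOfLengths`, and `POISON` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical two-stage pipeline: stage 1 "reads" rows, stage 2 processes them.
// A bounded queue stands in for the Disruptor RingBuffer (an assumption of this sketch).
class PipelineSketch {
  // Sentinel object that tells the consumer there are no more rows.
  private static final String POISON = new String("<end-of-input>");

  static long sumOfLengths(List<String> rows) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    long[] total = new long[1];

    // Stage 2: runs on its own thread and consumes rows as they arrive.
    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          String row = queue.take();
          if (row == POISON) { // identity check: no real row is this object
            break;
          }
          total[0] += row.length(); // stand-in for the per-row transformation
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    // Stage 1: the calling thread plays the role of the record reader.
    try {
      for (String row : rows) {
        queue.put(row);
      }
      queue.put(POISON);
      consumer.join(); // join() makes the consumer's writes visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while feeding the pipeline", e);
    }
    return total[0];
  }

  public static void main(String[] args) {
    System.out.println(sumOfLengths(Arrays.asList("a,b,c", "d,e", "f"))); // prints 9
  }
}
```

The bounded queue gives back-pressure: if the consumer falls behind, the reader blocks instead of buffering the whole input. A ring buffer serves the same purpose while reusing preallocated slots.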
   
   ## Benchmark results:
   
   ### Original code
   
   ```
   # Run progress: 0.00% complete, ETA 00:07:00
   # Fork: 1 of 2
   # Warmup Iteration   1: 31097.686 ms/op
   # Warmup Iteration   2: 26007.428 ms/op
   # Warmup Iteration   3: 26816.007 ms/op
   Iteration   1: 25951.170 ms/op
   Iteration   2: 26076.096 ms/op
   Iteration   3: 26045.939 ms/op
   
   # Run progress: 50.00% complete, ETA 00:05:19
   # Fork: 2 of 2
   # Warmup Iteration   1: 31711.546 ms/op
   # Warmup Iteration   2: 26587.875 ms/op
   # Warmup Iteration   3: 27360.283 ms/op
   Iteration   1: 26208.574 ms/op
   Iteration   2: 26316.409 ms/op
   Iteration   3: 26194.492 ms/op
   
   
   Result "org.apache.pinot.perf.BenchmarkSegmentCreation.segmentCreationFromCSV":
     26132.113 ±(99.9%) 369.912 ms/op [Average]
     (min, avg, max) = (25951.170, 26132.113, 26316.409), stdev = 131.914
     CI (99.9%): [25762.202, 26502.025] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:10:42
   
   Benchmark                                        Mode  Cnt      Score     Error  Units
   BenchmarkSegmentCreation.segmentCreationFromCSV  avgt    6  26132.113 ± 369.912  ms/op
   ```
   
   ### New code
   
   ```
   # Run progress: 0.00% complete, ETA 00:03:20
   # Fork: 1 of 2
   # Warmup Iteration   1: 23004.364 ms/op
   # Warmup Iteration   2: 19380.296 ms/op
   # Warmup Iteration   3: 20914.349 ms/op
   # Warmup Iteration   4: 19469.886 ms/op
   # Warmup Iteration   5: 19461.024 ms/op
   Iteration   1: 19523.648 ms/op
   Iteration   2: 19582.673 ms/op
   Iteration   3: 19409.540 ms/op
   Iteration   4: 19419.701 ms/op
   Iteration   5: 19386.130 ms/op
   
   # Run progress: 50.00% complete, ETA 00:03:20
   # Fork: 2 of 2
   # Warmup Iteration   1: 23344.723 ms/op
   # Warmup Iteration   2: 19335.702 ms/op
   # Warmup Iteration   3: 20535.619 ms/op
   # Warmup Iteration   4: 19512.260 ms/op
   # Warmup Iteration   5: 19461.238 ms/op
   Iteration   1: 19510.350 ms/op
   Iteration   2: 19453.281 ms/op
   Iteration   3: 19444.863 ms/op
   Iteration   4: 19399.972 ms/op
   Iteration   5: 19380.142 ms/op
   
   
   Result "org.apache.pinot.perf.BenchmarkSegmentCreation.segmentCreationFromCSV":
     19451.030 ±(99.9%) 101.684 ms/op [Average]
     (min, avg, max) = (19380.142, 19451.030, 19582.673), stdev = 67.258
     CI (99.9%): [19349.346, 19552.714] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:06:41
   
   Benchmark                                        Mode  Cnt      Score     Error  Units
   BenchmarkSegmentCreation.segmentCreationFromCSV  avgt   10  19451.030 ± 101.684  ms/op
   ```
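
For a quick comparison, the two averages reported above work out to a speedup of about 1.34x, or roughly a 25.6% reduction in wall time. The sketch below only does that arithmetic; `SpeedupSketch` and its method names are hypothetical. Note that the two runs used different JMH settings (3 warmup and 3 measurement iterations per fork versus 5 and 5), so the figures are indicative rather than a strict like-for-like comparison.

```java
import java.util.Locale;

// Hypothetical helper that derives relative improvement from two average timings.
class SpeedupSketch {
  // How many times faster the new code is (before / after).
  static double speedup(double beforeMs, double afterMs) {
    return beforeMs / afterMs;
  }

  // Percentage of the original wall time that was saved.
  static double reductionPercent(double beforeMs, double afterMs) {
    return 100.0 * (beforeMs - afterMs) / beforeMs;
  }

  public static void main(String[] args) {
    double before = 26132.113; // ms/op, "Original code" average from the JMH output
    double after = 19451.030;  // ms/op, "New code" average from the JMH output
    System.out.printf(Locale.ROOT, "speedup: %.2fx, wall-time reduction: %.1f%%%n",
        speedup(before, after), reductionPercent(before, after));
  }
}
```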
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] JohnTortugo commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
JohnTortugo commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733433097


   Thank you @mayankshriv, @mcvsubbu for reviewing. I'll do some benchmarking to assess whether, and by how much, heap usage increased with the changes I'm proposing.




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733434270


   @JohnTortugo  I updated the link in the issue to point to my fork
   https://github.com/apache/incubator-pinot/issues/4036
   https://github.com/mcvsubbu/incubator-pinot/commit/c866d9130a5ceddb7a25c3235d605e503e91e13c
   The work was done a while ago, so the changes will not apply as is. But you can get an idea of what is consuming heap memory and how to save it.
   @snleee  is also interested in contributing here.




[GitHub] [incubator-pinot] JohnTortugo commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
JohnTortugo commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-736716850


   @mcvsubbu - I was told by @mayankshriv that this part of Pinot was slow. @mayankshriv - can you please answer @mcvsubbu's question about the situations in which the segment creation code is slow?




[GitHub] [incubator-pinot] JohnTortugo commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
JohnTortugo commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-742821590


   Hi @mayankshriv @mcvsubbu - after your reviews (and talking privately on Slack) I understand that segment creation execution time isn't a pain point; memory consumption is the real pain point of this part of Pinot. Nonetheless, the proposed change helps reduce segment creation wall time. My question then is: do you think the contribution of this PR is valuable and, if so, what the next step forward is, or should I just close the PR?
   
   @mcvsubbu - The amount of time allotted for me to work on Pinot this quarter is almost over; I can't take over your work for reducing memory allocation at this time.




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733294374


   Garbage collection during segment build has been a bigger problem for us than the segment build itself -- in the realtime path. In the offline path, where segments are built in hadoop/spark, we have not had issues with GC or performance.
   
   Here is a prototype that I did a while ago; using a columnar segment builder reduced GC significantly.
   
   https://github.com/apache/incubator-pinot/issues/4036
   
   Please use a similar technique and post GC results for some use cases where you hit performance problems (yes, unfortunately, the amount of garbage depends on the kind of columns you have -- I suspect performance also goes the same way).
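
One possible way to collect such GC numbers, sketched here under the assumption that coarse JVM-wide counters are enough, is to read the standard `GarbageCollectorMXBean` counters before and after the code under test. `GcProbeSketch` and the allocation loop are hypothetical stand-ins for a real segment build.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe that reports GC activity observed while a task runs.
class GcProbeSketch {
  // Sums collection counts over all collectors; -1 (undefined) is treated as 0.
  static long totalGcCount() {
    long count = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      count += Math.max(0, gc.getCollectionCount());
    }
    return count;
  }

  // Sums approximate accumulated collection time in milliseconds.
  static long totalGcTimeMs() {
    long millis = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      millis += Math.max(0, gc.getCollectionTime());
    }
    return millis;
  }

  // Runs the task and returns {collections, gcMillis} observed during the run.
  static long[] measure(Runnable task) {
    long countBefore = totalGcCount();
    long timeBefore = totalGcTimeMs();
    task.run();
    return new long[] {totalGcCount() - countBefore, totalGcTimeMs() - timeBefore};
  }

  public static void main(String[] args) {
    long[] delta = measure(() -> {
      // Stand-in workload that allocates many short-lived arrays.
      long sink = 0;
      for (int i = 0; i < 5_000_000; i++) {
        sink += new int[16].length;
      }
      if (sink == 0) {
        throw new AssertionError("unreachable; keeps the loop from being trivially dead");
      }
    });
    System.out.println("GC collections: " + delta[0] + ", GC time (ms): " + delta[1]);
  }
}
```

These counters cover the whole JVM, so garbage produced by other threads during the run is included. GC logs or a profiler give a more precise picture.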
   
   In which scenario are you facing performance problems?
   
   Lastly, if you could please add a short paragraph outlining your approach, that will help reviewers. Thanks.




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-736733153


   @JohnTortugo please ping me via Slack, and we can discuss further. I also discussed this with @mayankshriv, and we concur that reducing garbage collection is the first order of business. We have not observed performance problems in this area, only GC issues during segment build. We will gladly take any performance improvements, however.
   The prototype that I had done with columnar segment build yielded both performance as well as GC improvements. Can you please take that up?




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-736193107


   @JohnTortugo can you elaborate on where you observed the performance problem? Was it during offline ingestion or in realtime pinot?




[GitHub] [incubator-pinot] mayankshriv commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733288665


   Tagging @siddharthteotia @Jackie-Jiang for review.




[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733427767


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=h1) Report
   > Merging [#6287](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=desc) (e962df2) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `20.56%`.
   > The diff coverage is `51.83%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6287/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6287       +/-   ##
   ===========================================
   - Coverage   66.44%   45.88%   -20.57%     
   ===========================================
     Files        1075     1255      +180     
     Lines       54773    61229     +6456     
     Branches     8168     8859      +691     
   ===========================================
   - Hits        36396    28093     -8303     
   - Misses      15700    30861    +15161     
   + Partials     2677     2275      -402     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `45.88% <51.83%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
   | [...not/common/assignment/InstancePartitionsUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vYXNzaWdubWVudC9JbnN0YW5jZVBhcnRpdGlvbnNVdGlscy5qYXZh) | `64.28% <ø> (-8.89%)` | :arrow_down: |
   | [.../apache/pinot/common/exception/QueryException.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZXhjZXB0aW9uL1F1ZXJ5RXhjZXB0aW9uLmphdmE=) | `90.27% <ø> (+5.55%)` | :arrow_up: |
   | ... and [1246 more](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=footer). Last update [5a53fbe...e962df2](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] codecov-io commented on pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#issuecomment-733427767


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=h1) Report
   > Merging [#6287](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=desc) (5af4d60) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `20.51%`.
   > The diff coverage is `52.01%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6287/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6287       +/-   ##
   ===========================================
   - Coverage   66.44%   45.93%   -20.52%     
   ===========================================
     Files        1075     1255      +180     
     Lines       54773    61218     +6445     
     Branches     8168     8859      +691     
   ===========================================
   - Hits        36396    28119     -8277     
   - Misses      15700    30819    +15119     
   + Partials     2677     2280      -397     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `45.93% <52.01%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
   | [...not/common/assignment/InstancePartitionsUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vYXNzaWdubWVudC9JbnN0YW5jZVBhcnRpdGlvbnNVdGlscy5qYXZh) | `64.28% <ø> (-8.89%)` | :arrow_down: |
   | [.../apache/pinot/common/exception/QueryException.java](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZXhjZXB0aW9uL1F1ZXJ5RXhjZXB0aW9uLmphdmE=) | `90.27% <ø> (+5.55%)` | :arrow_up: |
   | ... and [1247 more](https://codecov.io/gh/apache/incubator-pinot/pull/6287/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=footer). Last update [5a53fbe...e88b41f](https://codecov.io/gh/apache/incubator-pinot/pull/6287?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6287: Parallelize segment index init and building

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6287:
URL: https://github.com/apache/incubator-pinot/pull/6287#discussion_r529956834



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/RecordReaderSegmentCreationDataSource.java
##########
@@ -53,26 +51,11 @@ public SegmentPreIndexStatsCollector gatherStats(StatsCollectorConfig statsColle
       SegmentPreIndexStatsCollector collector = new SegmentPreIndexStatsCollectorImpl(statsCollectorConfig);
       collector.init();
 
-      // Gather the stats
-      GenericRow reuse = new GenericRow();
-      while (_recordReader.hasNext()) {
-        reuse.clear();
+      ParallelRowProcessor prp = new ParallelRowProcessor(
+          _recordReader, 

Review comment:
       Use Pinot formatting.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/BuildRingBufferConsumer.java
##########
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import java.util.Collection;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import com.lmax.disruptor.EventHandler;
+import org.apache.pinot.core.data.recordtransformer.RecordTransformer;
+import org.apache.pinot.core.segment.creator.SegmentCreator;
+import org.apache.pinot.core.util.IngestionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class BuildRingBufferConsumer implements EventHandler<GenericRow> {
+  private static final Logger LOGGER = LoggerFactory.getLogger(BuildRingBufferConsumer.class);
+  private RecordTransformer recordTransformer = null;
+  private SegmentCreator indexCreator = null;
+
+  private long totalRecordReadTime = 0;

Review comment:
       Please use Pinot code styling. We name all data members with `_` prefix and avoid `this.`.
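
As an illustration of the convention described in this comment, the following sketch shows data members named with a `_` prefix and accessed without `this.`. The class `TimingStatsSketch` and its methods are hypothetical and are not part of the PR; only the field naming mirrors the timing counters in the diff above.

```java
// Hypothetical class; only the naming style follows the review comment.
class TimingStatsSketch {
  // Data members carry a `_` prefix, so they cannot be confused with
  // parameters or locals and never need a `this.` qualifier.
  private long _totalRecordReadTime = 0;
  private long _totalIndexTime = 0;

  void addRecordReadTime(long elapsed) {
    _totalRecordReadTime += elapsed;
  }

  void addIndexTime(long elapsed) {
    _totalIndexTime += elapsed;
  }

  long getTotalTime() {
    return _totalRecordReadTime + _totalIndexTime;
  }
}
```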

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/InitRingBufferConsumer.java
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import java.util.Collection;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import com.lmax.disruptor.EventHandler;
+import org.apache.pinot.core.data.recordtransformer.RecordTransformer;
+import org.apache.pinot.core.segment.creator.SegmentPreIndexStatsCollector;
+import org.apache.pinot.core.util.IngestionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class InitRingBufferConsumer implements EventHandler<GenericRow> {

Review comment:
       Rename to `RingBufferConsumerInitializer` (`InitRingBufferConsumer` sounds more like a method name than a class name)?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/RecordReaderSegmentCreationDataSource.java
##########
@@ -53,26 +51,11 @@ public SegmentPreIndexStatsCollector gatherStats(StatsCollectorConfig statsColle
       SegmentPreIndexStatsCollector collector = new SegmentPreIndexStatsCollectorImpl(statsCollectorConfig);
       collector.init();
 
-      // Gather the stats
-      GenericRow reuse = new GenericRow();
-      while (_recordReader.hasNext()) {
-        reuse.clear();
+      ParallelRowProcessor prp = new ParallelRowProcessor(

Review comment:
       Name the variable `parallelRowProcessor` for readability?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/BuildRingBufferConsumer.java
##########
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import java.util.Collection;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import com.lmax.disruptor.EventHandler;
+import org.apache.pinot.core.data.recordtransformer.RecordTransformer;
+import org.apache.pinot.core.segment.creator.SegmentCreator;
+import org.apache.pinot.core.util.IngestionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class BuildRingBufferConsumer implements EventHandler<GenericRow> {
+  private static final Logger LOGGER = LoggerFactory.getLogger(BuildRingBufferConsumer.class);
+  private RecordTransformer recordTransformer = null;
+  private SegmentCreator indexCreator = null;
+
+  private long totalRecordReadTime = 0;
+  private long totalIndexTime = 0;
+  private long totalStatsCollectorTime = 0;
+
+  public BuildRingBufferConsumer(RecordTransformer newTransformer, SegmentCreator newSegmentCreator) {

Review comment:
       Please add Javadoc for public classes and methods. Also, the class name could be more descriptive (something to do with segment generation?).

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/BuildRingBufferConsumer.java
##########
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import java.util.Collection;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import com.lmax.disruptor.EventHandler;
+import org.apache.pinot.core.data.recordtransformer.RecordTransformer;
+import org.apache.pinot.core.segment.creator.SegmentCreator;
+import org.apache.pinot.core.util.IngestionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class BuildRingBufferConsumer implements EventHandler<GenericRow> {
+  private static final Logger LOGGER = LoggerFactory.getLogger(BuildRingBufferConsumer.class);
+  private RecordTransformer recordTransformer = null;
+  private SegmentCreator indexCreator = null;
+
+  private long totalRecordReadTime = 0;
+  private long totalIndexTime = 0;
+  private long totalStatsCollectorTime = 0;
+
+  public BuildRingBufferConsumer(RecordTransformer newTransformer, SegmentCreator newSegmentCreator) {
+    this.recordTransformer = newTransformer;
+    this.indexCreator = newSegmentCreator;
+  }
+
+  public void onEvent(GenericRow row, long sequence, boolean endOfBatch) {
+    try {
+      long recordReadStartTime = System.currentTimeMillis();

Review comment:
       Measuring the processing time of every row adds overhead on the hot path; we should avoid this.
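
       One possible way to keep the timing without paying for it on every row, sketched under assumptions: read the clock once when a batch starts and once when it ends, using the `endOfBatch` flag that `onEvent` already receives. `BatchTimer` and its method names are hypothetical and not part of this PR or of Pinot.

```java
/**
 * Hypothetical helper (not part of the PR or of Pinot): reads the clock
 * once at the start and once at the end of each batch, not once per row.
 */
final class BatchTimer {
  private boolean inBatch = false;
  private long batchStartNanos = 0;
  private long totalNanos = 0;
  private long rows = 0;
  private long clockReads = 0;

  /** Call at the top of the per-row handler. */
  void onRowStart() {
    if (!inBatch) {
      // First row of a new batch: this is the only clock read until the batch ends.
      batchStartNanos = System.nanoTime();
      clockReads++;
      inBatch = true;
    }
    rows++;
  }

  /** Call at the end of the per-row handler, passing the handler's endOfBatch flag. */
  void onRowEnd(boolean endOfBatch) {
    if (endOfBatch && inBatch) {
      totalNanos += System.nanoTime() - batchStartNanos;
      clockReads++;
      inBatch = false;
    }
  }

  long getRows() { return rows; }
  long getClockReads() { return clockReads; }
  long getTotalNanos() { return totalNanos; }
}
```

       With this shape the number of clock reads scales with the number of batches, not the number of rows. It only gives a per-batch total, so it cannot split the time between transformation and indexing the way the three counters in the diff attempt to.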

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/ParallelRowProcessor.java
##########
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import com.lmax.disruptor.dsl.Disruptor;
+import com.lmax.disruptor.EventHandler;
+import com.lmax.disruptor.dsl.ProducerType;
+import com.lmax.disruptor.BusySpinWaitStrategy;
+import com.lmax.disruptor.RingBuffer;
+import com.lmax.disruptor.util.DaemonThreadFactory;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+public class ParallelRowProcessor {
+    /* bufferSize, Needs to fit L3 cache & must be power of 2 */
+    public static final int RING_SIZE = 64;
+    private RecordReader reader;
+    private Disruptor<GenericRow> disruptor;
+
+    public ParallelRowProcessor(RecordReader reader, EventHandler<GenericRow> handler) {
+        this.reader = reader;
+
+        this.disruptor = new Disruptor<GenericRow>(
+            GenericRow::new, 

Review comment:
       Will this create more garbage than the single reused `GenericRow` in the old loop?
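
       For context on this question: as far as I know, the Disruptor calls the event factory once per slot when the ring is built and then reuses those instances, so `GenericRow::new` should cost `RING_SIZE` allocations up front and not one per row, provided the producer fills the claimed slot in place. Below is a plain-Java stand-in for that preallocation pattern. It is not the Disruptor API; `PreallocatedRing` and `claimNext` are hypothetical names, and the sketch is single-threaded with no producer/consumer coordination.

```java
/**
 * Hypothetical, single-threaded illustration of a preallocated ring:
 * the factory runs once per slot at construction, and slots are reused afterwards.
 */
final class PreallocatedRing<T> {
  private final Object[] slots;
  private final int mask;
  private long next = 0;

  PreallocatedRing(java.util.function.Supplier<T> factory, int size) {
    // A power-of-2 size lets the index wrap with a bit mask instead of a modulo.
    if (size <= 0 || Integer.bitCount(size) != 1) {
      throw new IllegalArgumentException("size must be a power of 2");
    }
    slots = new Object[size];
    for (int i = 0; i < size; i++) {
      slots[i] = factory.get(); // the only place where slot objects are allocated
    }
    mask = size - 1;
  }

  /** Returns the next slot's existing instance; the caller overwrites its contents. */
  @SuppressWarnings("unchecked")
  T claimNext() {
    return (T) slots[(int) (next++ & mask)];
  }
}
```

       If the real producer works this way (clearing and refilling the claimed row), steady-state allocation should stay flat. If it instead builds a fresh row per record and copies it into the slot, the answer to the question would be yes.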

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/ParallelRowProcessor.java
##########
@@ -0,0 +1,68 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl;
+
+import com.lmax.disruptor.dsl.Disruptor;
+import com.lmax.disruptor.EventHandler;
+import com.lmax.disruptor.dsl.ProducerType;
+import com.lmax.disruptor.BusySpinWaitStrategy;
+import com.lmax.disruptor.RingBuffer;
+import com.lmax.disruptor.util.DaemonThreadFactory;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+public class ParallelRowProcessor {
+    /* bufferSize, Needs to fit L3 cache & must be power of 2 */
+    public static final int RING_SIZE = 64;
+    private RecordReader reader;
+    private Disruptor<GenericRow> disruptor;
+
+    public ParallelRowProcessor(RecordReader reader, EventHandler<GenericRow> handler) {
+        this.reader = reader;
+
+        this.disruptor = new Disruptor<GenericRow>(
+            GenericRow::new, 
+            RING_SIZE, 
+            DaemonThreadFactory.INSTANCE,
+            ProducerType.SINGLE,
+            new BusySpinWaitStrategy());
+
+        this.disruptor.handleEventsWith(handler);
+    }
+
+    public void Run() throws Exception {

Review comment:
       Method names should start with a lowercase letter (`run()` rather than `Run()`).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org