You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/07 00:15:41 UTC
[GitHub] [pinot] Jackie-Jiang opened a new pull request, #8846: Reduce the heap memory usage for segment creation
Jackie-Jiang opened a new pull request, #8846:
URL: https://github.com/apache/pinot/pull/8846
During the segment creation, currently we hold multiple copies of unique values for all columns in heap memory (3 copies for dictionary encoded column, 2 copies for raw index column).
This PR enhances the `StatsCollector` to early release one copy when the values are no longer needed, and remove the reference of one copy of the unique values from the dictionary creator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1148094810
# [Codecov](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#8846](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4f338c8) into [master](https://codecov.io/gh/apache/pinot/commit/d1d2ddbc116f6dddbefdf1e69dd409434eaae0e4?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d1d2ddb) will **decrease** coverage by `6.87%`.
> The diff coverage is `92.25%`.
> :exclamation: Current head 4f338c8 differs from pull request most recent head b2cad00. Consider uploading reports for the commit b2cad00 to get more accurate results
```diff
@@ Coverage Diff @@
## master #8846 +/- ##
============================================
- Coverage 69.80% 62.92% -6.88%
+ Complexity 4659 4647 -12
============================================
Files 1741 1695 -46
Lines 91476 89470 -2006
Branches 13677 13457 -220
============================================
- Hits 63854 56301 -7553
- Misses 23199 29111 +5912
+ Partials 4423 4058 -365
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests1 | `66.24% <92.25%> (+0.02%)` | :arrow_up: |
| unittests2 | `14.16% <0.00%> (+0.03%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...t/creator/impl/SegmentIndexCreationDriverImpl.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50SW5kZXhDcmVhdGlvbkRyaXZlckltcGwuamF2YQ==) | `84.61% <ø> (ø)` | |
| [.../stats/BigDecimalColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CaWdEZWNpbWFsQ29sdW1uUHJlSW5kZXhTdGF0c0NvbGxlY3Rvci5qYXZh) | `69.04% <83.33%> (-0.96%)` | :arrow_down: |
| [...impl/stats/BytesColumnPredIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9CeXRlc0NvbHVtblByZWRJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `79.62% <83.33%> (-1.14%)` | :arrow_down: |
| [...impl/stats/StringColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9TdHJpbmdDb2x1bW5QcmVJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `79.24% <83.33%> (-1.15%)` | :arrow_down: |
| [...impl/stats/DoubleColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9Eb3VibGVDb2x1bW5QcmVJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `78.57% <87.50%> (-1.43%)` | :arrow_down: |
| [.../impl/stats/FloatColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9GbG9hdENvbHVtblByZUluZGV4U3RhdHNDb2xsZWN0b3IuamF2YQ==) | `78.57% <87.50%> (-1.43%)` | :arrow_down: |
| [...or/impl/stats/IntColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9JbnRDb2x1bW5QcmVJbmRleFN0YXRzQ29sbGVjdG9yLmphdmE=) | `78.57% <87.50%> (-1.43%)` | :arrow_down: |
| [...r/impl/stats/LongColumnPreIndexStatsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9Mb25nQ29sdW1uUHJlSW5kZXhTdGF0c0NvbGxlY3Rvci5qYXZh) | `78.57% <87.50%> (-1.43%)` | :arrow_down: |
| [.../impl/stats/AbstractColumnStatisticsCollector.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9zdGF0cy9BYnN0cmFjdENvbHVtblN0YXRpc3RpY3NDb2xsZWN0b3IuamF2YQ==) | `92.85% <93.33%> (ø)` | |
| [...segment/creator/impl/SegmentDictionaryCreator.java](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50RGljdGlvbmFyeUNyZWF0b3IuamF2YQ==) | `90.00% <95.65%> (-1.12%)` | :arrow_down: |
| ... and [421 more](https://codecov.io/gh/apache/pinot/pull/8846/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d1d2ddb...b2cad00](https://codecov.io/gh/apache/pinot/pull/8846?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r901025316
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -100,8 +97,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toLongArray();
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
Review Comment:
I don't want to introduce an extra per-value check (other checks are per column). IMO it is okay for it to throw NPE if someone uses it incorrectly
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1164628897
Just for the sake of posterity, in case the tradeoff is discussed again, the replacement of the hashmap lookups with binary search change was implemented in #8924 and reverted in #8950.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r901020485
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -100,8 +97,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toLongArray();
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
Review Comment:
If there is no sorting after seal, we should also throw an exception in collect(Object) if _sealed=true to avoid any unexpected behavior. Same for all *ColumnPreIndexStatsCollector
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r899758406
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -57,10 +56,10 @@ public void collect(Object entry) {
}
}
- void addressSorted(long entry) {
- if (_isSorted) {
+ private void addressSorted(long entry) {
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1158566486
Performed a benchmark with 5M lookups on different data types with different cardinality:
```
Benchmark (_cardinality) Mode Cnt Score Error Units
BenchmarkBinarySearch.benchmarkDoubleBinarySearch 100 avgt 3 216.168 ± 17.112 ms/op
BenchmarkBinarySearch.benchmarkDoubleBinarySearch 10000 avgt 3 402.108 ± 37.544 ms/op
BenchmarkBinarySearch.benchmarkDoubleBinarySearch 1000000 avgt 3 831.683 ± 417.802 ms/op
BenchmarkBinarySearch.benchmarkDoubleMap 100 avgt 3 65.926 ± 21.608 ms/op
BenchmarkBinarySearch.benchmarkDoubleMap 10000 avgt 3 87.987 ± 7.291 ms/op
BenchmarkBinarySearch.benchmarkDoubleMap 1000000 avgt 3 568.169 ± 87.005 ms/op
BenchmarkBinarySearch.benchmarkFloatBinarySearch 100 avgt 3 203.972 ± 8.662 ms/op
BenchmarkBinarySearch.benchmarkFloatBinarySearch 10000 avgt 3 393.657 ± 49.358 ms/op
BenchmarkBinarySearch.benchmarkFloatBinarySearch 1000000 avgt 3 751.277 ± 13.846 ms/op
BenchmarkBinarySearch.benchmarkFloatMap 100 avgt 3 89.837 ± 24.091 ms/op
BenchmarkBinarySearch.benchmarkFloatMap 10000 avgt 3 87.869 ± 17.819 ms/op
BenchmarkBinarySearch.benchmarkFloatMap 1000000 avgt 3 404.022 ± 56.542 ms/op
BenchmarkBinarySearch.benchmarkIntBinarySearch 100 avgt 3 191.057 ± 24.173 ms/op
BenchmarkBinarySearch.benchmarkIntBinarySearch 10000 avgt 3 345.814 ± 78.147 ms/op
BenchmarkBinarySearch.benchmarkIntBinarySearch 1000000 avgt 3 693.074 ± 23.619 ms/op
BenchmarkBinarySearch.benchmarkIntMap 100 avgt 3 81.485 ± 1.535 ms/op
BenchmarkBinarySearch.benchmarkIntMap 10000 avgt 3 85.403 ± 7.045 ms/op
BenchmarkBinarySearch.benchmarkIntMap 1000000 avgt 3 219.425 ± 59.368 ms/op
BenchmarkBinarySearch.benchmarkLongBinarySearch 100 avgt 3 182.610 ± 13.995 ms/op
BenchmarkBinarySearch.benchmarkLongBinarySearch 10000 avgt 3 365.881 ± 15.217 ms/op
BenchmarkBinarySearch.benchmarkLongBinarySearch 1000000 avgt 3 774.414 ± 402.259 ms/op
BenchmarkBinarySearch.benchmarkLongMap 100 avgt 3 65.400 ± 5.087 ms/op
BenchmarkBinarySearch.benchmarkLongMap 10000 avgt 3 98.427 ± 11.991 ms/op
BenchmarkBinarySearch.benchmarkLongMap 1000000 avgt 3 302.917 ± 6.348 ms/op
BenchmarkBinarySearch.benchmarkStringBinarySearch 100 avgt 3 306.576 ± 39.766 ms/op
BenchmarkBinarySearch.benchmarkStringBinarySearch 10000 avgt 3 870.486 ± 52.621 ms/op
BenchmarkBinarySearch.benchmarkStringBinarySearch 1000000 avgt 3 3173.214 ± 1122.761 ms/op
BenchmarkBinarySearch.benchmarkStringMap 100 avgt 3 115.433 ± 4.070 ms/op
BenchmarkBinarySearch.benchmarkStringMap 10000 avgt 3 254.615 ± 15.829 ms/op
BenchmarkBinarySearch.benchmarkStringMap 1000000 avgt 3 1372.186 ± 197.025 ms/op
```
Note that all the strings have the same prefix, so it is actually adding some comparison cost for the binary search. With 1M cardinality, binary search is about 2-3x slower than hash lookup, but it is still fast enough (adding less than 2 seconds for string), and shouldn't become an issue. Considering the memory saving (especially preventing running out of memory when generating segments for wide tables), and the extra GC overhead of creating a map, this change should be beneficial.
Here is the benchmark code:
```
package org.apache.pinot.perf;
import it.unimi.dsi.fastutil.doubles.Double2IntOpenHashMap;
import it.unimi.dsi.fastutil.floats.Float2IntOpenHashMap;
import it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap;
import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;
@State(Scope.Benchmark)
public class BenchmarkBinarySearch {
private static final int NUM_LOOK_UPS = 5000000;
private static final Random RANDOM = new Random();
@Param({"100", "10000", "1000000"})
private int _cardinality;
private int[] _sortedInts;
private long[] _sortedLongs;
private float[] _sortedFloats;
private double[] _sortedDoubles;
private String[] _sortedStrings;
@Setup
public void setUp()
throws IOException {
_sortedInts = new int[_cardinality];
_sortedLongs = new long[_cardinality];
_sortedFloats = new float[_cardinality];
_sortedDoubles = new double[_cardinality];
_sortedStrings = new String[_cardinality];
for (int i = 0; i < _cardinality; i++) {
_sortedInts[i] = i;
_sortedLongs[i] = i;
_sortedFloats[i] = i;
_sortedDoubles[i] = i;
String string = Integer.toString(i + _cardinality);
_sortedStrings[i] = string;
}
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkIntMap() {
Int2IntOpenHashMap intMap = new Int2IntOpenHashMap(_cardinality);
for (int i = 0; i < _cardinality; i++) {
intMap.put(i, i);
}
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += intMap.get(RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkIntBinarySearch() {
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += Arrays.binarySearch(_sortedInts, RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkLongMap() {
Long2IntOpenHashMap longMap = new Long2IntOpenHashMap(_cardinality);
for (int i = 0; i < _cardinality; i++) {
longMap.put(i, i);
}
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += longMap.get(RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkLongBinarySearch() {
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += Arrays.binarySearch(_sortedLongs, RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkFloatMap() {
Float2IntOpenHashMap floatMap = new Float2IntOpenHashMap(_cardinality);
for (int i = 0; i < _cardinality; i++) {
floatMap.put(i, i);
}
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += floatMap.get(RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkFloatBinarySearch() {
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += Arrays.binarySearch(_sortedFloats, RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkDoubleMap() {
Double2IntOpenHashMap doubleMap = new Double2IntOpenHashMap(_cardinality);
for (int i = 0; i < _cardinality; i++) {
doubleMap.put(i, i);
}
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += doubleMap.get(RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkDoubleBinarySearch() {
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += Arrays.binarySearch(_sortedDoubles, RANDOM.nextInt(_cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkStringMap() {
Object2IntOpenHashMap<String> stringMap = new Object2IntOpenHashMap<>(_cardinality);
for (int i = 0; i < _cardinality; i++) {
stringMap.put(Integer.toString(i + _cardinality), i);
}
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += stringMap.getInt(Integer.toString(RANDOM.nextInt(_cardinality) + _cardinality));
}
return result;
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public int benchmarkStringBinarySearch() {
int result = 0;
for (int i = 0; i < NUM_LOOK_UPS; i++) {
result += Arrays.binarySearch(_sortedStrings, Integer.toString(RANDOM.nextInt(_cardinality) + _cardinality));
}
return result;
}
public static void main(String[] args)
throws Exception {
Options opt =
new OptionsBuilder().include(BenchmarkBinarySearch.class.getSimpleName()).warmupTime(TimeValue.seconds(5))
.warmupIterations(2).measurementTime(TimeValue.seconds(10)).measurementIterations(3).forks(1).build();
new Runner(opt).run();
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1159285009
> How do you know how much extra time it takes to build a column based on this benchmark? Shouldn’t it benchmark segment build time instead?
This benchmark isolates the steps of map creation + lookup when building a segment with 5M documents. I tried building the segments in the offline integration test, and the time spent is reduced from 4.8s to 3.6s, but that is the combined result of all the changes in this PR. To better testing the performance impact of binary search, I decide to not include that change in this PR, and have a separate one just for the binary search change
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1158590337
> Performed a benchmark with 5M lookups on different data types with different cardinality:
>
> ```
> Benchmark (_cardinality) Mode Cnt Score Error Units
> BenchmarkBinarySearch.benchmarkDoubleBinarySearch 100 avgt 3 216.168 ± 17.112 ms/op
> BenchmarkBinarySearch.benchmarkDoubleBinarySearch 10000 avgt 3 402.108 ± 37.544 ms/op
> BenchmarkBinarySearch.benchmarkDoubleBinarySearch 1000000 avgt 3 831.683 ± 417.802 ms/op
> BenchmarkBinarySearch.benchmarkDoubleMap 100 avgt 3 65.926 ± 21.608 ms/op
> BenchmarkBinarySearch.benchmarkDoubleMap 10000 avgt 3 87.987 ± 7.291 ms/op
> BenchmarkBinarySearch.benchmarkDoubleMap 1000000 avgt 3 568.169 ± 87.005 ms/op
> BenchmarkBinarySearch.benchmarkFloatBinarySearch 100 avgt 3 203.972 ± 8.662 ms/op
> BenchmarkBinarySearch.benchmarkFloatBinarySearch 10000 avgt 3 393.657 ± 49.358 ms/op
> BenchmarkBinarySearch.benchmarkFloatBinarySearch 1000000 avgt 3 751.277 ± 13.846 ms/op
> BenchmarkBinarySearch.benchmarkFloatMap 100 avgt 3 89.837 ± 24.091 ms/op
> BenchmarkBinarySearch.benchmarkFloatMap 10000 avgt 3 87.869 ± 17.819 ms/op
> BenchmarkBinarySearch.benchmarkFloatMap 1000000 avgt 3 404.022 ± 56.542 ms/op
> BenchmarkBinarySearch.benchmarkIntBinarySearch 100 avgt 3 191.057 ± 24.173 ms/op
> BenchmarkBinarySearch.benchmarkIntBinarySearch 10000 avgt 3 345.814 ± 78.147 ms/op
> BenchmarkBinarySearch.benchmarkIntBinarySearch 1000000 avgt 3 693.074 ± 23.619 ms/op
> BenchmarkBinarySearch.benchmarkIntMap 100 avgt 3 81.485 ± 1.535 ms/op
> BenchmarkBinarySearch.benchmarkIntMap 10000 avgt 3 85.403 ± 7.045 ms/op
> BenchmarkBinarySearch.benchmarkIntMap 1000000 avgt 3 219.425 ± 59.368 ms/op
> BenchmarkBinarySearch.benchmarkLongBinarySearch 100 avgt 3 182.610 ± 13.995 ms/op
> BenchmarkBinarySearch.benchmarkLongBinarySearch 10000 avgt 3 365.881 ± 15.217 ms/op
> BenchmarkBinarySearch.benchmarkLongBinarySearch 1000000 avgt 3 774.414 ± 402.259 ms/op
> BenchmarkBinarySearch.benchmarkLongMap 100 avgt 3 65.400 ± 5.087 ms/op
> BenchmarkBinarySearch.benchmarkLongMap 10000 avgt 3 98.427 ± 11.991 ms/op
> BenchmarkBinarySearch.benchmarkLongMap 1000000 avgt 3 302.917 ± 6.348 ms/op
> BenchmarkBinarySearch.benchmarkStringBinarySearch 100 avgt 3 306.576 ± 39.766 ms/op
> BenchmarkBinarySearch.benchmarkStringBinarySearch 10000 avgt 3 870.486 ± 52.621 ms/op
> BenchmarkBinarySearch.benchmarkStringBinarySearch 1000000 avgt 3 3173.214 ± 1122.761 ms/op
> BenchmarkBinarySearch.benchmarkStringMap 100 avgt 3 115.433 ± 4.070 ms/op
> BenchmarkBinarySearch.benchmarkStringMap 10000 avgt 3 254.615 ± 15.829 ms/op
> BenchmarkBinarySearch.benchmarkStringMap 1000000 avgt 3 1372.186 ± 197.025 ms/op
> ```
>
> Note that all the strings have the same prefix, so it is actually adding some comparison cost for the binary search. With 1M cardinality, binary search is about 2-3x slower than hash lookup, but it is still fast enough (adding less than 2 seconds for string), and shouldn't become an issue. Considering the memory saving (especially preventing running out of memory when generating segments for wide tables), and the extra GC overhead of creating a map, this change should be beneficial.
>
> Here is the benchmark code:
>
> ```
> package org.apache.pinot.perf;
>
> import it.unimi.dsi.fastutil.doubles.Double2IntOpenHashMap;
> import it.unimi.dsi.fastutil.floats.Float2IntOpenHashMap;
> import it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap;
> import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
> import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
> import java.io.IOException;
> import java.util.Arrays;
> import java.util.Random;
> import java.util.concurrent.TimeUnit;
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.BenchmarkMode;
> import org.openjdk.jmh.annotations.Mode;
> import org.openjdk.jmh.annotations.OutputTimeUnit;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> import org.openjdk.jmh.runner.Runner;
> import org.openjdk.jmh.runner.options.Options;
> import org.openjdk.jmh.runner.options.OptionsBuilder;
> import org.openjdk.jmh.runner.options.TimeValue;
>
>
> @State(Scope.Benchmark)
> public class BenchmarkBinarySearch {
> private static final int NUM_LOOK_UPS = 5000000;
> private static final Random RANDOM = new Random();
>
> @Param({"100", "10000", "1000000"})
> private int _cardinality;
>
> private int[] _sortedInts;
> private long[] _sortedLongs;
> private float[] _sortedFloats;
> private double[] _sortedDoubles;
> private String[] _sortedStrings;
>
> @Setup
> public void setUp()
> throws IOException {
> _sortedInts = new int[_cardinality];
> _sortedLongs = new long[_cardinality];
> _sortedFloats = new float[_cardinality];
> _sortedDoubles = new double[_cardinality];
> _sortedStrings = new String[_cardinality];
> for (int i = 0; i < _cardinality; i++) {
> _sortedInts[i] = i;
> _sortedLongs[i] = i;
> _sortedFloats[i] = i;
> _sortedDoubles[i] = i;
> String string = Integer.toString(i + _cardinality);
> _sortedStrings[i] = string;
> }
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkIntMap() {
> Int2IntOpenHashMap intMap = new Int2IntOpenHashMap(_cardinality);
> for (int i = 0; i < _cardinality; i++) {
> intMap.put(i, i);
> }
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += intMap.get(RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkIntBinarySearch() {
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += Arrays.binarySearch(_sortedInts, RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkLongMap() {
> Long2IntOpenHashMap longMap = new Long2IntOpenHashMap(_cardinality);
> for (int i = 0; i < _cardinality; i++) {
> longMap.put(i, i);
> }
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += longMap.get(RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkLongBinarySearch() {
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += Arrays.binarySearch(_sortedLongs, RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkFloatMap() {
> Float2IntOpenHashMap floatMap = new Float2IntOpenHashMap(_cardinality);
> for (int i = 0; i < _cardinality; i++) {
> floatMap.put(i, i);
> }
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += floatMap.get(RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkFloatBinarySearch() {
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += Arrays.binarySearch(_sortedFloats, RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkDoubleMap() {
> Double2IntOpenHashMap doubleMap = new Double2IntOpenHashMap(_cardinality);
> for (int i = 0; i < _cardinality; i++) {
> doubleMap.put(i, i);
> }
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += doubleMap.get(RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkDoubleBinarySearch() {
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += Arrays.binarySearch(_sortedDoubles, RANDOM.nextInt(_cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkStringMap() {
> Object2IntOpenHashMap<String> stringMap = new Object2IntOpenHashMap<>(_cardinality);
> for (int i = 0; i < _cardinality; i++) {
> stringMap.put(Integer.toString(i + _cardinality), i);
> }
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += stringMap.getInt(Integer.toString(RANDOM.nextInt(_cardinality) + _cardinality));
> }
> return result;
> }
>
> @Benchmark
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> public int benchmarkStringBinarySearch() {
> int result = 0;
> for (int i = 0; i < NUM_LOOK_UPS; i++) {
> result += Arrays.binarySearch(_sortedStrings, Integer.toString(RANDOM.nextInt(_cardinality) + _cardinality));
> }
> return result;
> }
>
> public static void main(String[] args)
> throws Exception {
> Options opt =
> new OptionsBuilder().include(BenchmarkBinarySearch.class.getSimpleName()).warmupTime(TimeValue.seconds(5))
> .warmupIterations(2).measurementTime(TimeValue.seconds(10)).measurementIterations(3).forks(1).build();
>
> new Runner(opt).run();
> }
> }
> ```
How do you know how much extra time it takes to build a column based on this benchmark? Shouldn’t it benchmark segment build time instead?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r901025316
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -100,8 +97,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toLongArray();
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
Review Comment:
I don't want to introduce an extra per-value check (other checks are per column). IMO it is okay for it to throw NPE if someone uses it incorrectly. Added an assert
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r899754796
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPreIndexStatsCollector.java:
##########
@@ -113,8 +111,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toArray(new BigDecimal[0]);
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
+ _sortedValues = _values.toArray(new BigDecimal[0]);
+ _values = null;
Review Comment:
That shouldn't be needed. Once the `_values` is set to `null`, the whole map can be garbage collected. `_values.clear()` will introduce extra overhead, but not reduce the garbage
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r901020485
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -100,8 +97,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toLongArray();
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
Review Comment:
If there is no sorting after seal, we should also throw exception in collect(Object) if _sealed=true to avoid any unexpected behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8846:
URL: https://github.com/apache/pinot/pull/8846#issuecomment-1148400449
You don't mention replacing constant time lookups with logarithmic time lookups in `SegmentDictionaryCreator` making this a space/time tradeoff. It would be good to have some numbers to demonstrate the tradeoff is worthwhile, and to mention it in the PR description to help someone track down the root cause of a regression if it happens.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] jackjlli commented on a diff in pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
jackjlli commented on code in PR #8846:
URL: https://github.com/apache/pinot/pull/8846#discussion_r891664722
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPreIndexStatsCollector.java:
##########
@@ -113,8 +111,11 @@ public int getCardinality() {
@Override
public void seal() {
- _sortedValues = _values.toArray(new BigDecimal[0]);
- Arrays.sort(_sortedValues);
- _sealed = true;
+ if (!_sealed) {
+ _sortedValues = _values.toArray(new BigDecimal[0]);
+ _values = null;
Review Comment:
Should we consider invoking `_values.clear()` method before setting it `null`? Same for all the other stats collectors.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/LongColumnPreIndexStatsCollector.java:
##########
@@ -57,10 +56,10 @@ public void collect(Object entry) {
}
}
- void addressSorted(long entry) {
- if (_isSorted) {
+ private void addressSorted(long entry) {
Review Comment:
I saw some of the `addressSorted` method use `private` but some don't. It'd be good to unify them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang merged pull request #8846: Reduce the heap memory usage for segment creation
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #8846:
URL: https://github.com/apache/pinot/pull/8846
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org