You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2024/02/02 18:18:46 UTC
[PR] [draft] Mode null benchmark [pinot]
gortiz opened a new pull request, #12354:
URL: https://github.com/apache/pinot/pull/12354
There are some questions about the performance on the implementation proposed in #12227.
This PR modifies the code to add several Mode implementations that can be selected by using `mode(col, MIN, number)`, where number is a literal from 0 to 5 that selects the specific implementation.
The code is not the best and the idea is not to merge this PR as it is but to use it to make it easier to benchmark different solutions.
I would like to repeat the tests with longer executions. I may do that next week. Also notice that mode 1 does not support null handling
Results in AMD Ryzen 9 3900X:
```
(_aQueryTemplate) (_impl) (_nullHandling) Mode Cnt Score Error Units
select mode(value, 'MIN', %s) from benchmark 0 true thrpt 25 0.998 ± 0.111 ops/ms
select mode(value, 'MIN', %s) from benchmark 2 true thrpt 25 0.921 ± 0.143 ops/ms
select mode(value, 'MIN', %s) from benchmark 3 true thrpt 25 0.994 ± 0.137 ops/ms
select mode(value, 'MIN', %s) from benchmark 4 true thrpt 25 1.026 ± 0.129 ops/ms
select mode(value, 'MIN', %s) from benchmark 5 true thrpt 25 0.978 ± 0.135 ops/ms
select mode(value, 'MIN', %s) from benchmark 1 false thrpt 25 0.974 ± 0.135 ops/ms
select mode(value, 'MIN', %s) from benchmark 0 false thrpt 25 1.014 ± 0.139 ops/ms
select mode(value, 'MIN', %s) from benchmark 2 false thrpt 25 0.909 ± 0.119 ops/ms
select mode(value, 'MIN', %s) from benchmark 3 false thrpt 25 0.953 ± 0.111 ops/ms
select mode(value, 'MIN', %s) from benchmark 4 false thrpt 25 0.975 ± 0.136 ops/ms
select mode(value, 'MIN', %s) from benchmark 5 false thrpt 25 1.050 ± 0.120 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 0 true thrpt 25 1.921 ± 0.161 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 2 true thrpt 25 1.351 ± 0.160 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 3 true thrpt 25 1.456 ± 0.151 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 4 true thrpt 25 1.703 ± 0.179 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 5 true thrpt 25 1.742 ± 0.197 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 0 false thrpt 25 2.041 ± 0.154 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 1 false thrpt 25 2.023 ± 0.151 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 2 false thrpt 25 1.308 ± 0.138 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 3 false thrpt 25 1.715 ± 0.259 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 4 false thrpt 25 1.579 ± 0.173 ops/ms
select mode(valueDict, 'MIN', %s) from benchmark 5 false thrpt 25 1.812 ± 0.176 ops/ms
```
cc @Jackie-Jiang
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948104199
I've applied similar changes in SUM and results are:
In AMD Ryzen 9 3900X, Java 11:
```
Benchmark (_aQueryTemplate) (_impl) (_nullHandling) Mode Cnt Score Error Units
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal true thrpt 25 1.868 ± 0.153 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal false thrpt 25 1.868 ± 0.172 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble true thrpt 25 1.884 ± 0.145 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble false thrpt 25 1.841 ± 0.175 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive true thrpt 25 1.942 ± 0.153 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive false thrpt 25 1.869 ± 0.159 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder true thrpt 25 1.918 ± 0.165 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder false thrpt 25 1.941 ± 0.150 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal true thrpt 25 1.911 ± 0.178 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal false thrpt 25 1.879 ± 0.162 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble true thrpt 25 1.869 ± 0.171 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble false thrpt 25 1.856 ± 0.157 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive true thrpt 25 1.884 ± 0.196 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive false thrpt 25 1.915 ± 0.172 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder true thrpt 25 1.922 ± 0.167 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder false thrpt 25 1.966 ± 0.172 ops/ms
```
In AMD Ryzen 9 3900X, Java 21:
```
Benchmark (_aQueryTemplate) (_impl) (_nullHandling) Mode Cnt Score Error Units
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal true thrpt 25 2.477 ± 0.091 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal false thrpt 25 2.509 ± 0.074 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble true thrpt 25 2.609 ± 0.064 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble false thrpt 25 2.501 ± 0.083 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive true thrpt 25 2.652 ± 0.106 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive false thrpt 25 2.627 ± 0.095 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder true thrpt 25 2.664 ± 0.087 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder false thrpt 25 2.701 ± 0.097 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal true thrpt 25 2.428 ± 0.072 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal false thrpt 25 2.450 ± 0.049 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble true thrpt 25 2.451 ± 0.073 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble false thrpt 25 2.503 ± 0.055 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive true thrpt 25 2.573 ± 0.067 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive false thrpt 25 2.567 ± 0.084 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder true thrpt 25 2.664 ± 0.081 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder false thrpt 25 2.622 ± 0.074 ops/ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961194752
That is the interesting part. Aggregation operators implemented before did not used RoaringBitmap that well and they were iterating over it. Instead here we are not doing that. We are looking for the next null value. In this benchmark that is specially useful because we have 127 not null values followed by a single null value and we repeat that pattern all the time, so we end up executing 127 rows in a loop.
We can use the benchmark to play with different distributions, where a distribution where odd positions are null and even positions are not null (or the other way around) will probably be the worst distribution possible.
> Micro-benchmark is usually not accurate when too much code is tested (especially multi-threaded). Can you set up a benchmark to only test the aggregation function itself? We may construct a BlockValSet and pass it into the function.
We can think the other way around. Benchmarking this way we get results closer to the actual performance we would get. It shouldn't be actually problematic to spend twice much time with null handling if the operation is actually a 5% of the total execution time. On the other hand, we can see that some patterns are significantly slower, which means that the difference there is actually important. For example with `mode` we can see that version 2 is significantly slower than version 5.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961924193
I tried the following micro-benchmark and see very interesting result:
Code:
```
package org.apache.pinot.perf;
import java.io.IOException;
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import org.apache.pinot.common.request.context.ExpressionContext;
import org.apache.pinot.core.common.BlockValSet;
import org.apache.pinot.core.plan.DocIdSetPlanNode;
import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
import org.apache.pinot.core.query.aggregation.DoubleAggregationResultHolder;
import org.apache.pinot.core.query.aggregation.function.AggregationFunction;
import org.apache.pinot.core.query.aggregation.function.SumAggregationFunction;
import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldDouble;
import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldHolder;
import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldPrimitive;
import org.apache.pinot.segment.spi.index.reader.Dictionary;
import org.apache.pinot.spi.data.FieldSpec.DataType;
import org.jetbrains.annotations.Nullable;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.roaringbitmap.RoaringBitmap;
@Fork(1)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 3, time = 1)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class BenchmarkSumAggregation2 {
public static void main(String[] args)
throws RunnerException {
Options opt = new OptionsBuilder().include(BenchmarkSumAggregation2.class.getSimpleName())
// .addProfiler(GCProfiler.class)
.build();
new Runner(opt).run();
}
private static final ExpressionContext EXPR = ExpressionContext.forIdentifier("col");
@Param({"true", "false"})
public boolean _nullHandlingEnabled;
@Param({"2", "4", "8", "16", "32", "64", "128"})
public int _nullInterval;
@Param({"normal", "foldDouble", "foldPrimitive"})
public String _impl;
private AggregationFunction _aggregationFunction;
private AggregationResultHolder _resultHolder;
private RoaringBitmap _nullBitmap;
private long[] _values;
private Map<ExpressionContext, BlockValSet> _blockValSetMap;
private double _expectedSum;
private double _nextExpectedSum;
@Setup(Level.Trial)
public void setup()
throws IOException {
switch (_impl) {
case "normal":
_aggregationFunction = new SumAggregationFunction(List.of(EXPR), _nullHandlingEnabled);
break;
case "foldDouble":
_aggregationFunction = new SumAggregationFunctionFoldDouble(List.of(EXPR), _nullHandlingEnabled);
break;
case "foldPrimitive":
_aggregationFunction = new SumAggregationFunctionFoldPrimitive(List.of(EXPR), _nullHandlingEnabled);
break;
case "foldHolder":
_aggregationFunction = new SumAggregationFunctionFoldHolder(List.of(EXPR), _nullHandlingEnabled);
break;
default:
throw new IllegalArgumentException("Unknown impl: " + _impl);
}
_resultHolder = _aggregationFunction.createAggregationResultHolder();
_nullBitmap = createNullBitmap();
_values = new long[DocIdSetPlanNode.MAX_DOC_PER_CALL];
LongSupplier longSupplier = Distribution.createLongSupplier(42, "EXP(0.5)");
for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i++) {
_values[i] = longSupplier.getAsLong();
}
_expectedSum = 0;
for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i++) {
if (_nullBitmap == null || !_nullBitmap.contains(i)) {
_expectedSum += _values[i];
}
}
_nextExpectedSum = _expectedSum;
_blockValSetMap = Map.of(EXPR, new BenchmarkBlockValSet(_nullBitmap, _values));
}
private RoaringBitmap createNullBitmap() {
if (_nullHandlingEnabled) {
RoaringBitmap nullBitmap = new RoaringBitmap();
for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i += _nullInterval) {
nullBitmap.add(i);
}
return nullBitmap;
} else {
return null;
}
}
private static class BenchmarkBlockValSet implements BlockValSet {
final RoaringBitmap _nullBitmap;
final long[] _values;
private BenchmarkBlockValSet(@Nullable RoaringBitmap nullBitmap, long[] values) {
_nullBitmap = nullBitmap;
_values = values;
}
@Nullable
@Override
public RoaringBitmap getNullBitmap() {
return _nullBitmap;
}
@Override
public DataType getValueType() {
return DataType.LONG;
}
@Override
public boolean isSingleValue() {
return true;
}
@Nullable
@Override
public Dictionary getDictionary() {
throw new UnsupportedOperationException();
}
@Override
public int[] getDictionaryIdsSV() {
throw new UnsupportedOperationException();
}
@Override
public int[] getIntValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public long[] getLongValuesSV() {
return _values;
}
@Override
public float[] getFloatValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public double[] getDoubleValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public BigDecimal[] getBigDecimalValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public String[] getStringValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public byte[][] getBytesValuesSV() {
throw new UnsupportedOperationException();
}
@Override
public int[][] getDictionaryIdsMV() {
throw new UnsupportedOperationException();
}
@Override
public int[][] getIntValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public long[][] getLongValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public float[][] getFloatValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public double[][] getDoubleValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public String[][] getStringValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public byte[][][] getBytesValuesMV() {
throw new UnsupportedOperationException();
}
@Override
public int[] getNumMVEntries() {
throw new UnsupportedOperationException();
}
}
@Benchmark
public void test(Blackhole bh) {
_aggregationFunction.aggregate(DocIdSetPlanNode.MAX_DOC_PER_CALL, _resultHolder, _blockValSetMap);
if (_resultHolder instanceof DoubleAggregationResultHolder) {
double result = _resultHolder.getDoubleResult();
if (result != _nextExpectedSum) {
throw new IllegalStateException("Expected: " + _nextExpectedSum + ", got: " + result);
}
bh.consume(result);
} else {
Double result = _resultHolder.getResult();
if (result != _nextExpectedSum) {
throw new IllegalStateException("Expected: " + _nextExpectedSum + ", got: " + _resultHolder.getResult());
}
bh.consume(result);
}
_nextExpectedSum += _expectedSum;
}
}
```
Result (`foldHolder` doesn't give correct result, please take a look):
```
Benchmark (_impl) (_nullHandlingEnabled) (_nullInterval) Mode Cnt Score Error Units
BenchmarkSumAggregation2.test normal true 2 thrpt 3 39.276 ± 20.887 ops/ms
BenchmarkSumAggregation2.test normal true 4 thrpt 3 2.328 ± 3.333 ops/ms
BenchmarkSumAggregation2.test normal true 8 thrpt 3 3.280 ± 1.050 ops/ms
BenchmarkSumAggregation2.test normal true 16 thrpt 3 4.128 ± 1.031 ops/ms
BenchmarkSumAggregation2.test normal true 32 thrpt 3 4.516 ± 0.757 ops/ms
BenchmarkSumAggregation2.test normal true 64 thrpt 3 5.817 ± 0.127 ops/ms
BenchmarkSumAggregation2.test normal true 128 thrpt 3 5.833 ± 0.155 ops/ms
BenchmarkSumAggregation2.test normal false 2 thrpt 3 91.698 ± 16.053 ops/ms
BenchmarkSumAggregation2.test normal false 4 thrpt 3 90.171 ± 12.357 ops/ms
BenchmarkSumAggregation2.test normal false 8 thrpt 3 93.684 ± 4.769 ops/ms
BenchmarkSumAggregation2.test normal false 16 thrpt 3 91.339 ± 49.780 ops/ms
BenchmarkSumAggregation2.test normal false 32 thrpt 3 92.072 ± 13.054 ops/ms
BenchmarkSumAggregation2.test normal false 64 thrpt 3 92.704 ± 4.640 ops/ms
BenchmarkSumAggregation2.test normal false 128 thrpt 3 91.891 ± 20.353 ops/ms
BenchmarkSumAggregation2.test foldDouble true 2 thrpt 3 28.216 ± 0.871 ops/ms
BenchmarkSumAggregation2.test foldDouble true 4 thrpt 3 49.684 ± 12.845 ops/ms
BenchmarkSumAggregation2.test foldDouble true 8 thrpt 3 88.009 ± 24.841 ops/ms
BenchmarkSumAggregation2.test foldDouble true 16 thrpt 3 123.312 ± 11.984 ops/ms
BenchmarkSumAggregation2.test foldDouble true 32 thrpt 3 116.148 ± 65.820 ops/ms
BenchmarkSumAggregation2.test foldDouble true 64 thrpt 3 123.562 ± 23.135 ops/ms
BenchmarkSumAggregation2.test foldDouble true 128 thrpt 3 115.864 ± 116.102 ops/ms
BenchmarkSumAggregation2.test foldDouble false 2 thrpt 3 89.577 ± 69.898 ops/ms
BenchmarkSumAggregation2.test foldDouble false 4 thrpt 3 91.662 ± 35.556 ops/ms
BenchmarkSumAggregation2.test foldDouble false 8 thrpt 3 91.258 ± 17.352 ops/ms
BenchmarkSumAggregation2.test foldDouble false 16 thrpt 3 90.990 ± 27.492 ops/ms
BenchmarkSumAggregation2.test foldDouble false 32 thrpt 3 91.151 ± 22.358 ops/ms
BenchmarkSumAggregation2.test foldDouble false 64 thrpt 3 92.060 ± 3.458 ops/ms
BenchmarkSumAggregation2.test foldDouble false 128 thrpt 3 91.715 ± 4.820 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 2 thrpt 3 26.680 ± 7.063 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 4 thrpt 3 46.278 ± 5.670 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 8 thrpt 3 80.658 ± 33.080 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 16 thrpt 3 149.492 ± 30.142 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 32 thrpt 3 225.837 ± 63.126 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 64 thrpt 3 186.245 ± 24.884 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 128 thrpt 3 198.177 ± 73.624 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 2 thrpt 3 364.593 ± 44.298 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 4 thrpt 3 359.073 ± 194.996 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 8 thrpt 3 362.277 ± 92.352 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 16 thrpt 3 362.172 ± 52.064 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 32 thrpt 3 350.423 ± 193.227 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 64 thrpt 3 350.067 ± 107.294 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 128 thrpt 3 361.977 ± 128.563 ops/ms
```
As you can see, new implementation is giving much better performance when null is enabled, and similar performance when null is disabled. `foldPrimitive` gives even better performance, even though we cannot use it because of the potential overflow.
I tried this on my laptop. Can you try this benchmark on your work station and see if you can produce the results? Maybe we can get more insights from it.
Some improvements we should do for the new implementation:
- When null is enabled, we should use `ObjectAggregationResultHolder` to handle the case of all values are `null`
- Fix the `foldHolder` implementation to give correct result
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948072486
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
Attention: `2236 lines` in your changes are missing coverage. Please review.
> Comparison is base [(`38d86b0`)](https://app.codecov.io/gh/apache/pinot/commit/38d86b0a6432e9a7249f1692ace36b6e34171b0a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.74% compared to head [(`402d587`)](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 0.00%.
> Report is 1 commits behind head on master.
| [Files](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
|---|---|---|
| [...aggregation/function/ModeAggregationFunction3.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjMuamF2YQ==) | 0.00% | [377 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...aggregation/function/ModeAggregationFunction1.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjEuamF2YQ==) | 0.00% | [374 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...aggregation/function/ModeAggregationFunction2.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjIuamF2YQ==) | 0.00% | [362 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...aggregation/function/ModeAggregationFunction5.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjUuamF2YQ==) | 0.00% | [329 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...aggregation/function/ModeAggregationFunction4.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjQuamF2YQ==) | 0.00% | [318 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...nction/NullableSingleInputAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9OdWxsYWJsZVNpbmdsZUlucHV0QWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | 0.00% | [174 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...ion/function/SumAggregationFunctionFoldHolder.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZEhvbGRlci5qYXZh) | 0.00% | [86 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...ion/function/SumAggregationFunctionFoldDouble.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZERvdWJsZS5qYXZh) | 0.00% | [84 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [.../function/SumAggregationFunctionFoldPrimitive.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZFByaW1pdGl2ZS5qYXZh) | 0.00% | [84 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...gregation/function/AggregationFunctionFactory.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9BZ2dyZWdhdGlvbkZ1bmN0aW9uRmFjdG9yeS5qYXZh) | 0.00% | [20 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| ... and [5 more](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
<details><summary>Additional details and impacted files</summary>
```diff
@@ Coverage Diff @@
## master #12354 +/- ##
=============================================
- Coverage 61.74% 0.00% -61.75%
+ Complexity 207 6 -201
=============================================
Files 2436 2370 -66
Lines 133108 131584 -1524
Branches 20617 20521 -96
=============================================
- Hits 82191 6 -82185
- Misses 44876 131578 +86702
+ Partials 6041 0 -6041
```
| [Flag](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [integration](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [integration1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.63%)` | :arrow_down: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.73%)` | :arrow_down: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.75%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
</details>
[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964199109
Extra investigations:
## Why performance in `normal` with `null handling` decreases so much when going from 2 to 4 interval?
I've checked with perfnorm:
```
Benchmark (_impl) (_nullHandlingEnabled) (_nullInterval) Mode Cnt Score Error Units
BenchmarkSumAggregation2.test normal true 2 thrpt 5 103.386 ± 3.852 ops/ms
BenchmarkSumAggregation2.test:CPI normal true 2 thrpt 0.177 clks/insn
BenchmarkSumAggregation2.test:IPC normal true 2 thrpt 5.660 insns/clk
BenchmarkSumAggregation2.test:L1-dcache-load-misses normal true 2 thrpt 1321.767 #/op
BenchmarkSumAggregation2.test:L1-dcache-loads normal true 2 thrpt 15314.726 #/op
BenchmarkSumAggregation2.test:L1-icache-load-misses normal true 2 thrpt 1.106 #/op
BenchmarkSumAggregation2.test:L1-icache-loads normal true 2 thrpt 329.697 #/op
BenchmarkSumAggregation2.test:branch-misses normal true 2 thrpt 8.702 #/op
BenchmarkSumAggregation2.test:branches normal true 2 thrpt 60461.749 #/op
BenchmarkSumAggregation2.test:cycles normal true 2 thrpt 41090.010 #/op
BenchmarkSumAggregation2.test:dTLB-load-misses normal true 2 thrpt 0.301 #/op
BenchmarkSumAggregation2.test:dTLB-loads normal true 2 thrpt 2.005 #/op
BenchmarkSumAggregation2.test:iTLB-load-misses normal true 2 thrpt 0.048 #/op
BenchmarkSumAggregation2.test:iTLB-loads normal true 2 thrpt 0.219 #/op
BenchmarkSumAggregation2.test:instructions normal true 2 thrpt 232553.341 #/op
BenchmarkSumAggregation2.test:stalled-cycles-backend normal true 2 thrpt 25474.941 #/op
BenchmarkSumAggregation2.test:stalled-cycles-frontend normal true 2 thrpt 57.691 #/op
BenchmarkSumAggregation2.test normal true 4 thrpt 5 3.525 ± 0.129 ops/ms
BenchmarkSumAggregation2.test:CPI normal true 4 thrpt 0.378 clks/insn
BenchmarkSumAggregation2.test:IPC normal true 4 thrpt 2.645 insns/clk
BenchmarkSumAggregation2.test:L1-dcache-load-misses normal true 4 thrpt 1689.903 #/op
BenchmarkSumAggregation2.test:L1-dcache-loads normal true 4 thrpt 685807.174 #/op
BenchmarkSumAggregation2.test:L1-icache-load-misses normal true 4 thrpt 21.785 #/op
BenchmarkSumAggregation2.test:L1-icache-loads normal true 4 thrpt 10603.011 #/op
BenchmarkSumAggregation2.test:branch-misses normal true 4 thrpt 15379.544 #/op
BenchmarkSumAggregation2.test:branches normal true 4 thrpt 817946.383 #/op
BenchmarkSumAggregation2.test:cycles normal true 4 thrpt 1223502.258 #/op
BenchmarkSumAggregation2.test:dTLB-load-misses normal true 4 thrpt 2.123 #/op
BenchmarkSumAggregation2.test:dTLB-loads normal true 4 thrpt 16.638 #/op
BenchmarkSumAggregation2.test:iTLB-load-misses normal true 4 thrpt 0.639 #/op
BenchmarkSumAggregation2.test:iTLB-loads normal true 4 thrpt 2.910 #/op
BenchmarkSumAggregation2.test:instructions normal true 4 thrpt 3236651.002 #/op
BenchmarkSumAggregation2.test:stalled-cycles-backend normal true 4 thrpt 116767.982 #/op
BenchmarkSumAggregation2.test:stalled-cycles-frontend normal true 4 thrpt 17092.450 #/op
```
AFAIU there are far more cache failures and mispredictions.
## Why performance in `foldDouble` with `null handling` increases so much when going from 4 to 8 interval?
Same reason:
```
BenchmarkSumAggregation2.test foldDouble true 4 thrpt 5 68.321 ± 0.344 ops/ms
BenchmarkSumAggregation2.test:CPI foldDouble true 4 thrpt 0.234 clks/insn
BenchmarkSumAggregation2.test:IPC foldDouble true 4 thrpt 4.264 insns/clk
BenchmarkSumAggregation2.test:L1-dcache-load-misses foldDouble true 4 thrpt 2382.564 #/op
BenchmarkSumAggregation2.test:L1-dcache-loads foldDouble true 4 thrpt 103018.730 #/op
BenchmarkSumAggregation2.test:L1-icache-load-misses foldDouble true 4 thrpt 4.476 #/op
BenchmarkSumAggregation2.test:L1-icache-loads foldDouble true 4 thrpt 402.647 #/op
BenchmarkSumAggregation2.test:branch-misses foldDouble true 4 thrpt 5.758 #/op
BenchmarkSumAggregation2.test:branches foldDouble true 4 thrpt 41678.023 #/op
BenchmarkSumAggregation2.test:cycles foldDouble true 4 thrpt 64669.720 #/op
BenchmarkSumAggregation2.test:dTLB-load-misses foldDouble true 4 thrpt 15.780 #/op
BenchmarkSumAggregation2.test:dTLB-loads foldDouble true 4 thrpt 21.655 #/op
BenchmarkSumAggregation2.test:iTLB-load-misses foldDouble true 4 thrpt 0.501 #/op
BenchmarkSumAggregation2.test:iTLB-loads foldDouble true 4 thrpt 0.600 #/op
BenchmarkSumAggregation2.test:instructions foldDouble true 4 thrpt 275779.218 #/op
BenchmarkSumAggregation2.test:stalled-cycles-backend foldDouble true 4 thrpt 37349.094 #/op
BenchmarkSumAggregation2.test:stalled-cycles-frontend foldDouble true 4 thrpt 700.245 #/op
BenchmarkSumAggregation2.test foldDouble true 8 thrpt 5 120.523 ± 1.062 ops/ms
BenchmarkSumAggregation2.test:CPI foldDouble true 8 thrpt 0.233 clks/insn
BenchmarkSumAggregation2.test:IPC foldDouble true 8 thrpt 4.293 insns/clk
BenchmarkSumAggregation2.test:L1-dcache-load-misses foldDouble true 8 thrpt 1838.574 #/op
BenchmarkSumAggregation2.test:L1-dcache-loads foldDouble true 8 thrpt 56420.620 #/op
BenchmarkSumAggregation2.test:L1-icache-load-misses foldDouble true 8 thrpt 1.978 #/op
BenchmarkSumAggregation2.test:L1-icache-loads foldDouble true 8 thrpt 217.917 #/op
BenchmarkSumAggregation2.test:branch-misses foldDouble true 8 thrpt 2.651 #/op
BenchmarkSumAggregation2.test:branches foldDouble true 8 thrpt 21608.391 #/op
BenchmarkSumAggregation2.test:cycles foldDouble true 8 thrpt 34548.798 #/op
BenchmarkSumAggregation2.test:dTLB-load-misses foldDouble true 8 thrpt 8.186 #/op
BenchmarkSumAggregation2.test:dTLB-loads foldDouble true 8 thrpt 10.305 #/op
BenchmarkSumAggregation2.test:iTLB-load-misses foldDouble true 8 thrpt 0.277 #/op
BenchmarkSumAggregation2.test:iTLB-loads foldDouble true 8 thrpt 0.263 #/op
BenchmarkSumAggregation2.test:instructions foldDouble true 8 thrpt 148330.222 #/op
BenchmarkSumAggregation2.test:stalled-cycles-backend foldDouble true 8 thrpt 21706.154 #/op
BenchmarkSumAggregation2.test:stalled-cycles-frontend foldDouble true 8 thrpt 302.209 #/op
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964228389
I've created another benchmark where the null distribution is not constant but random while keeping a single null value per given number of rows (aka interval).
Also in order to mess up the compilation I've changed the benchmark to do not always run with or without null handling but changing the calls in the same JVM by providing a null percentage. Every trial will toss a coin to check whether nulls will be used or not. In order to have enough samples, I've increased the number of trials and reduced the trial length.
I've also sorted the params to first group resutls by null interval, then percentage and finally implementation
Results are:
```
Benchmark (_aNullInterval) (_nullHandlingEnabledPerCent) (_zImpl) Mode Cnt Score Error Units
BenchmarkSumAggregation3.test 2 100 normal thrpt 50 111.205 ± 1.330 ops/ms
BenchmarkSumAggregation3.test 2 100 foldDouble thrpt 50 31.637 ± 0.562 ops/ms
BenchmarkSumAggregation3.test 2 50 normal thrpt 50 125.432 ± 11.036 ops/ms
BenchmarkSumAggregation3.test 2 50 foldDouble thrpt 50 60.079 ± 17.668 ops/ms
BenchmarkSumAggregation3.test 2 0 normal thrpt 50 142.717 ± 1.249 ops/ms
BenchmarkSumAggregation3.test 2 0 foldDouble thrpt 50 140.213 ± 0.543 ops/ms
BenchmarkSumAggregation3.test 4 100 normal thrpt 50 4.304 ± 0.104 ops/ms
BenchmarkSumAggregation3.test 4 100 foldDouble thrpt 50 52.099 ± 0.310 ops/ms
BenchmarkSumAggregation3.test 4 50 normal thrpt 50 81.429 ± 34.278 ops/ms
BenchmarkSumAggregation3.test 4 50 foldDouble thrpt 50 108.721 ± 19.590 ops/ms
BenchmarkSumAggregation3.test 4 0 normal thrpt 50 144.474 ± 0.680 ops/ms
BenchmarkSumAggregation3.test 4 0 foldDouble thrpt 50 143.460 ± 1.093 ops/ms
BenchmarkSumAggregation3.test 8 100 normal thrpt 50 4.801 ± 0.018 ops/ms
BenchmarkSumAggregation3.test 8 100 foldDouble thrpt 50 85.509 ± 0.574 ops/ms
BenchmarkSumAggregation3.test 8 50 normal thrpt 50 81.977 ± 34.200 ops/ms
BenchmarkSumAggregation3.test 8 50 foldDouble thrpt 50 134.188 ± 3.026 ops/ms
BenchmarkSumAggregation3.test 8 0 normal thrpt 50 141.820 ± 0.859 ops/ms
BenchmarkSumAggregation3.test 8 0 foldDouble thrpt 50 144.061 ± 0.633 ops/ms
BenchmarkSumAggregation3.test 16 100 normal thrpt 50 6.259 ± 0.044 ops/ms
BenchmarkSumAggregation3.test 16 100 foldDouble thrpt 50 119.675 ± 0.945 ops/ms
BenchmarkSumAggregation3.test 16 50 normal thrpt 50 84.264 ± 34.529 ops/ms
BenchmarkSumAggregation3.test 16 50 foldDouble thrpt 50 158.657 ± 7.734 ops/ms
BenchmarkSumAggregation3.test 16 0 normal thrpt 50 145.898 ± 0.665 ops/ms
BenchmarkSumAggregation3.test 16 0 foldDouble thrpt 50 144.811 ± 0.740 ops/ms
BenchmarkSumAggregation3.test 32 100 normal thrpt 50 7.463 ± 0.034 ops/ms
BenchmarkSumAggregation3.test 32 100 foldDouble thrpt 50 148.919 ± 1.352 ops/ms
BenchmarkSumAggregation3.test 32 50 normal thrpt 50 84.360 ± 34.162 ops/ms
BenchmarkSumAggregation3.test 32 50 foldDouble thrpt 50 167.707 ± 12.256 ops/ms
BenchmarkSumAggregation3.test 32 0 normal thrpt 50 146.683 ± 0.838 ops/ms
BenchmarkSumAggregation3.test 32 0 foldDouble thrpt 50 144.886 ± 0.741 ops/ms
BenchmarkSumAggregation3.test 64 100 normal thrpt 50 7.122 ± 0.125 ops/ms
BenchmarkSumAggregation3.test 64 100 foldDouble thrpt 50 160.228 ± 0.353 ops/ms
BenchmarkSumAggregation3.test 64 50 normal thrpt 50 82.165 ± 33.204 ops/ms
BenchmarkSumAggregation3.test 64 50 foldDouble thrpt 50 161.608 ± 12.202 ops/ms
BenchmarkSumAggregation3.test 64 0 normal thrpt 50 146.049 ± 0.583 ops/ms
BenchmarkSumAggregation3.test 64 0 foldDouble thrpt 50 140.151 ± 0.255 ops/ms
BenchmarkSumAggregation3.test 128 100 normal thrpt 50 10.146 ± 0.124 ops/ms
BenchmarkSumAggregation3.test 128 100 foldDouble thrpt 50 175.641 ± 1.888 ops/ms
BenchmarkSumAggregation3.test 128 50 normal thrpt 50 84.023 ± 32.846 ops/ms
BenchmarkSumAggregation3.test 128 50 foldDouble thrpt 50 153.464 ± 7.716 ops/ms
BenchmarkSumAggregation3.test 128 0 normal thrpt 50 145.157 ± 0.685 ops/ms
BenchmarkSumAggregation3.test 128 0 foldDouble thrpt 50 144.998 ± 0.738 ops/ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1947365697
For benchmark purpose, can we try the same idea on a cheaper aggregation such as `SUM`? This way we can amplify the performance impact
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948171973
I've repeated the benchmark with:
- Java 21
- 1M rows per segment (instead of 10k)
- 3 consecutive nulls every 3 not null rows (instead of 1 every 127)
And the results are:
```
Benchmark (_aQueryTemplate) (_impl) (_nullHandling) Mode Cnt Score Error Units
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal true thrpt 25 0.172 ± 0.005 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark normal false thrpt 25 0.175 ± 0.005 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble true thrpt 25 0.156 ± 0.006 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldDouble false thrpt 25 0.158 ± 0.006 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive true thrpt 25 0.206 ± 0.006 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldPrimitive false thrpt 25 0.197 ± 0.016 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder true thrpt 25 0.200 ± 0.031 ops/ms
BenchmarkSumAggregation.test select sum(valueLong, '%s') from benchmark foldHolder false thrpt 25 0.218 ± 0.018 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal true thrpt 25 0.174 ± 0.005 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark normal false thrpt 25 0.175 ± 0.005 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble true thrpt 25 0.151 ± 0.012 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldDouble false thrpt 25 0.151 ± 0.012 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive true thrpt 25 0.189 ± 0.020 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldPrimitive false thrpt 25 0.190 ± 0.016 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder true thrpt 25 0.190 ± 0.010 ops/ms
BenchmarkSumAggregation.test select sum(valueInt, '%s') from benchmark foldHolder false thrpt 25 0.194 ± 0.018 ops/ms
```
So it seems some of the solutions are even faster than the original one, although the ones that go faster are a bit different than the original code, as they actually add ints or longs and then the value is added into the double, which is obviously more expensive
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948107488
We could try to execute the same benchmark changing the null distribution and increasing the size of the segments, but TBH the difference seems very small when we compare with real costs we may have in an actual environment (including IO, larger memory usage, etc)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963704893
BTW, there is another change in this implementation (although I guess we are not planning to merge it).
Results in the current implementation are less precise than the new implementations because new implementations are not casting ints to doubles on each iteration but after each batch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963827711
So the speedup in the new operations is explained because we are actually doing a better work in the new implementations by using primitive values. These implementations:
- Are faster because they use the int/long value, which may be optimized by the compiler.
- Provide correct results when the sum is not outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc).
- Overflow when the partial sum is outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc). Remember that the current implementation also overflow if the value end up being higher than Double.MAX_VALUE, although that is very unlikely.
There is an interesting topic here. Should we create these alternative implementations in order to provide correct results with extra speedup at the cost of possible overflow? Should we make them the default implementations for V2?
But that is a discussion we can have later. The matter here is the null handling implementation. Therefore I would ignore `foldPrimitive` and `foldHolder`, which are changing semantics, and focus on `normal` vs `foldDouble`, which keeps semantics.
As we can see in the results:
- Comparing `normal` and `foldDouble` when `nullHandling` is disabled: It doesn't seem the introduced lambda adds a performance penalty.
- Comparing `normal` and `foldDouble` when `nullHandling` is enabled:
- We can also notice an increase in performance when we have at least 15 not null consecutive values
TBH I don't know why we get more performance when null handling enabled with enough not null consecutive values. But assuming we use `foldNotNull` instead of the single row methods in `NullableSingleInputAggregationFunction`, I think we can assume performance won't be affected significantly.
These are the values in my computer:
```
Benchmark (_impl) (_nullHandlingEnabled) (_nullInterval) Mode Cnt Score Error Units
BenchmarkSumAggregation2.test normal true 2 thrpt 5 103.210 ± 9.108 ops/ms
BenchmarkSumAggregation2.test normal true 4 thrpt 5 4.062 ± 0.026 ops/ms
BenchmarkSumAggregation2.test normal true 8 thrpt 5 4.377 ± 0.045 ops/ms
BenchmarkSumAggregation2.test normal true 16 thrpt 5 5.164 ± 0.068 ops/ms
BenchmarkSumAggregation2.test normal true 32 thrpt 5 5.935 ± 0.296 ops/ms
BenchmarkSumAggregation2.test normal true 64 thrpt 5 8.044 ± 0.335 ops/ms
BenchmarkSumAggregation2.test normal true 128 thrpt 5 9.003 ± 0.143 ops/ms
BenchmarkSumAggregation2.test normal false 2 thrpt 5 140.407 ± 0.573 ops/ms
BenchmarkSumAggregation2.test normal false 4 thrpt 5 141.040 ± 0.451 ops/ms
BenchmarkSumAggregation2.test normal false 8 thrpt 5 141.907 ± 0.992 ops/ms
BenchmarkSumAggregation2.test normal false 16 thrpt 5 143.397 ± 1.548 ops/ms
BenchmarkSumAggregation2.test normal false 32 thrpt 5 139.698 ± 1.044 ops/ms
BenchmarkSumAggregation2.test normal false 64 thrpt 5 144.736 ± 1.227 ops/ms
BenchmarkSumAggregation2.test normal false 128 thrpt 5 144.443 ± 1.678 ops/ms
BenchmarkSumAggregation2.test foldDouble true 2 thrpt 5 34.237 ± 1.460 ops/ms
BenchmarkSumAggregation2.test foldDouble true 4 thrpt 5 65.086 ± 3.262 ops/ms
BenchmarkSumAggregation2.test foldDouble true 8 thrpt 5 120.121 ± 2.080 ops/ms
BenchmarkSumAggregation2.test foldDouble true 16 thrpt 5 182.891 ± 2.179 ops/ms
BenchmarkSumAggregation2.test foldDouble true 32 thrpt 5 191.954 ± 1.011 ops/ms
BenchmarkSumAggregation2.test foldDouble true 64 thrpt 5 197.672 ± 2.340 ops/ms
BenchmarkSumAggregation2.test foldDouble true 128 thrpt 5 187.932 ± 5.005 ops/ms
BenchmarkSumAggregation2.test foldDouble false 2 thrpt 5 142.703 ± 3.093 ops/ms
BenchmarkSumAggregation2.test foldDouble false 4 thrpt 5 139.740 ± 2.003 ops/ms
BenchmarkSumAggregation2.test foldDouble false 8 thrpt 5 145.153 ± 3.140 ops/ms
BenchmarkSumAggregation2.test foldDouble false 16 thrpt 5 143.858 ± 2.861 ops/ms
BenchmarkSumAggregation2.test foldDouble false 32 thrpt 5 139.753 ± 1.024 ops/ms
BenchmarkSumAggregation2.test foldDouble false 64 thrpt 5 142.077 ± 0.810 ops/ms
BenchmarkSumAggregation2.test foldDouble false 128 thrpt 5 142.829 ± 1.676 ops/ms
```
For the record, these are the results when using the other two options (with foldHolder fixed):
```
BenchmarkSumAggregation2.test foldHolder true 2 thrpt 5 43.157 ± 1.498 ops/ms
BenchmarkSumAggregation2.test foldHolder true 4 thrpt 5 68.783 ± 5.989 ops/ms
BenchmarkSumAggregation2.test foldHolder true 8 thrpt 5 124.738 ± 3.362 ops/ms
BenchmarkSumAggregation2.test foldHolder true 16 thrpt 5 216.012 ± 10.644 ops/ms
BenchmarkSumAggregation2.test foldHolder true 32 thrpt 5 313.113 ± 21.122 ops/ms
BenchmarkSumAggregation2.test foldHolder true 64 thrpt 5 392.115 ± 6.565 ops/ms
BenchmarkSumAggregation2.test foldHolder true 128 thrpt 5 413.473 ± 19.703 ops/ms
BenchmarkSumAggregation2.test foldHolder false 2 thrpt 5 426.726 ± 5.831 ops/ms
BenchmarkSumAggregation2.test foldHolder false 4 thrpt 5 423.828 ± 8.957 ops/ms
BenchmarkSumAggregation2.test foldHolder false 8 thrpt 5 415.301 ± 4.804 ops/ms
BenchmarkSumAggregation2.test foldHolder false 16 thrpt 5 425.916 ± 5.353 ops/ms
BenchmarkSumAggregation2.test foldHolder false 32 thrpt 5 419.126 ± 3.485 ops/ms
BenchmarkSumAggregation2.test foldHolder false 64 thrpt 5 427.603 ± 3.293 ops/ms
BenchmarkSumAggregation2.test foldHolder false 128 thrpt 5 413.514 ± 3.203 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 2 thrpt 5 34.018 ± 0.853 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 4 thrpt 5 63.797 ± 0.683 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 8 thrpt 5 117.642 ± 2.266 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 16 thrpt 5 158.315 ± 7.338 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 32 thrpt 5 188.807 ± 35.955 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 64 thrpt 5 222.271 ± 111.049 ops/ms
BenchmarkSumAggregation2.test foldPrimitive true 128 thrpt 5 221.113 ± 110.499 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 2 thrpt 5 416.386 ± 3.038 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 4 thrpt 5 423.796 ± 3.550 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 8 thrpt 5 414.014 ± 3.438 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 16 thrpt 5 413.562 ± 3.760 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 32 thrpt 5 420.902 ± 13.165 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 64 thrpt 5 412.881 ± 17.209 ops/ms
BenchmarkSumAggregation2.test foldPrimitive false 128 thrpt 5 426.167 ± 7.631 ops/ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [draft] Mode null benchmark [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1960600876
The result is a little bit counter intuitive. In most cases, when null handling is enabled, it should be much slower than when it is disabled because we need to pay overhead of iterating over the null bitmap, which should be much more expensive than simply summing up all values. Is it possible that other part of the query execution is dominating the results, or some of the execution is skipped?
Micro-benchmark is usually not accurate when too much code is tested (especially multi-threaded). Can you set up a benchmark to only test the aggregation function itself? We may construct a `BlockValSet` and pass it into the function. After the execution, we can use a blackhole to consume the value.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org