You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2024/02/02 18:18:46 UTC

[PR] [draft] Mode null benchmark [pinot]

gortiz opened a new pull request, #12354:
URL: https://github.com/apache/pinot/pull/12354

   There are some questions about the performance on the implementation proposed in #12227.
   
   This PR modifies the code to add several Mode implementations that can be selected by using `mode(col, MIN, number)`, where number is a literal from 0 to 5 that selects the specific implementation.
   
   The code is not the best and the idea is not to merge this PR as it is but to use it to make it easier to benchmark different solutions.
   
   
   I would like to repeat the tests with longer executions. I may do that next week. Also notice that mode 1 does not support null handling
   
   Results in AMD Ryzen 9 3900X:
   ```
                                  (_aQueryTemplate)  (_impl)  (_nullHandling)   Mode  Cnt  Score   Error   Units
       select mode(value, 'MIN', %s) from benchmark        0             true  thrpt   25  0.998 ± 0.111  ops/ms
       select mode(value, 'MIN', %s) from benchmark        2             true  thrpt   25  0.921 ± 0.143  ops/ms
       select mode(value, 'MIN', %s) from benchmark        3             true  thrpt   25  0.994 ± 0.137  ops/ms
       select mode(value, 'MIN', %s) from benchmark        4             true  thrpt   25  1.026 ± 0.129  ops/ms
       select mode(value, 'MIN', %s) from benchmark        5             true  thrpt   25  0.978 ± 0.135  ops/ms
       select mode(value, 'MIN', %s) from benchmark        1            false  thrpt   25  0.974 ± 0.135  ops/ms
       select mode(value, 'MIN', %s) from benchmark        0            false  thrpt   25  1.014 ± 0.139  ops/ms
       select mode(value, 'MIN', %s) from benchmark        2            false  thrpt   25  0.909 ± 0.119  ops/ms
       select mode(value, 'MIN', %s) from benchmark        3            false  thrpt   25  0.953 ± 0.111  ops/ms
       select mode(value, 'MIN', %s) from benchmark        4            false  thrpt   25  0.975 ± 0.136  ops/ms
       select mode(value, 'MIN', %s) from benchmark        5            false  thrpt   25  1.050 ± 0.120  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        0             true  thrpt   25  1.921 ± 0.161  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        2             true  thrpt   25  1.351 ± 0.160  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        3             true  thrpt   25  1.456 ± 0.151  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        4             true  thrpt   25  1.703 ± 0.179  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        5             true  thrpt   25  1.742 ± 0.197  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        0            false  thrpt   25  2.041 ± 0.154  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        1            false  thrpt   25  2.023 ± 0.151  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        2            false  thrpt   25  1.308 ± 0.138  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        3            false  thrpt   25  1.715 ± 0.259  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        4            false  thrpt   25  1.579 ± 0.173  ops/ms
   select mode(valueDict, 'MIN', %s) from benchmark        5            false  thrpt   25  1.812 ± 0.176  ops/ms
   ```
   
   cc @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948104199

   I've applied similar changes in SUM and results are:
   
   In AMD Ryzen 9 3900X, Java 11:
   
   ```
   Benchmark                                              (_aQueryTemplate)        (_impl)  (_nullHandling)   Mode  Cnt  Score   Error   Units
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal             true  thrpt   25  1.868 ± 0.153  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal            false  thrpt   25  1.868 ± 0.172  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble             true  thrpt   25  1.884 ± 0.145  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble            false  thrpt   25  1.841 ± 0.175  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive             true  thrpt   25  1.942 ± 0.153  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive            false  thrpt   25  1.869 ± 0.159  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder             true  thrpt   25  1.918 ± 0.165  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder            false  thrpt   25  1.941 ± 0.150  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal             true  thrpt   25  1.911 ± 0.178  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal            false  thrpt   25  1.879 ± 0.162  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble             true  thrpt   25  1.869 ± 0.171  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble            false  thrpt   25  1.856 ± 0.157  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive             true  thrpt   25  1.884 ± 0.196  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive            false  thrpt   25  1.915 ± 0.172  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder             true  thrpt   25  1.922 ± 0.167  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder            false  thrpt   25  1.966 ± 0.172  ops/ms
   ```
   
   In AMD Ryzen 9 3900X, Java 21:
   ```
   Benchmark                                              (_aQueryTemplate)        (_impl)  (_nullHandling)   Mode  Cnt  Score   Error   Units
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal             true  thrpt   25  2.477 ± 0.091  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal            false  thrpt   25  2.509 ± 0.074  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble             true  thrpt   25  2.609 ± 0.064  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble            false  thrpt   25  2.501 ± 0.083  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive             true  thrpt   25  2.652 ± 0.106  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive            false  thrpt   25  2.627 ± 0.095  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder             true  thrpt   25  2.664 ± 0.087  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder            false  thrpt   25  2.701 ± 0.097  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal             true  thrpt   25  2.428 ± 0.072  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal            false  thrpt   25  2.450 ± 0.049  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble             true  thrpt   25  2.451 ± 0.073  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble            false  thrpt   25  2.503 ± 0.055  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive             true  thrpt   25  2.573 ± 0.067  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive            false  thrpt   25  2.567 ± 0.084  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder             true  thrpt   25  2.664 ± 0.081  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder            false  thrpt   25  2.622 ± 0.074  ops/ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961194752

   That is the interesting part. Aggregation operators implemented before did not used RoaringBitmap that well and they were iterating over it. Instead here we are not doing that. We are looking for the next null value. In this benchmark that is specially useful because we have 127 not null values followed by a single null value and we repeat that pattern all the time, so we end up executing 127 rows in a loop.
   
   We can use the benchmark to play with different distributions, where a distribution where odd positions are null and even positions are not null (or the other way around) will probably be the worst distribution possible.
   
   > Micro-benchmark is usually not accurate when too much code is tested (especially multi-threaded). Can you set up a benchmark to only test the aggregation function itself? We may construct a BlockValSet and pass it into the function.
   
   We can think the other way around. Benchmarking this way we get results closer to the actual performance we would get. It shouldn't be actually problematic to spend twice much time with null handling if the operation is actually a 5% of the total execution time. On the other hand, we can see that some patterns are significantly slower, which means that the difference there is actually important. For example with `mode` we can see that version 2 is significantly slower than version 5.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961924193

   I tried the following micro-benchmark and see very interesting result:
   
   Code:
   ```
   package org.apache.pinot.perf;
   
   import java.io.IOException;
   import java.math.BigDecimal;
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.TimeUnit;
   import java.util.function.LongSupplier;
   import org.apache.pinot.common.request.context.ExpressionContext;
   import org.apache.pinot.core.common.BlockValSet;
   import org.apache.pinot.core.plan.DocIdSetPlanNode;
   import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
   import org.apache.pinot.core.query.aggregation.DoubleAggregationResultHolder;
   import org.apache.pinot.core.query.aggregation.function.AggregationFunction;
   import org.apache.pinot.core.query.aggregation.function.SumAggregationFunction;
   import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldDouble;
   import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldHolder;
   import org.apache.pinot.core.query.aggregation.function.SumAggregationFunctionFoldPrimitive;
   import org.apache.pinot.segment.spi.index.reader.Dictionary;
   import org.apache.pinot.spi.data.FieldSpec.DataType;
   import org.jetbrains.annotations.Nullable;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Level;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;
   import org.openjdk.jmh.annotations.Param;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.Setup;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.Warmup;
   import org.openjdk.jmh.infra.Blackhole;
   import org.openjdk.jmh.runner.Runner;
   import org.openjdk.jmh.runner.RunnerException;
   import org.openjdk.jmh.runner.options.Options;
   import org.openjdk.jmh.runner.options.OptionsBuilder;
   import org.roaringbitmap.RoaringBitmap;
   
   
   @Fork(1)
   @BenchmarkMode(Mode.Throughput)
   @Warmup(iterations = 2, time = 1)
   @Measurement(iterations = 3, time = 1)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   @State(Scope.Benchmark)
   public class BenchmarkSumAggregation2 {
     public static void main(String[] args)
         throws RunnerException {
       Options opt = new OptionsBuilder().include(BenchmarkSumAggregation2.class.getSimpleName())
   //        .addProfiler(GCProfiler.class)
           .build();
   
       new Runner(opt).run();
     }
   
     private static final ExpressionContext EXPR = ExpressionContext.forIdentifier("col");
   
     @Param({"true", "false"})
     public boolean _nullHandlingEnabled;
     @Param({"2", "4", "8", "16", "32", "64", "128"})
     public int _nullInterval;
     @Param({"normal", "foldDouble", "foldPrimitive"})
     public String _impl;
   
     private AggregationFunction _aggregationFunction;
     private AggregationResultHolder _resultHolder;
     private RoaringBitmap _nullBitmap;
     private long[] _values;
     private Map<ExpressionContext, BlockValSet> _blockValSetMap;
     private double _expectedSum;
     private double _nextExpectedSum;
   
     @Setup(Level.Trial)
     public void setup()
         throws IOException {
       switch (_impl) {
         case "normal":
           _aggregationFunction = new SumAggregationFunction(List.of(EXPR), _nullHandlingEnabled);
           break;
         case "foldDouble":
           _aggregationFunction = new SumAggregationFunctionFoldDouble(List.of(EXPR), _nullHandlingEnabled);
           break;
         case "foldPrimitive":
           _aggregationFunction = new SumAggregationFunctionFoldPrimitive(List.of(EXPR), _nullHandlingEnabled);
           break;
         case "foldHolder":
           _aggregationFunction = new SumAggregationFunctionFoldHolder(List.of(EXPR), _nullHandlingEnabled);
           break;
         default:
           throw new IllegalArgumentException("Unknown impl: " + _impl);
       }
       _resultHolder = _aggregationFunction.createAggregationResultHolder();
       _nullBitmap = createNullBitmap();
       _values = new long[DocIdSetPlanNode.MAX_DOC_PER_CALL];
       LongSupplier longSupplier = Distribution.createLongSupplier(42, "EXP(0.5)");
       for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i++) {
         _values[i] = longSupplier.getAsLong();
       }
       _expectedSum = 0;
       for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i++) {
         if (_nullBitmap == null || !_nullBitmap.contains(i)) {
           _expectedSum += _values[i];
         }
       }
       _nextExpectedSum = _expectedSum;
       _blockValSetMap = Map.of(EXPR, new BenchmarkBlockValSet(_nullBitmap, _values));
     }
   
     private RoaringBitmap createNullBitmap() {
       if (_nullHandlingEnabled) {
         RoaringBitmap nullBitmap = new RoaringBitmap();
         for (int i = 0; i < DocIdSetPlanNode.MAX_DOC_PER_CALL; i += _nullInterval) {
           nullBitmap.add(i);
         }
         return nullBitmap;
       } else {
         return null;
       }
     }
   
     private static class BenchmarkBlockValSet implements BlockValSet {
   
       final RoaringBitmap _nullBitmap;
       final long[] _values;
   
       private BenchmarkBlockValSet(@Nullable RoaringBitmap nullBitmap, long[] values) {
         _nullBitmap = nullBitmap;
         _values = values;
       }
   
       @Nullable
       @Override
       public RoaringBitmap getNullBitmap() {
         return _nullBitmap;
       }
   
       @Override
       public DataType getValueType() {
         return DataType.LONG;
       }
   
       @Override
       public boolean isSingleValue() {
         return true;
       }
   
       @Nullable
       @Override
       public Dictionary getDictionary() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public int[] getDictionaryIdsSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public int[] getIntValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public long[] getLongValuesSV() {
         return _values;
       }
   
       @Override
       public float[] getFloatValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public double[] getDoubleValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public BigDecimal[] getBigDecimalValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public String[] getStringValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public byte[][] getBytesValuesSV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public int[][] getDictionaryIdsMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public int[][] getIntValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public long[][] getLongValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public float[][] getFloatValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public double[][] getDoubleValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public String[][] getStringValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public byte[][][] getBytesValuesMV() {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public int[] getNumMVEntries() {
         throw new UnsupportedOperationException();
       }
     }
   
     @Benchmark
     public void test(Blackhole bh) {
       _aggregationFunction.aggregate(DocIdSetPlanNode.MAX_DOC_PER_CALL, _resultHolder, _blockValSetMap);
       if (_resultHolder instanceof DoubleAggregationResultHolder) {
         double result = _resultHolder.getDoubleResult();
         if (result != _nextExpectedSum) {
           throw new IllegalStateException("Expected: " + _nextExpectedSum + ", got: " + result);
         }
         bh.consume(result);
       } else {
         Double result = _resultHolder.getResult();
         if (result != _nextExpectedSum) {
           throw new IllegalStateException("Expected: " + _nextExpectedSum + ", got: " + _resultHolder.getResult());
         }
         bh.consume(result);
       }
       _nextExpectedSum += _expectedSum;
     }
   }
   ```
   
   Result (`foldHolder` doesn't give correct result, please take a look):
   ```
   Benchmark                            (_impl)  (_nullHandlingEnabled)  (_nullInterval)   Mode  Cnt    Score     Error   Units
   BenchmarkSumAggregation2.test         normal                    true                2  thrpt    3   39.276 ±  20.887  ops/ms
   BenchmarkSumAggregation2.test         normal                    true                4  thrpt    3    2.328 ±   3.333  ops/ms
   BenchmarkSumAggregation2.test         normal                    true                8  thrpt    3    3.280 ±   1.050  ops/ms
   BenchmarkSumAggregation2.test         normal                    true               16  thrpt    3    4.128 ±   1.031  ops/ms
   BenchmarkSumAggregation2.test         normal                    true               32  thrpt    3    4.516 ±   0.757  ops/ms
   BenchmarkSumAggregation2.test         normal                    true               64  thrpt    3    5.817 ±   0.127  ops/ms
   BenchmarkSumAggregation2.test         normal                    true              128  thrpt    3    5.833 ±   0.155  ops/ms
   BenchmarkSumAggregation2.test         normal                   false                2  thrpt    3   91.698 ±  16.053  ops/ms
   BenchmarkSumAggregation2.test         normal                   false                4  thrpt    3   90.171 ±  12.357  ops/ms
   BenchmarkSumAggregation2.test         normal                   false                8  thrpt    3   93.684 ±   4.769  ops/ms
   BenchmarkSumAggregation2.test         normal                   false               16  thrpt    3   91.339 ±  49.780  ops/ms
   BenchmarkSumAggregation2.test         normal                   false               32  thrpt    3   92.072 ±  13.054  ops/ms
   BenchmarkSumAggregation2.test         normal                   false               64  thrpt    3   92.704 ±   4.640  ops/ms
   BenchmarkSumAggregation2.test         normal                   false              128  thrpt    3   91.891 ±  20.353  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true                2  thrpt    3   28.216 ±   0.871  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true                4  thrpt    3   49.684 ±  12.845  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true                8  thrpt    3   88.009 ±  24.841  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true               16  thrpt    3  123.312 ±  11.984  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true               32  thrpt    3  116.148 ±  65.820  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true               64  thrpt    3  123.562 ±  23.135  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                    true              128  thrpt    3  115.864 ± 116.102  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false                2  thrpt    3   89.577 ±  69.898  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false                4  thrpt    3   91.662 ±  35.556  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false                8  thrpt    3   91.258 ±  17.352  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false               16  thrpt    3   90.990 ±  27.492  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false               32  thrpt    3   91.151 ±  22.358  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false               64  thrpt    3   92.060 ±   3.458  ops/ms
   BenchmarkSumAggregation2.test     foldDouble                   false              128  thrpt    3   91.715 ±   4.820  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                2  thrpt    3   26.680 ±   7.063  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                4  thrpt    3   46.278 ±   5.670  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                8  thrpt    3   80.658 ±  33.080  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               16  thrpt    3  149.492 ±  30.142  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               32  thrpt    3  225.837 ±  63.126  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               64  thrpt    3  186.245 ±  24.884  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true              128  thrpt    3  198.177 ±  73.624  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                2  thrpt    3  364.593 ±  44.298  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                4  thrpt    3  359.073 ± 194.996  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                8  thrpt    3  362.277 ±  92.352  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               16  thrpt    3  362.172 ±  52.064  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               32  thrpt    3  350.423 ± 193.227  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               64  thrpt    3  350.067 ± 107.294  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false              128  thrpt    3  361.977 ± 128.563  ops/ms
   ```
   
   As you can see, new implementation is giving much better performance when null is enabled, and similar performance when null is disabled. `foldPrimitive` gives even better performance, even though we cannot use it because of the potential overflow.
   
   I tried this on my laptop. Can you try this benchmark on your work station and see if you can produce the results? Maybe we can get more insights from it.
   
   Some improvements we should do for the new implementation:
   - When null is enabled, we should use `ObjectAggregationResultHolder` to handle the case of all values are `null`
   - Fix the `foldHolder` implementation to give correct result


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948072486

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `2236 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`38d86b0`)](https://app.codecov.io/gh/apache/pinot/commit/38d86b0a6432e9a7249f1692ace36b6e34171b0a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.74% compared to head [(`402d587`)](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 0.00%.
   > Report is 1 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...aggregation/function/ModeAggregationFunction3.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjMuamF2YQ==) | 0.00% | [377 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...aggregation/function/ModeAggregationFunction1.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjEuamF2YQ==) | 0.00% | [374 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...aggregation/function/ModeAggregationFunction2.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjIuamF2YQ==) | 0.00% | [362 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...aggregation/function/ModeAggregationFunction5.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjUuamF2YQ==) | 0.00% | [329 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...aggregation/function/ModeAggregationFunction4.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Nb2RlQWdncmVnYXRpb25GdW5jdGlvbjQuamF2YQ==) | 0.00% | [318 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...nction/NullableSingleInputAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9OdWxsYWJsZVNpbmdsZUlucHV0QWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | 0.00% | [174 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...ion/function/SumAggregationFunctionFoldHolder.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZEhvbGRlci5qYXZh) | 0.00% | [86 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...ion/function/SumAggregationFunctionFoldDouble.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZERvdWJsZS5qYXZh) | 0.00% | [84 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [.../function/SumAggregationFunctionFoldPrimitive.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9TdW1BZ2dyZWdhdGlvbkZ1bmN0aW9uRm9sZFByaW1pdGl2ZS5qYXZh) | 0.00% | [84 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...gregation/function/AggregationFunctionFactory.java](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9BZ2dyZWdhdGlvbkZ1bmN0aW9uRmFjdG9yeS5qYXZh) | 0.00% | [20 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | ... and [5 more](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12354       +/-   ##
   =============================================
   - Coverage     61.74%    0.00%   -61.75%     
   + Complexity      207        6      -201     
   =============================================
     Files          2436     2370       -66     
     Lines        133108   131584     -1524     
     Branches      20617    20521       -96     
   =============================================
   - Hits          82191        6    -82185     
   - Misses        44876   131578    +86702     
   + Partials       6041        0     -6041     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.63%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.73%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.75%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12354/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964199109

   Extra investigations:
   ## Why performance in `normal` with `null handling` decreases so much when going from 2 to 4 interval?
   
   I've checked with perfnorm:
   ```
   Benchmark                                                 (_impl)  (_nullHandlingEnabled)  (_nullInterval)   Mode  Cnt        Score   Error      Units
   BenchmarkSumAggregation2.test                              normal                    true                2  thrpt    5      103.386 ± 3.852     ops/ms
   BenchmarkSumAggregation2.test:CPI                          normal                    true                2  thrpt             0.177          clks/insn
   BenchmarkSumAggregation2.test:IPC                          normal                    true                2  thrpt             5.660          insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses        normal                    true                2  thrpt          1321.767               #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads              normal                    true                2  thrpt         15314.726               #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses        normal                    true                2  thrpt             1.106               #/op
   BenchmarkSumAggregation2.test:L1-icache-loads              normal                    true                2  thrpt           329.697               #/op
   BenchmarkSumAggregation2.test:branch-misses                normal                    true                2  thrpt             8.702               #/op
   BenchmarkSumAggregation2.test:branches                     normal                    true                2  thrpt         60461.749               #/op
   BenchmarkSumAggregation2.test:cycles                       normal                    true                2  thrpt         41090.010               #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses             normal                    true                2  thrpt             0.301               #/op
   BenchmarkSumAggregation2.test:dTLB-loads                   normal                    true                2  thrpt             2.005               #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses             normal                    true                2  thrpt             0.048               #/op
   BenchmarkSumAggregation2.test:iTLB-loads                   normal                    true                2  thrpt             0.219               #/op
   BenchmarkSumAggregation2.test:instructions                 normal                    true                2  thrpt        232553.341               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend       normal                    true                2  thrpt         25474.941               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend      normal                    true                2  thrpt            57.691               #/op
   BenchmarkSumAggregation2.test                              normal                    true                4  thrpt    5        3.525 ± 0.129     ops/ms
   BenchmarkSumAggregation2.test:CPI                          normal                    true                4  thrpt             0.378          clks/insn
   BenchmarkSumAggregation2.test:IPC                          normal                    true                4  thrpt             2.645          insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses        normal                    true                4  thrpt          1689.903               #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads              normal                    true                4  thrpt        685807.174               #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses        normal                    true                4  thrpt            21.785               #/op
   BenchmarkSumAggregation2.test:L1-icache-loads              normal                    true                4  thrpt         10603.011               #/op
   BenchmarkSumAggregation2.test:branch-misses                normal                    true                4  thrpt         15379.544               #/op
   BenchmarkSumAggregation2.test:branches                     normal                    true                4  thrpt        817946.383               #/op
   BenchmarkSumAggregation2.test:cycles                       normal                    true                4  thrpt       1223502.258               #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses             normal                    true                4  thrpt             2.123               #/op
   BenchmarkSumAggregation2.test:dTLB-loads                   normal                    true                4  thrpt            16.638               #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses             normal                    true                4  thrpt             0.639               #/op
   BenchmarkSumAggregation2.test:iTLB-loads                   normal                    true                4  thrpt             2.910               #/op
   BenchmarkSumAggregation2.test:instructions                 normal                    true                4  thrpt       3236651.002               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend       normal                    true                4  thrpt        116767.982               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend      normal                    true                4  thrpt         17092.450               #/op
   ```
   
   AFAIU there are far more cache failures and mispredictions.
   
   ## Why performance in `foldDouble` with `null handling` increases so much when going from 4 to 8 interval?
   
   Same reason:
   ```
   BenchmarkSumAggregation2.test                          foldDouble                    true                4  thrpt    5       68.321 ± 0.344     ops/ms
   BenchmarkSumAggregation2.test:CPI                      foldDouble                    true                4  thrpt             0.234          clks/insn
   BenchmarkSumAggregation2.test:IPC                      foldDouble                    true                4  thrpt             4.264          insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses    foldDouble                    true                4  thrpt          2382.564               #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads          foldDouble                    true                4  thrpt        103018.730               #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses    foldDouble                    true                4  thrpt             4.476               #/op
   BenchmarkSumAggregation2.test:L1-icache-loads          foldDouble                    true                4  thrpt           402.647               #/op
   BenchmarkSumAggregation2.test:branch-misses            foldDouble                    true                4  thrpt             5.758               #/op
   BenchmarkSumAggregation2.test:branches                 foldDouble                    true                4  thrpt         41678.023               #/op
   BenchmarkSumAggregation2.test:cycles                   foldDouble                    true                4  thrpt         64669.720               #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses         foldDouble                    true                4  thrpt            15.780               #/op
   BenchmarkSumAggregation2.test:dTLB-loads               foldDouble                    true                4  thrpt            21.655               #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses         foldDouble                    true                4  thrpt             0.501               #/op
   BenchmarkSumAggregation2.test:iTLB-loads               foldDouble                    true                4  thrpt             0.600               #/op
   BenchmarkSumAggregation2.test:instructions             foldDouble                    true                4  thrpt        275779.218               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend   foldDouble                    true                4  thrpt         37349.094               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend  foldDouble                    true                4  thrpt           700.245               #/op
   BenchmarkSumAggregation2.test                          foldDouble                    true                8  thrpt    5      120.523 ± 1.062     ops/ms
   BenchmarkSumAggregation2.test:CPI                      foldDouble                    true                8  thrpt             0.233          clks/insn
   BenchmarkSumAggregation2.test:IPC                      foldDouble                    true                8  thrpt             4.293          insns/clk
   BenchmarkSumAggregation2.test:L1-dcache-load-misses    foldDouble                    true                8  thrpt          1838.574               #/op
   BenchmarkSumAggregation2.test:L1-dcache-loads          foldDouble                    true                8  thrpt         56420.620               #/op
   BenchmarkSumAggregation2.test:L1-icache-load-misses    foldDouble                    true                8  thrpt             1.978               #/op
   BenchmarkSumAggregation2.test:L1-icache-loads          foldDouble                    true                8  thrpt           217.917               #/op
   BenchmarkSumAggregation2.test:branch-misses            foldDouble                    true                8  thrpt             2.651               #/op
   BenchmarkSumAggregation2.test:branches                 foldDouble                    true                8  thrpt         21608.391               #/op
   BenchmarkSumAggregation2.test:cycles                   foldDouble                    true                8  thrpt         34548.798               #/op
   BenchmarkSumAggregation2.test:dTLB-load-misses         foldDouble                    true                8  thrpt             8.186               #/op
   BenchmarkSumAggregation2.test:dTLB-loads               foldDouble                    true                8  thrpt            10.305               #/op
   BenchmarkSumAggregation2.test:iTLB-load-misses         foldDouble                    true                8  thrpt             0.277               #/op
   BenchmarkSumAggregation2.test:iTLB-loads               foldDouble                    true                8  thrpt             0.263               #/op
   BenchmarkSumAggregation2.test:instructions             foldDouble                    true                8  thrpt        148330.222               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-backend   foldDouble                    true                8  thrpt         21706.154               #/op
   BenchmarkSumAggregation2.test:stalled-cycles-frontend  foldDouble                    true                8  thrpt           302.209               #/op
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964228389

   I've created another benchmark where the null distribution is not constant but random while keeping a single null value per given number of rows (aka interval).
   
   Also in order to mess up the compilation I've changed the benchmark to do not always run with or without null handling but changing the calls in the same JVM by providing a null percentage. Every trial will toss a coin to check whether nulls will be used or not. In order to have enough samples, I've increased the number of trials and reduced the trial length.
   
   I've also sorted the params to first group resutls by null interval, then percentage and finally implementation
   
   Results are:
   ```
   Benchmark                      (_aNullInterval)  (_nullHandlingEnabledPerCent)    (_zImpl)   Mode  Cnt    Score    Error   Units
   BenchmarkSumAggregation3.test                 2                            100      normal  thrpt   50  111.205 ±  1.330  ops/ms
   BenchmarkSumAggregation3.test                 2                            100  foldDouble  thrpt   50   31.637 ±  0.562  ops/ms
   BenchmarkSumAggregation3.test                 2                             50      normal  thrpt   50  125.432 ± 11.036  ops/ms
   BenchmarkSumAggregation3.test                 2                             50  foldDouble  thrpt   50   60.079 ± 17.668  ops/ms
   BenchmarkSumAggregation3.test                 2                              0      normal  thrpt   50  142.717 ±  1.249  ops/ms
   BenchmarkSumAggregation3.test                 2                              0  foldDouble  thrpt   50  140.213 ±  0.543  ops/ms
   BenchmarkSumAggregation3.test                 4                            100      normal  thrpt   50    4.304 ±  0.104  ops/ms
   BenchmarkSumAggregation3.test                 4                            100  foldDouble  thrpt   50   52.099 ±  0.310  ops/ms
   BenchmarkSumAggregation3.test                 4                             50      normal  thrpt   50   81.429 ± 34.278  ops/ms
   BenchmarkSumAggregation3.test                 4                             50  foldDouble  thrpt   50  108.721 ± 19.590  ops/ms
   BenchmarkSumAggregation3.test                 4                              0      normal  thrpt   50  144.474 ±  0.680  ops/ms
   BenchmarkSumAggregation3.test                 4                              0  foldDouble  thrpt   50  143.460 ±  1.093  ops/ms
   BenchmarkSumAggregation3.test                 8                            100      normal  thrpt   50    4.801 ±  0.018  ops/ms
   BenchmarkSumAggregation3.test                 8                            100  foldDouble  thrpt   50   85.509 ±  0.574  ops/ms
   BenchmarkSumAggregation3.test                 8                             50      normal  thrpt   50   81.977 ± 34.200  ops/ms
   BenchmarkSumAggregation3.test                 8                             50  foldDouble  thrpt   50  134.188 ±  3.026  ops/ms
   BenchmarkSumAggregation3.test                 8                              0      normal  thrpt   50  141.820 ±  0.859  ops/ms
   BenchmarkSumAggregation3.test                 8                              0  foldDouble  thrpt   50  144.061 ±  0.633  ops/ms
   BenchmarkSumAggregation3.test                16                            100      normal  thrpt   50    6.259 ±  0.044  ops/ms
   BenchmarkSumAggregation3.test                16                            100  foldDouble  thrpt   50  119.675 ±  0.945  ops/ms
   BenchmarkSumAggregation3.test                16                             50      normal  thrpt   50   84.264 ± 34.529  ops/ms
   BenchmarkSumAggregation3.test                16                             50  foldDouble  thrpt   50  158.657 ±  7.734  ops/ms
   BenchmarkSumAggregation3.test                16                              0      normal  thrpt   50  145.898 ±  0.665  ops/ms
   BenchmarkSumAggregation3.test                16                              0  foldDouble  thrpt   50  144.811 ±  0.740  ops/ms
   BenchmarkSumAggregation3.test                32                            100      normal  thrpt   50    7.463 ±  0.034  ops/ms
   BenchmarkSumAggregation3.test                32                            100  foldDouble  thrpt   50  148.919 ±  1.352  ops/ms
   BenchmarkSumAggregation3.test                32                             50      normal  thrpt   50   84.360 ± 34.162  ops/ms
   BenchmarkSumAggregation3.test                32                             50  foldDouble  thrpt   50  167.707 ± 12.256  ops/ms
   BenchmarkSumAggregation3.test                32                              0      normal  thrpt   50  146.683 ±  0.838  ops/ms
   BenchmarkSumAggregation3.test                32                              0  foldDouble  thrpt   50  144.886 ±  0.741  ops/ms
   BenchmarkSumAggregation3.test                64                            100      normal  thrpt   50    7.122 ±  0.125  ops/ms
   BenchmarkSumAggregation3.test                64                            100  foldDouble  thrpt   50  160.228 ±  0.353  ops/ms
   BenchmarkSumAggregation3.test                64                             50      normal  thrpt   50   82.165 ± 33.204  ops/ms
   BenchmarkSumAggregation3.test                64                             50  foldDouble  thrpt   50  161.608 ± 12.202  ops/ms
   BenchmarkSumAggregation3.test                64                              0      normal  thrpt   50  146.049 ±  0.583  ops/ms
   BenchmarkSumAggregation3.test                64                              0  foldDouble  thrpt   50  140.151 ±  0.255  ops/ms
   BenchmarkSumAggregation3.test               128                            100      normal  thrpt   50   10.146 ±  0.124  ops/ms
   BenchmarkSumAggregation3.test               128                            100  foldDouble  thrpt   50  175.641 ±  1.888  ops/ms
   BenchmarkSumAggregation3.test               128                             50      normal  thrpt   50   84.023 ± 32.846  ops/ms
   BenchmarkSumAggregation3.test               128                             50  foldDouble  thrpt   50  153.464 ±  7.716  ops/ms
   BenchmarkSumAggregation3.test               128                              0      normal  thrpt   50  145.157 ±  0.685  ops/ms
   BenchmarkSumAggregation3.test               128                              0  foldDouble  thrpt   50  144.998 ±  0.738  ops/ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1947365697

   For benchmark purpose, can we try the same idea on a cheaper aggregation such as `SUM`? This way we can amplify the performance impact


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948171973

   I've repeated the benchmark with:
   - Java 21
   - 1M rows per segment (instead of 10k)
   - 3 consecutive nulls every 3 not null rows (instead of 1 every 127)
   
   And the results are:
   ```
   Benchmark                                              (_aQueryTemplate)        (_impl)  (_nullHandling)   Mode  Cnt  Score   Error   Units
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal             true  thrpt   25  0.172 ± 0.005  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark         normal            false  thrpt   25  0.175 ± 0.005  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble             true  thrpt   25  0.156 ± 0.006  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldDouble            false  thrpt   25  0.158 ± 0.006  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive             true  thrpt   25  0.206 ± 0.006  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark  foldPrimitive            false  thrpt   25  0.197 ± 0.016  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder             true  thrpt   25  0.200 ± 0.031  ops/ms
   BenchmarkSumAggregation.test  select sum(valueLong, '%s') from benchmark     foldHolder            false  thrpt   25  0.218 ± 0.018  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal             true  thrpt   25  0.174 ± 0.005  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark         normal            false  thrpt   25  0.175 ± 0.005  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble             true  thrpt   25  0.151 ± 0.012  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldDouble            false  thrpt   25  0.151 ± 0.012  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive             true  thrpt   25  0.189 ± 0.020  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark  foldPrimitive            false  thrpt   25  0.190 ± 0.016  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder             true  thrpt   25  0.190 ± 0.010  ops/ms
   BenchmarkSumAggregation.test   select sum(valueInt, '%s') from benchmark     foldHolder            false  thrpt   25  0.194 ± 0.018  ops/ms
   ```
   
   So it seems some of the solutions are even faster than the original one, although the ones that go faster are a bit different than the original code, as they actually add ints or longs and then the value is added into the double, which is obviously more expensive


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948107488

   We could try to execute the same benchmark changing the null distribution and increasing the size of the segments, but TBH the difference seems very small when we compare with real costs we may have in an actual environment (including IO, larger memory usage, etc)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963704893

   BTW, there is another change in this implementation (although I guess we are not planning to merge it).
   
   Results in the current implementation are less precise than the new implementations because new implementations are not casting ints to doubles on each iteration but after each batch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963827711

   So the speedup in the new operations is explained because we are actually doing a better work in the new implementations by using primitive values. These implementations:
   - Are faster because they use the int/long value, which may be optimized by the compiler.
   - Provide correct results when the sum is not outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc).
   - Overflow when the partial sum is outside the original range (Integer.MAX_VALUE, Long.MAX_VALUE, etc). Remember that the current implementation also overflow if the value end up being higher than Double.MAX_VALUE, although that is very unlikely.
   
   There is an interesting topic here. Should we create these alternative implementations in order to provide correct results with extra speedup at the cost of possible overflow? Should we make them the default implementations for V2?
   
   But that is a discussion we can have later. The matter here is the null handling implementation. Therefore I would ignore `foldPrimitive` and `foldHolder`, which are changing semantics, and focus on `normal` vs `foldDouble`, which keeps semantics. 
   
   As we can see in the results:
   - Comparing `normal` and `foldDouble` when `nullHandling` is disabled: It doesn't seem the introduced lambda adds a performance penalty.
   - Comparing `normal` and `foldDouble` when `nullHandling` is enabled: 
   - We can also notice an increase in performance when we have at least 15 not null consecutive values
   
   TBH I don't know why we get more performance when null handling enabled with enough not null consecutive values. But assuming we use `foldNotNull` instead of the single row methods in `NullableSingleInputAggregationFunction`, I think we can assume performance won't be affected significantly.
   
   These are the values in my computer:
   ```
   Benchmark                         (_impl)  (_nullHandlingEnabled)  (_nullInterval)   Mode  Cnt    Score   Error   Units
   BenchmarkSumAggregation2.test      normal                    true                2  thrpt    5  103.210 ± 9.108  ops/ms
   BenchmarkSumAggregation2.test      normal                    true                4  thrpt    5    4.062 ± 0.026  ops/ms
   BenchmarkSumAggregation2.test      normal                    true                8  thrpt    5    4.377 ± 0.045  ops/ms
   BenchmarkSumAggregation2.test      normal                    true               16  thrpt    5    5.164 ± 0.068  ops/ms
   BenchmarkSumAggregation2.test      normal                    true               32  thrpt    5    5.935 ± 0.296  ops/ms
   BenchmarkSumAggregation2.test      normal                    true               64  thrpt    5    8.044 ± 0.335  ops/ms
   BenchmarkSumAggregation2.test      normal                    true              128  thrpt    5    9.003 ± 0.143  ops/ms
   BenchmarkSumAggregation2.test      normal                   false                2  thrpt    5  140.407 ± 0.573  ops/ms
   BenchmarkSumAggregation2.test      normal                   false                4  thrpt    5  141.040 ± 0.451  ops/ms
   BenchmarkSumAggregation2.test      normal                   false                8  thrpt    5  141.907 ± 0.992  ops/ms
   BenchmarkSumAggregation2.test      normal                   false               16  thrpt    5  143.397 ± 1.548  ops/ms
   BenchmarkSumAggregation2.test      normal                   false               32  thrpt    5  139.698 ± 1.044  ops/ms
   BenchmarkSumAggregation2.test      normal                   false               64  thrpt    5  144.736 ± 1.227  ops/ms
   BenchmarkSumAggregation2.test      normal                   false              128  thrpt    5  144.443 ± 1.678  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true                2  thrpt    5   34.237 ± 1.460  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true                4  thrpt    5   65.086 ± 3.262  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true                8  thrpt    5  120.121 ± 2.080  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true               16  thrpt    5  182.891 ± 2.179  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true               32  thrpt    5  191.954 ± 1.011  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true               64  thrpt    5  197.672 ± 2.340  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                    true              128  thrpt    5  187.932 ± 5.005  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false                2  thrpt    5  142.703 ± 3.093  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false                4  thrpt    5  139.740 ± 2.003  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false                8  thrpt    5  145.153 ± 3.140  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false               16  thrpt    5  143.858 ± 2.861  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false               32  thrpt    5  139.753 ± 1.024  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false               64  thrpt    5  142.077 ± 0.810  ops/ms
   BenchmarkSumAggregation2.test  foldDouble                   false              128  thrpt    5  142.829 ± 1.676  ops/ms
   ```
   
   For the record, these are the results when using the other two options (with foldHolder fixed):
   
   ```
   BenchmarkSumAggregation2.test     foldHolder                    true                2  thrpt    5   43.157 ±   1.498  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true                4  thrpt    5   68.783 ±   5.989  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true                8  thrpt    5  124.738 ±   3.362  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true               16  thrpt    5  216.012 ±  10.644  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true               32  thrpt    5  313.113 ±  21.122  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true               64  thrpt    5  392.115 ±   6.565  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                    true              128  thrpt    5  413.473 ±  19.703  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false                2  thrpt    5  426.726 ±   5.831  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false                4  thrpt    5  423.828 ±   8.957  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false                8  thrpt    5  415.301 ±   4.804  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false               16  thrpt    5  425.916 ±   5.353  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false               32  thrpt    5  419.126 ±   3.485  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false               64  thrpt    5  427.603 ±   3.293  ops/ms
   BenchmarkSumAggregation2.test     foldHolder                   false              128  thrpt    5  413.514 ±   3.203  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                2  thrpt    5   34.018 ±   0.853  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                4  thrpt    5   63.797 ±   0.683  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true                8  thrpt    5  117.642 ±   2.266  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               16  thrpt    5  158.315 ±   7.338  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               32  thrpt    5  188.807 ±  35.955  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true               64  thrpt    5  222.271 ± 111.049  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                    true              128  thrpt    5  221.113 ± 110.499  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                2  thrpt    5  416.386 ±   3.038  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                4  thrpt    5  423.796 ±   3.550  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false                8  thrpt    5  414.014 ±   3.438  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               16  thrpt    5  413.562 ±   3.760  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               32  thrpt    5  420.902 ±  13.165  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false               64  thrpt    5  412.881 ±  17.209  ops/ms
   BenchmarkSumAggregation2.test  foldPrimitive                   false              128  thrpt    5  426.167 ±   7.631  ops/ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Mode null benchmark [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12354:
URL: https://github.com/apache/pinot/pull/12354#issuecomment-1960600876

   The result is a little bit counter intuitive. In most cases, when null handling is enabled, it should be much slower than when it is disabled because we need to pay overhead of iterating over the null bitmap, which should be much more expensive than simply summing up all values. Is it possible that other part of the query execution is dominating the results, or some of the execution is skipped?
   Micro-benchmark is usually not accurate when too much code is tested (especially multi-threaded). Can you set up a benchmark to only test the aggregation function itself? We may construct a `BlockValSet` and pass it into the function. After the execution, we can use a blackhole to consume the value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org