Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/28 23:02:20 UTC
[GitHub] [incubator-pinot] mqliang commented on issue #6720: Benchmark setColumn() in DataTableBuilder and write all values one by one instead of using rowId and colId to position if need be

mqliang commented on issue #6720:
URL: https://github.com/apache/incubator-pinot/issues/6720#issuecomment-808974129


   I wrote a benchmark here: https://github.com/mqliang/incubator-pinot/commit/a32a61aad5dfa6b6c4a09064c75926b00495cd3a
   
   The benchmark compares three ways to build a data table:
   * `BenchmarkDataTableRowIdColIdBuildInOrder`: for each row, call `dataTableBuilder.setColumn(colId, value);` to set the value of each column, setting the columns in increasing order: 1st column, 2nd column, 3rd column, ...
   * `BenchmarkDataTableRowIdColIdBuildRandomOrder`: for each row, call `dataTableBuilder.setColumn(colId, value);` to set the value of each column, but set the columns in random order, e.g. 11th column, 20th column, 3rd column, 5th column, ...
   * `BenchmarkDataTableRowBulkBuild`: for each row, first put the values of all columns into an `Object[]` array, then build the row in bulk: write all values one after another, without calling `ByteBuffer.position()` at all.
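The three strategies above differ only in how each value reaches the row buffer. The following is a simplified, hypothetical model in Java, not Pinot's actual `DataTableBuilder`: it assumes a row of four fixed-width `int` columns, and the names `RowWriteStrategies`, `buildWithColIds` and `buildInBulk` are made up for illustration. The `setColumn`-style path seeks to each column's byte offset with `ByteBuffer.position()` before writing, while the bulk path relies on relative `putInt()` calls that advance the position by themselves. All three produce the same bytes; the benchmark only measures how fast they get there.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, simplified model of a single row: NUM_COLS fixed-width int columns.
// This is NOT Pinot's DataTableBuilder; it only illustrates the three write patterns.
class RowWriteStrategies {
  static final int NUM_COLS = 4;
  static final int COL_WIDTH = Integer.BYTES;

  // setColumn-style write: seek to the column's byte offset, then write the value.
  static void setColumn(ByteBuffer row, int colId, int value) {
    row.position(colId * COL_WIDTH);
    row.putInt(value);
  }

  // Used for both the "in order" and "random order" variants; only colOrder differs.
  static byte[] buildWithColIds(int[] values, int[] colOrder) {
    ByteBuffer row = ByteBuffer.allocate(NUM_COLS * COL_WIDTH);
    for (int colId : colOrder) {
      setColumn(row, colId, values[colId]);
    }
    return row.array();
  }

  // Bulk-style write: values are already in column order in an Object[];
  // each relative putInt() advances the position, so position() is never called.
  static byte[] buildInBulk(Object[] values) {
    ByteBuffer row = ByteBuffer.allocate(NUM_COLS * COL_WIDTH);
    for (Object value : values) {
      row.putInt((Integer) value);
    }
    return row.array();
  }

  public static void main(String[] args) {
    int[] values = {10, 20, 30, 40};
    byte[] inOrder = buildWithColIds(values, new int[] {0, 1, 2, 3});
    byte[] randomOrder = buildWithColIds(values, new int[] {2, 0, 3, 1});
    byte[] bulk = buildInBulk(new Object[] {10, 20, 30, 40});
    // All three strategies produce identical row bytes.
    System.out.println(Arrays.equals(inOrder, randomOrder) && Arrays.equals(inOrder, bulk)); // true
  }
}
```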
   
   The results for building a table of 100 rows are:
   ```
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild
   
   # Run progress: 0.00% complete, ETA 00:08:00
   # Fork: 1 of 1
   # Warmup Iteration   1: 198.463 us/op
   Iteration   1: 157.345 us/op
   Iteration   2: 157.778 us/op
   Iteration   3: 155.212 us/op
   Iteration   4: 154.870 us/op
   Iteration   5: 153.947 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild":
     155.830 ±(99.9%) 6.368 us/op [Average]
     (min, avg, max) = (153.947, 155.830, 157.778), stdev = 1.654
     CI (99.9%): [149.462, 162.198] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder
   
   # Run progress: 33.33% complete, ETA 00:05:21
   # Fork: 1 of 1
   # Warmup Iteration   1: 193.779 us/op
   Iteration   1: 150.726 us/op
   Iteration   2: 150.649 us/op
   Iteration   3: 151.587 us/op
   Iteration   4: 151.765 us/op
   Iteration   5: 151.749 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder":
     151.295 ±(99.9%) 2.155 us/op [Average]
     (min, avg, max) = (150.649, 151.295, 151.765), stdev = 0.560
     CI (99.9%): [149.140, 153.451] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=56968:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder
   
   # Run progress: 66.67% complete, ETA 00:02:40
   # Fork: 1 of 1
   # Warmup Iteration   1: 216.635 us/op
   Iteration   1: 175.108 us/op
   Iteration   2: 174.428 us/op
   Iteration   3: 178.706 us/op
   Iteration   4: 180.284 us/op
   Iteration   5: 178.219 us/op
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder":
     177.349 ±(99.9%) 9.581 us/op [Average]
     (min, avg, max) = (174.428, 177.349, 180.284), stdev = 2.488
     CI (99.9%): [167.768, 186.930] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:08:02
   
   REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
   why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
   experiments, perform baseline and negative tests that provide experimental control, make sure
   the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
   Do not assume the numbers tell you what you want them to tell.
   
   Benchmark                                                                 Mode  Cnt    Score   Error  Units
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild                avgt    5  155.830 ± 6.368  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder      avgt    5  151.295 ± 2.155  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder  avgt    5  177.349 ± 9.581  us/op
   
   Process finished with exit code 0
   ``` 
   
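   The results show no significant difference between `BenchmarkDataTableRowBulkBuild` and `BenchmarkDataTableRowIdColIdBuildInOrder`: their 99.9% confidence intervals overlap, and the in-order variant is even marginally faster on average (151.295 vs. 155.830 us/op). Only the random-order variant is clearly slower, at 177.349 us/op (about 17% slower than in order). This means that even if we use `setColumn(colId, value)` to build a data table, as long as we set column values in increasing order rather than randomly, the overhead of calling `ByteBuffer.position()` is negligible.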
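These comparisons can be reproduced from the JMH summary. The small helper below (class and method names are hypothetical) computes each variant's relative slowdown against the in-order variant and checks whether two reported 99.9% confidence intervals overlap. Interval overlap is only a rough heuristic, not a formal significance test.

```java
import java.util.Locale;

// Compares the JMH averages and 99.9% confidence intervals reported in the summary table.
// Class and method names are hypothetical helpers for this check.
class ResultComparison {
  // Closed intervals [lo1, hi1] and [lo2, hi2] overlap unless one lies entirely above the other.
  static boolean overlaps(double lo1, double hi1, double lo2, double hi2) {
    return lo1 <= hi2 && lo2 <= hi1;
  }

  // Relative difference of candidate versus baseline, in percent (positive = slower).
  static double percentSlower(double candidate, double baseline) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    double bulk = 155.830, inOrder = 151.295, random = 177.349; // us/op averages
    System.out.printf(Locale.ROOT, "bulk vs in-order: %+.1f%%%n", percentSlower(bulk, inOrder)); // +3.0%
    System.out.printf(Locale.ROOT, "random vs in-order: %+.1f%%%n", percentSlower(random, inOrder)); // +17.2%
    // 99.9% CIs: bulk [149.462, 162.198], in-order [149.140, 153.451], random [167.768, 186.930]
    System.out.println(overlaps(149.462, 162.198, 149.140, 153.451)); // true
    System.out.println(overlaps(167.768, 186.930, 149.140, 153.451)); // false
  }
}
```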
   
   Currently, our code base sets column values in increasing order, so there is no need to address the TODO for performance reasons. From a code-cleanliness point of view, however, we can provide a `setColumnValuesInBulk()` method and change all current data table building code to use it. This makes the code more self-explanatory: it signals that setting column values in bulk (in increasing order) is preferable to setting them in random order. However, all `setColumn(colId, value)` methods should be kept, since there may be circumstances where we need to set a value for an arbitrary row/column.
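A minimal sketch of what the proposed `setColumnValuesInBulk()` could look like, assuming a row made of fixed-width `INT`, `LONG` and `DOUBLE` columns. `SketchRowBuilder`, its `ColumnType` enum and the `buffer()` accessor are hypothetical, and only the `int` overload of `setColumn(colId, value)` is shown. The bulk method writes the values sequentially in column order, while `setColumn` keeps the position-based path for writing an arbitrary column.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed bulk API; NOT Pinot's actual DataTableBuilder.
class SketchRowBuilder {
  enum ColumnType { INT, LONG, DOUBLE }

  private final ColumnType[] columnTypes;
  private final int[] columnOffsets; // byte offset of each column within the row
  private final ByteBuffer row;

  SketchRowBuilder(ColumnType[] columnTypes) {
    this.columnTypes = columnTypes;
    this.columnOffsets = new int[columnTypes.length];
    int offset = 0;
    for (int i = 0; i < columnTypes.length; i++) {
      columnOffsets[i] = offset;
      // INT takes 4 bytes; LONG and DOUBLE take 8 bytes each.
      offset += columnTypes[i] == ColumnType.INT ? Integer.BYTES : Long.BYTES;
    }
    this.row = ByteBuffer.allocate(offset);
  }

  // Kept for arbitrary access: seek to the column's offset, then write (int overload only).
  void setColumn(int colId, int value) {
    if (columnTypes[colId] != ColumnType.INT) {
      throw new IllegalArgumentException("column " + colId + " is not INT");
    }
    row.position(columnOffsets[colId]);
    row.putInt(value);
  }

  // Proposed bulk write: values must be given in column order and are written sequentially,
  // so no per-column position() call is needed after the initial rewind.
  void setColumnValuesInBulk(Object[] values) {
    if (values.length != columnTypes.length) {
      throw new IllegalArgumentException("expected " + columnTypes.length + " values");
    }
    row.position(0);
    for (int colId = 0; colId < values.length; colId++) {
      switch (columnTypes[colId]) {
        case INT:
          row.putInt((Integer) values[colId]);
          break;
        case LONG:
          row.putLong((Long) values[colId]);
          break;
        case DOUBLE:
          row.putDouble((Double) values[colId]);
          break;
        default:
          throw new IllegalStateException("unknown type " + columnTypes[colId]);
      }
    }
  }

  // Duplicate view (shared content, own position) for reading with absolute getters.
  ByteBuffer buffer() {
    return row.duplicate();
  }

  public static void main(String[] args) {
    SketchRowBuilder builder = new SketchRowBuilder(
        new ColumnType[] {ColumnType.INT, ColumnType.LONG, ColumnType.DOUBLE});
    builder.setColumnValuesInBulk(new Object[] {7, 8L, 1.5});
    builder.setColumn(0, 42); // an arbitrary-column write is still possible
    ByteBuffer buf = builder.buffer();
    System.out.println(buf.getInt(0) + " " + buf.getLong(4) + " " + buf.getDouble(12)); // 42 8 1.5
  }
}
```

Keeping both entry points matches the proposal: callers that already hold a full row in column order use the bulk method, and `setColumn` remains available for the occasional out-of-order write.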
   
   cc @Jackie-Jiang 

