Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/23 06:54:08 UTC

[GitHub] [incubator-pinot] mqliang opened a new issue #6714: Benchmark data table serialization logic and pre-allocate byte[] array if need be

mqliang opened a new issue #6714:
URL: https://github.com/apache/incubator-pinot/issues/6714


   As @siddharthteotia pointed out in https://github.com/apache/incubator-pinot/pull/6710#discussion_r599240463_
   
   > serialization functions first writes to a temporary output stream and then converts to byte array which is returned to the caller and written to the main stream. I think the reason for doing that is upfront we don't know the length of byte[] array to allocate.
   
   > However,  we can probably do different and it might be faster
   >* Write a loop to go over each entry and keep a running sum of size
   >* At the end of loop, allocate byte array of that size
   >* Start another loop and go over each entry again and fill out the pre-allocated byte array.
   >* Return the filled byte array
   
   We need to benchmark these two serialization approaches. If the proposed approach turns out to be faster, I will send a PR to adopt it.
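
   The two-pass idea quoted above can be sketched roughly as follows. This is a minimal illustration, not Pinot's actual serializer: the class name `TwoPassSerializer` and the wire layout (an entry count followed by length-prefixed UTF-8 keys and values) are assumptions made for the example.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.util.Map;

   // Hypothetical helper for illustration only; not part of Pinot.
   final class TwoPassSerializer {
     private TwoPassSerializer() {}

     // Assumed layout: [int entryCount], then for each entry
     // [int keyLength][key bytes][int valueLength][value bytes], all UTF-8.
     static byte[] serialize(Map<String, String> metadata) {
       // Pass 1: keep a running sum of the encoded size.
       int size = Integer.BYTES;
       for (Map.Entry<String, String> entry : metadata.entrySet()) {
         size += Integer.BYTES + entry.getKey().getBytes(StandardCharsets.UTF_8).length;
         size += Integer.BYTES + entry.getValue().getBytes(StandardCharsets.UTF_8).length;
       }

       // Allocate the output once, at exactly the computed size.
       ByteBuffer buffer = ByteBuffer.allocate(size);
       buffer.putInt(metadata.size());

       // Pass 2: encode each entry again and fill the pre-allocated array.
       for (Map.Entry<String, String> entry : metadata.entrySet()) {
         byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
         byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
         buffer.putInt(key.length).put(key);
         buffer.putInt(value.length).put(value);
       }
       return buffer.array();
     }
   }
   ```

   Note that in this form every key and value is encoded twice, once to measure it and once to write it, so the saving from the single allocation has to outweigh the repeated encoding.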


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on issue #6714: Benchmark data table serialization logic and pre-allocate byte[] array if need be

Posted by GitBox <gi...@apache.org>.
mqliang commented on issue #6714:
URL: https://github.com/apache/incubator-pinot/issues/6714#issuecomment-806403959


   I wrote a benchmark here: https://github.com/mqliang/incubator-pinot/commit/7892423579b20dafcb5802a09f20f826377f6c39
   
   The benchmark compares three methods of serializing a typical metadata map:
   * `temporaryOutputStream`: for each K/V pair in the metadata, first write to a temporary output stream, then convert it to a byte array that is returned to the caller and written to the main stream.
   * `preAllocateByteArrayNative`:
       * Loop over each entry and keep a running sum of the encoded size.
       * At the end of the loop, allocate a byte array of that size.
       * Loop over each entry again and fill the pre-allocated byte array.
       * Return the filled byte array.
       * Note that keys and values are encoded twice, once in each loop.
   * `preAllocateByteArrayWithBytesCache`: same logic as `preAllocateByteArrayNative`, plus a cache that holds the encoded K/V from the first loop so that the second loop can reuse them.
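
   To make the difference between the first and third methods concrete, here is a rough sketch of both. It is not the code from the linked commit: the class name `MetadataSerializationSketch`, the method bodies, and the wire layout (an entry count followed by length-prefixed UTF-8 keys and values) are assumptions for illustration.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;

   // Hypothetical sketch for illustration only; not the benchmark's code.
   final class MetadataSerializationSketch {
     private MetadataSerializationSketch() {}

     // Baseline: write into a growable temporary stream, then copy it out.
     static byte[] temporaryOutputStream(Map<String, String> metadata) {
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       try (DataOutputStream out = new DataOutputStream(bytes)) {
         out.writeInt(metadata.size());
         for (Map.Entry<String, String> entry : metadata.entrySet()) {
           byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
           byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
           out.writeInt(key.length);
           out.write(key);
           out.writeInt(value.length);
           out.write(value);
         }
       } catch (IOException ex) {
         throw new UncheckedIOException(ex);
       }
       return bytes.toByteArray();
     }

     // Pre-allocation with a cache: encode each K/V once, remember the bytes,
     // and reuse them when filling the exactly sized output array.
     static byte[] preAllocateWithBytesCache(Map<String, String> metadata) {
       List<byte[]> cache = new ArrayList<>(metadata.size() * 2);
       int size = Integer.BYTES;
       for (Map.Entry<String, String> entry : metadata.entrySet()) {
         byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
         byte[] value = entry.getValue().getBytes(StandardCharsets.UTF_8);
         cache.add(key);
         cache.add(value);
         size += 2 * Integer.BYTES + key.length + value.length;
       }
       ByteBuffer buffer = ByteBuffer.allocate(size);
       buffer.putInt(metadata.size());
       for (byte[] encoded : cache) {
         buffer.putInt(encoded.length).put(encoded);
       }
       return buffer.array();
     }
   }
   ```

   Both methods should produce identical bytes, since `DataOutputStream` and a default `ByteBuffer` both write integers in big-endian order. The cached variant trades a short-lived list of byte arrays for avoiding both the second encoding and the stream's internal buffer growth.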
   
   Here is the result:
   ```
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative
   
   # Run progress: 0.00% complete, ETA 00:08:00
   # Fork: 1 of 1
   # Warmup Iteration   1: 552.178 us/op
   Iteration   1: 519.531 us/op
                    ·gc.alloc.rate:                   3270.480 MB/sec
                    ·gc.alloc.rate.norm:              1811608.009 B/op
                    ·gc.churn.PS_Eden_Space:          3275.114 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1814175.318 B/op
                    ·gc.churn.PS_Survivor_Space:      0.558 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 309.168 B/op
                    ·gc.count:                        525.000 counts
                    ·gc.time:                         261.000 ms
   
   Iteration   2: 524.659 us/op
                    ·gc.alloc.rate:                   3238.871 MB/sec
                    ·gc.alloc.rate.norm:              1811608.011 B/op
                    ·gc.churn.PS_Eden_Space:          3242.901 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1813862.347 B/op
                    ·gc.churn.PS_Survivor_Space:      0.563 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 314.968 B/op
                    ·gc.count:                        516.000 counts
                    ·gc.time:                         263.000 ms
   
   Iteration   3: 526.323 us/op
                    ·gc.alloc.rate:                   3228.230 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3232.024 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1813736.682 B/op
                    ·gc.churn.PS_Survivor_Space:      0.471 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 264.539 B/op
                    ·gc.count:                        470.000 counts
                    ·gc.time:                         254.000 ms
   
   Iteration   4: 521.779 us/op
                    ·gc.alloc.rate:                   3256.320 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3261.433 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1814452.617 B/op
                    ·gc.churn.PS_Survivor_Space:      0.560 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 311.772 B/op
                    ·gc.count:                        534.000 counts
                    ·gc.time:                         270.000 ms
   
   Iteration   5: 524.474 us/op
                    ·gc.alloc.rate:                   3239.855 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3242.045 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1812832.659 B/op
                    ·gc.churn.PS_Survivor_Space:      0.547 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 305.975 B/op
                    ·gc.count:                        483.000 counts
                    ·gc.time:                         255.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative":
     523.353 ±(99.9%) 10.345 us/op [Average]
     (min, avg, max) = (519.531, 523.353, 526.323), stdev = 2.687
     CI (99.9%): [513.008, 533.698] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate":
     3246.751 ±(99.9%) 64.066 MB/sec [Average]
     (min, avg, max) = (3228.230, 3246.751, 3270.480), stdev = 16.638
     CI (99.9%): [3182.685, 3310.818] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm":
     1811608.009 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1811608.008, 1811608.009, 1811608.011), stdev = 0.001
     CI (99.9%): [1811608.003, 1811608.014] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space":
     3250.704 ±(99.9%) 66.578 MB/sec [Average]
     (min, avg, max) = (3232.024, 3250.704, 3275.114), stdev = 17.290
     CI (99.9%): [3184.126, 3317.282] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm":
     1813811.924 ±(99.9%) 2365.646 B/op [Average]
     (min, avg, max) = (1812832.659, 1813811.924, 1814452.617), stdev = 614.351
     CI (99.9%): [1811446.279, 1816177.570] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space":
     0.540 ±(99.9%) 0.150 MB/sec [Average]
     (min, avg, max) = (0.471, 0.540, 0.563), stdev = 0.039
     CI (99.9%): [0.390, 0.690] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm":
     301.285 ±(99.9%) 80.118 B/op [Average]
     (min, avg, max) = (264.539, 301.285, 314.968), stdev = 20.806
     CI (99.9%): [221.166, 381.403] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count":
     2528.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (470.000, 505.600, 534.000), stdev = 27.700
     CI (99.9%): [2528.000, 2528.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time":
     1303.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (254.000, 260.600, 270.000), stdev = 6.504
     CI (99.9%): [1303.000, 1303.000] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache
   
   # Run progress: 33.33% complete, ETA 00:05:28
   # Fork: 1 of 1
   # Warmup Iteration   1: 390.616 us/op
   Iteration   1: 375.676 us/op
                    ·gc.alloc.rate:                   3524.091 MB/sec
                    ·gc.alloc.rate.norm:              1411608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3532.601 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1415016.587 B/op
                    ·gc.churn.PS_Survivor_Space:      0.538 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 215.400 B/op
                    ·gc.count:                        458.000 counts
                    ·gc.time:                         248.000 ms
   
   Iteration   2: 375.171 us/op
                    ·gc.alloc.rate:                   3528.907 MB/sec
                    ·gc.alloc.rate.norm:              1411608.006 B/op
                    ·gc.churn.PS_Eden_Space:          3534.356 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1413787.624 B/op
                    ·gc.churn.PS_Survivor_Space:      0.494 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 197.609 B/op
                    ·gc.count:                        435.000 counts
                    ·gc.time:                         247.000 ms
   
   Iteration   3: 373.233 us/op
                    ·gc.alloc.rate:                   3547.720 MB/sec
                    ·gc.alloc.rate.norm:              1411608.005 B/op
                    ·gc.churn.PS_Eden_Space:          3559.728 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1416385.929 B/op
                    ·gc.churn.PS_Survivor_Space:      0.539 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 214.343 B/op
                    ·gc.count:                        462.000 counts
                    ·gc.time:                         247.000 ms
   
   Iteration   4: 371.186 us/op
                    ·gc.alloc.rate:                   3567.068 MB/sec
                    ·gc.alloc.rate.norm:              1411608.006 B/op
                    ·gc.churn.PS_Eden_Space:          3566.702 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1411463.405 B/op
                    ·gc.churn.PS_Survivor_Space:      0.597 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 236.411 B/op
                    ·gc.count:                        520.000 counts
                    ·gc.time:                         271.000 ms
   
   Iteration   5: 370.738 us/op
                    ·gc.alloc.rate:                   3571.354 MB/sec
                    ·gc.alloc.rate.norm:              1411608.005 B/op
                    ·gc.churn.PS_Eden_Space:          3582.874 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1416161.234 B/op
                    ·gc.churn.PS_Survivor_Space:      0.588 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 232.322 B/op
                    ·gc.count:                        509.000 counts
                    ·gc.time:                         262.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache":
     373.201 ±(99.9%) 8.639 us/op [Average]
     (min, avg, max) = (370.738, 373.201, 375.676), stdev = 2.243
     CI (99.9%): [364.562, 381.840] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate":
     3547.828 ±(99.9%) 82.702 MB/sec [Average]
     (min, avg, max) = (3524.091, 3547.828, 3571.354), stdev = 21.477
     CI (99.9%): [3465.126, 3630.530] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm":
     1411608.006 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1411608.005, 1411608.006, 1411608.008), stdev = 0.001
     CI (99.9%): [1411608.001, 1411608.011] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space":
     3555.252 ±(99.9%) 83.120 MB/sec [Average]
     (min, avg, max) = (3532.601, 3555.252, 3582.874), stdev = 21.586
     CI (99.9%): [3472.132, 3638.373] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm":
     1414562.956 ±(99.9%) 7771.212 B/op [Average]
     (min, avg, max) = (1411463.405, 1414562.956, 1416385.929), stdev = 2018.159
     CI (99.9%): [1406791.744, 1422334.168] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space":
     0.551 ±(99.9%) 0.162 MB/sec [Average]
     (min, avg, max) = (0.494, 0.551, 0.597), stdev = 0.042
     CI (99.9%): [0.389, 0.713] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm":
     219.217 ±(99.9%) 60.046 B/op [Average]
     (min, avg, max) = (197.609, 219.217, 236.411), stdev = 15.594
     CI (99.9%): [159.171, 279.263] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count":
     2384.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (435.000, 476.800, 520.000), stdev = 36.134
     CI (99.9%): [2384.000, 2384.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time":
     1275.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (247.000, 255.000, 271.000), stdev = 10.977
     CI (99.9%): [1275.000, 1275.000] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream
   
   # Run progress: 66.67% complete, ETA 00:02:44
   # Fork: 1 of 1
   # Warmup Iteration   1: 483.366 us/op
   Iteration   1: 408.351 us/op
                    ·gc.alloc.rate:                   3580.758 MB/sec
                    ·gc.alloc.rate.norm:              1558808.007 B/op
                    ·gc.churn.PS_Eden_Space:          3586.078 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561123.846 B/op
                    ·gc.churn.PS_Survivor_Space:      0.511 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 222.603 B/op
                    ·gc.count:                        476.000 counts
                    ·gc.time:                         253.000 ms
   
   Iteration   2: 410.342 us/op
                    ·gc.alloc.rate:                   3563.256 MB/sec
                    ·gc.alloc.rate.norm:              1558808.009 B/op
                    ·gc.churn.PS_Eden_Space:          3569.765 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561655.686 B/op
                    ·gc.churn.PS_Survivor_Space:      0.451 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 197.394 B/op
                    ·gc.count:                        409.000 counts
                    ·gc.time:                         244.000 ms
   
   Iteration   3: 407.314 us/op
                    ·gc.alloc.rate:                   3589.291 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3592.335 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1560130.076 B/op
                    ·gc.churn.PS_Survivor_Space:      0.557 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 241.833 B/op
                    ·gc.count:                        495.000 counts
                    ·gc.time:                         261.000 ms
   
   Iteration   4: 407.294 us/op
                    ·gc.alloc.rate:                   3590.035 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3595.643 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561243.143 B/op
                    ·gc.churn.PS_Survivor_Space:      0.439 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 190.513 B/op
                    ·gc.count:                        382.000 counts
                    ·gc.time:                         239.000 ms
   
   Iteration   5: 410.068 us/op
                    ·gc.alloc.rate:                   3565.783 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3576.571 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1563524.046 B/op
                    ·gc.churn.PS_Survivor_Space:      0.542 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 236.741 B/op
                    ·gc.count:                        460.000 counts
                    ·gc.time:                         252.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream":
     408.674 ±(99.9%) 5.641 us/op [Average]
     (min, avg, max) = (407.294, 408.674, 410.342), stdev = 1.465
     CI (99.9%): [403.033, 414.314] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate":
     3577.824 ±(99.9%) 48.952 MB/sec [Average]
     (min, avg, max) = (3563.256, 3577.824, 3590.035), stdev = 12.713
     CI (99.9%): [3528.873, 3626.776] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm":
     1558808.007 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1558808.006, 1558808.007, 1558808.009), stdev = 0.001
     CI (99.9%): [1558808.002, 1558808.011] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space":
     3584.078 ±(99.9%) 41.614 MB/sec [Average]
     (min, avg, max) = (3569.765, 3584.078, 3595.643), stdev = 10.807
     CI (99.9%): [3542.465, 3625.692] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm":
     1561535.360 ±(99.9%) 4793.590 B/op [Average]
     (min, avg, max) = (1560130.076, 1561535.360, 1563524.046), stdev = 1244.880
     CI (99.9%): [1556741.769, 1566328.950] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space":
     0.500 ±(99.9%) 0.204 MB/sec [Average]
     (min, avg, max) = (0.439, 0.500, 0.557), stdev = 0.053
     CI (99.9%): [0.296, 0.704] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm":
     217.817 ±(99.9%) 88.656 B/op [Average]
     (min, avg, max) = (190.513, 217.817, 241.833), stdev = 23.024
     CI (99.9%): [129.161, 306.473] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count":
     2222.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (382.000, 444.400, 495.000), stdev = 47.300
     CI (99.9%): [2222.000, 2222.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time":
     1249.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (239.000, 249.800, 261.000), stdev = 8.526
     CI (99.9%): [1249.000, 1249.000] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:08:12
   
   REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
   why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
   experiments, perform baseline and negative tests that provide experimental control, make sure
   the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
   Do not assume the numbers tell you what you want them to tell.
   
   Benchmark                                                                                            Mode  Cnt        Score      Error   Units
   BenchmarkDataTableSerialization.preAllocateByteArrayNative                                           avgt    5      523.353 ±   10.345   us/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate                            avgt    5     3246.751 ±   64.066  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm                       avgt    5  1811608.009 ±    0.005    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space                   avgt    5     3250.704 ±   66.578  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm              avgt    5  1813811.924 ± 2365.646    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space               avgt    5        0.540 ±    0.150  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm          avgt    5      301.285 ±   80.118    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count                                 avgt    5     2528.000             counts
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time                                  avgt    5     1303.000                 ms
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache                                   avgt    5      373.201 ±    8.639   us/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate                    avgt    5     3547.828 ±   82.702  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm               avgt    5  1411608.006 ±    0.005    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space           avgt    5     3555.252 ±   83.120  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm      avgt    5  1414562.956 ± 7771.212    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space       avgt    5        0.551 ±    0.162  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm  avgt    5      219.217 ±   60.046    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count                         avgt    5     2384.000             counts
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time                          avgt    5     1275.000                 ms
   BenchmarkDataTableSerialization.temporaryOutputStream                                                avgt    5      408.674 ±    5.641   us/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate                                 avgt    5     3577.824 ±   48.952  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm                            avgt    5  1558808.007 ±    0.005    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space                        avgt    5     3584.078 ±   41.614  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm                   avgt    5  1561535.360 ± 4793.590    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space                    avgt    5        0.500 ±    0.204  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm               avgt    5      217.817 ±   88.656    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count                                      avgt    5     2222.000             counts
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time                                       avgt    5     1249.000                 ms
   
   Process finished with exit code 0
   
   ```
   
   If my implementation is correct, the benchmark shows that pre-allocating the byte array with a cache is slightly better than the temporary output stream: about 9% faster (373.201 us/op vs. 408.674 us/op). It needs extra memory to cache the encoded K/V, although the measured allocation per operation is actually lower (1411608 B/op vs. 1558808 B/op) and GC time does not increase noticeably (1275 ms vs. 1249 ms). It is easy to see why `preAllocateByteArrayNative` is the slowest: it encodes each K/V twice, whereas the other two methods encode each K/V only once.
   
   I am not sure whether we should make the change for an improvement of under 10%.
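
   For reference, the relative differences can be recomputed from the scores in the JMH summary table. The snippet below is only a small arithmetic check; the class and method names are made up for this example, and the three constants are copied from the `us/op` rows above.

   ```java
   // Hypothetical helper for illustration only.
   final class SpeedupArithmetic {
     private SpeedupArithmetic() {}

     // Percentage by which the candidate's time per op is lower than the
     // baseline's. A negative result means the candidate is slower.
     static double percentFaster(double baselineUsPerOp, double candidateUsPerOp) {
       return (baselineUsPerOp - candidateUsPerOp) / baselineUsPerOp * 100.0;
     }

     public static void main(String[] args) {
       double temporaryOutputStream = 408.674;  // us/op, from the summary table
       double withBytesCache = 373.201;         // us/op, from the summary table
       double nativeTwoPass = 523.353;          // us/op, from the summary table

       System.out.printf("with cache vs. temporary stream: %.1f%%%n",
           percentFaster(temporaryOutputStream, withBytesCache));
       System.out.printf("native two-pass vs. temporary stream: %.1f%%%n",
           percentFaster(temporaryOutputStream, nativeTwoPass));
     }
   }
   ```

   With these inputs the cached variant comes out about 8.7% faster than the temporary stream, and the uncached two-pass variant about 28% slower.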
   




[GitHub] [incubator-pinot] mqliang edited a comment on issue #6714: Benchmark data table serialization logic and pre-allocate byte[] array if need be

Posted by GitBox <gi...@apache.org>.
mqliang edited a comment on issue #6714:
URL: https://github.com/apache/incubator-pinot/issues/6714#issuecomment-806403959


   I wrote a benchmark here: https://github.com/mqliang/incubator-pinot/commit/7892423579b20dafcb5802a09f20f826377f6c39
   
   The benchmark compares three methods of serializing a typical metadata map:
   * `temporaryOutputStream`: for each K/V pair in the metadata, first write to a temporary output stream, then convert it to a byte array that is returned to the caller and written to the main stream.
   * `preAllocateByteArrayNative`:
       * Loop over each entry and keep a running sum of the encoded size.
       * At the end of the loop, allocate a byte array of that size.
       * Loop over each entry again and fill the pre-allocated byte array.
       * Return the filled byte array.
       * Note that keys and values are encoded twice, once in each loop.
   * `preAllocateByteArrayWithBytesCache`: same logic as `preAllocateByteArrayNative`, plus a cache that holds the encoded K/V from the first loop so that the second loop can reuse them.
   
   Here is the result:
   ```
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative
   
   # Run progress: 0.00% complete, ETA 00:08:00
   # Fork: 1 of 1
   # Warmup Iteration   1: 552.178 us/op
   Iteration   1: 519.531 us/op
                    ·gc.alloc.rate:                   3270.480 MB/sec
                    ·gc.alloc.rate.norm:              1811608.009 B/op
                    ·gc.churn.PS_Eden_Space:          3275.114 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1814175.318 B/op
                    ·gc.churn.PS_Survivor_Space:      0.558 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 309.168 B/op
                    ·gc.count:                        525.000 counts
                    ·gc.time:                         261.000 ms
   
   Iteration   2: 524.659 us/op
                    ·gc.alloc.rate:                   3238.871 MB/sec
                    ·gc.alloc.rate.norm:              1811608.011 B/op
                    ·gc.churn.PS_Eden_Space:          3242.901 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1813862.347 B/op
                    ·gc.churn.PS_Survivor_Space:      0.563 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 314.968 B/op
                    ·gc.count:                        516.000 counts
                    ·gc.time:                         263.000 ms
   
   Iteration   3: 526.323 us/op
                    ·gc.alloc.rate:                   3228.230 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3232.024 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1813736.682 B/op
                    ·gc.churn.PS_Survivor_Space:      0.471 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 264.539 B/op
                    ·gc.count:                        470.000 counts
                    ·gc.time:                         254.000 ms
   
   Iteration   4: 521.779 us/op
                    ·gc.alloc.rate:                   3256.320 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3261.433 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1814452.617 B/op
                    ·gc.churn.PS_Survivor_Space:      0.560 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 311.772 B/op
                    ·gc.count:                        534.000 counts
                    ·gc.time:                         270.000 ms
   
   Iteration   5: 524.474 us/op
                    ·gc.alloc.rate:                   3239.855 MB/sec
                    ·gc.alloc.rate.norm:              1811608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3242.045 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1812832.659 B/op
                    ·gc.churn.PS_Survivor_Space:      0.547 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 305.975 B/op
                    ·gc.count:                        483.000 counts
                    ·gc.time:                         255.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative":
     523.353 ±(99.9%) 10.345 us/op [Average]
     (min, avg, max) = (519.531, 523.353, 526.323), stdev = 2.687
     CI (99.9%): [513.008, 533.698] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate":
     3246.751 ±(99.9%) 64.066 MB/sec [Average]
     (min, avg, max) = (3228.230, 3246.751, 3270.480), stdev = 16.638
     CI (99.9%): [3182.685, 3310.818] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm":
     1811608.009 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1811608.008, 1811608.009, 1811608.011), stdev = 0.001
     CI (99.9%): [1811608.003, 1811608.014] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space":
     3250.704 ±(99.9%) 66.578 MB/sec [Average]
     (min, avg, max) = (3232.024, 3250.704, 3275.114), stdev = 17.290
     CI (99.9%): [3184.126, 3317.282] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm":
     1813811.924 ±(99.9%) 2365.646 B/op [Average]
     (min, avg, max) = (1812832.659, 1813811.924, 1814452.617), stdev = 614.351
     CI (99.9%): [1811446.279, 1816177.570] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space":
     0.540 ±(99.9%) 0.150 MB/sec [Average]
     (min, avg, max) = (0.471, 0.540, 0.563), stdev = 0.039
     CI (99.9%): [0.390, 0.690] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm":
     301.285 ±(99.9%) 80.118 B/op [Average]
     (min, avg, max) = (264.539, 301.285, 314.968), stdev = 20.806
     CI (99.9%): [221.166, 381.403] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count":
     2528.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (470.000, 505.600, 534.000), stdev = 27.700
     CI (99.9%): [2528.000, 2528.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time":
     1303.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (254.000, 260.600, 270.000), stdev = 6.504
     CI (99.9%): [1303.000, 1303.000] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache
   
   # Run progress: 33.33% complete, ETA 00:05:28
   # Fork: 1 of 1
   # Warmup Iteration   1: 390.616 us/op
   Iteration   1: 375.676 us/op
                    ·gc.alloc.rate:                   3524.091 MB/sec
                    ·gc.alloc.rate.norm:              1411608.008 B/op
                    ·gc.churn.PS_Eden_Space:          3532.601 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1415016.587 B/op
                    ·gc.churn.PS_Survivor_Space:      0.538 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 215.400 B/op
                    ·gc.count:                        458.000 counts
                    ·gc.time:                         248.000 ms
   
   Iteration   2: 375.171 us/op
                    ·gc.alloc.rate:                   3528.907 MB/sec
                    ·gc.alloc.rate.norm:              1411608.006 B/op
                    ·gc.churn.PS_Eden_Space:          3534.356 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1413787.624 B/op
                    ·gc.churn.PS_Survivor_Space:      0.494 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 197.609 B/op
                    ·gc.count:                        435.000 counts
                    ·gc.time:                         247.000 ms
   
   Iteration   3: 373.233 us/op
                    ·gc.alloc.rate:                   3547.720 MB/sec
                    ·gc.alloc.rate.norm:              1411608.005 B/op
                    ·gc.churn.PS_Eden_Space:          3559.728 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1416385.929 B/op
                    ·gc.churn.PS_Survivor_Space:      0.539 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 214.343 B/op
                    ·gc.count:                        462.000 counts
                    ·gc.time:                         247.000 ms
   
   Iteration   4: 371.186 us/op
                    ·gc.alloc.rate:                   3567.068 MB/sec
                    ·gc.alloc.rate.norm:              1411608.006 B/op
                    ·gc.churn.PS_Eden_Space:          3566.702 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1411463.405 B/op
                    ·gc.churn.PS_Survivor_Space:      0.597 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 236.411 B/op
                    ·gc.count:                        520.000 counts
                    ·gc.time:                         271.000 ms
   
   Iteration   5: 370.738 us/op
                    ·gc.alloc.rate:                   3571.354 MB/sec
                    ·gc.alloc.rate.norm:              1411608.005 B/op
                    ·gc.churn.PS_Eden_Space:          3582.874 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1416161.234 B/op
                    ·gc.churn.PS_Survivor_Space:      0.588 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 232.322 B/op
                    ·gc.count:                        509.000 counts
                    ·gc.time:                         262.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache":
     373.201 ±(99.9%) 8.639 us/op [Average]
     (min, avg, max) = (370.738, 373.201, 375.676), stdev = 2.243
     CI (99.9%): [364.562, 381.840] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate":
     3547.828 ±(99.9%) 82.702 MB/sec [Average]
     (min, avg, max) = (3524.091, 3547.828, 3571.354), stdev = 21.477
     CI (99.9%): [3465.126, 3630.530] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm":
     1411608.006 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1411608.005, 1411608.006, 1411608.008), stdev = 0.001
     CI (99.9%): [1411608.001, 1411608.011] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space":
     3555.252 ±(99.9%) 83.120 MB/sec [Average]
     (min, avg, max) = (3532.601, 3555.252, 3582.874), stdev = 21.586
     CI (99.9%): [3472.132, 3638.373] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm":
     1414562.956 ±(99.9%) 7771.212 B/op [Average]
     (min, avg, max) = (1411463.405, 1414562.956, 1416385.929), stdev = 2018.159
     CI (99.9%): [1406791.744, 1422334.168] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space":
     0.551 ±(99.9%) 0.162 MB/sec [Average]
     (min, avg, max) = (0.494, 0.551, 0.597), stdev = 0.042
     CI (99.9%): [0.389, 0.713] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm":
     219.217 ±(99.9%) 60.046 B/op [Average]
     (min, avg, max) = (197.609, 219.217, 236.411), stdev = 15.594
     CI (99.9%): [159.171, 279.263] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count":
     2384.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (435.000, 476.800, 520.000), stdev = 36.134
     CI (99.9%): [2384.000, 2384.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time":
     1275.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (247.000, 255.000, 271.000), stdev = 10.977
     CI (99.9%): [1275.000, 1275.000] (assumes normal distribution)
   
   
   # JMH version: 1.26
   # VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
   # VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
   # Warmup: 1 iterations, 10 s each
   # Measurement: 5 iterations, 30 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream
   
   # Run progress: 66.67% complete, ETA 00:02:44
   # Fork: 1 of 1
   # Warmup Iteration   1: 483.366 us/op
   Iteration   1: 408.351 us/op
                    ·gc.alloc.rate:                   3580.758 MB/sec
                    ·gc.alloc.rate.norm:              1558808.007 B/op
                    ·gc.churn.PS_Eden_Space:          3586.078 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561123.846 B/op
                    ·gc.churn.PS_Survivor_Space:      0.511 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 222.603 B/op
                    ·gc.count:                        476.000 counts
                    ·gc.time:                         253.000 ms
   
   Iteration   2: 410.342 us/op
                    ·gc.alloc.rate:                   3563.256 MB/sec
                    ·gc.alloc.rate.norm:              1558808.009 B/op
                    ·gc.churn.PS_Eden_Space:          3569.765 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561655.686 B/op
                    ·gc.churn.PS_Survivor_Space:      0.451 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 197.394 B/op
                    ·gc.count:                        409.000 counts
                    ·gc.time:                         244.000 ms
   
   Iteration   3: 407.314 us/op
                    ·gc.alloc.rate:                   3589.291 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3592.335 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1560130.076 B/op
                    ·gc.churn.PS_Survivor_Space:      0.557 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 241.833 B/op
                    ·gc.count:                        495.000 counts
                    ·gc.time:                         261.000 ms
   
   Iteration   4: 407.294 us/op
                    ·gc.alloc.rate:                   3590.035 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3595.643 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1561243.143 B/op
                    ·gc.churn.PS_Survivor_Space:      0.439 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 190.513 B/op
                    ·gc.count:                        382.000 counts
                    ·gc.time:                         239.000 ms
   
   Iteration   5: 410.068 us/op
                    ·gc.alloc.rate:                   3565.783 MB/sec
                    ·gc.alloc.rate.norm:              1558808.006 B/op
                    ·gc.churn.PS_Eden_Space:          3576.571 MB/sec
                    ·gc.churn.PS_Eden_Space.norm:     1563524.046 B/op
                    ·gc.churn.PS_Survivor_Space:      0.542 MB/sec
                    ·gc.churn.PS_Survivor_Space.norm: 236.741 B/op
                    ·gc.count:                        460.000 counts
                    ·gc.time:                         252.000 ms
   
   
   
   Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream":
     408.674 ±(99.9%) 5.641 us/op [Average]
     (min, avg, max) = (407.294, 408.674, 410.342), stdev = 1.465
     CI (99.9%): [403.033, 414.314] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate":
     3577.824 ±(99.9%) 48.952 MB/sec [Average]
     (min, avg, max) = (3563.256, 3577.824, 3590.035), stdev = 12.713
     CI (99.9%): [3528.873, 3626.776] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm":
     1558808.007 ±(99.9%) 0.005 B/op [Average]
     (min, avg, max) = (1558808.006, 1558808.007, 1558808.009), stdev = 0.001
     CI (99.9%): [1558808.002, 1558808.011] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space":
     3584.078 ±(99.9%) 41.614 MB/sec [Average]
     (min, avg, max) = (3569.765, 3584.078, 3595.643), stdev = 10.807
     CI (99.9%): [3542.465, 3625.692] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm":
     1561535.360 ±(99.9%) 4793.590 B/op [Average]
     (min, avg, max) = (1560130.076, 1561535.360, 1563524.046), stdev = 1244.880
     CI (99.9%): [1556741.769, 1566328.950] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space":
     0.500 ±(99.9%) 0.204 MB/sec [Average]
     (min, avg, max) = (0.439, 0.500, 0.557), stdev = 0.053
     CI (99.9%): [0.296, 0.704] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm":
     217.817 ±(99.9%) 88.656 B/op [Average]
     (min, avg, max) = (190.513, 217.817, 241.833), stdev = 23.024
     CI (99.9%): [129.161, 306.473] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count":
     2222.000 ±(99.9%) 0.001 counts [Sum]
     (min, avg, max) = (382.000, 444.400, 495.000), stdev = 47.300
     CI (99.9%): [2222.000, 2222.000] (assumes normal distribution)
   
   Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time":
     1249.000 ±(99.9%) 0.001 ms [Sum]
     (min, avg, max) = (239.000, 249.800, 261.000), stdev = 8.526
     CI (99.9%): [1249.000, 1249.000] (assumes normal distribution)
   
   
   # Run complete. Total time: 00:08:12
   
   REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
   why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
   experiments, perform baseline and negative tests that provide experimental control, make sure
   the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
   Do not assume the numbers tell you what you want them to tell.
   
   Benchmark                                                                                            Mode  Cnt        Score      Error   Units
   BenchmarkDataTableSerialization.preAllocateByteArrayNative                                           avgt    5      523.353 ±   10.345   us/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate                            avgt    5     3246.751 ±   64.066  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm                       avgt    5  1811608.009 ±    0.005    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space                   avgt    5     3250.704 ±   66.578  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm              avgt    5  1813811.924 ± 2365.646    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space               avgt    5        0.540 ±    0.150  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm          avgt    5      301.285 ±   80.118    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count                                 avgt    5     2528.000             counts
   BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time                                  avgt    5     1303.000                 ms
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache                                   avgt    5      373.201 ±    8.639   us/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate                    avgt    5     3547.828 ±   82.702  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm               avgt    5  1411608.006 ±    0.005    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space           avgt    5     3555.252 ±   83.120  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm      avgt    5  1414562.956 ± 7771.212    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space       avgt    5        0.551 ±    0.162  MB/sec
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm  avgt    5      219.217 ±   60.046    B/op
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count                         avgt    5     2384.000             counts
   BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time                          avgt    5     1275.000                 ms
   BenchmarkDataTableSerialization.temporaryOutputStream                                                avgt    5      408.674 ±    5.641   us/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate                                 avgt    5     3577.824 ±   48.952  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm                            avgt    5  1558808.007 ±    0.005    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space                        avgt    5     3584.078 ±   41.614  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm                   avgt    5  1561535.360 ± 4793.590    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space                    avgt    5        0.500 ±    0.204  MB/sec
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm               avgt    5      217.817 ±   88.656    B/op
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count                                      avgt    5     2222.000             counts
   BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time                                       avgt    5     1249.000                 ms
   
   Process finished with exit code 0
   
   ```
   
   If my implementation is correct, the benchmark shows that pre-allocating the byte array with a bytes cache is slightly faster than the temporary output stream: 373.201 us/op vs. 408.674 us/op, roughly 9% less time per operation. Caching the encoded K/V holds extra byte arrays during serialization, yet the measured allocation per op is lower (1411608 B/op vs. 1558808 B/op) and GC time is essentially unchanged (1275 ms vs. 1249 ms). It is easy to see why `preAllocateByteArrayNative` is the slowest (523.353 us/op): it encodes each K/V twice, whereas the other two methods encode each K/V only once.
   
   Not sure whether the change is worth making for an improvement of roughly 9%.
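
   The three strategies compared above can be sketched roughly as follows. This is only an illustration, not the benchmark code from this issue: the class name `SerializationSketch`, the `Map<String, String>` input, and the length-prefixed layout (entry count, then length and bytes for each key and value) are assumptions made for the example and are not Pinot's actual data table format.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.util.Map;

   // Hypothetical sketch; names and wire layout are illustrative assumptions.
   final class SerializationSketch {

     // Strategy 1: write into a growable temporary stream, then copy the bytes out.
     static byte[] temporaryOutputStream(Map<String, String> map) {
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       try (DataOutputStream out = new DataOutputStream(bytes)) {
         out.writeInt(map.size());
         for (Map.Entry<String, String> e : map.entrySet()) {
           byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
           byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
           out.writeInt(k.length);
           out.write(k);
           out.writeInt(v.length);
           out.write(v);
         }
       } catch (IOException ex) {
         // An in-memory stream is not expected to fail; rethrow unchecked just in case.
         throw new UncheckedIOException(ex);
       }
       return bytes.toByteArray();
     }

     // Strategy 2: the first pass encodes every string only to measure its size,
     // the second pass encodes every string again to fill the exact-size array.
     static byte[] preAllocateNative(Map<String, String> map) {
       int size = Integer.BYTES;
       for (Map.Entry<String, String> e : map.entrySet()) {
         size += 2 * Integer.BYTES
             + e.getKey().getBytes(StandardCharsets.UTF_8).length
             + e.getValue().getBytes(StandardCharsets.UTF_8).length;
       }
       ByteBuffer buf = ByteBuffer.allocate(size);
       buf.putInt(map.size());
       for (Map.Entry<String, String> e : map.entrySet()) {
         byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
         byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
         buf.putInt(k.length).put(k).putInt(v.length).put(v);
       }
       return buf.array();
     }

     // Strategy 3: encode every string once, keep the encoded bytes in a cache,
     // then copy the cached bytes into the exact-size array.
     static byte[] preAllocateWithBytesCache(Map<String, String> map) {
       byte[][] cache = new byte[2 * map.size()][];
       int size = Integer.BYTES;
       int i = 0;
       for (Map.Entry<String, String> e : map.entrySet()) {
         cache[i] = e.getKey().getBytes(StandardCharsets.UTF_8);
         cache[i + 1] = e.getValue().getBytes(StandardCharsets.UTF_8);
         size += 2 * Integer.BYTES + cache[i].length + cache[i + 1].length;
         i += 2;
       }
       ByteBuffer buf = ByteBuffer.allocate(size);
       buf.putInt(map.size());
       for (byte[] encoded : cache) {
         buf.putInt(encoded.length).put(encoded);
       }
       return buf.array();
     }
   }
   ```

   In this sketch all three methods return the same bytes for the same map; they differ only in how many times each string is encoded and in which temporary buffers are allocated, which is the trade-off the benchmark numbers reflect.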
   

