Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/05 14:12:13 UTC

[GitHub] [flink-benchmarks] dawidwys commented on a change in pull request #34: [hotfix] Increase warmup iterations and iterations in CheckpointingTimeBenchmark

dawidwys commented on a change in pull request #34:
URL: https://github.com/apache/flink-benchmarks/pull/34#discussion_r722283476



##########
File path: src/main/java/org/apache/flink/benchmark/CheckpointingTimeBenchmark.java
##########
@@ -89,7 +84,7 @@
  * </ul>
  */
 @OutputTimeUnit(SECONDS)
-@Warmup(iterations = 4)
+@Measurement(iterations = 20)

Review comment:
       I could not find a way.
   
   I spent some time investigating the stability of the checkpointing benchmarks. A few observations:
   
   1. I compared the stability of the new benchmark with that of the old benchmarks. First of all, I think the old unalignedCheckpoint.0 is simply wrong and does not test what we wanted. As far as I understand the results, it did not measure unaligned checkpoints but job submission. If you compare the new results (~500 ops/s) with the old ones (~40 ops/s), you can see that the majority of the time was spent on starting the job. Therefore I would not take any of its characteristics as a baseline. I believe the new results are more plausible, as unaligned checkpoints taking a couple of ms (~2-3) make more sense.
   2. The old benchmarks were run with parallelism = 1, while the new tests are run with parallelism = 4. Higher parallelism tends to give a bigger error, resulting in less stable results. This holds for both the old and the new tests, which makes sense.
   3. I observed that "performance" increases with the number of iterations. Maybe decreasing the number of warmup iterations was not the best idea.
   
   In conclusion, I think the checkpointing time is mostly as unstable as the tests show. We can improve the situation a bit by increasing both the warmup iterations and the measurement iterations. I think that including job submission in the benchmarked method played a vital role in smoothing out the results in the old tests.
   Additionally, some results of my experiments:
   
   ```
   old, parallelism = 1
   Benchmark                                             (timeout)   Mode  Cnt   Score   Error  Units
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint          0  thrpt   30  43.158 ± 0.715  ops/s
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint          1  thrpt   30  16.210 ± 0.226  ops/s
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint    ALIGNED  thrpt   30   5.385 ± 0.109  ops/s
   
   old, parallelism = 4
   Benchmark                                             (timeout)   Mode  Cnt   Score   Error  Units
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint          0  thrpt   30  26.874 ± 1.500  ops/s
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint          1  thrpt   30   7.635 ± 0.354  ops/s
   UnalignedCheckpointTimeBenchmark.unalignedCheckpoint    ALIGNED  thrpt   30   3.523 ± 0.127  ops/s
   
   new, parallelism = 4, checkpoints_per_iteration = 10
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt   30  436.524 ± 28.198  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt   30    7.961 ±  0.681  ops/s
   
   new, parallelism = 1, checkpoints_per_iteration = 10
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt   30  658.693 ± 99.033  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt   30   24.083 ±  0.264  ops/s
   
   new, parallelism = 4, checkpoints_per_iteration = 1
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt   30  449.005 ± 33.578  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt   30    8.025 ±  0.541  ops/s
   
   new, parallelism = 1, checkpoints_per_iteration = 1
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt   30  705.661 ± 94.854  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt   30   24.128 ±  0.169  ops/s
   
   
   new, parallelism = 4, warmup = 4, iterations = 40
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt  120  544.965 ± 23.766  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt  120    7.803 ±  0.440  ops/s
   
   new, parallelism = 4, warmup = 10, iterations = 30
   Benchmark                                              (mode)   Mode  Cnt    Score    Error  Units
   CheckpointingTimeBenchmark.checkpointSingleInput    UNALIGNED  thrpt   90  551.281 ± 29.598  ops/s
   CheckpointingTimeBenchmark.checkpointSingleInput  UNALIGNED_1  thrpt   90    8.424 ±  0.393  ops/s
   ```
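
To make the rows above easier to compare, here is a minimal sketch of the arithmetic I have in mind: it turns a throughput score into an approximate time per operation and expresses the ± column as a percentage of the score. The class `BenchmarkStats` and its method names are hypothetical helpers made up for illustration and are not part of flink-benchmarks; the input values are copied from the tables above.

```java
// Hypothetical helper, not part of flink-benchmarks: illustrative arithmetic only.
final class BenchmarkStats {

    private BenchmarkStats() {}

    // Converts a throughput score in ops/s into the average time per operation in ms.
    static double millisPerOp(double opsPerSecond) {
        return 1000.0 / opsPerSecond;
    }

    // Expresses the reported error (the ± column) as a percentage of the score.
    static double relativeErrorPercent(double score, double error) {
        return error / score * 100.0;
    }

    public static void main(String[] args) {
        // Rows copied from the result tables: {score, error} in ops/s.
        String[] labels = {
            "old, parallelism = 1, timeout 0",
            "new, parallelism = 4, UNALIGNED, checkpoints_per_iteration = 10",
            "new, parallelism = 4, UNALIGNED, warmup = 10, iterations = 30",
        };
        double[][] rows = {
            {43.158, 0.715},
            {436.524, 28.198},
            {551.281, 29.598},
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf(
                "%s: %.2f ms/op, relative error %.1f%%%n",
                labels[i],
                millisPerOp(rows[i][0]),
                relativeErrorPercent(rows[i][0], rows[i][1]));
        }
    }
}
```

The relative error is only a rough stability indicator, but it puts runs with very different scores on the same scale, which the raw ± values do not.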



