Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/21 07:33:25 UTC
[GitHub] [iceberg] nastra edited a comment on pull request #3040: Arrow: Bump to Apache Arrow 5.0
nastra edited a comment on pull request #3040:
URL: https://github.com/apache/iceberg/pull/3040#issuecomment-923687747
> It looks like `readDatesIcebergVectorized5k` takes 30% longer? And it looks like the difference between `readFloatsIcebergVectorized5k` is just outside where the error ranges overlap.
The variability in the timings comes from the fact that I was doing dev work on my local machine while those benchmarks were running. The other thing is that with `Mode.SingleShotTime` benchmarks we're effectively measuring the **cold** performance (we do 3 warmup iterations and 5 measurement iterations). Below is an excerpt from the Javadoc, taken from [here](http://javadox.com/org.openjdk.jmh/jmh-core/0.8/org/openjdk/jmh/annotations/Mode.html#SingleShotTime):
> Single shot time: measures the time for a single operation.
> Runs by calling {@link Benchmark} once and measuring its time. This mode is useful to estimate the "cold" performance when you don't want to hide the warmup invocations, or if you want to see the progress from call to call, or you want to record every single sample. This mode is work-based, and will run only for a single invocation of {@link Benchmark} method.
> Caveats for this mode include:
> - More warmup/measurement iterations are generally required.
> - Timers overhead might be significant if benchmarks are small; switch to {@link #SampleTime} mode if that is a problem.
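To make the single-shot idea concrete, here is a minimal hand-rolled sketch of what "one invocation per sample" means. It is not JMH and not how JMH is implemented; the class `SingleShotSketch`, the method `measure`, and the toy workload are hypothetical names chosen for illustration. Only the 3 warmup and 5 measurement iterations come from the setup described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled illustration of single-shot timing. This is NOT JMH. */
final class SingleShotSketch {

    // Written by the toy workload so the JIT cannot discard the loop entirely.
    static volatile long sink;

    /**
     * Runs op once per iteration. The first `warmup` invocations are executed
     * but not timed; each of the following `measured` invocations is timed
     * individually, so every sample covers exactly one call.
     */
    static List<Long> measure(Runnable op, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        List<Long> nanos = new ArrayList<>(measured);
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            op.run(); // a single invocation per sample
            nanos.add(System.nanoTime() - start);
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; the real benchmarks read Parquet files.
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            sink = sum;
        };
        List<Long> samples = measure(work, 3, 5);
        for (int i = 0; i < samples.size(); i++) {
            System.out.printf("iteration %d: %.3f ms%n", i + 1, samples.get(i) / 1_000_000.0);
        }
    }
}
```

With so few invocations the JIT has had little chance to optimize the code, and any background load on the machine lands directly in one of only five samples, which is why such runs tend to be noisy.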
I also did another run on this branch and below are the new results:
```
Benchmark                                                                 Mode  Cnt  Score   Error  Units
VectorizedReadFlatParquetDataBenchmark.readDatesIcebergVectorized5k         ss    5  1.870 ± 0.091   s/op
VectorizedReadFlatParquetDataBenchmark.readDatesSparkVectorized5k           ss    5  1.511 ± 0.080   s/op
VectorizedReadFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k      ss    5  9.005 ± 0.539   s/op
VectorizedReadFlatParquetDataBenchmark.readDecimalsSparkVectorized5k        ss    5  8.424 ± 0.462   s/op
VectorizedReadFlatParquetDataBenchmark.readDoublesIcebergVectorized5k       ss    5  2.829 ± 0.152   s/op
VectorizedReadFlatParquetDataBenchmark.readDoublesSparkVectorized5k         ss    5  2.385 ± 0.129   s/op
VectorizedReadFlatParquetDataBenchmark.readFloatsIcebergVectorized5k        ss    5  2.429 ± 0.120   s/op
VectorizedReadFlatParquetDataBenchmark.readFloatsSparkVectorized5k          ss    5  2.357 ± 0.125   s/op
VectorizedReadFlatParquetDataBenchmark.readIntegersIcebergVectorized5k      ss    5  2.434 ± 0.158   s/op
VectorizedReadFlatParquetDataBenchmark.readIntegersSparkVectorized5k        ss    5  2.587 ± 0.160   s/op
VectorizedReadFlatParquetDataBenchmark.readLongsIcebergVectorized5k         ss    5  2.857 ± 0.138   s/op
VectorizedReadFlatParquetDataBenchmark.readLongsSparkVectorized5k           ss    5  2.638 ± 0.161   s/op
VectorizedReadFlatParquetDataBenchmark.readStringsIcebergVectorized5k       ss    5  5.662 ± 0.411   s/op
VectorizedReadFlatParquetDataBenchmark.readStringsSparkVectorized5k         ss    5  4.693 ± 0.183   s/op
VectorizedReadFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k    ss    5  1.993 ± 0.139   s/op
VectorizedReadFlatParquetDataBenchmark.readTimestampsSparkVectorized5k      ss    5  1.942 ± 0.053   s/op
```
[vectorized-read-flat-parquet-data-result-bump-arrow2.txt](https://github.com/apache/iceberg/files/7201221/vectorized-read-flat-parquet-data-result-bump-arrow2.txt)
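As a rough aid for reading tables like this one, the sketch below treats each `Score ± Error` pair as a plain interval and checks whether two intervals overlap, plus the relative difference between the two scores. The class and method names (`ScoreComparison`, `Result`, `overlaps`, `relativeDiffPercent`) are hypothetical. Pairing the Iceberg and Spark rows is only an illustration using figures from this run; the comparison raised in the quoted question involves an earlier run whose numbers are not included in this message.

```java
/** Illustrative helper for comparing two "Score ± Error" rows; all names are hypothetical. */
final class ScoreComparison {

    /** One result row: mean score and its error margin, both in s/op. */
    static final class Result {
        final double score;
        final double error;

        Result(double score, double error) {
            this.score = score;
            this.error = error;
        }

        double low() { return score - error; }
        double high() { return score + error; }
    }

    /** True when the intervals [score - error, score + error] of a and b intersect. */
    static boolean overlaps(Result a, Result b) {
        return a.low() <= b.high() && b.low() <= a.high();
    }

    /** Relative difference of candidate against baseline, in percent of the baseline score. */
    static double relativeDiffPercent(Result candidate, Result baseline) {
        return (candidate.score - baseline.score) / baseline.score * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the results table of this run.
        Result datesIceberg = new Result(1.870, 0.091);
        Result datesSpark = new Result(1.511, 0.080);
        Result floatsIceberg = new Result(2.429, 0.120);
        Result floatsSpark = new Result(2.357, 0.125);

        System.out.printf("dates:  overlap=%b, diff=%.1f%%%n",
                overlaps(datesIceberg, datesSpark),
                relativeDiffPercent(datesIceberg, datesSpark));
        System.out.printf("floats: overlap=%b, diff=%.1f%%%n",
                overlaps(floatsIceberg, floatsSpark),
                relativeDiffPercent(floatsIceberg, floatsSpark));
    }
}
```

For the rows used here, the dates intervals do not overlap while the floats intervals do. With only 5 single-shot samples per benchmark, an overlap check like this is a quick sanity filter rather than a statistical test.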
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org