Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/21 07:33:25 UTC

[GitHub] [iceberg] nastra edited a comment on pull request #3040: Arrow: Bump to Apache Arrow 5.0

nastra edited a comment on pull request #3040:
URL: https://github.com/apache/iceberg/pull/3040#issuecomment-923687747


   > It looks like `readDatesIcebergVectorized5k` takes 30% longer? And it looks like the difference between `readFloatsIcebergVectorized5k` is just outside where the error ranges overlap.
   
   The variability in the timings comes from the fact that I was doing dev work on my local machine while those tests were running. The other thing is that with `Mode.SingleShotTime` benchmarks we're effectively measuring the **cold** performance (we do 3 warmup iterations and 5 measurement iterations). Below is an excerpt from the Javadoc, taken from [here](http://javadox.com/org.openjdk.jmh/jmh-core/0.8/org/openjdk/jmh/annotations/Mode.html#SingleShotTime):
   
   > Single shot time: measures the time for a single operation.
   
   > Runs by calling {@link Benchmark} once and measuring its time. This mode is useful to estimate the "cold" performance when you don't want to hide the warmup invocations, or if you want to see the progress from call to call, or you want to record every single sample. This mode is work-based, and will run only for a single invocation of {@link Benchmark} method.
   
   > Caveats for this mode include:
   
   > More warmup/measurement iterations are generally required.
   > Timers overhead might be significant if benchmarks are small; switch to {@link #SampleTime} mode if that is a problem.
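
   To make the "cold" point concrete, here is a rough plain-Java sketch of what a single-shot measurement amounts to: every iteration times exactly one invocation, so a handful of warmup iterations only partly hides JIT and cache effects. This is not JMH and not the Iceberg benchmark code; `SingleShotSketch`, `measure` and the toy workload are hypothetical names for illustration, and only the 3 warmup / 5 measurement counts come from the setup described above.

   ```java
   public class SingleShotSketch {
       // Runs op once per iteration and records the wall-clock time of each
       // measured invocation in seconds. Warmup invocations run but are not timed.
       static double[] measure(Runnable op, int warmupIterations, int measurementIterations) {
           for (int i = 0; i < warmupIterations; i++) {
               op.run();
           }
           double[] seconds = new double[measurementIterations];
           for (int i = 0; i < measurementIterations; i++) {
               long start = System.nanoTime();
               op.run(); // exactly one invocation per sample
               seconds[i] = (System.nanoTime() - start) / 1e9;
           }
           return seconds;
       }

       public static void main(String[] args) {
           long[] sink = new long[1]; // keeps the loop result observable
           // Hypothetical stand-in workload, not a Parquet read.
           Runnable workload = () -> {
               long sum = 0;
               for (int i = 0; i < 5_000_000; i++) {
                   sum += i % 7;
               }
               sink[0] = sum;
           };
           // Same iteration counts as the benchmark run discussed here.
           for (double s : measure(workload, 3, 5)) {
               System.out.printf("%.6f s/op%n", s);
           }
       }
   }
   ```

   With only five single-invocation samples per benchmark, anything else running on the machine shows up directly in the score, which fits the variability seen in the first run.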
   
   I also did another run on this branch and below are the new results:
   
   ```
   Benchmark                                                                 Mode  Cnt  Score   Error  Units
   VectorizedReadFlatParquetDataBenchmark.readDatesIcebergVectorized5k         ss    5  1.870 ± 0.091   s/op
   VectorizedReadFlatParquetDataBenchmark.readDatesSparkVectorized5k           ss    5  1.511 ± 0.080   s/op
   VectorizedReadFlatParquetDataBenchmark.readDecimalsIcebergVectorized5k      ss    5  9.005 ± 0.539   s/op
   VectorizedReadFlatParquetDataBenchmark.readDecimalsSparkVectorized5k        ss    5  8.424 ± 0.462   s/op
   VectorizedReadFlatParquetDataBenchmark.readDoublesIcebergVectorized5k       ss    5  2.829 ± 0.152   s/op
   VectorizedReadFlatParquetDataBenchmark.readDoublesSparkVectorized5k         ss    5  2.385 ± 0.129   s/op
   VectorizedReadFlatParquetDataBenchmark.readFloatsIcebergVectorized5k        ss    5  2.429 ± 0.120   s/op
   VectorizedReadFlatParquetDataBenchmark.readFloatsSparkVectorized5k          ss    5  2.357 ± 0.125   s/op
   VectorizedReadFlatParquetDataBenchmark.readIntegersIcebergVectorized5k      ss    5  2.434 ± 0.158   s/op
   VectorizedReadFlatParquetDataBenchmark.readIntegersSparkVectorized5k        ss    5  2.587 ± 0.160   s/op
   VectorizedReadFlatParquetDataBenchmark.readLongsIcebergVectorized5k         ss    5  2.857 ± 0.138   s/op
   VectorizedReadFlatParquetDataBenchmark.readLongsSparkVectorized5k           ss    5  2.638 ± 0.161   s/op
   VectorizedReadFlatParquetDataBenchmark.readStringsIcebergVectorized5k       ss    5  5.662 ± 0.411   s/op
   VectorizedReadFlatParquetDataBenchmark.readStringsSparkVectorized5k         ss    5  4.693 ± 0.183   s/op
   VectorizedReadFlatParquetDataBenchmark.readTimestampsIcebergVectorized5k    ss    5  1.993 ± 0.139   s/op
   VectorizedReadFlatParquetDataBenchmark.readTimestampsSparkVectorized5k      ss    5  1.942 ± 0.053   s/op
   
   ```
   [vectorized-read-flat-parquet-data-result-bump-arrow2.txt](https://github.com/apache/iceberg/files/7201221/vectorized-read-flat-parquet-data-result-bump-arrow2.txt)
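
   For anyone who wants to redo the "do the error ranges overlap" check mechanically, a small sketch follows. It treats each `Score ± Error` as a closed interval and also computes the relative difference between two scores. The class and method names are made up for illustration. The numbers are copied from the table above and pair the Iceberg and Spark rows of this run purely as sample input; the first run's numbers are not included in this message, so the same methods would have to be fed those values to reproduce the original comparison.

   ```java
   public class ErrorRangeCheck {
       // True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect.
       static boolean overlaps(double scoreA, double errA, double scoreB, double errB) {
           return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
       }

       // Fractional difference of a relative to b; 0.30 would mean "a takes 30% longer".
       static double relativeDifference(double a, double b) {
           return a / b - 1.0;
       }

       public static void main(String[] args) {
           // readDatesIcebergVectorized5k vs readDatesSparkVectorized5k (from the table above)
           System.out.println("dates overlap:  " + overlaps(1.870, 0.091, 1.511, 0.080));
           System.out.printf("dates rel diff: %.1f%%%n", 100 * relativeDifference(1.870, 1.511));

           // readFloatsIcebergVectorized5k vs readFloatsSparkVectorized5k (from the table above)
           System.out.println("floats overlap: " + overlaps(2.429, 0.120, 2.357, 0.125));
           System.out.printf("floats rel diff: %.1f%%%n", 100 * relativeDifference(2.429, 2.357));
       }
   }
   ```

   Overlapping intervals only mean the two scores cannot be told apart at this sample size; with `Cnt` of 5 it is a coarse check, not a significance test.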
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


