Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/12 06:15:01 UTC

[GitHub] [spark] sadikovi commented on pull request #37485: [SPARK-40052][SQL] Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

sadikovi commented on PR #37485:
URL: https://github.com/apache/spark/pull/37485#issuecomment-1212765192

   I reran the benchmarks on a dataset 4x larger (I changed the size in DataSourceReadBenchmark). The numbers are still very similar, with the patch performing slightly better than the current code. I don't quite understand how that is possible, unless the benchmark does not exercise the encoding.
   
   ### Before
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   ParquetReader Vectorized: DataPageV1                   672            707          45         93.6          10.7       1.0X
   ParquetReader Vectorized: DataPageV2                   945           1012          95         66.6          15.0       0.7X
   ParquetReader Vectorized -> Row: DataPageV1            383            432          28        164.4           6.1       1.8X
   ParquetReader Vectorized -> Row: DataPageV2            670            678           8         93.9          10.6       1.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------
   ParquetReader Vectorized: DataPageV1                   931            935           4         67.6          14.8       1.0X
   ParquetReader Vectorized: DataPageV2                  1475           1477           4         42.7          23.4       0.6X
   ParquetReader Vectorized -> Row: DataPageV1            638            650          14         98.5          10.1       1.5X
   ParquetReader Vectorized -> Row: DataPageV2           1172           1173           2         53.7          18.6       0.8X
   ```
   
   ### After
   
   ```
   [info] OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   [info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   [info] Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] ParquetReader Vectorized: DataPageV1                   656            704          60         95.9          10.4       1.0X
   [info] ParquetReader Vectorized: DataPageV2                   888            898          12         70.9          14.1       0.7X
   [info] ParquetReader Vectorized -> Row: DataPageV1            393            435          24        160.2           6.2       1.7X
   [info] ParquetReader Vectorized -> Row: DataPageV2            667            681          12         94.3          10.6       1.0X
   
   [info] OpenJDK 64-Bit Server VM 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 on Linux 5.4.0-1071-aws
   [info] Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
   [info] Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------
   [info] ParquetReader Vectorized: DataPageV1                   935            953          16         67.3          14.9       1.0X
   [info] ParquetReader Vectorized: DataPageV2                  1437           1440           4         43.8          22.8       0.7X
   [info] ParquetReader Vectorized -> Row: DataPageV1            717            731          12         87.7          11.4       1.3X
   [info] ParquetReader Vectorized -> Row: DataPageV2           1176           1185          13         53.5          18.7       0.8X
   ```
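
To make the before/after comparison easier to read, the best times from the two runs can be put side by side. The sketch below is a hypothetical helper (the class `BenchmarkDelta` and the method `percentChange` are illustrative names, not part of the patch or of DataSourceReadBenchmark); it only does arithmetic on the "Best Time(ms)" columns copied from the tables above.

```java
public class BenchmarkDelta {
    // Percentage change of the "after" time relative to "before".
    // A negative value means the patched run was faster.
    static double percentChange(double beforeMs, double afterMs) {
        return (afterMs - beforeMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        // Row labels and "Best Time(ms)" values copied from the tables above.
        String[] labels = {
            "INT    Vectorized: DataPageV1",
            "INT    Vectorized: DataPageV2",
            "INT    Vectorized -> Row: DataPageV1",
            "INT    Vectorized -> Row: DataPageV2",
            "BIGINT Vectorized: DataPageV1",
            "BIGINT Vectorized: DataPageV2",
            "BIGINT Vectorized -> Row: DataPageV1",
            "BIGINT Vectorized -> Row: DataPageV2",
        };
        double[] before = {672, 945, 383, 670, 931, 1475, 638, 1172};
        double[] after = {656, 888, 393, 667, 935, 1437, 717, 1176};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-38s %5.0f -> %5.0f ms  %+.1f%%%n",
                labels[i], before[i], after[i], percentChange(before[i], after[i]));
        }
    }
}
```

These are single best-time samples, so differences of a few percent may well be within run-to-run noise (the Stdev columns above give a rough idea of the spread).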


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
