Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/03 13:53:13 UTC
[GitHub] [spark] LuciferYang commented on pull request #35669: [SPARK-38041][SQL]DataFilter pushed down with PartitionFilter
LuciferYang commented on pull request #35669:
URL: https://github.com/apache/spark/pull/35669#issuecomment-1058062770
> > Could we add evidence of Parquet skipping files/row groups (a micro benchmark, some logs during execution, or code pointers) when we push down the partition filter here?
>
> @c21 I have added some benchmark tests in FilterPushdownBenchmark and run them in GitHub Actions. The test code can be found [here](https://github.com/stczwd/spark/blob/SPARK-38041-2/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L81).
>
> Test results:
>
> ```
> OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
> Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
> Data filter with partitions: ((a = 10 and part = 0) or (a = 10240 and part = 1) or (part = 2)):
>                                              Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------------------
> Parquet Vectorized with partition                     3039          3157        122         5.2        193.2      1.0X
> Parquet Vectorized with partition (Pushdown)          1548          1568         15        10.2         98.4      2.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1028-azure
> Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
> Data filter with partitions: ((a > 10 and part = 0) or (a <= 10 and part >=1 and part < 3)):
>                                              Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
> ----------------------------------------------------------------------------------------------------------------------
> Parquet Vectorized with partition                     2942          2997         40         5.3        187.1      1.0X
> Parquet Vectorized with partition (Pushdown)          1497          1513         15        10.5         95.2      2.0X
> ```
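The derived columns in these results follow from Best Time(ms) and the number of rows scanned. The comment does not give the row count; the sketch below assumes 15 * 1024 * 1024 = 15,728,640 rows, a figure back-solved from the Per Row(ns) column (3039 ms / 193.2 ns ≈ 15.7M rows), and it assumes Relative is the baseline's best time divided by each case's best time. `BenchmarkColumns` and its methods are hypothetical helpers for illustration, not Spark's benchmark API.

```scala
// Hypothetical helper that recomputes the derived benchmark columns from Best Time(ms).
object BenchmarkColumns {
  // ASSUMPTION: 15 * 1024 * 1024 rows, back-solved from Per Row(ns); not stated in the thread.
  val numRows: Long = 15L * 1024 * 1024

  // Per Row(ns): best time in nanoseconds divided by the row count.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows

  // Rate(M/s): millions of rows processed per second at the best time.
  def rateMPerSec(bestMs: Double): Double = numRows / (bestMs / 1000.0) / 1e6

  // Relative: baseline best time divided by this case's best time (higher is faster).
  def relative(baselineMs: Double, bestMs: Double): Double = baselineMs / bestMs

  def main(args: Array[String]): Unit = {
    // Best Time(ms) values taken from the first result table.
    val cases = Seq(
      "Parquet Vectorized with partition" -> 3039.0,
      "Parquet Vectorized with partition (Pushdown)" -> 1548.0
    )
    val baselineMs = cases.head._2
    for ((name, bestMs) <- cases) {
      val rate = rateMPerSec(bestMs)
      val perRow = perRowNs(bestMs)
      val rel = relative(baselineMs, bestMs)
      println(f"$name%-45s $rate%6.1f $perRow%8.1f $rel%5.1fX")
    }
  }
}
```

Because the printed Best Time is rounded to whole milliseconds, recomputing from it can differ from the table in the last digit (for example, 2942 ms gives about 187.0 ns/row rather than the reported 187.1).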
@stczwd Could you add the benchmark code to this PR and use GitHub Actions (GA) to generate the benchmark results?
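The benchmark predicates mix a data column (`a`) with the partition column (`part`), so as written they are neither pure partition filters nor pure data filters. As we read the PR title (an assumption, not something this thread confirms), the idea is that once a file's partition value is known, its partition comparisons collapse to true or false, leaving a data-only filter that Parquet could use to skip row groups. The Scala sketch below illustrates that rewrite on a toy predicate tree; `Pred`, `Cmp`, `And`, `Or`, `Lit`, and `Specialize` are hypothetical names, not Catalyst classes, and this is not the PR's implementation.

```scala
// Toy predicate tree; hypothetical, NOT Spark's Catalyst Expression API.
sealed trait Pred
final case class Cmp(col: String, op: String, v: Int) extends Pred
final case class And(l: Pred, r: Pred) extends Pred
final case class Or(l: Pred, r: Pred) extends Pred
final case class Lit(b: Boolean) extends Pred

object Specialize {
  private def compare(op: String, x: Int, v: Int): Boolean = op match {
    case "="  => x == v
    case "<"  => x < v
    case "<=" => x <= v
    case ">"  => x > v
    case ">=" => x >= v
    case other => throw new IllegalArgumentException(s"unsupported operator: $other")
  }

  // Replace comparisons on the partition column with the file's known partition value,
  // then fold the resulting literals so only data-column conditions remain.
  def specialize(p: Pred, partCol: String, partValue: Int): Pred = p match {
    case Cmp(c, op, v) if c == partCol => Lit(compare(op, partValue, v))
    case c: Cmp => c
    case l: Lit => l
    case And(l, r) =>
      (specialize(l, partCol, partValue), specialize(r, partCol, partValue)) match {
        case (Lit(false), _) | (_, Lit(false)) => Lit(false)
        case (Lit(true), x) => x
        case (x, Lit(true)) => x
        case (x, y) => And(x, y)
      }
    case Or(l, r) =>
      (specialize(l, partCol, partValue), specialize(r, partCol, partValue)) match {
        case (Lit(true), _) | (_, Lit(true)) => Lit(true)
        case (Lit(false), x) => x
        case (x, Lit(false)) => x
        case (x, y) => Or(x, y)
      }
  }

  // (a = 10 and part = 0) or (a = 10240 and part = 1) or (part = 2)
  val bench1: Pred = Or(
    Or(And(Cmp("a", "=", 10), Cmp("part", "=", 0)),
       And(Cmp("a", "=", 10240), Cmp("part", "=", 1))),
    Cmp("part", "=", 2))

  // (a > 10 and part = 0) or (a <= 10 and part >= 1 and part < 3)
  val bench2: Pred = Or(
    And(Cmp("a", ">", 10), Cmp("part", "=", 0)),
    And(And(Cmp("a", "<=", 10), Cmp("part", ">=", 1)), Cmp("part", "<", 3)))

  def main(args: Array[String]): Unit =
    for (part <- 0 to 3)
      println(s"part=$part -> ${specialize(bench1, "part", part)}")
}
```

For the first benchmark predicate, `main` prints `Cmp(a,=,10)` for `part=0`, `Cmp(a,=,10240)` for `part=1`, `Lit(true)` for `part=2` (the whole partition must be read), and `Lit(false)` for `part=3` (the partition can be skipped).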
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org