Posted to user@spark.apache.org by Micah Kornfield <em...@gmail.com> on 2020/06/22 17:04:51 UTC

Documentation on SupportsReportStatistics Outdated?

I was wondering whether the documentation for SupportsReportStatistics [1]
about its interaction with the planner and predicate pushdown is still
accurate. It says:

"Implementations that return more accurate statistics based on pushed
operators will not improve query performance until the planner can push
operators before getting stats."

Is this still accurate? Looking through the code, it seems that there is
now functionality that explicitly expects the operators to be pushed down
[2]. Is the documentation for SupportsReportStatistics referring to
something other than [2], or should it be updated?
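The ordering concern in the quoted Javadoc can be illustrated with a toy model. This is a sketch only: every name in it (`ToyScanBuilder`, `push_filters`, `estimate_row_count`, `plan`) is hypothetical and is not Spark's DataSourceV2 API. It only shows that a source can report statistics reflecting pushed filters if the planner pushes those filters before asking for the statistics.

```python
# Toy model with hypothetical names (NOT Spark's DataSourceV2 API) showing why
# statistics requested before pushdown cannot reflect the pushed filters.
from dataclasses import dataclass, field
from typing import Callable, List

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class ToyScanBuilder:
    """Hypothetical stand-in for a source that supports filter pushdown and stats."""
    rows: List[Row]
    pushed: List[Predicate] = field(default_factory=list)

    def push_filters(self, filters: List[Predicate]) -> None:
        # Record the filters the source agrees to apply itself.
        self.pushed.extend(filters)

    def estimate_row_count(self) -> int:
        # "More accurate" stats: count only the rows that survive the pushed filters.
        return sum(1 for r in self.rows if all(p(r) for p in self.pushed))


def plan(builder: ToyScanBuilder, filters: List[Predicate], stats_first: bool) -> int:
    """Return the row estimate a planner would see under each ordering."""
    if stats_first:
        estimate = builder.estimate_row_count()  # nothing pushed yet: full-table estimate
        builder.push_filters(filters)
    else:
        builder.push_filters(filters)
        estimate = builder.estimate_row_count()  # reflects the pushed filters
    return estimate


rows = [{"id": i} for i in range(1000)]
only_small_ids = [lambda r: r["id"] < 10]

print(plan(ToyScanBuilder(rows), only_small_ids, stats_first=True))   # 1000
print(plan(ToyScanBuilder(rows), only_small_ids, stats_first=False))  # 10
```

With the stats-first ordering the estimate stays at the full 1000 rows even though only 10 rows pass the filter, which is the situation the Javadoc sentence describes.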

Thanks,
Micah

[1]
https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/sources/v2/reader/SupportsReportStatistics.html
[2]
https://github.com/apache/spark/blob/d0800fc8e2e71a79bf0f72c3e4bc608ae34053e7/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala#L86