Posted to issues@spark.apache.org by "Josh Rosen (Jira)" <ji...@apache.org> on 2021/11/23 08:04:00 UTC
[jira] [Created] (SPARK-37447) Cache LogicalPlan.isStreaming() in a lazy val
Josh Rosen created SPARK-37447:
----------------------------------
Summary: Cache LogicalPlan.isStreaming() in a lazy val
Key: SPARK-37447
URL: https://issues.apache.org/jira/browse/SPARK-37447
Project: Spark
Issue Type: Improvement
Components: Optimizer
Affects Versions: 3.2.0
Reporter: Josh Rosen
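The default implementation of `LogicalPlan.isStreaming()` calls `children.exists(_.isStreaming)`, so every call may walk the entire subtree below the node. This can be expensive for large trees. As a performance optimization, I think we should cache the result in a private lazy val. Plan nodes are immutable, so the cached value cannot go stale.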
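A minimal sketch of the proposed change follows. It assumes a simplified, hypothetical `Plan` hierarchy (`Plan`, `Source`, `Op`, `chain`, and the `traversals` counter are illustrative names, not Catalyst classes). Leaves override `isStreaming` directly. Inner nodes inherit a default that evaluates `children.exists(_.isStreaming)` once and stores the result in a private lazy val.

```scala
// Hypothetical, simplified stand-in for Catalyst's LogicalPlan (not the real classes).
object IsStreamingCacheSketch {
  // Counts how often any node actually runs `children.exists(_.isStreaming)`.
  var traversals = 0

  abstract class Plan {
    def children: Seq[Plan]

    // Evaluated at most once per node; safe because plan nodes are immutable.
    private lazy val _isStreaming: Boolean = {
      traversals += 1
      children.exists(_.isStreaming)
    }

    // Default implementation: a plan is streaming if any child is.
    def isStreaming: Boolean = _isStreaming
  }

  // Leaves decide for themselves, like a streaming or batch relation would.
  final case class Source(streaming: Boolean) extends Plan {
    def children: Seq[Plan] = Nil
    override def isStreaming: Boolean = streaming
  }

  // An inner node (e.g. a projection or filter) inherits the cached default.
  final case class Op(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Builds a linear chain of `depth` Op nodes on top of a single Source.
  def chain(depth: Int, streaming: Boolean): Plan =
    (1 to depth).foldLeft(Source(streaming): Plan)((p, _) => Op(p))

  def main(args: Array[String]): Unit = {
    val plan = chain(depth = 100, streaming = false)
    (1 to 5).foreach(_ => plan.isStreaming) // repeated checks, e.g. from several rules
    println(s"isStreaming=${plan.isStreaming}, traversals=$traversals")
  }
}
```

Running `main` should print `isStreaming=false, traversals=100`. Each of the 100 `Op` nodes computes its flag once, however many times the root is queried. With a plain `def`, every one of the six calls would re-walk all 100 nodes.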
This is especially important for programs that construct huge query plans programmatically. Building such plans results in multiple analysis passes, and therefore in many invocations of rules that call `isStreaming`. For example, the `isStreaming` check accounts for a significant portion of the time spent in `DeduplicateRelations` (> 20% in my local tests).
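The sketch below shows why per-node checks across repeated passes hurt. It assumes a hypothetical toy "rule" (`countStreamingNodes`) that queries `isStreaming` at every node of a plan. This is only an assumed access pattern for illustration, not the actual logic of `DeduplicateRelations`. On a linear chain, the uncached version re-walks each subtree, so one pass costs O(n²) node visits. The lazy-val version costs O(n) in total, across all passes.

```scala
// Hypothetical toy model; `Plan`, `countStreamingNodes`, and the counters are illustrative only.
object RulePassCostSketch {
  var uncachedVisits = 0L
  var cachedVisits = 0L

  final case class Plan(children: Seq[Plan], streamingLeaf: Boolean = false) {
    // Recomputed on every call: walks the whole subtree each time.
    def isStreamingUncached: Boolean = {
      uncachedVisits += 1
      if (children.isEmpty) streamingLeaf
      else children.exists(_.isStreamingUncached)
    }

    // Computed once per node and reused afterwards.
    lazy val isStreaming: Boolean = {
      cachedVisits += 1
      if (children.isEmpty) streamingLeaf
      else children.exists(_.isStreaming)
    }
  }

  // A toy "rule" that checks the streaming flag at every node of the tree.
  def countStreamingNodes(plan: Plan, check: Plan => Boolean): Int =
    (if (check(plan)) 1 else 0) + plan.children.map(countStreamingNodes(_, check)).sum

  // A leaf with `depth` single-child nodes stacked on top (depth + 1 nodes total).
  def chain(depth: Int): Plan =
    (1 to depth).foldLeft(Plan(Nil))((p, _) => Plan(Seq(p)))

  def main(args: Array[String]): Unit = {
    val plan = chain(100)
    for (_ <- 1 to 3) { // three simulated analysis passes
      countStreamingNodes(plan, _.isStreamingUncached)
      countStreamingNodes(plan, _.isStreaming)
    }
    println(s"uncached=$uncachedVisits, cached=$cachedVisits")
  }
}
```

For this 101-node chain, `main` should print `uncached=15453, cached=101`. Each uncached pass visits 1 + 2 + ... + 101 = 5151 nodes. The cached flags are computed once per node and then reused by later passes.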
--
This message was sent by Atlassian Jira
(v8.20.1#820001)