Posted to issues@spark.apache.org by "Josh Rosen (Jira)" <ji...@apache.org> on 2021/11/23 08:04:00 UTC

[jira] [Created] (SPARK-37447) Cache LogicalPlan.isStreaming() in a lazy val

Josh Rosen created SPARK-37447:
----------------------------------

             Summary: Cache LogicalPlan.isStreaming() in a lazy val
                 Key: SPARK-37447
                 URL: https://issues.apache.org/jira/browse/SPARK-37447
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 3.2.0
            Reporter: Josh Rosen


The default implementation of `LogicalPlan.isStreaming()` calls `children.exists(_.isStreaming)`. This can be expensive for large trees because every call walks the subtree again, so as a performance optimization I think we should cache the result in a private lazy val.

This is especially important for programs that construct huge query plans programmatically, because that results in multiple analysis passes (and therefore multiple invocations of rules that call `isStreaming`). For example, the `isStreaming` check accounts for a significant portion of the time in `DeduplicateRelations` (> 20% in my local tests).
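
To illustrate the proposal, here is a minimal, self-contained sketch of the caching pattern. It does not use Spark: `PlanNode`, `Leaf`, `Unary`, `Binary`, `chain`, and the `computations` counter are all hypothetical names invented for this example, and the actual change to `LogicalPlan` may well look different.

```scala
object IsStreamingSketch {
  // Illustration only: counts how often the recursive check is actually computed.
  var computations: Int = 0

  // Hypothetical stand-in for a plan node; this is NOT Spark's LogicalPlan.
  abstract class PlanNode {
    def children: Seq[PlanNode]

    // Uncached variant: every call recurses through the children again.
    def isStreamingUncached: Boolean = {
      computations += 1
      children.exists(_.isStreamingUncached)
    }

    // Cached variant: the body runs at most once per node, on first access.
    private lazy val cachedIsStreaming: Boolean = {
      computations += 1
      children.exists(_.isStreaming)
    }

    def isStreaming: Boolean = cachedIsStreaming
  }

  // Leaves answer directly, so they neither recurse nor need a cache.
  final class Leaf(streaming: Boolean) extends PlanNode {
    override def children: Seq[PlanNode] = Nil
    override def isStreamingUncached: Boolean = streaming
    override def isStreaming: Boolean = streaming
  }

  final class Unary(child: PlanNode) extends PlanNode {
    override def children: Seq[PlanNode] = Seq(child)
  }

  final class Binary(left: PlanNode, right: PlanNode) extends PlanNode {
    override def children: Seq[PlanNode] = Seq(left, right)
  }

  // Builds a chain of `depth` unary nodes on top of a single batch leaf.
  def chain(depth: Int): PlanNode =
    (1 to depth).foldLeft[PlanNode](new Leaf(false))((node, _) => new Unary(node))
}
```

With a chain of 100 unary nodes, calling `isStreamingUncached` twice on the root evaluates the check 200 times, while calling `isStreaming` twice evaluates it only 100 times, because the second call reads the cached value. Caching like this is only safe if a node's children never change after construction, which the sketch assumes.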



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org