You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "cloud-fan (via GitHub)" <gi...@apache.org> on 2023/05/09 09:21:05 UTC

[GitHub] [spark] cloud-fan commented on pull request #40812: [SPARK-43157][SQL] Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects

cloud-fan commented on PR #40812:
URL: https://github.com/apache/spark/pull/40812#issuecomment-1539764854

   I agree that we do need to copy the cached plan when getting it, but I feel the current change is hard to understand and reason about.
   
   I think a better place to fix is `CacheManager#useCachedDataInternal`. It does copy `InMemoryRelation` but does not copy the physical plan:
   https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L298-L306
   
   We can add a new method `InMemoryRelation#freshCopy` which takes new output attributes and copy the physical plan, then we change `cached.cachedRepresentation.withOutput(currentFragment.output)` to `cached.cachedRepresentation.freshCopy(currentFragment.output)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org