Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/05/31 09:00:17 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

grundprinzip opened a new pull request, #41399:
URL: https://github.com/apache/spark/pull/41399

   ### What changes were proposed in this pull request?
   Previously, calling `df.cache()` would result in an invalid plan input exception because we did not invoke `persist()` with the right arguments. This patch simplifies the logic and makes it compatible with the behavior in Spark itself.
   
   ### Why are the changes needed?
   Bug
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41399:
URL: https://github.com/apache/spark/pull/41399#issuecomment-1570528256

   Hi, @grundprinzip and @hvanhovell. If you don't mind, could you use the `[CONNECT]` tag for `Spark Connect` PRs like this?




[GitHub] [spark] grundprinzip commented on a diff in pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #41399:
URL: https://github.com/apache/spark/pull/41399#discussion_r1211329867


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1826,9 +1826,7 @@ def rdd(self, *args: Any, **kwargs: Any) -> None:
     def cache(self) -> "DataFrame":
         if self._plan is None:
             raise Exception("Cannot cache on empty plan.")
-        relation = self._plan.plan(self._session.client)
-        self._session.client._analyze(method="persist", relation=relation)
-        return self
+        return self.persist(storageLevel=StorageLevel.MEMORY_AND_DISK)

Review Comment:
   This is the default StorageLevel - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L91
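
The delegation in the diff above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than PySpark's actual code: `SketchDataFrame`, the toy `StorageLevel` dataclass, and the `MEMORY_AND_DISK` constant only mimic the shape of the change, in which `cache()` forwards to `persist()` with an explicit storage level instead of issuing its own request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageLevel:
    """Stand-in for a storage level; this is not PySpark's StorageLevel class."""

    use_disk: bool
    use_memory: bool


# Hypothetical constant mirroring the MEMORY_AND_DISK default named in the review comment.
MEMORY_AND_DISK = StorageLevel(use_disk=True, use_memory=True)


class SketchDataFrame:
    """Toy DataFrame that records the storage level it was persisted with."""

    def __init__(self, plan: Optional[str]) -> None:
        self._plan = plan
        self.storage_level: Optional[StorageLevel] = None

    def persist(self, storage_level: StorageLevel = MEMORY_AND_DISK) -> "SketchDataFrame":
        if self._plan is None:
            raise ValueError("Cannot persist on empty plan.")
        # Record the requested level; a real client would send it to the server.
        self.storage_level = storage_level
        return self

    def cache(self) -> "SketchDataFrame":
        if self._plan is None:
            raise ValueError("Cannot cache on empty plan.")
        # Delegate to persist() with an explicit level so both paths share one code path.
        return self.persist(MEMORY_AND_DISK)


df = SketchDataFrame(plan="range(10)")
cached = df.cache()
```

In this sketch `cache()` and a default `persist()` end up with the same storage level, which is the behavior the patch aims for; whether the real client matches it in every release is not something the sketch can confirm.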





[GitHub] [spark] hvanhovell commented on pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #41399:
URL: https://github.com/apache/spark/pull/41399#issuecomment-1571008486

   @dongjoon-hyun thanks for the reminder. I will be more diligent next time.




[GitHub] [spark] hvanhovell commented on pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #41399:
URL: https://github.com/apache/spark/pull/41399#issuecomment-1570501994

   Merged to master & 3.4.




[GitHub] [spark] hvanhovell closed pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()
URL: https://github.com/apache/spark/pull/41399




[GitHub] [spark] dongjoon-hyun commented on pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41399:
URL: https://github.com/apache/spark/pull/41399#issuecomment-1571247978

   Thank you!

