Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/31 20:12:30 UTC

[GitHub] [spark] mridulm commented on pull request #35005: [SPARK-8582][CORE] Checkpoint eagerly when asked to do so for real

mridulm commented on pull request #35005:
URL: https://github.com/apache/spark/pull/35005#issuecomment-1003445114


   > To confirm: If people do
   > 
   > ```
   > rdd.checkpoint()
   > rdd.count
   > ```
   > 
   > Spark will run the job twice? This looks like an existing bug in spark core. I'm fine with this PR as a workaround at the SQL side.
   
   It does not rerun the complete job, just the suffix required to perform the action.
   Typically that is only the last result stage (when the DAG involves shuffles) or, when the RDD is persisted, the suffix after the persist that is needed to materialize the checkpoint files.
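
   A minimal sketch of the persisted case, for illustration only and not taken from this PR. The app name, the checkpoint directory, and the `evaluations` accumulator are assumptions made up for the example; the accumulator is only a rough way to observe how often the `map` function is evaluated, since accumulator updates inside transformations are re-applied whenever a partition is recomputed.

   ```scala
   import org.apache.spark.{SparkConf, SparkContext}

   object CheckpointSketch {
     def main(args: Array[String]): Unit = {
       // Local master and app name are assumptions for this sketch.
       val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
       val sc = new SparkContext(conf)
       try {
         // Hypothetical directory; use a reliable filesystem path on a real cluster.
         sc.setCheckpointDir("/tmp/spark-checkpoint-sketch")

         // Counts evaluations of the map function, including any recomputation.
         val evaluations = sc.longAccumulator("evaluations")

         val rdd = sc.parallelize(1 to 100, 4).map { x =>
           evaluations.add(1)
           x * 2
         }

         // Persisting first lets the checkpoint job read cached partitions
         // instead of recomputing the lineage that precedes the persist.
         rdd.persist()
         rdd.checkpoint() // only marks the RDD; files are written after the first action

         val n = rdd.count() // runs the action, then the job that writes the checkpoint

         println(s"count = $n, map evaluations = ${evaluations.value}")
         println(s"isCheckpointed = ${rdd.isCheckpointed}")
       } finally {
         sc.stop()
       }
     }
   }
   ```

   Removing the `rdd.persist()` line and comparing the reported evaluation count is one way to see the recomputed suffix described above.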


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org