Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/26 19:11:23 UTC

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28038: [SPARK-31208][CORE][KUBERNETES] Add an experimental cleanShuffleDependencies

dongjoon-hyun commented on a change in pull request #28038: [SPARK-31208][CORE][KUBERNETES] Add an experimental cleanShuffleDependencies
URL: https://github.com/apache/spark/pull/28038#discussion_r398826293
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -647,6 +647,39 @@ class Dataset[T] private[sql](
    */
   def checkpoint(eager: Boolean): Dataset[T] = checkpoint(eager = eager, reliableCheckpoint = true)
 
+  /**
+   * :: Experimental ::
+   * Marks the Dataframe's shuffles and its non-persisted ancestors as no longer needed.
+   * This cleans up shuffle files aggressively to allow nodes to be terminated.
+   * If you are uncertain what you are doing, please do not use this feature.
+   * Additional techniques for mitigating orphaned shuffle files:
+   *   * Tuning the driver GC to be more aggressive, so the regular context cleaner is triggered
+   *   * Setting an appropriate TTL for shuffle files to be auto cleaned
+   *
+   * @since 3.1.0
+   */
+  @Experimental
+  @DeveloperApi
+  def cleanShuffleDependencies(blocking: Boolean = false): Unit = {
+    sc.cleaner.foreach { cleaner =>
 
 Review comment:
   Could you fix a compilation error here?
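    The quoted diff is truncated at `sc.cleaner.foreach { cleaner =>`, so the actual body is not shown here. As a rough, hypothetical sketch only (not Spark's implementation; `Node` and `shufflesToClean` are invented names for illustration), the traversal the Scaladoc describes — collecting the shuffles of a lineage while stopping at persisted ancestors — could look like:

```scala
// Hypothetical sketch, not Spark's code: model an RDD lineage as a tiny DAG
// and collect the ids of shuffle-produced nodes reachable through
// non-persisted ancestors, i.e. the set a cleanup like this would release.
final case class Node(id: Int, persisted: Boolean, isShuffle: Boolean, parents: List[Node])

def shufflesToClean(root: Node): Set[Int] = {
  def walk(n: Node, acc: Set[Int]): Set[Int] = {
    val withSelf = if (n.isShuffle) acc + n.id else acc
    // Stop at persisted ancestors: their shuffle files may still be needed
    // to recompute cached partitions after an executor is lost.
    n.parents.filterNot(_.persisted).foldLeft(withSelf)((a, p) => walk(p, a))
  }
  walk(root, Set.empty)
}
```

    In real Spark the `ContextCleaner` operates on `ShuffleDependency` objects rather than ids; the sketch only illustrates why the walk must not descend past persisted ancestors.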

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org