Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/05/12 00:34:34 UTC

[GitHub] [iceberg] dramaticlly commented on issue #7480: Inconsistent API for remove_orphan_files and DeleteOrphanFiles

dramaticlly commented on issue #7480:
URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1544936003

   I don't think there is any guarantee that the API stays consistent between Iceberg's SparkAction and SparkProcedure. The procedure is exposed for clients who are more familiar with the SparkSQL interface, while SparkAction provides more versatile capabilities and allows native integration in Java or Scala.
   
   If you want to run multithreaded deletes with the Spark 3.1 actions, here is how it can be done in Scala/Java:
   
   ```scala
   import org.apache.iceberg.Table
   import org.apache.iceberg.actions.DeleteOrphanFiles
   import org.apache.iceberg.spark.actions.SparkActions
   import org.apache.spark.sql.SparkSession
   
   import java.util.concurrent.Executors
   
   class RemoveOrphansAPI {
   
     def removeOrphansWithSparkAction(
         sparkSession: SparkSession,
         table: Table,
         threadsCount: Int,
         olderThanTS: Long
     ): DeleteOrphanFiles.Result = {
   
       // Pool that runs the file deletes in parallel.
       val executor = Executors.newFixedThreadPool(threadsCount)
       try {
         SparkActions
           .get(sparkSession)
           .deleteOrphanFiles(table)
           .olderThan(olderThanTS) // timestamp in epoch milliseconds
           .executeDeleteWith(executor)
           .execute()
       } finally {
         // Release the pool threads even if the action throws.
         executor.shutdown()
       }
     }
   }
   ```
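
   Since the caller owns the executor passed to `executeDeleteWith`, it may help to wrap its lifecycle in a small helper. The following is only a sketch using plain `java.util.concurrent`: `withFixedPool` and `deleteAll` are hypothetical names, not part of Iceberg, and the "delete" is a counter standing in for real file removal.

   ```scala
   import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
   import java.util.concurrent.atomic.AtomicInteger

   object ExecutorLifecycleSketch {

     // Hypothetical helper (not an Iceberg API): lends a fixed-size pool to
     // `body`, then always shuts it down and waits for submitted tasks.
     def withFixedPool[A](threads: Int, waitSeconds: Long = 60L)(body: ExecutorService => A): A = {
       val pool = Executors.newFixedThreadPool(threads)
       try body(pool)
       finally {
         pool.shutdown()
         // If tasks do not finish within the timeout, cancel what is left.
         if (!pool.awaitTermination(waitSeconds, TimeUnit.SECONDS)) {
           pool.shutdownNow()
         }
       }
     }

     // Stand-in for the delete work: counts paths instead of touching storage.
     def deleteAll(paths: Seq[String], threads: Int): Int = {
       val deleted = new AtomicInteger(0)
       withFixedPool(threads) { pool =>
         paths.foreach { _ =>
           pool.submit(new Runnable {
             override def run(): Unit = deleted.incrementAndGet()
           })
         }
       }
       deleted.get()
     }
   }
   ```

   With a helper like this, the action call above could presumably be written as `withFixedPool(threadsCount) { pool => ... .executeDeleteWith(pool).execute() }`, so the pool is released on every code path.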


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

