Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/05/12 00:34:34 UTC
[GitHub] [iceberg] dramaticlly commented on issue #7480: Inconsistent API for remove_orphan_files and DeleteOrphanFiles
dramaticlly commented on issue #7480:
URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1544936003
I don't think there's any guarantee that the API stays consistent between Iceberg's SparkAction and SparkProcedure. Procedures are exposed for clients who are more familiar with the Spark SQL interface, while SparkActions provide more versatile capabilities for native integration in Java or Scala.
If you want to run multithreaded deletes with the Spark 3.1 actions, here is how it can be done in Scala (the Java usage is analogous):
```scala
import java.util.concurrent.{ExecutorService, Executors}

import org.apache.iceberg.Table
import org.apache.iceberg.actions.DeleteOrphanFiles
import org.apache.iceberg.spark.actions.SparkActions
import org.apache.spark.sql.SparkSession

class RemoveOrphansAPI {
  def removeOrphansWithSparkAction(
      sparkSession: SparkSession,
      table: Table,
      threadsCount: Int,
      olderThanTS: Long // epoch millis; only files older than this are deleted
  ): DeleteOrphanFiles.Result = {
    // Thread pool used to delete the orphan files in parallel
    val executor: ExecutorService = Executors.newFixedThreadPool(threadsCount)
    try {
      SparkActions
        .get(sparkSession)
        .deleteOrphanFiles(table)
        .olderThan(olderThanTS)
        .executeDeleteWith(executor)
        .execute()
    } finally {
      // Release the pool even if the action throws
      executor.shutdown()
    }
  }
}
```
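Since the action does not own the ExecutorService you pass to `executeDeleteWith`, the caller is responsible for shutting it down. A minimal sketch of a loan-pattern helper for that, plus a helper to compute the `olderThan` cutoff in epoch milliseconds. `OrphanCleanupHelpers`, `withFixedThreadPool` and `cutoffMillis` are hypothetical names for illustration, not part of Iceberg, and the one-minute wait is an arbitrary assumption:

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

// Hypothetical helpers, not part of the Iceberg API.
object OrphanCleanupHelpers {
  // Lends a fixed-size pool to `body` and always shuts it down afterwards,
  // even if `body` throws.
  def withFixedThreadPool[T](threads: Int)(body: ExecutorService => T): T = {
    val pool = Executors.newFixedThreadPool(threads)
    try body(pool)
    finally {
      pool.shutdown()
      // Bounded wait for any tasks that are still running (1 minute is an assumption)
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }

  // Cutoff in epoch milliseconds for files older than `days` days.
  def cutoffMillis(days: Long, nowMillis: Long = System.currentTimeMillis()): Long =
    nowMillis - TimeUnit.DAYS.toMillis(days)
}
```

With it, the body of `removeOrphansWithSparkAction` could become `withFixedThreadPool(threadsCount) { pool => SparkActions.get(sparkSession).deleteOrphanFiles(table).olderThan(olderThanTS).executeDeleteWith(pool).execute() }`, which keeps pool cleanup out of the action-building code.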
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org