Posted to dev@spark.apache.org by Cheng Su <ch...@fb.com.INVALID> on 2022/01/25 06:29:57 UTC

[DISCUSS] Deprecate legacy file naming functions in FileCommitProtocol

Hello all,

FileCommitProtocol<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala> is the class that commits Spark job output (staging file & directory renaming, etc). During Spark 3.2 development, we added new functions to this class to allow more flexible output file naming (the PR detail is here<https://github.com/apache/spark/pull/33012>). We didn’t delete the existing file naming functions (newTaskTempFile(ext) & newTaskTempFileAbsPath(ext)), because we were aware that many downstream projects and codebases had already written their own custom implementations of FileCommitProtocol. Deleting the existing functions would be a breaking change for them when upgrading Spark, and we would like to avoid that unpleasant surprise if possible. But we also need to clean up legacy code as we evolve our codebase. The newly added functions should supersede the legacy ones, and the cost to migrate should be fairly minimal.

So as the next step, I would like to propose:

  *   Spark 3.3 (now): Add the @deprecated annotation to the legacy functions in FileCommitProtocol - newTaskTempFile(ext)<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L98> & newTaskTempFileAbsPath(ext)<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L135>, so that developers depending on the legacy functions notice the compiler warning and move to the new functions.
  *   Next Spark major release (or whenever people feel comfortable): delete the legacy functions mentioned above from our codebase.
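
To make the two steps above concrete, here is a rough standalone sketch of the deprecate-then-delete pattern. It is only an illustration under stated assumptions: NameSpec, SketchCommitProtocol and SketchProtocolImpl are hypothetical stand-ins and not Spark's actual classes, the Hadoop task context argument is left out, and the "3.3.0" version string and the part-00000 file name are placeholders. The idea is that the legacy function stays but carries Scala's @deprecated annotation, and a custom implementation can forward it to the newer function.

```scala
// Hypothetical stand-in for a file-name spec (prefix + suffix); not Spark's class.
final case class NameSpec(prefix: String, suffix: String)

// Hypothetical, simplified protocol. The real FileCommitProtocol functions also
// take a Hadoop task context, omitted here so the sketch compiles on its own.
abstract class SketchCommitProtocol {
  // Legacy function: kept for compatibility, but callers now get a compiler warning.
  @deprecated("use newTaskTempFile(dir, spec)", "3.3.0")
  def newTaskTempFile(dir: Option[String], ext: String): String

  // Newer function that is meant to supersede the legacy one.
  def newTaskTempFile(dir: Option[String], spec: NameSpec): String
}

// What a downstream custom implementation might look like after migrating.
class SketchProtocolImpl(stagingDir: String) extends SketchCommitProtocol {
  // Legacy entry point forwards to the new function with an empty prefix.
  @deprecated("use newTaskTempFile(dir, spec)", "3.3.0")
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    newTaskTempFile(dir, NameSpec("", ext))

  override def newTaskTempFile(dir: Option[String], spec: NameSpec): String = {
    // "part-00000" is a placeholder; a real implementation derives it from the task.
    val name = s"${spec.prefix}part-00000${spec.suffix}"
    dir.map(d => s"$stagingDir/$d/$name").getOrElse(s"$stagingDir/$name")
  }
}

object DeprecationSketch {
  def main(args: Array[String]): Unit = {
    val protocol = new SketchProtocolImpl("/tmp/staging")
    println(protocol.newTaskTempFile(Some("dt=2022-01-25"), NameSpec("bucket_", ".parquet")))
  }
}
```

In this sketch, once no caller uses the deprecated overload any more, deleting it (the second step) is a one-method removal in both the abstract class and the implementation.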

The PR to add the @deprecated annotation is ready for review: https://github.com/apache/spark/pull/35311 . Feel free to comment here or on the PR for further discussion.

Thanks,
Cheng Su (@c21)