Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2019/12/12 00:39:00 UTC

[jira] [Created] (SPARK-30227) Add close() on DataWriter interface

Jungtaek Lim created SPARK-30227:
------------------------------------

             Summary: Add close() on DataWriter interface
                 Key: SPARK-30227
                 URL: https://issues.apache.org/jira/browse/SPARK-30227
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim


If the scaladoc of DataWriter is correct, the lifecycle of a DataWriter instance ends at either commit() or abort(). That leads data source implementors to feel they can place resource cleanup on either side, but abort() can also be called when commit() fails, so they have to ensure they don't do double-cleanup if the cleanup is not idempotent.
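To illustrate the hazard, here is a minimal, self-contained sketch; LeakyWriter and its cleanup logic are hypothetical, not Spark code:

```java
public class Main {
    public static void main(String[] args) {
        LeakyWriter w = new LeakyWriter();
        try {
            w.commit();
        } catch (RuntimeException e) {
            w.abort();            // framework-style reaction to a failed commit
        }
        System.out.println("cleanupCalls=" + w.cleanupCalls);   // prints cleanupCalls=2
    }
}

// Hypothetical writer (not Spark code) that puts resource cleanup in both
// commit() and abort(), as the current scaladoc invites implementors to do.
class LeakyWriter {
    int cleanupCalls = 0;

    void cleanup() { cleanupCalls++; }    // imagine this is not idempotent

    void commit() {
        cleanup();                        // cleanup done, then the commit itself fails...
        throw new RuntimeException("commit failed after cleanup");
    }

    void abort() {
        cleanup();                        // ...and abort() cleans up a second time
    }
}
```

With cleanup spread across both methods, the writer ends up cleaned up twice, which is exactly the situation implementors currently have to guard against by hand.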

So I'm proposing to add close() to DataWriter explicitly, as "the place" for resource cleanup. The lifecycle of a DataWriter instance will (and should) end at close().

I've checked some callers to see whether they can apply "try-catch-finally" to ensure close() is called at the end of the DataWriter lifecycle, and it looks feasible.
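The caller-side pattern would look roughly like this. This is a sketch under assumptions: the interface shape is simplified (the real DSv2 commit() returns a typed commit message), and BufferingWriter is a hypothetical implementation:

```java
public class Main {
    public static void main(String[] args) throws java.io.IOException {
        BufferingWriter impl = new BufferingWriter();
        DataWriter<String> w = impl;
        try {
            w.write("record-1");
            w.commit();                       // normal path
        } catch (java.io.IOException e) {
            w.abort();                        // failure path: abort, but don't clean up here
            throw e;
        } finally {
            w.close();                        // close() always ends the lifecycle, exactly once
        }
        System.out.println("closeCalls=" + impl.closeCalls);   // prints closeCalls=1
    }
}

// Hypothetical, simplified shape of DataWriter with the proposed close() added.
interface DataWriter<T> extends java.io.Closeable {
    void write(T record) throws java.io.IOException;
    Object commit() throws java.io.IOException;
    void abort() throws java.io.IOException;
    @Override void close() throws java.io.IOException;   // "the place" for cleanup
}

// Toy implementation: all cleanup lives in close(), so commit()/abort() need
// not coordinate to avoid double-cleanup.
class BufferingWriter implements DataWriter<String> {
    private final StringBuilder buffer = new StringBuilder();
    int closeCalls = 0;

    @Override public void write(String record) { buffer.append(record).append('\n'); }
    @Override public Object commit() { return buffer.toString(); }
    @Override public void abort() { buffer.setLength(0); }
    @Override public void close() { closeCalls++; }      // cleanup happens here, once
}
```

Whatever path is taken, close() runs exactly once in the finally block, so implementors no longer need idempotent cleanup in commit()/abort().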

The change would be backward incompatible, but given that the interface is marked as Evolving and we're already making backward incompatible changes in Spark 3.0, I feel it may not matter.

I've raised a discussion around this issue on the dev list and the feedback is positive: https://lists.apache.org/thread.html/bfdb989fa83bc4d774804473610bd0cfcaa1dd5a020ca9a522f3510c%40%3Cdev.spark.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
