Posted to issues@spark.apache.org by "Gengliang Wang (JIRA)" <ji...@apache.org> on 2018/01/24 17:20:00 UTC

[jira] [Created] (SPARK-23202) Break down DataSourceV2Writer.commit into two phases

Gengliang Wang created SPARK-23202:
--------------------------------------

             Summary: Break down DataSourceV2Writer.commit into two phases
                 Key: SPARK-23202
                 URL: https://issues.apache.org/jira/browse/SPARK-23202
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Gengliang Wang


Currently, the API DataSourceV2Writer#commit(WriterCommitMessage[]) commits a writing job with a list of commit messages.

It makes sense in some scenarios, e.g. MicroBatchExecution.

However, on receiving a commit message, the driver could start processing it (e.g. persisting messages into files) before all the messages are collected.

The proposal is to break down DataSourceV2Writer.commit into two phases:
 # add(WriterCommitMessage message): Handles a commit message produced by {@link DataWriter#commit()}.
 # commit(): Commits the writing job.
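
The two phases above might be sketched as follows. This is only an illustration of the proposed shape, not the actual Spark API: TwoPhaseWriter, FileCommitMessage and ManifestWriter are hypothetical names, and WriterCommitMessage is redeclared here as a stand-in so the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Spark's WriterCommitMessage marker interface (redeclared for the sketch).
interface WriterCommitMessage {}

// Hypothetical interface following the add/commit split proposed above.
interface TwoPhaseWriter {
    // Phase 1: handle one commit message as soon as it reaches the driver.
    void add(WriterCommitMessage message);

    // Phase 2: commit the writing job once every message has been added.
    void commit();
}

// Hypothetical message carrying the path of a file written by one task.
final class FileCommitMessage implements WriterCommitMessage {
    final String path;

    FileCommitMessage(String path) {
        this.path = path;
    }
}

// Hypothetical writer that records each file eagerly in add() and
// only marks the job as finished in commit().
final class ManifestWriter implements TwoPhaseWriter {
    private final List<String> manifest = new ArrayList<>();
    private boolean committed = false;

    @Override
    public void add(WriterCommitMessage message) {
        if (committed) {
            throw new IllegalStateException("job already committed");
        }
        // Work that previously had to wait for the full message array
        // can happen here, one message at a time.
        manifest.add(((FileCommitMessage) message).path);
    }

    @Override
    public void commit() {
        committed = true;
    }

    List<String> manifest() {
        return manifest;
    }

    boolean isCommitted() {
        return committed;
    }
}
```

With this shape, the driver would call add() for each message as it arrives and call commit() once at the end, instead of buffering all messages into an array first.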

This should make the API more flexible, and more reasonable for implementing some datasources.


