Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/11/02 23:12:58 UTC

[jira] [Updated] (SPARK-18024) Introduce an internal commit protocol API along with OutputCommitter implementation

     [ https://issues.apache.org/jira/browse/SPARK-18024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-18024:
--------------------------------
    Summary: Introduce an internal commit protocol API along with OutputCommitter implementation  (was: Introduce a commit protocol API along with OutputCommitter implementation)

> Introduce an internal commit protocol API along with OutputCommitter implementation
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-18024
>                 URL: https://issues.apache.org/jira/browse/SPARK-18024
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>             Fix For: 2.1.0
>
>
> This commit protocol API should wrap around Hadoop's output committer. Later we can expand the API to cover streaming commits.
> The existing Hadoop output committer API is insufficient for streaming use cases:
> 1. It gives tasks no way to pass information back to the driver.
> 2. It relies on Hadoop's string-keyed Configuration hashmap to pass information from the driver to the executors, largely because Hadoop MapReduce has no support for language integration and serialization. Spark can pass information more naturally through automatic closure serialization.
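The task-to-driver channel described in point 1 and the serializable-protocol idea in point 2 can be sketched roughly as below. This is a hypothetical illustration, not Spark's actual API: the names (CommitProtocol, TaskCommitMessage, LoggingCommitProtocol) and signatures are invented for this sketch.

```scala
// Hypothetical sketch of a commit protocol API; names and signatures are
// illustrative only, not the actual Spark implementation.

// A serializable message a task sends back to the driver on commit (point 1).
case class TaskCommitMessage(payload: Map[String, String]) extends Serializable

// The protocol itself is Serializable, so the driver can ship a configured
// instance to executors through closure serialization (point 2), instead of
// smuggling settings through a Hadoop Configuration hashmap.
trait CommitProtocol extends Serializable {
  def setupJob(): Unit
  def setupTask(taskId: Int): Unit
  // Each task returns a message that the driver aggregates in commitJob.
  def commitTask(taskId: Int): TaskCommitMessage
  def commitJob(taskCommits: Seq[TaskCommitMessage]): Unit
}

// Trivial in-memory implementation: each task reports the file it wrote,
// and the driver collects the full list at job commit.
class LoggingCommitProtocol extends CommitProtocol {
  @transient var committedFiles: Seq[String] = Nil
  def setupJob(): Unit = ()
  def setupTask(taskId: Int): Unit = ()
  def commitTask(taskId: Int): TaskCommitMessage =
    TaskCommitMessage(Map("file" -> s"part-$taskId"))
  def commitJob(taskCommits: Seq[TaskCommitMessage]): Unit =
    committedFiles = taskCommits.flatMap(_.payload.get("file"))
}
```

A streaming variant could later extend the same trait, since the driver already receives per-task commit messages it can checkpoint between batches.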



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
