Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2014/08/01 22:47:38 UTC

[jira] [Resolved] (SPARK-1645) Improve Spark Streaming compatibility with Flume

     [ https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-1645.
----------------------------------

    Resolution: Fixed

> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
>                 Key: SPARK-1645
>                 URL: https://issues.apache.org/jira/browse/SPARK-1645
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Hari Shreedharan
>            Assignee: Tathagata Das
>             Fix For: 1.1.0
>
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node, else Flume cannot send data to it. We can fix this by adding a Flume receiver that polls Flume, and a Flume sink that supports this.
> * Receiver sends acks to Flume before the driver knows about the data. The new receiver should also handle this case.
> * Data loss when the driver goes down - This is true for any streaming ingest, not just Flume. I will file a separate JIRA for this and we should work on it there. This is a longer-term project and requires considerable development work.
> I intend to start working on these soon. Any input is appreciated. (It'd be great if someone could add me as a contributor on JIRA, so I can assign the issue to myself.)
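
The pull-based design proposed in the first two points of the description can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the actual Spark or Flume API: the names BufferingSink, poll, ack, nack and receive_once are hypothetical. The sink buffers events until a receiver polls for them, and a batch is dropped from the sink only after the receiver reports that it was stored.

```python
from collections import deque


class BufferingSink:
    """Hypothetical stand-in for a sink that buffers events until polled."""

    def __init__(self, events):
        self._pending = deque(events)   # events not yet handed out
        self._in_flight = {}            # batch id -> events awaiting ack/nack
        self._next_id = 0

    def poll(self, max_events):
        """Hand out up to max_events; they stay in flight until acked or nacked."""
        count = min(max_events, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        batch_id = self._next_id
        self._next_id += 1
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The receiver stored the batch; the sink can forget it."""
        del self._in_flight[batch_id]

    def nack(self, batch_id):
        """The receiver failed; return the batch to the front, in order."""
        self._pending.extendleft(reversed(self._in_flight.pop(batch_id)))

    def remaining(self):
        """Number of events the sink still holds (pending or in flight)."""
        return len(self._pending) + sum(len(b) for b in self._in_flight.values())


def receive_once(sink, store, max_events=2):
    """Poll one batch and ack it only after store() has succeeded."""
    batch_id, batch = sink.poll(max_events)
    try:
        store(batch)
    except Exception:
        sink.nack(batch_id)   # nothing is lost; the batch is offered again
        return False
    sink.ack(batch_id)
    return True
```

Because the receiver initiates every exchange, it does not matter which node it runs on after a restart, and a failed store leaves the events in the sink for the next poll.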



--
This message was sent by Atlassian JIRA
(v6.2#6252)