Posted to issues@spark.apache.org by "sunshangchun (JIRA)" <ji...@apache.org> on 2014/06/20 08:26:24 UTC
[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sunshangchun updated SPARK-2201:
--------------------------------
Summary: Improve FlumeInputDStream's stability (was: Improve FlumeInputDStream)
> Improve FlumeInputDStream's stability
> -------------------------------------
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
> Issue Type: Improvement
> Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream. I would like to do some work to improve this; my ideas are as follows:
> An IP address and port denote a physical host, and a logical host consists of one or more physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when they start, and a Flume agent looks up the physical hosts and pushes events to them.
> Two classes are introduced: LogicalHostRouter supplies a mapping between logical hosts and physical hosts, and LogicalHostRouterListener makes changes to that mapping watchable.
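To make the router idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the method names (register, unregister, physical_hosts, add_listener) and the listener signature are assumptions, not part of the proposal, and a plain dictionary stands in for whatever backing store is eventually used.

```python
from typing import Callable, Dict, List, Tuple

PhysicalHost = Tuple[str, int]  # (ip, port)
# Hypothetical listener signature: called with the logical host name and
# the current list of physical hosts whenever the mapping changes.
Listener = Callable[[str, List[PhysicalHost]], None]


class LogicalHostRouter:
    """In-memory stand-in for the proposed router (hypothetical API)."""

    def __init__(self) -> None:
        self._hosts: Dict[str, List[PhysicalHost]] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        # Plays the role of LogicalHostRouterListener: makes changes watchable.
        self._listeners.append(listener)

    def register(self, logical: str, physical: PhysicalHost) -> None:
        hosts = self._hosts.setdefault(logical, [])
        if physical not in hosts:
            hosts.append(physical)
            self._notify(logical)

    def unregister(self, logical: str, physical: PhysicalHost) -> None:
        hosts = self._hosts.get(logical, [])
        if physical in hosts:
            hosts.remove(physical)
            self._notify(logical)

    def physical_hosts(self, logical: str) -> List[PhysicalHost]:
        # Return a copy so callers cannot mutate the router's state.
        return list(self._hosts.get(logical, []))

    def _notify(self, logical: str) -> None:
        snapshot = self.physical_hosts(logical)
        for listener in self._listeners:
            listener(logical, list(snapshot))
```

For example, registering two receivers under the logical host "spark-flume" and then removing one would fire the listener three times, each time with the current list of physical hosts.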
> The following work needs to be done:
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with ZooKeeper. When a physical host starts, it creates a temporary node in ZooKeeper, and listeners simply watch those temporary nodes.
> 2. When Spark FlumeReceivers start, each one acquires a physical host (the local host's IP and an idle port) and registers itself in ZooKeeper.
> 3. A new Flume sink is needed. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.
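The round-robin dispatch in step 3 could look roughly like the Python sketch below. It is not Flume code: RoundRobinSink, update_hosts, and the injected send callable are hypothetical names used for illustration, and append_events only mirrors the appendEvents method mentioned above. The host list is assumed to be refreshed by whatever watches the registered physical hosts.

```python
from typing import Callable, List, Sequence, Tuple

PhysicalHost = Tuple[str, int]  # (ip, port)


class RoundRobinSink:
    """Sketch of a sink that spreads event batches over physical hosts."""

    def __init__(self, send: Callable[[PhysicalHost, Sequence[bytes]], None]) -> None:
        # `send` is a hypothetical transport hook; a real sink would push
        # the batch to the receiver listening on the given host.
        self._send = send
        self._hosts: List[PhysicalHost] = []
        self._next = 0

    def update_hosts(self, hosts: Sequence[PhysicalHost]) -> None:
        # Called when the set of registered physical hosts changes.
        self._hosts = list(hosts)
        self._next = 0

    def append_events(self, events: Sequence[bytes]) -> PhysicalHost:
        if not self._hosts:
            raise RuntimeError("no physical hosts registered")
        host = self._hosts[self._next]
        # Advance the cursor so the next batch goes to the next host.
        self._next = (self._next + 1) % len(self._hosts)
        self._send(host, events)
        return host
```

Resetting the cursor in update_hosts keeps the index valid when hosts disappear; a real implementation would also need to decide what to do with a batch whose target host fails mid-send, which this sketch does not cover.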
> Is this a feasible plan? Thanks.