Posted to issues@spark.apache.org by "chao.wu (JIRA)" <ji...@apache.org> on 2014/06/20 08:35:25 UTC

[jira] [Commented] (SPARK-2201) Improve FlumeInputDStream's stability

    [ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038525#comment-14038525 ] 

chao.wu commented on SPARK-2201:
--------------------------------

Good idea.

> Improve FlumeInputDStream's stability
> -------------------------------------
>
>                 Key: SPARK-2201
>                 URL: https://issues.apache.org/jira/browse/SPARK-2201
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream. I am willing to do some work to improve it; my ideas are as follows:
> An IP and a port denote a physical host, and a logical host consists of one or more physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when they start, and a Flume agent looks up the physical hosts and pushes events to them.
> Two classes are introduced: LogicalHostRouter supplies the mapping between logical hosts and physical hosts, and LogicalHostRouterListener makes changes to that mapping watchable.
> Some work needs to be done here:
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with ZooKeeper: when a physical host starts, it creates a temporary (ephemeral) node in ZooKeeper, and listeners just watch those nodes.
> 2. When Spark FlumeReceivers start, each one acquires a physical host (the local machine's IP and an idle port) and registers itself in ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.
> Is this a feasible plan? Thanks.

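A minimal Java sketch of the logical-to-physical host mapping described in the proposal. The proposal names only LogicalHostRouter and LogicalHostRouterListener; every method signature here, plus PhysicalHost and InMemoryLogicalHostRouter, is hypothetical. A plain in-memory map stands in for ZooKeeper: register plays the role of a receiver creating its ephemeral node, unregister plays the role of ZooKeeper removing that node when the receiver's session ends, and each change is pushed to listeners roughly the way a child watch would fire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CopyOnWriteArraySet;

// A physical host: an IP plus a port.
final class PhysicalHost {
    final String ip;
    final int port;

    PhysicalHost(String ip, int port) {
        this.ip = ip;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PhysicalHost)) return false;
        PhysicalHost other = (PhysicalHost) o;
        return ip.equals(other.ip) && port == other.port;
    }

    @Override
    public int hashCode() { return Objects.hash(ip, port); }

    @Override
    public String toString() { return ip + ":" + port; }
}

// Hypothetical signature: receives the full, current host list after any change.
interface LogicalHostRouterListener {
    void onHostsChanged(String logicalHost, List<PhysicalHost> physicalHosts);
}

// Hypothetical signatures for the router named in the proposal.
interface LogicalHostRouter {
    List<PhysicalHost> getPhysicalHosts(String logicalHost);
    void addListener(LogicalHostRouterListener listener);
}

// In-memory stand-in for a ZooKeeper-backed router (an assumption, not ZooKeeper code).
final class InMemoryLogicalHostRouter implements LogicalHostRouter {
    private final Map<String, Set<PhysicalHost>> hosts = new ConcurrentHashMap<>();
    private final List<LogicalHostRouterListener> listeners = new CopyOnWriteArrayList<>();

    @Override
    public void addListener(LogicalHostRouterListener listener) { listeners.add(listener); }

    // Plays the role of a receiver creating its ephemeral node on startup.
    void register(String logicalHost, PhysicalHost host) {
        if (hosts.computeIfAbsent(logicalHost, k -> new CopyOnWriteArraySet<>()).add(host)) {
            notifyListeners(logicalHost);
        }
    }

    // Plays the role of ZooKeeper deleting the ephemeral node when the session ends.
    void unregister(String logicalHost, PhysicalHost host) {
        Set<PhysicalHost> set = hosts.get(logicalHost);
        if (set != null && set.remove(host)) notifyListeners(logicalHost);
    }

    @Override
    public List<PhysicalHost> getPhysicalHosts(String logicalHost) {
        Set<PhysicalHost> set = hosts.get(logicalHost);
        return set == null ? new ArrayList<>() : new ArrayList<>(set);
    }

    // Roughly what a child watch provides. Classic ZooKeeper watches are one-shot
    // and must be re-armed; this stand-in simply notifies on every change.
    private void notifyListeners(String logicalHost) {
        List<PhysicalHost> snapshot = getPhysicalHosts(logicalHost);
        for (LogicalHostRouterListener l : listeners) l.onHostsChanged(logicalHost, snapshot);
    }
}
```

Passing a full snapshot rather than a diff keeps the consumer simple: a sink can replace its host list wholesale on every notification.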

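For step 2, one way a receiver could pick "the local machine's IP and an idle port" using only the JDK is to bind to port 0, which asks the OS for an unused port. ReceiverEndpoint and its methods are hypothetical names, and the "ip:port" string is an assumed format for the ephemeral node's data. The socket is kept open rather than probed and closed, since closing it would let another process take the port before the receiver binds it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

// Hypothetical helper: a receiver's "physical host" (local IP + idle port).
final class ReceiverEndpoint implements AutoCloseable {
    private final ServerSocket socket;
    private final InetSocketAddress physicalHost;

    private ReceiverEndpoint(ServerSocket socket, InetSocketAddress physicalHost) {
        this.socket = socket;
        this.physicalHost = physicalHost;
    }

    // Port 0 lets the OS choose an unused port; the socket stays open so the
    // port cannot be taken between choosing it and registering it.
    static ReceiverEndpoint open() throws IOException {
        ServerSocket socket = new ServerSocket(0);
        InetSocketAddress host = new InetSocketAddress(localAddress(), socket.getLocalPort());
        return new ReceiverEndpoint(socket, host);
    }

    // The machine's own address; falls back to loopback if the local
    // hostname does not resolve (as happens in some containers).
    static InetAddress localAddress() {
        try {
            return InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            return InetAddress.getLoopbackAddress();
        }
    }

    InetSocketAddress physicalHost() { return physicalHost; }

    // Assumed "ip:port" payload for the receiver's ephemeral ZooKeeper node.
    String registrationData() {
        return physicalHost.getAddress().getHostAddress() + ":" + physicalHost.getPort();
    }

    @Override
    public void close() throws IOException { socket.close(); }
}
```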

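Step 3's round-robin delivery reduces to a selector that the new sink consults once per batch, in the method the proposal calls appendEvents. RoundRobinSelector and RoundRobinDemo are hypothetical names; the host list is assumed to be refreshed from the router's listener, and actually sending the events (for example over Avro RPC) is left out of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin chooser a custom Flume sink could use to spread
// batches over the currently registered receivers.
final class RoundRobinSelector<T> {
    private volatile List<T> targets = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Called when the router's listener reports a changed host list.
    void update(List<T> newTargets) { targets = new ArrayList<>(newTargets); }

    // Returns the next target, or null when no receiver is registered.
    T next() {
        List<T> snapshot = targets; // read once so size() and get() agree
        if (snapshot.isEmpty()) return null;
        // floorMod keeps the index non-negative after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return snapshot.get(i);
    }
}

final class RoundRobinDemo {
    // Assigns each batch to a host in turn, standing in for "push data to them".
    static List<String> assign(RoundRobinSelector<String> selector, int batches) {
        List<String> out = new ArrayList<>();
        for (int b = 0; b < batches; b++) out.add(selector.next());
        return out;
    }
}
```

With three registered hosts, four batches go to the first, second, third, and then the first host again.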
--
This message was sent by Atlassian JIRA
(v6.2#6252)