Posted to dev@falcon.apache.org by "John Yu (JIRA)" <ji...@apache.org> on 2014/07/17 21:45:06 UTC

[jira] [Commented] (FALCON-511) Support for multiple sources to multiple targets, without partitions

    [ https://issues.apache.org/jira/browse/FALCON-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065442#comment-14065442 ] 

John Yu commented on FALCON-511:
--------------------------------

I am thinking of the following logic:

if ( 2+ source clusters specified and no partition ) {
  if ( source cluster preference specified ) {   // config example a
    look for the data on each source cluster, in order of preference
  } else {
    try to copy from a source in the same colo as the target (based on cluster def)   // config example b
    if ( no source in the same colo has the data )
      search through all source clusters for the data and copy from any   // config example c
  }
}
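
To make the ordering concrete, here is a rough Python sketch of the no-partition branch only. Everything in it is hypothetical and for illustration: the function name candidate_sources, the colo_of mapping (standing in for the colo taken from each cluster definition), and the preference list are not existing Falcon code or API.

```python
from typing import Dict, List, Optional


def candidate_sources(
    target: str,
    sources: List[str],
    colo_of: Dict[str, str],
    preference: Optional[List[str]] = None,
) -> List[str]:
    """Return the source clusters in the order they would be tried for `target`.

    Hypothetical helper: assumes the feed has no partition expression and
    that `colo_of` maps cluster name -> colo (from the cluster definitions).
    """
    if len(sources) < 2:
        # Single source: nothing to choose between.
        return list(sources)

    if preference:
        # Config example a: honor the explicit order, ignoring unknown names.
        return [s for s in preference if s in sources]

    # Config example b: sources in the same colo as the target come first.
    target_colo = colo_of.get(target)
    same_colo = [
        s for s in sources
        if target_colo is not None and colo_of.get(s) == target_colo
    ]
    # Config example c: fall back to every remaining source, in declared order.
    others = [s for s in sources if s not in same_colo]
    return same_colo + others
```

The caller would then walk the returned list and copy from the first cluster that actually has the data; that lookup is left out here.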


--- config example a ---
   <clusters>
        <cluster name="colo1-etl" type="source"> .. </cluster>
        <cluster name="colo2-etl" type="source"> .. </cluster>
        <cluster name="colo1-adhoc" type="target" source="colo1-etl,colo2-etl"> .. </cluster>
        <cluster name="colo2-adhoc" type="target" source="colo2-etl,colo1-etl"> .. </cluster>
    </clusters>
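
As a sketch of how the preference in example a could be read, the snippet below parses a feed fragment with Python's standard xml.etree.ElementTree. Note that the source attribute on a target cluster is only the proposal from this comment, not part of the current feed schema, and the function name source_preferences is made up for illustration. The ".." bodies are replaced with empty elements so the fragment parses.

```python
import xml.etree.ElementTree as ET

# Config example a, with the elided cluster bodies left empty.
FEED_CLUSTERS = """
<clusters>
    <cluster name="colo1-etl" type="source"/>
    <cluster name="colo2-etl" type="source"/>
    <cluster name="colo1-adhoc" type="target" source="colo1-etl,colo2-etl"/>
    <cluster name="colo2-adhoc" type="target" source="colo2-etl,colo1-etl"/>
</clusters>
"""


def source_preferences(xml_text):
    """Return (source names, {target name: preference list or None}).

    Hypothetical helper: reads the proposed comma-separated `source`
    attribute; a target without it maps to None (examples b and c).
    """
    root = ET.fromstring(xml_text)
    clusters = root.findall("cluster")
    sources = [c.get("name") for c in clusters if c.get("type") == "source"]
    prefs = {}
    for c in clusters:
        if c.get("type") != "target":
            continue
        attr = c.get("source")
        if attr:
            prefs[c.get("name")] = [s.strip() for s in attr.split(",") if s.strip()]
        else:
            prefs[c.get("name")] = None
    return sources, prefs
```

With example a, each adhoc cluster ends up with its colo-local ETL cluster first and the remote one as a fallback.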

--- config example b ---
   <clusters>
        <cluster name="colo1-etl" type="source"> .. </cluster>
        <cluster name="colo2-etl" type="source"> .. </cluster>
        <cluster name="colo1-adhoc" type="target"> .. </cluster>
        <cluster name="colo2-adhoc" type="target"> .. </cluster>
    </clusters>

--- config example c ---
   <clusters>
        <cluster name="colo1" type="source"> .. </cluster>
        <cluster name="colo2" type="source"> .. </cluster>
        <cluster name="colo3" type="target"> .. </cluster>
        <cluster name="colo4" type="target"> .. </cluster>
    </clusters>


> Support for multiple sources to multiple targets, without partitions
> --------------------------------------------------------------------
>
>                 Key: FALCON-511
>                 URL: https://issues.apache.org/jira/browse/FALCON-511
>             Project: Falcon
>          Issue Type: New Feature
>            Reporter: John Yu
>
> We currently have the following use case:
> Colo1 has 1 ETL cluster (Colo1-ETL) and 1 adhoc cluster (Colo1-A)
> Colo2 has 1 ETL cluster (Colo2-ETL) and 1 adhoc cluster (Colo2-A)
> Due to the bandwidth constraint between the two colos, we are thinking of having the 2 ETL clusters perform the same computation to generate the same dataset, and having the 2 adhoc clusters pull from their respective colo-local ETL cluster.
> This can be done currently by specifying 2 different feeds.  However, a critical dataset might be computed on different colos simultaneously for both DR and load balancing purposes.  In this scenario, we would like to ease data discovery for end users by having only 1 feed definition, so that end users know these pieces of data are logically the same data, and they are free to pick one to use.



--
This message was sent by Atlassian JIRA
(v6.2#6252)