
Posted to dev@falcon.apache.org by "John Yu (JIRA)" <ji...@apache.org> on 2014/07/17 21:19:04 UTC

[jira] [Created] (FALCON-511) Support for multiple sources to multiple targets, without partitions

John Yu created FALCON-511:
------------------------------

             Summary: Support for multiple sources to multiple targets, without partitions
                 Key: FALCON-511
                 URL: https://issues.apache.org/jira/browse/FALCON-511
             Project: Falcon
          Issue Type: New Feature
            Reporter: John Yu


We currently have the following use case:
Colo1 has 1 ETL cluster (Colo1-ETL) and 1 adhoc cluster (Colo1-A)
Colo2 has 1 ETL cluster (Colo2-ETL) and 1 adhoc cluster (Colo2-A)

Due to the bandwidth constraint between the two colos, we are thinking of having the two ETL clusters perform the same computation to generate the same dataset, and having each adhoc cluster pull from the ETL cluster in its own colo.

This can be done today by defining two separate feeds. However, a critical dataset might be computed in different colos simultaneously for both disaster recovery (DR) and load-balancing purposes. In this scenario, we would like to ease data discovery for end users by having only one feed definition. That way, end users know these copies are logically the same data and are free to pick either one.
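The desired behavior for a single logical feed spanning both colos can be sketched as below. This is NOT Falcon's feed schema or API. `LogicalFeed`, `FeedCluster`, `replication_pairs`, the colo labels `colo1`/`colo2`, and the feed name `critical-dataset` are all hypothetical, used only for illustration. The sketch assumes that each colo has at least one source (ETL) cluster, and that each target (adhoc) cluster should replicate only from a source in its own colo, so that no copy crosses the constrained inter-colo link.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeedCluster:
    # Hypothetical model of one cluster entry in a feed definition.
    name: str  # cluster name, e.g. "Colo1-ETL"
    colo: str  # colo the cluster lives in (hypothetical label)
    role: str  # "source" (computes the data) or "target" (receives a copy)


@dataclass(frozen=True)
class LogicalFeed:
    # One feed definition listing every cluster that holds the same logical data.
    name: str
    clusters: Tuple[FeedCluster, ...]


def replication_pairs(feed: LogicalFeed) -> List[Tuple[str, str]]:
    """Pair each target cluster with a source cluster in the same colo."""
    # Group source clusters by colo.
    sources_by_colo: Dict[str, List[str]] = {}
    for c in feed.clusters:
        if c.role == "source":
            sources_by_colo.setdefault(c.colo, []).append(c.name)

    pairs: List[Tuple[str, str]] = []
    for c in feed.clusters:
        if c.role != "target":
            continue
        local_sources = sources_by_colo.get(c.colo)
        if not local_sources:
            # No colo-local source: refuse rather than pull across colos.
            raise ValueError(f"no colo-local source for target {c.name} in {c.colo}")
        # Pick the first colo-local source (a simple, assumed policy).
        pairs.append((local_sources[0], c.name))
    return pairs


# Usage: the Colo1/Colo2 layout from the use case, as one logical feed.
feed = LogicalFeed("critical-dataset", (
    FeedCluster("Colo1-ETL", "colo1", "source"),
    FeedCluster("Colo1-A", "colo1", "target"),
    FeedCluster("Colo2-ETL", "colo2", "source"),
    FeedCluster("Colo2-A", "colo2", "target"),
))
print(replication_pairs(feed))
# [('Colo1-ETL', 'Colo1-A'), ('Colo2-ETL', 'Colo2-A')]
```

Choosing the first colo-local source is only one possible policy. With a single feed definition, end users would see one dataset name, while each adhoc cluster is still fed locally.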



--
This message was sent by Atlassian JIRA
(v6.2#6252)