Posted to dev@falcon.apache.org by "Balu Vellanki (JIRA)" <ji...@apache.org> on 2016/01/06 23:00:41 UTC

[jira] [Comment Edited] (FALCON-1728) Process entity definition allows multiple clusters when it has output Feed defined.

    [ https://issues.apache.org/jira/browse/FALCON-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086358#comment-15086358 ] 

Balu Vellanki edited comment on FALCON-1728 at 1/6/16 9:59 PM:
---------------------------------------------------------------

[~pavan kumar] and [~ajayyadava] : {noformat}
 Say you have an output feed FeedOne whose source cluster is ClusterOne and target cluster is ClusterTwo. The location of the feed is /apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}

Now you have a process ProcessOne whose output feed is FeedOne. The process is run on clusters ClusterTwo and ClusterThree. When oozie runs the process instance, the user expects the output data to be generated in ClusterOne/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}. The user also expects this dir to be replicated to ClusterTwo/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}. Now, if two jobs for the same process instance on two different clusters are writing to the same dir ClusterOne/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}, won't this be a problem?

If the process is run on ClusterThree and the output is written to ClusterThree/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR} instead of the location in ClusterOne, I think it is a bug.

[~venkatnrangan] and [~sriksun] : What do you think?
{noformat}
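The collision described in the comment can be illustrated with a small Java sketch. It assumes, as the comment does, that every run of ProcessOne writes FeedOne's instance on the feed's source cluster (ClusterOne). The resolve helper is hypothetical and only substitutes the time variables; it is not Falcon's actual path-resolution code.

```java
import java.time.LocalDateTime;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FeedPathCollision {
    // Feed location from the example; {YEAR} etc. are the time variables.
    static final String TEMPLATE =
        "/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}";

    // Hypothetical helper: fills in the time variables for one instance
    // and prefixes the cluster name, mirroring the comment's notation.
    static String resolve(String cluster, String template, LocalDateTime t) {
        String path = template
            .replace("{YEAR}", String.format("%04d", t.getYear()))
            .replace("{MONTH}", String.format("%02d", t.getMonthValue()))
            .replace("{DAY}", String.format("%02d", t.getDayOfMonth()))
            .replace("{HOUR}", String.format("%02d", t.getHour()));
        return cluster + path;
    }

    public static void main(String[] args) {
        LocalDateTime instance = LocalDateTime.of(2016, 1, 6, 21, 0);
        Set<String> claimed = new HashSet<>();
        // ProcessOne runs on both clusters; under the stated assumption each
        // run writes the same FeedOne instance on ClusterOne.
        for (String runCluster : List.of("ClusterTwo", "ClusterThree")) {
            String out = resolve("ClusterOne", TEMPLATE, instance);
            boolean first = claimed.add(out);
            System.out.println(runCluster + " -> " + out
                + (first ? "" : "  (collision: path already claimed)"));
        }
    }
}
```

Running it prints ClusterTwo -> ClusterOne/apps/falcon/feedOne/location/2016-01-06-21, then the same path for ClusterThree flagged as a collision: both runs of one process instance target a single directory.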




> Process entity definition allows multiple clusters when it has output Feed defined. 
> ------------------------------------------------------------------------------------
>
>                 Key: FALCON-1728
>                 URL: https://issues.apache.org/jira/browse/FALCON-1728
>             Project: Falcon
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 0.9
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>            Priority: Critical
>
> Process XSD allows user to specify multiple clusters per process entity. I am guessing this would allow a user to run duplicate instances of the process on multiple clusters at the same time (I do not really see a need for this). When the process has an output feed defined, you can have duplicate process instances writing to the same feed instance, causing data corruption/failures. The solution is to 
> 1. Do not allow multiple clusters per process. Let the user define a duplicate process if the user wants to run duplicate instances.
> OR
> 2. Allow multiple clusters, but only when there is no output feed defined.
> [~sriksun] please let me know if there is any other reason for allowing multiple clusters in a process. 
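The two proposed rules can be compared with a small Java sketch that counts the cluster and output elements of a process definition. This is not Falcon's entity validation code: the class and method names are hypothetical, and the element names (clusters/cluster, outputs/output) and the namespace are assumptions based on the process XSD, with required elements such as validity and workflow left out of the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ProcessClusterCheck {
    // Parses a process definition (factory defaults: not namespace-aware).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Counts elements with an exact tag name ("cluster" does not match "clusters").
    static int countTag(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    // Option 1: at most one cluster per process, regardless of outputs.
    static boolean passesOption1(int clusters, int outputs) {
        return clusters <= 1;
    }

    // Option 2: multiple clusters only when no output feed is defined.
    static boolean passesOption2(int clusters, int outputs) {
        return clusters <= 1 || outputs == 0;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down ProcessOne (hypothetical sample; validity, workflow, etc. omitted).
        String processOne =
            "<process name=\"ProcessOne\" xmlns=\"uri:falcon:process:0.1\">"
            + "<clusters><cluster name=\"ClusterTwo\"/><cluster name=\"ClusterThree\"/></clusters>"
            + "<outputs><output name=\"out\" feed=\"FeedOne\" instance=\"now(0,0)\"/></outputs>"
            + "</process>";
        Document doc = parse(processOne);
        int clusters = countTag(doc, "cluster");
        int outputs = countTag(doc, "output");
        System.out.println("clusters=" + clusters + ", outputs=" + outputs);
        System.out.println("option 1: " + (passesOption1(clusters, outputs) ? "accept" : "reject"));
        System.out.println("option 2: " + (passesOption2(clusters, outputs) ? "accept" : "reject"));
    }
}
```

For ProcessOne (two clusters, one output feed) both rules reject the definition. They differ only for a multi-cluster process with no outputs, which option 2 accepts and option 1 rejects.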



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)