Posted to dev@falcon.apache.org by "Srikanth Sundarrajan (JIRA)" <ji...@apache.org> on 2013/07/14 18:24:48 UTC
[jira] [Commented] (FALCON-48) Pipeline entity for Falcon
[ https://issues.apache.org/jira/browse/FALCON-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708063#comment-13708063 ]
Srikanth Sundarrajan commented on FALCON-48:
--------------------------------------------
Yes, makes sense. Does it make sense to define a pipeline as the flow between two feed instances?
Consider for example:
Two source feeds, A & B, are each defined in clusters X1 & X2 and are transformed (process P1) into A1 & B1 respectively, again in both clusters X1 & X2. Now consider another process (P2) that consumes A1 & B1 in each of the clusters and produces feed C1 in each of the clusters. Finally, let us say C1 is replicated (multiple sources, single target) from X1 & X2 to X3.
In this case the dependency graph would look something like this (refer to FALCON-37):
{noformat}
         X1                           X2
  ================             ================
    A          B                 A          B
    |          |                 |          |
    |          |                 |          |
    A1         B1                A1         B1
    |          |                 |          |
    |          |                 |          |
     \        /                   \        /
         C1                           C1
         |                            |
         |                            |
          \                          /
           \                        /
                       C1
               ==================
                       X3
{noformat}
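One way to make the graph above concrete is to treat each feed-on-cluster instance as a node and each process or replication as an edge. The following is a minimal Python sketch under that assumption. The function names are hypothetical and not part of Falcon's API. It builds the example graph and derives a valid processing order, which also confirms that the graph is acyclic:

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def build_example_graph():
    """Edges of the example as (input_node, output_node, via) triples.

    Each node is a (feed, cluster) pair, i.e. one feed instance on one cluster.
    """
    edges = []
    for cluster in ("X1", "X2"):
        edges.append((("A", cluster), ("A1", cluster), "P1"))
        edges.append((("B", cluster), ("B1", cluster), "P1"))
        edges.append((("A1", cluster), ("C1", cluster), "P2"))
        edges.append((("B1", cluster), ("C1", cluster), "P2"))
        # Multiple-source, single-target replication of C1 into X3.
        edges.append((("C1", cluster), ("C1", "X3"), "replication"))
    return edges


def topological_order(edges):
    """Order feed instances so every input precedes the feeds derived from it."""
    preds = defaultdict(set)  # TopologicalSorter expects node -> predecessors
    for src, dst, _via in edges:
        preds[dst].add(src)
        preds.setdefault(src, set())  # source feeds have no predecessors
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(preds).static_order())


if __name__ == "__main__":
    order = topological_order(build_example_graph())
    print(len(order), order[-1])  # 11 ('C1', 'X3')
```

Every other instance is an ancestor of C1 on X3, so that node always comes last in the order.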
In the above flow, a pipeline can be defined as any combination of a source feed & a target feed.
Using a little notation to represent this:
* - indicates "All clusters"
#local - indicates the specific cluster on which the source originated
Possible pipeline abstractions (as long as there is a path in the graph between source & target):
1. (A@*,B@*) - C1@X3
2. (A@X1) - (A1@#local)
3. (A@*) - (C1@#local)
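This notation can be checked mechanically. The following is a minimal sketch that assumes the same (feed, cluster) node representation and writes the target feed as C1, its name in the dependency graph. `parse`, `resolve` and the other helpers are hypothetical and are not a Falcon API. The sketch expands `*` to every cluster that holds the source feed and binds `#local` to the source instance's own cluster. It rejects any pair with no connecting path:

```python
from collections import deque

ALL = "*"          # "All clusters"
LOCAL = "#local"   # the cluster on which the source instance originated


def example_edges():
    """Adjacency list of the example graph; nodes are (feed, cluster)."""
    edges = {}
    for c in ("X1", "X2"):
        for src, dst in [("A", "A1"), ("B", "B1"), ("A1", "C1"), ("B1", "C1")]:
            edges.setdefault((src, c), []).append((dst, c))
        # C1 is replicated from X1 and X2 into X3.
        edges.setdefault(("C1", c), []).append(("C1", "X3"))
    return edges


def parse(spec):
    """Parse e.g. '(A@*,B@*) - C1@X3' into (sources, target) tuples."""
    left, right = (part.strip().strip("()") for part in spec.split(" - "))
    sources = [tuple(s.strip().split("@")) for s in left.split(",")]
    target = tuple(right.split("@"))
    return sources, target


def clusters_of(edges, feed):
    """All clusters on which an instance of `feed` appears in the graph."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    return sorted(c for f, c in nodes if f == feed)


def has_path(edges, src, dst):
    """Breadth-first search from src; True if dst is reachable."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def resolve(edges, spec):
    """Expand a pipeline spec into concrete (source, target) node pairs.

    Raises ValueError when a source instance has no path to its target,
    mirroring the "as long as there is a path" condition.
    """
    sources, (t_feed, t_spec) = parse(spec)
    pairs = []
    for s_feed, s_spec in sources:
        s_clusters = clusters_of(edges, s_feed) if s_spec == ALL else [s_spec]
        for sc in s_clusters:
            # #local binds the target to the source instance's own cluster.
            tc = sc if t_spec == LOCAL else t_spec
            src, dst = (s_feed, sc), (t_feed, tc)
            if not has_path(edges, src, dst):
                raise ValueError(f"no path from {src} to {dst}")
            pairs.append((src, dst))
    return pairs


if __name__ == "__main__":
    g = example_edges()
    for spec in ("(A@*,B@*) - C1@X3", "(A@X1) - (A1@#local)", "(A@*) - (C1@#local)"):
        print(spec, "->", resolve(g, spec))
```

Under this reading, abstraction 1 expands to four source instances that all converge on C1@X3. Abstraction 3 expands to two independent per-cluster pipelines. A spec such as (A@X1) - (B1@#local) is rejected because no path connects the two instances.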
Comments welcome.
> Pipeline entity for Falcon
> --------------------------
>
> Key: FALCON-48
> URL: https://issues.apache.org/jira/browse/FALCON-48
> Project: Falcon
> Issue Type: Wish
> Components: general
> Reporter: Sanjeev T
> Priority: Minor
> Labels: operability
>
> Falcon should also have a pipeline entity.
> * A pipeline entity can comprise the complete DAG for a given set of processes and feeds, within a cluster or across clusters.
> * How this helps:
> ** setting up a pipeline should take care of submitting the relevant feeds and processes
> ** if a cluster has issues, a particular pipeline can be processed on another cluster
> ** it makes it possible to build a monitoring system for pipelines
> ** a particular pipeline can be run for a given time window
> ** cases like backlog and catch-up can be handled easily
> ** for Pipeline(A) to complete, we can suspend Pipeline(B) if they have a dependency