Posted to commits@falcon.apache.org by ve...@apache.org on 2014/05/06 20:52:57 UTC

[5/5] git commit: FALCON-324 Document lineage feature. Contributed by Sowmya Ramesh

FALCON-324 Document lineage feature. Contributed by Sowmya Ramesh


Project: http://git-wip-us.apache.org/repos/asf/incubator-falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-falcon/commit/ad2701d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-falcon/tree/ad2701d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-falcon/diff/ad2701d2

Branch: refs/heads/master
Commit: ad2701d2b10148b3ee112ec7058ad3c512a2ded1
Parents: 5689007
Author: Venkatesh Seetharam <ve...@apache.org>
Authored: Tue May 6 11:52:10 2014 -0700
Committer: Venkatesh Seetharam <ve...@apache.org>
Committed: Tue May 6 11:52:10 2014 -0700

----------------------------------------------------------------------
 CHANGES.txt                                     |  2 +
 docs/src/site/twiki/FalconDocumentation.twiki   | 29 +++++++++
 docs/src/site/twiki/index.twiki                 |  2 +
 .../site/twiki/restapi/AdjacentVertices.twiki   | 67 ++++++++++++++++++++
 docs/src/site/twiki/restapi/AllEdges.twiki      | 42 ++++++++++++
 docs/src/site/twiki/restapi/AllVertices.twiki   | 43 +++++++++++++
 docs/src/site/twiki/restapi/Edge.twiki          | 33 ++++++++++
 docs/src/site/twiki/restapi/Graph.twiki         | 22 +++++++
 docs/src/site/twiki/restapi/ResourceList.twiki  | 28 +++++---
 docs/src/site/twiki/restapi/Vertex.twiki        | 35 ++++++++++
 .../site/twiki/restapi/VertexProperties.twiki   | 33 ++++++++++
 docs/src/site/twiki/restapi/Vertices.twiki      | 37 +++++++++++
 12 files changed, 365 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 2a9a35b..ca2ce75 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -40,6 +40,8 @@ Release Version: 0.5-incubating
     Venkatesh Seetharam)
    
   IMPROVEMENTS
+    FALCON-324 Document lineage feature (Sowmya Ramesh via Venkatesh Seetharam)
+
     FALCON-312 Falcon LogCleanupServiceTest seems to clean up root "/"
     (Venkatesh Seetharam)
 

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index c78765e..36b989c 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -12,6 +12,7 @@
    * <a href="#Idempotency">Idempotency</a>
    * <a href="#Alerting_and_Monitoring">Alerting and Monitoring</a>
    * <a href="#Falcon_EL_Expressions">Falcon EL Expressions</a>
+   * <a href="#Lineage">Lineage</a>
 
 ---++ Architecture
 ---+++ Introduction
@@ -709,3 +710,31 @@ Falcon currently support following ELs:
    * 8. *latest(number of latest instance)*: This will simply make you input consider the number of latest available instance of the feed given as parameter. For example: latest(0) will consider the last available instance of feed, where as latest latest(-1) will consider second last available feed and latest(-3) will consider 4th last available feed.
    
 
+---++ Lineage
+
+Falcon adds the ability to capture lineage for both entities and their associated instances. It
+also captures the metadata tags associated with each of the entities as relationships. The
+following relationships are captured:
+
+   * owner of entities - User
+   * data classification tags
+   * groups defined in feeds
+   * Relationships between entities
+      * Clusters associated with Feed and Process entity
+      * Input and Output feeds for a Process
+   * Instances refer to corresponding entities
+
+Lineage is exposed in 3 ways:
+
+   * REST API
+   * CLI
+   * Dashboard - Interactive lineage for Process instances
+
+This feature is enabled by default but can be disabled by removing the following from startup.properties:
+<verbatim>
+config name: *.application.services
+config value: org.apache.falcon.metadata.MetadataMappingService
+</verbatim>
+
+Lineage is only captured for Process executions. A future release will capture lineage for
+lifecycle policies such as replication and retention.

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/index.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/index.twiki b/docs/src/site/twiki/index.twiki
index ee48fbb..e7917c5 100644
--- a/docs/src/site/twiki/index.twiki
+++ b/docs/src/site/twiki/index.twiki
@@ -19,6 +19,8 @@ management on hadoop clusters.
 
    * Enables use cases for local processing in colo and global aggregations
 
+   * Captures Lineage information for feeds and processes
+
 ---+ Getting Started
 
 Start with these simple steps to install an falcon instance [[InstallationSteps][Simple setup]]. Also refer

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AdjacentVertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AdjacentVertices.twiki b/docs/src/site/twiki/restapi/AdjacentVertices.twiki
new file mode 100644
index 0000000..407ee85
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AdjacentVertices.twiki
@@ -0,0 +1,67 @@
+---++  GET api/graphs/lineage/vertices/:id/:direction
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Get a list of adjacent vertices or edges with a direction.
+
+---++ Parameters
+   * :id is the id of the vertex.
+   * :direction is the direction associated with the edges. Pass out to get the adjacent out vertices of the vertex,
+     in to get the adjacent in vertices, and both to get both in and out adjacent vertices. Similarly, pass outE to
+     get the out edges of the vertex, inE to get the in edges, and bothE to get both in and out edges.
+      * out  : get the adjacent out vertices of vertex
+      * in   : get the adjacent in vertices of vertex
+      * both : get both the adjacent in and out vertices of vertex
+      * outCount  : get the number of out vertices of vertex
+      * inCount   : get the number of in vertices of vertex
+      * bothCount : get the number of adjacent in and out vertices of vertex
+      * outIds  : get the identifiers of out vertices of vertex
+      * inIds   : get the identifiers of in vertices of vertex
+      * bothIds : get the identifiers of adjacent in and out vertices of vertex
+
+---++ Results
+Adjacent vertices of the vertex for the specified direction.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4/out
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results": [
+        {
+            "timestamp":"2014-04-21T20:55Z",
+            "name":"sampleFeed",
+            "type":"feed-instance",
+            "_id":8,
+            "_type":"vertex"
+        }
+    ],
+    "totalSize":1
+}
+</verbatim>
+
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4/bothE
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results":[
+        {
+            "_id":"Q5V-4-5g",
+            "_type":"edge",
+            "_outV":4,
+            "_inV":8,
+            "_label":"output"
+        }
+    ],
+    "totalSize":1
+}
+</verbatim>
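
The :direction values documented in AdjacentVertices.twiki above lend themselves to a small client-side check before a request is sent. The following is a minimal sketch in Python, not part of the commit; the function name adjacent_url and the constant names are hypothetical, and the base URL is simply the one used in the examples above.

```python
from urllib.parse import quote

# Direction values named in the AdjacentVertices documentation.
VERTEX_DIRECTIONS = {
    "out", "in", "both",
    "outCount", "inCount", "bothCount",
    "outIds", "inIds", "bothIds",
}
EDGE_DIRECTIONS = {"outE", "inE", "bothE"}


def adjacent_url(base_url, vertex_id, direction):
    """Build the URL for GET api/graphs/lineage/vertices/:id/:direction.

    Hypothetical helper: it only assembles and validates the URL, it does
    not contact a Falcon server.
    """
    if direction not in VERTEX_DIRECTIONS | EDGE_DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction!r}")
    vertex = quote(str(vertex_id), safe="")
    return f"{base_url.rstrip('/')}/api/graphs/lineage/vertices/{vertex}/{direction}"


print(adjacent_url("http://localhost:15000", 4, "out"))
print(adjacent_url("http://localhost:15000", 4, "bothE"))
```

Rejecting unknown directions locally is a design choice of this sketch; how the server responds to an unsupported value is not described in the documentation above.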

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AllEdges.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AllEdges.twiki b/docs/src/site/twiki/restapi/AllEdges.twiki
new file mode 100644
index 0000000..d51da06
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AllEdges.twiki
@@ -0,0 +1,42 @@
+---++  GET api/graphs/lineage/edges/all
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all edges.
+
+---++ Parameters
+None.
+
+---++ Results
+All edges in lineage graph.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/edges/all
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results": [
+        {
+            "_id":"Q5V-4-5g",
+            "_type":"edge",
+            "_outV":4,
+            "_inV":8,
+            "_label":"output"
+        },
+        {
+            "_id":"Q6t-c-5g",
+            "_type":"edge",
+            "_outV":12,
+            "_inV":16,
+            "_label":"output"
+        }
+    ],
+    "totalSize": 2
+}
+</verbatim>
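
The edge list returned by the call above can be folded into an adjacency map for simple lineage walks. This is a minimal sketch in Python, assuming the response has exactly the shape shown in the AllEdges example (a "results" list whose items carry _outV, _inV and _label); the name successors is hypothetical and the sample payload is copied from that example.

```python
import json
from collections import defaultdict

# Sample payload copied from the AllEdges example result.
SAMPLE = """
{
    "results": [
        {"_id": "Q5V-4-5g", "_type": "edge", "_outV": 4, "_inV": 8, "_label": "output"},
        {"_id": "Q6t-c-5g", "_type": "edge", "_outV": 12, "_inV": 16, "_label": "output"}
    ],
    "totalSize": 2
}
"""


def successors(edges_response):
    """Map each source vertex id (_outV) to a list of (label, _inV) pairs."""
    adjacency = defaultdict(list)
    for edge in edges_response["results"]:
        adjacency[edge["_outV"]].append((edge["_label"], edge["_inV"]))
    return dict(adjacency)


graph = successors(json.loads(SAMPLE))
print(graph)  # {4: [('output', 8)], 12: [('output', 16)]}
```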

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AllVertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AllVertices.twiki b/docs/src/site/twiki/restapi/AllVertices.twiki
new file mode 100644
index 0000000..9a64415
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AllVertices.twiki
@@ -0,0 +1,43 @@
+---++  GET api/graphs/lineage/vertices/all
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all vertices.
+
+---++ Parameters
+None.
+
+---++ Results
+All vertices in lineage graph.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/all
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results": [
+        {
+            "timestamp":"2014-04-21T20:55Z",
+            "name":"sampleIngestProcess\/2014-03-01T10:00Z",
+            "type":"process-instance",
+            "version":"2.0.0",
+            "_id":4,
+            "_type":"vertex"
+        },
+        {
+            "timestamp":"2014-04-21T20:55Z",
+            "name":"rawEmailFeed\/2014-03-01T10:00Z",
+            "type":"feed-instance",
+            "_id":8,
+            "_type":"vertex"
+        }
+    ],
+    "totalSize": 2
+}
+</verbatim>

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Edge.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Edge.twiki b/docs/src/site/twiki/restapi/Edge.twiki
new file mode 100644
index 0000000..4fa0874
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Edge.twiki
@@ -0,0 +1,33 @@
+---++  GET api/graphs/lineage/edges/:id
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the edge with the specified id.
+
+---++ Parameters
+   * :id is the unique id of the edge.
+
+---++ Results
+Edge with the specified id.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/edges/Q6t-c-5g
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results":
+        {
+            "_id":"Q6t-c-5g",
+            "_type":"edge",
+            "_outV":12,
+            "_inV":16,
+            "_label":"output"
+        }
+}
+</verbatim>

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Graph.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Graph.twiki b/docs/src/site/twiki/restapi/Graph.twiki
new file mode 100644
index 0000000..4850b10
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Graph.twiki
@@ -0,0 +1,22 @@
+---++  GET api/graphs/lineage/serialize
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Dump the graph.
+
+---++ Parameters
+None.
+
+---++ Results
+Serializes the graph to the file configured by *.falcon.graph.serialize.path in the custom startup.properties.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/serialize
+</verbatim>
+---+++ Result
+None.

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/ResourceList.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/ResourceList.twiki b/docs/src/site/twiki/restapi/ResourceList.twiki
index 6ca0dea..ad0a53f 100644
--- a/docs/src/site/twiki/restapi/ResourceList.twiki
+++ b/docs/src/site/twiki/restapi/ResourceList.twiki
@@ -4,6 +4,7 @@
    * <a href="#REST_Call_on_Entity_Resource">REST Call on Entity Resource</a>
    * <a href="#REST_Call_on_Feed_and_Process_Instances">REST Call on Feed/Process Instances</a>
    * <a href="#REST_Call_on_Admin_Resource">REST Call on Admin Resource</a>
+   * <a href="#REST_Call_on_Lineage_Graph">REST Call on Lineage Graph Resource</a>
 
 ---++ Authentication
 
@@ -52,12 +53,23 @@ See also: [[../Security.twiki][Security in Falcon]]
 
 ---++ REST Call on Feed and Process Instances
 
-| *Call Type* | *Resource*                                                           | *Description*                |
-| GET         | [[InstanceRunning][api/instance/running/:entity-type/:entity-name]]  | List of running instances.   |
-| GET         | [[InstanceStatus][api/instance/status/:entity-type/:entity-name]]]   | Status of a given instance   |
-| POST        | [[InstanceKill][api/instance/kill/:entity-type/:entity-name]]]       | Kill a given instance        |
-| POST        | [[InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]]] | Suspend a running instance   |
-| POST        | [[InstanceResume][api/instance/resume/:entity-type/:entity-name]]]   | Resume a given instance      |
-| POST        | [[InstanceRerun][api/instance/rerun/:entity-type/:entity-name]]]     | Rerun a given instance       |
-| GET         | [[InstanceLogs][api/instance/logs/:entity-type/:entity-name]]]       | Get logs of a given instance |
+| *Call Type* | *Resource*                                                          | *Description*                |
+| GET         | [[InstanceRunning][api/instance/running/:entity-type/:entity-name]] | List of running instances.   |
+| GET         | [[InstanceStatus][api/instance/status/:entity-type/:entity-name]]   | Status of a given instance   |
+| POST        | [[InstanceKill][api/instance/kill/:entity-type/:entity-name]]       | Kill a given instance        |
+| POST        | [[InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]] | Suspend a running instance   |
+| POST        | [[InstanceResume][api/instance/resume/:entity-type/:entity-name]]   | Resume a given instance      |
+| POST        | [[InstanceRerun][api/instance/rerun/:entity-type/:entity-name]]     | Rerun a given instance       |
+| GET         | [[InstanceLogs][api/instance/logs/:entity-type/:entity-name]]       | Get logs of a given instance |
 
+---++ REST Call on Lineage Graph
+
+| *Call Type* | *Resource*                                                                           | *Description*                                       |
+| GET         | [[Graph][api/graphs/lineage/serialize]]                                              | dump the graph                                      |
+| GET         | [[AllVertices][api/graphs/lineage/vertices/all]]                                     | get all vertices                                    |
+| GET         | [[Vertices][api/graphs/lineage/vertices?key=:key&value=:value]]                      | get all vertices for a key index                    |
+| GET         | [[Vertex][api/graphs/lineage/vertices/:id]]                                          | get vertex with id <id>                             |
+| GET         | [[VertexProperties][api/graphs/lineage/vertices/properties/:id?relationships=:true]] | get vertex properties with id                       |
+| GET         | [[AdjacentVertices][api/graphs/lineage/vertices/:id/:direction]]                     | get the adjacent vertices or edges with a direction |
+| GET         | [[AllEdges][api/graphs/lineage/edges/all]]                                           | get all edges                                       |
+| GET         | [[Edge][api/graphs/lineage/edges/:id]]                                               | get edge with id <id>                               |
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Vertex.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Vertex.twiki b/docs/src/site/twiki/restapi/Vertex.twiki
new file mode 100644
index 0000000..1102bee
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Vertex.twiki
@@ -0,0 +1,35 @@
+---++  GET api/graphs/lineage/vertices/:id
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the vertex with the specified id.
+
+---++ Parameters
+   * :id is the unique id of the vertex.
+
+---++ Results
+Vertex with the specified id.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results": [
+        {
+            "timestamp":"2014-04-21T20:55Z",
+            "name":"sampleIngestProcess",
+            "type":"process-instance",
+            "version":"2.0.0",
+            "_id":4,
+            "_type":"vertex"
+        }
+    ]
+}
+</verbatim>

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/VertexProperties.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/VertexProperties.twiki b/docs/src/site/twiki/restapi/VertexProperties.twiki
new file mode 100644
index 0000000..68247ef
--- /dev/null
+++ b/docs/src/site/twiki/restapi/VertexProperties.twiki
@@ -0,0 +1,33 @@
+---++  GET api/graphs/lineage/vertices/properties/:id?relationships=:true
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the properties of the vertex with the specified id.
+
+---++ Parameters
+   * :id is the unique id of the vertex.
+   * :relationships has a default value of false. Pass true if relationships should be fetched.
+
+---++ Results
+Properties associated with the specified vertex.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/properties/40004?relationships=true
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results":
+        {
+            "timestamp":"2014-04-25T22:20Z",
+            "name":"local",
+            "type":"cluster-entity"
+        },
+    "totalSize":3
+}
+</verbatim>

http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Vertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Vertices.twiki b/docs/src/site/twiki/restapi/Vertices.twiki
new file mode 100644
index 0000000..8406b2c
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Vertices.twiki
@@ -0,0 +1,37 @@
+---++  GET api/graphs/lineage/vertices?key=:key&value=:value
+   * <a href="#Description">Description</a>
+   * <a href="#Parameters">Parameters</a>
+   * <a href="#Results">Results</a>
+   * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all vertices for a key index given the specified value.
+
+---++ Parameters
+   * :key is the key to be matched.
+   * :value is the associated value of the key.
+
+---++ Results
+All vertices matching the given property key and value.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices?key=name&value=sampleIngestProcess
+</verbatim>
+---+++ Result
+<verbatim>
+{
+    "results": [
+        {
+            "timestamp":"2014-04-21T20:55Z",
+            "name":"sampleIngestProcess",
+            "type":"process-instance",
+            "version":"2.0.0",
+            "_id":4,
+            "_type":"vertex"
+        }
+    ],
+    "totalSize": 1
+}
+</verbatim>
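
The key/value lookup documented in Vertices.twiki above takes its arguments as query parameters, so values containing characters such as "/" or ":" (for example instance names like rawEmailFeed/2014-03-01T10:00Z) need URL encoding. The following is a minimal sketch in Python, assuming the base URL from the examples; the function name vertices_query_url is hypothetical and no request is actually sent.

```python
from urllib.parse import urlencode


def vertices_query_url(base_url, key, value):
    """Build the URL for GET api/graphs/lineage/vertices?key=:key&value=:value.

    Hypothetical helper: urlencode percent-encodes reserved characters
    in the key and value so they are safe inside a query string.
    """
    query = urlencode({"key": key, "value": value})
    return f"{base_url.rstrip('/')}/api/graphs/lineage/vertices?{query}"


print(vertices_query_url("http://localhost:15000", "name", "sampleIngestProcess"))
print(vertices_query_url("http://localhost:15000", "name", "rawEmailFeed/2014-03-01T10:00Z"))
```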