Posted to commits@falcon.apache.org by ve...@apache.org on 2014/05/06 20:52:57 UTC
[5/5] git commit: FALCON-324 Document lineage feature. Contributed by
Sowmya Ramesh
Project: http://git-wip-us.apache.org/repos/asf/incubator-falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-falcon/commit/ad2701d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-falcon/tree/ad2701d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-falcon/diff/ad2701d2
Branch: refs/heads/master
Commit: ad2701d2b10148b3ee112ec7058ad3c512a2ded1
Parents: 5689007
Author: Venkatesh Seetharam <ve...@apache.org>
Authored: Tue May 6 11:52:10 2014 -0700
Committer: Venkatesh Seetharam <ve...@apache.org>
Committed: Tue May 6 11:52:10 2014 -0700
----------------------------------------------------------------------
CHANGES.txt | 2 +
docs/src/site/twiki/FalconDocumentation.twiki | 29 +++++++++
docs/src/site/twiki/index.twiki | 2 +
.../site/twiki/restapi/AdjacentVertices.twiki | 67 ++++++++++++++++++++
docs/src/site/twiki/restapi/AllEdges.twiki | 42 ++++++++++++
docs/src/site/twiki/restapi/AllVertices.twiki | 43 +++++++++++++
docs/src/site/twiki/restapi/Edge.twiki | 33 ++++++++++
docs/src/site/twiki/restapi/Graph.twiki | 22 +++++++
docs/src/site/twiki/restapi/ResourceList.twiki | 28 +++++---
docs/src/site/twiki/restapi/Vertex.twiki | 35 ++++++++++
.../site/twiki/restapi/VertexProperties.twiki | 33 ++++++++++
docs/src/site/twiki/restapi/Vertices.twiki | 37 +++++++++++
12 files changed, 365 insertions(+), 8 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 2a9a35b..ca2ce75 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -40,6 +40,8 @@ Release Version: 0.5-incubating
Venkatesh Seetharam)
IMPROVEMENTS
+ FALCON-324 Document lineage feature (Sowmya Ramesh via Venkatesh Seetharam)
+
FALCON-312 Falcon LogCleanupServiceTest seems to clean up root "/"
(Venkatesh Seetharam)
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index c78765e..36b989c 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -12,6 +12,7 @@
* <a href="#Idempotency">Idempotency</a>
* <a href="#Alerting_and_Monitoring">Alerting and Monitoring</a>
* <a href="#Falcon_EL_Expressions">Falcon EL Expressions</a>
+ * <a href="#Lineage">Lineage</a>
---++ Architecture
---+++ Introduction
@@ -709,3 +710,31 @@ Falcon currently support following ELs:
* 8. *latest(number of latest instance)*: This makes the input consider the latest available instances of the feed given as a parameter. For example, latest(0) considers the last available instance of the feed, whereas latest(-1) considers the second-last available instance and latest(-3) considers the fourth-last available instance.
+---++ Lineage
+
+Falcon adds the ability to capture lineage for both entities and their associated instances. It
+also captures the metadata tags associated with each of the entities as relationships. The
+following relationships are captured:
+
+ * owner of entities - User
+ * data classification tags
+ * groups defined in feeds
+ * Relationships between entities
+ * Clusters associated with Feed and Process entity
+ * Input and Output feeds for a Process
+ * Instances refer to corresponding entities
+
+Lineage is exposed in 3 ways:
+
+ * REST API
+ * CLI
+ * Dashboard - Interactive lineage for Process instances
+
+This feature is enabled by default. To disable it, remove the following service from startup.properties:
+<verbatim>
+config name: *.application.services
+config value: org.apache.falcon.metadata.MetadataMappingService
+</verbatim>
+
+Lineage is only captured for Process executions. A future release will capture lineage for
+lifecycle policies such as replication and retention.
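
As a rough illustration of the disable step documented above, the Python sketch below drops the metadata service from a properties entry. It assumes the `*.application.services` value is a comma-separated list of class names held on a single line (no backslash continuations); the function name and the placeholder service `com.example.SomeOtherService` are hypothetical and not part of Falcon.

```python
LINEAGE_SERVICE = "org.apache.falcon.metadata.MetadataMappingService"


def remove_lineage_service(line: str) -> str:
    """Return the properties line with the lineage service removed.

    Assumes a single-line 'key=value' entry whose value is a
    comma-separated list of service class names.
    """
    key, sep, value = line.partition("=")
    if not sep or not key.strip().endswith("application.services"):
        return line  # not the services entry; leave it untouched
    services = [s.strip() for s in value.split(",") if s.strip()]
    kept = [s for s in services if s != LINEAGE_SERVICE]
    return f"{key}={','.join(kept)}"


if __name__ == "__main__":
    # 'com.example.SomeOtherService' is a made-up placeholder.
    sample = ("*.application.services=com.example.SomeOtherService,"
              "org.apache.falcon.metadata.MetadataMappingService")
    print(remove_lineage_service(sample))
```

A real properties file may wrap the value across several lines, in which case the entry would need to be joined before applying a filter like this one.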
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/index.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/index.twiki b/docs/src/site/twiki/index.twiki
index ee48fbb..e7917c5 100644
--- a/docs/src/site/twiki/index.twiki
+++ b/docs/src/site/twiki/index.twiki
@@ -19,6 +19,8 @@ management on hadoop clusters.
* Enables use cases for local processing in colo and global aggregations
+ * Captures Lineage information for feeds and processes
+
---+ Getting Started
Start with these simple steps to install a Falcon instance [[InstallationSteps][Simple setup]]. Also refer
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AdjacentVertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AdjacentVertices.twiki b/docs/src/site/twiki/restapi/AdjacentVertices.twiki
new file mode 100644
index 0000000..407ee85
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AdjacentVertices.twiki
@@ -0,0 +1,67 @@
+---++ GET api/graphs/lineage/vertices/:id/:direction
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Get a list of adjacent vertices or edges with a direction.
+
+---++ Parameters
+ * :id is the id of the vertex.
+ * :direction is the direction associated with the edges. Pass out to get the adjacent out vertices of the vertex,
+ in to get the adjacent in vertices, and both to get both. Similarly, pass outE to get the out edges of the
+ vertex, inE to get the in edges, and bothE to get both the in and out edges.
+ * out : get the adjacent out vertices of vertex
+ * in : get the adjacent in vertices of vertex
+ * both : get both the adjacent in and out vertices of vertex
+ * outCount : get the number of out vertices of vertex
+ * inCount : get the number of in vertices of vertex
+ * bothCount : get the number of adjacent in and out vertices of vertex
+ * outIds : get the identifiers of out vertices of vertex
+ * inIds : get the identifiers of in vertices of vertex
+ * bothIds : get the identifiers of adjacent in and out vertices of vertex
+
+---++ Results
+Adjacent vertices of the vertex for the specified direction.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4/out
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results": [
+ {
+ "timestamp":"2014-04-21T20:55Z",
+ "name":"sampleFeed",
+ "type":"feed-instance",
+ "_id":8,
+ "_type":"vertex"
+ }
+ ],
+ "totalSize":1
+}
+</verbatim>
+
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4/bothE
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results":[
+ {
+ "_id":"Q5V-4-5g",
+ "_type":"edge",
+ "_outV":4,
+ "_inV":8,
+ "_label":"output"
+ }
+ ],
+ "totalSize":1
+}
+</verbatim>
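
The AdjacentVertices page above can be exercised with a small client-side helper. The Python sketch below only builds the request URL and parses a response that is already in hand; it makes no network call. The host and port come from the documented examples, the set of directions is taken from the Parameters section, and the function names are hypothetical.

```python
import json

# Host and port as used in the documented examples.
BASE = "http://localhost:15000/api/graphs/lineage"

# Direction values named in the Parameters section.
DIRECTIONS = {
    "out", "in", "both",
    "outCount", "inCount", "bothCount",
    "outIds", "inIds", "bothIds",
    "outE", "inE", "bothE",
}

# Response body copied from the bothE example.
SAMPLE_EDGES = """
{"results": [{"_id": "Q5V-4-5g", "_type": "edge",
              "_outV": 4, "_inV": 8, "_label": "output"}],
 "totalSize": 1}
"""


def adjacent_url(vertex_id: int, direction: str) -> str:
    """Build the adjacent-vertices URL, rejecting undocumented directions."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction}")
    return f"{BASE}/vertices/{vertex_id}/{direction}"


def edge_endpoints(response_text: str) -> list:
    """Extract (outV, label, inV) triples from an edge-style response."""
    body = json.loads(response_text)
    return [(e["_outV"], e["_label"], e["_inV"]) for e in body["results"]]


if __name__ == "__main__":
    print(adjacent_url(4, "bothE"))
    print(edge_endpoints(SAMPLE_EDGES))
```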
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AllEdges.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AllEdges.twiki b/docs/src/site/twiki/restapi/AllEdges.twiki
new file mode 100644
index 0000000..d51da06
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AllEdges.twiki
@@ -0,0 +1,42 @@
+---++ GET api/graphs/lineage/edges/all
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all edges.
+
+---++ Parameters
+None.
+
+---++ Results
+All edges in lineage graph.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/edges/all
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results": [
+ {
+ "_id":"Q5V-4-5g",
+ "_type":"edge",
+ "_outV":4,
+ "_inV":8,
+ "_label":"output"
+ },
+ {
+ "_id":"Q6t-c-5g",
+ "_type":"edge",
+ "_outV":12,
+ "_inV":16,
+ "_label":"output"
+ }
+ ],
+ "totalSize": 2
+}
+</verbatim>
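
To show how the AllEdges result above might be consumed, the Python sketch below turns the documented response into an adjacency map keyed by the out vertex. The response text is copied from the example; the function name is hypothetical, and the field names (`_outV`, `_inV`, `_label`) are taken from the sample output rather than from a formal schema.

```python
import json
from collections import defaultdict

# Response body copied from the AllEdges example.
SAMPLE_ALL_EDGES = """
{"results": [
   {"_id": "Q5V-4-5g", "_type": "edge", "_outV": 4, "_inV": 8, "_label": "output"},
   {"_id": "Q6t-c-5g", "_type": "edge", "_outV": 12, "_inV": 16, "_label": "output"}
 ],
 "totalSize": 2}
"""


def build_adjacency(response_text: str) -> dict:
    """Map each out-vertex id to a list of (label, in-vertex id) pairs."""
    body = json.loads(response_text)
    adjacency = defaultdict(list)
    for edge in body["results"]:
        adjacency[edge["_outV"]].append((edge["_label"], edge["_inV"]))
    return dict(adjacency)


if __name__ == "__main__":
    for source, targets in build_adjacency(SAMPLE_ALL_EDGES).items():
        print(source, "->", targets)
```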
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/AllVertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/AllVertices.twiki b/docs/src/site/twiki/restapi/AllVertices.twiki
new file mode 100644
index 0000000..9a64415
--- /dev/null
+++ b/docs/src/site/twiki/restapi/AllVertices.twiki
@@ -0,0 +1,43 @@
+---++ GET api/graphs/lineage/vertices/all
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all vertices.
+
+---++ Parameters
+None.
+
+---++ Results
+All vertices in lineage graph.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/all
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results": [
+ {
+ "timestamp":"2014-04-21T20:55Z",
+ "name":"sampleIngestProcess\/2014-03-01T10:00Z",
+ "type":"process-instance",
+ "version":"2.0.0",
+ "_id":4,
+ "_type":"vertex"
+ },
+ {
+ "timestamp":"2014-04-21T20:55Z",
+ "name":"rawEmailFeed\/2014-03-01T10:00Z",
+ "type":"feed-instance",
+ "_id":8,
+ "_type":"vertex"
+ }
+ ],
+ "totalSize": 2
+}
+</verbatim>
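
The AllVertices result above escapes the slash in instance names as `\/`, which is a valid JSON escape and decodes to a plain `/`. The Python sketch below groups the documented vertices by their `type` property and shows the decoded names; the function name is hypothetical, and the sample is copied from the example output.

```python
import json

# Raw string so the JSON escape "\/" reaches the parser unchanged.
SAMPLE_ALL_VERTICES = r"""
{"results": [
   {"timestamp": "2014-04-21T20:55Z",
    "name": "sampleIngestProcess\/2014-03-01T10:00Z",
    "type": "process-instance", "version": "2.0.0",
    "_id": 4, "_type": "vertex"},
   {"timestamp": "2014-04-21T20:55Z",
    "name": "rawEmailFeed\/2014-03-01T10:00Z",
    "type": "feed-instance",
    "_id": 8, "_type": "vertex"}
 ],
 "totalSize": 2}
"""


def names_by_type(response_text: str) -> dict:
    """Group vertex names by the value of their 'type' property."""
    grouped = {}
    for vertex in json.loads(response_text)["results"]:
        grouped.setdefault(vertex["type"], []).append(vertex["name"])
    return grouped


if __name__ == "__main__":
    print(names_by_type(SAMPLE_ALL_VERTICES))
```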
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Edge.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Edge.twiki b/docs/src/site/twiki/restapi/Edge.twiki
new file mode 100644
index 0000000..4fa0874
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Edge.twiki
@@ -0,0 +1,33 @@
+---++ GET api/graphs/lineage/edges/:id
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the edge with specified id.
+
+---++ Parameters
+ * :id is the unique id of the edge.
+
+---++ Results
+Edge with the specified id.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/edges/Q6t-c-5g
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results":
+ {
+ "_id":"Q6t-c-5g",
+ "_type":"edge",
+ "_outV":12,
+ "_inV":16,
+ "_label":"output"
+ }
+}
+</verbatim>
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Graph.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Graph.twiki b/docs/src/site/twiki/restapi/Graph.twiki
new file mode 100644
index 0000000..4850b10
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Graph.twiki
@@ -0,0 +1,22 @@
+---++ GET api/graphs/lineage/serialize
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Dump the graph.
+
+---++ Parameters
+None.
+
+---++ Results
+Serializes the graph to the file configured with *.falcon.graph.serialize.path in the custom startup.properties.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/serialize
+</verbatim>
+---+++ Result
+None.
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/ResourceList.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/ResourceList.twiki b/docs/src/site/twiki/restapi/ResourceList.twiki
index 6ca0dea..ad0a53f 100644
--- a/docs/src/site/twiki/restapi/ResourceList.twiki
+++ b/docs/src/site/twiki/restapi/ResourceList.twiki
@@ -4,6 +4,7 @@
* <a href="#REST_Call_on_Entity_Resource">REST Call on Entity Resource</a>
* <a href="#REST_Call_on_Feed_and_Process_Instances">REST Call on Feed/Process Instances</a>
* <a href="#REST_Call_on_Admin_Resource">REST Call on Admin Resource</a>
+ * <a href="#REST_Call_on_Lineage_Graph">REST Call on Lineage Graph Resource</a>
---++ Authentication
@@ -52,12 +53,23 @@ See also: [[../Security.twiki][Security in Falcon]]
---++ REST Call on Feed and Process Instances
-| *Call Type* | *Resource* | *Description* |
-| GET | [[InstanceRunning][api/instance/running/:entity-type/:entity-name]] | List of running instances. |
-| GET | [[InstanceStatus][api/instance/status/:entity-type/:entity-name]]] | Status of a given instance |
-| POST | [[InstanceKill][api/instance/kill/:entity-type/:entity-name]]] | Kill a given instance |
-| POST | [[InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]]] | Suspend a running instance |
-| POST | [[InstanceResume][api/instance/resume/:entity-type/:entity-name]]] | Resume a given instance |
-| POST | [[InstanceRerun][api/instance/rerun/:entity-type/:entity-name]]] | Rerun a given instance |
-| GET | [[InstanceLogs][api/instance/logs/:entity-type/:entity-name]]] | Get logs of a given instance |
+| *Call Type* | *Resource* | *Description* |
+| GET | [[InstanceRunning][api/instance/running/:entity-type/:entity-name]] | List of running instances. |
+| GET | [[InstanceStatus][api/instance/status/:entity-type/:entity-name]] | Status of a given instance |
+| POST | [[InstanceKill][api/instance/kill/:entity-type/:entity-name]] | Kill a given instance |
+| POST | [[InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]] | Suspend a running instance |
+| POST | [[InstanceResume][api/instance/resume/:entity-type/:entity-name]] | Resume a given instance |
+| POST | [[InstanceRerun][api/instance/rerun/:entity-type/:entity-name]] | Rerun a given instance |
+| GET | [[InstanceLogs][api/instance/logs/:entity-type/:entity-name]] | Get logs of a given instance |
+---++ REST Call on Lineage Graph
+
+| *Call Type* | *Resource* | *Description* |
+| GET | [[Graph][api/graphs/lineage/serialize]] | dump the graph |
+| GET | [[AllVertices][api/graphs/lineage/vertices/all]] | get all vertices |
+| GET | [[Vertices][api/graphs/lineage/vertices?key=:key&value=:value]] | get all vertices for a key index |
+| GET | [[Vertex][api/graphs/lineage/vertices/:id]] | get vertex with id <id> |
+| GET | [[VertexProperties][api/graphs/lineage/vertices/properties/:id?relationships=:true]] | get vertex properties with id |
+| GET | [[AdjacentVertices][api/graphs/lineage/vertices/:id/:direction]] | get the adjacent vertices or edges with a direction |
+| GET | [[AllEdges][api/graphs/lineage/edges/all]] | get all edges |
+| GET | [[Edge][api/graphs/lineage/edges/:id]] | get edge with id <id> |
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Vertex.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Vertex.twiki b/docs/src/site/twiki/restapi/Vertex.twiki
new file mode 100644
index 0000000..1102bee
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Vertex.twiki
@@ -0,0 +1,35 @@
+---++ GET api/graphs/lineage/vertices/:id
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the vertex with specified id.
+
+---++ Parameters
+ * :id is the unique id of the vertex.
+
+---++ Results
+Vertex with the specified id.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/4
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results": [
+ {
+ "timestamp":"2014-04-21T20:55Z",
+ "name":"sampleIngestProcess",
+ "type":"process-instance",
+ "version":"2.0.0",
+ "_id":4,
+ "_type":"vertex"
+ }
+ ]
+}
+</verbatim>
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/VertexProperties.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/VertexProperties.twiki b/docs/src/site/twiki/restapi/VertexProperties.twiki
new file mode 100644
index 0000000..68247ef
--- /dev/null
+++ b/docs/src/site/twiki/restapi/VertexProperties.twiki
@@ -0,0 +1,33 @@
+---++ GET api/graphs/lineage/vertices/properties/:id?relationships=:true
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Gets the properties of the vertex with specified id.
+
+---++ Parameters
+ * :id is the unique id of the vertex.
+ * :relationships has default value of false. Pass true if relationships should be fetched.
+
+---++ Results
+ Properties associated with the specified vertex.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices/properties/40004?relationships=true
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results":
+ {
+ "timestamp":"2014-04-25T22:20Z",
+ "name":"local",
+ "type":"cluster-entity"
+ },
+ "totalSize":3
+}
+</verbatim>
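
For the VertexProperties call above, the optional `relationships` query parameter defaults to false. The Python sketch below builds the request URL with and without it; it performs no network access. The host and port come from the documented example, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Host and port as used in the documented examples.
BASE = "http://localhost:15000/api/graphs/lineage"


def vertex_properties_url(vertex_id: int, relationships: bool = False) -> str:
    """Build the vertex-properties URL, adding the query string only when asked."""
    url = f"{BASE}/vertices/properties/{vertex_id}"
    if relationships:
        url += "?" + urlencode({"relationships": "true"})
    return url


if __name__ == "__main__":
    print(vertex_properties_url(40004))
    print(vertex_properties_url(40004, relationships=True))
```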
http://git-wip-us.apache.org/repos/asf/incubator-falcon/blob/ad2701d2/docs/src/site/twiki/restapi/Vertices.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/restapi/Vertices.twiki b/docs/src/site/twiki/restapi/Vertices.twiki
new file mode 100644
index 0000000..8406b2c
--- /dev/null
+++ b/docs/src/site/twiki/restapi/Vertices.twiki
@@ -0,0 +1,37 @@
+---++ GET api/graphs/lineage/vertices?key=:key&value=:value
+ * <a href="#Description">Description</a>
+ * <a href="#Parameters">Parameters</a>
+ * <a href="#Results">Results</a>
+ * <a href="#Examples">Examples</a>
+
+---++ Description
+Get all vertices for a key index given the specified value.
+
+---++ Parameters
+ * :key is the key to be matched.
+ * :value is the associated value of the key.
+
+---++ Results
+All vertices matching the given property key and value.
+
+---++ Examples
+---+++ Rest Call
+<verbatim>
+GET http://localhost:15000/api/graphs/lineage/vertices?key=name&value=sampleIngestProcess
+</verbatim>
+---+++ Result
+<verbatim>
+{
+ "results": [
+ {
+ "timestamp":"2014-04-21T20:55Z",
+ "name":"sampleIngestProcess",
+ "type":"process-instance",
+ "version":"2.0.0",
+ "_id":4,
+ "_type":"vertex"
+ }
+ ],
+ "totalSize": 1
+}
+</verbatim>
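
The key/value lookup documented above takes both parts as query parameters, so values containing characters such as `/` or `:` (for example instance names) need URL encoding. The Python sketch below builds the query URL with the standard library; it makes no request. The host and port come from the documented example, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Host and port as used in the documented examples.
BASE = "http://localhost:15000/api/graphs/lineage"


def vertices_query_url(key: str, value: str) -> str:
    """Build the vertices lookup URL with URL-encoded key and value."""
    return f"{BASE}/vertices?" + urlencode({"key": key, "value": value})


if __name__ == "__main__":
    print(vertices_query_url("name", "sampleIngestProcess"))
    # An instance name with a slash and colon is percent-encoded.
    print(vertices_query_url("name", "sampleIngestProcess/2014-03-01T10:00Z"))
```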