Posted to dev@atlas.apache.org by Rajat Goel <ra...@gmail.com> on 2017/08/24 18:04:18 UTC

Regarding pipeline of multiple processes and ATLAS-1236

Hi,

I am a new user exploring Apache Atlas for metadata management. I have a
use case where I want to track lineage across a pipeline of processing
functions, i.e. DataSet1 -> Function/Process1 -> Process2 -> Process3 ->
DataSet2.

Another use case is that each of the above processes/functions could itself
be a named pipeline, e.g. DataSet1 ->
Process1 (which is a pipeline of SubProcess1 -> SubProcess2) -> Process2 ->
DataSet2.
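
To make the two use cases concrete, here is a minimal sketch in plain Python.
It does not use any Atlas API or type; the Process and Pipeline classes and
the flatten and lineage helpers are hypothetical names chosen only for
illustration. It shows a named pipeline nested inside another pipeline and
the end-to-end lineage path I would like to be able to see.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Process:
    """A single processing step (hypothetical type, not an Atlas class)."""
    name: str


@dataclass
class Pipeline:
    """A named pipeline whose steps are processes or nested pipelines."""
    name: str
    steps: List[Union["Process", "Pipeline"]] = field(default_factory=list)


def flatten(step):
    """Expand nested pipelines into an ordered list of leaf process names."""
    if isinstance(step, Process):
        return [step.name]
    names = []
    for child in step.steps:
        names.extend(flatten(child))
    return names


def lineage(source, pipeline, target):
    """Return the end-to-end lineage path as a list of node names."""
    return [source] + flatten(pipeline) + [target]


# Second use case: Process1 is itself a pipeline of two sub-processes.
process1 = Pipeline("Process1", [Process("SubProcess1"), Process("SubProcess2")])
top = Pipeline("main", [process1, Process("Process2")])

path = lineage("DataSet1", top, "DataSet2")
print(" -> ".join(path))
```

The sketch only expands the nested pipeline into its leaf steps; ideally the
lineage view would also let me collapse it back to the named Process1.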

Can Apache Atlas support these use cases for metadata management and
lineage?
If yes, please suggest how. If not, is there any plan to support them in
the future?

I found one Jira improvement ticket, ATLAS-1236, which looks relevant
to the above use cases. Is there any plan to resolve it in an upcoming
release?

Thanks & Regards,
Rajat Goel

Re: Regarding pipeline of multiple processes and ATLAS-1236

Posted by Ernie Ostic <eo...@us.ibm.com>.
Hi Goel.  

Welcome. Great thoughts and use cases. I will let others chime in here, but one question would be: what is your "process"? For example, is it HiveQL? Sqoop? Something for which a supported "hook" exists today? (See the atlas.apache.org page with the high-level specs, API, and hooks list.) And are your datasets known to Atlas (Hive, for example), or something external to Atlas?

This would help outline what can work directly today and what might need custom work.

There is great work being done by the team here to further enhance the underlying models (as you can see through recent Jiras), which will facilitate even wider creative use cases.

Looking forward to the details on your "process" and dataset object types.

Ernie




