Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/22 12:41:35 UTC

[GitHub] [airflow] nodyke opened a new issue #8969: Atlas Backend module improvements

nodyke opened a new issue #8969:
URL: https://github.com/apache/airflow/issues/8969


   
   **Description**
   
   As part of a data lineage implementation, there is a requirement to send lineage data from Airflow to Apache Atlas. The current module, as shipped in the 1.10 stable version, has a few problems:
      1. It creates a new Atlas operator entity for each DAG run.
      2. Creation of missing entities can't be controlled through configuration.
      3. The operator fails if sending lineage data fails.
      4. The HTTP timeout can't be configured.
      5. The current Atlas type definition has only a small set of attributes.
      6. The class wrappers for Atlas types contain errors.
      
   
   **Use case / motivation**
   
   As part of an analytics data platform, lineage data needs to be imported automatically, and most of it can be sent by Airflow without manual steps. Our module is based on the old Atlas backend module but contains fixes and improvements. What was fixed:
       1. The Atlas entity created for an Airflow operator no longer uses the execution date.
       2. Added a config property to enable or disable creation of missing inlet and outlet entities.
       3. Added a config property to enable or disable failing the operator when sending lineage fails.
       4. Added an Atlas timeout config property.
       5. Added "template_fields" to the Airflow operator typedef, plus a config property for setting any additional operator attributes.
       6. Fixed the DataSet class wrapper and added abstract types for file and JDBC sources.
       7. Added utility methods for correctly generating inlet and outlet objects.
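
A minimal sketch of how fixes 1, 3, and 4 could fit together. This is not the module's actual code: `AtlasLineageSettings`, its field names, `operator_qualified_name`, and `send_lineage_guarded` are hypothetical names chosen for illustration, and the `send` callable stands in for whatever client call delivers the payload to Atlas.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class AtlasLineageSettings:
    """Hypothetical settings object; these option names are assumptions."""
    create_missing_entities: bool = True   # fix 2: create inlets/outlets if absent
    fail_on_lineage_error: bool = False    # fix 3: should a send failure fail the task?
    timeout_seconds: float = 10.0          # fix 4: HTTP timeout passed to the client


def operator_qualified_name(dag_id: str, task_id: str) -> str:
    # Fix 1: leaving the execution date out of the name means every run of
    # the same task maps to the same Atlas entity instead of a new one.
    return f"{dag_id}_{task_id}"


def send_lineage_guarded(
    send: Callable[[Dict[str, Any], float], None],
    payload: Dict[str, Any],
    settings: AtlasLineageSettings,
) -> bool:
    """Call `send`; swallow errors unless the settings say to fail the task."""
    try:
        send(payload, settings.timeout_seconds)
        return True
    except Exception:
        if settings.fail_on_lineage_error:
            raise
        return False
```

With the default settings, a failed delivery returns `False` and the task continues; setting `fail_on_lineage_error=True` re-raises the error so the task fails.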
       
   
   
   **Related Issues**
   
   AIRFLOW-5912
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-975316809


   Feel free to contribute to either the module or the examples.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thibaultbl edited a comment on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
thibaultbl edited a comment on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-977963342


   I am willing to spend some time on it.
   
   Nevertheless, it requires some changes to the Operator class, mainly adding a "lineage_data" attribute to every Operator. Are you open to this addition?





[GitHub] [airflow] thibaultbl commented on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
thibaultbl commented on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-977963342


   I would be ready to spend some time on it.
   
   Nevertheless, it requires some changes to the Operator class, mainly adding a "lineage_data" attribute to every Operator. Are you open to this addition?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-632671185


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] thibaultbl edited a comment on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
thibaultbl edited a comment on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-975280899


   Do you think this module will be implemented any time soon?
   
   If not, it could at least be useful to add a few examples of how to use a custom LineageBackend.
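
As an illustration of what such an example might look like, here is a minimal sketch of a custom backend. It assumes the Airflow 2.x interface, where a backend subclasses `airflow.lineage.backend.LineageBackend` and implements `send_lineage(operator, inlets, outlets, context)`. The base class below is a local stand-in so the snippet runs without Airflow installed, and `RecordingLineageBackend` is a hypothetical name.

```python
import logging
from typing import Any, Dict, List, Optional

log = logging.getLogger(__name__)


class LineageBackend:
    """Local stand-in for airflow.lineage.backend.LineageBackend (assumption)."""

    def send_lineage(self, operator, inlets=None, outlets=None, context=None):
        raise NotImplementedError


class RecordingLineageBackend(LineageBackend):
    """Hypothetical backend that records lineage instead of calling a service."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def send_lineage(
        self,
        operator: Any,
        inlets: Optional[list] = None,
        outlets: Optional[list] = None,
        context: Optional[dict] = None,
    ) -> None:
        record = {
            "task_id": getattr(operator, "task_id", None),
            "inlets": list(inlets or []),
            "outlets": list(outlets or []),
        }
        # A real backend would translate the record and send it to its
        # metadata service here; this sketch only stores and logs it.
        self.records.append(record)
        log.info("lineage record: %s", record)
```

In a real deployment the class would inherit from Airflow's own `LineageBackend` and be selected through the `backend` option in the `[lineage]` section of the Airflow configuration.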





[GitHub] [airflow] thibaultbl commented on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
thibaultbl commented on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-975280899


   Do you think this module will be implemented any time soon?





[GitHub] [airflow] uranusjr commented on issue #8969: Atlas Backend module improvements

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #8969:
URL: https://github.com/apache/airflow/issues/8969#issuecomment-978172812


   It kind of depends on what that attribute would hold, how it will be populated, persisted, and used. The act of adding that attribute should be easy and without negative consequences (just add it to `BaseOperator`).
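
For the sake of discussion, a rough sketch of one possible shape for that attribute. Everything here is an assumption: the `BaseOperator` below is a local stand-in rather than Airflow's class, the dict layout of `lineage_data` is invented, and `CopyTableOperator` is a hypothetical subclass.

```python
from typing import Any, Dict, Optional


class BaseOperator:
    """Local stand-in for Airflow's BaseOperator (not the real class)."""

    def __init__(self, task_id: str, lineage_data: Optional[Dict[str, Any]] = None):
        self.task_id = task_id
        # Defaulting to an empty dict leaves operators that never set the
        # attribute unaffected.
        self.lineage_data: Dict[str, Any] = dict(lineage_data or {})


class CopyTableOperator(BaseOperator):
    """Hypothetical operator that populates lineage_data at construction."""

    def __init__(self, task_id: str, source: str, target: str):
        super().__init__(
            task_id,
            lineage_data={"inlets": [source], "outlets": [target]},
        )
```

This only covers how the attribute is populated; how it is persisted and consumed by a lineage backend is left open, as the comment above points out.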

