You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/09 08:55:40 UTC

[GitHub] [airflow] JPonte opened a new pull request #14146: Restore base lineage backend

JPonte opened a new pull request #14146:
URL: https://github.com/apache/airflow/pull/14146


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   closes: #14106 
   
   This adds back the base lineage backend which can be extended to send lineage metadata to any custom backend.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580402866



##########
File path: docs/apache-airflow/lineage.rst
##########
@@ -95,3 +95,24 @@ has outlets defined (e.g. by using ``add_outlets(..)`` or has out of the box sup
     f_in > run_this | (run_this_last > outlets)
 
 .. _precedence: https://docs.python.org/3/reference/expressions.html
+
+
+Lineage Backend
+---------------
+
+It's possible to push the lineage metrics to a custom backend by providing an instance of a LinageBackend in the config:
+
+.. code-block:: ini
+
+  [lineage]
+  backend = my.lineage.CustomBackend
+
+The backend should inherit from ``airflow.lineage.LineageBackend``.
+
+.. code-block:: python
+
+  from airflow.lineage.backend import LineageBackend
+
+  class ExampleBackend(LineageBackend):
+    def send_lineage(self, operator=None, inlets=None, outlets=None, context=None):

Review comment:
       ```suggestion
       def send_lineage(self, operator, inlets=None, outlets=None, context=None):
   ```
   Operator will never be `None` I think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r572774595



##########
File path: airflow/lineage/__init__.py
##########
@@ -45,6 +48,20 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def _get_backend() -> Optional[LineageBackend]:
+    _backend: Optional[LineageBackend] = None
+
+    try:
+        _backend_str = conf.get("lineage", "backend")
+        _backend = import_string(_backend_str)  # pylint:disable=protected-access
+    except ImportError as err:
+        log.debug("Cannot import %s due to %s", _backend_str, err)  # pylint:disable=protected-access
+    except AirflowConfigException:
+        log.debug("Could not find lineage backend key in config")

Review comment:
       Should there be a fallback to some default backend?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580413636



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+# pylint: disable=unused-import
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',

Review comment:
       Ah! It does create a cyclic import and that's why I put the whole name instead of importing, didn't know the `TYPE_CHECKING` trick. I'll change it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580242949



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',
+        inlets: Optional[list] = None,
+        outlets: Optional[list] = None,

Review comment:
       We don't, the user provides a list of attr annotated objects.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580447919



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import TYPE_CHECKING, Optional
+
+if TYPE_CHECKING:
+    from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'BaseOperator',

Review comment:
       Oh, yes you are right 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r572776319



##########
File path: airflow/lineage/__init__.py
##########
@@ -45,6 +48,20 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def _get_backend() -> Optional[LineageBackend]:
+    _backend: Optional[LineageBackend] = None
+
+    try:
+        _backend_str = conf.get("lineage", "backend")
+        _backend = import_string(_backend_str)  # pylint:disable=protected-access
+    except ImportError as err:
+        log.debug("Cannot import %s due to %s", _backend_str, err)  # pylint:disable=protected-access
+    except AirflowConfigException:
+        log.debug("Could not find lineage backend key in config")

Review comment:
       Lineage is an optional feature, and this is just restoring the deleted code, so I don't think we need a default.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r576131438



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,32 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(self, operator=None, inlets=None, outlets=None, context=None):
+        """
+        Sends lineage metadata to a backend

Review comment:
       ```suggestion
           Sends lineage metadata to a backend
   
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#issuecomment-775822013


   [The Workflow run](https://github.com/apache/airflow/actions/runs/550952224) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580402182



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',
+        inlets: Optional[list] = None,
+        outlets: Optional[list] = None,

Review comment:
       Ok, got it 👌 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r576132022



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,32 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(self, operator=None, inlets=None, outlets=None, context=None):

Review comment:
       Can you please add type annotations so users understand more? Also can you please add `:type operator:` and others in the docstring?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#issuecomment-783351143


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580225253



##########
File path: docs/apache-airflow/lineage.rst
##########
@@ -95,3 +95,24 @@ has outlets defined (e.g. by using ``add_outlets(..)`` or has out of the box sup
     f_in > run_this | (run_this_last > outlets)
 
 .. _precedence: https://docs.python.org/3/reference/expressions.html
+
+
+LineageBackend
+--------------

Review comment:
       ```suggestion
   Lineage Backend
   ---------------
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#issuecomment-783487213


   [The Workflow run](https://github.com/apache/airflow/actions/runs/589728630) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580404820



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+# pylint: disable=unused-import
+import airflow

Review comment:
       ```suggestion
   
   ```
   I don't think we need it here, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580404397



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+# pylint: disable=unused-import
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',

Review comment:
       ```suggestion
           operator: BaseOperator,
   ```
   
   Probably we can import this at the top level. If it creates a cyclic import we can consider:
   ```
   if TYPE_CHECKING:
       from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580426488



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import TYPE_CHECKING, Optional
+
+if TYPE_CHECKING:
+    from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'BaseOperator',

Review comment:
       I don't think we can, we still need a forward reference since `BaseOperator` imports the lineage methods to prepare and apply lineage.
   I get this error when I run the tests if I remove the quotes:
   ```ERROR tests/lineage/test_lineage.py - NameError: name 'BaseOperator' is not defined```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] hsheth2 commented on pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
hsheth2 commented on pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#issuecomment-812763729


   Hi @JPonte / @turbaszek - do you have a sense of when this might be merged in?
   
   Seems like the build failures are unrelated - "GitHub Actions has encountered an internal error when running your job."
   
   I'd be happy to help get this to the finish line if that's what's necessary. I'm looking forwards to using it 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#issuecomment-783399102


   [The Workflow run](https://github.com/apache/airflow/actions/runs/589383527) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r576131280



##########
File path: airflow/lineage/__init__.py
##########
@@ -45,6 +48,20 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def _get_backend() -> Optional[LineageBackend]:
+    _backend: Optional[LineageBackend] = None
+
+    try:
+        _backend_str = conf.get("lineage", "backend")
+        _backend = import_string(_backend_str)  # pylint: disable=protected-access

Review comment:
       Should we add check that the backend is a sub class of `LineageBackend` as we do in `XComBackend`? Currently there's no common methods but in future this may help us with catching errors. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580225832



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',
+        inlets: Optional[list] = None,
+        outlets: Optional[list] = None,

Review comment:
       Do we know the type of the list's objects?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580415258



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+# pylint: disable=unused-import
+import airflow

Review comment:
       For the same reason as above, it was used in the forward reference of the BaseOperator. I'll change it to use the `TYPE_CHECKING`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r572776463



##########
File path: airflow/lineage/__init__.py
##########
@@ -45,6 +48,20 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def _get_backend() -> Optional[LineageBackend]:
+    _backend: Optional[LineageBackend] = None
+
+    try:
+        _backend_str = conf.get("lineage", "backend")
+        _backend = import_string(_backend_str)  # pylint:disable=protected-access
+    except ImportError as err:
+        log.debug("Cannot import %s due to %s", _backend_str, err)  # pylint:disable=protected-access
+    except AirflowConfigException:
+        log.debug("Could not find lineage backend key in config")

Review comment:
       By default there is no backend, lineage stays only in the XComs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580225481



##########
File path: docs/apache-airflow/lineage.rst
##########
@@ -95,3 +95,24 @@ has outlets defined (e.g. by using ``add_outlets(..)`` or has out of the box sup
     f_in > run_this | (run_this_last > outlets)
 
 .. _precedence: https://docs.python.org/3/reference/expressions.html
+
+
+LineageBackend
+--------------
+
+It's possible to push the lineage metrics to a custom backend by providing an instance of a LinageBackend in the config:
+
+.. code-block:: ini
+
+  [lineage]
+  backend = my.lineage.CustomBackend
+
+The backend should inherit ``airflow.lineage.LineageBackend``.

Review comment:
       ```suggestion
   The backend should inherit from ``airflow.lineage.LineageBackend``.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580424306



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import TYPE_CHECKING, Optional
+
+if TYPE_CHECKING:
+    from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'BaseOperator',

Review comment:
       ```suggestion
           operator: BaseOperator,
   ```
   We can remove the quotes 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r580404397



##########
File path: airflow/lineage/backend.py
##########
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import Optional
+
+# pylint: disable=unused-import
+import airflow
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'airflow.models.baseoperator.BaseOperator',

Review comment:
       ```suggestion
           operator: 'airflow.models.baseoperator.BaseOperator',
   ```
   
   Probably we can import this at the top level. If it creates a cyclic import we can consider:
   ```
   if TYPE_CHECKING:
       from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #14146:
URL: https://github.com/apache/airflow/pull/14146


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #14146: Restore base lineage backend

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #14146:
URL: https://github.com/apache/airflow/pull/14146#discussion_r576176627



##########
File path: airflow/lineage/__init__.py
##########
@@ -45,6 +48,20 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def _get_backend() -> Optional[LineageBackend]:
+    _backend: Optional[LineageBackend] = None
+
+    try:
+        _backend_str = conf.get("lineage", "backend")
+        _backend = import_string(_backend_str)  # pylint: disable=protected-access

Review comment:
       Sounds reasonable, I'll add it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org