Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/31 20:16:01 UTC

[GitHub] [airflow] bolkedebruin opened a new pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage

bolkedebruin opened a new pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314
 
 
   This adds shorthand notation for defining DAGs that have lineage
   support: `|` for piping between operators, and `>` and `<` for setting
   (static) lineage definitions.
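
To illustrate how such a notation can work, here is a minimal sketch using plain Python operator overloading. It is not the code from this pull request: `Task`, `File`, and the bucket URL are hypothetical stand-ins, and returning the right-hand task from `|` is a choice made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """Hypothetical lineage entity; stands in for a real dataset type."""
    url: str


@dataclass
class Task:
    """Toy operator for illustration only, not Airflow's BaseOperator."""
    task_id: str
    inlets: List[object] = field(default_factory=list)
    outlets: List[object] = field(default_factory=list)
    downstream: List["Task"] = field(default_factory=list)

    def __gt__(self, entity):
        # task > entity: declare `entity` as a static outlet of this task.
        self.outlets.append(entity)
        return self

    def __lt__(self, entity):
        # task < entity: declare `entity` as a static inlet of this task.
        self.inlets.append(entity)
        return self

    def __or__(self, other):
        # a | b: b runs after a and picks up a's outlets as its inlets.
        other.inlets.extend(self.outlets)
        self.downstream.append(other)
        return other  # assumption of this sketch: allows a | b | c


raw = File("s3://example-bucket/raw.csv")  # hypothetical URL
extract = Task("extract")
transform = Task("transform")

extract > raw          # static outlet definition
extract | transform    # pipe: transform consumes what extract produces
```

Note that `|` binds more tightly than `>` and `<` in Python, and that `a > x > y` is parsed as a chained comparison, so in a sketch like this each shorthand expression is best kept on its own statement.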
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   @potiuk 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#issuecomment-581018489
 
 
   Nice!


[GitHub] [airflow] bolkedebruin commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
bolkedebruin commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#discussion_r373783904
 
 

 ##########
 File path: airflow/providers/papermill/operators/papermill.py
 ##########
 @@ -45,19 +45,26 @@ class PapermillOperator(BaseOperator):
     :param parameters: the notebook parameters to set
     :type parameters: dict
     """
+    supports_lineage = True
+
     @apply_defaults
     def __init__(self,
-                 input_nb: str,
-                 output_nb: str,
-                 parameters: Dict,
+                 input_nb: Optional[str] = None,
+                 output_nb: Optional[str] = None,
+                 parameters: Optional[Dict] = None,
                  *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
 
-        self.inlets.append(NoteBook(url=input_nb,
-                                    parameters=parameters))
-        self.outlets.append(NoteBook(url=output_nb))
+        if input_nb:
+            self.inlets.append(NoteBook(url=input_nb,
+                                        parameters=parameters))
+        if output_nb:
+            self.outlets.append(NoteBook(url=output_nb))
 
     def execute(self, context):
+        if not self.inlets or not self.outlets:
+            raise ValueError("Input notebook or output notebook is not specified")
 
 Review comment:
   I agree, but can you propose how to do that when, with the shorthand notation, inlets/outlets are set after the operator has been initialized?


[GitHub] [airflow] bolkedebruin merged pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
bolkedebruin merged pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314
 
 
   


[GitHub] [airflow] potiuk commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#discussion_r373784742
 
 

 ##########
 File path: airflow/providers/papermill/operators/papermill.py
 ##########
 @@ -45,19 +45,26 @@ class PapermillOperator(BaseOperator):
     :param parameters: the notebook parameters to set
     :type parameters: dict
     """
+    supports_lineage = True
+
     @apply_defaults
     def __init__(self,
-                 input_nb: str,
-                 output_nb: str,
-                 parameters: Dict,
+                 input_nb: Optional[str] = None,
+                 output_nb: Optional[str] = None,
+                 parameters: Optional[Dict] = None,
                  *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
 
-        self.inlets.append(NoteBook(url=input_nb,
-                                    parameters=parameters))
-        self.outlets.append(NoteBook(url=output_nb))
+        if input_nb:
+            self.inlets.append(NoteBook(url=input_nb,
+                                        parameters=parameters))
+        if output_nb:
+            self.outlets.append(NoteBook(url=output_nb))
 
     def execute(self, context):
+        if not self.inlets or not self.outlets:
+            raise ValueError("Input notebook or output notebook is not specified")
 
 Review comment:
   I think it's OK to have it at execute time. Whether we use the builder pattern or the shorthand operators, it cannot be done in `__init__()`.
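
The timing issue can be sketched with a toy class. This is an illustration under assumptions, not the PapermillOperator code: `NotebookTask` and the notebook file names are hypothetical, and only the error message is taken from the diff above. Because inlets and outlets may be attached after construction, a check in `__init__` would reject objects that are about to become valid, so the earliest reliable check is in `execute`.

```python
from typing import List, Optional


class NotebookTask:
    """Toy stand-in for an operator with lineage support (not Airflow code)."""

    def __init__(self, input_nb: Optional[str] = None,
                 output_nb: Optional[str] = None) -> None:
        # Both arguments are optional because they may be supplied later.
        self.inlets: List[str] = [input_nb] if input_nb else []
        self.outlets: List[str] = [output_nb] if output_nb else []

    def __lt__(self, url: str) -> "NotebookTask":
        # task < url: attach an inlet after construction.
        self.inlets.append(url)
        return self

    def __gt__(self, url: str) -> "NotebookTask":
        # task > url: attach an outlet after construction.
        self.outlets.append(url)
        return self

    def execute(self) -> str:
        # First point at which the lineage is known to be complete.
        if not self.inlets or not self.outlets:
            raise ValueError("Input notebook or output notebook is not specified")
        return f"{self.inlets[0]} -> {self.outlets[0]}"


task = NotebookTask()   # constructing with nothing set must be allowed
task < "in.ipynb"       # hypothetical notebook names
task > "out.ipynb"
result = task.execute()
```

The trade-off raised in the review still holds: a task that never receives its inlets or outlets fails only when it runs, not when the DAG file is parsed.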


[GitHub] [airflow] nuclearpinguin commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
nuclearpinguin commented on a change in pull request #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#discussion_r373776933
 
 

 ##########
 File path: airflow/providers/papermill/operators/papermill.py
 ##########
 @@ -45,19 +45,26 @@ class PapermillOperator(BaseOperator):
     :param parameters: the notebook parameters to set
     :type parameters: dict
     """
+    supports_lineage = True
+
     @apply_defaults
     def __init__(self,
-                 input_nb: str,
-                 output_nb: str,
-                 parameters: Dict,
+                 input_nb: Optional[str] = None,
+                 output_nb: Optional[str] = None,
+                 parameters: Optional[Dict] = None,
                  *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
 
-        self.inlets.append(NoteBook(url=input_nb,
-                                    parameters=parameters))
-        self.outlets.append(NoteBook(url=output_nb))
+        if input_nb:
+            self.inlets.append(NoteBook(url=input_nb,
+                                        parameters=parameters))
+        if output_nb:
+            self.outlets.append(NoteBook(url=output_nb))
 
     def execute(self, context):
+        if not self.inlets or not self.outlets:
+            raise ValueError("Input notebook or output notebook is not specified")
 
 Review comment:
   I don't like it. We should raise this error in `__init__` to inform users as soon as possible that their DAG will fail if they do not specify inlets and outlets. 



[GitHub] [airflow] codecov-io commented on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#issuecomment-581007116
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=h1) Report
   > Merging [#7314](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/a2d6a2f85e07c38be479e91e4a27981f308f4711?src=pr&el=desc) will **increase** coverage by `0.24%`.
   > The diff coverage is `97.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7314/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7314      +/-   ##
   ==========================================
   + Coverage   85.34%   85.58%   +0.24%     
   ==========================================
     Files         863      863              
     Lines       40484    40520      +36     
   ==========================================
   + Hits        34552    34681     +129     
   + Misses       5932     5839      -93
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/utils/decorators.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kZWNvcmF0b3JzLnB5) | `90.47% <ø> (ø)` | :arrow_up: |
   | [airflow/models/baseoperator.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZW9wZXJhdG9yLnB5) | `96.51% <100%> (+0.23%)` | :arrow_up: |
   | [airflow/providers/papermill/operators/papermill.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcGFwZXJtaWxsL29wZXJhdG9ycy9wYXBlcm1pbGwucHk=) | `96% <87.5%> (-4%)` | :arrow_down: |
   | [airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=) | `89.19% <0%> (+0.43%)` | :arrow_up: |
   | [airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5) | `90.9% <0%> (+2.47%)` | :arrow_up: |
   | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `88.12% <0%> (+2.49%)` | :arrow_up: |
   | [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `96.66% <0%> (+3.33%)` | :arrow_up: |
   | [airflow/providers/postgres/hooks/postgres.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvaG9va3MvcG9zdGdyZXMucHk=) | `94.36% <0%> (+16.9%)` | :arrow_up: |
   | [...roviders/google/cloud/operators/postgres\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9wb3N0Z3Jlc190b19nY3MucHk=) | `85.29% <0%> (+32.35%)` | :arrow_up: |
   | [airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5) | `100% <0%> (+100%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=footer). Last update [a2d6a2f...bad63d4](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] codecov-io edited a comment on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7314: [AIRFLOW-6698] Add shorthand notation for lineage
URL: https://github.com/apache/airflow/pull/7314#issuecomment-581007116
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=h1) Report
   > Merging [#7314](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/a2d6a2f85e07c38be479e91e4a27981f308f4711?src=pr&el=desc) will **increase** coverage by `0.48%`.
   > The diff coverage is `97.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7314/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7314      +/-   ##
   ==========================================
   + Coverage   85.34%   85.82%   +0.48%     
   ==========================================
     Files         863      863              
     Lines       40484    40520      +36     
   ==========================================
   + Hits        34552    34778     +226     
   + Misses       5932     5742     -190
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/utils/decorators.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kZWNvcmF0b3JzLnB5) | `90.47% <ø> (ø)` | :arrow_up: |
   | [airflow/models/baseoperator.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZW9wZXJhdG9yLnB5) | `96.51% <100%> (+0.23%)` | :arrow_up: |
   | [airflow/providers/papermill/operators/papermill.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcGFwZXJtaWxsL29wZXJhdG9ycy9wYXBlcm1pbGwucHk=) | `96% <87.5%> (-4%)` | :arrow_down: |
   | [airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=) | `89.34% <0%> (+0.58%)` | :arrow_up: |
   | [airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `77.4% <0%> (+0.96%)` | :arrow_up: |
   | [airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==) | `91.59% <0%> (+1.15%)` | :arrow_up: |
   | [airflow/providers/apache/hive/hooks/hive.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvaG9va3MvaGl2ZS5weQ==) | `77.55% <0%> (+1.53%)` | :arrow_up: |
   | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `88.12% <0%> (+2.49%)` | :arrow_up: |
   | [airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5) | `91.73% <0%> (+3.3%)` | :arrow_up: |
   | [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `96.66% <0%> (+3.33%)` | :arrow_up: |
   | ... and [7 more](https://codecov.io/gh/apache/airflow/pull/7314/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=footer). Last update [a2d6a2f...bad63d4](https://codecov.io/gh/apache/airflow/pull/7314?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   
