You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/22 14:54:12 UTC

[GitHub] [airflow] turbaszek opened a new pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

turbaszek opened a new pull request #13256:
URL: https://github.com/apache/airflow/pull/13256


   This PR changes Google Dataproc operators to be compatible with 2.0+ version of google-cloud-dataproc library. To do this I used the script provided by google (https://googleapis.dev/python/dataproc/latest/UPGRADING.html#method-calls) and adjusted some issues found when testing with the example DAG.
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-749580073


   FYI @michalslowikowski00 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r553806448



##########
File path: airflow/providers/google/ADDITIONAL_INFO.md
##########
@@ -34,6 +34,7 @@ Details are covered in the UPDATING.md files for each library, but there are som
 | [``google-cloud-os-login``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-oslogin/blob/master/UPGRADING.md) |
 | [``google-cloud-pubsub``](https://pypi.org/project/google-cloud-pubsub/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-pubsub/blob/master/UPGRADING.md) |
 | [``google-cloud-kms``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.2.1,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-kms/blob/master/UPGRADING.md) |
+| [``google-cloud-dataproc``](https://pypi.org/project/google-cloud-dataproc/) | ``>=1.0.1,<2.0.0`` | ``>=2.2.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-dataproc/blob/master/UPGRADING.md) |

Review comment:
       I will introduce the ordering then. There was no a-z order so I added this just as a next item 😄 

##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -613,18 +609,18 @@ def execute(self, context) -> dict:
 
         # Check if cluster is not in ERROR state
         self._handle_error_state(hook, cluster)
-        if cluster.status.state == cluster.status.CREATING:
+        if cluster.status.state == cluster.status.State.CREATING:
             # Wait for cluster to be be created
             cluster = self._wait_for_cluster_in_creating_state(hook)
             self._handle_error_state(hook, cluster)
-        elif cluster.status.state == cluster.status.DELETING:
+        elif cluster.status.state == cluster.status.State.DELETING:
             # Wait for cluster to be deleted
             self._wait_for_cluster_in_deleting_state(hook)
             # Create new cluster
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
-        return MessageToDict(cluster)
+        return Cluster.to_json(cluster)

Review comment:
       Nice catch!
   

##########
File path: airflow/providers/google/cloud/hooks/dataproc.py
##########
@@ -26,18 +26,16 @@
 from google.api_core.exceptions import ServerError
 from google.api_core.retry import Retry
 from google.cloud.dataproc_v1beta2 import (  # pylint: disable=no-name-in-module
-    ClusterControllerClient,
-    JobControllerClient,
-    WorkflowTemplateServiceClient,
-)
-from google.cloud.dataproc_v1beta2.types import (  # pylint: disable=no-name-in-module
     Cluster,
-    Duration,
-    FieldMask,
+    ClusterControllerClient,
     Job,
+    JobControllerClient,
     JobStatus,
     WorkflowTemplate,
+    WorkflowTemplateServiceClient,
 )
+from google.protobuf.duration_pb2 import Duration
+from google.protobuf.field_mask_pb2 import FieldMask

Review comment:
       Fixed 🆗 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r553171286



##########
File path: setup.py
##########
@@ -255,7 +255,7 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
     'google-cloud-bigtable>=1.0.0,<2.0.0',
     'google-cloud-container>=0.1.1,<2.0.0',
     'google-cloud-datacatalog>=1.0.0,<2.0.0',
-    'google-cloud-dataproc>=1.0.1,<2.0.0',
+    'google-cloud-dataproc>=2.2.0,<3.0.0',

Review comment:
       Done in 6162cfe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554447700



##########
File path: tests/providers/google/cloud/hooks/test_dataproc.py
##########
@@ -191,59 +199,62 @@ def test_update_cluster(self, mock_client):
         )
         mock_client.assert_called_once_with(location=GCP_LOCATION)
         mock_client.return_value.update_cluster.assert_called_once_with(
-            project_id=GCP_PROJECT,
-            region=GCP_LOCATION,
-            cluster=CLUSTER,
-            cluster_name=CLUSTER_NAME,
-            update_mask=update_mask,
-            graceful_decommission_timeout=None,
+            request=dict(
+                project_id=GCP_PROJECT,
+                region=GCP_LOCATION,
+                cluster=CLUSTER,
+                cluster_name=CLUSTER_NAME,
+                update_mask=update_mask,
+                graceful_decommission_timeout=None,
+                request_id=None,
+            ),
             metadata=None,
-            request_id=None,
             retry=None,
             timeout=None,
         )
 
     @mock.patch(DATAPROC_STRING.format("DataprocHook.get_template_client"))
     def test_create_workflow_template(self, mock_client):
         template = {"test": "test"}
-        mock_client.return_value.region_path.return_value = PARENT
+        parent = f'projects/{GCP_PROJECT}/regions/{GCP_LOCATION}'

Review comment:
       How about keeping `PARENT` and `NAME` defined with other constants at the top of the file?

##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -246,9 +247,11 @@ def test_execute(self, mock_hook):
             timeout=TIMEOUT,
             metadata=METADATA,
         )
+        to_dict_mock.assert_called_once_with(mock_hook().create_cluster().result())
 
+    @mock.patch(DATAPROC_PATH.format("Cluster.to_dict"))
     @mock.patch(DATAPROC_PATH.format("DataprocHook"))
-    def test_execute_if_cluster_exists(self, mock_hook):
+    def test_execute_if_cluster_exists(self, mock_hook, _):

Review comment:
       ```suggestion
       def test_execute_if_cluster_exists(self, mock_hook, to_dict_mock):
   ```
   And `to_dict_mock.assert_called_once_with(mock_hook().get_cluster())` at the end of this test. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-756994620


   [The Workflow run](https://github.com/apache/airflow/actions/runs/472215607) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-756176918


   [The Workflow run](https://github.com/apache/airflow/actions/runs/469003776) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-758542400


   @mik-laj  @potiuk would you mind taking a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-760995452


   [The Workflow run](https://github.com/apache/airflow/actions/runs/488027172) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-756994620






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-758556147


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554861459



##########
File path: tests/providers/google/cloud/hooks/test_dataproc.py
##########
@@ -191,59 +199,62 @@ def test_update_cluster(self, mock_client):
         )
         mock_client.assert_called_once_with(location=GCP_LOCATION)
         mock_client.return_value.update_cluster.assert_called_once_with(
-            project_id=GCP_PROJECT,
-            region=GCP_LOCATION,
-            cluster=CLUSTER,
-            cluster_name=CLUSTER_NAME,
-            update_mask=update_mask,
-            graceful_decommission_timeout=None,
+            request=dict(
+                project_id=GCP_PROJECT,
+                region=GCP_LOCATION,
+                cluster=CLUSTER,
+                cluster_name=CLUSTER_NAME,
+                update_mask=update_mask,
+                graceful_decommission_timeout=None,
+                request_id=None,
+            ),
             metadata=None,
-            request_id=None,
             retry=None,
             timeout=None,
         )
 
     @mock.patch(DATAPROC_STRING.format("DataprocHook.get_template_client"))
     def test_create_workflow_template(self, mock_client):
         template = {"test": "test"}
-        mock_client.return_value.region_path.return_value = PARENT
+        parent = f'projects/{GCP_PROJECT}/regions/{GCP_LOCATION}'

Review comment:
       @olchas we only use it two times so it's not that much and the `PARENT` may change in future for other endpoints.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-760522244


   Needs rebase,


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554019107



##########
File path: airflow/providers/google/cloud/hooks/dataproc.py
##########
@@ -26,18 +26,16 @@
 from google.api_core.exceptions import ServerError
 from google.api_core.retry import Retry
 from google.cloud.dataproc_v1beta2 import (  # pylint: disable=no-name-in-module
-    ClusterControllerClient,
-    JobControllerClient,
-    WorkflowTemplateServiceClient,
-)
-from google.cloud.dataproc_v1beta2.types import (  # pylint: disable=no-name-in-module
     Cluster,
-    Duration,
-    FieldMask,
+    ClusterControllerClient,
     Job,
+    JobControllerClient,
     JobStatus,
     WorkflowTemplate,
+    WorkflowTemplateServiceClient,
 )
+from google.protobuf.duration_pb2 import Duration
+from google.protobuf.field_mask_pb2 import FieldMask

Review comment:
       It seems that docstrings still point to old paths.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-758651831


   [The Workflow run](https://github.com/apache/airflow/actions/runs/479969842) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r553161263



##########
File path: setup.py
##########
@@ -255,7 +255,7 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
     'google-cloud-bigtable>=1.0.0,<2.0.0',
     'google-cloud-container>=0.1.1,<2.0.0',
     'google-cloud-datacatalog>=1.0.0,<2.0.0',
-    'google-cloud-dataproc>=1.0.1,<2.0.0',
+    'google-cloud-dataproc>=2.2.0,<3.0.0',

Review comment:
       Can you add info about this change to [`ADDITIONAL_INFO.md`](https://github.com/apache/airflow/blob/master/airflow/providers/google/ADDITIONAL_INFO.md)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r553806448



##########
File path: airflow/providers/google/ADDITIONAL_INFO.md
##########
@@ -34,6 +34,7 @@ Details are covered in the UPDATING.md files for each library, but there are som
 | [``google-cloud-os-login``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-oslogin/blob/master/UPGRADING.md) |
 | [``google-cloud-pubsub``](https://pypi.org/project/google-cloud-pubsub/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-pubsub/blob/master/UPGRADING.md) |
 | [``google-cloud-kms``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.2.1,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-kms/blob/master/UPGRADING.md) |
+| [``google-cloud-dataproc``](https://pypi.org/project/google-cloud-dataproc/) | ``>=1.0.1,<2.0.0`` | ``>=2.2.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-dataproc/blob/master/UPGRADING.md) |

Review comment:
       I will introduce the ordering then. There was no a-z order so I added this just as a next item 😄 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-757121865


   [The Workflow run](https://github.com/apache/airflow/actions/runs/473787633) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r553252904



##########
File path: airflow/providers/google/ADDITIONAL_INFO.md
##########
@@ -34,6 +34,7 @@ Details are covered in the UPDATING.md files for each library, but there are som
 | [``google-cloud-os-login``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-oslogin/blob/master/UPGRADING.md) |
 | [``google-cloud-pubsub``](https://pypi.org/project/google-cloud-pubsub/) | ``>=1.0.0,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-pubsub/blob/master/UPGRADING.md) |
 | [``google-cloud-kms``](https://pypi.org/project/google-cloud-os-login/) | ``>=1.2.1,<2.0.0`` | ``>=2.0.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-kms/blob/master/UPGRADING.md) |
+| [``google-cloud-dataproc``](https://pypi.org/project/google-cloud-dataproc/) | ``>=1.0.1,<2.0.0`` | ``>=2.2.0,<3.0.0``  | [`UPGRADING.md`](https://github.com/googleapis/python-dataproc/blob/master/UPGRADING.md) |

Review comment:
       Can you keep alphabetical order? In this way, it will be easier to make changes to this document, because we will avoid conflicts, and the document will also be more accessible to the reader.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554858582



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -246,9 +247,11 @@ def test_execute(self, mock_hook):
             timeout=TIMEOUT,
             metadata=METADATA,
         )
+        to_dict_mock.assert_called_once_with(mock_hook().create_cluster().result())
 
+    @mock.patch(DATAPROC_PATH.format("Cluster.to_dict"))
     @mock.patch(DATAPROC_PATH.format("DataprocHook"))
-    def test_execute_if_cluster_exists(self, mock_hook):
+    def test_execute_if_cluster_exists(self, mock_hook, _):

Review comment:
       Don't we test it in previous test?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-758189125


   [The Workflow run](https://github.com/apache/airflow/actions/runs/477358469) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554453370



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -246,9 +247,11 @@ def test_execute(self, mock_hook):
             timeout=TIMEOUT,
             metadata=METADATA,
         )
+        to_dict_mock.assert_called_once_with(mock_hook().create_cluster().result())
 
+    @mock.patch(DATAPROC_PATH.format("Cluster.to_dict"))
     @mock.patch(DATAPROC_PATH.format("DataprocHook"))
-    def test_execute_if_cluster_exists(self, mock_hook):
+    def test_execute_if_cluster_exists(self, mock_hook, _):

Review comment:
       ```suggestion
       def test_execute_if_cluster_exists(self, mock_hook, to_dict_mock):
   ```
   And `to_dict_mock.assert_called_once_with(mock_hook().get_cluster())` at the end of this test. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554045869



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -613,18 +609,18 @@ def execute(self, context) -> dict:
 
         # Check if cluster is not in ERROR state
         self._handle_error_state(hook, cluster)
-        if cluster.status.state == cluster.status.CREATING:
+        if cluster.status.state == cluster.status.State.CREATING:
             # Wait for cluster to be be created
             cluster = self._wait_for_cluster_in_creating_state(hook)
             self._handle_error_state(hook, cluster)
-        elif cluster.status.state == cluster.status.DELETING:
+        elif cluster.status.state == cluster.status.State.DELETING:
             # Wait for cluster to be deleted
             self._wait_for_cluster_in_deleting_state(hook)
             # Create new cluster
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
-        return MessageToDict(cluster)
+        return Cluster.to_json(cluster)

Review comment:
       Nice catch!
   

##########
File path: airflow/providers/google/cloud/hooks/dataproc.py
##########
@@ -26,18 +26,16 @@
 from google.api_core.exceptions import ServerError
 from google.api_core.retry import Retry
 from google.cloud.dataproc_v1beta2 import (  # pylint: disable=no-name-in-module
-    ClusterControllerClient,
-    JobControllerClient,
-    WorkflowTemplateServiceClient,
-)
-from google.cloud.dataproc_v1beta2.types import (  # pylint: disable=no-name-in-module
     Cluster,
-    Duration,
-    FieldMask,
+    ClusterControllerClient,
     Job,
+    JobControllerClient,
     JobStatus,
     WorkflowTemplate,
+    WorkflowTemplateServiceClient,
 )
+from google.protobuf.duration_pb2 import Duration
+from google.protobuf.field_mask_pb2 import FieldMask

Review comment:
       Fixed 🆗 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-755947881


   @michalslowikowski00 @olchas @TobKed please take a look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554447700



##########
File path: tests/providers/google/cloud/hooks/test_dataproc.py
##########
@@ -191,59 +199,62 @@ def test_update_cluster(self, mock_client):
         )
         mock_client.assert_called_once_with(location=GCP_LOCATION)
         mock_client.return_value.update_cluster.assert_called_once_with(
-            project_id=GCP_PROJECT,
-            region=GCP_LOCATION,
-            cluster=CLUSTER,
-            cluster_name=CLUSTER_NAME,
-            update_mask=update_mask,
-            graceful_decommission_timeout=None,
+            request=dict(
+                project_id=GCP_PROJECT,
+                region=GCP_LOCATION,
+                cluster=CLUSTER,
+                cluster_name=CLUSTER_NAME,
+                update_mask=update_mask,
+                graceful_decommission_timeout=None,
+                request_id=None,
+            ),
             metadata=None,
-            request_id=None,
             retry=None,
             timeout=None,
         )
 
     @mock.patch(DATAPROC_STRING.format("DataprocHook.get_template_client"))
     def test_create_workflow_template(self, mock_client):
         template = {"test": "test"}
-        mock_client.return_value.region_path.return_value = PARENT
+        parent = f'projects/{GCP_PROJECT}/regions/{GCP_LOCATION}'

Review comment:
       How about keeping `PARENT` and `NAME` defined with other constants at the top of the file?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #13256:
URL: https://github.com/apache/airflow/pull/13256


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554013925



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -613,18 +609,18 @@ def execute(self, context) -> dict:
 
         # Check if cluster is not in ERROR state
         self._handle_error_state(hook, cluster)
-        if cluster.status.state == cluster.status.CREATING:
+        if cluster.status.state == cluster.status.State.CREATING:
             # Wait for cluster to be be created
             cluster = self._wait_for_cluster_in_creating_state(hook)
             self._handle_error_state(hook, cluster)
-        elif cluster.status.state == cluster.status.DELETING:
+        elif cluster.status.state == cluster.status.State.DELETING:
             # Wait for cluster to be deleted
             self._wait_for_cluster_in_deleting_state(hook)
             # Create new cluster
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
-        return MessageToDict(cluster)
+        return Cluster.to_json(cluster)

Review comment:
       Why `to_json` and not `to_dict` method?

##########
File path: airflow/providers/google/cloud/hooks/dataproc.py
##########
@@ -26,18 +26,16 @@
 from google.api_core.exceptions import ServerError
 from google.api_core.retry import Retry
 from google.cloud.dataproc_v1beta2 import (  # pylint: disable=no-name-in-module
-    ClusterControllerClient,
-    JobControllerClient,
-    WorkflowTemplateServiceClient,
-)
-from google.cloud.dataproc_v1beta2.types import (  # pylint: disable=no-name-in-module
     Cluster,
-    Duration,
-    FieldMask,
+    ClusterControllerClient,
     Job,
+    JobControllerClient,
     JobStatus,
     WorkflowTemplate,
+    WorkflowTemplateServiceClient,
 )
+from google.protobuf.duration_pb2 import Duration
+from google.protobuf.field_mask_pb2 import FieldMask

Review comment:
       It seems that docstrings still point to old paths.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#discussion_r554013925



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -613,18 +609,18 @@ def execute(self, context) -> dict:
 
         # Check if cluster is not in ERROR state
         self._handle_error_state(hook, cluster)
-        if cluster.status.state == cluster.status.CREATING:
+        if cluster.status.state == cluster.status.State.CREATING:
             # Wait for cluster to be be created
             cluster = self._wait_for_cluster_in_creating_state(hook)
             self._handle_error_state(hook, cluster)
-        elif cluster.status.state == cluster.status.DELETING:
+        elif cluster.status.state == cluster.status.State.DELETING:
             # Wait for cluster to be deleted
             self._wait_for_cluster_in_deleting_state(hook)
             # Create new cluster
             cluster = self._create_cluster(hook)
             self._handle_error_state(hook, cluster)
 
-        return MessageToDict(cluster)
+        return Cluster.to_json(cluster)

Review comment:
       Why `to_json` and not `to_dict` method?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13256: Refactor DataprocOperators to support google-cloud-dataproc 2.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13256:
URL: https://github.com/apache/airflow/pull/13256#issuecomment-760996154


   [The Workflow run](https://github.com/apache/airflow/actions/runs/488045277) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org