You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/19 12:15:06 UTC

[GitHub] [airflow] turbaszek opened a new pull request #10403: Fix duration object in Dataproc cluster generator

turbaszek opened a new pull request #10403:
URL: https://github.com/apache/airflow/pull/10403


   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#issuecomment-686420969


   Probably we should also do same refactor for update cluster but in next PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#issuecomment-676261191


   I would love to see this in #10014 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480063016



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +240,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": self.init_action_timeout}

Review comment:
       It seems value under `seconds` key will be of `str` type in this case, and an `int` type in the other - is that intended/valid?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#issuecomment-686329155


   @mik-laj @olchas can you please take another look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483084838



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -506,9 +488,12 @@ def __init__(  # pylint: disable=too-many-arguments
                 del kwargs['params']
 
             # Create cluster object from kwargs
-            kwargs['region'] = region

Review comment:
       Is `region` unnecessary for `ClusterGenerator` anymore?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483458975



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -506,9 +488,12 @@ def __init__(  # pylint: disable=too-many-arguments
                 del kwargs['params']
 
             # Create cluster object from kwargs
-            kwargs['region'] = region

Review comment:
       Yes, I will remove it from ClusterGenerator constructor




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483082378



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -467,20 +454,15 @@ class DataprocCreateClusterOperator(BaseOperator):
     :type impersonation_chain: Union[str, Sequence[str]]
     """
 
-    template_fields = (

Review comment:
       Is there a particular reason to move `template_fields` after `__init__` definition?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482855527



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -155,10 +155,11 @@ class ClusterGenerator:
     """
 
     # pylint: disable=too-many-arguments,too-many-locals
+    @DataprocHook.fallback_to_default_project_id

Review comment:
       Is it works for operators?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483074811



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       @turbaszek it seems you have reverted to the previous version here in one of latter commits :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483082378



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -467,20 +454,15 @@ class DataprocCreateClusterOperator(BaseOperator):
     :type impersonation_chain: Union[str, Sequence[str]]
     """
 
-    template_fields = (

Review comment:
       Is there a particular reason to remove `template_fields` for this operator entirely?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480103439



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       You cannot convert `init_action_timeout` to `int`, as `init_action_timeout` has a character `s` at the end. You have to convert `match.group(1)` instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482900283



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -467,20 +454,15 @@ class DataprocCreateClusterOperator(BaseOperator):
     :type impersonation_chain: Union[str, Sequence[str]]
     """
 
-    template_fields = (
-        'project_id',
-        'region',
-        'cluster',
-        'impersonation_chain',
-    )
-
     @apply_defaults
     def __init__(  # pylint: disable=too-many-arguments
         self,
         *,
+        cluster_name: str,

Review comment:
       This backward compatible with 1.10.X:
   https://github.com/apache/airflow/blob/827a717fecc675b4d58fa202be003dee8e423632/airflow/contrib/operators/dataproc_operator.py#L192-L193




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480088727



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,
+        cluster_name: str,

Review comment:
       I added fallback for default project id. The cluster name has required any way




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480102063



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       Not sure if I understand 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480066043



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,
+        cluster_name: str,

Review comment:
       It depends if this parameter was actually optional before. I think cluster_name was a required parameter, but we had the wrong method signature.  In that case, this change doesn't change anything, so it's not a breaking change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480096982



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       ```suggestion
                   return {"seconds": int(match.group(1))}
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482864338



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -155,10 +155,11 @@ class ClusterGenerator:
     """
 
     # pylint: disable=too-many-arguments,too-many-locals
+    @DataprocHook.fallback_to_default_project_id

Review comment:
       No, it doesn't, so do we want to add such logic? This will require to initialize the hook somewhere so we can access the project_id. I personally don't see a lot of value in it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482900596



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -506,9 +488,12 @@ def __init__(  # pylint: disable=too-many-arguments
                 del kwargs['params']
 
             # Create cluster object from kwargs
-            kwargs['region'] = region
-            kwargs['project_id'] = project_id
-            cluster = ClusterGenerator(**kwargs).make()
+            if project_id is None:
+                raise AirflowException(
+                    "project_id argument is required when building cluster from keywords parameters"

Review comment:
       That should not be problem when migrating from 1.10.X becaue it should be in kwargs
   https://github.com/apache/airflow/blob/827a717fecc675b4d58fa202be003dee8e423632/airflow/contrib/operators/dataproc_operator.py#L192-L193




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483465323



##########
File path: tests/providers/google/cloud/operators/test_dataproc_system.py
##########
@@ -57,12 +52,12 @@ class DataprocExampleDagsTest(GoogleSystemTest):
     def setUp(self):
         super().setUp()
         self.create_gcs_bucket(BUCKET)
-        self.upload_content_to_gcs(lines=pyspark_file, bucket=PYSPARK_URI, filename=PYSPARK_MAIN)
-        self.upload_content_to_gcs(lines=sparkr_file, bucket=SPARKR_URI, filename=SPARKR_MAIN)
+        self.upload_content_to_gcs(lines=pyspark_file, bucket=GCS_URI, filename=PYSPARK_MAIN)
+        self.upload_content_to_gcs(lines=sparkr_file, bucket=GCS_URI, filename=SPARKR_MAIN)
 
     @provide_gcp_context(GCP_DATAPROC_KEY)
     def tearDown(self):
-        self.delete_gcs_bucket(BUCKET)
+        # self.delete_gcs_bucket(BUCKET)

Review comment:
       ```suggestion
           self.delete_gcs_bucket(BUCKET)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480065195



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,

Review comment:
       Should we add fallback for project ID from service account?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480096982



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       ```suggestion
                   return {"seconds": int(match.group(1)}
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482900596



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -506,9 +488,12 @@ def __init__(  # pylint: disable=too-many-arguments
                 del kwargs['params']
 
             # Create cluster object from kwargs
-            kwargs['region'] = region
-            kwargs['project_id'] = project_id
-            cluster = ClusterGenerator(**kwargs).make()
+            if project_id is None:
+                raise AirflowException(
+                    "project_id argument is required when building cluster from keywords parameters"

Review comment:
       That should not be problem when migrating from 1.10.X becaue:
   https://github.com/apache/airflow/blob/827a717fecc675b4d58fa202be003dee8e423632/airflow/contrib/operators/dataproc_operator.py#L192-L193




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #10403:
URL: https://github.com/apache/airflow/pull/10403


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480098192



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       Actually, converting to float first seems to be unnecessary, as a float value will not match a regex anyway.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480088454



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +240,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": self.init_action_timeout}

Review comment:
       Fixed

##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,

Review comment:
       Added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r482865270



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -155,10 +155,11 @@ class ClusterGenerator:
     """
 
     # pylint: disable=too-many-arguments,too-many-locals
+    @DataprocHook.fallback_to_default_project_id

Review comment:
       Or probably we should refactor the operator, so there's no need to provide the project in two places: as argument and in cluster definition




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480096982



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       ```suggestion
                   return {"seconds": int(float(match.group(1))}
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480103439



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       You cannot convert `init_action_timeout` to `int`, as `init_action_timeout` has a character `s` at the end. You have to convert `match.group(1)` instead. I edited the original suggestion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r472985198



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -289,26 +289,24 @@ def _build_gce_cluster_config(self, cluster_data):
     def _build_lifecycle_config(self, cluster_data):
         if self.idle_delete_ttl:
             cluster_data['config']['lifecycle_config']['idle_delete_ttl'] = \
-                "{}s".format(self.idle_delete_ttl)
+                {"seconds": self.idle_delete_ttl}
 
         if self.auto_delete_time:
             utc_auto_delete_time = timezone.convert_to_utc(self.auto_delete_time)
             cluster_data['config']['lifecycle_config']['auto_delete_time'] = \
                 utc_auto_delete_time.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
         elif self.auto_delete_ttl:
             cluster_data['config']['lifecycle_config']['auto_delete_ttl'] = \
-                "{}s".format(self.auto_delete_ttl)
+                {"seconds": self.auto_delete_ttl}
 
         return cluster_data
 
     def _build_cluster_data(self):
         if self.zone:
             master_type_uri = \
-                "https://www.googleapis.com/compute/v1/projects/{}/zones/{}/machineTypes/{}".format(
-                    self.project_id, self.zone, self.master_machine_type)
+                f"projects/{self.project_id}/zones/{self.zone}/machineTypes/{self.master_machine_type}"
             worker_type_uri = \
-                "https://www.googleapis.com/compute/v1/projects/{}/zones/{}/machineTypes/{}".format(
-                    self.project_id, self.zone, self.worker_machine_type)
+                f"projects/{self.project_id}/zones/{self.zone}/machineTypes/{self.worker_machine_type}"

Review comment:
       Not related but simplifies things. Reference:
   >Optional. The Compute Engine machine type used for cluster instances.
   >A full URL, partial URI, or short name are valid. Examples:
   > - https://www.googleapis.com/compute/v1/projects/[projectId]/zones/us-east1-a/machineTypes/n1-standard-2
   > - projects/[projectId]/zones/us-east1-a/machineTypes/n1-standard-2
   > - n1-standard-2




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483459171



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -467,20 +454,15 @@ class DataprocCreateClusterOperator(BaseOperator):
     :type impersonation_chain: Union[str, Sequence[str]]
     """
 
-    template_fields = (

Review comment:
       No particular reason, I will move it to align it with other operators




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] olchas commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
olchas commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480063537



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,
+        cluster_name: str,

Review comment:
       Should a note about these arguments no longer being optional be added somewhere (UPDATING.md perhaps)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Refactor DataprocCreateCluster operator to use simpler interface

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r483468768



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -240,10 +241,10 @@ def _get_init_action_timeout(self):
         match = re.match(r"^(\d+)([sm])$", self.init_action_timeout)
         if match:
             if match.group(2) == "s":
-                return self.init_action_timeout
+                return {"seconds": int(self.init_action_timeout)}

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10403: Fix duration object in Dataproc cluster generator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10403:
URL: https://github.com/apache/airflow/pull/10403#discussion_r480082506



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -157,8 +157,8 @@ class ClusterGenerator:
     # pylint: disable=too-many-arguments,too-many-locals
     def __init__(
         self,
-        project_id: Optional[str] = None,
-        cluster_name: Optional[str] = None,
+        project_id: str,
+        cluster_name: str,

Review comment:
       Yes, this change in fact fixes the cluster generator




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org