You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/08 07:10:27 UTC

[GitHub] [airflow] MaksYermak opened a new pull request, #22852: Add ability for Dataproc operator to create cluster in GKE cluster

MaksYermak opened a new pull request, #22852:
URL: https://github.com/apache/airflow/pull/22852

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   Create ability for Dataproc operator to create dataproc cluster in Google Kubernetes Engine cluster. Also in this PR was updated google-cloud-container library version and GKE operators code which uses this library.
   
   Co-authored-by: Wojciech Januszek [januszek@google.com](mailto:januszek@google.com)
   Co-authored-by: Lukasz Wyszomirski [wyszomirski@google.com](mailto:wyszomirski@google.com)
   Co-authored-by: Maksim Yermakou [maksimy@google.com](mailto:maksimy@google.com)
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #22852:
URL: https://github.com/apache/airflow/pull/22852#issuecomment-1098580294

   Needs some fixes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] MaksYermak commented on a diff in pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
MaksYermak commented on code in PR #22852:
URL: https://github.com/apache/airflow/pull/22852#discussion_r847464478


##########
airflow/providers/google/cloud/hooks/dataproc.py:
##########
@@ -324,12 +332,20 @@ def create_cluster(
         labels = labels or {}
         labels.update({'airflow-version': 'v' + airflow_version.replace('.', '-').replace('+', '-')})
 
-        cluster = {
-            "project_id": project_id,
-            "cluster_name": cluster_name,
-            "config": cluster_config,
-            "labels": labels,
-        }
+        cluster = (
+            {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "virtual_cluster_config": virtual_cluster_config,
+            }
+            if run_in_gke_cluster
+            else {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "config": cluster_config,
+                "labels": labels,
+            }
+        )

Review Comment:
   I have removed `run_in_gke_cluster` flag from the code 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #22852:
URL: https://github.com/apache/airflow/pull/22852#discussion_r846090383


##########
airflow/providers/google/cloud/hooks/dataproc.py:
##########
@@ -324,12 +332,20 @@ def create_cluster(
         labels = labels or {}
         labels.update({'airflow-version': 'v' + airflow_version.replace('.', '-').replace('+', '-')})
 
-        cluster = {
-            "project_id": project_id,
-            "cluster_name": cluster_name,
-            "config": cluster_config,
-            "labels": labels,
-        }
+        cluster = (
+            {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "virtual_cluster_config": virtual_cluster_config,
+            }
+            if run_in_gke_cluster
+            else {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "config": cluster_config,
+                "labels": labels,
+            }
+        )

Review Comment:
   ```suggestion
           cluster = {
               "project_id": project_id,
               "cluster_name": cluster_name,
               "virtual_cluster_config": virtual_cluster_config,
           }
           if virtual_cluster_config is not None:
               cluster['virtual_cluster_config'] = virtual_cluster_config
           if cluster_config is not None:
               cluster['config'] = cluster_config
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #22852:
URL: https://github.com/apache/airflow/pull/22852#discussion_r846091260


##########
airflow/providers/google/cloud/hooks/dataproc.py:
##########
@@ -324,12 +332,20 @@ def create_cluster(
         labels = labels or {}
         labels.update({'airflow-version': 'v' + airflow_version.replace('.', '-').replace('+', '-')})
 
-        cluster = {
-            "project_id": project_id,
-            "cluster_name": cluster_name,
-            "config": cluster_config,
-            "labels": labels,
-        }
+        cluster = (
+            {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "virtual_cluster_config": virtual_cluster_config,
+            }
+            if run_in_gke_cluster
+            else {
+                "project_id": project_id,
+                "cluster_name": cluster_name,
+                "config": cluster_config,
+                "labels": labels,
+            }
+        )

Review Comment:
   Thanks to this, we give full control over the parameters to the user.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #22852:
URL: https://github.com/apache/airflow/pull/22852#discussion_r846088438


##########
airflow/providers/google/cloud/hooks/dataproc.py:
##########
@@ -292,7 +292,9 @@ def create_cluster(
         region: str,
         project_id: str,
         cluster_name: str,
-        cluster_config: Union[Dict, Cluster],
+        cluster_config: Union[Dict, Cluster, None],

Review Comment:
   ```suggestion
           cluster_config: Union[Dict, Cluster, None] = None,
   ```



##########
airflow/providers/google/cloud/hooks/dataproc.py:
##########
@@ -292,7 +292,9 @@ def create_cluster(
         region: str,
         project_id: str,
         cluster_name: str,
-        cluster_config: Union[Dict, Cluster],
+        cluster_config: Union[Dict, Cluster, None],
+        virtual_cluster_config: Optional[Dict] = None,
+        run_in_gke_cluster: Optional[bool] = False,

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] MaksYermak commented on pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
MaksYermak commented on PR #22852:
URL: https://github.com/apache/airflow/pull/22852#issuecomment-1104908681

   @potiuk could you review this PR one more time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22852:
URL: https://github.com/apache/airflow/pull/22852#issuecomment-1108991602

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #22852: Add ability for Dataproc operator to create cluster in GKE cluster

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #22852:
URL: https://github.com/apache/airflow/pull/22852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org