Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/07 11:32:59 UTC

[GitHub] [airflow] ashishpatel0720 opened a new pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 opened a new pull request #15250:
URL: https://github.com/apache/airflow/pull/15250


   Closes: #15245
   
   This is my first PR. I tried to follow the CONTRIBUTING guide step by step, and it passes all the pre-commit checks.
   
   My PR adds a new argument, `custom_image_family`, to **DataprocCreateClusterOperator** that allows users to pass a **custom image family** when creating a **GCP Dataproc** cluster. Any advice or suggestions are welcome.
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615353159



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -144,6 +144,26 @@ def test_image_version(self):
             )
             assert "custom_image and image_version" in str(ctx.value)
 
+    def test_custom_image_family_1(self):

Review comment:
       Let's give it a more meaningful test name
   ```suggestion
       def test_custom_image_family_with_image_version(self):
   ```

##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -144,6 +144,26 @@ def test_image_version(self):
             )
             assert "custom_image and image_version" in str(ctx.value)
 
+    def test_custom_image_family_1(self):
+        with pytest.raises(ValueError) as ctx:
+            ClusterGenerator(
+                image_version="image_version",
+                custom_image_family="custom_image_family",
+                project_id=GCP_PROJECT,
+                cluster_name=CLUSTER_NAME,
+            )
+            assert "image_version and custom_image_family" in str(ctx.value)
+
+    def test_custom_image_family_2(self):

Review comment:
       ```suggestion
       def test_custom_image_family_with_custom_image(self):
   ```
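One detail in the two error-case tests in this diff: the `assert ... in str(ctx.value)` line sits inside the `with pytest.raises(ValueError)` block. Once the `ClusterGenerator(...)` call raises, the rest of that block is skipped, so the message check never runs. A minimal sketch of the intended shape follows, written with the standard-library `unittest` so it runs without extra dependencies; `pytest.raises` behaves the same way, and the fix is identical: move the message check after the `with` block. `check_image_args` is a hypothetical stand-in for the `ClusterGenerator` validation, and its error messages are assumptions modeled on the strings the tests look for.

```python
import unittest


def check_image_args(image_version=None, custom_image=None, custom_image_family=None):
    """Hypothetical stand-in for ClusterGenerator's image checks (not Airflow code)."""
    if custom_image and image_version:
        raise ValueError("The custom_image and image_version can't be both set")
    if image_version and custom_image_family:
        raise ValueError("The image_version and custom_image_family can't be both set")
    if custom_image and custom_image_family:
        raise ValueError("The custom_image and custom_image_family can't be both set")


class TestCustomImageFamily(unittest.TestCase):
    def test_custom_image_family_error_with_image_version(self):
        with self.assertRaises(ValueError) as ctx:
            check_image_args(image_version="image_version", custom_image_family="custom_image_family")
        # Checked after the with block, so this line actually executes.
        self.assertIn("image_version and custom_image_family", str(ctx.exception))

    def test_custom_image_family_error_with_custom_image(self):
        with self.assertRaises(ValueError) as ctx:
            check_image_args(custom_image="custom_image", custom_image_family="custom_image_family")
        self.assertIn("custom_image and custom_image_family", str(ctx.exception))

    def test_code_after_raise_inside_block_is_skipped(self):
        reached = []
        with self.assertRaises(ValueError):
            check_image_args(image_version="image_version", custom_image_family="custom_image_family")
            reached.append(True)  # never runs: the call above already raised
        self.assertEqual(reached, [])
```

Run it with `python -m unittest`; pytest also collects `unittest.TestCase` classes, so the same file works under either runner.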







[GitHub] [airflow] ashishpatel0720 removed a comment on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 removed a comment on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-817265702


   
   - [x] Testing completed on GCP DataProc
   
   ![image](https://user-images.githubusercontent.com/16856802/114296486-3b5d3400-9ac9-11eb-8f84-284797602a09.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296512-65165b00-9ac9-11eb-8206-47a04fbe4035.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296325-5ed3af00-9ac8-11eb-8b21-92a1a4a8c0e1.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296320-5b402800-9ac8-11eb-95ee-4ced7305a6b7.png)
   
   





[GitHub] [airflow] ashishpatel0720 commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-821988347


   > @asamasoma Thanks for the contribution! Can you also add an extra test case to verify the generated cluster config is as expected with `custom_image_family` and `single_node`?
   
   @xinbinhuang Thanks for the review!
   I have added a test case to verify that the generated cluster config is as expected with `custom_image_family` and `single_node`.





[GitHub] [airflow] marcosmarxm commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

marcosmarxm commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615285746



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -220,6 +226,9 @@ def __init__(
         if self.custom_image and self.image_version:
             raise ValueError("The custom_image and image_version can't be both set")
 
+        if self.custom_image_family and self.custom_image:

Review comment:
       Just as a sanity check, @ashishpatel0720: can you set `custom_image_family` and `image_version` together?







[GitHub] [airflow] boring-cyborg[bot] commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

boring-cyborg[bot] commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-814839748


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] xinbinhuang commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615353159



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -144,6 +144,26 @@ def test_image_version(self):
             )
             assert "custom_image and image_version" in str(ctx.value)
 
+    def test_custom_image_family_1(self):

Review comment:
       Let's give it a more meaningful test name
   ```suggestion
       def test_custom_image_family_error_with_image_version(self):
   ```







[GitHub] [airflow] xinbinhuang commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615425167



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -188,6 +252,44 @@ def test_build(self):
         cluster = generator.make()
         assert CONFIG == cluster
 
+    def test_build_with_custom_image_family(self):
+        generator = ClusterGenerator(
+            project_id="project_id",
+            num_workers=2,
+            zone="zone",
+            network_uri="network_uri",
+            subnetwork_uri="subnetwork_uri",
+            internal_ip_only=True,
+            tags=["tags"],
+            storage_bucket="storage_bucket",
+            init_actions_uris=["init_actions_uris"],
+            init_action_timeout="10m",
+            metadata={"metadata": "data"},
+            custom_image_family="custom_image_family",
+            custom_image_project_id="custom_image_project_id",
+            autoscaling_policy="autoscaling_policy",
+            properties={"properties": "data"},
+            optional_components=["optional_components"],
+            num_masters=2,
+            master_machine_type="master_machine_type",
+            master_disk_type="master_disk_type",
+            master_disk_size=128,
+            worker_machine_type="worker_machine_type",
+            worker_disk_type="worker_disk_type",
+            worker_disk_size=256,
+            num_preemptible_workers=4,
+            region="region",
+            service_account="service_account",
+            service_account_scopes=["service_account_scopes"],
+            idle_delete_ttl=60,
+            auto_delete_time=datetime(2019, 9, 12),
+            auto_delete_ttl=250,
+            customer_managed_key="customer_managed_key",
+        )
+        cluster = generator.make()
+        print(cluster)

Review comment:
       ```suggestion
   ```







[GitHub] [airflow] boring-cyborg[bot] commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

boring-cyborg[bot] commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-822027972


   Awesome work, congrats on your first merged pull request!
   





[GitHub] [airflow] ashishpatel0720 commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615288020



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -220,6 +226,9 @@ def __init__(
         if self.custom_image and self.image_version:
             raise ValueError("The custom_image and image_version can't be both set")
 
+        if self.custom_image_family and self.custom_image:

Review comment:
       Yeah, makes sense; we should not be able to set both together.
   I will make the change.







[GitHub] [airflow] xinbinhuang merged pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang merged pull request #15250:
URL: https://github.com/apache/airflow/pull/15250


   





[GitHub] [airflow] github-actions[bot] commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

github-actions[bot] commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-821998592


   The PR is likely OK to be merged with just a subset of tests for the default Python and database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] ashishpatel0720 commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615289016



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -220,6 +226,9 @@ def __init__(
         if self.custom_image and self.image_version:
             raise ValueError("The custom_image and image_version can't be both set")
 
+        if self.custom_image_family and self.custom_image:

Review comment:
       - [x] Sanity check added for `custom_image_family` and `image_version`.
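Taken together, the checks discussed in this thread (`custom_image` with `image_version`, `custom_image_family` with `custom_image`, and `custom_image_family` with `image_version`) mean that at most one image option may be set. As a hedged alternative to three pairwise `if` statements, the rule could be stated once. `validate_image_options` below is a hypothetical helper for illustration, not part of Airflow, and its error message is an assumption.

```python
from typing import Optional


def validate_image_options(
    image_version: Optional[str] = None,
    custom_image: Optional[str] = None,
    custom_image_family: Optional[str] = None,
) -> None:
    """Raise ValueError when more than one image option is set (hypothetical helper)."""
    options = {
        "image_version": image_version,
        "custom_image": custom_image,
        "custom_image_family": custom_image_family,
    }
    # Keep only the options that were actually provided (non-empty values),
    # in the dict's insertion order.
    provided = [name for name, value in options.items() if value]
    if len(provided) > 1:
        raise ValueError(
            "Only one of image_version, custom_image and custom_image_family "
            f"can be set, got: {', '.join(provided)}"
        )
```

The trade-off: a single "at most one" rule stays correct if another image option is added later, while separate pairwise checks give a more specific message per pair, which is what the tests in this thread match on.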







[GitHub] [airflow] xinbinhuang commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-821945698


   @asamasoma Thanks for the contribution! Can you also add an extra test case to verify the generated cluster config is as expected with `custom_image_family` and `single_node`?
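A sketch of the kind of check being requested: per the `_build_cluster_data` excerpt in this thread, the worker `image_uri` is only set when `single_node` is false. This is a standalone approximation under two assumptions: `apply_custom_image_family` is a hypothetical mirror of that branch (not Airflow's code), and the master config receives the same URI (that assignment is not shown in the excerpt).

```python
from typing import Any, Dict, Optional


def apply_custom_image_family(
    cluster_data: Dict[str, Any],
    project_id: str,
    custom_image_family: str,
    single_node: bool,
    custom_image_project_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical mirror of the custom_image_family branch; not Airflow's code."""
    image_project = custom_image_project_id or project_id
    uri = (
        "https://www.googleapis.com/compute/beta/projects/"
        f"{image_project}/global/images/family/{custom_image_family}"
    )
    # Assumption: the master always gets the image.
    cluster_data["master_config"]["image_uri"] = uri
    # Workers only get it when the cluster is not single-node.
    if not single_node:
        cluster_data["worker_config"]["image_uri"] = uri
    return cluster_data


EXPECTED_URI = (
    "https://www.googleapis.com/compute/beta/projects/"
    "custom_image_project_id/global/images/family/custom_image_family"
)


def test_build_single_node_with_custom_image_family():
    cluster_data = {"master_config": {}, "worker_config": {}}
    result = apply_custom_image_family(
        cluster_data,
        project_id="project_id",
        custom_image_family="custom_image_family",
        single_node=True,
        custom_image_project_id="custom_image_project_id",
    )
    assert result["master_config"]["image_uri"] == EXPECTED_URI
    # A single-node cluster has no workers, so no worker image is set.
    assert "image_uri" not in result["worker_config"]


def test_build_multi_node_with_custom_image_family():
    cluster_data = {"master_config": {}, "worker_config": {}}
    result = apply_custom_image_family(
        cluster_data,
        project_id="project_id",
        custom_image_family="custom_image_family",
        single_node=False,
        custom_image_project_id="custom_image_project_id",
    )
    assert result["worker_config"]["image_uri"] == EXPECTED_URI
```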





[GitHub] [airflow] ashishpatel0720 edited a comment on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 edited a comment on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-817265702


   
   - [x] Testing completed on GCP DataProc
   
   ![image](https://user-images.githubusercontent.com/16856802/114296486-3b5d3400-9ac9-11eb-8f84-284797602a09.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296512-65165b00-9ac9-11eb-8206-47a04fbe4035.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296325-5ed3af00-9ac8-11eb-8b21-92a1a4a8c0e1.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296320-5b402800-9ac8-11eb-95ee-4ced7305a6b7.png)
   
   





[GitHub] [airflow] xinbinhuang commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615353197



##########
File path: tests/providers/google/cloud/operators/test_dataproc.py
##########
@@ -144,6 +144,26 @@ def test_image_version(self):
             )
             assert "custom_image and image_version" in str(ctx.value)
 
+    def test_custom_image_family_1(self):
+        with pytest.raises(ValueError) as ctx:
+            ClusterGenerator(
+                image_version="image_version",
+                custom_image_family="custom_image_family",
+                project_id=GCP_PROJECT,
+                cluster_name=CLUSTER_NAME,
+            )
+            assert "image_version and custom_image_family" in str(ctx.value)
+
+    def test_custom_image_family_2(self):

Review comment:
       ```suggestion
       def test_custom_image_family_error_with_custom_image(self):
   ```







[GitHub] [airflow] ashishpatel0720 edited a comment on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 edited a comment on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-821988347









[GitHub] [airflow] ashishpatel0720 commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615288020



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -220,6 +226,9 @@ def __init__(
         if self.custom_image and self.image_version:
             raise ValueError("The custom_image and image_version can't be both set")
 
+        if self.custom_image_family and self.custom_image:

Review comment:
       Yeah, makes sense, @marcosmarxm.
   I will make the change.







[GitHub] [airflow] xinbinhuang commented on a change in pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

xinbinhuang commented on a change in pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#discussion_r615352784



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -346,6 +358,16 @@ def _build_cluster_data(self):
             if not self.single_node:
                 cluster_data['worker_config']['image_uri'] = custom_image_url
 
+        elif self.custom_image_family:
+            project_id = self.custom_image_project_id or self.project_id
+            custom_image_url = (
+                'https://www.googleapis.com/compute/beta/projects/'
+                '{}/global/images/family/{}'.format(project_id, self.custom_image_family)

Review comment:
       Let's use an f-string here.
   ```suggestion
                   f'{project_id}/global/images/family/{self.custom_image_family}'
   ```
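The suggested line builds a Compute Engine image-family URL, with the image project falling back to the cluster's `project_id` when `custom_image_project_id` is not given. A minimal standalone sketch of that string construction follows; the function name `image_family_uri` and the constant name are hypothetical, while the base URL and path shape come from the diff.

```python
from typing import Optional

# Base URL taken from the diff under review.
COMPUTE_BETA_PROJECTS = "https://www.googleapis.com/compute/beta/projects/"


def image_family_uri(
    project_id: str,
    custom_image_family: str,
    custom_image_project_id: Optional[str] = None,
) -> str:
    """Return the image-family URI that would be placed in image_uri (sketch)."""
    # Fall back to the cluster's own project when no image project is given.
    image_project = custom_image_project_id or project_id
    return f"{COMPUTE_BETA_PROJECTS}{image_project}/global/images/family/{custom_image_family}"


print(image_family_uri("my-project", "my-family"))
# https://www.googleapis.com/compute/beta/projects/my-project/global/images/family/my-family
```

Because the URI names a family rather than a specific image, Compute Engine resolves it to the latest non-deprecated image in that family, which is what makes the option useful compared with pinning `custom_image`.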







[GitHub] [airflow] ashishpatel0720 commented on pull request #15250: [Airflow-15245] - passing custom image family name to the DataProcClusterCreateoperator

ashishpatel0720 commented on pull request #15250:
URL: https://github.com/apache/airflow/pull/15250#issuecomment-817265702


   - [x] Testing completed on GCP DataProc
   
   
   ![image](https://user-images.githubusercontent.com/16856802/114296309-3ea3f000-9ac8-11eb-9742-70505aef3e24.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296325-5ed3af00-9ac8-11eb-8b21-92a1a4a8c0e1.png)
   
   ![image](https://user-images.githubusercontent.com/16856802/114296320-5b402800-9ac8-11eb-95ee-4ced7305a6b7.png)
   
   

