You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/02 13:36:24 UTC

[GitHub] [airflow] piffall opened a new pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

piffall opened a new pull request #14027:
URL: https://github.com/apache/airflow/pull/14027


   Adding Connections parameter to AwsGlueJobHook in order to previse which subnet must be used for Job deployment.
   
   If we have a Connection only accessible from an specific subnet (not the default one) and we do not pass this parameter, Job will be deployed to default subnet, so will not have access to DDBB, resulting in a Timeout error.
   
   This small patch fix this issue based on [Glue Client documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job).
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-787041818


   Awesome work, congrats on your first merged pull request!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-775253193


   > oh you might just need to rebase this was an issue briefly in master i think
   
   Hi @dstandish , It seems to be resolved after rebase. Thanks! ;)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#discussion_r570684518



##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       i think it might make sense to instead add `job_kwargs` because there are [other kwargs that users might want to specify](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job)
   
   a side benefit of doing so is we avoid the confusion of airflow connection vs glue connection
   
   i would also suggest that you might consider making the job kwargs configurable as extra params in the airflow connection object so that something like a subnet could be specified in the airflow connection and not only configurable through python.  this would also mean that the operator would not necessarily need the `job_kwargs` param though including it there also does not hurt




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall commented on a change in pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall commented on a change in pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#discussion_r570792472



##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       Hi @dstandish , I also think that it's a very good idea. I'll implement it like you suggest and push the changes. Thx!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-774944310


   you have a pre-commit hook failing: 
   https://github.com/apache/airflow/pull/14027/checks?check_run_id=1838235536#step:7:123


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall commented on a change in pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall commented on a change in pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#discussion_r570792472



##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       Hi @dstandish , I also think that it's a very good idea. I'll implement it like you suggest and push the changes. Thx!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-775083690


   > you have a pre-commit hook failing:
   > https://github.com/apache/airflow/pull/14027/checks?check_run_id=1838235536#step:7:123
   
   I've passed `pre-commit run --all-files` locally without errors. Is the CI pipeline merging code before run tests?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-772110156






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-772110156


   [The Workflow run](https://github.com/apache/airflow/actions/runs/531793994) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall edited a comment on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall edited a comment on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-775083690


   > you have a pre-commit hook failing:
   > https://github.com/apache/airflow/pull/14027/checks?check_run_id=1838235536#step:7:123
   
   Hi, @dstandish . I've passed `pre-commit run --all-files` locally without errors. Is the CI pipeline merging code before running tests?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-772110347


   [The Workflow run](https://github.com/apache/airflow/actions/runs/531841975) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-786859583


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-774070077


   > [The Workflow run](https://github.com/apache/airflow/actions/runs/540202235) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
   
   Hi again @dstandish, there was a weird error on build. It seems that had failed on Documentation style, but there are not any change on the PR. Could you check it out? How would I proceed?
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-773997785


   [The Workflow run](https://github.com/apache/airflow/actions/runs/540202235) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#discussion_r570684518



##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       i think it might make sense to instead add `job_kwargs` because there are [other kwargs that users might want to specify](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job)
   
   a side benefit of doing so is we avoid the confusion of airflow connection vs glue connection
   
   i would also suggest that you might consider making the job kwargs configurable as extra params in the airflow connection object so that something like a subnet could be specified in the airflow connection and not hardcoded.  this would also mean that the operator would not necessarily need the `job_kwargs` param though including it there also does not hurt

##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       i think it might make sense to instead add `job_kwargs` because there are [other kwargs that users might want to specify](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job)
   
   a side benefit of doing so is we avoid the confusion of airflow connection vs glue connection
   
   i would also suggest that you might consider making the job kwargs configurable as extra params in the airflow connection object so that something like a subnet could be specified in the airflow connection and not only configurable through python.  this would also mean that the operator would not necessarily need the `job_kwargs` param though including it there also does not hurt




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] feluelle merged pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
feluelle merged pull request #14027:
URL: https://github.com/apache/airflow/pull/14027


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] piffall edited a comment on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
piffall edited a comment on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-775083690


   > you have a pre-commit hook failing:
   > https://github.com/apache/airflow/pull/14027/checks?check_run_id=1838235536#step:7:123
   
   Hi, @dstandish . I've passed `pre-commit run --all-files` locally without errors. Is the CI pipeline merging code before run tests?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#discussion_r570684518



##########
File path: airflow/providers/amazon/aws/hooks/glue.py
##########
@@ -60,9 +62,10 @@ def __init__(
         num_of_dpus: int = 10,
         region_name: Optional[str] = None,
         iam_role_name: Optional[str] = None,
+        connections: List[str] = None,

Review comment:
       i think it might make sense to instead add `job_kwargs` because there are [other kwargs that users might want to specify](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job)
   
   a side benefit of doing so is we avoid the confusion of airflow connection vs glue connection
   
   i would also suggest that you might consider making the job kwargs configurable as extra params in the airflow connection object so that something like a subnet could be specified in the airflow connection and not hardcoded.  this would also mean that the operator would not necessarily need the `job_kwargs` param though including it there also does not hurt




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #14027: Adding support to use Glue Connections for Job correct subnet selection

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #14027:
URL: https://github.com/apache/airflow/pull/14027#issuecomment-775109314


   oh you might just need to rebase this was an issue briefly in master i think


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org