Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/26 20:29:58 UTC

[GitHub] [airflow] kumbarsg opened a new issue, #25996: Gluejob creation using gluejoboperator

kumbarsg opened a new issue, #25996:
URL: https://github.com/apache/airflow/issues/25996

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   I tried all versions of the Amazon provider package (3.2.0 and later).
   
   ### Apache Airflow version
   
   2.2.2
   
   ### Operating System
   
   on AWS
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   I am using the Airflow Amazon provider package on my MWAA environment with Airflow version 2.2.2 and Amazon provider version 3.2.0 (I also tried later versions).
   I am trying to create a Glue job with the Glue job operator using this configuration: `'NumberOfWorkers': 2, 'WorkerType': 'G.1X'`. Here's my code for job creation (it uses `AwsGlueJobOperator`, because `GlueJobOperator` cannot be imported on my MWAA environment):
   ```
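   from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator
   
   # Defined elsewhere in my DAG file (placeholder values here)
   GLUE_EXAMPLE_S3_BUCKET = "bucket"
   GLUE_CRAWLER_ROLE = "arn:aws:iam::<account-id>:role/<role-name>"
   
   job_name = "glue_job"
   submit_glue_job = AwsGlueJobOperator(
       task_id="glue_job",
       job_name=job_name,
       wait_for_completion=True,
       # num_of_dpus=10,  # deliberately not set
       retry_limit=0,
       script_location="s3://bucket/etl.py",
       s3_bucket=GLUE_EXAMPLE_S3_BUCKET,
       iam_role_name=GLUE_CRAWLER_ROLE.split("/")[-1],
       create_job_kwargs={
           "GlueVersion": "3.0",
           "NumberOfWorkers": 2,
           "WorkerType": "G.1X",
           "DefaultArguments": {"--enable-glue-datacatalog": ""},
       },
   )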
   ```
   and here's the error:
   
   ```
   when calling the CreateJob operation: 
   Please do not set Allocated Capacity if using Worker Type and Number of Workers
   ```
   I checked the official documentation to see whether Allocated Capacity is assigned a default value, and it is not.
   
   - Also, the operator does not update the parameters of an existing job.
   
   ### What you think should happen instead
   
   It should create a Glue job with the provided parameters. Instead, an error says I cannot set the number of workers and worker type together with Allocated Capacity, which I am not setting in my code.
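For context, the message comes from the AWS Glue `CreateJob` API, which rejects a request that sets `AllocatedCapacity` together with `WorkerType` and `NumberOfWorkers`. The sketch below is a hypothetical helper (`build_create_job_kwargs` is not part of the Amazon provider) showing how `CreateJob` arguments can be assembled so that capacity is specified only one way; it illustrates the expected behavior under that assumption and is not the provider's actual code.

```python
from typing import Any, Dict, Optional


def build_create_job_kwargs(
    job_name: str,
    role_arn: str,
    script_location: str,
    num_of_dpus: Optional[int] = None,
    create_job_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for a Glue CreateJob call.

    AllocatedCapacity is added only when a DPU count is given explicitly, and
    combining it with WorkerType/NumberOfWorkers raises instead of reaching AWS.
    """
    extra = dict(create_job_kwargs or {})
    uses_workers = "WorkerType" in extra or "NumberOfWorkers" in extra

    kwargs: Dict[str, Any] = {
        "Name": job_name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        **extra,
    }
    if num_of_dpus is not None:
        if uses_workers:
            raise ValueError(
                "num_of_dpus cannot be combined with WorkerType/NumberOfWorkers"
            )
        kwargs["AllocatedCapacity"] = num_of_dpus
    return kwargs


# The configuration from this issue: no AllocatedCapacity ends up in the request.
job_kwargs = build_create_job_kwargs(
    job_name="glue_job",
    role_arn="arn:aws:iam::123456789012:role/glue_role",  # placeholder ARN
    script_location="s3://bucket/etl.py",
    create_job_kwargs={"GlueVersion": "3.0", "NumberOfWorkers": 2, "WorkerType": "G.1X"},
)
print(sorted(job_kwargs))
```

With boto3, the resulting dictionary would be sent as `boto3.client("glue").create_job(**job_kwargs)`.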
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kumbarsg commented on issue #25996: Gluejob creation using gluejoboperator

Posted by GitBox <gi...@apache.org>.
kumbarsg commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1239534740

   Hi,
   In addition to this issue, Glue jobs and crawlers are not updated when I rerun them with different configurations.
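One way to make reruns pick up a changed configuration is to update the job when it already exists instead of only creating it. A minimal sketch, assuming a boto3 Glue client (`get_job`, `create_job` and `update_job` are boto3 Glue client methods); `create_or_update_job` is a hypothetical helper, not something the Amazon provider exposes under that name. AWS documents `UpdateJob` as completely overwriting the previous job definition, so `job_config` is assumed to hold the full definition (at least `Role` and `Command`) and no create-only fields such as `Tags`.

```python
from typing import Any, Dict


def create_or_update_job(glue_client: Any, name: str, job_config: Dict[str, Any]) -> str:
    """Hypothetical helper: create the Glue job if missing, otherwise overwrite it.

    ``glue_client`` is assumed to be ``boto3.client("glue")``; ``job_config``
    holds the full job definition (Role, Command, WorkerType, ...).
    """
    try:
        glue_client.get_job(JobName=name)
    except glue_client.exceptions.EntityNotFoundException:
        # First run: the job does not exist yet, so create it.
        glue_client.create_job(Name=name, **job_config)
        return "created"
    # Rerun: push the new configuration instead of reusing the old job as-is.
    glue_client.update_job(JobName=name, JobUpdate=job_config)
    return "updated"
```

Because `UpdateJob` replaces the whole definition, passing a partial `job_config` would reset any omitted settings rather than leave them unchanged.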
   




[GitHub] [airflow] phanikumv commented on issue #25996: Gluejob creation using gluejoboperator

phanikumv commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1240712467

   @kumbarsg I tried the DAG below and the task ran fine.
   
   ```
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
   
   with DAG(
       dag_id="example_aws_glue_dag",
       schedule_interval=None,
       start_date=datetime(2022, 1, 1),
       catchup=False,
   ) as dag:
       glue_task = GlueJobOperator(
           task_id="glue_task",
           job_name="glue_job_from_airflow",
           job_desc="AWS Glue Job with Airflow",
           # Account details in the bucket names are masked
           script_location="s3://aws-glue-assets-*********-us-east-2/scripts/providers-job-2.py",
           create_job_kwargs={"GlueVersion": "2.0", "NumberOfWorkers": 2, "WorkerType": "G.1X"},
           s3_bucket="aws-glue-assets-********-us-east-2",
           iam_role_name="glue_role",
       )
   ```
   ```
   [2022-09-08, 13:04:29 UTC] {glue.py:142} INFO - Initializing AWS Glue Job: glue_job_from_airflow. Wait for completion: True
   [2022-09-08, 13:04:29 UTC] {base.py:69} INFO - Using connection ID 'aws_default' for task execution.
   [2022-09-08, 13:04:29 UTC] {connection_wrapper.py:292} INFO - AWS Connection (conn_id='aws_default', conn_type='aws') credentials retrieved from login and password.
   [2022-09-08, 13:04:30 UTC] {glue.py:246} INFO - Job doesn't exist. Now creating and running AWS Glue Job
   [2022-09-08, 13:04:31 UTC] {glue.py:107} INFO - Iam Role Name: glue_role
   [2022-09-08, 13:04:32 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:04:38 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:04:44 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:04:51 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:04:57 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:05:03 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:05:10 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:05:16 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:05:22 UTC] {glue.py:222} INFO - Polling for AWS Glue Job glue_job_from_airflow current run state with status RUNNING
   [2022-09-08, 13:05:29 UTC] {glue.py:211} INFO - Exiting Job jr_a3274a900afd04e35d2e8112af725fd403da0579a090de1bccec457c4995631d Run State: SUCCEEDED
   [2022-09-08, 13:05:29 UTC] {glue.py:151} INFO - AWS Glue Job: glue_job_from_airflow status: SUCCEEDED. Run Id: jr_a3274a900afd04e35d2e8112af725fd403da0579a090de1bccec457c4995631d
   [2022-09-08, 13:05:29 UTC] {taskinstance.py:1411} INFO - Marking task as SUCCESS. dag_id=example_aws_glue_dag, task_id=glue_task, execution_date=20220908T130426, start_date=20220908T130428, end_date=20220908T130529
   [2022-09-08, 13:05:29 UTC] {local_task_job.py:163} INFO - Task exited with return code 0
   [2022-09-08, 13:05:29 UTC] {local_task_job.py:272} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   




[GitHub] [airflow] kumbarsg commented on issue #25996: Gluejob creation using gluejoboperator

kumbarsg commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1241037580

   The parameters are:
           task_id="glue_job",
           job_name=job_name,
           wait_for_completion=True,
           # num_of_dpus=10,
           retry_limit=0,
           script_location="s3://bucket/etl.py",
           s3_bucket=GLUE_EXAMPLE_S3_BUCKET,
           iam_role_name=GLUE_CRAWLER_ROLE.split("/")[-1],
           create_job_kwargs={
               'GlueVersion': '3.0', 'NumberOfWorkers': 2, 'WorkerType': 'G.1X',
               "DefaultArguments": {"--enable-glue-datacatalog": ''}
           }
   
   Also, I am not able to use `GlueJobOperator` on my MWAA environment, so I used `AwsGlueJobOperator` instead.




[GitHub] [airflow] vincbeck commented on issue #25996: Gluejob creation using gluejoboperator

vincbeck commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1246834320

   Thanks for the ping, @Taragolis, and you are correct. This is currently a bug in MWAA that prevents users from upgrading the Amazon provider package. The MWAA team is working on a fix. Sorry for the inconvenience.




[GitHub] [airflow] phanikumv commented on issue #25996: Gluejob creation using gluejoboperator

phanikumv commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1235638111

   I can pick this up next. @eladkal @kaxil, could you please assign it to me?




[GitHub] [airflow] Taragolis commented on issue #25996: Gluejob creation using gluejoboperator

Taragolis commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1246365761

   @kumbarsg I saw somewhere in the Airflow Slack that MWAA does not currently support upgrading `apache-airflow-providers-amazon` and uses a fixed version, which I believe is [2.4.0](https://github.com/apache/airflow/blob/providers-amazon/2.4.0/airflow/providers/amazon/aws/operators/glue.py).
   
   @vincbeck @ferruzzi @o-nikolas could you confirm or deny that? (Sorry, I don't actually know whether you are directly affiliated with MWAA.)
   




[GitHub] [airflow] kumbarsg commented on issue #25996: Gluejob creation using gluejoboperator

kumbarsg commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1262378025

   @vincbeck @Taragolis I see an issue: I cannot import `GlueJobOperator`, but `AwsGlueJobOperator` works fine. This is with `apache-airflow-providers-amazon>=3.2.0` on my MWAA environment.
   Is there a bug that prevents importing operators from the providers package while still allowing custom operators?




[GitHub] [airflow] eladkal closed issue #25996: Gluejob creation using gluejoboperator

eladkal closed issue #25996: Gluejob creation using gluejoboperator
URL: https://github.com/apache/airflow/issues/25996




[GitHub] [airflow] eladkal commented on issue #25996: Gluejob creation using gluejoboperator

eladkal commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1263974129

   From the description, this sounds like an MWAA bug, not an Airflow bug, so I'm closing this issue.
   If I'm wrong, please clarify.




[GitHub] [airflow] vincbeck commented on issue #25996: Gluejob creation using gluejoboperator

vincbeck commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1262474323

   Nice timing! Just to let everyone know, this issue was fixed yesterday. The fix might not be deployed yet, depending on the region you are using; in the worst case, it should be deployed by the end of next week.
   
   @kumbarsg the bug **was** that you could not upgrade the Amazon provider package in MWAA. Regardless of the version you specified in `requirements.txt`, the Amazon provider package used was 2.4.0.
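Because the version pinned in `requirements.txt` was not necessarily the one loaded, it can help to check what the environment actually provides. A minimal sketch, assuming it runs inside a task (or at DAG-parse time) on the MWAA worker; `installed_version` and `has_glue_job_operator` are hypothetical helper names. The `importlib_metadata` fallback is for Python 3.7, which the MWAA image in this thread uses and which lacks `importlib.metadata`; it assumes that backport is installed there.

```python
from typing import Optional

try:
    # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_glue_job_operator() -> bool:
    """Return True if the loaded Amazon provider exposes GlueJobOperator."""
    try:
        from airflow.providers.amazon.aws.operators.glue import GlueJobOperator  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("apache-airflow-providers-amazon:", installed_version("apache-airflow-providers-amazon"))
    print("GlueJobOperator importable:", has_glue_job_operator())
```

Checking for the class as well as the version distinguishes "the pin was ignored" from "the pinned release simply does not have that name".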


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kumbarsg commented on issue #25996: Gluejob creation using gluejoboperator

kumbarsg commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1241281680

   This is the error I get when I try to use `GlueJobOperator` on my MWAA instance:
   
   Broken DAG: [/usr/local/airflow/dags/Glue_sample.py] Traceback (most recent call last):
     File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
     File "/usr/local/airflow/dags/Glue_sample.py", line 8, in <module>
       from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
   ImportError: cannot import name 'GlueJobOperator' from 'airflow.providers.amazon.aws.operators.glue' (/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py)
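Until the MWAA fix reaches a given region, one workaround that keeps the DAG file parseable is to import whichever operator name the loaded provider exposes. A minimal sketch using only the standard library; `import_first` and `GLUE_OPERATOR_CANDIDATES` are hypothetical names. This only avoids the broken DAG: if provider 2.4.0 is still the one loaded, the `CreateJob` error reported in this issue may still occur.

```python
import importlib
from typing import Any, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, name) pairs.

    Raises ImportError listing every candidate if none is available.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    tried = ", ".join(f"{m}.{a}" for m, a in candidates)
    raise ImportError(f"none of these could be imported: {tried}")


GLUE_OPERATOR_CANDIDATES = [
    # Name used by newer provider releases
    ("airflow.providers.amazon.aws.operators.glue", "GlueJobOperator"),
    # Older name, which works with the provider 2.4.0 loaded on MWAA
    ("airflow.providers.amazon.aws.operators.glue", "AwsGlueJobOperator"),
]

# In a DAG file (requires Airflow and the Amazon provider to be installed):
# GlueOperator = import_first(GLUE_OPERATOR_CANDIDATES)
```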




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25996: Gluejob creation using gluejoboperator

boring-cyborg[bot] commented on issue #25996:
URL: https://github.com/apache/airflow/issues/25996#issuecomment-1228920353

   Thanks for opening your first issue here! Be sure to follow the issue template!
   

