Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/14 08:08:04 UTC

[GitHub] [airflow] mmenarguezpear opened a new issue #16418: aws Glue operator fails to upload local script to s3 due to wrong argument order

mmenarguezpear opened a new issue #16418:
URL: https://github.com/apache/airflow/issues/16418


   **Apache Airflow version**: 2.1.0
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): NA
   
   **Environment**: bare metal k8s in AWS EC2
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release):
     ```
     cat /etc/os-release
     PRETTY_NAME="Debian GNU/Linux 10 (buster)"
     NAME="Debian GNU/Linux"
     VERSION_ID="10"
     VERSION="10 (buster)"
     VERSION_CODENAME=buster
     ID=debian
     HOME_URL="https://www.debian.org/"
     SUPPORT_URL="https://www.debian.org/support"
     BUG_REPORT_URL="https://bugs.debian.org/"
     ```
   - **Kernel** (e.g. `uname -a`): Linux airflow-web-749866f579-ns9rk 5.4.0-1048-aws #50-Ubuntu SMP Mon May 3 21:44:17 UTC 2021 x86_64 GNU/Linux
   - **Install tools**: pip, docker
   - **Others**:
   
   **What happened**:
   Upon providing valid arguments (with a local `script_location`), the following error appeared:
   ```
   
   [2021-06-12 16:31:46,277] {base_aws.py:395} INFO - Creating session using boto3 credential strategy region_name=None
   [2021-06-12 16:31:47,339] {taskinstance.py:1481} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1137, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 106, in execute
       s3_hook.load_file(self.script_location, self.s3_bucket, self.s3_artifacts_prefix + script_name)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 499, in load_file
       if not replace and self.check_for_key(key, bucket_name):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 323, in check_for_key
       self.get_conn().head_object(Bucket=bucket_name, Key=key)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
       request_dict = self._convert_to_request_dict(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
       api_params = self._emit_api_params(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
       self.meta.events.emit(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
       return self._emitter.emit(aliased_event_name, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
       return self._emit(event_name, kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
       response = handler(**kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/handlers.py", line 236, in validate_bucket_name
       raise ParamValidationError(report=error_msg)
   botocore.exceptions.ParamValidationError: Parameter validation failed:
   Invalid bucket name "artifacts/glue-scripts/example.py": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
   [2021-06-12 16:31:47,341] {taskinstance.py:1524} INFO - Marking task as UP_FOR_RETRY. dag_id=glue-example, task_id=example_glue_job_operator, execution_date=20210612T163143, start_date=20210612T163145, end_date=20210612T163147
   [2021-06-12 16:31:47,386] {local_task_job.py:151} INFO - Task exited with return code 1
   ```
   Looking at the order of the arguments, the 2nd and 3rd appear to be reversed. Furthermore, the operator does not expose the `replace` option, which would be very valuable.
   Note that the key and bucket name are passed by position rather than by keyword (https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L104), and their order is reversed with respect to the `load_file` signature (https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/s3.py#L466).
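   A minimal sketch of a possible fix in `AwsGlueJobOperator.execute`: pass the arguments to `S3Hook.load_file(filename, key, bucket_name=None, replace=False, ...)` by keyword so the order cannot drift. The `replace=True` part is an assumption about how the operator could expose overwriting, not current behaviour:
   ```
   # Sketch only: keyword arguments prevent bucket and key from being swapped.
   s3_hook.load_file(
       filename=self.script_location,
       key=self.s3_artifacts_prefix + script_name,
       bucket_name=self.s3_bucket,
       replace=True,  # assumption: forwarded from a new operator option
   )
   ```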
   
   **What you expected to happen**: The script upload to succeed, and to be able to replace an existing script in S3.
   
   **How to reproduce it**:
   Try to upload a local script to any S3 bucket:
   ```
   from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator

   t2 = AwsGlueJobOperator(
       task_id="example_glue_job_operator",
       job_desc="Example Airflow Glue job",
       # Note: the operator will upload the script if it is not an s3:// reference
       # See https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L101
       script_location="/opt/airflow/dags_lib/example.py",
       concurrent_run_limit=1,
       script_args={},
       num_of_dpus=1,  # Deprecated by boto3; use MaxCapacity via kwargs instead
       aws_conn_id="aws_default",
       region_name="aws-region",
       s3_bucket="bucket-name",
       iam_role_name="iam_role_name_here",
       create_job_kwargs={},
   )
   ```
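   As a workaround until this is fixed, the script can be uploaded manually and referenced via an `s3://` path so the operator skips its own upload step. A sketch, with bucket and key names as placeholders:
   ```
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook

   # Upload the script ourselves, passing bucket and key explicitly by keyword.
   S3Hook(aws_conn_id="aws_default").load_file(
       filename="/opt/airflow/dags_lib/example.py",
       key="artifacts/glue-scripts/example.py",
       bucket_name="bucket-name",
       replace=True,
   )
   # Then point the operator at the uploaded object:
   # script_location="s3://bucket-name/artifacts/glue-scripts/example.py"
   ```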
   
   **Anything else we need to know**:
   
   **How often does this problem occur?** Every time when using a local script.
   
    I can take a stab at fixing it. I also noticed that the operator does not allow updating a Glue job definition after its creation; boto3 offers an API for that, but it is not exposed by this operator: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.update_job It would be great if I could add that as well, but it might fall out of scope.
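   For reference, a minimal sketch of the boto3 call that such an update feature could wrap (job name, role, region, and script location are placeholders):
   ```
   import boto3

   glue = boto3.client("glue", region_name="us-east-1")
   # update_job replaces the job definition with the supplied JobUpdate fields.
   glue.update_job(
       JobName="example_glue_job",
       JobUpdate={
           "Role": "iam_role_name_here",
           "Command": {
               "Name": "glueetl",
               "ScriptLocation": "s3://bucket-name/artifacts/glue-scripts/example.py",
           },
       },
   )
   ```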


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #16418: aws Glue operator fails to upload local script to s3 due to wrong argument order

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #16418:
URL: https://github.com/apache/airflow/issues/16418


   

