Posted to commits@airflow.apache.org by "antonio-antuan (via GitHub)" <gi...@apache.org> on 2023/02/20 15:22:44 UTC

[GitHub] [airflow] antonio-antuan opened a new issue, #29640: NoBoundaryInMultipartDefect raised using S3Hook

antonio-antuan opened a new issue, #29640:
URL: https://github.com/apache/airflow/issues/29640

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==7.2.0
   
   ### Apache Airflow version
   
   2.4.3
   
   ### Operating System
   
   Arch Linux
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   the same for MWAA (aws-managed airflow)
   
   ### What happened
   
   exception is raised:
   ```
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - [2023-02-20, 14:32:02 UTC] {connectionpool.py:475} WARNING - Failed to parse headers url=[https://BUCKET.s3.us-west-2.amazonaws.com:443/object-key.json:[NoBoundaryInMultipartDefect()], unparsed data: ''
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - Traceback (most recent call last):
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO -   File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO -     assert_header_parsing(httplib_response.msg)
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO -   File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO -     raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
   [2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
   ```
   
   ### What you think should happen instead
   
   shouldn't be such an exception :)
   
   ### How to reproduce
   
   the code that downloads data is simple:
   ```
   
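   import logging

   import boto3
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook


   def download_from_s3(key: str, bucket_name: str, local_path: str) -> str:
       boto3.set_stream_logger('boto3.resources', logging.DEBUG)
       hook = S3Hook(aws_conn_id='s3_conn')
       # local_path is not used; download_file writes to a temporary directory by default
       file_name = hook.download_file(key=key, bucket_name=bucket_name, preserve_file_name=True)
       return file_name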
   
   ```
   
   ### Anything else
   
   Anyway, the file is downloaded and looks valid.
   
   some logs:
   ```
   [2023-02-20, 15:18:38 UTC] {connection_wrapper.py:337} INFO - AWS Connection (conn_id='s3_conn', conn_type='aws') credentials retrieved from login and password.
   2023-02-20, 15:18:38 UTC boto3.resources.factory [DEBUG] Loading s3:s3
   [2023-02-20, 15:18:38 UTC] {factory.py:66} DEBUG - Loading s3:s3
   2023-02-20, 15:18:38 UTC boto3.resources.factory [DEBUG] Loading s3:Object
   [2023-02-20, 15:18:38 UTC] {factory.py:66} DEBUG - Loading s3:Object
   2023-02-20, 15:18:38 UTC boto3.resources.action [DEBUG] Calling s3:head_object with {'Bucket': 'BUCKET', 'Key': 'object_key.json'}
   [2023-02-20, 15:18:38 UTC] {action.py:85} DEBUG - Calling s3:head_object with {'Bucket': 'BUCKET', 'Key': 'object_key.json'}
   [2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
   Traceback (most recent call last):
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
       assert_header_parsing(httplib_response.msg)
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
       raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
   urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
   2023-02-20, 15:18:40 UTC boto3.resources.action [DEBUG] Response: {'ResponseMetadata': {'RequestId': 'W3J4VRW3WQVV8AV7', 'HostId': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'x-amz-request-id': 'W3J4VRW3WQVV8AV7', 'date': 'Mon, 20 Feb 2023 15:18:40 GMT', 'last-modified': 'Thu, 09 Feb 2023 10:34:28 GMT', 'etag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'multipart/form-data', 'server': 'AmazonS3', 'content-length': '7004'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 2, 9, 10, 34, 28, tzinfo=tzutc()), 'ContentLength': 7004, 'ETag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'ContentType': 'multipart/form-data', 'ServerSideEncryption': 'AES256', 'Metadata': {}}
   [2023-02-20, 15:18:40 UTC] {action.py:90} DEBUG - Response: {'ResponseMetadata': {'RequestId': 'W3J4VRW3WQVV8AV7', 'HostId': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'x-amz-request-id': 'W3J4VRW3WQVV8AV7', 'date': 'Mon, 20 Feb 2023 15:18:40 GMT', 'last-modified': 'Thu, 09 Feb 2023 10:34:28 GMT', 'etag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'multipart/form-data', 'server': 'AmazonS3', 'content-length': '7004'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 2, 9, 10, 34, 28, tzinfo=tzutc()), 'ContentLength': 7004, 'ETag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'ContentType': 'multipart/form-data', 'ServerSideEncryption': 'AES256', 'Metadata': {}}
   [2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
   Traceback (most recent call last):
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
       assert_header_parsing(httplib_response.msg)
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
       raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
   urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
   [2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
   Traceback (most recent call last):
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
       assert_header_parsing(httplib_response.msg)
     File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
       raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
   urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
   ```
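
The `head_object` response in the logs reports `'content-type': 'multipart/form-data'` with no `boundary` parameter, which seems to be what triggers the warning. Here is a minimal sketch that reproduces the defect with only the standard library, on the assumption that the header block below is representative of what S3 returned (it is hand-written, not captured from the wire):

```python
from email.errors import NoBoundaryInMultipartDefect
from email.parser import Parser

# Hand-written header block resembling the S3 response in the log above:
# a multipart Content-Type with no boundary parameter.
raw_headers = (
    "content-type: multipart/form-data\r\n"
    "content-length: 7004\r\n"
    "\r\n"
)

# http.client parses response headers with the stdlib email parser, and
# urllib3 then inspects the `defects` list of the parsed message.
message = Parser().parsestr(raw_headers)
defects = message.defects

has_boundary_defect = any(
    isinstance(defect, NoBoundaryInMultipartDefect) for defect in defects
)
print(has_boundary_defect)
```

If this is what happens, the defect comes from the Content-Type stored on the object rather than from anything Airflow does, which would also explain why the download itself succeeds.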
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442376775

   Looks like that. Anyway it looks like the issue is not related to airflow, so feel free to close it if you want :)




[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437543247

   > shouldn't be such an exception :)
   
   `¯\_(ツ)_/¯` https://github.com/boto/botocore/issues/2608




[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1575546070

   @ferruzzi looks like they don't care about commenting on a closed issue)




[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1438707930

   yes, the same warnings :cry: 




[GitHub] [airflow] potiuk commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1575753856

   How about opening a NEW issue? I always find it a bit odd to get comments on a closed issue. It might or might not be the same problem, but creating a new issue with all the details needed to assess it is almost always better for maintainers (even if it requires a bit more effort from the reporter). Worst case, it will be marked as a duplicate, especially if the reporter makes a relevant `#` comment.




[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442380228

   I'll poke it, maybe I can make a contribution to botocore to get the warning gone, but I am inclined to agree that it is not an Airflow issue.  I'll have a look at it, likely not until next week though.  If I make a PR to botocore for it, I'll drop a note here.
   
   If you have a moment, it can't hurt to leave a comment in the issue that Taragolis linked above letting them know it's still around.
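
Until something changes upstream, one possible stopgap is to filter the message out of the logs. This is a rough sketch and not an official recommendation: the class name `DropHeaderParsingWarnings` is made up for illustration, and the logger name `urllib3.connectionpool` is an assumption inferred from the module path in the traceback.

```python
import logging


class DropHeaderParsingWarnings(logging.Filter):
    """Drop urllib3's 'Failed to parse headers' records; keep everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return "Failed to parse headers" not in record.getMessage()


# Assumed logger name, taken from urllib3/connectionpool.py in the traceback.
urllib3_logger = logging.getLogger("urllib3.connectionpool")
urllib3_logger.addFilter(DropHeaderParsingWarnings())
```

Note that this hides every header-parsing warning from that logger, including ones caused by genuinely malformed responses, so it trades some visibility for quieter logs.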




[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442374253

   If I understand correctly: everything is working fine, but it's printing an unexpected error message in the logs?  Is that accurate?




[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1452530133

   @antonio-antuan  I am trying to reopen [the ticket](https://github.com/boto/botocore/issues/2608) that Taragolis found on the botocore side.  It looks like it has been raised a couple times but gets dropped.  Maybe we can help them sort it out.  Would you mind keeping an eye on their ticket in case they ask for more details I don't have?




[GitHub] [airflow] boring-cyborg[bot] commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437187553

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] Taragolis closed issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis closed issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
URL: https://github.com/apache/airflow/issues/29640




[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442417038

   Let's close it. Just a couple of comments.
   
   Even if it is not a problem with Airflow, I think we need to remove the key-existence check from this method in S3Hook. The reason is simple: with default settings, this check only covers non-versioned objects or the latest version of an object.
   
   https://github.com/apache/airflow/blob/e6d317608251d2725627ac2da0e60d5c5b206c1e/airflow/providers/amazon/aws/hooks/s3.py#L978-L986
   
   ---
   
   I think, but I'm not sure, `S3.Object.download_fileobj` comes from [s3transfer](https://github.com/boto/s3transfer) rather than botocore
   




[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437551366

   Just for confirmation: do you have the same problem with the same versions of `boto3` and `botocore` if you call [S3.Client.download_file](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file) or [S3.Client.download_fileobj](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_fileobj) on this file?
   
   Could you try running this instead of your code?
   
   ```python
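   from airflow.providers.amazon.aws.hooks.s3 import S3Hook


   def download_from_s3_native(key: str, bucket_name: str, local_path: str) -> str:
       hook = S3Hook(aws_conn_id='s3_conn')
       # hook.conn is the underlying boto3 S3 client
       s3_client = hook.conn
       with open(local_path, "wb") as data:
           # boto3's signature is download_fileobj(Bucket, Key, Fileobj)
           s3_client.download_fileobj(Bucket=bucket_name, Key=key, Fileobj=data)
   
       return local_path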
   ```

