Posted to commits@airflow.apache.org by "antonio-antuan (via GitHub)" <gi...@apache.org> on 2023/02/20 15:22:44 UTC
[GitHub] [airflow] antonio-antuan opened a new issue, #29640: NoBoundaryInMultipartDefect raised using S3Hook
antonio-antuan opened a new issue, #29640:
URL: https://github.com/apache/airflow/issues/29640
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==7.2.0
### Apache Airflow version
2.4.3
### Operating System
Arch Linux
### Deployment
Docker-Compose
### Deployment details
the same for MWAA (aws-managed airflow)
### What happened
exception is raised:
```
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - [2023-02-20, 14:32:02 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object-key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - Traceback (most recent call last):
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - assert_header_parsing(httplib_response.msg)
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
[2023-02-20, 14:32:02 UTC] {subprocess.py:92} INFO - urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
```
### What you think should happen instead
shouldn't be such an exception :)
### How to reproduce
the code that downloads data is simple:
```
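import logging
import boto3
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def download_from_s3(key: str, bucket_name: str, local_path: str) -> str:
    # Enable verbose boto3 logging to capture the output shown below.
    boto3.set_stream_logger('boto3.resources', logging.DEBUG)
    hook = S3Hook(aws_conn_id='s3_conn')
    file_name = hook.download_file(key=key, bucket_name=bucket_name, preserve_file_name=True)
    return file_name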
```
### Anything else
anyway, the file is downloaded and looks valid.
some logs:
```
[2023-02-20, 15:18:38 UTC] {connection_wrapper.py:337} INFO - AWS Connection (conn_id='s3_conn', conn_type='aws') credentials retrieved from login and password.
2023-02-20, 15:18:38 UTC boto3.resources.factory [DEBUG] Loading s3:s3
[2023-02-20, 15:18:38 UTC] {factory.py:66} DEBUG - Loading s3:s3
2023-02-20, 15:18:38 UTC boto3.resources.factory [DEBUG] Loading s3:Object
[2023-02-20, 15:18:38 UTC] {factory.py:66} DEBUG - Loading s3:Object
2023-02-20, 15:18:38 UTC boto3.resources.action [DEBUG] Calling s3:head_object with {'Bucket': 'BUCKET', 'Key': 'object_key.json'}
[2023-02-20, 15:18:38 UTC] {action.py:85} DEBUG - Calling s3:head_object with {'Bucket': 'BUCKET', 'Key': 'object_key.json'}
[2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
Traceback (most recent call last):
File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
assert_header_parsing(httplib_response.msg)
File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
2023-02-20, 15:18:40 UTC boto3.resources.action [DEBUG] Response: {'ResponseMetadata': {'RequestId': 'W3J4VRW3WQVV8AV7', 'HostId': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'x-amz-request-id': 'W3J4VRW3WQVV8AV7', 'date': 'Mon, 20 Feb 2023 15:18:40 GMT', 'last-modified': 'Thu, 09 Feb 2023 10:34:28 GMT', 'etag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'multipart/form-data', 'server': 'AmazonS3', 'content-length': '7004'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 2, 9, 10, 34, 28, tzinfo=tzutc()), 'ContentLength': 7004, 'ETag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'ContentType': 'multipart/form-data', 'ServerSideEncryption': 'AES256', 'Metadata': {}}
[2023-02-20, 15:18:40 UTC] {action.py:90} DEBUG - Response: {'ResponseMetadata': {'RequestId': 'W3J4VRW3WQVV8AV7', 'HostId': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'uRLn/mC6mUAPtgAZRcPbdIlkzWNQ8/AKuPn5HuHjJK1CLNAxfES3DXQsnF7HYSia4guuylFLItY=', 'x-amz-request-id': 'W3J4VRW3WQVV8AV7', 'date': 'Mon, 20 Feb 2023 15:18:40 GMT', 'last-modified': 'Thu, 09 Feb 2023 10:34:28 GMT', 'etag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'multipart/form-data', 'server': 'AmazonS3', 'content-length': '7004'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 2, 9, 10, 34, 28, tzinfo=tzutc()), 'ContentLength': 7004, 'ETag': '"e7d2a315e24716624b1085cfa7f31ad8"', 'ContentType': 'multipart/form-data', 'ServerSideEncryption': 'AES256', 'Metadata': {}}
[2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
Traceback (most recent call last):
File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
assert_header_parsing(httplib_response.msg)
File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
[2023-02-20, 15:18:40 UTC] {connectionpool.py:475} WARNING - Failed to parse headers (url=https://BUCKET.s3.us-west-2.amazonaws.com:443/object_key.json): [NoBoundaryInMultipartDefect()], unparsed data: ''
Traceback (most recent call last):
File "/home/***/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 469, in _make_request
assert_header_parsing(httplib_response.msg)
File "/home/***/.local/lib/python3.7/site-packages/urllib3/util/response.py", line 91, in assert_header_parsing
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [NoBoundaryInMultipartDefect()], unparsed data: ''
```
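A note on the likely cause, offered as a reading of the logs rather than a confirmed diagnosis: the `head_object` response above reports `content-type: multipart/form-data` with no `boundary` parameter. Python's `http.client` parses response headers with the standard `email` parser, which records a `NoBoundaryInMultipartDefect` for such a header, and urllib3 then logs it as a warning. The sketch below reproduces that defect with the standard library only. The header strings are hypothetical stand-ins for the S3 response, and `header_defects` is an illustrative helper, not part of any library.

```python
from email.errors import NoBoundaryInMultipartDefect
from email.parser import Parser


def header_defects(raw_headers: str) -> list:
    """Parse a raw header block and return the defects the email parser recorded."""
    message = Parser().parsestr(raw_headers)
    return list(message.defects)


# Hypothetical header blocks modelled on the head_object response in the logs.
multipart_without_boundary = (
    "content-type: multipart/form-data\r\n"
    "content-length: 7004\r\n"
    "\r\n"
)
plain_json = (
    "content-type: application/json\r\n"
    "content-length: 7004\r\n"
    "\r\n"
)

has_boundary_defect = any(
    isinstance(defect, NoBoundaryInMultipartDefect)
    for defect in header_defects(multipart_without_boundary)
)
print("multipart/form-data without boundary:", has_boundary_defect)
print("application/json defects:", header_defects(plain_json))
```

If this reading is right, the warning depends on the content type stored with the object rather than on anything Airflow does, which would match the file downloading correctly.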
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442376775
Looks like that. Anyway it looks like the issue is not related to airflow, so feel free to close it if you want :)
[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437543247
> shouldn't be such an exception :)
`¯\_(ツ)_/¯` https://github.com/boto/botocore/issues/2608
[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1575546070
@ferruzzi looks like they don't care about commenting on a closed issue)
[GitHub] [airflow] antonio-antuan commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1438707930
yes, the same warnings :cry:
[GitHub] [airflow] potiuk commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1575753856
How about opening a NEW issue? I always find it a bit odd to get comments on a closed issue. It might or might not be the same problem, but creating a new issue, with all the details needed to assess it, is almost always better for maintainers (even if it requires a bit more effort from the reporter). Worst case it will be marked as a duplicate, especially if the reporter adds a relevant `#` reference.
[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442380228
I'll poke it, maybe I can make a contribution to botocore to get the warning gone, but I am inclined to agree that it is not an Airflow issue. I'll have a look at it, likely not until next week though. If I make a PR to botocore for it, I'll drop a note here.
If you have a moment, it can't hurt to leave a comment in the issue that Taragolis linked above letting them know it's still around.
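Until something changes upstream, one possible local workaround is to filter this specific record out of the `urllib3.connectionpool` logger, since the download itself succeeds. This is a minimal sketch, not something the thread settled on: `DropNoBoundaryWarning` is a hypothetical name, and matching on the message text is an assumption based on the log lines quoted in the issue.

```python
import logging


class DropNoBoundaryWarning(logging.Filter):
    """Drop urllib3's header-parsing warning for NoBoundaryInMultipartDefect only."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        is_target = (
            message.startswith("Failed to parse headers")
            and "NoBoundaryInMultipartDefect" in message
        )
        return not is_target  # False means the record is discarded


# The traceback shows the warning coming from urllib3/connectionpool.py,
# whose module-level logger is named after the module.
logging.getLogger("urllib3.connectionpool").addFilter(DropNoBoundaryWarning())
```

A filter attached to a logger only applies to records created by that logger, so other urllib3 warnings stay visible. It has to be installed in the process that performs the download.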
[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442374253
If I understand correctly: everything is working fine, but it's printing an unexpected error message in the logs? Is that accurate?
[GitHub] [airflow] ferruzzi commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1452530133
@antonio-antuan I am trying to reopen [the ticket](https://github.com/boto/botocore/issues/2608) that Taragolis found on the botocore side. It looks like it has been raised a couple times but gets dropped. Maybe we can help them sort it out. Would you mind keeping an eye on their ticket in case they ask for more details I don't have?
[GitHub] [airflow] boring-cyborg[bot] commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437187553
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] Taragolis closed issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis closed issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
URL: https://github.com/apache/airflow/issues/29640
[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1442417038
Let's close it. Just a couple of comments.
Even though it is not a problem with Airflow, I think we need to remove the key-existence check from this method in S3Hook. The reason is simple: with default settings, this check only covers non-versioned objects or the latest version of an object.
https://github.com/apache/airflow/blob/e6d317608251d2725627ac2da0e60d5c5b206c1e/airflow/providers/amazon/aws/hooks/s3.py#L978-L986
---
I think, but I'm not sure, `S3.Object.download_fileobj` comes from [s3transfer](https://github.com/boto/s3transfer) rather than botocore
[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29640:
URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437551366
Just for confirmation: do you have the same problem, with the same versions of `boto3` and `botocore`, if you call [S3.Client.download_file](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file) or [S3.Client.download_fileobj](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_fileobj) on this file?
Could you try running this instead of your code?
```python
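from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def download_from_s3_native(key: str, bucket_name: str, local_path: str) -> str:
    hook = S3Hook(aws_conn_id='s3_conn')
    s3_client = hook.conn  # underlying boto3 S3 client
    with open(local_path, "wb") as data:
        # boto3's signature is download_fileobj(Bucket, Key, Fileobj)
        s3_client.download_fileobj(bucket_name, key, data)
    return local_path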
```