Posted to jira@arrow.apache.org by "Nicolas Renkamp (Jira)" <ji...@apache.org> on 2021/01/07 14:26:00 UTC
[jira] [Created] (ARROW-11161) [Python][C++] S3Filesystem: file Content-Type not set correctly?
Nicolas Renkamp created ARROW-11161:
---------------------------------------
Summary: [Python][C++] S3Filesystem: file Content-Type not set correctly?
Key: ARROW-11161
URL: https://issues.apache.org/jira/browse/ARROW-11161
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Nicolas Renkamp
Attachments: Screen Shot 2021-01-07 at 15.23.07.png
I am using the FileSystem abstraction to write HTML and text files to the local filesystem as well as S3.
I noticed that when using s3_fs.open_output_stream in combination with file.write(bytes), the object that gets created has a Content-Type of 'application/xml' even though it is plain text, which is problematic for me.
Here is a minimal example:
{code:java}
import boto3
from pyarrow.fs import FileSystem

BUCKET = "my-bucket"
path = f"s3://{BUCKET}/pyarrow_encoding.txt"

# Write the object through pyarrow's S3 filesystem.
s3_fs, output_path = FileSystem.from_uri(path)
with s3_fs.open_output_stream(path=output_path, compression=None) as f:
    f.write('hello'.encode('UTF-8'))

s3 = boto3.client('s3')
response = s3.get_object(Bucket=BUCKET, Key='pyarrow_encoding.txt')
print(response['ContentType'])  # Output: application/xml
print(response['Body'].read().decode('UTF-8'))  # Output: hello

# Write the same content directly with boto3 for comparison.
s3.put_object(Bucket=BUCKET,
              Key='boto3_encoding.txt',
              Body='hello'.encode('UTF-8'))
response = s3.get_object(Bucket=BUCKET, Key='boto3_encoding.txt')
print(response['ContentType'])  # Output: binary/octet-stream
print(response['Body'].read().decode('UTF-8'))  # Output: hello
{code}
I know that the S3Filesystem implementation in pyarrow might not have MIME type inference implemented, but I am wondering why the resulting Content-Type is always 'application/xml'. Maybe this is hardcoded somewhere?
Originally, I tried this with '.html' files, and there too the objects on S3 always got the 'application/xml' Content-Type.
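To illustrate what I mean by MIME type inference, here is a rough sketch of a possible workaround on the boto3 side. It is not how pyarrow behaves. The helper names guess_content_type and upload_with_content_type are hypothetical, and the fallback value simply mirrors the 'binary/octet-stream' seen for the boto3 upload above. It guesses the type from the key's extension with the standard library and passes it through the ContentType parameter of put_object.

```python
import mimetypes

# Fallback mirrors the Content-Type observed for the plain boto3 upload above.
DEFAULT_CONTENT_TYPE = "binary/octet-stream"


def guess_content_type(key):
    """Guess a MIME type from the object key's extension (hypothetical helper)."""
    content_type, _encoding = mimetypes.guess_type(key)
    return content_type or DEFAULT_CONTENT_TYPE


def upload_with_content_type(s3_client, bucket, key, body):
    """Upload bytes with an explicit Content-Type (hypothetical helper).

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType=guess_content_type(key),
    )
```

With a real client this would be called as upload_with_content_type(boto3.client('s3'), BUCKET, 'pyarrow_encoding.txt', 'hello'.encode('UTF-8')). It only helps for objects written through boto3, not for objects written through open_output_stream.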
!Screen Shot 2021-01-07 at 15.23.07.png!
Any help or pointer is appreciated.
Thank you,
Nicolas