Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 17:01:16 UTC

[GitHub] [arrow] AlekseyYuvzhikVB commented on pull request #916: ARROW-1213: [Python] Support s3fs filesystem for Amazon S3 in ParquetDataset

AlekseyYuvzhikVB commented on pull request #916:
URL: https://github.com/apache/arrow/pull/916#issuecomment-718074080


   > Great to see arrow and s3fs working together, thanks for looking into it.
   > Note that you can also give your credentials via files (typically in ~/.aws) or environment variables, if you don't want them to be stored within your code. Also, if you are on AWS hardware, then credentials should generally be available via the IAM service - see the s3fs docs.
   
   I'm using pyarrow with several AWS profiles defined in ~/.aws/credentials. My code works fine with the default profile, but when I use a non-default profile to access the S3 bucket, it fails with:
   `    data_set = pq.ParquetDataset(paths, filesystem=fs)
     File "/Library/Python/3.7/site-packages/pyarrow/parquet.py", line 1170, in __init__
       open_file_func=partial(_open_dataset_file, self._metadata)
     File "/Library/Python/3.7/site-packages/pyarrow/parquet.py", line 1365, in _make_manifest
       .format(path))
   OSError: Passed non-file path: s3://<valid path to parquet file>`
   Details are here: https://stackoverflow.com/questions/64565926/getting-oserror-passed-non-file-path-using-pyarrow-parquetdataset
   Do you know how to fix this issue?
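
   As a first diagnostic for the profile setup described above, it may help to confirm that the non-default profile is actually defined and carries both keys. The sketch below is only an illustration using the standard library: the helper name `list_aws_profiles` is hypothetical, it assumes the usual INI layout of ~/.aws/credentials, and it does not touch pyarrow or s3fs, so it cannot by itself explain or fix the `OSError`.

```python
import configparser


def list_aws_profiles(credentials_path):
    """Map each profile name in an AWS-style credentials file to True
    if it defines both an access key id and a secret access key."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, giving an empty result.
    parser.read(credentials_path)
    required = ("aws_access_key_id", "aws_secret_access_key")
    return {
        section: all(parser.has_option(section, key) for key in required)
        for section in parser.sections()
    }
```

   Calling it with the expanded path, for example `list_aws_profiles(os.path.expanduser("~/.aws/credentials"))`, returns a dict keyed by profile name. A profile that is missing or mapped to `False` would point to a credentials problem rather than a path problem.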


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org