Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/18 15:09:00 UTC

[jira] [Work logged] (BEAM-9094) Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs

     [ https://issues.apache.org/jira/browse/BEAM-9094?focusedWorklogId=537441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537441 ]

ASF GitHub Bot logged work on BEAM-9094:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jan/21 15:08
            Start Date: 18/Jan/21 15:08
    Worklog Time Spent: 10m 
      Work Description: ConverJens commented on pull request #13180:
URL: https://github.com/apache/beam/pull/13180#issuecomment-762308020


   @dandy10 @pabloem 
   Great work with this PR!
   I'm trying to get S3 (MinIO) working for TFX. It works for everything except the Beam components, where I get this strange error:
   
   '''
   Traceback (most recent call last):
     File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 742, in apache_beam.runners.common.PerWindowInvoker.invoke_process
     File "apache_beam/runners/common.py", line 867, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/iobase.py", line 1129, in process
       self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/options/value_provider.py", line 135, in _f
       return fnc(self, *args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 196, in open_writer
       return FileBasedSinkWriter(self, writer_path)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 417, in __init__
       self.temp_handle = self.sink.open(temp_shard_path)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/options/value_provider.py", line 135, in _f
       return fnc(self, *args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 138, in open
       return FileSystems.create(temp_path, self.mime_type, self.compression_type)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filesystems.py", line 229, in create
       return filesystem.create(path, mime_type, compression_type)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3filesystem.py", line 171, in create
       return self._path_open(path, 'wb', mime_type, compression_type)
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3filesystem.py", line 151, in _path_open
       raw_file = s3io.S3IO(options=self._options).open(
     File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3io.py", line 63, in __init__
       raise ValueError('Must provide one of client or options')
   ValueError: Must provide one of client or options
   '''
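Reading the traceback above, the ValueError is raised in the S3 I/O constructor even though the caller passes `options=self._options`. That suggests the options held by the filesystem object were empty at that point. The following is only a minimal sketch of that failure mode: `FakeS3IO`, `FakeS3FileSystem` and `try_create` are hypothetical stand-ins, and the guard condition is inferred from the error message, not copied from Beam's source.

```python
class FakeS3IO:
    """Hypothetical stand-in for the S3 I/O wrapper; not Beam's actual class."""

    def __init__(self, client=None, options=None):
        # Assumed guard, inferred from the error message in the traceback.
        if client is None and options is None:
            raise ValueError('Must provide one of client or options')
        self.client = client
        self.options = options


class FakeS3FileSystem:
    """Hypothetical filesystem that forwards whatever options it was built with."""

    def __init__(self, pipeline_options=None):
        self._options = pipeline_options

    def create(self, path):
        # If self._options is None, the guard in FakeS3IO fires.
        return FakeS3IO(options=self._options)


def try_create(fs, path):
    """Return 'ok' on success, or the error message if the guard fires."""
    try:
        fs.create(path)
        return 'ok'
    except ValueError as exc:
        return str(exc)
```

If this reading is right, the thing to check is whether the pipeline options actually reach the filesystem instance in the worker processes, rather than the S3 flags themselves.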
   
   Do you have any idea what I'm doing wrong? 
   
   These are the Beam pipeline args that I'm supplying. I know for sure that at least the multi-processing and number-of-workers arguments are applied:
   '''
   '--direct_running_mode=multi_processing',
   f'--direct_num_workers={NR_OF_CPUS}',
   '--s3_endpoint_url=minio-service.kubeflow:9000',
   f'--s3_access_key={ACCESS_KEY}',
   f'--s3_secret_access_key={SECRET_ACCESS_KEY}',
   '--s3_verify=False'
   '''
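One way to sanity-check a flag list like the one above is to run it through an argparse parser and see which flags fall through as unrecognized. The sketch below uses only the standard library. The flag names registered in `build_s3_parser` are assumptions for illustration and should be checked against the S3 options class of the installed Beam version; `build_s3_parser` and `split_known_unknown` are hypothetical helpers, not Beam APIs.

```python
import argparse


def build_s3_parser():
    """Build a parser for an ASSUMED set of S3 flag names (verify against Beam)."""
    # allow_abbrev=False so that a flag only matches when spelled out exactly.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for flag in ('--s3_endpoint_url',
                 '--s3_access_key_id',
                 '--s3_secret_access_key',
                 '--s3_verify'):
        parser.add_argument(flag, default=None)
    return parser


def split_known_unknown(args):
    """Return (recognized options as a dict, list of unrecognized arguments)."""
    known, unknown = build_s3_parser().parse_known_args(args)
    return vars(known), unknown


if __name__ == '__main__':
    # Placeholder values; real credentials should not be hard-coded.
    recognized, leftover = split_known_unknown([
        '--s3_endpoint_url=minio-service.kubeflow:9000',
        '--s3_access_key=PLACEHOLDER',
        '--s3_secret_access_key=PLACEHOLDER',
        '--s3_verify=False',
    ])
    print('recognized:', recognized)
    print('unrecognized:', leftover)
```

Any flag that shows up in the unrecognized list would be silently ignored by a parser that uses `parse_known_args`, so a misspelled option name can look as if it was accepted.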
   
   Help would be greatly appreciated!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 537441)
    Time Spent: 5h 10m  (was: 5h)

> Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9094
>                 URL: https://issues.apache.org/jira/browse/BEAM-9094
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas
>    Affects Versions: 2.19.0
>            Reporter: Keunhyun Oh
>            Priority: P3
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The AWS S3 filesystem was implemented in BEAM-2572.
> To use a local S3-compatible store such as MinIO, it needs to support setting options such as endpoint_url and credentials.
> One idea is to implement this using environment variables.
>  


