Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/18 15:09:00 UTC
[jira] [Work logged] (BEAM-9094) Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs
[ https://issues.apache.org/jira/browse/BEAM-9094?focusedWorklogId=537441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537441 ]
ASF GitHub Bot logged work on BEAM-9094:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 15:08
Start Date: 18/Jan/21 15:08
Worklog Time Spent: 10m
Work Description: ConverJens commented on pull request #13180:
URL: https://github.com/apache/beam/pull/13180#issuecomment-762308020
@dandy10 @pabloem
Great work with this PR!
I'm trying to get S3 (MinIO) to work for TFX. It works for every component except the Beam-based ones, where I get this strange error:
'''
Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 742, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 867, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/iobase.py", line 1129, in process
self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
File "/usr/local/lib/python3.7/dist-packages/apache_beam/options/value_provider.py", line 135, in _f
return fnc(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 196, in open_writer
return FileBasedSinkWriter(self, writer_path)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 417, in __init__
self.temp_handle = self.sink.open(temp_shard_path)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/options/value_provider.py", line 135, in _f
return fnc(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filebasedsink.py", line 138, in open
return FileSystems.create(temp_path, self.mime_type, self.compression_type)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/filesystems.py", line 229, in create
return filesystem.create(path, mime_type, compression_type)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3filesystem.py", line 171, in create
return self._path_open(path, 'wb', mime_type, compression_type)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3filesystem.py", line 151, in _path_open
raw_file = s3io.S3IO(options=self._options).open(
File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/aws/s3io.py", line 63, in __init__
raise ValueError('Must provide one of client or options')
ValueError: Must provide one of client or options
'''
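Reading the last two frames, `S3IO` is constructed with `options=self._options` and then raises, which suggests the filesystem's options were `None` at that point. A minimal sketch of that guard, assuming it fires only when both arguments are missing (the class `S3IOSketch` and helper `try_open` are hypothetical stand-ins, not Beam's actual code):

```python
class S3IOSketch:
    """Hypothetical stand-in for the constructor guard seen in the traceback."""

    def __init__(self, client=None, options=None):
        # Assumption: the check only fails when neither argument is supplied.
        if client is None and options is None:
            raise ValueError('Must provide one of client or options')
        self.client = client
        self.options = options


def try_open(options):
    """Return the error message, or None if construction succeeds."""
    try:
        S3IOSketch(options=options)
    except ValueError as exc:
        return str(exc)
    return None
```

Under that assumption, `try_open(None)` reproduces the message above, while any non-`None` options object passes the check. That would point at the options not reaching the filesystem rather than at the S3 values themselves.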
Do you have any idea what I'm doing wrong?
These are the Beam pipeline args that I'm supplying, and I know for sure that at least the direct_running_mode and direct_num_workers arguments are applied:
'''
'--direct_running_mode=multi_processing',
f'--direct_num_workers={NR_OF_CPUS}',
'--s3_endpoint_url=minio-service.kubeflow:9000',
f'--s3_access_key={ACCESS_KEY}',
f'--s3_secret_access_key={SECRET_ACCESS_KEY}',
'--s3_verify=False'
'''
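One way to check which of these flags a parser actually recognizes is to split the list into parsed and leftover arguments. The sketch below uses only the standard library `argparse`; the helper name `split_known_flags` and the flag list passed to it are illustrative assumptions, not Beam's own option definitions.

```python
import argparse


def split_known_flags(argv, known_flags):
    """Parse argv against known_flags; return (parsed dict, leftover list)."""
    # allow_abbrev=False so a flag must match a known name exactly,
    # rather than being accepted as a prefix of a longer flag.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for flag in known_flags:
        parser.add_argument(flag)
    # parse_known_args does not fail on unrecognized flags; it returns them.
    known, unknown = parser.parse_known_args(argv)
    return vars(known), unknown


if __name__ == '__main__':
    # Placeholder values; the real keys are not part of this sketch.
    argv = [
        '--direct_running_mode=multi_processing',
        '--s3_endpoint_url=minio-service.kubeflow:9000',
        '--s3_verify=False',
    ]
    parsed, leftover = split_known_flags(
        argv, ['--direct_running_mode', '--s3_endpoint_url'])
    print(parsed)
    print(leftover)
```

Any flag that ends up in the leftover list was never recognized by the parser, which is a quick way to spot a misspelled or unsupported option name.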
Help would be greatly appreciated!
Issue Time Tracking
-------------------
Worklog Id: (was: 537441)
Time Spent: 5h 10m (was: 5h)
> Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs
> -----------------------------------------------------------------------------------------------------------
>
> Key: BEAM-9094
> URL: https://issues.apache.org/jira/browse/BEAM-9094
> Project: Beam
> Issue Type: Improvement
> Components: io-ideas
> Affects Versions: 2.19.0
> Reporter: Keunhyun Oh
> Priority: P3
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> AWS S3 File System is implemented in BEAM-2572.
> To use a local S3-compatible store such as MinIO, it needs to support setting options such as endpoint_url and credentials.
> One idea is to implement this using environment variables.
>