Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/01/12 03:51:04 UTC
[jira] [Updated] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem
[ https://issues.apache.org/jira/browse/BEAM-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-6821:
----------------------------------
This Jira ticket has a pull request attached to it, but is still open. Did the pull request resolve the issue? If so, could you please mark it resolved? This will help the project have a clear view of its open issues.
> FileBasedSink is not creating file paths according to target filesystem
> -----------------------------------------------------------------------
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.11.0
> Environment: Windows 10
> Reporter: Gregory Kovelman
> Priority: P3
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> The file path generated in the _open_writer_ method does not follow the conventions of the target filesystem, because
> os.path.join is used instead of FileSystems.join.
> Extract from apache_beam\io\filebasedsink.py:
>
> {code:python}
> def _create_temp_dir(self, file_path_prefix):
>   base_path, last_component = FileSystems.split(file_path_prefix)
>   if not last_component:
>     # Trying to re-split the base_path to check if it's a root.
>     new_base_path, _ = FileSystems.split(base_path)
>     if base_path == new_base_path:
>       raise ValueError('Cannot create a temporary directory for root path '
>                        'prefix %s. Please specify a file path prefix with '
>                        'at least two components.' % file_path_prefix)
>   path_components = [base_path,
>                      'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>   return FileSystems.join(*path_components)
>
> @check_accessible(['file_path_prefix', 'file_name_suffix'])
> def open_writer(self, init_result, uid):
>   # A proper suffix is needed for AUTO compression detection.
>   # We also ensure there will be no collisions with uid and a
>   # (possibly unsharded) file_path_prefix and a (possibly empty)
>   # file_name_suffix.
>   file_path_prefix = self.file_path_prefix.get()
>   file_name_suffix = self.file_name_suffix.get()
>   suffix = (
>       '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>   return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>
>
> This creates incompatibilities between, for example, a Windows host and GCS.
> Expected: gs://bucket/beam-temp-result-uuid/uid.result
> Actual: gs://bucket/beam-temp-result-uuid\\uid.result
> Replacing os.path.join with FileSystems.join fixes the issue.
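
The difference between the two joins can be reproduced on any OS with the standard library alone. The sketch below is only an illustration, not Beam's implementation: ntpath stands in for os.path on a Windows host, and join_for_target is a hypothetical helper that mimics what a filesystem-aware join such as FileSystems.join is expected to do for URL-style paths. The bucket and file names are the ones from the report.

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath   # POSIX flavour of os.path, always joins with '/'


def join_for_target(base, *parts):
    """Hypothetical stand-in for a filesystem-aware join (not Beam's code)."""
    if "://" in base:
        # URL-style path (e.g. gs://): use '/', whatever the host OS is.
        return posixpath.join(base, *parts)
    # Plain local path: follow the (simulated) Windows host rules.
    return ntpath.join(base, *parts)


temp_dir = "gs://bucket/beam-temp-result-uuid"
suffix = ".result"

# What os.path.join yields on a Windows host, where os.path is ntpath.
host_joined = ntpath.join(temp_dir, "uid") + suffix
# What a join that respects the target filesystem yields.
target_joined = join_for_target(temp_dir, "uid") + suffix

print(host_joined)
print(target_joined)
```

In the extract, the corresponding change would be to build the writer path with FileSystems.join(init_result, uid) + suffix, as the report suggests.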
--
This message was sent by Atlassian Jira
(v8.20.1#820001)