Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2017/08/01 23:52:02 UTC
[jira] [Created] (BEAM-2712) SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.
Eugene Kirpichov created BEAM-2712:
--------------------------------------
Summary: SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.
Key: BEAM-2712
URL: https://issues.apache.org/jira/browse/BEAM-2712
Project: Beam
Issue Type: Bug
Components: runner-apex, runner-core, runner-flink, runner-spark
Reporter: Eugene Kirpichov
Assignee: Kenneth Knowles
https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which on deserialization calls FileSystems.setDefaultPipelineOptions.
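This is problematic and racy if the same process deserializes SerializablePipelineOptions instances that carry different filesystem-related options. Each deserialization overwrites the process-wide default, so any code relying on that default sees whichever options happened to be deserialized last.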
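The following is a minimal, self-contained sketch of the failure mode. It does not use Beam at all: FakeFileSystems and SerializableOptions are hypothetical stand-ins for FileSystems and SerializablePipelineOptions, and a single temp-location string stands in for "filesystem-related options". It runs two deserializations one after the other rather than in parallel threads, so the result is deterministic. With real concurrency the outcome would depend on thread interleaving, which is worse.

```java
import java.io.*;

public class GlobalDefaultDemo {
    // Hypothetical stand-in for FileSystems: one mutable, process-wide default.
    static final class FakeFileSystems {
        private static volatile String defaultTempLocation = null;
        static void setDefault(String tempLocation) { defaultTempLocation = tempLocation; }
        static String getDefault() { return defaultTempLocation; }
    }

    // Hypothetical stand-in for SerializablePipelineOptions that mutates
    // global state as a side effect of being deserialized.
    static final class SerializableOptions implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tempLocation;

        SerializableOptions(String tempLocation) { this.tempLocation = tempLocation; }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // The side effect under discussion: deserializing overwrites the global default.
            FakeFileSystems.setDefault(tempLocation);
        }
    }

    // Serialize and deserialize a value, as happens when options are shipped to a worker.
    static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableOptions a = roundTrip(new SerializableOptions("gs://bucket-a/tmp"));
        System.out.println(FakeFileSystems.getDefault()); // gs://bucket-a/tmp

        // A second set of options deserialized in the same JVM silently replaces the first.
        roundTrip(new SerializableOptions("hdfs://cluster-b/tmp"));
        System.out.println(FakeFileSystems.getDefault()); // hdfs://cluster-b/tmp

        // a still holds its own value, but the global default no longer matches it.
        System.out.println(a.tempLocation); // gs://bucket-a/tmp
    }
}
```

The point of the sketch is that the global default ends up reflecting "last deserialized wins", not "the options this code was given".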
The PR calls FileSystems.setDefaultPipelineOptions on deserialization because the Flink and Apex runners were already doing so in their respective SerializablePipelineOptions-like classes (which the PR removes). The Spark runner wasn't, but probably should have been.
I believe this is done so that the proper filesystem options are automatically available on workers wherever any kind of PipelineOptions is used. Instead, each of the three runners should pick a better place to initialize its workers and call FileSystems.setDefaultPipelineOptions explicitly there.
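A rough sketch of the proposed shape, again using hypothetical stand-ins rather than Beam or runner APIs. The options wrapper has no deserialization side effects, and a hypothetical WorkerSetup.initialize hook represents whatever per-worker setup point a given runner chooses. That hook is the one place the global default gets set. The global default itself still exists here, because removing it is out of scope for this issue.

```java
import java.io.*;

public class ExplicitInitDemo {
    // Hypothetical stand-in for FileSystems with a process-wide default.
    static final class FakeFileSystems {
        private static String defaultTempLocation = null;
        static synchronized void setDefault(String tempLocation) { defaultTempLocation = tempLocation; }
        static synchronized String getDefault() { return defaultTempLocation; }
    }

    // Options wrapper with NO side effects on deserialization (no custom readObject).
    static final class SerializableOptions implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tempLocation;

        SerializableOptions(String tempLocation) { this.tempLocation = tempLocation; }
    }

    // Hypothetical runner-side hook: the single, explicit place where a worker
    // installs its filesystem configuration before processing any elements.
    static final class WorkerSetup {
        static void initialize(SerializableOptions options) {
            FakeFileSystems.setDefault(options.tempLocation);
        }
    }

    // Serialize and deserialize the options, as happens when they are shipped to a worker.
    static SerializableOptions roundTrip(SerializableOptions value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableOptions) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableOptions shipped = roundTrip(new SerializableOptions("gs://bucket-a/tmp"));
        System.out.println(FakeFileSystems.getDefault()); // null: deserialization changed nothing

        // The runner decides when the worker is initialized, and does it explicitly.
        WorkerSetup.initialize(shipped);
        System.out.println(FakeFileSystems.getDefault()); // gs://bucket-a/tmp
    }
}
```

The design choice this illustrates is moving the side effect out of deserialization, which can happen anywhere and any number of times, into a lifecycle step the runner controls.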
It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but that's a topic for a separate JIRA.
CC'ing runner contributors [~aljoscha] [~aviemzur] [~thw]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)