Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/11/06 17:12:01 UTC

[jira] [Updated] (BEAM-10295) FileBasedSink: allow setting temp directory provider per dynamic destination

     [ https://issues.apache.org/jira/browse/BEAM-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-10295:
---------------------------------
    Labels: Clarified  (was: Clarified stale-P2)

> FileBasedSink: allow setting temp directory provider per dynamic destination
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-10295
>                 URL: https://issues.apache.org/jira/browse/BEAM-10295
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-hadoop-file-system, sdk-java-core
>            Reporter: David Janicek
>            Priority: P3
>              Labels: Clarified
>
> Dynamic file destinations allow value-dependent writes in FileBasedSink. With the Hadoop file system, this means a user can write some values to a destination on *cluster-A* and others to a destination on *cluster-B*.
> Since BEAM-7613 was fixed, this works until the *moveToOutputFiles* method is called. That method internally calls *FileSystems.rename*, which requires the source files (temporary files) and the target files (resolved by the dynamic destination's function) to be on the same cluster. However, the temp directory provider can be set only once per file sink.
> This could be fixed by adding some kind of *getTempDirectoryProvider* method to the dynamic destinations (e.g. to *DefaultFilenamePolicy.Params*).
>  
>  
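To illustrate the constraint described in the issue, here is a minimal, self-contained Java sketch. It does not use Beam at all: it models file locations as plain java.net.URI values and treats "same scheme and authority" as a stand-in for "same cluster". The class PerDestinationTempDirs and the methods sameFileSystem, tempDirectoryFor and checkRenamable are hypothetical names invented for this example, as is the /tmp/beam-temp/ path; none of them are part of the Beam API or of any proposed patch.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical illustration only; these names are NOT Beam API. */
final class PerDestinationTempDirs {

  /** Treats two URIs as being on the same cluster when scheme and authority match. */
  static boolean sameFileSystem(URI a, URI b) {
    return Objects.equals(a.getScheme(), b.getScheme())
        && Objects.equals(a.getAuthority(), b.getAuthority());
  }

  /**
   * Hypothetical per-destination hook: derives a temp directory on the same
   * cluster as the final destination (the path itself is an assumption).
   */
  static URI tempDirectoryFor(URI destination) {
    // Resolving an absolute path keeps the destination's scheme and authority.
    return destination.resolve("/tmp/beam-temp/");
  }

  /** Models the precondition of a rename-based finalization step. */
  static void checkRenamable(URI tempFile, URI outputFile) {
    if (!sameFileSystem(tempFile, outputFile)) {
      throw new IllegalArgumentException(
          "Cannot rename across file systems: " + tempFile + " -> " + outputFile);
    }
  }

  public static void main(String[] args) {
    URI sinkWideTemp = URI.create("hdfs://cluster-a/tmp/beam-temp/");
    URI outputOnB = URI.create("hdfs://cluster-b/data/out/part-0");

    // A single sink-wide temp directory lives on cluster-a only,
    // so a destination on cluster-b fails the rename precondition.
    try {
      checkRenamable(sinkWideTemp.resolve("part-0.tmp"), outputOnB);
    } catch (IllegalArgumentException e) {
      System.out.println("sink-wide temp dir: " + e.getMessage());
    }

    // A temp directory chosen per destination satisfies it.
    URI perDestinationTemp = tempDirectoryFor(outputOnB);
    checkRenamable(perDestinationTemp.resolve("part-0.tmp"), outputOnB);
    System.out.println("per-destination temp dir: " + perDestinationTemp);
  }
}
```

In this sketch the temp directory is derived from the destination itself. The issue instead suggests letting dynamic destinations supply a provider, which would leave the choice of location to the user. Either way, the point is the same: the temporary file and the final file end up on the same file system before the rename.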



--
This message was sent by Atlassian Jira
(v8.3.4#803005)