Posted to commits@beam.apache.org by "Stephen Sisk (JIRA)" <ji...@apache.org> on 2017/04/26 01:16:04 UTC

[jira] [Commented] (BEAM-2031) Hadoop FileSystem needs to receive Hadoop Configuration

    [ https://issues.apache.org/jira/browse/BEAM-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983958#comment-15983958 ] 

Stephen Sisk commented on BEAM-2031:
------------------------------------

yeah - in that doc, I think that "2. Construct FileSystemConfig (conceptually a serializable map)" is the world I'm hoping to live in :)

Luke and I were talking, and we think there may be a way to make multiple HadoopFileSystem configurations work, if the assumptions below are true.

Assumptions:
* fs.default.name is always set on Hadoop Configurations used to connect to filesystems
* fs.default.name is always a unique prefix that distinguishes the different servers/configurations the user cares about
* the user always uses URI prefixes that match fs.default.name
(I'm not sure whether those assumptions hold, given my naivete in the Hadoop ecosystem)

Given those, we could:
* Allow the user to provide a list of configurations (via PipelineOptions)
* Register for the unique set of schemes present in the configurations (might require some small changes to allow this to work)
* Inside of HadoopFileSystem, maintain a map of fs.default.name -> configuration
* When HadoopFileSystem is given a URI, it would look up the configuration based on the URI's prefix and then use that configuration.
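
To make the lookup idea concrete, here is a rough sketch of the map and prefix match described above. It is only an illustration under the stated assumptions, not actual Beam code: the class ConfigurationRegistry and its methods are hypothetical names, a plain Map<String, String> stands in for org.apache.hadoop.conf.Configuration so the sketch has no Hadoop dependency, and choosing the longest matching prefix is my own guess at how ties would be broken.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: selects a configuration for a URI by its fs.default.name prefix. */
final class ConfigurationRegistry {
  static final String DEFAULT_NAME_KEY = "fs.default.name";

  // fs.default.name -> configuration. Each configuration is a plain string map
  // standing in for a Hadoop Configuration (an assumption made for this sketch).
  private final Map<String, Map<String, String>> byDefaultName = new HashMap<>();

  ConfigurationRegistry(List<Map<String, String>> configurations) {
    for (Map<String, String> conf : configurations) {
      String defaultName = conf.get(DEFAULT_NAME_KEY);
      if (defaultName == null) {
        // Assumption 1: fs.default.name is always set.
        throw new IllegalArgumentException(DEFAULT_NAME_KEY + " is not set");
      }
      if (byDefaultName.put(defaultName, conf) != null) {
        // Assumption 2: fs.default.name is unique per configuration.
        throw new IllegalArgumentException("Duplicate " + DEFAULT_NAME_KEY + ": " + defaultName);
      }
    }
  }

  /** Returns the unique set of URI schemes present in the registered configurations. */
  Set<String> schemes() {
    Set<String> schemes = new TreeSet<>();
    for (String defaultName : byDefaultName.keySet()) {
      schemes.add(URI.create(defaultName).getScheme());
    }
    return schemes;
  }

  /** Returns the configuration whose fs.default.name is the longest prefix of the given URI. */
  Map<String, String> configurationFor(String uri) {
    String best = null;
    for (String defaultName : byDefaultName.keySet()) {
      if (uri.startsWith(defaultName) && (best == null || defaultName.length() > best.length())) {
        best = defaultName;
      }
    }
    if (best == null) {
      // Assumption 3: the user always uses prefixes that match some fs.default.name.
      throw new IllegalArgumentException("No configuration matches " + uri);
    }
    return byDefaultName.get(best);
  }
}
```

With two configurations whose fs.default.name values are hdfs://namenode-a:8020 and hdfs://namenode-b:8020 (made-up hosts), a URI under hdfs://namenode-b:8020/ would resolve to the second configuration, and only the single scheme hdfs would need to be registered. A plain string prefix match is naive (for example, hdfs://host:80 is also a prefix of hdfs://host:8020), so a real version would probably compare scheme and authority instead.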

This is aspirational for the first stable release, but if anyone has insight into whether those assumptions hold, that'd be useful.

This may be moot if we use option 2 (Construct FileSystemConfig) in Davor's doc.

> Hadoop FileSystem needs to receive Hadoop Configuration
> -------------------------------------------------------
>
>                 Key: BEAM-2031
>                 URL: https://issues.apache.org/jira/browse/BEAM-2031
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Since Beam FileSystem objects are configured via PipelineOptions, we need to pass a Hadoop Configuration through PipelineOptions. I think that's very solvable, but it does seem semi-complicated.
> cc [~peihe0@gmail.com] I believe you mentioned in the past that you had an answer to this - is that written down anywhere?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)