Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/08 22:43:04 UTC

[jira] [Commented] (BEAM-2211) DataflowRunner (Java) rejects all but GCS paths for FileBasedSource/Sink

    [ https://issues.apache.org/jira/browse/BEAM-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001691#comment-16001691 ] 

ASF GitHub Bot commented on BEAM-2211:
--------------------------------------

GitHub user dhalperi opened a pull request:

    https://github.com/apache/beam/pull/2968

    [BEAM-2211] DataflowRunner: remove validation of file read/write paths

    Now that users can implement and register custom FileSystems,
    we can no longer effectively validate which filesystems they
    can read files from or write files to. For example, they can
    even register file:// to point to some HDFS path.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhalperi/beam b2211-dataflow-allow-filesystems

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2968
    
----
commit cd6dc1c4cfc940095ee1dd4c7c9d6a080d425d25
Author: Dan Halperin <dh...@google.com>
Date:   2017-05-08T22:37:29Z

    [BEAM-2211] DataflowRunner: remove validation of file read/write paths
    
    Now that users can implement and register custom FileSystems,
    we can no longer effectively validate which filesystems they
    can read files from or write files to. For example, they can
    even register file:// to point to some HDFS path.

----


> DataflowRunner (Java) rejects all but GCS paths for FileBasedSource/Sink
> ------------------------------------------------------------------------
>
>                 Key: BEAM-2211
>                 URL: https://issues.apache.org/jira/browse/BEAM-2211
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Daniel Halperin
>            Assignee: Daniel Halperin
>             Fix For: 2.0.0
>
>
> {{FileBasedSource}} and {{Sink}} in Beam have switched from the {{IOChannelUtils}} API to the {{FileSystems}} API, which means they now support HDFS, GCS, and other filesystems.
> However, the {{DataflowRunner}} still uses {{GcsPathValidator}}, which means it will likely reject HDFS paths and paths for other new {{FileSystem}} implementations.
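
To illustrate the problem described above, here is a minimal, self-contained Java sketch. It is not Beam code: the class PathValidationSketch, the method gcsOnlyAccepts, the REGISTRY map, and the example paths are all hypothetical stand-ins. It only assumes that a GCS-only validator checks for the gs scheme and that filesystems are looked up by URI scheme.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Beam APIs.
final class PathValidationSketch {

    // Hypothetical registry: URI scheme -> name of the filesystem that backs it.
    static final Map<String, String> REGISTRY = new HashMap<>();

    // Stand-in for a GCS-only validator: accepts a path only if its scheme is "gs".
    static boolean gcsOnlyAccepts(String path) {
        return "gs".equals(URI.create(path).getScheme());
    }

    // Looks up which filesystem a path would resolve to, based on its scheme alone.
    static String backingFileSystem(String path) {
        String scheme = URI.create(path).getScheme();
        return REGISTRY.getOrDefault(scheme, "unregistered");
    }

    public static void main(String[] args) {
        REGISTRY.put("gs", "GCS");
        REGISTRY.put("hdfs", "HDFS");
        // A user may remap a familiar scheme to a different backend.
        REGISTRY.put("file", "HDFS");

        String[] paths = {
            "gs://bucket/input.txt",
            "hdfs://namenode/data/input.txt",
            "file:///tmp/output.txt"
        };
        for (String path : paths) {
            System.out.println(path
                + " gcsOnlyAccepts=" + gcsOnlyAccepts(path)
                + " backedBy=" + backingFileSystem(path));
        }
    }
}
```

In this sketch the GCS-only check rejects the hdfs:// path even though a filesystem is registered for it, and the file:// path resolves to an HDFS backend, so the scheme by itself says little about where the data actually lives.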



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)