Posted to commits@beam.apache.org by "Udi Meiri (JIRA)" <ji...@apache.org> on 2017/11/21 01:30:00 UTC

[jira] [Commented] (BEAM-3099) Implement HDFS FileSystem for Python SDK

    [ https://issues.apache.org/jira/browse/BEAM-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260132#comment-16260132 ] 

Udi Meiri commented on BEAM-3099:
---------------------------------

Doc exploring Python HDFS library options: https://docs.google.com/document/d/1-uzKf4VPlGrkBMXM00sxxf3K01Ss3ZzXeju0w5L0LY0/edit?usp=sharing

> Implement HDFS FileSystem for Python SDK
> ----------------------------------------
>
>                 Key: BEAM-3099
>                 URL: https://issues.apache.org/jira/browse/BEAM-3099
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Chamikara Jayalath
>            Assignee: Udi Meiri
>
> Currently, the Java SDK has HDFS support but the Python SDK does not. With the current portability efforts, other runners may soon be able to use the Python SDK. HDFS support would allow these runners to execute large-scale jobs without using GCS.
> The following blog post suggests some libraries that can be used to connect to HDFS from Python:
> http://wesmckinney.com/blog/python-hdfs-interfaces/


