Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/04 21:28:00 UTC

[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

     [ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353800 ]

ASF GitHub Bot logged work on BEAM-8399:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Dec/19 21:27
            Start Date: 04/Dec/19 21:27
    Worklog Time Spent: 10m 
      Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353991891
 
 

 ##########
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##########
 @@ -115,42 +116,59 @@ def __init__(self, pipeline_options):
       hdfs_host = hdfs_options.hdfs_host
       hdfs_port = hdfs_options.hdfs_port
       hdfs_user = hdfs_options.hdfs_user
+      self.full_urls = hdfs_options.hdfs_full_urls
 
 Review comment:
   Make this private? `self._full_urls`
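   A minimal sketch of what the suggestion could look like. It assumes the option only switches how the netloc of an hdfs:// URL is interpreted. The class `HdfsPathParser` and its method `to_hdfs_path` are hypothetical illustrations, not Beam's `HadoopFileSystem` code. Only the `_full_urls` name comes from the review comment.

```python
from urllib.parse import urlparse


class HdfsPathParser(object):
    """Hypothetical illustration only; not Beam's HadoopFileSystem."""

    def __init__(self, full_urls=False):
        # Leading underscore marks the flag as internal, per the review comment.
        self._full_urls = full_urls

    def to_hdfs_path(self, url):
        """Return the server-side path for an hdfs:// URL."""
        parsed = urlparse(url)
        if parsed.scheme != 'hdfs':
            raise ValueError('Not an hdfs:// URL: %s' % url)
        if self._full_urls:
            # hdfs://namenodehost/parent/child: netloc names the namenode.
            return parsed.path or '/'
        # Legacy hdfs://parent/child: netloc is really the first directory.
        return '/' + parsed.netloc + parsed.path
```

   The underscore only signals intent. Python does not enforce privacy, so the `hdfs_full_urls` pipeline option from the diff would remain the public way to set the flag.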
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 353800)
    Time Spent: 20m  (was: 10m)

> Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8399
>                 URL: https://issues.apache.org/jira/browse/BEAM-8399
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Chamikara Madhusanka Jayalath
>            Assignee: Udi Meiri
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seem to be the correct filename formats for HDFS based on [1], but we currently support the format "hdfs://parent/child".
> To avoid breaking existing users, we have to either (1) support both formats by default (based on [2], HDFS does not seem to allow colons in file paths, so this might be possible), or (2) make "hdfs://namenodehost/parent/child" optional for now and make it the default after a few versions.
> We should also make sure that Beam Java and Python HDFS file-system implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]
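One possible reading of option (1), sketched under stated assumptions. It assumes, as [2] suggests, that HDFS paths cannot contain colons. Under that assumption, a netloc containing ":" can be treated as "namenodehost:port", and any other netloc is kept as the first directory of the legacy "hdfs://parent/child" form. The function `split_hdfs_url` is hypothetical and not part of Beam.

```python
from urllib.parse import urlparse


def split_hdfs_url(url):
    """Hypothetical helper: return (namenode, path); namenode may be None.

    Assumes HDFS paths never contain ':' (see HDFS-13), so a netloc with a
    colon is read as host:port rather than as a directory name.
    """
    parsed = urlparse(url)
    if not parsed.scheme:
        # Bare "/parent/child": no namenode given, so use the default one.
        return None, parsed.path
    if parsed.scheme != 'hdfs':
        raise ValueError('Not an hdfs:// URL: %s' % url)
    if ':' in parsed.netloc:
        # "hdfs://namenodehost:port/parent/child"
        return parsed.netloc, parsed.path or '/'
    # Legacy "hdfs://parent/child": the netloc is really the first directory.
    return None, '/' + parsed.netloc + parsed.path
```

This heuristic has a limitation. A port-less "hdfs://namenodehost/parent/child" is syntactically identical to the legacy form, so it is still read as a path. An explicit opt-in flag, as in option (2), avoids that ambiguity.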



--
This message was sent by Atlassian Jira
(v8.3.4#803005)