Posted to common-issues@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2012/08/27 09:27:08 UTC

[jira] [Comment Edited] (HADOOP-8724) Add improved APIs for globbing

    [ https://issues.apache.org/jira/browse/HADOOP-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442315#comment-13442315 ] 

Chris Douglas edited comment on HADOOP-8724 at 8/27/12 6:25 PM:
----------------------------------------------------------------

bq. Simply instantiating a Glob object when the path might not be a glob may be problematic. Perhaps a better way to handle might be to have a static Path.create(String) which returns either a Path or GlobPath.

Distinguishing intent where the {{Path}} is created (as above) could solve part of the problem with HADOOP-8709 (caller can resolve which API to use?), but I don't think a subtype of {{Path}} will solve the other issues. Dispatch is still on the static type, so nothing is solved for the callee.
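
A minimal sketch of the static-dispatch point, assuming hypothetical stand-in classes ({{Path}}, {{GlobPath}}, and a {{status}} method are invented here and are not the real Hadoop types). Even when the factory returns a {{GlobPath}} at runtime, Java picks the overload from the declared type of the variable, so the callee still sees a plain {{Path}} unless the caller casts:

```java
// Hypothetical stand-ins for illustration only; NOT org.apache.hadoop.fs classes.
public class StaticDispatchDemo {
    static class Path {
        final String value;
        Path(String value) { this.value = value; }

        // Hypothetical factory from the quoted suggestion: returns a GlobPath
        // when the string contains an (assumed) glob metacharacter.
        static Path create(String s) {
            return s.matches(".*[*?\\[{].*") ? new GlobPath(s) : new Path(s);
        }
    }

    static class GlobPath extends Path {
        GlobPath(String value) { super(value); }
    }

    // Two overloads, as a callee might offer them.
    static String status(Path p) { return "literal:" + p.value; }
    static String status(GlobPath p) { return "glob:" + p.value; }

    public static void main(String[] args) {
        Path p = Path.create("/users/hadoop/*.foo");
        System.out.println(p instanceof GlobPath);   // true: runtime type is GlobPath
        System.out.println(status(p));               // literal:/users/hadoop/*.foo
        System.out.println(status((GlobPath) p));    // glob:/users/hadoop/*.foo
    }
}
```

The overload is chosen at compile time from the declared type {{Path}}, which is why a subtype alone does not tell the callee anything new.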

bq. The first was to simply have a base path and a string pattern as the parameters to globStatus. I thought it would be better to encapsulate the two into a single Glob object so it is obvious when an API takes a glob and when it does not.

Having an API advertise its "globiness" is useful; I like it. Users specifying a single resource, though, will need to create {{Glob}} objects on top of {{Path}} objects, which are really {{URI}}s... which seems unnecessarily confusing. Still, methods for {{Configuration}} are straightforward: {{setGlob()}} would need to escape everything on the {{URI}} side first (so users could continue to specify globs on the command line as {{Path}}s with special characters), but aside from that it seems simple. "Correcting" it everywhere in the code may be prohibitively expensive, though...
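
A rough sketch of the escaping step, under stated assumptions: {{escapeGlob}} is a hypothetical helper, and the set of metacharacters it escapes (backslash, {{*}}, {{?}}, brackets, braces) is an assumption about what the glob syntax treats as special, not a confirmed list. The idea is only that the literal side is escaped before the pattern is appended:

```java
public class GlobEscapeDemo {
    // Assumed set of glob metacharacters; the backslash itself is included.
    private static final String SPECIAL = "\\*?[]{}";

    // Hypothetical helper: backslash-escape every assumed metacharacter so the
    // string is read literally if it is later interpreted as a glob.
    static String escapeGlob(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        for (char c : literal.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The directory name contains brackets that must stay literal.
        String base = escapeGlob("/users/hadoop/logs[2012]");
        System.out.println(base + "/*.foo");  // /users/hadoop/logs\[2012\]/*.foo
    }
}
```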

Since many of these are user-facing, do you think we need a more specific type than {{String}} for the glob part? {{ls /users/hadoop/\*.foo}} translated into:
{code}fs.globStatus(new Glob(new Path("hdfs://nn:8020/users/hadoop"), Pattern.compile("*.foo"))){code}
seems like it's strayed from sanity...
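
One concrete hazard of a richer pattern type, sketched under the assumption that {{Pattern}} above would be read as {{java.util.regex.Pattern}} (the comment does not say which type is meant): a glob such as {{*.foo}} is not a valid regular expression, whereas the JDK's own glob support takes the glob as a plain string. The class and method names below are invented for the illustration:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GlobIsNotRegex {
    // True if java.util.regex accepts the string as a regular expression.
    static boolean compilesAsRegex(String s) {
        try {
            Pattern.compile(s);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Matches a bare file name against a glob using java.nio's glob syntax.
    static boolean globMatches(String glob, String fileName) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(compilesAsRegex("*.foo"));          // false: leading '*' has nothing to repeat
        System.out.println(globMatches("*.foo", "data.foo"));  // true
        System.out.println(globMatches("*.foo", "data.bar"));  // false
    }
}
```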
                
> Add improved APIs for globbing
> ------------------------------
>
>                 Key: HADOOP-8724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8724
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> After the discussion on HADOOP-8709 it was decided that we need better APIs for globbing to remove some of the inconsistencies with other APIs. In order to maintain backwards compatibility we should deprecate the existing APIs and add new ones.
> See HADOOP-8709 for more information about exactly how those APIs should look and behave.
